IDEAS

Ideas for improving various aspects of the bot.

Tools

Automatic Documentation Generator

There's currently spotty documentation for rbot's plugins, settings, and commands (when you're not using rbot itself). It would be cool to have a script that grabs the settings, commands, and documentation from all the plugins and generates some HTML.

This would offer two things:

  • Online documentation
  • An easy way to audit the current documentation (to see settings/commands that are missing or have spotty documentation)
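
As a rough illustration of the idea, the sketch below renders HTML from already-collected plugin metadata and reports the commands that have no help text. PluginDoc and render_docs are hypothetical names, and the shape of the metadata (a hash of command name to help string) is an assumption; how that data would actually be pulled out of rbot's plugins is not shown.

```ruby
require 'cgi'

# Hypothetical container: plugin name plus a hash of command => help text (or nil).
PluginDoc = Struct.new(:name, :commands)

# Returns [html, missing], where missing lists every undocumented command.
def render_docs(plugins)
  missing = []
  html = +"<html><body>\n"
  plugins.each do |plugin|
    html << "<h2>#{CGI.escapeHTML(plugin.name)}</h2>\n<dl>\n"
    plugin.commands.each do |command, help|
      if help.nil? || help.strip.empty?
        # Record the gap for the audit and show a placeholder in the page.
        missing << "#{plugin.name}: #{command}"
        help = '(undocumented)'
      end
      html << "<dt>#{CGI.escapeHTML(command)}</dt><dd>#{CGI.escapeHTML(help)}</dd>\n"
    end
    html << "</dl>\n"
  end
  html << "</body></html>\n"
  [html, missing]
end
```

The second return value is what makes the audit cheap: an empty list means every command has at least some help text.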

Yes, we need this. I've been working on some of the RDoc documentation, but this is obviously more technical and aimed at those interested in developing for rbot (be it core components or plugins). Can we use RDoc for user doc? --tango_

Core

Database

The database backend problem seriously needs to be solved. I've got 70 megs of transaction logfiles that I can't delete or it corrupts my database.

IRC Server Connections

Currently, only one server can be connected to at a time. Also, only the standard IRC protocol can be used (no SILC support). The solution to both is to create a generic IRC-server backend that supports multiple protocols.

With multi-server/multi-protocol support, the config file could look something like this:

servers:
  - irc://irc.prison.net/#bastardos,#oldwarez
  - irc://botnick:password@irc.freenode.net/#rbot,#rubyonrails
  - silc://secretserver.samurai.com/#seppuku
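
Entries in this form could be split into their parts with something like the sketch below. ServerSpec, SERVER_URL and parse_server are hypothetical names, not existing rbot code. A hand-written pattern is used on the assumption that everything after the first "/" is a comma-separated channel list, since the "#" characters would otherwise be read as a URL fragment.

```ruby
# Hypothetical value object for one entry of the servers list.
ServerSpec = Struct.new(:protocol, :nick, :password, :host, :channels)

# protocol://[nick[:password]@]host/#chan1,#chan2
SERVER_URL = %r{\A(?<protocol>[a-z]+)://(?:(?<nick>[^:@/]+)(?::(?<password>[^@/]*))?@)?(?<host>[^/@]+)/(?<channels>.*)\z}

def parse_server(url)
  m = SERVER_URL.match(url)
  raise ArgumentError, "bad server URL: #{url}" unless m
  ServerSpec.new(m[:protocol], m[:nick], m[:password], m[:host], m[:channels].split(','))
end

spec = parse_server('irc://botnick:password@irc.freenode.net/#rbot,#rubyonrails')
puts "#{spec.protocol} #{spec.host} #{spec.channels.inspect}"
```

The protocol field is what a multi-protocol backend would dispatch on when choosing which kind of connection to open.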

For more info on SILC support, see the SILC_Protocol wiki page.

With the new IRC framework and the RFC2812 client restructuring, this shouldn't be too hard. At the core level it's probably just a matter of creating an Irc::Client and an Irc::Socket for each client/server connection; everything else should be more or less automatic. There may be some issues with concurrent access for some plugins, and stuff like that. And of course the IRC logging facility should take the multiple servers into consideration. --tango_

Basic server commands (/WHO, /NAMES, etc)

Currently, there's no way to send query commands such as /WHO or /NAMES directly to the server. This has been discussed in ticket #75. A new message class needs to be created, as well as a command for querying the server.

httputils

  • A threaded HTTP grabber that takes a block as callback, with timeout.
  • The HTTP grabber should be able to stat a file by fetching just the headers and size info, returning the appropriate HTTPError if it doesn't exist.
  • HTTPS support (use the ruby http2 library?)
    • We already support https --tango_
  • A CGI::unescape work-alike that supports all of the escaped HTML entities.
    • We use htmlentities when available, since [497] --tango_
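
The first two items could look roughly like the sketch below, built on Ruby's standard Net::HTTP. async_get and its parameters are hypothetical names, not part of rbot's httputils. The callback receives either a response or the error that occurred (including timeouts), and headers_only switches to a HEAD request for the "stat" case.

```ruby
require 'net/http'
require 'uri'

# Fetch url in a background thread and hand the result to the block.
# The block is called as callback.call(response, error); exactly one is nil.
def async_get(url, timeout: 10, headers_only: false, &callback)
  Thread.new do
    begin
      uri = URI(url)
      options = { use_ssl: uri.scheme == 'https',
                  open_timeout: timeout, read_timeout: timeout }
      response = Net::HTTP.start(uri.host, uri.port, options) do |http|
        # HEAD returns only the headers (status, Content-Length, ...).
        headers_only ? http.head(uri.request_uri) : http.get(uri.request_uri)
      end
      callback.call(response, nil)
    rescue StandardError => e
      # Bad URLs, refused connections and timeouts all end up here.
      callback.call(nil, e)
    end
  end
end
```

Because the call returns the Thread immediately, a plugin can keep handling messages while the request is in flight; calling join on the returned thread waits for the callback to finish.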

Profiling

rbot can get really slow with a lot of plugins loaded and a big markov/facts database. It would be nice to be able to see how long each plugin takes to process each message that the bot receives. We should add an option to display a trace of each plugin that the message was fed to, and how long each plugin took to process it.

There's a new debugging plugin (since [569]). It doesn't do anything more than dumping some information which can be used to track memory leaks, but I'm pretty sure it can be extended for more stuff. Or were you thinking more about the core level? --tango_
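
A minimal sketch of the per-plugin timing trace, assuming each plugin object responds to listen(message) and that the plugins are available as a name-to-object hash. profile_dispatch is a hypothetical helper, not existing rbot code.

```ruby
# Feed message to every plugin and return [name, seconds] pairs, slowest first.
def profile_dispatch(plugins, message)
  timings = plugins.map do |name, plugin|
    started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    plugin.listen(message)
    elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
    [name, elapsed]
  end
  timings.sort_by { |_name, elapsed| -elapsed }
end
```

A monotonic clock is used so that system clock adjustments cannot distort the measurements.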

Web Server

  • The webserver should be written in Camping, because, heck, it's the most 4k webserver there is! :D
  • The webserver will have plugins! Plugins are great!
  • Each plugin can add methods that allow it to provide a simple web interface (for displaying information and interacting with the web-user). Good plugin candidates are:
    • urls
    • quotes
    • stats
    • keywords
    • the searchable logs

Questions…

  • What's wrong with WEBrick? --keegan
    • Nothing! WEBrick is a fine server, and Camping can use it. Camping is just a framework that sits atop a webserver. --epitron
  • Should the webserver be integrated into the IRC bot, or should the IRC bot and webserver be separate servers that communicate via DRb or something?
    • The DRb way is now possible, see [698]: so I'd rather prefer a separate webserver with a module to connect to rbot, and an rbot plugin to handle such remote connections. --tango_

I18n/L10n

A very nice thing to have would be to make the bot more international, by having localized replies, insults and even command names. The best way to achieve this is by using the gettext package, available here. However, this is quite some work.

Modification-aware registry values

Currently, the only way to modify a registry value is to assign a new value to its key. For example, if @registry['myhash'] is a hash, @registry['myhash']['newkey'] = newvalue won't work; one has to build the new value and reassign it, e.g. @registry['myhash'] = @registry['myhash'].merge('newkey' => newvalue). It would be possible to eliminate this need if the registry

  • stores the object returned by registry[key], in e.g. registry.cache[key]
  • in further lookups, registry[key] returns registry.cache[key]
  • in further assignments, registry[key]= assigns to registry.cache[key]
  • marshals and saves registry.cache[key] to the database, at appropriate times

The disadvantages of this are:

  • Once a value is accessed, it needs to be kept in the cache at all times, as the plugin can modify the returned value at any time. Even after all references to the object outside the registry are gone, there is no way for Registry to know and respond, as far as I know.
  • There is no way for the registry to know when the value has been modified, so it would be difficult to add a swapping mechanism
  • Almost any advantage of using a database rather than a real hash would be lost
  • For plugins that only use the registry to store and retrieve simple values, this is unnecessary
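
To make the trade-off concrete, here is a minimal sketch of the caching scheme described above. CachingRegistry is a hypothetical class, and a plain Hash of marshaled strings stands in for the real database.

```ruby
# Hypothetical cache layer; store maps key => marshaled string.
class CachingRegistry
  def initialize(store)
    @store = store
    @cache = {}
  end

  # The first lookup unmarshals the value; later lookups return the same object,
  # so in-place modifications made by the caller are remembered.
  def [](key)
    return @cache[key] if @cache.key?(key)
    return nil unless @store.key?(key)
    @cache[key] = Marshal.load(@store[key])
  end

  def []=(key, value)
    @cache[key] = value
  end

  # Write every cached value back. The registry cannot tell which values
  # changed, so it has to re-marshal all of them.
  def flush
    @cache.each { |key, value| @store[key] = Marshal.dump(value) }
  end
end
```

The flush method shows the first two disadvantages directly: nothing can ever be evicted from @cache safely, and every cached value is rewritten whether or not it was touched.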

Another approach free of some of the disadvantages is to provide a different method just for one-shot modifications. However, the syntax is a little more complicated.

# Add a method to modify a key easily
class Irc::BotRegistryAccessor
  # Do stuff with the object stored at key. If the passed block returns anything other
  # than :not_modified, then the modified object will be written back to the registry.
  #
  # For example,
  #   registry.with('my_hash') {|h| h[k] = v; h.merge!(h1);}
  #   registry.with('my_set') {|s| :not_modified unless s.add?(e);}
  def with(key)
    # doesn't make sense to modify nothing
    raise IndexError unless @registry.has_key?(key)
    obj = self[key]
    # write the object back unless the block returns :not_modified
    self[key] = obj unless (yield obj) == :not_modified
  end
end

Plugins

Forecast plugin

  • Threading so it doesn't freeze the bot for 10 seconds every time someone tries to !forecast.

Markov Plugin

  • Optionally, when collecting data, create a markov chain for each user. This would allow the bot to have different selectable "personalities".
  • Recognize its own name so that it doesn't get stored in the dataset.
  • Whenever anyone says something to the bot that's not a command, it should randomly reply back.
  • When the bot reads a message that's addressed to another user, it could substitute a symbol for the name (like #{user}), so that when the bot generates a random reply, it can insert the name of the user it's talking to. (i.e. it can say "you're awesome, STEVE", where STEVE is whoever triggered the response.)
  • The current markov chat algorithm is only good for parroting what people say. To make it a little more interesting, the bot could keep a bidirectional markov chain, and instead of replying based on any word somebody says, it could reply based on specific "salient" or "interesting" words in the sentence, and then construct a reply out of that/those word(s). Saliency could be determined by how frequently the word is said on average vs. how much it's been said in this conversation.

So, with a bi-directional markov chain, instead of a response chain looking like this...

word1->word2->word3->word4

...it would look like this...

word8<-word6<-word4<-word2<-SALIENT_WORD->word1->word3<-word5<-SALIENT_WORD->word7->word9
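
A minimal sketch of the bidirectional idea, assuming a simple first-order chain over whitespace-separated words. BidirectionalMarkov is a hypothetical class, not the existing markov plugin, and choosing the salient word is left out. The reply is grown leftwards and rightwards from the seed word.

```ruby
class BidirectionalMarkov
  def initialize
    @forward  = Hash.new { |h, k| h[k] = [] }  # word => words seen after it
    @backward = Hash.new { |h, k| h[k] = [] }  # word => words seen before it
  end

  def learn(sentence)
    sentence.split.each_cons(2) do |a, b|
      @forward[a] << b
      @backward[b] << a
    end
  end

  # Build a reply that contains seed, extending up to max_each_side words each way.
  def reply_around(seed, max_each_side: 5)
    left  = walk(@backward, seed, max_each_side).reverse
    right = walk(@forward, seed, max_each_side)
    (left + [seed] + right).join(' ')
  end

  private

  # Follow random transitions in table until a dead end or the limit.
  def walk(table, start, limit)
    out = []
    word = start
    limit.times do
      candidates = table.fetch(word, [])
      break if candidates.empty?
      word = candidates.sample
      out << word
    end
    out
  end
end
```

With more training data each step picks randomly among the recorded neighbours, so the words on either side of the seed can come from different sentences.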

Holidays plugin

A plugin that displays all of the crazy holidays that are happening today. (For example, here's what's going on in January: http://www.brownielocks.com/january.htm)

Daily Events plugin

At midnight (or a user-specified time), display a bunch of statistics from that day. (For example, number of words said by each person, active users, tracked keywords, urls, etc.)

URL Plugin

  • Cache the URL's title in the database
  • Optionally shorten the URL using TinyURL (configurable setting)
  • Better search feature (default to 3 results; when there are too many, have it say "(12 more matches...)")
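
The truncated search output could be produced along these lines. summarize_matches is a hypothetical helper, and the matches are assumed to be an array of already-formatted strings.

```ruby
# Keep the first `limit` matches and append a note about the rest.
def summarize_matches(matches, limit = 3)
  shown = matches.first(limit)
  extra = matches.size - shown.size
  shown << "(#{extra} more matches...)" if extra > 0
  shown
end
```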

Google plugin

  • !search sucks.. !google pwns (google is the new verb people use when they mean search. :) )
    • Done since [288] --tango_

Stats plugin

  • Make it stop spamming the console with error messages:
    plugin stats|track|untrack|listtokens listen() failed: undefined method `private?' for #<Irc::JoinMessage:0xb7786ad4>
    (eval):88:in `listen'
    
  • Put the stats on one line:
    15:43 <@noeld> !stats cock
    15:43  * pookie Stats for cock.  Said 346 times.  The top sayers are
              noeld:116, CrazyDazed:113, and blong:42.
    15:43  * pookie noeld has said it 116 times.
    15:43 <@noeld> can that be all put into 1 line?
    15:44 <@noeld> and also make it shorter and maybe add an ascii cock
    15:45 <@epitron> will do!
    
  • Store the time that a word started being tracked, so you can see how often the words are said. (Also, maybe a "says per hour"/"minute"/"day"/"week" stat?)
  • Typing "!track word" when word is already being tracked should not delete word from the database.

Stats.mod plugin

  • A stats plugin similar to stats.mod available for eggdrop bots. Counts words, lines, and smileys per user, per day, and in total. Might have performance problems.

Quotes plugin

  • Rewrite it totally using the new routes stuff. It currently parses the user's commands manually by listening to all channel messages and passing 'em through a slew of regexes.
  • Rename !getquote to !quote

Roulette plugin

  • Users should be kicked when they lose
    • Configurable option since [528] --tango_

digg plugin

  • Should display all diggs on one line by default. (With time in brackets?)

Log Searcher

  • Let users search the log (with regular expressions)
  • Web interface!

urbandictionary

  • Look up words in the urban dictionary (how to pick which one? rating? show how many results there are and let the user supply a numbered param?)

WordNet

  • Look up words in WordNet

Semantic Text Matching

A mini-language for matching messages in the channel in a more powerful, intuitive, and readable way.

The language is very simple: it's just literal strings interspersed with <thing> tokens. A <thing> can match a whole array of things. For example, "<word>" matches any word. "<word:noun>" matches any noun. "</regex.*/>" matches a regex. And "<word type_of_cookie>" would match a word and store it in a named match variable called "type_of_cookie".

Examples of expressions:

  • "<word>, g!"
    • match any word that's followed by ", g!"
  • "my favorite word: <word favorite>"
    • match someone's favorite word and store it in a match variable called "favorite"
  • "I like <word:noun thing>!"
    • match a word that's been analyzed by Ruby Linguistics to be a noun, then store it in the variable "thing"
  • "<word:noun,verb> warriors"
    • match a word that's a noun or a verb
  • "I <word:verb thing_person_thinks_about_you> you!"
    • match a verb and save it in a big long descriptive variable
  • "<number> years young"
    • match a number
  • "<url>"
    • match any url
  • "<url:http>"
    • match only http urls
  • "<url:ftp,http u>"
    • match an ftp or http url and store the result in the variable "u"
  • "<nick who> sucks donkey cocks!"
    • match the nick of a person in the channel and store it in the variable "who"
  • "</I like (.+)!/ groups>"
    • match a regular expression and store any matches in the array "groups"
  • "</(noo|new)b(|ie|y)/ groups>"
    • more advanced matching..
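
One possible way to implement a subset of this is to compile each expression into an ordinary Ruby Regexp with named groups. The sketch below is only that: THING, thing_to_regex and compile_pattern are hypothetical names, only <word>, <number> and <url> are handled, part-of-speech types such as "noun" are ignored, and regex tokens (</.../>) and <nick> are not supported.

```ruby
# <kind>, <kind:type1,type2>, <kind name> or <kind:types name>
THING = /<(?<kind>[a-z]+)(?::(?<types>[a-z,]+))?(?:\s+(?<name>\w+))?>/

# Regex source for one kind of thing (types are only used for url schemes here).
def thing_to_regex(kind, types)
  case kind
  when 'word'   then '\w+'
  when 'number' then '\d+'
  when 'url'
    schemes = types ? types.split(',').map { |s| Regexp.escape(s) }.join('|') : '[a-z]+'
    "(?:#{schemes})://\\S+"
  else
    raise ArgumentError, "unknown thing: #{kind}"
  end
end

def compile_pattern(pattern)
  source = +''
  rest = pattern
  while (m = THING.match(rest))
    source << Regexp.escape(m.pre_match)           # literal text before the token
    body = thing_to_regex(m[:kind], m[:types])
    source << (m[:name] ? "(?<#{m[:name]}>#{body})" : "(?:#{body})")
    rest = m.post_match
  end
  source << Regexp.escape(rest)                    # trailing literal text
  Regexp.new(source, Regexp::IGNORECASE)
end

m = compile_pattern('my favorite word: <word favorite>').match('my favorite word: cheese')
puts m[:favorite]
```

Compiling to a Regexp means named match variables come for free as named captures, at the cost of anything that needs more than pattern matching, such as part-of-speech analysis.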

This pattern matching scheme could be used all over the place! It would make the routing for plugins and other command matching much more semantic.

Other ideas for patterns: <sentence>, <question>, <noun-phrase>, <emoticon>, <curse-word>, <fact> (something in the fact database)

It could also make for fun statistics.

Also, awesome extra power could be achieved by using the Ruby Linguistics module.

The following plugin could make use of the Semantic Text Matching feature:

Responder Plugin

Using the Semantic Text Matching feature above, this plugin lets users create custom responses to things people might say in the channel. For example:

!respond_to "<word> is a loser!" with "totally man!"
!respond_to "<word x> <word y> are fun" with "yes, $nick, I like $x $y too..."
etc..
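
A minimal sketch of how such responses might be filled in, assuming each trigger has already been turned into a Ruby Regexp with named groups (written by hand here). Rule and respond are hypothetical names, and $nick is assumed to mean the person who triggered the response.

```ruby
# Hypothetical rule: a compiled trigger plus a reply template with $variables.
Rule = Struct.new(:regex, :template)

# Return the reply for the first matching rule, or nil if nothing matches.
def respond(rules, nick, message)
  rules.each do |rule|
    m = rule.regex.match(message)
    next unless m
    vars = m.named_captures.merge('nick' => nick)
    # Replace each $name with its value; unknown names are left untouched.
    return rule.template.gsub(/\$(\w+)/) do
      name = Regexp.last_match(1)
      vars.fetch(name, '$' + name)
    end
  end
  nil
end

rules = [Rule.new(/(?<x>\w+) (?<y>\w+) are fun/, 'yes, $nick, I like $x $y too...')]
puts respond(rules, 'steve', 'red cars are fun')
```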