In between submitting job applications, I've been working on an IRC bot called frogbot. For those of you unfamiliar with Internet Relay Chat, it's basically a protocol for setting up online chatrooms that anyone can run on their own network of servers (as opposed to AIM or MSN chatrooms that run on AOL's or Microsoft's servers). IRC has been around for about fifteen years (yes, that's before you even knew what the Internet was), and clients are available for just about every computing platform (Windows, Mac, Linux, UNIX, PDAs, cell phones...).
A bot is a program that logs in as if it were a real person, and can either pretend to be a human (an AI bot) or simply respond to specific commands and do other useful things. For example, one of the things my bot does is watch for anyone typing “(sp?)” after a word, which is a common way to indicate that you're not sure whether you've spelled that word correctly. The bot takes the word in question, sends it to Google to get spelling suggestions, and displays the correct spelling in the chatroom (called a “channel” on IRC).
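To give a rough idea of the “(sp?)” trick, here's a minimal Python sketch of just the detection step. It isn't frogbot's actual code: the names SP_MARKER and words_to_check are made up for illustration, and the lookup against Google is left out entirely.

```python
import re

# Hypothetical pattern: a word, optional whitespace, then a literal "(sp?)".
SP_MARKER = re.compile(r"(\w[\w'-]*)\s*\(sp\?\)", re.IGNORECASE)


def words_to_check(message):
    """Return every word in a chat line that is flagged with "(sp?)"."""
    return SP_MARKER.findall(message)
```

Each word it returns is one the bot would then need to look up somewhere.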
So anyway, enough with the introductions. If any of that was new to you, the rest of this probably won't make much sense, so you may want to stop reading now, before you hurt yourself. The problem I've run into is that the IRC protocol is absolutely horrible and isn't set up in a way that makes any sense.
MODE messages are sent to indicate any of three possible things: a regular channel mode, such as m, which determines whether users without voice or ops can speak; a channel user mode, such as o for ops and v for voice; or some other kind of mode, such as b to ban someone from the channel.
The latter two types of modes take an argument: either a nick (for channel user modes, which aren't really officially called channel user modes, that's just what I call them because I have to call them something), or something else like a hostmask for modes like b. Regular channel modes do not take an argument.
So the first problem is that we have one kind of message (MODE) which can mean three different things. The second problem is that it can mean all three things at the same time. The first argument to the MODE message is the list of modes that are changing, with + or - signs to indicate whether each mode is being set or unset. For example, +om-v indicates that someone is being given channel operator status, the channel is being set to moderated mode, and voice is being taken away from someone. Or +vvb indicates that two people are being given voice and someone is being banned from the channel.
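Here's a minimal Python sketch of how a mode string like that splits into individual changes. The function name split_mode_changes is made up for illustration, and treating a missing leading sign as + is my assumption, not something the protocol promises.

```python
def split_mode_changes(mode_string):
    """Split a mode string such as "+om-v" into (sign, letter) pairs."""
    changes = []
    sign = "+"  # assumption: a mode letter before any sign counts as "+"
    for char in mode_string:
        if char in "+-":
            # A sign applies to every mode letter until the next sign.
            sign = char
        else:
            changes.append((sign, char))
    return changes
```

This only covers the easy half. It says nothing yet about which of those letters come with an argument.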
Most of the time, these modes are set one at a time, so this isn't nearly so confusing, but they can be set all at once. Anyway, that's not too bad. The obnoxious part is, which of those modes take an argument, and which don't? There is no list of these, because each different IRC server has its own modes that it supports in addition to the most common ones. So, if the first argument is +om-v, the next argument is a nick to be opped, and the next argument is another nick to be devoiced; the +m in the middle applies to the channel. How do we know the first and third of these modes take arguments and the second one doesn't? Well, we just have to figure out somehow that o and v take arguments while m doesn't. Some servers will report a list of which modes take an argument when you connect (as a fifth argument to the myinfo message, not defined in RFC 2812); others won't.
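To make the pairing problem concrete, here's a rough Python sketch that matches each mode letter with its argument, given a set of letters known to take one. The name parse_mode_params and the nicks alice and bob are made up for illustration. It also simplifies: it assumes a mode either always or never takes an argument, which isn't true of every mode on every server.

```python
def parse_mode_params(mode_string, args, modes_with_args):
    """Pair each mode change with its argument, or None if it takes none.

    modes_with_args is the set of mode letters assumed to consume an
    argument; where that set comes from is exactly the hard part.
    """
    remaining = list(args)
    parsed = []
    sign = "+"
    for char in mode_string:
        if char in "+-":
            sign = char
            continue
        arg = None
        if char in modes_with_args and remaining:
            # Arguments are consumed left to right, in mode order.
            arg = remaining.pop(0)
        parsed.append((sign, char, arg))
    return parsed
```

With {"o", "v", "b"} as the set, parsing "+om-v" with the arguments alice and bob gives alice ops, sets m on the channel with no argument, and takes voice from bob. Get the set wrong and every argument after the mistake lands on the wrong mode.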
So, for those that do, I'll use that, and for those that don't, I'll use a pre-defined list that's somewhat guessed based on what a few servers seem to support (most notably, dancer-ircd, used by FreeNode, doesn't provide a list, so I'm going by their documentation).
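In Python, that fallback could look something like the sketch below. Everything here is an assumption for illustration: the names, the contents of the default list (a guess at common modes, not dancer-ircd's documented list), and the idea that the myinfo arguments arrive as a list with the client's own nick already stripped off.

```python
# Hypothetical fallback: a guess at common argument-taking modes.
DEFAULT_MODES_WITH_ARGS = frozenset("ovbkl")


def modes_taking_args(myinfo_args):
    """Pick the server's own list if it sent one, else the guessed default.

    myinfo_args is assumed to be the myinfo arguments after the nick:
    server name, version, user modes, channel modes, and optionally a
    fifth entry listing the channel modes that take an argument.
    """
    if len(myinfo_args) >= 5 and myinfo_args[4]:
        return frozenset(myinfo_args[4])
    return DEFAULT_MODES_WITH_ARGS
```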
Why is all of this necessary just to parse a MODE message? Why wasn't this functionality implemented in a consistent way? Why is the inconsistent behavior not adequately documented anywhere I can find? How did such a broken protocol ever become so popular? If you know the answers to any of these questions, feel free to drop me a line and share.