I've been spending a lot of time fighting spam lately. I've had some fairly
serious measures in place for a while, but the recent increase in spam has been
driving me nuts, so I've been devoting more time to fighting it actively.
Most people don't have any idea how serious the problem is, because all
they see is a few messages per day that get past the filters. It only takes
a few seconds to delete them, so what's the big deal? Well, those of us
who actually administer mail servers see a very different picture. The
mail servers I currently maintain are very small (only a few dozen users in total),
so keep in mind that these numbers are tiny compared to what most mail servers see.
My first line of defense against spam is a handful of blacklists of IP
addresses known to be sources of spam. Any connection attempt from an IP
address on a blacklist is immediately denied. I don't know anything about
the message they would have sent, and often they reconnect to try again, so
the number of blocked connections is probably a lot higher than the number
of spam messages that would have been received if they hadn't been blocked.
Even so, the total for November so far is over 40,000 blocked connections.
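The post doesn't name the blacklists, but DNS-based blacklists generally work the same way: reverse the octets of the connecting IPv4 address, append the list's zone name, and do an ordinary DNS A lookup; any answer at all means the address is listed. Here's a minimal sketch of that lookup, assuming IPv4 only. The zone name dnsbl.example.org is a placeholder, and the resolver is passed in as a parameter so the logic can be exercised without network access.

```python
import ipaddress
import socket
from typing import Callable, Iterable, Optional


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNSBL query name, e.g. 192.0.2.10 -> 10.2.0.192.<zone>."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError for malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def first_listing(
    ip: str,
    zones: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return the first blacklist zone that lists `ip`, or None if none do."""
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
        except OSError:
            # NXDOMAIN (socket.gaierror) means "not listed". Other lookup
            # failures also land here, so this sketch fails open.
            continue
        return zone  # any A record means the address is listed
    return None


# Hypothetical zone name; substitute the blacklists actually in use.
BLACKLIST_ZONES = ["dnsbl.example.org"]
```

In practice this check runs when the client connects, before it gets to say anything, which is why nothing is known about the message it would have sent.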
The second line of defense is a set of custom scripts I wrote that block
messages based on what's called the message envelope information. This is
roughly like throwing out something you get in the mail without even opening
it, because you can tell from the envelope that it's another credit card offer.
I reject messages based on the envelope sender and recipient, as well as the
DNS hostname of the connecting server and the greeting (HELO/EHLO line) it sent.
This method has blocked about 5,300 connections so far this month, but again,
that count can include repeated attempts, so it isn't a count of distinct messages.
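The scripts themselves aren't shown here, so the following is only an illustration of the kinds of envelope checks described: the client's reverse-DNS hostname, the HELO name, the envelope sender, and the recipient. Every hostname, domain, and pattern in it is hypothetical. In a real setup, checks like these run inside the mail server during the SMTP conversation, before any message data is accepted.

```python
import re
from dataclasses import dataclass
from typing import Optional

# All of these values are hypothetical examples, not the author's actual rules.
OUR_HOSTNAME = "mail.example.net"
LOCAL_RECIPIENTS = {"alice@mail.example.net", "bob@mail.example.net"}
BLOCKED_SENDER_DOMAINS = {"spam-sender.example"}
# Hostnames that look like dynamic/residential addresses, e.g. adsl-192-0-2-10.isp.example
DYNAMIC_HOST = re.compile(r"(dsl|dyn|dhcp|pool|cable)[-.]?\d|\d+[-.]\d+[-.]\d+[-.]\d+", re.I)


@dataclass
class Envelope:
    client_hostname: Optional[str]  # reverse DNS of the connecting IP; None if no PTR record
    helo: str                       # name given in HELO/EHLO
    mail_from: str                  # envelope sender (MAIL FROM)
    rcpt_to: str                    # envelope recipient (RCPT TO)


def envelope_reject_reason(env: Envelope) -> Optional[str]:
    """Return a reason to reject the message, or None to let it through."""
    if env.client_hostname is None:
        return "client has no reverse DNS"
    if DYNAMIC_HOST.search(env.client_hostname):
        return "client hostname looks like a dynamic/residential address"
    helo = env.helo.lower()
    if helo == OUR_HOSTNAME:
        return "HELO claims to be us"
    if "." not in helo:
        return "HELO is not a fully qualified name"
    sender_domain = env.mail_from.rpartition("@")[2].lower()
    if sender_domain in BLOCKED_SENDER_DOMAINS:
        return "sender domain is blocked"
    if env.rcpt_to.lower() not in LOCAL_RECIPIENTS:
        return "unknown recipient"
    return None
```

The order of the checks is a design choice: the cheapest and most reliable signals come first, so most junk connections are turned away before the later checks run.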
Messages that get past those checks are scanned for viruses by
ClamAV, which also identifies some phishing scams. Anything it flags is moved to
a folder I have access to, so users never see it. In addition, suspicious-looking
attachments are removed and replaced with a warning message.
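The post doesn't say how "suspicious-looking" is decided, so as a rough illustration, here's a sketch using Python's standard email package that replaces any attachment whose filename has a risky extension with a short text/plain notice. The extension list and the wording of the warning are assumptions.

```python
from email.message import EmailMessage

# Hypothetical list of extensions treated as suspicious.
SUSPICIOUS_EXTENSIONS = (".exe", ".scr", ".pif", ".bat", ".com", ".vbs")


def strip_suspicious_attachments(msg: EmailMessage) -> list[str]:
    """Replace risky attachments in place with a warning; return their names."""
    removed = []
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no filename of their own
        name = part.get_filename()
        if name and name.lower().endswith(SUSPICIOUS_EXTENSIONS):
            removed.append(name)
            # set_content() clears the old Content-* headers (including the
            # filename) and turns this part into a plain-text warning.
            part.set_content(
                f"The attachment {name!r} was removed because it looked suspicious.\n"
            )
    return removed
```

Replacing the part rather than deleting it keeps the MIME structure intact and leaves the recipient a visible explanation of what happened.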
But then we get to the interesting part.
SpamAssassin analyzes the
content of the message to determine whether it matches a whole slew of
spam-like characteristics. Some of the rules are included with SpamAssassin;
others come from the SpamAssassin
Rules Emporium and are updated frequently. Image-based spam is
analyzed by the
FuzzyOcr plugin, which processes each frame of an animation, applies
various color filters, runs it through optical character recognition
software, and compares the result to a word list. Several custom rules
I've written recently look for specific spam I've received. All these
things add points to a score, and if the score is above 5, the message
gets moved to a quarantine folder for the user to review. If the score
is above 15, we assume it's definitely spam, and the message is moved into
a system-wide spam folder that users never see.
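FuzzyOcr's full pipeline (frame extraction, color filters, the OCR step itself) is far more involved, but its last step, comparing noisy OCR output against a word list, can be illustrated with Python's difflib. This is a sketch of the idea only, not FuzzyOcr's actual matching code; the word list and the 0.8 similarity cutoff are assumptions.

```python
import difflib
import re

# Hypothetical word list; FuzzyOcr ships its own list and its own matching logic.
SPAM_WORDS = ["viagra", "cialis", "mortgage", "investor", "pharmacy", "stock"]


def fuzzy_word_hits(ocr_text: str, words=SPAM_WORDS, cutoff: float = 0.8) -> list[str]:
    """Return the list words that OCR tokens resemble closely enough.

    OCR output is noisy ("v1agra", "pharrnacy"), so exact matching would miss
    most hits; difflib's similarity ratio tolerates a character or two of damage.
    """
    hits = []
    for token in re.findall(r"[a-z0-9]+", ocr_text.lower()):
        if len(token) < 4:
            continue  # very short tokens produce too many accidental matches
        best = difflib.get_close_matches(token, words, n=1, cutoff=cutoff)
        if best:
            hits.append(best[0])
    return hits
```

Each hit would then contribute points to the message's score, the same way any other rule does.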
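SpamAssassin records its verdict in an X-Spam-Status header such as "Yes, score=17.3 required=5.0 tests=...", so the two-threshold routing above can be sketched as reading the score back out of that header and comparing it to the cutoffs. The folder names are hypothetical, and this sketch treats both cutoffs as inclusive (5 or higher, 15 or higher) to match the "15 or higher" wording used for the tally below; the actual delivery rules may draw the line slightly differently.

```python
import re

QUARANTINE_AT = 5.0     # per-user quarantine folder the user can review
SYSTEM_SPAM_AT = 15.0   # system-wide spam folder users never see

# Matches the score field in a value like "Yes, score=17.3 required=5.0 tests=..."
_SCORE_FIELD = re.compile(r"\bscore=(-?\d+(?:\.\d+)?)")


def spam_score(x_spam_status: str) -> float:
    """Extract the numeric score from an X-Spam-Status header value."""
    match = _SCORE_FIELD.search(x_spam_status)
    if match is None:
        raise ValueError(f"no score field in {x_spam_status!r}")
    return float(match.group(1))


def destination(score: float) -> str:
    """Pick a (hypothetical) folder based on the two thresholds."""
    if score >= SYSTEM_SPAM_AT:
        return "system-spam"
    if score >= QUARANTINE_AT:
        return "quarantine"
    return "inbox"
```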
So how many messages go into this system-wide spam folder? Over 4,000
so far this month scored 15 or higher. Add to that about 1,200 spam messages
scoring under 15 that I've received myself, plus whatever other users have been
getting, and you're looking at spam coming in about every 5 minutes on average,
24 hours a day, 7 days a week.
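As a quick check on that rate, using only the two counts above: one message every 5 minutes is 288 messages a day, so about 5,200 messages corresponds to roughly 18 days of mail. That fits a tally taken partway through November, and since other users' lower-scoring spam isn't included, the true average interval is shorter still.

```latex
\begin{aligned}
N &\approx 4000 + 1200 = 5200 \text{ messages} \\
\text{messages per day at one per 5 min} &= \frac{24 \times 60}{5} = 288 \\
\text{days covered} &\approx \frac{5200}{288} \approx 18.1
\end{aligned}
```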
The frustrating thing is that the spammers have access to the same
filtering software that I'm using. They can check to see whether the filters
will catch their message before they send it, and then tweak it until it
passes. One spammer has recently been using very specific patterns that I
can easily match, but he changes them every couple of days, so I have to
update my rules just as often. The general-case rules aren't catching his
messages, so I'm basically matching specific subject lines.
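A rule aimed at one campaign's subject lines can be as simple as a table of regular expressions, each worth some number of points. Here's a minimal Python sketch of that idea; the rule names, patterns, and point values are made up for illustration, and in SpamAssassin itself such rules would be written in its own rule syntax rather than in Python.

```python
import re
from typing import NamedTuple


class SubjectRule(NamedTuple):
    name: str
    pattern: re.Pattern
    points: float


# Hypothetical rules targeting one campaign; they would need rewriting
# every few days as the spammer changes his subject lines.
SUBJECT_RULES = [
    SubjectRule("SUBJ_STOCK_ALERT", re.compile(r"^\s*(hot|urgent)\s+stock\s+alert\b", re.I), 4.0),
    SubjectRule("SUBJ_RE_NUMBERS", re.compile(r"^re:\s*\d{4,}\s*$", re.I), 3.5),
]


def score_subject(subject: str, rules=SUBJECT_RULES) -> tuple[float, list[str]]:
    """Return the total points and the names of the rules that matched."""
    hits = [rule for rule in rules if rule.pattern.search(subject)]
    return sum(rule.points for rule in hits), [rule.name for rule in hits]
```

The weakness is exactly the one described above: each pattern only catches the wording it was written for, so it stops working as soon as the spammer changes it.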
It used to be that a 486 with 32 MB of RAM was perfectly adequate for
handling SMTP (as long as that's all it was doing). Now, because of all this
advanced spam filtering, a Pentium 4 with 1.5 GB of RAM (or more) is more
reasonable. And after all of that, we still get spam in our inboxes. I
wish Congress would start paying attention; they're the only ones with the
power to fix this mess, by earmarking funding for enforcement. I'm not
holding my breath.