I've been working on some new spam filtering rules. Rather than trying to decipher the content of each message, I'm only looking at where it came from and how it was delivered (spammers have gotten really good at writing message content that software can't tell apart from legitimate mail). Spammers are thieves, and they steal resources from whatever systems they can gain control of. It's usually pretty easy to tell the difference between a message sent from a real mail server and one sent from somebody's residential DSL connection in a country I couldn't find on a map.
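In practice, the "residential DSL connection" test usually comes down to the PTR name of the connecting IP. Below is a minimal Python sketch of one common heuristic, assuming the PTR name has already been looked up: flag names that contain dynamic-pool keywords or spell out the IP's own octets. The keyword list and the function names are hypothetical illustrations, not taken from any particular filter.

```python
import ipaddress
import re

# Hypothetical keyword list; tune it against your own logs.
DYNAMIC_WORDS = {"dsl", "adsl", "dyn", "dynamic", "dhcp", "pool", "cable",
                 "dialup", "ppp", "broadband", "residential"}


def _tokens(hostname: str) -> list:
    """Split a hostname on dots, hyphens and underscores; normalise numbers ("007" -> "7")."""
    parts = [p for p in re.split(r"[._-]", hostname.lower()) if p]
    return [str(int(p)) if re.fullmatch(r"[0-9]+", p) else p for p in parts]


def _contains_run(tokens: list, run: list) -> bool:
    """True if `run` appears as a contiguous slice of `tokens`."""
    n = len(run)
    return any(tokens[i:i + n] == run for i in range(len(tokens) - n + 1))


def looks_residential(ptr_name: str, ip: str) -> bool:
    """Heuristic only: a dynamic-pool keyword, or the IPv4 octets spelled out in the PTR name."""
    toks = _tokens(ptr_name)
    # Treat "dsl12" or "pool3" as keywords too by stripping trailing digits first.
    if any(t.rstrip("0123456789") in DYNAMIC_WORDS for t in toks):
        return True
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    # Many ISPs embed the address forwards (203-0-113-7) or reversed (7.113.0.203).
    return _contains_run(toks, octets) or _contains_run(toks, octets[::-1])
```

A heuristic like this also flags some static business lines whose ISP happens to embed the address in the name, so it works better as one signal among several than as a hard block.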
Unfortunately, I've run into a couple of problems. The first problem is that a few spammers have set up legitimate-looking mail servers. These aren't botnet machines; they're real servers colocated in a datacenter somewhere, with domain names that match the “from” address and all linked URLs, static IP addresses, properly configured reverse DNS, and even SPF records. Fortunately, there are still a few telltale signs, and I've been able to collect a database of several thousand IP addresses and domain names used this way.
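A collected list like this can be checked against the connecting IP and against every domain that shows up in a message (HELO name, envelope sender, link hosts). The Python sketch below keeps the list in memory and also matches parent domains, so a listed domain catches its subdomains too. The class and method names are hypothetical, and a real setup would more likely load the entries from a file or database.

```python
import ipaddress


class LocalBlocklist:
    """A minimal in-memory blocklist of networks and domains (hypothetical structure)."""

    def __init__(self, networks, domains):
        # strict=False accepts entries like "198.51.100.7/24" without raising.
        self.networks = [ipaddress.ip_network(n, strict=False) for n in networks]
        self.domains = {d.lower().rstrip(".") for d in domains}

    def ip_listed(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # Membership tests across IPv4/IPv6 simply return False.
        return any(addr in net for net in self.networks)

    def domain_listed(self, name: str) -> bool:
        # Check the name and each parent: a.b.spam.example, b.spam.example, spam.example, ...
        labels = name.lower().rstrip(".").split(".")
        return any(".".join(labels[i:]) in self.domains for i in range(len(labels)))
```

For example, `LocalBlocklist(["198.51.100.0/24"], ["spam.example"])` would match both `198.51.100.25` and `mail.spam.example`, but not `notspam.example`.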
The second problem is legitimate mail sent from horribly broken mail servers. These are servers whose reverse DNS (PTR) names either don't resolve at all or resolve to the wrong IP address. The server identifies itself with a HELO line that matches the broken reverse DNS name, just like a lot of spam does. Everything about it looks sleazy, but it's legitimate mail that my users need to receive.
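The check these servers fail is usually called forward-confirmed reverse DNS: look up the PTR name for the connecting IP, then resolve that name and see whether the original IP comes back. Here is a minimal Python sketch. The resolvers are passed in as functions so the logic can be tried without a network, and the category strings are hypothetical labels, not standard ones.

```python
import socket
from typing import Callable, Iterable, Optional


def system_reverse(ip: str) -> Optional[str]:
    """PTR lookup via the system resolver; None if there is no usable PTR record."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are both OSError subclasses
        return None


def system_forward(name: str) -> set:
    """All addresses the name resolves to (A and AAAA), or an empty set."""
    try:
        return {info[4][0] for info in socket.getaddrinfo(name, None)}
    except OSError:
        return set()


def classify_sender(ip: str, helo: str,
                    reverse: Callable[[str], Optional[str]] = system_reverse,
                    forward: Callable[[str], Iterable[str]] = system_forward) -> str:
    """Hypothetical categories: 'no-ptr', 'ptr-mismatch', 'helo-mismatch' or 'ok'."""
    name = reverse(ip)
    if not name:
        return "no-ptr"
    # Forward confirmation: the PTR name must resolve back to the connecting IP.
    if ip not in set(forward(name)):
        return "ptr-mismatch"
    # Compare the HELO name with the PTR name, ignoring case and a trailing dot.
    if helo.lower().rstrip(".") != name.lower().rstrip("."):
        return "helo-mismatch"
    return "ok"
```

Note that the broken servers described here come out as `ptr-mismatch` even though their HELO matches the PTR name, because a HELO that agrees with a broken PTR proves nothing.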
Some of it is from mailing lists, and some of it is from individuals with stupid ISPs. However, some of it comes from companies like PayPal and AT&T Wireless, both of which use a third-party marketing company to e-mail promotional offers to their customers. The marketing company apparently doesn't know how to configure their servers properly, and there's no real way to distinguish between these legitimate messages and well-done phishing attempts.
I saw something similarly annoying several months ago: a legitimate message from Wachovia to one of its banking customers in which every link in the HTML e-mail (including links pretending to go to visa.com) actually went to a redirector that was NOT on wachovia.com. You would think a bank would know better. How can we educate users to avoid phishing scams if banks are using the same sleazy tricks that the phishers use?
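Spotting that pattern mechanically means pulling the link hosts out of the HTML and comparing them with the sender's domain. A rough Python sketch using only the standard library follows; `off_domain_links` and the example hostnames are hypothetical, and it only inspects `<a href>` attributes, not image sources or text that merely looks like a URL.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class _LinkCollector(HTMLParser):
    """Collects the hostname of every <a href> in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href") or ""
            host = urlsplit(href).hostname  # None for relative links and mailto:
            if host:
                self.hosts.append(host)


def _under(host: str, domain: str) -> bool:
    host, domain = host.lower().rstrip("."), domain.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def off_domain_links(html: str, sender_domain: str) -> list:
    """Sorted, de-duplicated link hosts that are not the sender's domain or a subdomain of it."""
    collector = _LinkCollector()
    collector.feed(html)
    collector.close()
    return sorted({h for h in collector.hosts if not _under(h, sender_domain)})
```

A message like the Wachovia one would return a non-empty list (the redirector's host), which is exactly the signal a phishing filter would want to act on.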
Anyway, I don't mind blocking legitimate mail from a stupid marketing company, but it seems there are a lot of incompetent mail administrators out there. I've been analyzing five weeks of data, during which two of my servers saw about 25,000 messages from servers with broken reverse DNS. Of those, all but a few hundred were rejected for other reasons, and many of the rest were quarantined as possible spam. Still, there's an awful lot of spam hitting users' mailboxes that I could easily block if it weren't for the handful of legitimate messages that would be blocked too.
Ugh.