
At work, we have a Debian Linux box that serves as our mail and web server. Recently, I upgraded that box from Debian Woody to Sarge. After the upgrade was complete, I decided to try to improve the spam filtering implementation.
The mail server runs qmail as the SMTP server, and I have the mail delivered to local users via procmail. Using procmail allows me to do all kinds of neat filtering on the server side. As an example, I automatically place mailing list traffic into separate IMAP folders. Similarly, mail from each customer domain is placed in its own company-specific IMAP folder, so my incoming email is organized before I ever see it.
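To make the routing idea concrete, here is a minimal sketch of the decision those filters make, written in Python rather than as procmail recipes. It is not my actual configuration: the list ID, the customer domain, and the folder names are all made up for illustration.

```python
from email import message_from_string
from email.utils import parseaddr

# Hypothetical rules: these list IDs, domains, and folder names are
# placeholders, not the real server configuration.
LIST_FOLDERS = {"announce.lists.example.org": "Lists/announce"}
CUSTOMER_FOLDERS = {"example.com": "Customers/Example"}


def choose_folder(raw_message):
    """Return the folder a message should be filed in, based on its headers."""
    msg = message_from_string(raw_message)

    # Mailing list traffic is recognized by its List-Id header.
    list_id = msg.get("List-Id", "")
    for needle, folder in LIST_FOLDERS.items():
        if needle in list_id:
            return folder

    # Customer mail is recognized by the domain of the From address.
    _, address = parseaddr(msg.get("From", ""))
    domain = address.rpartition("@")[2].lower()
    return CUSTOMER_FOLDERS.get(domain, "INBOX")
```

The list check runs first so that a customer posting to a mailing list still lands in the list folder; anything that matches no rule falls through to the inbox.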

One of the filters that I had incoming mail going through was SpamAssassin. When I first installed SpamAssassin more than two years ago, it did a very good job of detecting and flagging spam. However, the spammers have gotten rather sophisticated over the last two years, and many of them now test their spam against SpamAssassin to try to thwart it. While SpamAssassin was still able to detect and flag a good share of the spam, more and more seemed to slip through as the months went by.
I decided to try out DSPAM after hearing good reports on its performance from a friend and reading about it on its website. It wasn't available in the Debian repository, so I downloaded the source tarball, then built and installed it.
DSPAM works best if you have a corpus of spam and non-spam (or ham) to train it. If you do not have a good selection (thousands of messages), then I would recommend not using it until you do. Once you have trained DSPAM with both good and bad emails, you can put it to work effectively.
DSPAM is not a fire-and-forget spam fighting solution. It requires a certain amount of vigilance on the part of the user to correct messages falsely flagged as spam or ham. I set up three folders in each user's IMAP folder hierarchy for this purpose:
- Spam/ - This is the folder where DSPAM sends all emails it detects as spam.
- Spam/Missed/ - This folder is where users place emails which DSPAM did not detect as spam, but should have.
- Spam/NotSpam/ - This folder is where emails are placed which were detected as spam, but are not.
Every night, cron executes a bash script I wrote that crawls through the Missed/ and NotSpam/ directories, correcting DSPAM's mistakes.
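The nightly pass can be sketched roughly as follows. This is a Python illustration of the idea, not the bash script itself. The `dspam` arguments shown (`--user`, `--class`, `--source=error`) are my understanding of the DSPAM 3.x command line and should be checked against your installed version, and the assumption that each folder is a plain directory with one file per message is also a simplification.

```python
import subprocess
from pathlib import Path

# Folder name -> class DSPAM should have assigned. The folder names follow
# the list above; the class names are assumed DSPAM 3.x values.
RETRAIN_CLASS = {"Missed": "spam", "NotSpam": "innocent"}


def retrain_command(user, dspam_class):
    """Build the (assumed) dspam invocation that corrects one message."""
    return ["dspam", "--user", user, "--class=" + dspam_class, "--source=error"]


def run_dspam(command, message_bytes):
    """Feed the message to dspam on standard input."""
    subprocess.run(command, input=message_bytes, check=True)


def correct_mistakes(user, spam_root, runner=run_dspam):
    """Retrain on every message in Missed/ and NotSpam/ under spam_root.

    Returns the list of message paths that were handed to the runner.
    """
    handled = []
    for folder, dspam_class in RETRAIN_CLASS.items():
        directory = Path(spam_root) / folder
        if not directory.is_dir():
            continue
        for path in sorted(directory.iterdir()):
            if path.is_file():
                runner(retrain_command(user, dspam_class), path.read_bytes())
                handled.append(path)
    return handled
```

The sketch deliberately leaves the messages where they are. A real script would also need to move or remove each handled message so that it is not retrained again the following night.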
So, how is it working? After about 3 weeks of use, my accuracy rate is up to over 95%, and that is without as much initial training as I recommended. I expect that over the next month or two I will be able to get that accuracy up into the triple nines: 99.9%.
Overall, I am extremely happy with DSPAM's performance. It is a bit tricky to install and get going, but once it is humming along nicely, it purrs.