Tuesday, February 26, 2008

By Request - My dspam Training Script

In a post about a year ago, I mentioned a script I wrote that trains dspam on spam it missed and corrects it when it falsely flags a good (or "ham") email as spam. Someone has asked me to post that script, so here it is. Please note that my qmail installation uses the maildir format!

--- start file: train-spam.sh ---
#!/bin/bash

# train-spam.sh
#
# Description: Checks each user's /home/<user>/Maildir/.Spam.Missed
# and .Spam.NotSpam directories to see if the user placed any
# "missed" spam messages that got through dspam to their INBOX, or
# any good messages that were falsely flagged as spam. If there
# are messages in these directories, then the script invokes dspam
# to retrain on them and improve the defenses for next time...
#

# learn_spam - Function which takes a directory and a user as
# arguments, and then feeds that directory to our anti-spam
# applications for further SPAM training.
#
# Arguments:
# $1 - Directory name containing SPAM emails. Required
# $2 - User name. If it is not provided, $USER will be used.
#
# Example:
# learn_spam /home/alank/Maildir/.Spam.Missed/cur alank
#
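learn_spam() {

    maildir=$1
    dspam_user=${2:-$USER} # fall back to $USER when no user is given

    # loop through all emails in given directory
    for email in "$maildir"/*; do

        # skip anything that is not a regular file (this also covers
        # an empty or missing directory, where the pattern stays literal)
        [ -f "$email" ] || continue

        # process SPAM email using DSPAM
        /usr/local/bin/dspam --mode=teft --source=error --class=spam --feature=chained,noise --user "$dspam_user" < "$email"
        echo -n "."

        # delete SPAM email
        rm "$email"

    done # end of email loop

} # end function learn_spam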

# learn_ham - Function which takes a directory and a user as
# arguments, and then feeds that directory to our anti-spam
# applications for further HAM training.
#
# Arguments:
# $1 - Directory name containing HAM emails. Required
# $2 - User name. If it is not provided, $USER will be used.
#
# Example:
# learn_ham /home/alank/Maildir/.Spam.NotSpam/cur alank
#
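learn_ham() {

    maildir=$1
    dspam_user=${2:-$USER} # fall back to $USER when no user is given

    # loop through all emails in given directory
    for email in "$maildir"/*; do

        # skip anything that is not a regular file (this also covers
        # an empty or missing directory, where the pattern stays literal)
        [ -f "$email" ] || continue

        # process HAM email using DSPAM
        /usr/local/bin/dspam --mode=teft --source=error --class=innocent --feature=chained,noise --user "$dspam_user" < "$email"
        echo -n "."

        # delete HAM
        rm "$email"

    done # end of email loop

} # end function learn_ham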

#
# Script starts here!
#

# loop through all user home directories
for home in /home/*; do

    # the home directory name doubles as the dspam user name
    file=${home##*/}

    # if there is a Spam/Missed maildir
    if [ -d "$home/Maildir/.Spam.Missed/cur" ]; then

        # then process any missed SPAM
        echo -n "missed spam for $file: "
        learn_spam "$home/Maildir/.Spam.Missed/cur" "$file"
        learn_spam "$home/Maildir/.Spam.Missed/new" "$file"
        echo ""

    fi # end if

    # if there is a Spam/NotSpam dir
    if [ -d "$home/Maildir/.Spam.NotSpam/cur" ]; then

        # then process any falsely identified spam, i.e. HAM
        echo -n "false positives for $file: "
        learn_ham "$home/Maildir/.Spam.NotSpam/cur" "$file"
        learn_ham "$home/Maildir/.Spam.NotSpam/new" "$file"
        echo ""

    fi # end if

done # end for loop

echo "Done!"
--- end file: train-spam.sh ---

I keep the above script in /root and have a cron job run it early every morning. You will need to edit the paths in the script if your missed-spam and not-spam directories are named something other than .Spam.Missed and .Spam.NotSpam. Good luck, and I hope it is helpful in your continuing battle against spam!
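
If your folders are named differently, it may help to check what the script would pick up before letting it run unattended. The following is only a rough sketch of a dry run: it counts the messages waiting in each user's training folders and neither calls dspam nor deletes anything. The names MAIL_ROOT, MISSED_DIR, NOTSPAM_DIR, count_messages and report_queue are made up for this illustration and are not part of train-spam.sh.

```shell
#!/bin/bash

# Dry-run sketch: report how many messages are waiting in each user's
# training folders. Nothing is trained and nothing is deleted.
# MAIL_ROOT, MISSED_DIR and NOTSPAM_DIR are illustrative names; adjust
# them to match your own layout.
MAIL_ROOT=${MAIL_ROOT:-/home}
MISSED_DIR=${MISSED_DIR:-.Spam.Missed}
NOTSPAM_DIR=${NOTSPAM_DIR:-.Spam.NotSpam}

# count_messages - print the number of regular files found in the
# cur and new subdirectories of the maildir folder given as $1.
count_messages() {
    count=0
    for msg in "$1"/cur/* "$1"/new/*; do
        [ -f "$msg" ] && count=$((count + 1))
    done
    echo "$count"
}

# report_queue - print "user folder count" for every training folder
# that exists under MAIL_ROOT.
report_queue() {
    for home in "$MAIL_ROOT"/*; do
        user=${home##*/}
        for folder in "$MISSED_DIR" "$NOTSPAM_DIR"; do
            if [ -d "$home/Maildir/$folder/cur" ]; then
                echo "$user $folder $(count_messages "$home/Maildir/$folder")"
            fi
        done
    done
}

report_queue
```

Once the counts look right, a line in root's crontab such as "30 4 * * * /root/train-spam.sh" would run the training script every day at 04:30. The time is only an example; pick whatever quiet hour suits your server.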

2 comments:

  1. Hi, I like your setup.
    I see your post has been here for a while now. Are you still using this setup?
    I have dspam working now, but am not sure how I can make it deliver the spam mails into the IMAP/Spam folder. Could you give me a hint here?

    Many thanks
    Roland

  2. Roland,

    Thanks much for the compliment. I was using this setup until about 2 weeks ago, when I migrated our email over to Google. I will say that the combination I had running (rblsmtpd, greylisting, dspam, qmail with procmail delivery) when I managed our own mail server did an excellent job of cutting down spam - I see much more spam in my spam folder with Google.

    However, the lack of stress associated with managing the email server is a net win for me, hence the decision.

    As for how to deliver it to a certain folder, I used qmail with procmail delivery. Procmail is awesome, and allowed me to automatically filter emails to other folders. Here is a link which shows how to integrate DSPAM into your procmail script, and then filter it automatically:

    http://splodge.fluff.org/docs/dspam-for-sa-users

    Good luck, Alan.
