Thousands of mail servers pass millions of messages through SpamAssassin every day, yet most of these installations use only the standard SpamAssassin rules and miss many of the great features it has to offer.
The first thing you need to know about enabling custom rules, and about setting up SpamAssassin in general, is this: do your work in local.cf or in separate files called from local.cf, and do NOT change the default rule files. Updates will overwrite any changes you make to the default files.
If you open local.cf you will see a default file with some common features commented out. The first feature we want to enable is Bayesian filtering. You can do this by adding the following to the bottom of the file:
use_bayes 1
bayes_auto_learn 0
bayes_ignore_header X-Bogosity
bayes_ignore_header X-Spam-Flag
bayes_ignore_header X-Spam-Status
This enables Bayesian filtering and keeps it from relying on headers that may be forged or that SpamAssassin sets itself elsewhere. It also tells the Bayesian filter NOT to auto-classify messages as spam or ham based on statistical analysis. Auto-learning is on by default, so it has to be set to 0 explicitly; leaving the line commented out would not disable it. This is a feature we will want eventually, but not until we have classified at least a few hundred messages each of spam and ham by hand.
How do we classify messages manually? We use the sa-learn program, and it’s very easy. On the server, open your mailbox in mutt (mutt -f /path/to/box/or/Maildir), then save the messages that are SPAM into a spam folder and the messages that are HAM into a ham folder. Once we have done this for a few hundred messages, we run sa-learn against the two folders to classify them:
core:~# sa-learn --showdots --spam /home/testuser/Maildir/spammy
core:~# sa-learn --showdots --ham /home/testuser/Maildir/goodmail
This teaches SpamAssassin about the specific spam we are receiving, and about the mail it shouldn’t mark as spam. Once a few hundred of each have been learned (by default SpamAssassin does not apply Bayes results until it has seen 200 spam and 200 ham messages), we can turn auto-learning on by setting bayes_auto_learn to 1 in the local.cf file. SpamAssassin will then learn messages as spam or ham automatically when they score at the extreme ends of the scale. There are some additional useful items to add to local.cf for Bayes filtering as well:
bayes_auto_learn_threshold_spam 10
bayes_auto_learn_threshold_nonspam 0.50
These tell SpamAssassin to require a score of at least 10 before adding a message to the Bayes database as spam, and a score below 0.50 before auto-classifying a message as ham. These are good starting values that can later be tweaked downward as the system becomes more familiar with both types of messages your network receives.
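Before switching auto-learning on, it helps to see how many messages the Bayes database has actually learned. The following is a minimal sketch rather than an official tool: it filters the output of sa-learn --dump magic, and it assumes the counts appear in the third whitespace-separated column of the "non-token data: nspam" and "non-token data: nham" lines. The function name bayes_counts is made up for this example.

```shell
#!/bin/sh
# Sketch: extract the learned spam/ham counts from `sa-learn --dump magic`.
# Assumption: the count is the third column on the nspam and nham lines.
bayes_counts() {
    awk '
        /non-token data: nspam$/ { spam = $3 }
        /non-token data: nham$/  { ham  = $3 }
        END { printf "spam=%d ham=%d\n", spam, ham }
    '
}

# Typical use on the mail server (needs SpamAssassin installed):
#   sa-learn --dump magic | bayes_counts
```

If either count is missing from the input, the sketch reports it as 0, which is a reasonable cue to keep training by hand.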
If your mail server is single-purpose and doesn’t have a wide variety of Perl libraries installed, you should at least install Net::DNS and its prerequisites so that SpamAssassin’s RBL checks work correctly.
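A quick way to see whether Net::DNS is already available is to ask perl to load it. This is only a sketch; the helper name check_perl_module is hypothetical, and it checks the perl found first in your PATH, which may not be the one SpamAssassin runs under.

```shell
#!/bin/sh
# Report whether the given Perl module can be loaded by the perl in PATH.
check_perl_module() {
    if perl -M"$1" -e 1 >/dev/null 2>&1; then
        echo "installed"
    else
        echo "missing"
    fi
}

check_perl_module Net::DNS
```

If it reports "missing", install the module through your distribution's package manager or CPAN before enabling the DNS-based checks.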
The next item to review is DNS blocklists. Many people feel strongly about these, either for or against. To enable them you simply add the following to the local.cf file:
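The directive itself is not shown at this point. In a stock install the setting that controls these lookups is skip_rbl_checks, so a minimal sketch, assuming that is the option meant here, would be:

```
# local.cf
# Assumption: skip_rbl_checks is the setting intended at this step.
# 0 leaves the DNS blocklist (RBL) lookups switched on; 1 would skip them.
skip_rbl_checks 0
```

SpamAssassin documents 0 as the default, so this line mainly makes the choice explicit. The lookups still depend on Net::DNS being installed.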
You can further customize which lists are used by giving an individual list’s rule a score of zero. For example, if you wish to disable the SORBS dynamic IP address check, first find it in 20_dnsbl_tests.cf:
header RCVD_IN_SORBS_DUL eval:check_rbl('sorbs-lastexternal', 'dnsbl.sorbs.net.', '127.0.0.10')
describe RCVD_IN_SORBS_DUL SORBS: sent directly from dynamic IP address
tflags RCVD_IN_SORBS_DUL net
This shows that the rule name is RCVD_IN_SORBS_DUL. To disable it while still keeping the rest of the DNSBLs intact, add a line like this to local.cf:
score RCVD_IN_SORBS_DUL 0
Another thing many people neglect to do is alter the weighting of the rules. This is as easy as finding the rule’s name and giving it a new score in the local.cf file:
score RULE_NAME newscore
For example, if you wanted to alter the weighting for the rules related to some types of Viagra spam, you might add the following:
score TT_OBSCURED_VIAGRA 5
score FM_VIAGRA_SPAM1114 4
score DRUG_ED_GENERIC 3
score DRUG_ED_ONLINE 3
score DRUG_ED_CAPS 2
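After changing scores or adding rules, it is worth confirming that the configuration still parses before mail starts flowing through it. The sketch below wraps spamassassin --lint, which parses the rule set and reports syntax problems. The wrapper name lint_config is hypothetical, and the sketch assumes the spamassassin command is in your PATH.

```shell
#!/bin/sh
# Sketch: syntax-check the SpamAssassin configuration after editing local.cf.
lint_config() {
    if ! command -v spamassassin >/dev/null 2>&1; then
        echo "spamassassin not found in PATH"
        return 1
    fi
    # --lint parses the rules and exits non-zero when it finds problems.
    if spamassassin --lint; then
        echo "config OK"
    else
        echo "config has errors"
        return 1
    fi
}
```

Run lint_config after each edit. If you use spamd, it also needs a restart or reload before it picks up the new scores.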
The last thing you might want to do is write a filter for a custom type of spam you’re receiving. Let’s assume you’re frequently getting spam for fooglesnaps. The rule for this is very easy to write:
body LOCAL_FOOGLESNAPS_RULE /\bfooglesnaps\b/i
score LOCAL_FOOGLESNAPS_RULE 0.5
describe LOCAL_FOOGLESNAPS_RULE This is to help catch the fooglesnaps spam.
What this rule says is that any time fooglesnaps appears, regardless of case (case-insensitive) and with a word break on both sides, the rule applies. The next line sets the base score for the rule, in this case half a point. Always start around 0.1-0.5 and work your way up, to make sure you don’t mark mail incorrectly. The final line is just a human-readable description of the rule. You can use pretty much any Perl regex to filter your spam. The sky is the limit, but be sure you NEED a new rule and can’t just use a blacklist or whitelist to correct your delivery issue, because rules are more expensive in CPU and memory than simple blacklists and whitelists.
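Before giving a new rule a meaningful score, it can help to sanity-check the pattern against a few sample lines. The sketch below only approximates the rule with grep (case-insensitive, whole-word matching). It is not SpamAssassin's own matching engine, and the sample strings and the helper name matches_rule are invented for illustration.

```shell
#!/bin/sh
# Approximate /\bfooglesnaps\b/i with grep:
#   -i ignore case, -w whole words only, -q quiet (exit status only)
matches_rule() {
    printf '%s\n' "$1" | grep -iqw 'fooglesnaps'
}

for line in "Buy FoogleSnaps today!" "superfooglesnapsy offer"; do
    if matches_rule "$line"; then
        echo "hit:  $line"
    else
        echo "miss: $line"
    fi
done
```

The second sample has word characters on both sides of fooglesnaps, so it shows the effect of the word breaks in the rule.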
This covers some of the basic and common setup tasks for a new SpamAssassin installation that many users miss or don’t enable. As always, if you need further assistance with this or any other open source application or issue, the experts at Pantek Inc. are available 24/7 at firstname.lastname@example.org, 216-344-1614, and 1-877-LINUX-FIX! We look forward to working with you.