SpamAssassin Resource Usage Optimization

santrix

Well-Known Member
Nov 30, 2008
229
4
68
1.) I read somewhere a while ago that it is more efficient to have SpamAssassin scan mail for all accounts than to let individual users decide whether to enable it on their own accounts. On the release build this is currently under Exim Configuration Editor > SpamAssassin™ Options > SpamAssassin™: Enabled for all accounts without the option to disable it.

Is there any truth in this?

2.) Also, with regard to

ACL Options > SpamAssassin™: Reject mail with a spam score greater than 20 at SMTP time.

I use this setting on my servers, and noticed recently that I also have the following set:

SpamAssassin™: Bounce mail when the spam score is greater than 20

Would I be right in thinking there is no point in having the latter set, for two reasons? First, the message would already have been rejected at SMTP time. Second, bouncing means sending a rejection email to the original sender, which would just cause backscatter.

Thanks
 

santrix

Sorry to bump this... I just thought it deserved another bite before I gave up trying to find out. It seems a pretty important decision with regard to CPU utilization.
 

cPanelDavidL

Member
Staff member
Nov 12, 2007
17
1
51
Regarding question 1): the only way I can see performance being affected is through the number of accounts that have SpamAssassin enabled, weighed against the amount of email each one receives. A single very active account with a large volume of mail could cause more load than every other SpamAssassin-enabled account combined, if those accounts receive little mail, so this is very relative.

2) I would advise rejecting the message rather than returning it to the sender. You are 100% correct in thinking that bouncing could result in a backscatter listing.
 

mtindor

Well-Known Member
Sep 14, 2004
1,515
141
343
inside a catfish
cPanel Access Level
Root Administrator
I run my spamassassin per-user and always have. However, I definitely believe it is more efficient to use a global spamassassin configuration. With per-user, you have to keep in mind that various spamassassin files (the bayes_* files, auto-whitelist, etc.) are created in ~/.spamassassin for each user and can become fairly large. If these files aren't pruned or reset on occasion, they can cause a little more load on the server when spamassassin is working on that account's mail.

In a global configuration you only have one set of spamassassin files to ever have to worry about - and you can edit one single file to increase the scores of certain spamassassin rules that may be useful in fighting spam, whereas if you go per-user, you likely will not be doing that on behalf of your customers but instead they will need to do it (if they even know how).

So potentially you can save on resources by using a global spamassassin configuration, but it isn't guaranteed.

If you are using reject ACLs (which is indeed advisable), you should not have any bouncing set up... heck, as you've already mentioned, you shouldn't use the bounces in any case because of the very likely potential for backscatter. Reject-during-SMTP is always much more efficient, causes much less of a load on the system, and is significantly more net-friendly than accept/bounce.

Incidentally, the servers I take care of are in the States, and for 4-5 years now I've been rejecting spam with a score of 12 or greater. I have never seen a piece of legitimate mail with a score of 12 or greater. Your mileage may vary, but you might want to consider rejecting at a lower score. It won't be any more efficient with regard to spamassassin (since spamassassin will scan the mail anyway), but it'll be better for the server [and the user accounts] if that spam doesn't end up in somebody's spam folder.
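If you want to check that against your own mail before lowering the threshold, something like the following could help. It is only a sketch: it assumes the messages carry the usual `X-Spam-Status: Yes, score=15.2 required=5.0 ...` header, and the function names `spam_score` and `count_at_or_above` are made up for illustration. It just counts how many messages score at or above each candidate threshold.

```python
import re
from email import message_from_string

# Assumption: SpamAssassin has tagged each message with a header like
#   X-Spam-Status: Yes, score=15.2 required=5.0 tests=...
SCORE_RE = re.compile(r"score=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message):
    """Return the score from X-Spam-Status, or None if it is missing or unparseable."""
    msg = message_from_string(raw_message)
    status = msg.get("X-Spam-Status")
    if status is None:
        return None
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def count_at_or_above(raw_messages, thresholds):
    """Count how many messages score at or above each threshold."""
    counts = {t: 0 for t in thresholds}
    for raw in raw_messages:
        score = spam_score(raw)
        if score is None:
            continue  # untagged message, nothing to count
        for t in thresholds:
            if score >= t:
                counts[t] += 1
    return counts
```

Feeding it the raw text of the messages in a spam folder with thresholds such as `[12, 20]` shows how much additional mail a cut-off of 12 would reject compared with 20; anything legitimate in that band would still need to be checked by eye.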

So, in summary, you've got a good understanding of the best way to do things. The global spamassassin configuration is just something you'll have to mull over. If I didn't have so many users who are used to being able to customize their own spamassassin, I'd switch to global.

Mike
 

santrix

Thanks guys - that's helpful. I didn't realise that the bayesian stats build up per user - I always assumed it would all go into a central pot. I will have a look at some user accounts to see how large their SA config files have become.

I just noticed, when watching the output from "top" for a while, that spamd is run under the local user account, so it's always spawning new instances and destroying them, which must be more of a burden than having it running under a system account all the time - or not?
 

mtindor

I doubt there is any appreciable effect simply because spamassassin forks off as a user [vs. forking as root in a global config]. But the bayes_toks file and the auto-whitelist can get large (I've seen them above 100 MB), and I know it takes a toll when spamd is filtering a user's mail in a domain that has spamassassin files that large.
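A rough way to find accounts whose files have grown that large is sketched below. It assumes home directories live directly under /home and that the per-user data sits in ~/.spamassassin; the function names `spamassassin_dir_sizes` and `over_limit` are hypothetical, and it would need to run as root to read every account.

```python
import os


def spamassassin_dir_sizes(home_root="/home", dirname=".spamassassin"):
    """Map each user name to the total size in bytes of the files under ~/.spamassassin."""
    sizes = {}
    for entry in os.scandir(home_root):
        if not entry.is_dir(follow_symlinks=False):
            continue
        sa_dir = os.path.join(entry.path, dirname)
        if not os.path.isdir(sa_dir):
            continue  # this account has no per-user SpamAssassin data
        total = 0
        for root, _dirs, files in os.walk(sa_dir):
            for name in files:
                try:
                    total += os.lstat(os.path.join(root, name)).st_size
                except OSError:
                    pass  # file vanished or is unreadable; skip it
        sizes[entry.name] = total
    return sizes


def over_limit(sizes, limit_bytes):
    """Return (user, bytes) pairs above the limit, largest first."""
    big = [(user, size) for user, size in sizes.items() if size > limit_bytes]
    return sorted(big, key=lambda pair: pair[1], reverse=True)
```

Calling `over_limit(spamassassin_dir_sizes(), 100 * 1024 * 1024)` would list the accounts above the 100 MB figure mentioned above; the cut-off itself is a judgment call.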

Mike