Wednesday, March 12, 2008



While I often complain about the amount of spam I get, I sometimes forget about the amount of spam that I never see. Much spam filtering relies on statistics: a Bayesian filter assigns a spam likelihood to a message based on the occurrence of certain words and phrases. The more often words like Viagra, Meds, etc., appear in a message, the more likely it is to be spam.
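To make the idea concrete, here is a minimal sketch of a naive Bayes scorer in Python. It is not how any particular mail provider does it; the class name, the whitespace tokenizer, and the add-one smoothing are all assumptions chosen to keep the example short.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace splitting is good enough for a sketch.
    return text.lower().split()


class NaiveBayesSpamScorer:
    """Hypothetical toy scorer: word counts per class plus Bayes' rule."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        # label is "spam" or "ham" (ham = legitimate mail)
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def spam_probability(self, text):
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_score = {}
        for label in ("spam", "ham"):
            # Prior: how common this class is, with add-one smoothing.
            log_p = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            n_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                # Likelihood of each word given the class, smoothed so that
                # unseen words never produce a zero probability.
                count = self.word_counts[label][word]
                log_p += math.log((count + 1) / (n_words + len(vocab) + 1))
            log_score[label] = log_p
        # Turn the two log scores into a probability between 0 and 1.
        top = max(log_score.values())
        weights = {k: math.exp(v - top) for k, v in log_score.items()}
        return weights["spam"] / (weights["spam"] + weights["ham"])


scorer = NaiveBayesSpamScorer()
scorer.train("cheap viagra meds online", "spam")
scorer.train("buy meds viagra now", "spam")
scorer.train("meeting notes for tomorrow", "ham")
scorer.train("lunch tomorrow with the team", "ham")

print(scorer.spam_probability("viagra meds"))       # high: spam-like words
print(scorer.spam_probability("meeting tomorrow"))  # low: ordinary words
```

With this toy training data, every additional spam-typical word pushes the score higher, which is exactly the behaviour described above.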

When you tell a system that a message is Spam, it learns and continuously improves its accuracy. At the same time, it is important to glance through the Spam folder every now and then to make sure that messages tagged as Spam are indeed so. Telling the system about these so-called false positives (legitimate messages wrongly tagged as Spam) also teaches it and allows it to become more accurate.
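The feedback loop can be sketched in a few lines of Python. This is a deliberately simplified word-frequency score rather than a full Bayesian filter, and every name in it (FeedbackFilter, report_missed_spam, report_false_positive, the 0.5 threshold) is made up for illustration.

```python
from collections import Counter


class FeedbackFilter:
    """Hypothetical toy filter that learns only from user reports."""

    def __init__(self, threshold=0.5):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.threshold = threshold  # assumed cut-off, not a recommended value

    def _learn(self, text, label):
        # Count each distinct word once per reported message.
        self.counts[label].update(set(text.lower().split()))

    def spam_score(self, text):
        words = set(text.lower().split())
        if not words:
            return 0.5
        scores = []
        for word in words:
            spam_seen = self.counts["spam"][word]
            ham_seen = self.counts["ham"][word]
            # Share of reports that called this word spam; words never
            # seen before come out at a neutral 0.5.
            scores.append((spam_seen + 1) / (spam_seen + ham_seen + 2))
        return sum(scores) / len(scores)

    def is_spam(self, text):
        return self.spam_score(text) > self.threshold

    def report_missed_spam(self, text):
        # The user found spam in the Inbox.
        self._learn(text, "spam")

    def report_false_positive(self, text):
        # The user found a legitimate message in the Spam folder.
        self._learn(text, "ham")


mail_filter = FeedbackFilter()
mail_filter.report_missed_spam("cheap meds offer")
mail_filter.report_missed_spam("special offer today")

message = "special offer from your bank statement"
print(mail_filter.is_spam(message))   # tagged as spam because of "special offer"

mail_filter.report_false_positive(message)
print(mail_filter.is_spam(message))   # no longer tagged after the correction
```

Both kinds of report feed the same counters, which is why checking the Spam folder matters as much as flagging the spam that slips through.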


So, the next time you sigh deeply and complain that a staggering 10 spam messages made it to your Inbox, remind yourself that a few hundred never made it there in the first place, and appreciate that when thousands of people cooperate (even if they do not know it), filtering can be very effective.
