Wednesday, March 12, 2008
While I often complain about the amount of spam I get, I sometimes forget about the amount of spam I never see. Much spam filtering relies on statistics: a Bayesian filter assigns each message a spam probability based on how often certain words and phrases appear in it, compared with how often they appear in known spam and in legitimate mail. The more often words like Viagra, Meds, etc., appear in a message, the more likely it is to be spam.
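To make the idea concrete, here is a minimal Python sketch of a simplified naive Bayes scorer. It assumes the filter keeps, for each word, the number of training spam and legitimate ("ham") messages that contain it. The function names, the toy counts, and the add-one smoothing are illustrative choices, not how any particular mail provider does it, and for simplicity only words present in the message are scored.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def spam_probability(message, spam_counts, ham_counts, n_spam, n_ham):
    """Estimate P(spam | message) with a simplified naive Bayes model.

    spam_counts / ham_counts map a word to the number of training messages
    of that class containing it; n_spam / n_ham are the number of training
    messages in each class. Add-one (Laplace) smoothing keeps a word never
    seen in one class from producing a zero probability.
    """
    # Start from the prior log-odds of spam versus ham.
    log_odds = math.log(n_spam / n_ham)
    # Each distinct word adds its evidence, assuming (naively) that
    # words occur independently of one another.
    for word in set(tokenize(message)):
        p_word_given_spam = (spam_counts[word] + 1) / (n_spam + 2)
        p_word_given_ham = (ham_counts[word] + 1) / (n_ham + 2)
        log_odds += math.log(p_word_given_spam / p_word_given_ham)
    # Convert log-odds to a probability, avoiding overflow in math.exp
    # when the log-odds are very large in either direction.
    if log_odds >= 0:
        return 1 / (1 + math.exp(-log_odds))
    e = math.exp(log_odds)
    return e / (1 + e)


# Toy training data (made up for illustration): 4 spam and 4 ham messages.
spam_counts = Counter({"viagra": 3, "meds": 2, "cheap": 3})
ham_counts = Counter({"meeting": 3, "lunch": 2, "cheap": 1})

p_spam = spam_probability("Cheap Viagra and meds", spam_counts, ham_counts, 4, 4)
p_ham = spam_probability("Lunch meeting tomorrow", spam_counts, ham_counts, 4, 4)
print(f"{p_spam:.2f}")  # → 0.96
print(f"{p_ham:.2f}")   # → 0.08
```

Working in log-odds keeps the product of many small per-word ratios from underflowing; each word simply nudges the score toward spam or toward ham, and a word seen equally often in both classes (like "and" here) has no effect.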
When you tell the system that a message is spam, it learns and continuously improves its accuracy. At the same time, it is important to glance through the Spam folder every now and then to make sure that the messages tagged as spam really are spam. Reporting these so-called false positives also teaches the system and helps it become more accurate.
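The feedback loop described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the class name `FeedbackFilter`, the message ids, and the whitespace tokenizer are all hypothetical, and real filters store and update their statistics in their own ways. What it shows is that reporting a false positive is just another training step, one that moves a message's words from the spam counts to the legitimate ("ham") counts.

```python
from collections import Counter


class FeedbackFilter:
    """A hypothetical word-count store that learns from user feedback.

    It remembers which label each message was trained with, so a later
    correction (such as rescuing a false positive from the Spam folder)
    first undoes the old label and then applies the new one.
    """

    def __init__(self):
        # Per-label document frequencies: word -> number of messages.
        self.counts = {"spam": Counter(), "ham": Counter()}
        # Number of messages trained under each label.
        self.totals = Counter()
        # Message id -> (label, set of words) it was trained with.
        self.trained = {}

    def train(self, msg_id, text, label):
        """Record the user's verdict ("spam" or "ham") for one message."""
        if msg_id in self.trained:
            old_label, old_words = self.trained.pop(msg_id)
            # Undo the earlier training so the message is not counted twice.
            self.counts[old_label].subtract(old_words)
            self.totals[old_label] -= 1
        words = set(text.lower().split())
        self.counts[label].update(words)
        self.totals[label] += 1
        self.trained[msg_id] = (label, words)


f = FeedbackFilter()
f.train(1, "cheap meds online", "spam")
f.train(2, "lunch meeting moved", "spam")  # wrongly learned as spam
f.train(2, "lunch meeting moved", "ham")   # user reports the false positive
print(f.totals["spam"], f.totals["ham"])   # → 1 1
print(f.counts["spam"]["lunch"], f.counts["ham"]["lunch"])  # → 0 1
```

Undoing the old label matters: if the rescued message were only added to the ham counts, its words would still count as spam evidence as well.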
So, the next time you sigh deeply because a staggering 10 spam messages made it into your Inbox, remind yourself that a few hundred never made it there in the first place, and appreciate that when thousands of people cooperate (even without knowing it), filtering can be very effective.