Chris Roberts

Combatting Comment Spam

• Posted in Technology

Despite this blog's modest (low) visitor numbers, the backlinks it has acquired have had the unwelcome side effect of attracting comment spammers. The complete lack of any form of moderation system on the site did nothing to dissuade them, either! Call me naive, but I certainly didn't appreciate quite how much of a problem comment spam is.

For context, at the time of writing, this blog is receiving around 165 spam comments per day.

So, given that this blog is based on a blog engine written in my spare time, these were the options open to me...

  1. Continue to manually 'prune' the comments on a daily basis.
  2. Alter the blog engine such that every comment requires manual approval.
  3. Migrate the blog to a different blog engine, with baked-in spam filtering.
  4. Install some form of Captcha to help reduce the volume of spam.
  5. Implement a 'proper' spam filtering system in my existing platform.

The first two options would require more daily administration than I want to commit to. Migrating to another blog engine would involve quite a bit of work up-front and would deny me a hobby project. From other projects I've worked on, I'm aware that Captchas are no silver bullet and can be a significant nuisance to the end-user.

This leaves me with only one option: implementing some form of 'proper' spam filtering system. Whilst I did write my own blog engine, even I draw the line at attempting to implement my own spam detection algorithm.

Looking through the spam I have been receiving, there's a good proportion that could be removed with a simple dictionary of 'blacklisted' words. There are, however, a significant number of comments that would be difficult to automatically classify as spam using such simple techniques.
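
To illustrate what I mean by a simple dictionary check, here is a minimal Python sketch. The word list and the function name `looks_like_spam` are made up for illustration; a real list would have to be built from the spam actually received.

```python
import re

# Hypothetical word list, for illustration only.
BLACKLIST = frozenset({"viagra", "casino", "payday"})


def looks_like_spam(comment, blacklist=BLACKLIST):
    """Return True if any blacklisted word appears as a whole word."""
    # Lower-case the comment and split it into simple word tokens.
    words = re.findall(r"[a-z0-9']+", comment.lower())
    return any(word in blacklist for word in words)
```

A check like this catches "Cheap VIAGRA here!" but lets through a comment such as "Visit my site for great deals", which is exactly the kind of spam that is hard to classify with a word list alone.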

Saying Goodbye to Spam

After a bit of research, I came across Akismet - the self-proclaimed "best way in the world to protect you from web spam". Akismet is a hosted service: you submit each comment to it and get back an immediate 'Spam / No-Spam' indication.

After signing up for a free account, you can simply download one of a number of plugins for off-the-shelf blog engines, or API libraries for various languages. In my case, I downloaded the .NET API, followed the sample code and had it up and running in about 15 lines of code.
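
For anyone curious what a check looks like without a client library, here is a rough Python sketch (not the .NET code running on this blog). It assumes Akismet's REST `comment-check` endpoint takes form-encoded fields and answers with a body of `true` (spam) or `false` (not spam). The URL and field names are as I understand the API, so verify them against the current Akismet documentation. The helper names are my own.

```python
from urllib import parse, request

# Assumed endpoint; check the Akismet documentation before relying on it.
COMMENT_CHECK_URL = "https://rest.akismet.com/1.1/comment-check"


def build_check_payload(api_key, blog_url, user_ip, user_agent, author, content):
    """Form-encode the fields for a comment-check request."""
    fields = {
        "api_key": api_key,
        "blog": blog_url,
        "user_ip": user_ip,
        "user_agent": user_agent,
        "comment_type": "comment",
        "comment_author": author,
        "comment_content": content,
    }
    return parse.urlencode(fields).encode("utf-8")


def is_spam(response_body):
    """Interpret the response body: 'true' means spam, 'false' means not spam."""
    answer = response_body.strip().lower()
    if answer == "true":
        return True
    if answer == "false":
        return False
    # Anything else (e.g. an error message) should not be silently trusted.
    raise ValueError(f"Unexpected response: {response_body!r}")


def check_comment(payload, timeout=10.0):
    """POST the payload and return True if the comment is judged to be spam."""
    req = request.Request(COMMENT_CHECK_URL, data=payload, method="POST")
    with request.urlopen(req, timeout=timeout) as resp:
        return is_spam(resp.read().decode("utf-8"))
```

Keeping the payload building and response parsing separate from the network call makes the logic easy to test without an API key.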

There are further operations available through the API, including the ability to tell Akismet that a comment that it has classified as spam is actually 'ham' (Akismet's name for a real comment), or vice-versa.
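
As a small illustration of that feedback loop, the Python sketch below picks which correction to send. It assumes the REST endpoints are named `submit-ham` and `submit-spam` under the base URL shown, which is my understanding of the API rather than something to take on trust. `feedback_url` is a hypothetical helper of my own.

```python
# Assumed base URL and endpoint names; check the Akismet documentation.
BASE_URL = "https://rest.akismet.com/1.1"


def feedback_url(akismet_said_spam):
    """Return the endpoint that corrects a wrong verdict.

    If Akismet said 'spam' but the comment was real, report it as ham.
    If Akismet said 'not spam' but it was spam, report it as spam.
    """
    endpoint = "submit-ham" if akismet_said_spam else "submit-spam"
    return f"{BASE_URL}/{endpoint}"
```

The correction would be sent with the same comment fields as the original check, so Akismet can match it to what it saw.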

So far - after a whole 24 hours of use - I couldn't be happier. Akismet's performance appears to be faultless, having correctly identified every spam comment received.

I should also point out that whilst Akismet offer free accounts for personal sites, they do charge for corporate or high-volume sites. They're also more than happy for you to pay what you feel their service is worth to you. Assuming the service continues to perform as well as it has done so far, I will certainly be happy to send some money their way!