Defeating manual spam, or damn dastardly conniving commenters!

19. October 2009

I'm not especially keen on another meta-blog post, but the issue came up in email recently and I've this penchant for expounding at some length on interesting subjects, even in the least suitable medium, target audience: one. Fortunately I have a blog.

Not so fortunately, manually entered spam has been an issue. When you optimize for humans you regrettably include manual spammers.

Such spam is surprisingly devious, but here are some common characteristics:

  • Complimentary: "Wow this is a great post!"
  • Relevant: "I don't like spam."
  • Unconstructive: Adds nothing of value.
  • Disingenuous: "I appreciate it because..."
  • Erroneous: "...this should keep spam out of my email inbox."

See! Manual spam is recognizably crude.

ego spam

Of course the payload is the link. Not all of them are blatant advertisements, but even sites that appear legit may advertise themselves unscrupulously. As expected, they lack real content.

Here with BlogEngine.NET the payload is put in the website field rather than the body (the name field becomes the link text). Neither asking for your name nor indicating that the website field is ignored by search engines made a difference.

Akismet did, however, and thus far I've had zero false positives, only false negatives. Some flagged comments were crafted so cleverly that the call was very close, but after investigating I've concurred with Akismet. If Akismet marks a unique (but crude) comment as spam, I expect the link to be unsatisfactory, given that the link is the defining constant.
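For the curious, here is a rough sketch of what asking Akismet about a comment might look like. It is written in Python, not the C# that BlogEngine.NET uses, and it is not how my blog's integration is actually wired up. The endpoint form and field names follow my understanding of Akismet's public comment-check REST call, so check the current documentation before relying on them; the function names are made up for illustration. Note that the website field is sent as `comment_author_url`, which is exactly where the payload hides.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def build_comment_check(api_key, blog_url, user_ip, author, author_url, content):
    """Build (but do not send) an Akismet comment-check request.

    Hypothetical helper: the name and argument order are illustrative only.
    """
    # Assumption: the key-as-subdomain endpoint form of the Akismet REST API.
    endpoint = f"https://{api_key}.rest.akismet.com/1.1/comment-check"
    fields = {
        "blog": blog_url,
        "user_ip": user_ip,
        "comment_type": "comment",
        "comment_author": author,          # becomes the link text on the blog
        "comment_author_url": author_url,  # the website field: the spam payload
        "comment_content": content,
    }
    body = urlencode(fields).encode("utf-8")
    return Request(endpoint, data=body, method="POST")


def is_spam(request):
    """Send the request; Akismet replies with the plain text "true" for spam."""
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip() == "true"
```

Building the request separately from sending it keeps the part you can inspect offline apart from the part that needs a real API key and a network connection.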

My recent addition of reCAPTCHA seems to have made the largest difference, most likely because there's now some difficulty involved. I actually feel pretty good about this because the duality of distinguishing computers from humans while simultaneously solving complex problems that computers do poorly absolutely fascinates me. Given that solving my captcha is no longer technically a waste of time, I know some readers will begin to feel that it isn't one either. ;)

Implementing a naive captcha in BlogEngine.NET

30. January 2009

10-04-09: Keith Ratliff did the very involved work of converting BlogEngine's comment submission process from JavaScript-centric to postback and standard ASP.NET validation, thereby enabling a more or less drag and drop installation of reCAPTCHA. Hooray Keith! Fantastic work.

A couple of years ago Mads Kristensen implemented an invisible captcha in BlogEngine.NET, but as my blog can attest, this is not enough.

Instead of inconveniencing readers with a captcha, you can use your own clever validation trick. The more unique it is, the less likely it is to be automatically discovered and circumvented. When it is, you need a new trick.

A naive captcha is basically a captcha that's always the same image. It works on the principle that your site isn't important enough for spammers to target manually (how cheerful!), but if it's good enough for Coding Horror it's good enough for me.

Of course, being an image in itself resists automated discovery of this particular trick, and if it is discovered, manually or otherwise, it's easy to change the image (it need not even be of text).
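Stripped of the BlogEngine.NET plumbing, the whole idea fits in a few lines. The sketch below is Python rather than my actual patch, and every name in it (`EXPECTED_WORD`, `passes_naive_captcha`, `handle_comment`, the `captcha` form field) is hypothetical. It assumes the static image shows a single word that the visitor types into an extra field.

```python
from typing import Optional

# Hypothetical: the word drawn in the static image.
# Change it whenever you swap the image.
EXPECTED_WORD = "chicken"


def passes_naive_captcha(submitted: Optional[str],
                         expected: str = EXPECTED_WORD) -> bool:
    """Return True when the visitor typed the word shown in the fixed image."""
    if submitted is None:
        return False
    # Ignore surrounding whitespace and letter case so honest readers
    # aren't rejected over capitalization.
    return submitted.strip().casefold() == expected.casefold()


def handle_comment(form: dict) -> str:
    """Accept or reject a comment based only on the naive captcha field."""
    if not passes_naive_captcha(form.get("captcha")):
        return "rejected"
    return "accepted"
```

The check has to run on the server. If the expected word only lives in client-side script, a bot that skips the script skips the check too.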

Implementing my own naive captcha here has been quite effective so far. My next step may be Akismet for manually entered spam.

Implement your own

The patched files (against vanilla BlogEngine.NET 1.4.5) are available here. To apply the change to your existing, customized blog, take a look at this comparison courtesy of Beyond Compare 3, or view the compact version below; this post needed some color.

You'll want to change the paths and formatting in CommentView.ascx to suit your liking, and also the word "chicken".

Oh, and don't forget that my code sucks, sometimes intentionally even, because I'm lazy. Someone please be my guest and make this a properly coded BlogEngine.NET extension. Furthermore my first attempt was with the strictly-server-side RegularExpressionValidator control you see commented out below, which I couldn't get to work, so I used existing mechanisms instead.

Modified (check margin) lines are in red. Unimportant differences are in blue (mostly, the JavaScript isn't truly commented). The rest is context.

Powered by my custom BlogEngine.NET. Content © 2010 Christopher S. Galpin