Installing akismet
akismet is an excellent Web-based service combined with a WordPress plugin that, with essentially no configuration from the blog administrator, automatically kills comment spam. I’ve had it up for a day now and it seems to be 100% effective: I was getting dozens of spam comments a day before, versus zero that have gotten through since I brought up akismet. I’m impressed; this solution kicks a thousand kinds of ass. But installing it did have one little catch, and I had one other concern I wanted to get out of the way before going down this road.
The instructions on the official site, brief as they are, manage to be confusing. They instruct you to get an API key from the “My Account” page on your WordPress blog. But you don’t get an API key from the WordPress installation you’re adding akismet to - you have to sign up for a wordpress.com hosted blog (on this page) and get the key from the totally separate blog you create there, which blog you’ll presumably never use if you have your own.
The whole thing is kind of awkward and wasteful, and I wonder, just a little wistfully, why the akismet folks would put this hoop here for us to jump through. Isn’t there some kind of identifier computed as a secure hash on some combination of configuration options that could be used instead? Something like the MD5 of the blog’s name concatenated with a user-supplied password? Or a key managed within the framework of the akismet website, and associated with the blog that is actually running the akismet plugin? I can’t make sense of the current scheme, which encourages users to use the same API key for multiple blogs and is not especially useful as an identifier.
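To make the MD5 idea concrete, here is a minimal sketch of what I have in mind. To be clear, this is not how akismet works or anything its API offers; the function name `blog_identifier` and the sample blog name and password are made up for illustration.

```python
import hashlib


def blog_identifier(blog_name: str, password: str) -> str:
    """Hypothetical identifier: MD5 of the blog's name concatenated with a password.

    This only illustrates the scheme floated in the post; it is an assumption,
    not part of akismet or WordPress.
    """
    material = (blog_name + password).encode("utf-8")
    return hashlib.md5(material).hexdigest()


if __name__ == "__main__":
    # Both values below are placeholders.
    print(blog_identifier("entirely safe and fun", "some user-supplied password"))
```

The same blog name and password always give the same 32-character hex string, and a different blog name gives a different one, which is all an identifier really needs. If this were built for real, a keyed hash such as HMAC with a newer digest would probably be the safer choice, since MD5 is no longer considered collision-resistant.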
The only other thing that slightly bothers me about akismet is its opaqueness. This I have reconciled myself to. Just as with Google’s PageRank, akismet’s spamminess tests actually do benefit from security through obscurity. You only need to keep the algorithm from being gamed by would-be spammers until you change it again, not for the long term, and that’s precisely the kind of protection that obscurity grants. You get an additional period of time (a constant term added to the complexity), rather than having each period get longer (some higher-order operation on the complexity) as you would by strengthening the algorithm or building its knowledge base/bias. Even that increment is useful in this setting, whereas it would be useless in (e.g.) crypto. For AI-related reasons, it is harder to improve comment-spam-fighting algorithms than it is to improve spamming techniques, so those extra days and weeks are important in buying time to improve the algorithm, or its bias, again. Hence, I’ve chosen to accept some unknowns in order to get function, rather than mess around with a really open solution that will certainly have greater vulnerabilities.
That said, I bet they’ve got some really cool stuff going on under the hood - a collection of hacks and sidecars and special-purpose classifiers that a biologist could recognize and love. I would love to get a look.

entirely safe and fun » Blog Archive » More meta:
[…] entirely safe and fun Very helpful! « Installing akismet […]
1 May 2006, 8:45 pm
Matt:
There is definitely some interesting stuff under the hood. It’s one of the funnest things I’ve worked on in a long time.
1 May 2006, 11:34 pm
Colin:
Thanks for coming by! If you ever feel it would be appropriate to chat about some of the tech, even abstractly, I’d love to set up a talk. My main interest, and my academic and real-world career, has been (machine) learning systems in the real world, and in particular I’m fascinated by the strong biases that functioning systems exhibit - those special-purpose gadgets I was referring to in my last para being a great example of precisely the sort of thing that ML system designers tend to disavow but then incorporate, which I think misses an important point about how successful biological systems work. We’re better off embracing engineering approaches from the get-go.
akismet is one of those places where the rubber meets the road, up against motivated humans and their bot creations. You’ve got a steady supply of state-of-the-art adversaries, which is an awesome resource. You’ve even got volunteer human labelers, be they ever so unreliable, who help in generating a huge data set - human-labeled data was, for some .coms, one of those expenses which made their business model impractical. You’ve got the tech. I would love to learn about it.
2 May 2006, 7:36 am
Colin:
I just looked at my last comment… Geez, what a fanboy. sigh
2 May 2006, 7:37 am
entirely safe and fun » Blog Archive » The wild Adirondacks:
[…] I’ve reported before on my installation of the Akismet comment-spam-fighting plugin, and it continues to amaze with its accuracy and precision. As far as I know, just one spam comment has gotten through since I installed it, and it has yet to flag a legit comment as spam. […]
24 July 2006, 10:52 am