Cleaning up the mess – analysing spam blog comments

I keep this blog mostly as a personal journal, so I won’t forget what I worked on, and to keep my writing skills up. I never thought anyone would seriously be interested in what I wrote, so I’ve paid little attention to the comments on my posts. To keep the steady flow of spam to a minimum, I set up Spam Karma 2 and haven’t worried much about it since.

When I entered the admin area just now for a routine upgrade, I noticed that besides the 2,500 spam comments I also had about 30 that were not marked as spam, and I became curious. Are these real comments, or did they slip past the spam filter?

Blogs are spammed for basically two reasons: either the network trolls pick your post for food and start munching away on each other’s comments, or the post is used as part of a Search Engine Optimization (SEO) scheme, which exploits the fact that search engines consider a page more relevant the more links point at it.
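
To show why a link in a blog comment is worth anything to a spammer, here is a minimal sketch of link-based ranking. It uses a simplified PageRank-style power iteration as a stand-in; real search engines use many more signals, and the toy web below (page names included) is purely hypothetical. Three spammed blogs each carry a comment linking to “shop”, which ends up ranked above an otherwise identical “honest-shop” that nobody links to.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank computed by power iteration.

    `links` maps every page to the list of pages it links to; every
    link target must also be a key of `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small base score (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outgoing in links.items():
            if outgoing:
                # A page passes its score on, split evenly over its links.
                share = damping * rank[page] / len(outgoing)
                for target in outgoing:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank

# Hypothetical toy web: three spammed blogs each link to "shop".
web = {
    "shop": [],
    "honest-shop": [],
    "blog-a": ["shop"],
    "blog-b": ["shop"],
    "blog-c": ["shop"],
}
ranks = pagerank(web)
for page, score in sorted(ranks.items(), key=lambda kv: -kv[1]):
    print(f"{page:12} {score:.3f}")
```

Every extra comment link adds to the target’s score, which is exactly why spammers bother with thousands of small blogs.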

I’ve blogged about some “now” topics, like time-lapse and Android tablets, and these seem to be a sweet spot for those who just want to sell their junk, be it non-prescription medicine or cheap fashionable sunglasses.

The reason I think this deserves its own post is that, reading through some of the comments, I could see that several different engines, of varying sophistication and relevance, are in use for spamming. Every comment comes with a link given as the author’s own website, and that is the page the SEO is for. There is also a valid e-mail address, though it is usually machine-generated.

The simplest type is just an elaborately worded congratulation on the article, or a promise to share it on other channels like reddit. It’s plain social engineering: flatter the author so he approves the comment and displays it among the others. These comments come from a template and are so bland they could be posted on basically any article. The fixed template is also their main weakness: even the simplest spam filters can be trained to recognize a known template. So how did these get through? Probably my spam filter’s template database didn’t yet contain the exact template when they arrived.
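
To illustrate why a fixed template is so easy to catch, here is a minimal sketch of exact-template matching. This is not how Spam Karma 2 works internally; the `fingerprint` function and `TemplateFilter` class are hypothetical. Normalizing case, punctuation and whitespace before hashing means trivial variations still match, while a brand-new template slips through until someone reports it.

```python
import hashlib
import re

def fingerprint(text: str) -> str:
    """Hash a comment after normalizing case, punctuation and whitespace."""
    normalized = re.sub(r"[^a-z0-9]+", " ", text.lower())
    normalized = " ".join(normalized.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

class TemplateFilter:
    """Hypothetical filter remembering fingerprints of reported spam templates."""

    def __init__(self) -> None:
        self.known = set()

    def report_spam(self, text: str) -> None:
        self.known.add(fingerprint(text))

    def is_known_template(self, text: str) -> bool:
        return fingerprint(text) in self.known

spam_filter = TemplateFilter()
spam_filter.report_spam("Great post! I will share it on reddit.")
print(spam_filter.is_known_template("great post!!  I will share it on Reddit"))  # True
print(spam_filter.is_known_template("Great post! I will share it on Twitter."))  # False
```

The second check is the situation I ran into: a template the filter had simply never seen before.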

The next type is machine-generated, lorem-ipsum-style spam, and there are subtle differences among these as well. The simplest is the web-scraper-based commenter, which takes a (possibly completely random) chunk of another page and publishes it as a comment. Some of these are so crude that they cut the first and last words in half. They are hard to detect on their own merits, as the text is coherent; but since the same texts are published on several sites, the spam filter can use comments reported by others as templates and catch them.
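
Exact hashing fails for scraped chunks, because each spam cuts the source text at a different place. A common workaround is near-duplicate detection with word shingles. The sketch below (hypothetical function names, an assumed 3-word shingle size and 0.5 threshold) measures what fraction of a new comment’s shingles also appear in an already reported comment, so a chunk with half-cut words at its edges still matches.

```python
def shingles(text: str, k: int = 3) -> set:
    """Return the set of k-word sequences (shingles) in the text.

    Only lowercases and splits on whitespace; punctuation is kept as-is.
    """
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def containment(candidate: str, reported: str, k: int = 3) -> float:
    """Fraction of the candidate's shingles that also occur in the reported text."""
    cand = shingles(candidate, k)
    if not cand:
        return 0.0
    return len(cand & shingles(reported, k)) / len(cand)

def looks_scraped(candidate: str, reported_comments, threshold: float = 0.5) -> bool:
    """True if the candidate largely overlaps any previously reported comment."""
    return any(containment(candidate, r) >= threshold for r in reported_comments)

reported = [
    "the quick brown fox jumps over the lazy dog while the cat sleeps in the sun",
]
# A scraped chunk with its first and last words cut in half.
chunk = "ick brown fox jumps over the lazy dog while the ca"
print(round(containment(chunk, reported[0]), 2))  # 0.78
print(looks_scraped(chunk, reported))  # True
print(looks_scraped("I am an actual human and I liked this post", reported))  # False
```

Only the shingles touching the mangled words are lost, so the overlap stays high however the scraper slices the text.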

A more interesting approach uses several key phrases as seeds and puts them in random context with a lorem-ipsum-style generator. These generators vary in complexity: some output looks like alphabet soup, while others add punctuation and capitalization as if it were proper text. The idea behind this type of spam is that by including the keywords, the page linking to the target looks even more relevant for those keywords, so the target scores higher in search results. How does the spam filter handle these? The text is usually generated from words on the very same page or blog, so the words themselves look innocent; it’s the link being promoted that gives it away.
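
Since the words themselves are borrowed, a filter can key on the promoted link instead. A minimal sketch, assuming a hypothetical blocklist of hosts taken from already reported spam (here two of the hosts that appear in the comments on this post): it checks the author’s URL and any links in the comment body, including BBCode-style `[url=...]` links.

```python
import re
from urllib.parse import urlparse

# Hypothetical blocklist built from hosts seen in already reported spam.
BLOCKED_HOSTS = {"ssilverearrings.blogspot.com", "diybraceletsidea.blogspot.com"}

# Matches http(s) URLs, stopping at whitespace, quotes, brackets and angle brackets.
URL_PATTERN = re.compile(r"https?://[^\s\[\]\"'<>]+")

def host_of(url: str) -> str:
    """Return the lowercase host name of a URL, or '' if there is none."""
    return (urlparse(url).hostname or "").lower()

def is_blocked(url: str, blocked=BLOCKED_HOSTS) -> bool:
    """True if the URL's host is a blocked host or a subdomain of one."""
    host = host_of(url)
    return any(host == b or host.endswith("." + b) for b in blocked)

def comment_is_spam(author_url: str, body: str) -> bool:
    """Flag a comment whose author link or any link in the body is blocked."""
    links = [author_url] + URL_PATTERN.findall(body)
    return any(is_blocked(link) for link in links if link)

body = "bracelet charm [url=http://ssilverearrings.blogspot.com/]bracelet charm[/url]"
print(comment_is_spam("", body))  # True
print(comment_is_spam("https://example.org/", "I am an actual human."))  # False
```

The catch, of course, is that the spammers register fresh hosts faster than any blocklist grows, which is why real filters combine this with other signals.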

The “best” ones I found were both readable and almost relevant to the article itself. Some were so relevant I almost approved them; if my blog had more traffic and commenters, they would almost certainly have passed as valid comments. I strongly suspect these were written by actual people, and that the most relevant boilerplate comment (they are always positive and supportive) is selected based on an analysis of the article. My guess is that people collect these templates from various forums, categorize them, and change them regularly to get past the filters.
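
If my suspicion is right, picking the “most relevant” boilerplate needs nothing more than crude text similarity. Here is a hypothetical sketch of that selection step using bag-of-words cosine similarity between the article and each template; the templates and the article title are made up for illustration, and I have no idea what the spammers actually use.

```python
import math
import re
from collections import Counter

def bag_of_words(text: str) -> Counter:
    """Count lowercase alphabetic words in the text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def pick_template(article: str, templates: list) -> str:
    """Return the template whose wording is closest to the article."""
    article_vec = bag_of_words(article)
    return max(templates, key=lambda t: cosine(article_vec, bag_of_words(t)))

# Made-up templates, always positive and supportive.
templates = [
    "Great write-up on Android tablets, I have been looking for a tablet review like this.",
    "Your time-lapse photos are stunning, which camera did you use for the time-lapse?",
    "Very helpful post, thanks for sharing!",
]
article = "How I built a time-lapse rig with a cheap camera and an intervalometer"
print(pick_template(article, templates))
```

The shared words “time”, “lapse” and “camera” are enough to pick the second template, which is roughly the level of relevance I saw in the comments.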

In the end I became so paranoid that I marked all of the remaining comments as spam. If I inadvertently removed anyone’s genuine comment, I’m sorry!

NB: I intend to use this article as a kind of honeytrap. All comments on it that pass Spam Karma will be preserved and approved, to prove my point.
If an actual human wants to share his thoughts, he’s welcome, but please state “I’m an actual human”, just to avoid confusion. 🙂

31 thoughts on “Cleaning up the mess – analysing spam blog comments”

  1. My pal recommended I’ll like this site. He / she was entirely perfect. This particular blog post definitely built my own day time. A person can not consider simply the fact that whole lot moment I had created invested because of this details! Thank you!

  2. Interesting entry. Let’s see where my comment ends up. I am a human, despite what you might think 😉

  3. 999 Nearly every certified photographer is aware how worthwhile this can be. Ladron se la folla mientras duerme, If that service fails, then the obligation rests, cheap louis vuitton replicated handbags not on your own on the Armenians, louis vuitton purses 2011 but to some impressive degree upon those people nations..
    bracelet charm
    [url=http://ssilverearrings.blogspot.com/]bracelet charm[/url]

  4. 999 Nearly every pro photographer is aware of how essential this can be. Ladron se la folla mientras duerme, If that assistance fails, then the responsibility rests, louis vuitton price list not on your own upon the Armenians, louis vuitton mahina but to your terrific diploma on all those nations..
    london links friendship bracelet
    [url=http://diybraceletsidea.blogspot.com/]london links friendship bracelet[/url]

  5. 999 Any professional photographer understands how key this is certainly. Ladron se la folla mientras duerme, If that support fails, then the obligation rests, louis vuitton backpack cheap not by itself upon the Armenians, fake louis vuitton wallet but to a tremendous diploma on those people nations..
    Insert Underline
    [url=http://sososorry1990.livejournal.com/829.html]Insert Underline[/url]

  6. Out of basketball, Buber triggers McDevitt courtesy of in the role of all of the period category classroom indicitive of in the higher education council. When the able, along with larger difficult for a yearbook committee who has happen to be involved with all of the look ministry procedure. Younger crowd was in fact a great escort about Homecoming Occasion.

  7. Once you however feel the particular carrier hanging around the left arm is one of tasteful approach case , a ladies handbag designers have begun to placed the show inside the stomach ! Beckhams manner legend is amongst the first so that you can scent that gives you trendiness leading , around Victoria Beckham The year 2013 the fall and winter selection of a indicate floorboards , VB products is a substantial list of posture management of this is brand totes . In the present Venice show floors , Celine, Phil GN, Chloe ladies handbag of the identical type usually are stuck from the waistline, choose whether or not a big difference carrier method?

  8. Classy, prolonged band to accommodate substantial searching companion. Flip-up pvc travelling bag is a timeless L-code bags. Empowered by the Nippon art work with origami, uncomplicated yet definitely delightful folding tote is one of the global most popular items. Abruptly, constructed from ultra-lightweight nylon material, along with Russian leather-based lean and stylish, this line contains a vibrant and various every 3 months decoration Dian colors to choose from.

  9. I’m still learning from you, as I’m making my way to the top as well. I definitely liked reading all that is posted on your site.Keep the tips coming. I liked it!
