Okay, these spambots are getting scary now

Spambots often come in waves. Most of them are caught by my spam filter (typically hundreds a day). But a few always get by and show up in the regular comments section. I delete them, but every now and then one amuses me and I feature it in a “spambot of the day” post. And sometimes there are a whole bunch that get by at once, usually from the same source (which, like most bots, is advertising another website, usually pushing a product or service).

That’s what happened earlier today. The product being promoted was a particular type of cellphone which shall remain nameless. But the strange thing was that, instead of the usual mindless spam, sometimes tangentially related but more often unrelated to the subject matter of the post on which it’s dumped, these were much more finely targeted and written in a way that made them sound more like real comments.

Example: on a thread entitled “The Periodic Table” (and here it is; it’s about, yes, the periodic table), I found the following. Unfinished, yet pertinent and all-too-human-sounding:

I can relate to your feelings about the periodic table, both in junior high science and beyond, but I can go further than that. As a kid growing up, we had various old games in the basement.

Well, I wish the bot had gone further than that, because I’d love to hear about those old games in the basement.

The very same bot (different names, but the same website was being advertised) made three comments on a post of mine entitled “Ah, Brave New World,” in which I’d linked to this article about a woman who wrote a book discussing the future of reproduction without sex. Here are the bot’s three comments, all posted under different but related names:

You listening, Lord? I’m ready to turn Amish now.

In fact, I think I’d prefer it to the world this woman envisions.

Talk about narcissism gone berserk! Thank you, thank you God I’m not a child of these people.

So, here’s my question: did the bot read the post and compose those rather on-point comments (with a bit of humor thrown in, too) to fit it? Is the bot therefore an actual person? Or has the program automatically generating these things become that sophisticated?

Ah, Brave New World indeed!

[UPDATE: I just found some more, same source, but they came in earlier today. This one was on a post of mine titled “Netflix thinks,” and it mentions the film “Fiddler on the Roof.” The bot had this to say (and I am not making this up, believe me):

Speaking of Zero Mostel, we just threw out a toilet seat he had autographed. It said “Zero Mostel shat here” and had a cartoon sketch of himself. He “created” it when he played my father-in-law’s tent theater, loafing through Tevye, and still bringing the house down.

Next up, same bot on same post:

The sort of statistical data mining used by Netflix, Amazon, and other online merchants is a rather inexact science, but they keep using it because it does work a lot of the time. Of course, it’s only as good as the data sets that they are working with

My guess is that whoever is generating this particular bot has it down to a more exact science.]

Comments

Okay, these spambots are getting scary now — 17 Comments

M J R on December 12, 2012 at 1:55 pm said:

My guess [^un^educated] in this instance is that there’s minimal software out there that searches for keywords in a blog post, and then inserts words related to a found keyword into the spam post — and then neo gets to deal with the spam post.

Computer geeks out there, I would not at all mind being better educated in this. But the above is my guess (worth what you paid for it).
neo-neocon on December 12, 2012 at 1:59 pm said:

M J R: yes, there is. But I think you miss my point.

A lot of bots are like that; it’s nothing new. A lot of bot comments are at least tangentially related to the subject of the post, or a word that appears in it. But only tangentially, and only in passing. These comments I highlighted here are related to the total post—the message of the post rather than a word here and there. It is much more as though a real human being has read the post and is responding.
vanderleun on December 12, 2012 at 2:36 pm said:

Perhaps the dawning of the Singularity will appear first in spambots. After that…. SkyNet. And after that we will not be spambotted but “Schwarzeneggered!”
vanderleun on December 12, 2012 at 2:40 pm said:

On another note, I’ve noticed the same “intelligence” creep in the spambots hitting my site. I used to be able to spot them all just by their content (or lack thereof). Now I often have to open the original post to see exactly what is going on.

Last week these and other spam attacks were in the thousands and so many were getting through the filters I had to disable commenting until the storm passed.

Even now the assault continues. The spam filter in the last two days has intercepted 2,300 spambots. The leakage has, however, gone down substantially. What was hundreds of spambots getting through over the weekend is now down to a manageable dozen or so a day.
vanderleun on December 12, 2012 at 2:52 pm said:

And looking at the contents of the spam filter right now I note that the latest one caught reads:

“Several of these replies on this post are garbage, You should delete them.”
M J R on December 12, 2012 at 2:56 pm said:

neo, 1:59 pm —

I did miss the entire point; I saw where you were going, but I got sidetracked in my search for a coherent explanation. Now I got it, having needed to be hit over the head with it. Thanks for your forbearance.
artfldgr on December 12, 2012 at 3:42 pm said:

neural net and genetic algorithms…

the same tools behind the non player characters in games, expert systems, etc..
Jamie Irons on December 12, 2012 at 4:50 pm said:

Some of these “bots” (and I have to wonder if that is indeed what they are) are getting close to passing-the-Turing-test territory.

Most interesting.

Jamie Irons
IGotBupkis, Legally Defined Cyberbully in All 57 States on December 12, 2012 at 5:25 pm said:

}}} the non player characters in games,

I play computer games. Most times, the bots just CHEAT.

They use knowledge the user doesn’t have, speed the user can’t have, and/or have their “stats” enhanced to make them tougher, “recover” faster, or attack harder… So, if you presume any “character type” has a certain number of “hit points”, ‘z’, which recover at 1/second, and an attack with item ‘x’ causes ‘y’ damage, then the bot will have 1.5z hit points vs a player, recover hit points 2x faster, and/or either do 2y damage or strike 1.5x as often… things like that.
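To put rough numbers on that, here is a minimal sketch. The `Stats` class, the `cheat` and `seconds_to_kill` helpers, and the starting values (100 hit points, 10 damage per hit) are all made up for illustration, and the sketch applies the hit point, recovery, and damage boosts together, which is only one of the combinations described:

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Stats:
    hit_points: float       # 'z'
    regen_per_sec: float    # hit points recovered per second
    damage: float           # 'y', damage per hit with item 'x'
    attacks_per_sec: float

def cheat(player: Stats) -> Stats:
    """Apply the example multipliers: 1.5z hit points, 2x recovery, 2y damage."""
    return replace(
        player,
        hit_points=player.hit_points * 1.5,
        regen_per_sec=player.regen_per_sec * 2,
        damage=player.damage * 2,
    )

def seconds_to_kill(attacker: Stats, target: Stats) -> float:
    """Time for attacker to drain target, net of the target's recovery."""
    net_dps = attacker.damage * attacker.attacks_per_sec - target.regen_per_sec
    return float("inf") if net_dps <= 0 else target.hit_points / net_dps

# Hypothetical baseline character; the bot is the same character with cheats applied.
player = Stats(hit_points=100, regen_per_sec=1, damage=10, attacks_per_sec=1)
bot = cheat(player)
print(seconds_to_kill(player, bot))  # player needs 150 / (10 - 2) seconds
print(seconds_to_kill(bot, player))  # bot needs 100 / (20 - 1) seconds
```

With identical tactics on both sides, the boosted character wins comfortably, which is the point: the advantage comes from the numbers, not from any intelligence.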

Game AI has a long way to go to really get close to the singularity… and there’s at least as much impetus to make good game AI as there is to make better spambots.

My guess would be that the selection scripts have gotten much more sophisticated: say you start with 10,000 snippets of text, cross-correlated on keywords. The bot then looks at the primary post and picks the best one based on the number of its keywords appearing in the post.
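A minimal sketch of that kind of selection script, assuming each canned snippet carries a hand-assigned keyword set. The snippet list and the `keywords` and `pick_snippet` helpers are hypothetical; this is a guess at the approach, not the actual bot's code:

```python
import re

def keywords(text):
    """Lowercased word set for a post, ignoring very short words."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if len(w) > 3}

def pick_snippet(post, snippets):
    """Return the snippet whose tagged keywords overlap the post the most."""
    post_words = keywords(post)
    best = max(snippets, key=lambda s: len(s["keywords"] & post_words))
    return best["text"]

# Hypothetical stand-in for the 10,000 snippets, each tagged with keywords.
SNIPPETS = [
    {
        "text": "I can relate to your feelings about the periodic table.",
        "keywords": {"periodic", "table", "chemistry", "science"},
    },
    {
        "text": "Statistical data mining is a rather inexact science.",
        "keywords": {"netflix", "amazon", "recommendations", "movies"},
    },
]

post = "Netflix thinks I would enjoy these movies ..."
print(pick_snippet(post, SNIPPETS))
```

Nothing here understands the post. It only counts shared words, which is why the quality depends almost entirely on how well the snippets were tagged.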

Neo: Keep in mind that you are actually seeing a potential “observation bias” in your selection there. You’re noting all the hits, but I’d bet you have no idea how many misses came from the same bot(s).

That is — I can’t imagine you scanned the obviously bad ones to see if they pointed to the same sites as some of the good ones.

A guess, but natural language recognition is really NOT that easy a problem.

That was, BTW, one of the key things about Watson on Jeopardy… it was actually parsing the questions, not being told what to look for. And that’s a VERY controlled, specific scenario. And it still got it laughably wrong sometimes, ignoring key words that made the correct answer quite clear. Example: Final Jeopardy in the first game:

Watson was the only contestant to miss the Final Jeopardy! response in the category U.S. CITIES (“Its largest airport was named for a World War II hero; its second largest, for a World War II battle”). Rutter and Jennings gave the correct response of Chicago, but Watson’s response was “What is Toronto????”

See the wiki entry for guesses as to how it screwed that one up.

P.S., Watson won it handily, since it only wagered a small amount, and was up quite a bit at that point.

Now, sounds promising for spambots? Nope:

Watson is made up of a cluster of ninety IBM Power 750 servers (plus additional I/O, network and cluster controller nodes in 10 racks) with a total of 2880 POWER7 processor cores and 16 Terabytes of RAM. Each Power 750 server uses a 3.5 GHz POWER7 eight core processor, with four threads per core.

That probably doesn’t mean a lot to you non-techies, but it’s a bit more processing power than any spambot is likely to manage to have.
IGotBupkis, Legally Defined Cyberbully in All 57 States on December 12, 2012 at 5:29 pm said:

Actually, Jamie, no — even Watson couldn’t get close to the Turing test. See above for my guess as to what the spambots are actually doing.
Baltimoron on December 12, 2012 at 9:13 pm said:

The coding for something like this is actually very simple, but it’s also very tedious, and it wouldn’t run very fast. But if the payoff is that they get a post up longer and more people read it…
IGotBupkis, Legally Defined Cyberbully in All 57 States on December 12, 2012 at 10:45 pm said:

Baltimoron — yes, you spend a lot more time on creating the database you pull entries from, and adding keywords that don’t appear in its own text.
Jewel on December 13, 2012 at 2:41 am said:

It wouldn’t surprise me in the least if the new economic reality generated jobs for out of work college grads paid by websites to write commentary that includes links back to the website. Vanderleun’s site was getting strange encrypted types of messages that were only letters and numbers. I was trying to imagine just how bad it must have been for the incoherent ones to get through like that.
Baltimoron on December 13, 2012 at 3:59 pm said:

Bupkis,

I figure someone took a dozen or more common subjects for blog postings and identified three or four keywords that could identify each one. Once you’ve categorized a blog, you build a string using a series of if/else statements that have searches for a second set of more specific keywords as their arguments.
As I said, it would be tedious, but a smart person could write a program that gives a very good approximation of a real post.
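Roughly, that two-stage idea might look like the sketch below. The categories, keyword lists, canned replies, and function names are all invented for illustration; a real version would need far more of each, which is where the tedium comes in:

```python
import re

# Stage 1: a few broad keywords per subject (hypothetical lists).
CATEGORY_KEYWORDS = {
    "movies": {"netflix", "film", "movie", "actor"},
    "science": {"periodic", "chemistry", "physics", "element"},
}

def words_in(post):
    """Lowercased set of words in the post, punctuation stripped."""
    return set(re.findall(r"[a-z]+", post.lower()))

def categorize(post):
    """Pick the category with the most keyword hits, or None if nothing matches."""
    words = words_in(post)
    scores = {cat: len(kws & words) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None

def build_comment(post):
    """Stage 2: if/else chain keyed on more specific words within the category."""
    words = words_in(post)
    category = categorize(post)
    if category == "movies":
        if "fiddler" in words:
            return "That film always makes me think of the stage version."
        elif "netflix" in words:
            return "Their recommendations are hit and miss for me."
        else:
            return "I have been meaning to watch that one."
    elif category == "science":
        if "periodic" in words:
            return "I remember memorizing that table in junior high."
        else:
            return "Science class was never my strong suit."
    return "Interesting post, thanks for sharing."

print(build_comment("Netflix thinks I want to see the film Fiddler on the Roof"))
```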
Pat on December 14, 2012 at 10:33 am said:

I confess. It was me that wrote:

“Speaking of Zero Mostel, we just threw out a toilet seat he had autographed. It said “Zero Mostel shat here” and had a cartoon sketch of himself. He “created” it when he played my father-in-law’s tent theater, loafing through Tevye, and still bringing the house down.”

It must have been in a blog comment. Only about 5 people in the world would know about that toilet seat and its history. I think the bot is mining your blog’s comments and matching keywords.
Pat on December 14, 2012 at 10:38 am said:

Yep, the bot stole my comment from here.

I googled “Zero Mostel shat here” with the quotes.
neo-neocon on December 14, 2012 at 1:14 pm said:

Pat: aha! I think you are correct. Makes sense.

Sneaky, clever bots. But not so brilliant as I’d thought.

The New Neo

A blog about political change, among other things
