7.12.2008

The Email Killer: Could AI Strangle the Internet?

Sometimes, questions trouble me. This is one of them.

The Premise

Consider this:

1. We distinguish spam by using a variety of techniques that help sort out human comments, emails, and requests from bot comments, emails, and requests. Basically, we use AI's failures in the Turing Test to filter spam.

So far we have done well. Surprisingly well. CAPTCHAs have been successful, and Gmail's spam filter is downright impressive.

2. The field of artificial intelligence loves to foil the Turing Test. We are making progress in machine vision. We are also beginning to create robots that appear to be emotional and conversational (yes, I know there is a long way to go ... but progress is being made).

The Question

Don't the folks over at point 1 completely rely on the failure of the folks working over at point 2? Couldn't the success of AI result in a muddled, cluttered web, making some of our most valuable tools nearly unusable?

A Few Words from "The Know"

Luis von Ahn, a creator of CAPTCHA, addresses this problem on an official CAPTCHA page:
"CAPTCHA tests are based on open problems in artificial intelligence (AI): decoding images of distorted text, for instance, is well beyond the capabilities of modern computers. Therefore, CAPTCHAs also offer well-defined challenges for the AI community, and induce security researchers, as well as otherwise malicious programmers, to work on advancing the field of AI. CAPTCHAs are thus a win-win situation: either a CAPTCHA is not broken and there is a way to differentiate humans from computers, or the CAPTCHA is broken and an AI problem is solved."
With apologies to von Ahn, I don't quite agree with his "win-win" assessment, but that is for later. For now, are we even close to solving these AI problems?

Well, Are We Close?

Yes, we are.

Just to pick on CAPTCHA a bit (it seems to be a good example), programmers are becoming increasingly proficient at breaking the CAPTCHA tests. If you're really interested, look here and here for some examples.

But to highlight the real problem for CAPTCHAs and protecting against spam in general, take a look at this paragraph from the Blight Watch:
"The problem is that making Captchas more difficult shuts out more and more legitimate users. For most commercial purposes, designers want to make their websites and services easily available. Difficult Captchas have become tollgates that slow down or turn away traffic. Today, 20% of state-of-the-art Captchas are not solved correctly on the first try (and often, there’s no second try). At the same time, bots have evolved to the point that commercially available software can successfully defeat the most difficult Captcha 10-15% of the time."
To clarify things, the gap is closing from both sides: humans already miss one CAPTCHA in five, and the best bots already beat the hardest ones 10-15% of the time. And just to show that CAPTCHA isn't the only spam filter having a problem, take a look at this article from CNET: Spammers are winning -- and it's not even close. We're fighting an uphill battle, and it looks like we might just be playing in the foothills.
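
For the curious, here is a back-of-the-envelope sketch in Python of why a 10-15% success rate worries me. The rates come straight from the Blight Watch quote; everything else is my own assumption for illustration: that every attempt is independent, that a bot can simply request a fresh CAPTCHA after a failure, and the function names and the 20-try cutoff are made up for this example.

```python
def expected_attempts(p):
    """Average number of independent tries until the first success.

    This is the mean of a geometric distribution: 1 / p.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return 1.0 / p


def prob_success_within(p, n):
    """Chance of at least one success in n independent tries."""
    return 1.0 - (1.0 - p) ** n


# Per-attempt success rates taken from the quote above.
# Assumption: attempts are independent and retries are free.
rates = {"human": 0.80, "bot (low end)": 0.10, "bot (high end)": 0.15}

for who, p in rates.items():
    tries = expected_attempts(p)
    within_20 = prob_success_within(p, 20)
    print(f"{who}: about {tries:.1f} tries on average, "
          f"{within_20:.0%} chance of getting through within 20 tries")
```

Under those assumptions, a bot that wins only 10% of the time still needs about ten tries on average to get through, and tries cost a bot next to nothing. A human who fails once may just give up and leave.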

What are the Consequences?


The value of the web lies in its ability to provide data quickly - whether you want to read an email from a close friend, look up a Wikipedia reference on hairballs, or leave scathing comments on this blog.

Excessive spam undermines the value of this data. If it takes five minutes to find an email in a sea of spam, then email is no longer a time-saving endeavor. It becomes stressful, time-consuming, and ultimately not worth it. If I have to wade through 50 advertisements before I reach your comment on this post, then the value of your opinion is diminished.

Craigslist? Worthless.
Digg? Littered.
Ticketmaster? Ruined.
Blogs? Forget a dialogue. Might as well print a paper.

Even beyond that, every piece of that spam must also be shipped through the same wires you are using to stream the latest episode of LOST. Welcome to a slower web.

So back to von Ahn . . .

Looking one more time at a piece of that quote:
"CAPTCHAs are thus a win-win situation: either a CAPTCHA is not broken and there is a way to differentiate humans from computers, or the CAPTCHA is broken and an AI problem is solved."
On one hand, yes. Absolutely. Someone will win. On the other hand? A win for AI means one heck of a loss for the internet as a whole.

I'll be the first to admit that my scenario might be a bit far-fetched and exaggerated. But it is a possibility, isn't it? There will be innovations in the spam prevention field, but at the heart of the matter, if it boils down to "Are you a bot or are you a person?" I have to say that I'm not terribly optimistic.

So I challenge someone to give me an intellectual back-rub. Let me know why I'm wrong. Let me know why the internet won't get overrun by spam. Let me know why AI is not about to strangle the internet.

Virtual high-five to any responses.
