A CAPTCHA is a program that can generate and grade tests that humans can pass but current computer programs cannot. For example, humans can read distorted text as the one shown below, but current computer programs can't:
The term CAPTCHA (for Completely Automated Public Turing Test To Tell Computers and Humans Apart) was coined in 2000 by Luis von Ahn, Manuel Blum, Nicholas Hopper and John Langford of Carnegie Mellon University. At the time, they developed the first CAPTCHA to be used by Yahoo.
CAPTCHA or Captcha (pronounced as cap-ch-uh) which stands for “Completely Automated Public Turing test to tell Computers and Humans Apart” is a type of challenge-response test to ensure that the response is only generated by humans and not by a computer. In simple words, CAPTCHA is the word verification test that you will come across the end of a sign-up form while signing up for Gmail or Yahoo account. The following image shows the typical samples of CAPTCHA.
Almost every Internet user will have an experience of CAPTCHA in their daily Internet usage, but only a few are aware of what it is and why they are used. So in this post you will find a detailed information on how CAPTCHA works and why they are used.
What Purpose does CAPTCHA Exactly Serve?
CAPTCPA is mainly used to prevent automated software (bots) from performing actions on behalf of actual humans. For example while signing up for a new email account, you will come across a CAPTCHA at the end of the sign-up form so as to ensure that the form is filled out only by a legitimate human and not by any of the automated software or a computer bot. The main goal of CAPTCHA is to put forth a test which is simple and straight forward for any human to answer but for a computer, it is almost impossible to solve.
What is the Need to Create a Test that Can Tell Computers and Humans Apart?
For many the CAPTCHA may seem to be silly and annoying, but in fact it has the ability to protect systems from malicious attacks where people try to game the system. Attackers can make use of automated softwares to generate a huge quantity of requests thereby causing a high load on the target server which would degrade the quality of service of a given system, whether due to abuse or resource expenditure. This can affect millions of legitimate users and their requests. CAPTCHAs can be deployed to protect systems that are vulnerable to email spam, such as the services from Gmail, Yahoo and Hotmail.
Who Uses CAPTCHA?
CAPTCHAs are mainly used by websites that offer services like online polls and registration forms. For example, Web-based email services like Gmail, Yahoo and Hotmail offer free email accounts for their users. However upon each sign-up process, CAPTCHAs are used to prevent spammers from using a bot to generate hundreds of spam mail accounts.
Designing a CAPTCHA System.
CAPTCHAs are designed on the fact that computers lack the ability that human beings have when it comes to processing visual data. It is more easily possible for humans to look at an image and pick out the patterns than a computer. This is because computers lack the real intelligence that humans have by default. CAPTCHAs are implemented by presenting users with an image which contains distorted or randomly stretched characters which only humans should be able to identify. Sometimes characters are striked out or presented with a noisy background to make it even more harder for computers to figure out the patterns.
Most, but not all, CAPTCHAs rely on a visual test. Some Websites implement a totally different CAPTCHA system to tell humans and computers apart. For example, a user is presented with 4 images in which 3 contains picture of animals and one contain a flower. The user is asked to select only those images which contain animals in them. This Turing test can easily be solved by any human, but almost impossible for a computer.
Breaking the CAPTCHA
The challenge in breaking the CAPTCHA lies in real hard task of teaching a computer how to process information in a way similar to how humans think. Algorithms with artificial intelligence (AI) will have to be designed in order to make the computer think like humans when it comes to recognizing the patterns in images. However there is no universal algorithm that could pass through and break any CAPTCHA system and hence each CAPTCHA algorithm must have to be tackled individually. It might not work 100 percent of the time, but it can work often enough to be worthwhile to spammers.
Where Can I Get a CAPTCHA For My Site?
A free, secure CAPTCHA implementation is available here from the reCAPTCHA project.
Applications of CAPTCHAs
CAPTCHAs have several applications for practical security, including (but not limited to):
Preventing Comment Spam in Blogs. Most bloggers are familiar with programs that submit bogus comments, usually for the purpose of raising search engine ranks of some website (e.g., "buy penny stocks here"). This is called comment spam. By using a CAPTCHA, only humans can enter comments on a blog. There is no need to make users sign up before they enter a comment, and no legitimate comments are ever lost!
Protecting Website Registration. Several companies (Yahoo!, Microsoft, etc.) offer free email services. Up until a few years ago, most of these services suffered from a specific type of attack: "bots" that would sign up for thousands of email accounts every minute. The solution to this problem was to use CAPTCHAs to ensure that only humans obtain free accounts. In general, free services should be protected with a CAPTCHA in order to prevent abuse by automated programs.
Online Polls. In November 1999, http://www.slashdot.org released an online poll asking which was the best graduate school in computer science (a dangerous question to ask over the web!). As is the case with most online polls, IP addresses of voters were recorded in order to prevent single users from voting more than once. However, students at Carnegie Mellon found a way to stuff the ballots using programs that voted for CMU thousands of times. CMU's score started growing rapidly. The next day, students at MIT wrote their own program and the poll became a contest between voting "bots." MIT finished with 21,156 votes, Carnegie Mellon with 21,032 and every other school with less than 1,000. Can the result of any online poll be trusted? Not unless the poll ensures that only humans can vote.
Preventing Dictionary Attacks. CAPTCHAs can also be used to prevent dictionary attacks in password systems. The idea is simple: prevent a computer from being able to iterate through the entire space of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful logins.
Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent others from finding them easily. There is an html tag to prevent search engine bots from reading web pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they usually belong to large companies, respect web pages that don't want to allow them in. However, in order to truly guarantee that bots won't enter a web site, CAPTCHAs are needed.
Worms and Spam. CAPTCHAs also offer a plausible solution against email worms and spam: "I will only accept an email if I know there is a human behind the other computer." A few companies are already marketing this idea.
Guidelines
If your website needs protection from abuse, it is recommended that you use a CAPTCHA. There are many CAPTCHA implementations, some better than others. The following guidelines are strongly recommended for any CAPTCHA:
Accessibility. CAPTCHAs must be accessible. CAPTCHAs based solely on reading text — or other visual-perception tasks — prevent visually impaired users from accessing the protected resource. Such CAPTCHAs may make a site incompatible with Section 508 in the United States. Any implementation of a CAPTCHA should allow blind users to get around the barrier, for example, by permitting users to opt for an audio CAPTCHA.
Image Security. Images of text should be distorted randomly before being presented to the user. Many implementations of CAPTCHAs use undistorted text, or text with only minor distortions. These implementations are vulnerable to simple automated attacks. For example, the CAPTCHAs shown below can all be broken using image processing techniques, mainly because they use a consistent font.
Script Security. Building a secure CAPTCHA is not easy. In addition to making the images unreadable by computers, the system should ensure that there are no easy ways around it at the script level. Common examples of insecurities in this respect include: (1) Systems that pass the answer to the CAPTCHA in plain text as part of the web form. (2) Systems where a solution to the same CAPTCHA can be used multiple times (this makes the CAPTCHA vulnerable to so-called "replay attacks").
Security Even After Wide-Spread Adoption. There are various "CAPTCHAs" that would be insecure if a significant number of sites start using them. An example of such a puzzle is asking text-based questions, such as a mathematical question ("what is 1+1"). Since a parser could easily be written that would allow bots to bypass this test, such "CAPTCHAs" rely on the fact that few sites use them, and thus that a bot author has no incentive to program their bot to solve that challenge. True CAPTCHAs should be secure even after a significant number of websites adopt them.