seminar report of captcha

A Seminar Report ON CAPTCHA Submitted in partial fulfillment of the requirements for the award of the degree of Bachelor of Technology In INFORMATION TECHNOLOGY (2011 -2012) By Department of Information Technology

Upload: abhishek-ghosh

Post on 21-Apr-2015

Page 1: Seminar Report of Captcha

A Seminar Report

ON “CAPTCHA”

Submitted in partial fulfillment of the requirements for the award of the degree of

Bachelor of Technology

In

INFORMATION TECHNOLOGY (2011-2012)

By

Department of Information Technology

BALDEV RAM MIRDHA INSTITUTE OF TECHNOLOGY

JAIPUR

SUBMITTED TO: JITENDRA SHARMA (SEMINAR GUIDE)

SUBMITTED BY: ABHISHEK KUMAR GHOSH, ROLL NO: 08EBMIT003


BALDEV RAM MIRDHA INSTITUTE OF TECHNOLOGY

JAIPUR

Department of Information Technology

CERTIFICATE

This is to certify that the Seminar titled

“CAPTCHA”

is a bona fide work carried out by the following final-year student

ABHISHEK KUMAR GHOSH

under our guidance towards the partial fulfillment of the requirements for the award of the degree of Bachelor of Engineering (Information Technology) by RAJASTHAN TECHNICAL UNIVERSITY during the academic year 2011-2012, and is of the required standard.

GUIDE: [Mr. JITENDRA SHARMA]

HEAD OF DEPARTMENT: [Mr. NIMISH ARVIND]


ACKNOWLEDGEMENT

Achieving a milestone alone is extremely difficult for any person. However, there are some motivators who appear along the winding path like twinkling stars in the sky and make our task much easier. It becomes our humble and foremost duty to acknowledge all of them.

I am extremely indebted to Mr. Nimish Arvind (HOD) for his excellent guidance. I am highly obliged and thankful to my seminar guide Mr. Jitendra Sharma, who provided immense support in answering the many queries that I kept raising during the preparation of the seminar. I would also like to thank Ms. Deepika Sainani, whose support and cooperation helped in conducting the study smoothly.

I owe my sincere thanks to the Principal, Dr. R. K. Khanna (BMIT, Jaipur), who provided me with the required guidance and facilities.

Last but not least, I offer my sincere thanks and gratitude to all the staff members of Baldev Ram Mirdha Institute of Technology for providing an excellent opportunity and environment throughout my preparation of the seminar. I am also thankful to all my colleagues for their cooperation and support.


PREFACE

Seminar presentation forms an integral component of any professional course. The institute where we pursue our studies cannot provide practical knowledge on all aspects of learning. Often the study of a subject is said to be incomplete until the student has been exposed to its practical aspects. Theoretical studies provide the pool of knowledge, whereas practical application makes the student agile and competent.

As an important part of the engineering curriculum, each student has to undergo the Seminar Presentation. This B.Tech. course seminar helps a student get acquainted with the manner in which his/her knowledge is applied in practice, which is normally different from what he/she has learnt from books. Hence, when the student switches from the process of learning to that of implementing his/her knowledge, he/she finds an abrupt change. This is exactly why the seminar session during the B.E. curriculum becomes all the more important.

Seminar presentation is prescribed by the AICTE for students of technical colleges as a part of the four-year engineering degree course. We are required to give a presentation on a current topic or technology. As an engineering student, I had the opportunity to study and present the seminar on “CAPTCHA”.

Abhishek Kumar Ghosh

(08EBMIT003)


Contents

S.No Topic

1 Cover Page

2 Certificate

3 Acknowledgment

4 Preface

5 Contents

6 Abstract

7 Why use CAPTCHAs

8 Definitions

9 Types of CAPTCHAs

10 Major Areas of Applications

11 ReCAPTCHA

12 Breaking of CAPTCHAs

13 New Proposed Approaches

14 Conclusion

15 Bibliography


ABSTRACT

Use of the Internet has increased remarkably across the globe in the past 10-12 years, and so has the need for security over it. Marketing and advertising over the Internet have given rise to companies like Google, which at the time of writing is valued at 181 billion USD, i.e. almost twice the value of General Motors and McDonald's combined.

This presentation is about security achieved over the Internet using CAPTCHAs. CAPTCHAs are software programs that test whether a user on the Internet is a human or a machine. This concept is used by all the big Internet companies, such as Google, Yahoo and Facebook. So what are these CAPTCHAs, and what are their possible applications? This is what we cover in our presentation.


WHY USE CAPTCHAS

To understand their usage, consider this story. In November 1999, www.Slashdot.org (a popular site in the US) conducted an online poll asking which was the best graduate school in computer science.

Students at CMU and MIT quickly wrote programs that inflated their own schools' vote counts, and ultimately the poll had to be taken down because both MIT and CMU had tens of thousands of votes while every other school struggled to reach a thousand.

There are situations like these where you need to distinguish whether a user is a human or a computer. This is where we use CAPTCHAs.


DEFINITIONS

CAPTCHA stands for

Completely Automated Public Turing test to tell Computers and Humans Apart

It is also known as a Reverse Turing Test or a Human Interactive Proof.

Turing Test: To conduct this test, two people and a machine are needed. One person acts as an interrogator, sitting in a separate room, asking questions and receiving responses from both the other person and the machine. The goal of the machine is to fool the interrogator into believing that it is the human. A CAPTCHA reverses the roles: the judge is a program rather than a person.

The challenge here: develop a software program that can create and grade challenges that most humans can pass but computers cannot.
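The "create and grade" idea can be sketched as follows. This is only a minimal illustration under our own assumptions: the names `create_challenge`, `grade_response` and `SECRET_KEY` are hypothetical, the answer is assumed to be rendered as a distorted image elsewhere, and binding the answer to a keyed hash is just one possible way to let the server grade a response without storing it.

```python
import hashlib
import hmac
import secrets
import string

# Hypothetical server-side secret; a real deployment would load it from configuration.
SECRET_KEY = secrets.token_bytes(32)
ALPHABET = string.ascii_uppercase + string.digits


def create_challenge(length=6):
    """Create a challenge: a random answer plus a token bound to that answer.

    The answer would be drawn as a distorted image (not shown here);
    only the image and the token are sent to the client.
    """
    answer = "".join(secrets.choice(ALPHABET) for _ in range(length))
    token = hmac.new(SECRET_KEY, answer.encode(), hashlib.sha256).hexdigest()
    return answer, token


def grade_response(response, token):
    """Grade a response: True if the typed text matches the answer behind the token."""
    normalized = response.strip().upper()
    expected = hmac.new(SECRET_KEY, normalized.encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, token)
```

This sketch has no protection against replaying an old token; a practical system would also add an expiry time or a one-time identifier.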


Types of CAPTCHAS

There are basically three types of CAPTCHAs:

1. Text Based: These are the most commonly used CAPTCHAs. They can be further divided into three kinds:

GIMPY: Initially used by Yahoo. In this CAPTCHA two steps are followed:

a) Pick a word or words from a small dictionary

b) Distort them and add noise and background

GIMPY-R: This was used by Google and was a simple advance over GIMPY. Here individual letters are distorted instead of complete words. The steps followed are:

a) Pick random letters

b) Distort them, add noise and background

SIMARD'S: Here a further advance is made by drawing arcs, i.e. curved geometrical shapes, over the characters. The steps followed are:

a) Pick random letters and numbers

b) Distort them and add arcs


2. Graphic Based CAPTCHAs: These are based on graphics, i.e. images and symbols, and are of two types:

Bongo: The following steps are followed in Bongo CAPTCHAs:

a) Display two series of blocks

b) User must find the characteristic that sets the two series apart

c) User is asked to determine which series each of four single blocks belongs to

PIX: This is the second kind of graphics CAPTCHA, using distorted images. The steps followed are:

a) Create a large database of labeled images

b) Pick a concrete object

c) Pick four images of the object from the images database

d) Distort the images

e) Ask the user to pick the object from a list of words

3. Audio Based CAPTCHAs: These are based on the human ability to recognize sounds that may be distorted. The following algorithm is used:

a) Pick a word or a sequence of numbers at random

b) Render them into an audio clip using TTS software

c) Distort the audio clip

d) Ask the user to identify and type the word or numbers


MAJOR AREAS OF APPLICATIONS:

CAPTCHAs have several applications for practical security, including (but not limited to):

Preventing Comment Spam in Blogs. Most bloggers are familiar with programs that submit bogus comments, usually for the purpose of raising search engine ranks of some website (e.g., "buy penny stocks here"). This is called comment spam. By using a CAPTCHA, only humans can enter comments on a blog. There is no need to make users sign up before they enter a comment, and no legitimate comments are ever lost!

Protecting Website Registration. Several companies (Yahoo!, Microsoft, etc.) offer free email services. Up until a few years ago, most of these services suffered from a specific type of attack: "bots" that would sign up for thousands of email accounts every minute. The solution to this problem was to use CAPTCHAs to ensure that only humans obtain free accounts. In general, free services should be protected with a CAPTCHA in order to prevent abuse by automated scripts.

Protecting Email Addresses From Scrapers. Spammers crawl the Web in search of email addresses posted in clear text. CAPTCHAs provide an effective mechanism to hide your email address from Web scrapers. The idea is to require users to solve a CAPTCHA before showing your email address. A free and secure implementation that uses CAPTCHAs to obfuscate an email address can be found at reCAPTCHA MailHide.


Online Polls. In November 1999, http://www.slashdot.org released an online poll asking which was the best graduate school in computer science (a dangerous question to ask over the web!). As is the case with most online polls, IP addresses of voters were recorded in order to prevent single users from voting more than once. However, students at Carnegie Mellon found a way to stuff the ballots using programs that voted for CMU thousands of times. CMU's score started growing rapidly. The next day, students at MIT wrote their own program and the poll became a contest between voting "bots." MIT finished with 21,156 votes, Carnegie Mellon with 21,032 and every other school with less than 1,000. Can the result of any online poll be trusted? Not unless the poll ensures that only humans can vote.

Preventing Dictionary Attacks. CAPTCHAs can also be used to prevent dictionary attacks in password systems. The idea is simple: prevent a computer from being able to iterate through the entire space of passwords by requiring it to solve a CAPTCHA after a certain number of unsuccessful logins. This is better than the classic approach of locking an account after a sequence of unsuccessful logins, since doing so allows an attacker to lock accounts at will.
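The idea of demanding a CAPTCHA after a certain number of unsuccessful logins, instead of locking the account, can be sketched as below. The class name `LoginGuard`, the threshold of 3 attempts and the in-memory plain-text password table are all assumptions made for illustration only; the CAPTCHA check itself is represented by the `captcha_passed` flag.

```python
from collections import defaultdict

MAX_FREE_ATTEMPTS = 3  # assumed threshold; the report does not fix a number


class LoginGuard:
    """Require a solved CAPTCHA once an account has too many failed logins."""

    def __init__(self, passwords, max_free_attempts=MAX_FREE_ATTEMPTS):
        # Plain-text passwords are used here only to keep the sketch short.
        self._passwords = passwords
        self._failures = defaultdict(int)
        self._max_free = max_free_attempts

    def captcha_required(self, user):
        return self._failures[user] >= self._max_free

    def login(self, user, password, captcha_passed=False):
        # The account is never locked: a human can always continue
        # by solving the CAPTCHA, while a guessing script is stopped.
        if self.captcha_required(user) and not captcha_passed:
            return "captcha_required"
        if self._passwords.get(user) == password:
            self._failures[user] = 0
            return "ok"
        self._failures[user] += 1
        return "denied"
```

For example, after three wrong guesses for a user, a further call to `login` returns "captcha_required" until `captcha_passed=True` is supplied, even if the password is correct.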

Search Engine Bots. It is sometimes desirable to keep webpages unindexed to prevent others from finding them easily. There is an HTML tag to prevent search engine bots from reading web pages. The tag, however, doesn't guarantee that bots won't read a web page; it only serves to say "no bots, please." Search engine bots, since they usually belong to large companies, respect web pages that don't want to allow them in. However, in order to truly guarantee that bots won't enter a web site, CAPTCHAs are needed.

Worms and Spam. CAPTCHAs also offer a plausible solution against email worms and spam: "I will only accept an email if I know there is a human behind the other computer." A few companies are already marketing this idea.


ReCAPTCHA

ReCAPTCHA is a free CAPTCHA service that helps to digitize books, newspapers and old-time radio shows.

About 200 million CAPTCHAs are solved by humans around the world every day. In each case, roughly ten seconds of human time are being spent. Individually, that's not a lot of time, but in aggregate these little puzzles consume more than 150,000 hours of work each day. What if we could make positive use of this human effort? ReCAPTCHA does exactly that by channeling the effort spent solving CAPTCHAs online into "reading" books.

To archive human knowledge and to make information more accessible to the world, multiple projects are currently digitizing physical books that were written before the computer age. The book pages are being photographically scanned, and then transformed into text using "Optical Character Recognition" (OCR). The transformation into text is useful because scanning a book produces images, which are difficult to store on small devices, expensive to download, and cannot be searched. The problem is that OCR is not perfect.

ReCAPTCHA improves the process of digitizing books by sending words that cannot be read by computers to the Web in the form of CAPTCHAs for humans to decipher. More specifically, each word that cannot be read correctly by OCR is placed on an image and used as a CAPTCHA. This is possible because most OCR programs alert you when a word cannot be read correctly.

But if a computer can't read such a CAPTCHA, how does the system know the correct answer to the puzzle? Here's how: Each new word that cannot be read correctly by OCR is given to a user in conjunction with another word for which the answer is already known. The user is then asked to read both words. If they solve the one for which the answer is known, the system assumes their answer is correct for the new one. The system then gives the new image to a number of other people to determine, with higher confidence, whether the original answer was correct.
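The pairing of a known word with an unknown word can be sketched as follows. This is our own simplified illustration and not the actual reCAPTCHA implementation: the class name `WordPairGrader`, the image identifiers and the agreement threshold `votes_needed` are assumptions.

```python
from collections import Counter, defaultdict


class WordPairGrader:
    """Grade users on a known control word and collect votes for an unknown word."""

    def __init__(self, votes_needed=3):
        self.votes_needed = votes_needed    # assumed agreement threshold
        self._votes = defaultdict(Counter)  # unknown image id -> transcription counts

    def submit(self, control_answer, control_typed, unknown_id, unknown_typed):
        """Return True if the user passes, i.e. reads the control word correctly."""
        if control_typed.strip().lower() != control_answer.lower():
            return False  # failed the known word, so the other reading is ignored
        self._votes[unknown_id][unknown_typed.strip().lower()] += 1
        return True

    def resolved(self, unknown_id):
        """Return the agreed transcription, or None if agreement is still too low."""
        votes = self._votes.get(unknown_id)
        if not votes:
            return None
        word, count = votes.most_common(1)[0]
        return word if count >= self.votes_needed else None
```

Only users who read the control word correctly contribute a vote, and an unknown word is accepted only after enough of those users agree on the same reading.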


BREAKING OF CAPTCHAS

Two methods have been used so far to break CAPTCHAs: one uses decoding software that removes the noise and reads the text, and the other uses humans.

1. Some text based CAPTCHAs have been broken by software that works in three stages:

Preprocessing: Removal of background clutter and noise

Segmentation: Splitting the image into regions which each contain a single character

Classification: Identifying the character in each region

2. Other CAPTCHAs can be broken by streaming the tests to unsuspecting users to solve, for example by showing the CAPTCHA to visitors of another website and forwarding their answers.
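The three software stages listed above can be illustrated on a toy example. The sketch below is a deliberately simplified assumption of ours: the "image" is a small grid of '#' and '.' characters, the noise is isolated single pixels, the characters are separated by empty columns, and classification is plain template matching against three hypothetical glyphs. Real attacks work on pixel images and use far stronger classifiers.

```python
# Hypothetical 3x5 glyph templates used only for this toy example.
TEMPLATES = {
    "T": ("###", ".#.", ".#.", ".#.", ".#."),
    "L": ("#..", "#..", "#..", "#..", "###"),
    "O": ("###", "#.#", "#.#", "#.#", "###"),
}

# "TLO" with two isolated noise pixels (row 0, column 10 and row 2, column 4).
IMAGE = [
    "###...#...#.###",
    ".#....#.....#.#",
    ".#..#.#.....#.#",
    ".#....#.....#.#",
    ".#....###...###",
]


def remove_noise(image):
    """Preprocessing: drop every set pixel that has no set neighbour."""
    height, width = len(image), len(image[0])

    def has_neighbor(r, c):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if (dr or dc) and 0 <= rr < height and 0 <= cc < width:
                    if image[rr][cc] == "#":
                        return True
        return False

    return [
        "".join("#" if image[r][c] == "#" and has_neighbor(r, c) else "."
                for c in range(width))
        for r in range(height)
    ]


def segment(image):
    """Segmentation: cut the image at empty columns, one region per character."""
    width = len(image[0])
    regions, start = [], None
    for c in range(width + 1):
        filled = c < width and any(row[c] == "#" for row in image)
        if filled and start is None:
            start = c
        elif not filled and start is not None:
            regions.append(tuple(row[start:c] for row in image))
            start = None
    return regions


def classify(region):
    """Classification: pick the template with the fewest differing pixels."""
    def distance(glyph):
        if len(glyph) != len(region) or len(glyph[0]) != len(region[0]):
            return float("inf")  # a different shape cannot match
        return sum(a != b for g_row, r_row in zip(glyph, region)
                   for a, b in zip(g_row, r_row))

    return min(TEMPLATES, key=lambda ch: distance(TEMPLATES[ch]))


def solve(image):
    return "".join(classify(region) for region in segment(remove_noise(image)))
```

Calling `solve(IMAGE)` runs the three stages in order. Without the preprocessing stage, the two noise pixels would show up as extra regions during segmentation, which is why noise removal comes first.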


New Proposed Approaches

This new approach is very similar to the PIX CAPTCHA discussed earlier. The steps followed are:

• Pick a concrete object

• Get 6 images at random from images.google.com that match the object

• Distort the images

• Build a list of 100 words: 90 from a full dictionary, 10 from the objects dictionary

• Prompt the user to pick the object from the list of words

The images are obtained and presented as follows:

• Make an HTTP call to images.google.com and search for the object

• Screen scrape the result of 2-3 pages to get the list of images

• Pick 6 images at random

• Randomly distort both the images and their URLs before displaying them

• Expire the CAPTCHA in 30-45 seconds
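The word-list and expiry steps above can be sketched as follows. Fetching and distorting the images is left out, and the names `build_word_list`, `ImageChallenge` and `new_challenge` are hypothetical. We also assume that the correct object counts as one of the 10 words taken from the objects dictionary, which the steps do not state explicitly.

```python
import random
import time


def build_word_list(answer, object_words, full_dictionary, rng=random):
    """Build 100 candidate words: 10 object words (including the answer) and 90 others."""
    object_set = set(object_words)
    other_objects = [w for w in object_words if w != answer]
    decoys = [w for w in full_dictionary if w not in object_set]
    words = [answer] + rng.sample(other_objects, 9) + rng.sample(decoys, 90)
    rng.shuffle(words)
    return words


class ImageChallenge:
    """One challenge: the hidden object, the word list shown, and an expiry time."""

    def __init__(self, answer, words, ttl_seconds, now):
        self.answer = answer
        self.words = words
        self.expires_at = now + ttl_seconds

    def check(self, choice, now):
        # A correct choice is accepted only before the challenge expires.
        return now <= self.expires_at and choice == self.answer


def new_challenge(object_words, full_dictionary, rng=random, now=None):
    now = time.time() if now is None else now
    answer = rng.choice(object_words)
    words = build_word_list(answer, object_words, full_dictionary, rng)
    ttl_seconds = rng.uniform(30, 45)  # expire in 30-45 seconds
    return ImageChallenge(answer, words, ttl_seconds, now)
```

Passing `now` explicitly keeps the expiry check easy to test; in normal use the current time from `time.time()` would be supplied to `check`.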


Benefits of this approach

• The database already exists and is public

• The database is constantly being updated and maintained

• Adding “concrete objects” to the dictionary is virtually instantaneous

• Distortion prevents caching hacks

• Quick expiration limits streaming hacks

Drawbacks of this approach:

• Not accessible to people with disabilities (which is the case of most CAPTCHAs)

• Relies on Google’s infrastructure

• Unlike CAPTCHAs using random letters and numbers, the number of challenge words is limited.


Conclusion

1. A CAPTCHA is any software test that distinguishes humans from machines.

2. Research on CAPTCHAs also advances AI, since it pushes computers towards understanding how humans think.

3. Internet companies make billions of dollars every year; the security and quality of their services matter, and so does the advancement of CAPTCHA technology.

4. Different CAPTCHA methods are being studied, and new ideas such as ReCAPTCHA, which puts the human time spent on the Internet to productive use, are especially promising.


BIBLIOGRAPHY

[i] www.phpcaptcha.org

[ii] www.captcha.net

[iii] www.wikipedia.com

[iv] Research papers by Luis von Ahn (Carnegie Mellon University).