563.10.3 captcha

563.10.3 CAPTCHA. Presented by: Sari Louis. SPAM Group: Marc Gagnon, Sari Louis, Steve White. University of Illinois, Spring 2006.

Upload: saishanker

Post on 18-Jun-2015

TRANSCRIPT

Page 1: 563.10.3 captcha

563.10.3 CAPTCHA

Presented by: Sari Louis

SPAM Group: Marc Gagnon, Sari Louis, Steve White

University of Illinois

Spring 2006

Page 2: 563.10.3 captcha

Agenda

• Definition

• Background

• Applications

• Types of CAPTCHAs

• Breaking CAPTCHAs

• Proposed Approach

• Conclusion

Page 3: 563.10.3 captcha

Definition

• CAPTCHA stands for Completely Automated Public Turing test to tell Computers and Humans Apart

• A.K.A. Reverse Turing Test, Human Interaction Proof

• The challenge: develop a software program that can create and grade challenges most humans can pass but computers cannot

Page 4: 563.10.3 captcha

Background

• First used by AltaVista in 1997

– Reduced SPAM add-URL submissions by over 95%

• CMU/Yahoo!

– Automated the creation and grading of challenges

• PARC

– Relies on document image degradation to prevent successful OCR

– Conducted user-focused studies to assess the effectiveness of CAPTCHAs

Page 5: 563.10.3 captcha

Background

• CAPTCHAs are based on open AI problems

• Breaking CAPTCHAs helps advance AI by solving these open problems

• Improving CAPTCHAs helps tell computers and humans apart

• Win-win situation

Page 6: 563.10.3 captcha

Background - Papers

• Pessimal Print: A Reverse Turing Test

– Allison L. Coates, Henry S. Baird, Richard J. Fateman

• Telling Humans and Computers Apart Automatically

– Luis von Ahn, Manuel Blum, and John Langford

• CAPTCHA: Using Hard AI Problems for Security

– Luis von Ahn, Manuel Blum, Nicholas J. Hopper, and John Langford

• Using Machine Learning to Break Visual Human Interaction Proofs (HIPs)

– Kumar Chellapilla, Patrice Y. Simard

Page 7: 563.10.3 captcha

Applications

• Free email services

• Online polls

• Preventing dictionary attacks

• Newsgroups, Blogs, etc…

• SPAM

Page 8: 563.10.3 captcha

Types of CAPTCHAs

• Text based

– Gimpy, ez-gimpy

– Gimpy-r, Google CAPTCHA

– Simard’s HIP (MSN)

• Graphic based

– Bongo

– PIX

• Audio based

Page 9: 563.10.3 captcha

Text Based CAPTCHAs

• Gimpy, ez-gimpy

– Pick a word or words from a small dictionary

– Distort them and add noise and background

• Gimpy-r, Google’s CAPTCHA

– Pick random letters

– Distort them, add noise and background

• Simard’s HIP

– Pick random letters and numbers

– Distort them and add arcs
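The "pick random letters and numbers" step, together with grading, can be sketched as below. This is a minimal sketch rather than any of the named systems: the alphabet, the default length of 6, and the function names are assumptions, and the rendering and distortion steps (which need an imaging library) are left out.

```python
import random
import string

# Assumed alphabet: uppercase letters and digits.
ALPHABET = string.ascii_uppercase + string.digits


def make_challenge(length=6, rng=random):
    """Pick random characters; rendering and distortion would happen afterwards."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def grade(expected, answer):
    """Accept the answer if it matches, ignoring case and surrounding whitespace."""
    return answer.strip().upper() == expected


challenge = make_challenge()
print(len(challenge))  # 6
```

Case-insensitive grading is a design choice made here for usability; a stricter grader would compare the strings exactly.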

Page 10: 563.10.3 captcha

Text Based CAPTCHAs

Page 11: 563.10.3 captcha

Graphic Based CAPTCHAs

• Bongo

– Display two series of blocks

– User must find the characteristic that sets the two series apart

– User is asked to determine which series each of four single blocks belongs to

Example difference: thick vs. thin lines
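A toy model of the Bongo flow, using the thick vs. thin example: each block is reduced to a single number (line thickness in pixels), which is purely an assumption for illustration. A real Bongo challenge draws visual patterns; the thickness ranges and function names here are hypothetical.

```python
import random

THICK = (6, 10)  # assumed thickness range for the left series
THIN = (1, 3)    # assumed thickness range for the right series


def make_bongo_challenge(series_size=4, n_tests=4, rng=random):
    """Build two series of blocks plus single test blocks with their true side."""
    left = [rng.randint(*THICK) for _ in range(series_size)]
    right = [rng.randint(*THIN) for _ in range(series_size)]
    tests = []
    for _ in range(n_tests):
        side = rng.choice(["left", "right"])
        thickness = rng.randint(*(THICK if side == "left" else THIN))
        tests.append((thickness, side))
    return left, right, tests


def grade(tests, answers):
    """Pass only if every single block is assigned to the correct series."""
    return len(answers) == len(tests) and all(
        answer == side for (_, side), answer in zip(tests, answers)
    )
```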

Page 12: 563.10.3 captcha

Graphic Based CAPTCHAs

• PIX

– Create a large database of labeled images

– Pick a concrete object

– Pick four images of the object from the image database

– Distort the images

– Ask the user to pick the object from a list of words
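The PIX steps can be sketched with a small in-memory stand-in for the labeled image database. The database contents, file names, and function names are hypothetical, and the distortion step is omitted.

```python
import random

# Hypothetical labeled image database: object name -> image file names.
IMAGE_DB = {
    "dog": ["dog1.jpg", "dog2.jpg", "dog3.jpg", "dog4.jpg", "dog5.jpg"],
    "pool": ["pool1.jpg", "pool2.jpg", "pool3.jpg", "pool4.jpg"],
    "tree": ["tree1.jpg", "tree2.jpg", "tree3.jpg", "tree4.jpg", "tree5.jpg"],
}


def make_pix_challenge(db, n_images=4, rng=random):
    """Pick an object, sample images of it, and list the words shown to the user."""
    obj = rng.choice(sorted(db))
    images = rng.sample(db[obj], n_images)
    choices = sorted(db)  # here the word list is simply every label in the database
    return obj, images, choices


def grade(obj, picked_word):
    """The user passes by picking the word that names the pictured object."""
    return picked_word == obj
```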

Page 13: 563.10.3 captcha

Graphic Based CAPTCHAs

Example answers: Dog, Pool

Page 14: 563.10.3 captcha

Audio Based CAPTCHAs

• Pick a word or a sequence of numbers at random

• Render them into an audio clip using text-to-speech (TTS) software

• Distort the audio clip

• Ask the user to identify and type the word or numbers
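The pick and distort steps might look like the sketch below. The TTS rendering is out of scope here, so the distortion function simply takes a list of audio samples in the range [-1.0, 1.0]. Uniform additive noise, the noise level, and the function names are assumptions, not the method of any particular audio CAPTCHA.

```python
import random


def pick_digits(n=5, rng=random):
    """Pick a random sequence of digits to be spoken in the clip."""
    return "".join(rng.choice("0123456789") for _ in range(n))


def add_noise(samples, level=0.1, rng=random):
    """Add uniform noise to samples in [-1.0, 1.0], clipping the result."""
    noisy = []
    for sample in samples:
        value = sample + rng.uniform(-level, level)
        noisy.append(max(-1.0, min(1.0, value)))
    return noisy
```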

Page 15: 563.10.3 captcha

Breaking CAPTCHAs

• Most text based CAPTCHAs have been broken by software

– OCR

– Segmentation

• Other CAPTCHAs were broken by streaming the tests to unsuspecting users, who solve them on the attacker's behalf

Page 16: 563.10.3 captcha

Proposed Approach

• Very similar to PIX

• Pick a concrete object

• Get 6 images at random from images.google.com that match the object

• Distort the images

• Build a list of 100 words: 90 from a full dictionary, 10 from the objects dictionary

• Prompt the user to pick the object from the list of words

Page 17: 563.10.3 captcha

Proposed Approach - Technical

• Make an HTTP call to images.google.com and search for the object

• Screen scrape the result of 2-3 pages to get the list of images

• Pick 6 images at random

• Randomly distort both the images and their URLs before displaying them

• Expire the CAPTCHA in 30-45 seconds
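The random pick and the expiration rule can be sketched as follows. The HTTP call and screen scraping are left out, so `scraped_urls` stands in for whatever list that step produces. The 30-second lifetime is the lower end of the range on the slide, and the class and function names are hypothetical.

```python
import random
import time

CAPTCHA_TTL_SECONDS = 30  # assumed: lower end of the 30-45 second range


def pick_images(scraped_urls, n=6, rng=random):
    """Pick n distinct image URLs at random from the scraped result pages."""
    return rng.sample(scraped_urls, n)


class Challenge:
    def __init__(self, answer, image_urls, now=None):
        self.answer = answer
        self.image_urls = image_urls
        # The creation time is injectable so the expiry logic can be tested.
        self.created = time.time() if now is None else now

    def is_expired(self, now=None):
        now = time.time() if now is None else now
        return now - self.created > CAPTCHA_TTL_SECONDS

    def grade(self, picked_word, now=None):
        """A correct word only counts if it arrives before the challenge expires."""
        return not self.is_expired(now) and picked_word == self.answer
```

Rejecting late answers on the server side is what limits streaming attacks: a relayed challenge has to be solved and returned within the lifetime.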

Page 18: 563.10.3 captcha

Proposed Approach - Benefits

• The database already exists and is public

• The database is constantly being updated and maintained

• Adding “concrete objects” to the dictionary is virtually instantaneous

• Distortion prevents caching hacks

• Quick expiration limits streaming hacks

Page 19: 563.10.3 captcha

Proposed Approach - Drawbacks

• Not accessible to people with disabilities (as is the case with most CAPTCHAs)

• Relies on Google’s infrastructure

• Unlike CAPTCHAs using random letters and numbers, the number of challenge words is limited
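One way to put rough numbers on the last drawback is to compare the chance of a blind guess. The 100-word list comes from the proposed approach; the 6-character challenge over 36 letters and digits is an assumed example of a text CAPTCHA, not a figure from the slides.

```python
# Blind guess against the proposed approach: one correct word in a list of 100.
word_list_size = 100
p_word_guess = 1 / word_list_size

# Blind guess against random letters and numbers (assumed: 6 characters, 36 symbols).
alphabet_size = 36
length = 6
text_space = alphabet_size ** length
p_text_guess = 1 / text_space

print(p_word_guess)  # 0.01
print(text_space)    # 2176782336
```

Under these assumptions a bot that guesses at random passes the word-list challenge about once in 100 attempts, which is why the short expiration and per-client rate limits carry more of the load here than in a random-string CAPTCHA.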