Collective Intelligence CAPTCHAs Eran Hershko

Post on 26-Jun-2018


Page 1: Collective Intelligence CAPTCHAs - Aboutlihi.eew.technion.ac.il/files/Teaching/2012_winter_048921/PPT/Eran.pdf1) Introduction to CAPTCHA. 2) reCAPTCHA (and Collective Intelligence)

Collective Intelligence CAPTCHAs

Eran Hershko

Outline

1) Introduction to CAPTCHA.
2) reCAPTCHA (and Collective Intelligence).
3) How To Break Two CAPTCHAs: EZ-GIMPY & GIMPY.
4) Summary & Future Work.


CAPTCHA

Who Uses CAPTCHAs & Why?

* E-mail websites: to stop spam.
* Blogs & forums: to stop automatic posting.
* Websites that sell tickets: to prevent scalpers from buying a lot of tickets.
* …

A CAPTCHA is a test that can be automatically generated, which most humans can pass, but most computers can’t.

CAPTCHA

Completely Automated Public Turing test to tell Computers and Humans Apart

[Figure: EZ-GIMPY code]

CAPTCHA

The CAPTCHA Paradox:
A CAPTCHA is a program that can generate and grade tests that it itself can’t pass!

The CAPTCHA achieves two opposite goals:
1) If the CAPTCHA is not broken, there is a way to differentiate humans from computers. (Part I)
2) If the CAPTCHA is broken, a useful computer vision problem is solved. (Part II)

The Evolution Of CAPTCHA

* EZ-GIMPY
* GIMPY
* reCAPTCHA
* ESP-PIX
* SQUIGL-PIX


reCAPTCHA

reCAPTCHA uses Collective Intelligence in order to contribute to humanity!

Collective Intelligence is a shared or group intelligence that emerges from the collaboration of many individuals.


Who uses reCAPTCHA?

* reCAPTCHA is used by more than 40,000 websites!

* Google purchased reCAPTCHA in 2009.

How Does It Work?

Scanned text (Romeo & Juliet):

Come, come with me, and we will make short work;
For, by your leaves, you shall not stay alone
Till holy Church incorporate two in one.

Optical Character Recognition (OCR) I reads: “Till holy Ohurch incorporate two in one.”
Optical Character Recognition (OCR) II reads: “Till holy Chulch incorporate two in one.”

The two OCR outputs are compared with each other and with a Dictionary. A word that the two programs read differently (for example “Church” vs. “Chulch”), or that is not in the dictionary (for example “Ohurch”), is marked as a “suspicious” word. It is shown to the user together with a control word.


How Does It Work?

fiery Church

fiery Bhurch

How Does It Work?

Scoring: each human answer = 1 point; each OCR guess = 1/2 point.

A Suspicious word is Correct if one of its readings scores > 2.5 points.

Example answers: fiery Church, Church chief, Church overlooks, Inquiry Church
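The scoring rule on this slide can be sketched as a small vote tally. This is a minimal sketch, not the reCAPTCHA implementation: the function name `resolve_word` and its input format are assumptions made for illustration. Only the weights (1 point, 1/2 point) and the 2.5-point threshold come from the slide.

```python
from collections import defaultdict

HUMAN_POINTS = 1.0   # from the slide: one human answer = 1 point
OCR_POINTS = 0.5     # from the slide: one OCR guess = 1/2 point
THRESHOLD = 2.5      # from the slide: correct if the score is > 2.5 points


def resolve_word(ocr_guesses, human_answers):
    """Return the accepted reading of a suspicious word, or None if undecided.

    Hypothetical helper: both arguments are lists of strings.
    """
    scores = defaultdict(float)
    for guess in ocr_guesses:
        scores[guess] += OCR_POINTS
    for answer in human_answers:
        scores[answer] += HUMAN_POINTS
    # Pick the reading with the most points, if any reading exists.
    best = max(scores, key=scores.get, default=None)
    if best is not None and scores[best] > THRESHOLD:
        return best
    return None


if __name__ == "__main__":
    print(resolve_word(["Ohurch", "Chulch"], ["Church", "Church", "Church"]))
    print(resolve_word(["Ohurch", "Chulch"], ["Church", "Bhurch"]))
```

With two wrong OCR guesses, three agreeing human answers are needed before "Church" passes the threshold. If one OCR program had already read "Church", it would contribute half a point toward the same total.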

How Does It Work?

Answers: fiery Church, Church chief, Church overlooks

[Figure: the answers are shown next to the two OCR outputs, “Till holy Ohurch incorporate two in one.” and “Till holy Chulch incorporate two in one.”]

The Suspicious word becomes a Control word!

How Does It Work?

Reject, Reject, Reject, Reject, Reject, Reject

The word is Unreadable!

Some Statistics

How many humans are required for a word to be considered correct?

2 humans: 67.87%
3 humans: 17.86%
4 humans: 7.10%
5 humans: 3.11%
6…* humans: 4.06%

* Including words which are considered unreadable.

Why is reCAPTCHA more secure than CAPTCHA?

1) reCAPTCHA uses only words that OCR already failed to decipher.

2) CAPTCHAs generate their own artificially distorted characters. A smart learning algorithm can recognize them.

3) reCAPTCHA has two natural distortions and one artificial:
a. The fading of the text over time (natural).
b. The noise introduced by the scanning process (natural).
c. The added distortion (artificial).

Algorithms that recognized more than 90% of CAPTCHAs were completely unable to recognize reCAPTCHAs!

Several Results

[Figure: accuracy on a sample of 50 scanned articles]

reCAPTCHA = 99.1%
OCR = 83.5%


Several Results

* After one year of running:

More than 1.2 billion reCAPTCHAs were solved!

More than 440 million suspicious words were correctly

deciphered!

reCAPTCHA has successfully achieved its goal of efficiently harnessing Collective Intelligence!

Breaking EZ-GIMPY CAPTCHA

EZ-GIMPY: How is it done?

1) Choosing a word from a dictionary of 561 words.
2) Distorting and blurring its characters.
3) Adding a cluttered and confusing background.

The Algorithm

This algorithm treats every letter individually.

[Scale: requires low computational power ... requires high computational power]

The algorithm’s steps:

Steps A & B: Finding individual letters in the image and extracting candidate words.
Step C: Choosing the most likely word.

Step A

Producing a training set:

1) Extracting a letter from an EZ-GIMPY image.
2) Running Canny edge detection.
3) Sampling 100 points from the letter’s interior and exterior edges.
4) Extracting the 2600 (26*100) Shape Contexts.

[Figure: panels 1) to 4) illustrate the four steps.]
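A shape context describes one sampled edge point by a log-polar histogram of where the other sampled points lie relative to it. The following is a minimal sketch of that idea, not the code used in the cited work. The bin counts (5 radial, 12 angular), the radial range (1/8 to 2 times the mean distance) and the function name `shape_context` are assumptions chosen for illustration.

```python
import math


def shape_context(points, index, n_radial=5, n_angular=12):
    """Log-polar histogram of the other points as seen from points[index].

    Assumptions (not from the slides): 5 x 12 bins, radii normalised by the
    mean pairwise distance, log-spaced radial bins from 1/8 to 2.
    Needs at least two distinct points.
    """
    px, py = points[index]
    others = [q for i, q in enumerate(points) if i != index]
    # Dividing by the mean pairwise distance makes the descriptor scale invariant.
    dists = [math.dist(a, b) for i, a in enumerate(points) for b in points[i + 1:]]
    mean_dist = sum(dists) / len(dists)
    lo, hi = math.log(0.125), math.log(2.0)
    hist = [0] * (n_radial * n_angular)
    for qx, qy in others:
        r = math.hypot(qx - px, qy - py) / mean_dist
        if r == 0:
            continue  # a coincident point has no direction
        # Radial bin: position of log(r) in [lo, hi], clamped to a valid bin.
        frac = (math.log(r) - lo) / (hi - lo)
        r_bin = min(n_radial - 1, max(0, int(frac * n_radial)))
        # Angular bin: angle in [0, 2*pi) split into equal sectors.
        theta = math.atan2(qy - py, qx - px) % (2 * math.pi)
        a_bin = min(n_angular - 1, int(theta / (2 * math.pi) * n_angular))
        hist[r_bin * n_angular + a_bin] += 1
    return hist


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(shape_context(square, 0))
```

Computing one such histogram for each of the 100 sampled points of each of the 26 letters gives the 2600 shape contexts mentioned in step 4.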

Step A

Finding letters in the image:

1) Randomly choosing several sample points from the image.
2) Generating a shape context for each point.
3) Finding the letters from the training set with the closest shape contexts.

[Figure: panels 1) and 3) illustrate the sampling and matching steps.]
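Finding the closest shape contexts is a nearest-neighbour search over histograms. The slides do not name the distance measure; the sketch below assumes the chi-squared distance, which is commonly used to compare shape context histograms. The names `chi2_distance` and `closest_letters` and the `(letter, histogram)` training format are hypothetical.

```python
def chi2_distance(h1, h2):
    """Chi-squared distance between two equal-length histograms."""
    total = 0.0
    for a, b in zip(h1, h2):
        if a + b > 0:  # skip bins that are empty in both histograms
            total += (a - b) ** 2 / (a + b)
    return 0.5 * total


def closest_letters(query, training_set, k=3):
    """Return the k (distance, letter) pairs closest to `query`.

    `training_set` is a hypothetical list of (letter, histogram) pairs.
    """
    scored = [(chi2_distance(query, hist), letter) for letter, hist in training_set]
    return sorted(scored)[:k]


if __name__ == "__main__":
    training = [("a", [4, 0, 0]), ("b", [0, 4, 0]), ("c", [2, 2, 0])]
    for distance, letter in closest_letters([3, 1, 0], training, k=2):
        print(letter, round(distance, 3))
```

A linear scan is enough for a training set of a few thousand histograms; a larger set would call for an index structure.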

Step B

Finding sequences of letters that form candidate words:

For every letter, trying to construct a possible word.

There are several constraints: letters must run from left to right, must be neither too far from each other nor too close, and the candidate words must be from the dictionary.

Example candidate words: profit, roll
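The constraints of Step B can be sketched as a search over detected letters. This is an illustrative sketch only: the `(letter, x_position)` detection format, the function name `find_candidate_words` and the pixel gap limits are assumptions, not values from the slides.

```python
def find_candidate_words(detections, dictionary, min_gap=5, max_gap=30):
    """Return dictionary words that can be spelled left to right from detections.

    `detections` is a hypothetical list of (letter, x_position) pairs; the gap
    limits (in pixels) are illustrative values.
    """
    detections = sorted(detections, key=lambda d: d[1])

    def can_spell(word, pos, last_x):
        if pos == len(word):
            return True
        for letter, x in detections:
            if letter != word[pos]:
                continue
            # The first letter may sit anywhere; later letters must lie to the
            # right of the previous one, neither too close nor too far.
            if last_x is None or min_gap <= x - last_x <= max_gap:
                if can_spell(word, pos + 1, x):
                    return True
        return False

    return sorted(w for w in dictionary if can_spell(w, 0, None))


if __name__ == "__main__":
    found = [("r", 10), ("o", 25), ("l", 40), ("l", 52), ("p", 8), ("x", 30)]
    print(find_candidate_words(found, {"roll", "profit", "lol", "or"}))
```

Checking each dictionary word against the detections keeps the search small, because the dictionary constraint prunes almost every letter sequence.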

Step C

Choosing the most likely word:

1) For each letter, building generalized shape contexts (which assume many possible deformations in the letters).
2) Giving a score to each letter according to the distance.
3) The answer to EZ-GIMPY is the word with the highest score.
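The final choice in Step C reduces to picking the best-scoring candidate. The sketch below assumes the per-letter distances are already available and turns them into a word score by negating their mean. Both the input format and the averaging are assumptions for illustration; the slides only say that letters are scored by distance and the highest-scoring word wins.

```python
def best_word(candidates):
    """Pick the candidate word whose letters match best.

    `candidates` is a hypothetical dict mapping each word to the list of
    shape context distances of its letters. A lower distance is a better
    match, so the word score is the negated mean distance. Averaging is an
    assumption that keeps long words from being penalised for their length.
    """
    def score(word):
        distances = candidates[word]
        return -sum(distances) / len(distances)

    return max(candidates, key=score)


if __name__ == "__main__":
    print(best_word({
        "profit": [0.9, 0.8, 0.7, 0.9, 0.8, 0.9],
        "roll": [0.2, 0.3, 0.1, 0.2],
    }))
```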

Results

* This algorithm succeeds 83% of the time.

Examples: collar, canvas, jewel, smile, spade, soap, line, here, till

Breaking GIMPY CAPTCHA

GIMPY: How is it done?

1) Choosing words from a dictionary of 411 words.
2) Distorting and blurring the characters.
3) Placing the words at random locations in the image in 5 pairs (one word on top of the other).
4) Adding a cluttered and confusing background.

* The user must recognize 3 words correctly.

The Algorithm

This algorithm treats every word as a whole, not as individual letters.

[Scale: requires low computational power ... requires high computational power]

The algorithm’s steps:

Steps A & B: Finding candidate words in the image.
Step C: Choosing the most likely words.

Step A

Finding candidate words in the image:

1) Finding the suspicious places which contain pairs of words.
2) For every pair, conducting edge detection and finding the first two letters and the last two letters by using shape contexts.
3) Producing a list of the possible candidate words from the dictionary.

The result is a list of approximately 4 candidate words.

[Figure: panels 1) and 2) illustrate the first two steps.]
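Producing the candidate list amounts to filtering the dictionary by the detected opening and closing letter pairs. A minimal sketch, assuming the shape context matching has already produced sets of plausible two-letter beginnings and endings; the function name `candidate_words` and its arguments are hypothetical.

```python
def candidate_words(dictionary, first_bigrams, last_bigrams):
    """Keep dictionary words whose first two and last two letters were detected.

    `first_bigrams` and `last_bigrams` are hypothetical sets of two-letter
    strings produced by the shape context matching step.
    """
    return sorted(
        w for w in dictionary
        if len(w) >= 2 and w[:2] in first_bigrams and w[-2:] in last_bigrams
    )


if __name__ == "__main__":
    words = ["round", "sound", "road", "rolled", "cow", "row"]
    print(candidate_words(words, {"ro", "so"}, {"nd", "ad"}))
```

Matching only the two ends of a word avoids having to read the middle letters, which are the ones most often hidden by the overlapping word.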

Step B

Removing layers of words:

1) Removing the edges of the candidate word from the image and repeating step A (trying to find candidate words).
2) Each pair of words in the image has approximately 16 pairs of candidate words (about 4 candidates for each of the two words).

Example: round

Step C

Giving final score:

1) For each pair, producing a synthetic image of the two words overlaid at their estimated locations.
2) Computing the shape contexts of the synthetic image.
3) Every suspicious word in a pair of the original image gets a score according to the distance of its shape contexts from the shape contexts of the synthetic word.
4) The three words with the highest scores are chosen as the answer to the GIMPY CAPTCHA.

Example candidate pairs: round / cow, row / round
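Steps B and C together can be sketched as enumerating every candidate pair and keeping the three best-matching words. This is a simplified sketch under stated assumptions: the `distance` callback is a stand-in for the shape context comparison against the synthetic overlay, each word simply inherits the distance of its best pair, and the name `top_three_words` and the input layout are hypothetical.

```python
from itertools import product


def top_three_words(locations, distance):
    """Choose three words for a GIMPY image.

    `locations` is a hypothetical list of (top_candidates, bottom_candidates)
    tuples, one per word pair found in the image. `distance(top, bottom, i)`
    is a caller-supplied stand-in for the shape context distance between the
    synthetic overlay of the two words and region i of the original image.
    """
    scored = {}
    for i, (tops, bottoms) in enumerate(locations):
        # With about 4 candidates per layer this tries about 4 * 4 = 16 pairs.
        for top, bottom in product(tops, bottoms):
            d = distance(top, bottom, i)
            for word in (top, bottom):
                # Keep the smallest distance seen for each word.
                scored[word] = min(d, scored.get(word, float("inf")))
    # A smaller distance means a better match, so sort in ascending order.
    return sorted(scored, key=scored.get)[:3]


if __name__ == "__main__":
    table = {
        ("round", "cow"): 0.1, ("round", "row"): 0.4,
        ("sound", "cow"): 0.5, ("sound", "row"): 0.6,
        ("farm", "door"): 0.2,
    }
    regions = [(["round", "sound"], ["cow", "row"]), (["farm"], ["door"])]
    print(top_three_words(regions, lambda t, b, i: table[(t, b)]))
```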

Results

* This algorithm has a success rate of 33% in guessing the correct three words of GIMPY.

Examples:
true, with, sponge
narrow, bulb, right
carriage, potato, clock
door, farm, important
church, tongue, bad
sudden, oven, apple

* Applying this algorithm to EZ-GIMPY results in a success rate of 92% (the previous algorithm gave only 83%).

Summary

1) EZ-GIMPY is successfully broken (92% success). There is still work to be done on GIMPY as a Computer Vision challenge.

2) The reason for reCAPTCHA’s success: Solving a reCAPTCHA is an action that people have to do anyway. They feel better when it’s for an important cause.

3) The new CAPTCHAs will set new challenges in the Computer Vision field.

Future Work

The constant battle between “Good” and “Evil”

The “Evil” Side: Breaking reCAPTCHA and the new image-based CAPTCHAs with a reasonable rate of success.

The “Good” Side: Finding new forms of image-based problems that humans can easily solve but computers and computer vision algorithms can’t.


Questions?

References

1) ‘reCAPTCHA: Human-Based Character Recognition via Web Security Measures’, Luis von Ahn et al.
2) ‘Recognizing Objects in Adversarial Clutter: Breaking a Visual CAPTCHA’, G. Mori et al.
3) ‘Shape Matching and Object Recognition Using Shape Contexts’, Serge Belongie et al.
4) ‘Telling Humans and Computers Apart Automatically’, Luis von Ahn et al.
5) ‘Breaking reCAPTCHA: A Holistic Approach via Shape Recognition’, Paul Baecher et al.
6) http://www.google.com/recaptcha


Related Work From 2011

New CAPTCHA Which Uses Empathy