Semantically Equivalent Adversarial Rules for Debugging NLP Models

Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh (UC Irvine)


Post on 14-Oct-2020



NLP / ML models are getting smarter: VQA

What type of road sign is shown?

> STOP.

Visual7W [Zhu et al 2016]

NLP / ML models are getting smarter: MC (SQuAD)

The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)

How long is the Rhine?

1230km

Question: are they prone to oversensitivity?

BiDAF [Seo et al 2017]

Oversensitivity in images

Adversaries are indistinguishable to humans…

But unlikely in the real world (except for attacks)

Original image: “panda”, 57.7% confidence

Perturbed image: “gibbon”, 99.3% confidence

Adversarial examples

Find closest example with different prediction

What about text?

What type of road sign is shown?

> STOP.

What type of road sign is shown?

Perceptible by humans, unlikely in real world

What type of road sign is sho wn?

What about text?

What type of road sign is shown?

> STOP.

What type of road sign is shown?

A single word changes too much!

Semantics matter

What type of road sign is shown?

> Do not Enter.

> STOP.

What type of road sign is shown?

Bug, and likely in the real world

Semantics matter

The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)

How long is the Rhine?

> More than 1,050,000

> 1230km

How long is the Rhine?

Not all changes are the same: semantics matter

Adversarial Rules

Find rule that generates many adversaries

Generalizing adversaries

What type of road sign is shown?

> Do not Enter.

> STOP.

What type of road sign is shown?

Rule: What NOUN → Which NOUN (flips 3.9% of examples)

Semantics matter

What color is the sky?

> Gray.

> Blue.

What color is the sky?

Rule: What NOUN → Which NOUN (flips 3.9% of examples)

Semantics matter

The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)

How long is the Rhine?

> More than 1,050,000

> 1230km

How long is the Rhine?

Rule: ? → ?? (flips 3% of examples)

Semantics matter

Detailed investigation of chum salmon, Oncorhynchus keta, showed that these fish digest ctenophores 20 times as fast as an equal weight of shrimps.

What is the oncorhynchus also called?

> Oncorhynchus keta

What is the oncorhynchus also called?

> chum salmon

Rule: ? → ?? (flips 3% of examples)

Adversarial Rules

Rules are global and actionable, more interesting than individual adversaries


Semantically Equivalent Adversary (SEA)


Ingredients

Semantic score function

A black box model

Semantically Equivalent

Different prediction
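The two conditions above can be written as a single predicate. This is a minimal sketch rather than the authors' implementation: the function name `is_sea`, the threshold value of 0.8, and the toy lookup tables are all assumptions made for illustration, and the paraphrased question is a hypothetical input.

```python
from typing import Callable


def is_sea(
    x: str,
    x_prime: str,
    predict: Callable[[str], str],
    sem_score: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed cut-off, not taken from the slides
) -> bool:
    """Return True if x_prime is a semantically equivalent adversary for x."""
    # Condition 1: the semantic score function says the two texts are equivalent.
    semantically_equivalent = sem_score(x, x_prime) >= threshold
    # Condition 2: the black box model changes its prediction.
    different_prediction = predict(x) != predict(x_prime)
    return semantically_equivalent and different_prediction


# Toy stand-ins (hypothetical): a lookup-table "model" and a fixed score table.
ORIGINAL = "What type of road sign is shown?"
PARAPHRASE = "Which type of road sign is shown?"
TOY_MODEL = {ORIGINAL: "STOP", PARAPHRASE: "Do not Enter"}
TOY_SCORES = {(ORIGINAL, PARAPHRASE): 0.9}


def toy_predict(question: str) -> str:
    return TOY_MODEL[question]


def toy_score(x: str, x_prime: str) -> float:
    # Identical texts score 1.0; unknown pairs score 0.0.
    return 1.0 if x == x_prime else TOY_SCORES.get((x, x_prime), 0.0)


print(is_sea(ORIGINAL, PARAPHRASE, toy_predict, toy_score))
```

The model is only ever called through `predict`, which is what treating it as a black box amounts to here.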


Revisiting adversaries

Find closest example with different prediction


Semantic Similarity: Paraphrasing [Mallinson et al, 2017]

Sentence X: Good movie!

Translators: en - pt (Portuguese Translation: Bom filme!), en - fr (French Translation: Bon film!)

Back translators: pt - en, fr - en

Paraphrase     Score
Good movie     0.35
Good film      0.34
Great movie    0.1
…              …
Movie good     0.001

Language model comes for free

Finding an adversary

What color is the tray? Pink

What colour is the tray? Green
Which color is the tray? Green
What color is it? Green
What color is tray? Pink
How color is the tray? Green
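The search on this slide can be sketched as a filter over paraphrase candidates. The questions and predictions below are copied from the slide; the function name `find_adversaries` and the dictionary standing in for the model are assumptions for illustration only.

```python
from typing import Callable, List


def find_adversaries(
    x: str, candidates: List[str], predict: Callable[[str], str]
) -> List[str]:
    """Keep the candidates whose prediction differs from the prediction for x."""
    original = predict(x)
    return [c for c in candidates if predict(c) != original]


# Lookup table standing in for the black box model (predictions from the slide).
SLIDE_PREDICTIONS = {
    "What color is the tray?": "Pink",
    "What colour is the tray?": "Green",
    "Which color is the tray?": "Green",
    "What color is it?": "Green",
    "What color is tray?": "Pink",
    "How color is the tray?": "Green",
}
QUESTION = "What color is the tray?"
CANDIDATES = [q for q in SLIDE_PREDICTIONS if q != QUESTION]

adversaries = find_adversaries(QUESTION, CANDIDATES, SLIDE_PREDICTIONS.__getitem__)
for question in adversaries:
    print(question)
```

A changed prediction alone is not enough: a candidate such as "How color is the tray?" flips the answer but would still have to pass the semantic score check before it counts as a SEA.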


Semantically Equivalent Adversarial Rules (SEARs)


From SEAs to Rules

Find SEAs → Propose Candidate Rules → Select Small Rule Set

Proposing Candidate Rules

What type of road sign is shown?

Candidate Rules:

Exact Match: (What → Which)
Context: (What type → Which type)
POS Tags: (What NOUN → Which NOUN) (WP type → Which type) (WP NOUN → Which NOUN)

Must not change semantics. Applying (What → Which):

What type of road sign is shown? → Which type of road sign is shown?
What is the person looking at? → Which is the person looking at?
What was I thinking? → Which was I thinking?
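The candidate rules listed on this slide can be generated mechanically from a single word replacement. The sketch below is a simplification under stated assumptions: it only looks at the token immediately to the right as context, the POS tags are written by hand rather than produced by a tagger, and `propose_rules` and `format_rule` are hypothetical names.

```python
from typing import List, Tuple

Rule = Tuple[Tuple[str, ...], Tuple[str, ...]]


def propose_rules(
    tokens: List[str], tags: List[str], i: int, replacement: str
) -> List[Rule]:
    """Generalize the edit tokens[i] -> replacement into candidate rules."""
    # Exact match: the replaced word on its own.
    rules: List[Rule] = [((tokens[i],), (replacement,))]
    if i + 1 < len(tokens):
        # The replaced word and its right neighbour can each appear
        # either literally or as a POS tag.
        for head in (tokens[i], tags[i]):
            for context in (tokens[i + 1], tags[i + 1]):
                rules.append(((head, context), (replacement, context)))
    return rules


def format_rule(rule: Rule) -> str:
    lhs, rhs = rule
    return "(" + " ".join(lhs) + " → " + " ".join(rhs) + ")"


# Hand-written tokens and tags for the slide's example (only the first two tags matter).
TOKENS = ["What", "type", "of", "road", "sign", "is", "shown", "?"]
TAGS = ["WP", "NOUN", "ADP", "NOUN", "NOUN", "VERB", "VERB", "PUNCT"]

candidate_rules = [format_rule(r) for r in propose_rules(TOKENS, TAGS, 0, "Which")]
for rule in candidate_rules:
    print(rule)
```

Each candidate would then be applied to other instances and kept only if the rewritten text stays semantically equivalent, which is the "must not change semantics" filter.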


From SEAs to Rules

Find SEAs → Propose Candidate Rules → Select Small Rule Set

Semantically Equivalent Adversarial Rules (SEARs)

High Adversary Count: induces many flipped predictions
Non-Redundancy: flips different predictions

What NOUN → Which NOUN
What type → Which type
color → colour

Selected Rules
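One way to trade off the two criteria above is a greedy pass that repeatedly picks the rule flipping the most instances not already flipped by a selected rule. This is a sketch under assumptions, not the selection procedure from the talk: greedy coverage is a stand-in, the function name `select_rules` is hypothetical, and the instance ids in the toy table are made up for illustration.

```python
from typing import Dict, List, Set


def select_rules(flips: Dict[str, Set[int]], budget: int) -> List[str]:
    """Greedily pick up to `budget` rules that flip many, and different, instances."""
    selected: List[str] = []
    covered: Set[int] = set()
    for _ in range(budget):
        remaining = [rule for rule in flips if rule not in selected]
        if not remaining:
            break
        # Marginal gain: instances flipped by this rule and by no selected rule.
        best = max(remaining, key=lambda rule: len(flips[rule] - covered))
        if not flips[best] - covered:
            break  # every remaining rule is redundant
        selected.append(best)
        covered |= flips[best]
    return selected


# Hypothetical data: rule -> ids of validation instances whose prediction it flips.
TOY_FLIPS = {
    "What NOUN → Which NOUN": {1, 2, 3, 4},
    "What type → Which type": {1, 2},
    "color → colour": {5, 6},
}

print(select_rules(TOY_FLIPS, budget=2))
```

In the toy table, "What type → Which type" is never chosen because everything it flips is already flipped by "What NOUN → Which NOUN", which is the redundancy the slide warns about.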


Examples: VQA

Visual7W-Telling [Zhu et al 2016]

Examples: Machine Comprehension

BiDAF [Seo et al 2017]

Examples: Movie Review Sentiment Analysis

FastText [Joulin et al 2016]

Experiments


1. SEAs vs Humans


Set up

Conditions: Humans; Top scored SEA; SEA (top 5) + Human

Evaluate adversaries for semantic equivalence

How often can SEAs be produced?

[Bar charts, one per task (Visual Question Answering, Sentiment Analysis), each comparing Human, SEA, and Human + SEA. Bar labels recovered in extraction order: 45, 36, 33.6 (first chart); 33, 26, 25.3 (second chart).]

SEAs find equivalent adversaries as often as Humans

SEAs + Humans better than Humans

Humans produce different adversaries:


Humans did not produce these:

But they did produce these:

They are so easy to love… What kind of meat is on the boy’s plate?

How many suitcases? Also great directing and photography


2. SEARs vs Experts


Part 1: experts come up with rules

Objective: maximize mistakes with good rules

Part 2: experts evaluate our SEARs

Experts only accept good rules


Results

[Two bar charts, each grouped by Visual QA and Sentiment. First chart: % correct predictions flipped, Experts vs SEARs; bar labels recovered in extraction order: 10.9, 14.2, 3.33 (the last appears to be two fused labels). Second chart: Time (minutes), Finding Rules vs Evaluating SEARs; bar labels recovered in extraction order: 5.4, 10.1, 12.9, 16.9.]

3. Fixing bugs


Closing the loop

(color → colour) (WP VBZ → WP’s)

Filter out bad rules → Augment training data → Retrain model
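The augmentation step can be sketched as applying each accepted rule to every training example it matches and keeping the original label. This is an illustrative sketch only: `apply_rule` and `augment` are hypothetical names, only literal word rules such as (color → colour) are handled, and a POS-tag rule such as (WP VBZ → WP’s) would additionally need a tagger.

```python
import re
from typing import List, Optional, Tuple

Example = Tuple[str, str]  # (text, label)


def apply_rule(text: str, lhs: str, rhs: str) -> Optional[str]:
    """Replace whole-word occurrences of lhs with rhs; None if the rule does not apply."""
    pattern = r"\b" + re.escape(lhs) + r"\b"
    new_text, count = re.subn(pattern, rhs, text)
    return new_text if count else None


def augment(dataset: List[Example], rules: List[Tuple[str, str]]) -> List[Example]:
    """Return the dataset plus one rewritten copy per (example, matching rule) pair."""
    augmented = list(dataset)
    for text, label in dataset:
        for lhs, rhs in rules:
            rewritten = apply_rule(text, lhs, rhs)
            if rewritten is not None:
                # The rule is assumed to preserve meaning, so the label is reused.
                augmented.append((rewritten, label))
    return augmented


TRAIN = [
    ("What color is the tray?", "Pink"),
    ("What type of road sign is shown?", "STOP"),
]
ACCEPTED_RULES = [("color", "colour")]

augmented_train = augment(TRAIN, ACCEPTED_RULES)
for example in augmented_train:
    print(example)
```

Reusing the label is only safe for rules that survived the filtering step, which is why bad rules are removed before augmentation.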


Results

[Bar chart: % of flips due to bugs, Original vs Augmented, grouped by Visual QA and Sentiment. Bar labels recovered in extraction order: 3.4, 1.4, 12.6, 12.6.]

Fix bugs, no loss in accuracy

Conclusion

Semantics matter

SEA, SEARs

Models are prone to these bugs

SEAs and SEARs help find and fix them

Semantically Equivalent Adversarial Rules for Debugging NLP Models

Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh (UC Irvine)


Semantic scoring is still a research problem…

Also: inaccurate for long texts


Problem: not comparable across instances

Paraphrases of good movie:

Paraphrase     Score    Rescaled
Good movie     0.35     1
Good film      0.34     0.97
Great movie    0.1      0.29
…              …        …

Paraphrases of good:

Paraphrase     Score    Rescaled
good           0.7      1
great          0.2      0.29
excellent      0.05     0.07
…              …        …
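The rescaled values on this slide are consistent with dividing each paraphrase score by the score the original sentence receives as a paraphrase of itself. The sketch below reproduces that arithmetic with the slide's numbers; the function name `rescale` and the cap at 1 are assumptions, since the slide only shows the resulting values.

```python
from typing import Dict


def rescale(scores: Dict[str, float], original: str) -> Dict[str, float]:
    """Divide every paraphrase score by the score of the original sentence."""
    base = scores[original]
    # Cap at 1 (assumed) so no paraphrase scores higher than the original.
    return {text: min(1.0, score / base) for text, score in scores.items()}


# Raw scores as shown on the slide.
MOVIE_SCORES = {"Good movie": 0.35, "Good film": 0.34, "Great movie": 0.1}
GOOD_SCORES = {"good": 0.7, "great": 0.2, "excellent": 0.05}

movie_rescaled = rescale(MOVIE_SCORES, "Good movie")
good_rescaled = rescale(GOOD_SCORES, "good")

for table in (movie_rescaled, good_rescaled):
    for text, score in table.items():
        print(f"{text}: {score:.2f}")
```

After rescaling, "Great movie" and "great" both land near 0.29 even though their raw scores (0.1 and 0.2) differ, which is what makes a single threshold usable across instances.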


Examples: VQA


Examples: Movie Review Sentiment Analysis

FastText [Joulin et al 2016]
