TRANSCRIPT
Semantically Equivalent Adversarial Rules for Debugging NLP Models
Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh (UC Irvine)
NLP / ML models are getting smarter: VQA
What type of road sign is shown?
> STOP.
Visual7W [Zhu et al 2016]
NLP / ML models are getting smarter: MC (SQuAD)
The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)
How long is the Rhine?
1230km
Question: are they prone to oversensitivity?
BiDAF [Seo et al 2017]
Oversensitivity in images
Adversaries are imperceptible to humans… but unlikely in the real world (except for attacks)
“panda” 57.7% confidence → “gibbon” 99.3% confidence
Adversarial examples
Find closest example with different prediction
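Stated as an optimization (a standard formulation consistent with the slide; the notation is mine, not the talk's): for an input x, a model f, and a distance d,

    \[ x_{\mathrm{adv}} \;=\; \operatorname*{arg\,min}_{x'} \, d(x, x') \quad \text{subject to} \quad f(x') \neq f(x) \]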
What about text?
What type of road sign is shown?
> STOP.
What type of road sign is sho wn?
Perceptible by humans, unlikely in the real world
What about text?
What type of road sign is shown?
> STOP.
What type of road sign is shown?
A single word changes too much!
Semantics matter
What type of road sign is shown?
> STOP.
Which type of road sign is shown?
> Do not Enter.
Bug, and likely in the real world
Semantics matter
The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)
How long is the Rhine?
> 1230km
How long is the Rhine? (paraphrased)
> More than 1,050,000
Not all changes are the same: semantics matter
Adversarial Rules
Find rule that generates many adversaries
Generalizing adversaries
What type of road sign is shown?
> STOP.
Which type of road sign is shown?
> Do not Enter.
Rule: What NOUN → Which NOUN
- flips 3.9% of examples
Semantics matter
What color is the sky?
> Blue.
Which color is the sky?
> Gray.
Rule: What NOUN → Which NOUN
- flips 3.9% of examples
Semantics matter
The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)
How long is the Rhine?
> 1230km
How long is the Rhine? (paraphrased)
> More than 1,050,000
Rule: ? ??
- flips 3% of examples
Semantics matter
Detailed investigation of chum salmon, Oncorhynchus keta, showed that these fish digest ctenophores 20 times as fast as an equal weight of shrimps.
What is the oncorhynchus also called?
> chum salmon
What is the oncorhynchus also called? (paraphrased)
> Oncorhynchus keta
Rule: ? ??
- flips 3% of examples
Adversarial Rules
Rules are global and actionable, and thus more interesting than individual adversaries
Semantically Equivalent Adversary (SEA)
Ingredients
- A semantic score function, to decide whether two inputs are semantically equivalent
- A black box model, to check whether they get different predictions
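A minimal sketch of how these ingredients combine into the SEA check (the names semantic_score and model and the threshold tau are illustrative, not the talk's code):

    def is_sea(x, x_prime, model, semantic_score, tau=0.8):
        """Return True if x_prime is a semantically equivalent adversary of x.

        model:          black box predictor; only its outputs are needed
        semantic_score: scores how close in meaning x_prime is to x (higher = closer)
        tau:            minimum score to count as semantically equivalent (illustrative value)
        """
        equivalent = semantic_score(x, x_prime) >= tau
        flipped = model(x_prime) != model(x)
        return equivalent and flipped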
Revisiting adversaries
Find closest example with different prediction
Semantic Similarity: Paraphrasing [Mallinson et al, 2017]
Sentence X (“Good movie!”) goes through translators (en-pt: “Bom filme!”, en-fr: “Bon film!”) and back translators (pt-en, fr-en), which score candidate paraphrases.
Scores for “Good movie”: Good movie 0.35, Good film 0.34, Great movie 0.1, …, Movie good 0.001
Language model comes for free
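A minimal sketch of the back-translation idea, assuming the Helsinki-NLP Marian en-fr / fr-en models from Hugging Face transformers; the talk's system uses the multi-pivot neural paraphrasing model of Mallinson et al. (2017), so the model names, beam sizes, and scoring helper below are illustrative:

    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Translator (en -> fr) and back-translator (fr -> en); illustrative model choices.
    en_fr_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    en_fr = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    fr_en_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
    fr_en = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-fr-en")

    def translate(text):
        """Translate an English sentence into the French pivot."""
        batch = en_fr_tok([text], return_tensors="pt")
        out = en_fr.generate(**batch, num_beams=4, max_length=60)
        return en_fr_tok.decode(out[0], skip_special_tokens=True)

    def paraphrase_candidates(text, n=10):
        """Back-translate the pivot with several beams to get candidate paraphrases."""
        pivot = translate(text)
        batch = fr_en_tok([pivot], return_tensors="pt")
        outs = fr_en.generate(**batch, num_beams=n, num_return_sequences=n, max_length=60)
        return [fr_en_tok.decode(o, skip_special_tokens=True) for o in outs]

    def backtranslation_logprob(text, candidate):
        """log P(candidate | pivot) under the fr->en model: the 'score' in the diagram above."""
        pivot = translate(text)
        inputs = fr_en_tok([pivot], return_tensors="pt")
        labels = fr_en_tok(text_target=[candidate], return_tensors="pt").input_ids
        with torch.no_grad():
            loss = fr_en(**inputs, labels=labels).loss  # mean per-token negative log-likelihood
        return -loss.item() * labels.shape[1]

    print(paraphrase_candidates("Good movie!"))
    print(backtranslation_logprob("Good movie!", "Good film!"))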
Finding an adversary
What color is the tray? Pink
- What colour is the tray? Green
- Which color is the tray? Green
- What color is it? Green
- What color is tray? Pink
- How color is the tray? Green
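Putting the pieces together, a sketch of the search on one example (paraphrase_candidates, semantic_score, and model are the illustrative helpers from the sketches above, not the talk's code):

    def find_seas(x, model, paraphrase_candidates, semantic_score, tau=0.8):
        """Return semantically equivalent adversaries of x, best-scored first."""
        original_pred = model(x)
        seas = []
        for x_prime in paraphrase_candidates(x):
            score = semantic_score(x, x_prime)
            if score >= tau and model(x_prime) != original_pred:
                seas.append((score, x_prime))
        return [x_prime for score, x_prime in sorted(seas, reverse=True)]

    # e.g. find_seas("What color is the tray?", vqa_model, paraphrase_candidates, semantic_score)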
Semantically Equivalent Adversarial Rules (SEARs)
From SEAs to Rules
Find SEAs → Propose candidate rules → Select a small rule set
Proposing Candidate Rules
SEA: What type of road sign is shown? → Which type of road sign is shown?
Candidate rules combine exact match, context, and POS tags:
(What → Which) (What type → Which type) (What NOUN → Which NOUN) (WP type → Which type) (WP NOUN → Which NOUN) …
Applying (What → Which) to other questions: Which type of road sign is shown? Which is the person looking at? Which was I thinking?
Rules must not change semantics
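A sketch of this proposal step, using spaCy part-of-speech tags (spaCy, the tuple rule format, and the single-substitution assumption are mine; the talk generalizes the changed tokens with their raw text, immediate context, and POS tags):

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def propose_rules(x, x_prime):
        """Generalize the changed token of a SEA pair into candidate rewrite rules."""
        orig, para = nlp(x), nlp(x_prime)
        rules = set()
        for i, (a, b) in enumerate(zip(orig, para)):
            if a.text == b.text:
                continue
            rules.add((a.text, b.text))    # exact match: (What -> Which)
            rules.add((a.tag_, b.text))    # fine POS tag of the changed token as antecedent
            if i + 1 < len(orig):          # also use the immediate right context
                nxt = orig[i + 1]
                rules.add((f"{a.text} {nxt.text}", f"{b.text} {nxt.text}"))  # (What type -> Which type)
                rules.add((f"{a.text} {nxt.pos_}", f"{b.text} {nxt.pos_}"))  # (What NOUN -> Which NOUN)
                rules.add((f"{a.tag_} {nxt.text}", f"{b.text} {nxt.text}"))  # POS tag plus raw context
            break  # this sketch only handles a single-token substitution
        return rules

    print(propose_rules("What type of road sign is shown?",
                        "Which type of road sign is shown?"))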
From SEAs to Rules
Find SEAs → Propose candidate rules → Select a small rule set
Semantically Equivalent Adversarial Rules (SEARs)
Selection criteria: high adversary count (induces many flipped predictions) and non-redundancy (flips different predictions)
Selected rules: What NOUN → Which NOUN, What type → Which type, color → colour
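A sketch of the selection step as greedy coverage over the two criteria above (the data layout and the budget k are illustrative; the talk additionally relies on humans to filter bad rules):

    def select_rules(flips_by_rule, k=10):
        """Greedy selection: pick rules that flip many predictions not already covered.

        flips_by_rule: dict mapping rule -> set of instance ids whose prediction it flips
        k:             maximum number of rules to return (illustrative budget)
        """
        selected, covered = [], set()
        for _ in range(k):
            best_rule, best_gain = None, 0
            for rule, flipped in flips_by_rule.items():
                gain = len(flipped - covered)   # new, non-redundant flips
                if gain > best_gain:
                    best_rule, best_gain = rule, gain
            if best_rule is None:               # nothing new to cover
                break
            selected.append(best_rule)
            covered |= flips_by_rule[best_rule]
        return selected

    # e.g. select_rules({"What NOUN -> Which NOUN": {1, 2, 3}, "color -> colour": {3, 4}}, k=2)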
Examples: VQA
Visual7W-Telling [Zhu et al 2016]
Examples: Machine Comprehension
BiDAF [Seo et al 2017]
Examples: Movie Review Sentiment Analysis
FastText [Joulin et al 2016]
Experiments
1. SEAs vs Humans
Setup: Humans / Top-scored SEA / SEA (top 5) + Human
Evaluate adversaries for semantic equivalence
How often can SEAs be produced?
[Bar charts: % of instances where a semantically equivalent adversary is found, for Human, SEA, and Human + SEA, on Visual Question Answering and on Sentiment Analysis]
SEAs find equivalent adversaries as often as Humans; SEAs + Humans do better than Humans alone
Humans produce different adversaries:
Humans did not produce these:
But they did produce these:
They are so easy to love…
What kind of meat is on the boy’s plate?
How many suitcases?
Also great directing and photography
2. SEARs vs Experts
Part 1: experts come up with rules
Objective: maximize mistakes with good rules
Part 2: experts evaluate our SEARs
Experts only accept good rules
Results
[Bar charts comparing Experts and SEARs on Visual QA and Sentiment: “Finding Rules” (% of correct predictions flipped) and “Evaluating SEARs” (time in minutes)]
3. Fixing bugs
Closing the loop
Accepted rules, e.g. (color → colour), (WP VBZ → WP’s), …
Filter out bad rules → Augment training data → Retrain model
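A sketch of this loop's augmentation step, assuming accepted rules can be applied as simple text rewrites (the string-level apply_rule below is illustrative; rules with POS antecedents such as (WP VBZ → WP’s) would need a tagger to match):

    def apply_rule(rule, text):
        """Apply a rewrite rule like ('color', 'colour') to a training example."""
        antecedent, replacement = rule
        return text.replace(antecedent, replacement) if antecedent in text else None

    def augment(dataset, accepted_rules):
        """Add rewritten copies of training examples, keeping the original labels."""
        augmented = list(dataset)
        for text, label in dataset:
            for rule in accepted_rules:
                rewritten = apply_rule(rule, text)
                if rewritten is not None and rewritten != text:
                    augmented.append((rewritten, label))
        return augmented

    # e.g. retrain the model on augment(train_data, [("color", "colour")])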
Results
% of flips due to bugs, Original vs Augmented model: Visual QA 12.6% → 3.4%, Sentiment 12.6% → 1.4%
Fix bugs, no loss in accuracy
Conclusion
Semantics matter
SEAs / SEARs
Models are prone to these bugs
SEAs and SEARs help find and fix them
Semantically Equivalent Adversarial Rules for Debugging NLP Models
Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh (UC Irvine)
Semantic scoring is still a research problem…
Problem: not comparable across instances
Paraphrase scores for “Good movie”: Good movie 0.35, Good film 0.34, Great movie 0.1, …
Paraphrase scores for “good”: good 0.7, great 0.2, excellent 0.05, …
Dividing each row by the score of the original sentence itself: 1, 0.97, 0.29, … and 1, 0.29, 0.07, …
Also: inaccurate for long texts
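Written out, the normalization behind the second set of numbers divides each paraphrase score by the score the original sentence assigns to itself (notation mine; the cap at 1 is an assumption for candidates that score above the original):

    \[ \mathrm{SemEq}(x, x') \;=\; \min\!\left(1, \; \frac{S(x' \mid x)}{S(x \mid x)}\right) \]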