TRANSCRIPT
Semantically Equivalent Adversarial Rules for Debugging NLP Models
Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh (UC Irvine)
NLP / ML models are getting smarter: VQA
What type of road sign is shown?
> STOP.
Visual7W [Zhu et al 2016]
NLP / ML models are getting smarter: MC (SQuAD)
The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)
How long is the Rhine?
1230km
Question: are they prone to oversensitivity?
BiDAF [Seo et al 2017]
Oversensitivity in images
Adversaries are imperceptible to humans… but unlikely in the real world (except for attacks)
“panda” 57.7% confidence → “gibbon” 99.3% confidence
Adversarial examples
Find closest example with different prediction
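Stated as an optimization (a standard formulation consistent with the slide; the notation is mine, not the talk's): for an input x, a model f, and a distance d,

    \[ x_{\mathrm{adv}} \;=\; \operatorname*{arg\,min}_{x'} \, d(x, x') \quad \text{subject to} \quad f(x') \neq f(x) \]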
What about text?
What type of road sign is shown?
> STOP.
What type of road sign is sho wn?
Perceptible by humans, unlikely in the real world
What about text?
What type of road sign is shown?
> STOP.
What type of road sign is shown?
A single word changes too much!
Semantics matter
What type of road sign is shown?
> STOP.
Which type of road sign is shown?
> Do not Enter.
Bug, and likely in the real world
Semantics matter
The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)
How long is the Rhine?
> 1230km
How long is the Rhine? (paraphrased)
> More than 1,050,000
Not all changes are the same: semantics matter
Adversarial Rules
Find rule that generates many adversaries
Generalizing adversaries
What type of road sign is shown?
> STOP.
Which type of road sign is shown?
> Do not Enter.
Rule: What NOUN → Which NOUN
- flips 3.9% of examples
Semantics matter
What color is the sky?
> Blue.
Which color is the sky?
> Gray.
Rule: What NOUN → Which NOUN
- flips 3.9% of examples
Semantics matter
The biggest city on the river Rhine is Cologne, Germany with a population of more than 1,050,000 people. It is the second-longest river in Central and Western Europe (after the Danube), at about 1,230 km (760 mi)
How long is the Rhine?
> 1230km
How long is the Rhine? (paraphrased)
> More than 1,050,000
Rule: ? ??
- flips 3% of examples
Semantics matter
Detailed investigation of chum salmon, Oncorhynchus keta, showed that these fish digest ctenophores 20 times as fast as an equal weight of shrimps.
What is the oncorhynchus also called?
> chum salmon
What is the oncorhynchus also called? (paraphrased)
> Oncorhynchus keta
Rule: ? ??
- flips 3% of examples
Adversarial Rules
Rules are global and actionable, and thus more interesting than individual adversaries
Semantically Equivalent Adversary (SEA)
Ingredients
- A semantic score function, to decide whether two inputs are semantically equivalent
- A black box model, to check whether they get different predictions
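A minimal sketch of how these ingredients combine into the SEA check (the names semantic_score and model and the threshold tau are illustrative, not the talk's code):

    def is_sea(x, x_prime, model, semantic_score, tau=0.8):
        """Return True if x_prime is a semantically equivalent adversary of x.

        model:          black box predictor; only its outputs are needed
        semantic_score: scores how close in meaning x_prime is to x (higher = closer)
        tau:            minimum score to count as semantically equivalent (illustrative value)
        """
        equivalent = semantic_score(x, x_prime) >= tau
        flipped = model(x_prime) != model(x)
        return equivalent and flipped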
Revisiting adversaries
Find closest example with different prediction
Semantic Similarity: Paraphrasing [Mallinson et al, 2017]
Sentence X (“Good movie!”) goes through translators (en-pt: “Bom filme!”, en-fr: “Bon film!”) and back translators (pt-en, fr-en), which score candidate paraphrases.
Scores for “Good movie”: Good movie 0.35, Good film 0.34, Great movie 0.1, …, Movie good 0.001
Language model comes for free
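A minimal sketch of the back-translation idea, assuming the Helsinki-NLP Marian en-fr / fr-en models from Hugging Face transformers; the talk's system uses the multi-pivot neural paraphrasing model of Mallinson et al. (2017), so the model names, beam sizes, and scoring helper below are illustrative:

    import torch
    from transformers import MarianMTModel, MarianTokenizer

    # Translator (en -> fr) and back-translator (fr -> en); illustrative model choices.
    en_fr_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    en_fr = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-en-fr")
    fr_en_tok = MarianTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en")
    fr_en = MarianMTModel.from_pretrained("Helsinki-NLP/opus-mt-fr-en")

    def translate(text):
        """Translate an English sentence into the French pivot."""
        batch = en_fr_tok([text], return_tensors="pt")
        out = en_fr.generate(**batch, num_beams=4, max_length=60)
        return en_fr_tok.decode(out[0], skip_special_tokens=True)

    def paraphrase_candidates(text, n=10):
        """Back-translate the pivot with several beams to get candidate paraphrases."""
        pivot = translate(text)
        batch = fr_en_tok([pivot], return_tensors="pt")
        outs = fr_en.generate(**batch, num_beams=n, num_return_sequences=n, max_length=60)
        return [fr_en_tok.decode(o, skip_special_tokens=True) for o in outs]

    def backtranslation_logprob(text, candidate):
        """log P(candidate | pivot) under the fr->en model: the 'score' in the diagram above."""
        pivot = translate(text)
        inputs = fr_en_tok([pivot], return_tensors="pt")
        labels = fr_en_tok(text_target=[candidate], return_tensors="pt").input_ids
        with torch.no_grad():
            loss = fr_en(**inputs, labels=labels).loss  # mean per-token negative log-likelihood
        return -loss.item() * labels.shape[1]

    print(paraphrase_candidates("Good movie!"))
    print(backtranslation_logprob("Good movie!", "Good film!"))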
Finding an adversary
What color is the tray? Pink
- What colour is the tray? Green
- Which color is the tray? Green
- What color is it? Green
- What color is tray? Pink
- How color is the tray? Green
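Putting the pieces together, a sketch of the search on one example (paraphrase_candidates, semantic_score, and model are the illustrative helpers from the sketches above, not the talk's code):

    def find_seas(x, model, paraphrase_candidates, semantic_score, tau=0.8):
        """Return semantically equivalent adversaries of x, best-scored first."""
        original_pred = model(x)
        seas = []
        for x_prime in paraphrase_candidates(x):
            score = semantic_score(x, x_prime)
            if score >= tau and model(x_prime) != original_pred:
                seas.append((score, x_prime))
        return [x_prime for score, x_prime in sorted(seas, reverse=True)]

    # e.g. find_seas("What color is the tray?", vqa_model, paraphrase_candidates, semantic_score)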
Semantically Equivalent Adversarial Rules (SEARs)
From SEAs to Rules
Find SEAs → Propose candidate rules → Select a small rule set
Proposing Candidate Rules
SEA: What type of road sign is shown? → Which type of road sign is shown?
Candidate rules combine exact match, context, and POS tags:
(What → Which) (What type → Which type) (What NOUN → Which NOUN) (WP type → Which type) (WP NOUN → Which NOUN) …
Applying (What → Which) to other questions: Which type of road sign is shown? Which is the person looking at? Which was I thinking?
Rules must not change semantics
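A sketch of this proposal step, using spaCy part-of-speech tags (spaCy, the tuple rule format, and the single-substitution assumption are mine; the talk generalizes the changed tokens with their raw text, immediate context, and POS tags):

    import spacy

    nlp = spacy.load("en_core_web_sm")

    def propose_rules(x, x_prime):
        """Generalize the changed token of a SEA pair into candidate rewrite rules."""
        orig, para = nlp(x), nlp(x_prime)
        rules = set()
        for i, (a, b) in enumerate(zip(orig, para)):
            if a.text == b.text:
                continue
            rules.add((a.text, b.text))    # exact match: (What -> Which)
            rules.add((a.tag_, b.text))    # fine POS tag of the changed token as antecedent
            if i + 1 < len(orig):          # also use the immediate right context
                nxt = orig[i + 1]
                rules.add((f"{a.text} {nxt.text}", f"{b.text} {nxt.text}"))  # (What type -> Which type)
                rules.add((f"{a.text} {nxt.pos_}", f"{b.text} {nxt.pos_}"))  # (What NOUN -> Which NOUN)
                rules.add((f"{a.tag_} {nxt.text}", f"{b.text} {nxt.text}"))  # POS tag plus raw context
            break  # this sketch only handles a single-token substitution
        return rules

    print(propose_rules("What type of road sign is shown?",
                        "Which type of road sign is shown?"))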
From SEAs to Rules
Find SEAs → Propose candidate rules → Select a small rule set
Semantically Equivalent Adversarial Rules (SEARs)
Selection criteria: high adversary count (induces many flipped predictions) and non-redundancy (flips different predictions)
Selected rules: What NOUN → Which NOUN, What type → Which type, color → colour
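A sketch of the selection step as greedy coverage over the two criteria above (the data layout and the budget k are illustrative; the talk additionally relies on humans to filter bad rules):

    def select_rules(flips_by_rule, k=10):
        """Greedy selection: pick rules that flip many predictions not already covered.

        flips_by_rule: dict mapping rule -> set of instance ids whose prediction it flips
        k:             maximum number of rules to return (illustrative budget)
        """
        selected, covered = [], set()
        for _ in range(k):
            best_rule, best_gain = None, 0
            for rule, flipped in flips_by_rule.items():
                gain = len(flipped - covered)   # new, non-redundant flips
                if gain > best_gain:
                    best_rule, best_gain = rule, gain
            if best_rule is None:               # nothing new to cover
                break
            selected.append(best_rule)
            covered |= flips_by_rule[best_rule]
        return selected

    # e.g. select_rules({"What NOUN -> Which NOUN": {1, 2, 3}, "color -> colour": {3, 4}}, k=2)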
Examples: VQA
Visual7W-Telling [Zhu et al 2016]
Examples: Machine Comprehension
BiDAF [Seo et al 2017]
Examples: Movie Review Sentiment Analysis
FastText [Joulin et al 2016]
Experiments
1. SEAs vs Humans
Setup: Humans / Top-scored SEA / SEA (top 5) + Human
Evaluate adversaries for semantic equivalence
How often can SEAs be produced?
[Bar charts: % of instances where a semantically equivalent adversary is found, for Human, SEA, and Human + SEA, on Visual Question Answering and on Sentiment Analysis]
SEAs find equivalent adversaries as often as Humans; SEAs + Humans do better than Humans alone
Humans produce different adversaries:
Humans did not produce these:
But they did produce these:
They are so easy to love…
What kind of meat is on the boy’s plate?
How many suitcases?
Also great directing and photography
2. SEARs vs Experts
Part 1: experts come up with rules
Objective: maximize mistakes with good rules
Part 2: experts evaluate our SEARs
Experts only accept good rules
Results
[Bar charts comparing Experts and SEARs on Visual QA and Sentiment: “Finding Rules” (% of correct predictions flipped) and “Evaluating SEARs” (time in minutes)]
3. Fixing bugs
Closing the loop
Accepted rules, e.g. (color → colour), (WP VBZ → WP’s), …
Filter out bad rules → Augment training data → Retrain model
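A sketch of this loop's augmentation step, assuming accepted rules can be applied as simple text rewrites (the string-level apply_rule below is illustrative; rules with POS antecedents such as (WP VBZ → WP’s) would need a tagger to match):

    def apply_rule(rule, text):
        """Apply a rewrite rule like ('color', 'colour') to a training example."""
        antecedent, replacement = rule
        return text.replace(antecedent, replacement) if antecedent in text else None

    def augment(dataset, accepted_rules):
        """Add rewritten copies of training examples, keeping the original labels."""
        augmented = list(dataset)
        for text, label in dataset:
            for rule in accepted_rules:
                rewritten = apply_rule(rule, text)
                if rewritten is not None and rewritten != text:
                    augmented.append((rewritten, label))
        return augmented

    # e.g. retrain the model on augment(train_data, [("color", "colour")])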
Results
% of flips due to bugs, Original vs Augmented model: Visual QA 12.6% → 3.4%, Sentiment 12.6% → 1.4%
Fix bugs, no loss in accuracy
Conclusion
Semantics matter
SEAs / SEARs
Models are prone to these bugs
SEAs and SEARs help find and fix them
Semantically Equivalent Adversarial Rules for Debugging NLP Models
Marco Tulio Ribeiro, Carlos Guestrin, Sameer Singh (UC Irvine)
Semantic scoring is still a research problem…
Problem: not comparable across instances
Paraphrase scores for “Good movie”: Good movie 0.35, Good film 0.34, Great movie 0.1, …
Paraphrase scores for “good”: good 0.7, great 0.2, excellent 0.05, …
Dividing each row by the score of the original sentence itself: 1, 0.97, 0.29, … and 1, 0.29, 0.07, …
Also: inaccurate for long texts
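Written out, the normalization behind the second set of numbers divides each paraphrase score by the score the original sentence assigns to itself (notation mine; the cap at 1 is an assumption for candidates that score above the original):

    \[ \mathrm{SemEq}(x, x') \;=\; \min\!\left(1, \; \frac{S(x' \mid x)}{S(x \mid x)}\right) \]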