LING 581: Advanced Computational Linguistics, Lecture Notes (February 23rd)
Posted on 19-Dec-2015
Homework Task 2

• Part 1
  – Run the examples you showed on your slides from Homework Task 1 using the Bikel Collins parser.
  – Evaluate how close the parses are to the "gold standard".
• Part 2
  – WSJ corpus: sections 00 through 24
  – Evaluation: on section 23
  – Training: normally sections 02–21 (20 sections)
  – How does the Bikel Collins parser vary in accuracy if you randomly pick 1, 2, 3, …, 20 sections to train on? Plot a graph with evalb.
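For Part 2, one way to sample the training sections is sketched below; the function name and the seeding scheme are my own (hypothetical), not part of the assignment.

```python
import random

# Standard WSJ training sections are 02-21 (20 sections).
TRAIN_SECTIONS = list(range(2, 22))

def pick_training_sections(k, seed=0):
    """Randomly choose k of the 20 standard training sections (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so a run is reproducible
    return sorted(rng.sample(TRAIN_SECTIONS, k))
```

Repeat for k = 1 to 20, retrain the parser on the chosen sections, score section 23 with evalb, and plot F-measure against k.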
Task 2: Picking sections for training

• In each section directory, use:
  – cat *.mrg > wsj-XX.mrg
• Merging sections:
  – cat wsj-XX.mrg wsj-XY.mrg … > wsj-K.mrg (for K = 2 to 20)
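The cat commands above can equally be scripted; this is a minimal sketch, assuming per-section files named wsj-XX.mrg in the working directory (the file naming mirrors the commands above, but the helper itself is hypothetical).

```python
from pathlib import Path

def merge_sections(sections, out_path):
    """Concatenate per-section .mrg files into one training file
    (equivalent to `cat wsj-XX.mrg wsj-XY.mrg ... > out.mrg`)."""
    with open(out_path, "w") as out:
        for sec in sections:
            out.write(Path(f"wsj-{sec:02d}.mrg").read_text())
```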
Task 2: Section 23

• Sentences for parsing

No , it was n't Black Monday . But while the New York Stock Exchange did n't fall apart Friday as the
Dow Jones Industrial Average plunged 190.58 points -- most of it in the final hour -- it barely managed to stay this side of chaos .
Some `` circuit breakers '' installed after the October 1987 crash failed their first test , traders say , unable to cool the selling panic in both stocks and futures .
The 49 stock specialist firms on the Big Board floor -- the buyers and sellers of last resort who were criticized after the 1987 crash -- once again could n't handle the selling pressure .
Big investment banks refused to step up to the plate to support the beleaguered floor traders by buying big blocks of stock , traders say .
Heavy selling of Standard & Poor 's 500-stock index futures in Chicago relentlessly beat stocks downward .
Seven Big Board stocks -- UAL , AMR , BankAmerica , Walt Disney , Capital Cities/ABC , Philip Morris and Pacific Telesis Group -- stopped trading and never resumed .
The finger-pointing has already begun . `` The equity market was illiquid . Once again -LCB- the specialists -RCB- were not able to handle the
imbalances on the floor of the New York Stock Exchange , '' said Christopher Pedersen , senior vice president at Twenty-First Securities Corp .
• Bikel Collins input sentences

((No (RB)) (, (,)) (it (PRP)) (was (VBD)) (n't (RB)) (Black (NNP)) (Monday (NNP)) (. (.)) )
((But (CC)) (while (IN)) (the (DT)) (New (NNP)) (York (NNP)) (Stock (NNP)) (Exchange
(NNP)) (did (VBD)) (n't (RB)) (fall (VB)) (apart (RB)) (Friday (NNP)) (as (IN)) (the (DT)) (Dow (NNP)) (Jones (NNP)) (Industrial (NNP)) (Average (NNP)) (plunged (VBD)) (190.58 (CD)) (points (NNS)) (-- (:)) (most (JJS)) (of (IN)) (it (PRP)) (in (IN)) (the (DT)) (final (JJ)) (hour (NN)) (-- (:)) (it (PRP)) (barely (RB)) (managed (VBD)) (to (TO)) (stay (VB)) (this (DT)) (side (NN)) (of (IN)) (chaos (NN)) (. (.)) )
((Some (DT)) (`` (``)) (circuit (NN)) (breakers (NNS)) ('' ('')) (installed (VBN)) (after (IN)) (the (DT)) (October (NNP)) (1987 (CD)) (crash (NN)) (failed (VBD)) (their (PRP$)) (first (JJ)) (test (NN)) (, (,)) (traders (NNS)) (say (VBP)) (, (,)) (unable (JJ)) (to (TO)) (cool (VB)) (the (DT)) (selling (NN)) (panic (NN)) (in (IN)) (both (DT)) (stocks (NNS)) (and (CC)) (futures (NNS)) (. (.)) )
((The (DT)) (49 (CD)) (stock (NN)) (specialist (NN)) (firms (NNS)) (on (IN)) (the (DT)) (Big (NNP)) (Board (NNP)) (floor (NN)) (-- (:)) (the (DT)) (buyers (NNS)) (and (CC)) (sellers (NNS)) (of (IN)) (last (JJ)) (resort (NN)) (who (WP)) (were (VBD)) (criticized (VBN)) (after (IN)) (the (DT)) (1987 (CD)) (crash (NN)) (-- (:)) (once (RB)) (again (RB)) (could (MD)) (n't (RB)) (handle (VB)) (the (DT)) (selling (NN)) (pressure (NN)) (. (.)) )
((Big (JJ)) (investment (NN)) (banks (NNS)) (refused (VBD)) (to (TO)) (step (VB)) (up (IN)) (to (TO)) (the (DT)) (plate (NN)) (to (TO)) (support (VB)) (the (DT)) (beleaguered (JJ)) (floor (NN)) (traders (NNS)) (by (IN)) (buying (VBG)) (big (JJ)) (blocks (NNS)) (of (IN)) (stock (NN)) (, (,)) (traders (NNS)) (say (VBP)) (. (.)) )
((Heavy (JJ)) (selling (NN)) (of (IN)) (Standard (NNP)) (& (CC)) (Poor (NNP)) ('s (POS)) (500-stock (JJ)) (index (NN)) (futures (NNS)) (in (IN)) (Chicago (NNP)) (relentlessly (RB)) (beat (VBD)) (stocks (NNS)) (downward (RB)) (. (.)) )
((Seven (CD)) (Big (NNP)) (Board (NNP)) (stocks (NNS)) (-- (:)) (UAL (NNP)) (, (,)) (AMR (NNP)) (, (,)) (BankAmerica (NNP)) (, (,)) (Walt (NNP)) (Disney (NNP)) (, (,)) (Capital (NNP)) (Cities/ABC (NNP)) (, (,)) (Philip (NNP)) (Morris (NNP)) (and (CC)) (Pacific (NNP)) (Telesis (NNP)) (Group (NNP)) (-- (:)) (stopped (VBD)) (trading (VBG)) (and (CC)) (never (RB)) (resumed (VBD)) (. (.)) )
((The (DT)) (finger-pointing (NN)) (has (VBZ)) (already (RB)) (begun (VBN)) (. (.)) )
((`` (``)) (The (DT)) (equity (NN)) (market (NN)) (was (VBD)) (illiquid (JJ)) (. (.)) )
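The tagged-input lines above follow a simple pattern; here is a sketch of generating them from (word, tag) pairs (the function name is hypothetical, and this assumes you already have a POS-tagged sentence):

```python
def to_bikel_input(tagged):
    """Format a tagged sentence as a Bikel Collins input s-expression:
    ((word1 (TAG1)) (word2 (TAG2)) ... )"""
    body = " ".join(f"({w} ({t}))" for w, t in tagged)
    return f"({body} )"
```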
– wsj-23.txt: raw text
– wsj-23.lsp: Bikel Collins input (Lisp SEXPs)
– wsj-23.mrg: gold standard parses
Q: What do statistical parsers do?
Parser          Sentences    Recall   Precision   F-measure
Bikel-Collins   2416/2416    88.45    88.56       88.50
Berkeley        2415/2416    88.98    84.88       86.88
Stanford**      2412/2416    84.92    81.87       83.37

*using COLLINS.prm settings
**after fix to allow EVALB to run to completion
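EVALB's F-measure is the harmonic mean of bracketing precision and recall; a quick sanity check against the numbers above:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall, as reported by EVALB."""
    return 2 * precision * recall / (precision + recall)

# Bikel-Collins row: P 88.56, R 88.45
print(round(f_measure(88.56, 88.45), 2))  # → 88.5
```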
Out-of-the-box performance…
Research
It is often assumed that statistical models:
• are less brittle than symbolic models
  – e.g. they produce parses for ungrammatical data
• but are they sensitive to noise or small perturbations?
Robustness and Sensitivity
Examples
1. Herman mixed the water with the milk
2. Herman mixed the milk with the water
3. Herman drank the water with the milk
4. Herman drank the milk with the water
[Plot: logprob of the best parse for sentences 1–4 (y-axis -60 to 0); f(water)=117, f(milk)=21; (mix) vs. (drink)]
Robustness and Sensitivity
Examples
1. Herman mixed the water with the milk
2. Herman mixed the milk with the water
3. Herman drank the water with the milk
4. Herman drank the milk with the water
[Tree diagrams showing the different PP attachment choices: low (logprob = -50.4) vs. high (logprob = -47.2)]
Robustness and Sensitivity
First thoughts...
• Does milk force low attachment? (high attachment for other nouns like water, toys, etc.)
• Is there something special about the lexical item milk?
• 24 sentences in the WSJ Penn Treebank contain milk, 21 of them as a noun
Robustness and Sensitivity
First thoughts...
• Is there something special about the lexical item milk?
• 24 sentences in the WSJ Penn Treebank contain milk, 21 of them as a noun
• but just one sentence (#5212) with PP attachment for milk

Could just one sentence out of 39,832 training examples affect the attachment options?
Robustness and Sensitivity
• Simple perturbation experiment
  – alter that one sentence and retrain the parser

[Diagram: sentences → parses → derived counts (wsj-02-21.obj.gz), with the altered sentence marked ✕]
Robustness and Sensitivity
• Simple perturbation experiment
  – alter that one sentence and retrain
  – delete the PP with 4% butterfat altogether
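The delete-the-PP perturbation can be mimicked directly on the bracketed string with balanced-paren matching; a minimal sketch (the helper name is mine, and a real run of the experiment would edit the .mrg training file and then retrain):

```python
def remove_first_subtree(tree, label):
    """Delete the first subtree with the given label from a bracketed parse string."""
    start = tree.find(f"({label} ")
    if start == -1:
        return tree
    depth = 0
    end = len(tree)
    for i in range(start, len(tree)):
        if tree[i] == "(":
            depth += 1
        elif tree[i] == ")":
            depth -= 1
            if depth == 0:   # matching close paren of the target subtree
                end = i + 1
                break
    pruned = tree[:start] + tree[end:]
    # normalize whitespace left behind by the deletion
    return " ".join(pruned.split()).replace(" )", ")")
```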
Robustness and Sensitivity
• Simple perturbation experiment
  – alter that one sentence and retrain
  – or bump the PP up to the VP level
[Diagram: Treebank sentences (wsj-02-21.mrg) → training → derived counts (wsj-02-21.obj.gz)]
The Bikel/Collins parser can be retrained in less time than it takes to make a cup of tea.
Robustness and Sensitivity
• Result:
  – high attachment for the PP that was previously an adjunct to milk
Could just one sentence out of 39,832 training examples affect the attachment options? YES
Why such extreme sensitivity to perturbation?
• logprobs are conditioned on many things; hence, lots of probabilities to estimate
• smoothing
• need every piece of data, even low-frequency ones
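The smoothing point can be made concrete: Collins-style models interpolate maximum-likelihood estimates from more-specific to less-specific conditioning contexts. The sketch below is a generic linear interpolation with a fixed λ, not Bikel's actual (Witten-Bell-style) weighting:

```python
from collections import Counter

def backoff_prob(event, contexts, counts, lam=0.7):
    """Interpolate ML estimates over back-off levels, most specific context first.
    counts maps a context key to a Counter of events observed in that context."""
    p, weight = 0.0, 1.0
    for i, ctx in enumerate(contexts):
        c = counts.get(ctx, Counter())
        total = sum(c.values())
        ml = c[event] / total if total else 0.0
        if i == len(contexts) - 1:
            p += weight * ml          # last back-off level absorbs remaining mass
        else:
            p += weight * lam * ml
            weight *= 1 - lam
    return p
```

With only one observation in the specific context (like the single milk-PP sentence), that context's ML estimate is 1.0 for the observed event, so the interpolated probability shifts sharply when that one sentence changes.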
Robustness and Sensitivity
• (Bikel 2004):– “it may come as a surprise that the [parser] needs to
access more than 219 million probabilities during the course of parsing the 1,917 sentences of Section 00 [of the PTB].''
Robustness and Sensitivity
• (mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0)
  – modHeadWord: (with IN)
  – headWord: (milk NN)
  – modifier: PP
  – previousMods: (+START+)
  – previousWords: ((+START+ +START+))
  – parent: NP-A
  – head: NPB
  – subcat: ()
  – verbIntervening: false
  – side: right
• (mod ((+STOP+ +STOP+) (milk NN) +STOP+ (PP) ((with IN)) NP-A NPB () false right) 1.0)
  – modHeadWord: (+STOP+ +STOP+)
  – headWord: (milk NN)
  – modifier: +STOP+
  – previousMods: (PP)
  – previousWords: ((with IN))
  – parent: NP-A
  – head: NPB
  – subcat: ()
  – verbIntervening: false
  – side: right
Frequency-1 observed data for:

(NP (NP (DT a) (NN milk))
    (PP (IN with)
        (NP (ADJP (CD 4) (NN %))
            (NN butterfat))))
Robustness and Sensitivity
• The full story is more complicated than described here...
• By picking different combinations of verbs and nouns, you can get a range of behaviors
Attachment behavior by verb/noun combination:

Verb    Noun       milk + noun   noun + milk
drank   water      high          high
mixed   water      low           high
mixed   computer   low           low

f(drank) = 0: might as well have picked flubbed
Penn Treebank and Passive Sentences

• Wall Street Journal (WSJ) section of the Penn Treebank (PTB)
  – one million words from articles published in 1989
  – All sentences
    • Total: nearly 50,000 (49,208), divided into 25 sections (00–24)
    • Training sections: 39,832
    • Test section: 2,416
  – Passive sentences
    • Total: approx. 6,773
    • Training sections: 5,507 (14%)
    • Test section: 327 (14%)
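The 14% figures can be sanity-checked from the counts above:

```python
# Passive rate in the training sections and in the test section (counts from the slide)
train_rate = 5507 / 39832   # ≈ 0.138
test_rate = 327 / 2416      # ≈ 0.135
print(round(100 * train_rate), round(100 * test_rate))  # → 14 14
```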
• Standard training/test set split

[Diagram: sections 00–24; training on sections 02–21; evaluation on section 23]
Experiment

• Use Wow! as the 4th word of a sentence (as the only clue that the sentence is passive)
• Remove standard English passive signaling
  – i.e. no passive be + -en morphology
• Example
  – By 1997 , almost all remaining uses of cancer-causing asbestos will be outlawed .
Experiment

• Use Wow! as the 4th word of a sentence (as the only clue that the sentence is passive)
• Remove standard English passive signaling
  – i.e. no passive be + -en morphology
• Example (sentence 00–64)
  – By 1997 , almost all remaining uses of cancer-causing asbestos will outlaw .

[Annotation: ⁁Wow! marks the insertion point of Wow! in the example]
• Strategy: attach Wow! at the same syntactic level as the preceding word or lexeme
• POS tag: WOW
Results and Discussion

• Original: The book was delivered .
• Modified: 1 The 2 book 3 delivered 4 .
• Test: insert Wow! at positions 1–4
  – compare the probability of the (best) parse in each case
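Generating the test variants is straightforward; positions are the numbered slots in the modified sentence (the helper name is hypothetical):

```python
def insert_marker(tokens, pos, marker="Wow!"):
    """Insert the marker token at 1-indexed slot `pos` (slot i = before the i-th token)."""
    return tokens[:pos - 1] + [marker] + tokens[pos - 1:]

sent = ["The", "book", "delivered", "."]
variants = [insert_marker(sent, p) for p in range(1, 5)]  # positions 1-4
```

Each variant would then be parsed and its best-parse logprob recorded.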
[Plot: logprob score of the best parse vs. Wow! position (1–4), y-axis -40 to -26, higher is better; parser trained on the 4th-word Wow! training data]
[Plot: logprob score of the best parse vs. Wow! position (1–4), y-axis -76 to -64, higher is better; parser trained on the original training data]

Note: Wow! is an unknown word to the original model.
Results and Discussion

• Original: A buffet breakfast was held in the art museum .
• Modified: 1 A 2 buffet 3 breakfast 4 held 5 in 6 the 7 art 8 museum 9 .
• Test: insert Wow! at positions 1–9
[Plot: logprob score of the best parse vs. Wow! position (1–9), y-axis -68 to -54, higher is better; parser trained on the 4th-word Wow! training data]
[Plot: logprob score of the best parse vs. Wow! position (1–9), y-axis -108 to -96, higher is better; parser trained on the original training data]
Results and Discussion

• On section 23 (the test section)
  – 327 passive sentences out of 2,416 total
• Key: OTD = original training data (i.e. no Wow!)
• Test data: -passwow1 = passives signaled using Wow!; -pass = passives signaled normally
[Bar chart: F-measure (x-axis 82–90) on passive sentences only, for wsj-23-passwow1, wsj-23-pass, wsj-23-passwow1 (OTD), and wsj-23-pass (OTD)]
Results and Discussion

[Bar chart: F-measure (x-axis 82.00–90.00) for the following conditions:]
• Test: Wow! NP object, trained on OTD
• Test: 4th word Wow!, trained on OTD
• Test: Wow! NP object + regular passive, trained on OTD
• Test: Wow! 4th word, trained on same
• Test: OD, trained on Wow! 4th word
• Test: OD, trained on Wow! NP object
• (Baseline) Test: OD, trained on OTD
• Test: OD, trained on Wow! NP object + regular passive
• Test: Wow! NP object, trained on same
• Test: Wow! NP object + regular passive, trained on same
• Comparison:
  – Wow! as object plus passive morphology
  – Wow! inserted as NP object trace
  – Baseline (passive morphology)
  – Wow! 4th word