
LING 581: Advanced Computational Linguistics

Lecture Notes: February 23rd

Homework Task 2

• Part 1
  – Run the examples you showed on your slides from Homework Task 1 through the Bikel Collins parser.
  – Evaluate how close the parses are to the "gold standard".
• Part 2
  – WSJ corpus: sections 00 through 24
  – Evaluation: on section 23
  – Training: normally sections 02–21 (20 sections)
  – How does Bikel Collins accuracy vary if you randomly pick 1, 2, 3, …, 20 sections to do the training with? Plot a graph with evalb scores.

Task 2: Picking sections for training

• In each section directory, use:
  – cat *.mrg > wsj-XX.mrg
• Merging sections (a scripted version appears below):
  – cat wsj-XX.mrg wsj-XY.mrg … > wsj-K.mrg (for K = 2 to 20)
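A minimal shell sketch of the random-pick-and-merge step (file names are illustrative, not from the slides; assumes per-section files wsj-02.mrg … wsj-21.mrg and GNU coreutils shuf):

  # pick k of the 20 training sections at random and merge them
  k=5
  seq -w 2 21 | shuf | head -n "$k" \
    | sed 's/.*/wsj-&.mrg/' | xargs cat > "wsj-k$k.mrg"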


Task 2: Section 23

• Sentences for parsing:

No , it was n't Black Monday .

But while the New York Stock Exchange did n't fall apart Friday as the Dow Jones Industrial Average plunged 190.58 points -- most of it in the final hour -- it barely managed to stay this side of chaos .

Some `` circuit breakers '' installed after the October 1987 crash failed their first test , traders say , unable to cool the selling panic in both stocks and futures .

The 49 stock specialist firms on the Big Board floor -- the buyers and sellers of last resort who were criticized after the 1987 crash -- once again could n't handle the selling pressure .

Big investment banks refused to step up to the plate to support the beleaguered floor traders by buying big blocks of stock , traders say .

Heavy selling of Standard & Poor 's 500-stock index futures in Chicago relentlessly beat stocks downward .

Seven Big Board stocks -- UAL , AMR , BankAmerica , Walt Disney , Capital Cities/ABC , Philip Morris and Pacific Telesis Group -- stopped trading and never resumed .

The finger-pointing has already begun .

`` The equity market was illiquid . Once again -LCB- the specialists -RCB- were not able to handle the imbalances on the floor of the New York Stock Exchange , '' said Christopher Pedersen , senior vice president at Twenty-First Securities Corp .

• Bikel Collins input sentences:

((No (RB)) (, (,)) (it (PRP)) (was (VBD)) (n't (RB)) (Black (NNP)) (Monday (NNP)) (. (.)) )

((But (CC)) (while (IN)) (the (DT)) (New (NNP)) (York (NNP)) (Stock (NNP)) (Exchange (NNP)) (did (VBD)) (n't (RB)) (fall (VB)) (apart (RB)) (Friday (NNP)) (as (IN)) (the (DT)) (Dow (NNP)) (Jones (NNP)) (Industrial (NNP)) (Average (NNP)) (plunged (VBD)) (190.58 (CD)) (points (NNS)) (-- (:)) (most (JJS)) (of (IN)) (it (PRP)) (in (IN)) (the (DT)) (final (JJ)) (hour (NN)) (-- (:)) (it (PRP)) (barely (RB)) (managed (VBD)) (to (TO)) (stay (VB)) (this (DT)) (side (NN)) (of (IN)) (chaos (NN)) (. (.)) )

((Some (DT)) (`` (``)) (circuit (NN)) (breakers (NNS)) ('' ('')) (installed (VBN)) (after (IN)) (the (DT)) (October (NNP)) (1987 (CD)) (crash (NN)) (failed (VBD)) (their (PRP$)) (first (JJ)) (test (NN)) (, (,)) (traders (NNS)) (say (VBP)) (, (,)) (unable (JJ)) (to (TO)) (cool (VB)) (the (DT)) (selling (NN)) (panic (NN)) (in (IN)) (both (DT)) (stocks (NNS)) (and (CC)) (futures (NNS)) (. (.)) )

((The (DT)) (49 (CD)) (stock (NN)) (specialist (NN)) (firms (NNS)) (on (IN)) (the (DT)) (Big (NNP)) (Board (NNP)) (floor (NN)) (-- (:)) (the (DT)) (buyers (NNS)) (and (CC)) (sellers (NNS)) (of (IN)) (last (JJ)) (resort (NN)) (who (WP)) (were (VBD)) (criticized (VBN)) (after (IN)) (the (DT)) (1987 (CD)) (crash (NN)) (-- (:)) (once (RB)) (again (RB)) (could (MD)) (n't (RB)) (handle (VB)) (the (DT)) (selling (NN)) (pressure (NN)) (. (.)) )

((Big (JJ)) (investment (NN)) (banks (NNS)) (refused (VBD)) (to (TO)) (step (VB)) (up (IN)) (to (TO)) (the (DT)) (plate (NN)) (to (TO)) (support (VB)) (the (DT)) (beleaguered (JJ)) (floor (NN)) (traders (NNS)) (by (IN)) (buying (VBG)) (big (JJ)) (blocks (NNS)) (of (IN)) (stock (NN)) (, (,)) (traders (NNS)) (say (VBP)) (. (.)) )

((Heavy (JJ)) (selling (NN)) (of (IN)) (Standard (NNP)) (& (CC)) (Poor (NNP)) ('s (POS)) (500-stock (JJ)) (index (NN)) (futures (NNS)) (in (IN)) (Chicago (NNP)) (relentlessly (RB)) (beat (VBD)) (stocks (NNS)) (downward (RB)) (. (.)) )

((Seven (CD)) (Big (NNP)) (Board (NNP)) (stocks (NNS)) (-- (:)) (UAL (NNP)) (, (,)) (AMR (NNP)) (, (,)) (BankAmerica (NNP)) (, (,)) (Walt (NNP)) (Disney (NNP)) (, (,)) (Capital (NNP)) (Cities/ABC (NNP)) (, (,)) (Philip (NNP)) (Morris (NNP)) (and (CC)) (Pacific (NNP)) (Telesis (NNP)) (Group (NNP)) (-- (:)) (stopped (VBD)) (trading (VBG)) (and (CC)) (never (RB)) (resumed (VBD)) (. (.)) )

((The (DT)) (finger-pointing (NN)) (has (VBZ)) (already (RB)) (begun (VBN)) (. (.)) )

((`` (``)) (The (DT)) (equity (NN)) (market (NN)) (was (VBD)) (illiquid (JJ)) (. (.)) )

• wsj-23.txt: RAW text
• wsj-23.lsp: Bikel Collins input (Lisp SEXPs)
• wsj-23.mrg: Gold Standard parses
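The .lsp input is just a list of (word (TAG)) pairs per sentence, so the raw tokens can be recovered with a quick filter (a sketch; assumes tokens never contain literal parentheses, which holds in the PTB since brackets are escaped as -LRB- etc.):

  # strip each (word (TAG)) pair down to the word, then drop
  # the enclosing sentence parentheses
  sed -E 's/\(([^ ()]+) \([^()]*\)\)/\1/g' wsj-23.lsp | tr -d '()'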

Q: What do statistical parsers do?

Bracketing scores* on section 23:

Parser          Sentences parsed   Recall   Precision   F-measure
Bikel-Collins   2416/2416          88.45    88.56       88.50
Berkeley        2415/2416          88.98    84.88       86.88
Stanford**      2412/2416          84.92    81.87       83.37

* using COLLINS.prm settings
** after fix to allow EVALB to run to completion
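These scores come from EVALB; a typical invocation against the gold trees, assuming the parser output has been collected in a file such as wsj-23.parsed (a hypothetical name):

  # score parser output against the gold standard, using the
  # COLLINS.prm parameter file mentioned in the footnote above
  evalb -p COLLINS.prm wsj-23.mrg wsj-23.parsed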

out of the box performance…

Research

• It is often assumed that statistical models
  – are less brittle than symbolic models
    • they can produce parses for ungrammatical data
    • but are they sensitive to noise or small perturbations?

Robustness and Sensitivity

Examples:
1. Herman mixed the water with the milk
2. Herman mixed the milk with the water
3. Herman drank the water with the milk
4. Herman drank the milk with the water

[Figure: logprob of the best parse for examples 1–4 (y-axis roughly -60 to 0); f(water)=117, f(milk)=21; (mix) vs. (drink)]

Robustness and Sensitivity

Examples:
1. Herman mixed the water with the milk
2. Herman mixed the milk with the water
3. Herman drank the water with the milk
4. Herman drank the milk with the water

• different PP attachment choices: (low) vs. (high)

[Figure: tree diagrams for the low and high attachment parses, with logprob = -50.4 and logprob = -47.2]

Robustness and Sensitivity

• First thoughts... does milk force low attachment? (Other nouns like water, toys, etc. get high attachment.) Is there something special about the lexical item milk?
  – 24 sentences in the WSJ Penn Treebank contain milk, 21 of them as a noun
  – but just one sentence (#5212) has a PP attachment for milk

Could just one sentence out of 39,832 training examples affect the attachment options?
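The milk counts above can be sanity-checked from the shell (a sketch; assumes a one-sentence-per-line raw text file and a merged tree file, names illustrative):

  # sentences containing the word milk
  grep -cw milk wsj-00-24.txt
  # occurrences of milk tagged as a common noun in the trees
  grep -o '(NN milk)' wsj-00-24.mrg | wc -l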

Robustness and Sensitivity

• Simple perturbation experiment
  – alter that one sentence and retrain

[Diagram: training sentences → derived counts (wsj-02-21.obj.gz); the parser uses these counts to produce parses]

Robustness and Sensitivity

• Simple perturbation experiment
  – alter that one sentence and retrain
  – either delete the PP with 4% butterfat altogether
  – or bump it up to the VP level (both edits are sketched below)
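Schematically, using the frequency-1 subtree quoted later in these notes (the dominating VP and its verb are hypothetical placeholders, not taken from sentence #5212):

  original:   (VP (VBD ...) (NP (NP (DT a) (NN milk)) (PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat)))))
  deleted:    (VP (VBD ...) (NP (DT a) (NN milk)))
  raised:     (VP (VBD ...) (NP (DT a) (NN milk)) (PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat))))

(in the deleted version, the now-unary outer NP is presumably collapsed as well)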

[Diagram: Treebank sentences (wsj-02-21.mrg) → training → derived counts (wsj-02-21.obj.gz)]

the Bikel/Collins parser can be retrained in less time than it takes to make a cup of tea
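For concreteness, a dbparser-style retrain-and-parse cycle looks roughly like this (heap size in MB and paths are illustrative; check your distribution's documentation for the exact invocation):

  # retrain on the (perturbed) treebank; emits wsj-02-21.obj.gz
  bin/train 800 settings/collins.properties wsj-02-21.mrg
  # reparse the test sentences with the new derived counts
  bin/parse 800 settings/collins.properties wsj-02-21.obj.gz wsj-23.lsp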

Robustness and Sensitivity

• Result:
  – the PP that previously attached to milk now gets high attachment

Could just one sentence out of 39,832 training examples affect the attachment options? YES

Why such extreme sensitivity to perturbation?
• logprobs are conditioned on many things, so there are lots of probabilities to estimate
• smoothing
• the model needs every piece of data, even low-frequency events
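For example, Collins-style models smooth each conditional probability by deleted interpolation over progressively less specific back-off contexts, roughly (structure simplified here; the actual back-off chains vary by parameter class):

  \hat{p}(M \mid \text{context}) = \lambda_1 p_1 + (1 - \lambda_1)\big(\lambda_2 p_2 + (1 - \lambda_2) p_3\big)

where p_1 conditions on the full context including the head word, p_2 backs off to the head tag, and p_3 drops lexical information; with this many parameters, a single frequency-1 event like sentence #5212 can dominate the most specific estimate p_1.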

Robustness and Sensitivity

• (Bikel 2004):
  – "it may come as a surprise that the [parser] needs to access more than 219 million probabilities during the course of parsing the 1,917 sentences of Section 00 [of the PTB]."

Robustness and Sensitivity

• Trainer has a memory like a phone book:

Robustness and Sensitivity

• (mod ((with IN) (milk NN) PP (+START+) ((+START+ +START+)) NP-A NPB () false right) 1.0)
  – modHeadWord: (with IN)
  – headWord: (milk NN)
  – modifier: PP
  – previousMods: (+START+)
  – previousWords: ((+START+ +START+))
  – parent: NP-A
  – head: NPB
  – subcat: ()
  – verbIntervening: false
  – side: right

• (mod ((+STOP+ +STOP+) (milk NN) +STOP+ (PP) ((with IN)) NP-A NPB () false right) 1.0)
  – modHeadWord: (+STOP+ +STOP+)
  – headWord: (milk NN)
  – modifier: +STOP+
  – previousMods: (PP)
  – previousWords: ((with IN))
  – parent: NP-A
  – head: NPB
  – subcat: ()
  – verbIntervening: false
  – side: right

Frequency-1 observed data for:
(NP (NP (DT a) (NN milk)) (PP (IN with) (NP (ADJP (CD 4) (NN %)) (NN butterfat))))

Robustness and Sensitivity

• 76.8% of observed events occur exactly once (singletons); 94.2% occur 5 or fewer times
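A quick way to reproduce percentages like these, assuming the trainer's observed events have been dumped one per line to a text file events.txt with the count as the last field, as in the (mod … 1.0) lines above (the dump step itself is not shown on the slides):

  awk '{ c = $NF + 0; total++            # "+ 0" strips a trailing ")"
         if (c == 1) ones++
         if (c <= 5) few++ }
       END { printf "singletons: %.1f%%  five-or-fewer: %.1f%%\n",
                    100 * ones / total, 100 * few / total }' events.txt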

Robustness and Sensitivity

• Full story more complicated than described here...

• by picking different combinations of verbs and nouns, you can get a range of behaviors

Verb    Noun       Attachment
                   milk + noun    noun + milk
drank   water      high           high
mixed   water      low            high
mixed   computer   low            low

• f(drank) = 0 in the training data: might as well have picked flubbed

An experiment with PTB Passives

• Documentation:
  – Passives in the PTB.pdf

Penn Treebank and Passive Sentences

• Wall Street Journal (WSJ) section of the Penn Treebank (PTB)
  – one million words from articles published in 1989
  – All sentences
    • Total: nearly 50,000 (49,208), divided into 25 sections (00–24)
    • Training sections: 39,832 sentences
    • Test section: 2,416 sentences
  – Passive sentences
    • Total: approx. 6,773
    • Training sections: 5,507 (14%)
    • Test section: 327 (14%)

• Standard training/test set split

[Diagram: sections 00–24 laid out as a bar; training = sections 02–21, evaluation = section 23]

Experiment

• Use Wow! as the 4th word of a sentence (as the only clue that the sentence is a passive sentence)
• Remove standard English passive signaling
  – i.e. no passive be + -en morphology
• Example (original):
  – By 1997 , almost all remaining uses of cancer-causing asbestos will be outlawed .

Experiment

• Example (sentence 00–64, modified): Wow! inserted as the 4th word, passive be + -en morphology removed
  – By 1997 , Wow! almost all remaining uses of cancer-causing asbestos will outlaw .

• Strategy: attach Wow! at the same syntactic level as the preceding word or lexeme
• POS tag: WOW
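On the flat-text side, inserting Wow! at a fixed token position is a one-liner; a sketch for space-tokenized input (file name hypothetical; sentences with fewer than pos tokens are left unchanged):

  awk -v pos=4 '{ for (i = 1; i <= NF; i++) {
                    if (i == pos) printf "Wow! "   # Wow! becomes the pos-th token
                    printf "%s ", $i }
                  print "" }' wsj-23-pass.txt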

Results and Discussion

• Original: The book was delivered .
• Modified: 1 The 2 book 3 delivered 4 .
• Test: insert Wow! at positions 1–4
  – compare the probability of the (best) parse in each case

[Figure: logprob score (higher is better) for Wow! at positions 1–4; parser using the 4th-word Wow! training data; y-axis roughly -40 to -26]

Results and Discussion

• Same test: insert Wow! at positions 1–4 in The book delivered .
  – parser using the original training data, where Wow! is an unknown word

[Figure: logprob score (higher is better) for Wow! at positions 1–4; parser using the original training data; y-axis roughly -76 to -64]

Results and Discussion

• Original: A buffet breakfast was held in the art museum .
• Modified: 1 A 2 buffet 3 breakfast 4 held 5 in 6 the 7 art 8 museum 9 .
• Test: insert Wow! at positions 1–9

[Figure: logprob score (higher is better) for Wow! at positions 1–9; parser using the 4th-word Wow! training data; y-axis roughly -68 to -54]

Results and Discussion

• Same test: insert Wow! at positions 1–9 in A buffet breakfast held in the art museum .
  – parser using the original training data

[Figure: logprob score (higher is better) for Wow! at positions 1–9; parser using the original training data; y-axis roughly -108 to -96]

Results and Discussion

• On section 23 (the test section)
  – 327 passive sentences, 2,416 total
  – OTD = original training data (i.e. no Wow!)
  – Test data: -passwow1 = passives signaled using Wow!; -pass = passives signaled normally

[Figure: F-measure (roughly 82–90) on the passives-only test set for wsj-23-passwow1, wsj-23-pass, wsj-23-passwow1 (OTD), and wsj-23-pass (OTD)]

Results and Discussion

[Figure: F-measure (roughly 82.00–90.00) by test/training combination:
  – Test: Wow! NP object, trained on OTD
  – Test: 4th-word Wow!, trained on OTD
  – Test: Wow! NP object + regular passive, trained on OTD
  – Test: Wow! 4th word, trained on same
  – Test: OD, trained on Wow! 4th word
  – Test: OD, trained on Wow! NP object
  – (Baseline) Test: OD, trained on OTD
  – Test: OD, trained on Wow! NP object + regular passive
  – Test: Wow! NP object, trained on same
  – Test: Wow! NP object + regular passive, trained on same]

Results and Discussion

• Comparison:
  – Wow! as object plus passive morphology
  – Wow! inserted as NP object trace
  – Baseline (passive morphology)
  – Wow! 4th word

[Figure: F-measure (roughly 87.0–89.0) for Wow! as object + passive morphology, Wow! as object, and regular passive morphology]