Punctuation: Making a Point
in Unsupervised Dependency Parsing
Valentin I. Spitkovsky
with Daniel Jurafsky (Stanford University)
and Hiyan Alshawi (Google Inc.)
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 1 / 25
Example Raw Text
Example: Raw Word Stream
ALTHOUGH IT PROBABLY HAS REDUCED THE LEVEL OF EXPENDITURES FOR SOME
PURCHASERS UTILIZATION MANAGEMENT LIKE MOST OTHER COST CONTAINMENT STRATEGIES DOESN’T APPEAR TO HAVE
ALTERED THE LONG-TERM RATE OF INCREASE IN HEALTH-CARE COSTS THE
INSTITUTE OF MEDICINE AN AFFILIATE OF THE NATIONAL ACADEMY OF SCIENCES CONCLUDED AFTER A TWO-YEAR STUDY
Example Unformatted Text
Example:
formatting (missing structural cues):
— e.g., punctuation and capitalization
raw word streams often difficult even for humans
— e.g., transcribed utterances (Kim and Woodland, 2002)
Example Unlexicalized Tokens
Example:
IN PRP RB VBZ VBN DT NN IN NNS IN DT
NNS NN NN IN RBS JJ NN NN NNS VBZ RB
VB TO VB VBN DT JJ NN IN NN IN JJ NNS
DT NNP IN NNP DT NN IN DT NNP NNP IN
NNPS VBD IN DT JJ NN
Example Formatted Text
Example:
[SBAR Although it probably has reduced the level of expenditures for some purchasers], [NP utilization
management] — [PP like most other cost containment strategies] — [VP doesn’t appear to
have altered the long-term rate of increase in health-care costs], [NP the Institute of Medicine],
[NP an affiliate of the National Academy of
Sciences], [VP concluded after a two-year study].
Intuition Strong Cues
Intuition:
punctuation is a strong structural cue
— demarcates separable fragments
we will make simplifying independence assumptions
— (unreasonably) strong in training
less crude in inference
— (reasonably) weak in final decoding
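To make "separable fragments" concrete, here is a minimal sketch of how a tokenized sentence could be cut into inter-punctuation fragments. The helper name `split_fragments` and the punctuation inventory are assumptions made for illustration; the slides do not specify either.

```python
# Hypothetical helper, not from the slides: split a tokenized sentence into
# inter-punctuation fragments (maximal runs of non-punctuation tokens).

# Assumption: an illustrative punctuation inventory, not the authors' list.
PUNCTUATION = {",", ".", ";", ":", "?", "!", "--", "(", ")", "``", "''", '"'}


def split_fragments(tokens):
    """Return a list of fragments, each a list of token indices."""
    fragments, current = [], []
    for i, tok in enumerate(tokens):
        if tok in PUNCTUATION:
            # A punctuation mark closes the fragment collected so far.
            if current:
                fragments.append(current)
                current = []
        else:
            current.append(i)
    if current:
        fragments.append(current)
    return fragments


tokens = ("Other countries , including West Germany , may have a hard "
          "time justifying continued membership .").split()
fragments = split_fragments(tokens)
for frag in fragments:
    print(" ".join(tokens[i] for i in frag))
```

Fragments are kept as index lists rather than strings so that they can later be compared against the head indices of a dependency tree.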
Intuition Strong Assumption
Intuition:
strong constraint: (head ← head) in training
word head , head word word ,
head word word word word word word word .
Other countries , including West Germany ,
may have a hard time justifying continued membership .
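One possible reading of the (head ← head) notation is sketched below: each fragment has exactly one word attached outside it (its head), and that head attaches only to the root or to another fragment's head. This reading, the function names, the punctuation inventory and the hand-chosen arcs for the example sentence are all assumptions made for illustration; the arcs drawn on the original slide did not survive extraction.

```python
# A sketch under stated assumptions, not the authors' implementation.
# heads[i] is the index of token i's parent, -1 for the root,
# and None for punctuation tokens (which are ignored).

# Assumption: an illustrative punctuation inventory.
PUNCTUATION = {",", ".", ";", ":", "?", "!", "--"}


def fragment_ids(tokens):
    """Map each token to its fragment number (None for punctuation)."""
    ids, frag, inside = [], -1, False
    for tok in tokens:
        if tok in PUNCTUATION:
            ids.append(None)
            inside = False
        else:
            if not inside:
                frag += 1
                inside = True
            ids.append(frag)
    return ids


def satisfies_strong(tokens, heads):
    """Check the assumed strong constraint on a dependency tree."""
    ids = fragment_ids(tokens)
    # Per fragment: words whose parent is the root or lies outside the fragment.
    tops = {}
    for i, h in enumerate(heads):
        if ids[i] is None:
            continue
        tops.setdefault(ids[i], [])
        if h == -1 or ids[h] != ids[i]:
            tops[ids[i]].append(i)
    if any(len(words) != 1 for words in tops.values()):
        return False
    head_words = {words[0] for words in tops.values()}
    # Every fragment head must attach to the root or to another fragment head.
    for head in head_words:
        parent = heads[head]
        if parent != -1 and parent not in head_words:
            return False
    return True


tokens = ("Other countries , including West Germany , may have a hard "
          "time justifying continued membership .").split()
# Hand-chosen arcs (illustrative only): countries <- may, including <- countries.
heads = [1, 7, None, 1, 5, 3, None, -1, 7, 11, 11, 8, 11, 14, 12, None]
print(satisfies_strong(tokens, heads))

# Attaching "including" to the non-head word "Other" breaks the constraint.
bad_heads = list(heads)
bad_heads[3] = 0
print(satisfies_strong(tokens, bad_heads))
```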
Intuition Weak Assumption
Intuition:
weak constraint: (head ← external word) in inference
word word head word word word ,
head word word word word word word word .
IFI also has nonvoting preferred shares ,
which are quoted on the Milan stock exchange .
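The weaker (head ← external word) constraint can be sketched in the same spirit: each fragment still has exactly one word attached outside it, but that word may attach to any external word, not only to another fragment's head. As before, this reading, the names, the punctuation inventory and the hand-chosen arcs are assumptions for illustration, since the arcs on the original slide were lost in extraction.

```python
# A sketch under stated assumptions, not the authors' implementation.
# heads[i] is the index of token i's parent, -1 for the root,
# and None for punctuation tokens (which are ignored).

# Assumption: an illustrative punctuation inventory.
PUNCTUATION = {",", ".", ";", ":", "?", "!", "--"}


def fragment_ids(tokens):
    """Map each token to its fragment number (None for punctuation)."""
    ids, frag, inside = [], -1, False
    for tok in tokens:
        if tok in PUNCTUATION:
            ids.append(None)
            inside = False
        else:
            if not inside:
                frag += 1
                inside = True
            ids.append(frag)
    return ids


def external_arcs(tokens, heads):
    """Per fragment, the (word, parent) arcs that leave the fragment."""
    ids = fragment_ids(tokens)
    arcs = {}
    for i, h in enumerate(heads):
        if ids[i] is None:
            continue
        arcs.setdefault(ids[i], [])
        if h == -1 or ids[h] != ids[i]:
            arcs[ids[i]].append((i, h))
    return arcs


def satisfies_weak(tokens, heads):
    """Assumed weak constraint: exactly one outgoing arc per fragment."""
    return all(len(a) == 1 for a in external_arcs(tokens, heads).values())


tokens = ("IFI also has nonvoting preferred shares , which are quoted "
          "on the Milan stock exchange .").split()
# Hand-chosen arcs (illustrative only): "are" attaches to "shares",
# which is not the head ("has") of the first fragment.
heads = [2, 2, -1, 5, 5, 2, None, 8, 5, 8, 9, 14, 14, 14, 10, None]
print(satisfies_weak(tokens, heads))

# A second word of the relative clause attaching outside breaks the constraint.
bad_heads = list(heads)
bad_heads[7] = 5
print(satisfies_weak(tokens, bad_heads))
```

Under these assumptions the example tree passes the weak check even though the second fragment's head attaches to a non-head word, which is the kind of attachment the strong (head ← head) reading would rule out.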
Linguistic Analysis Constituents
Linguistic Analysis:
punctuation and syntax are related
(Nunberg, 1990; Briscoe, 1994; Jones, 1994; Doran, 1998, inter alia)
49.4% of inter-punctuation fragments are constituents
lowest dominating non-terminals:
         %
S     32.5
NP    27.2
VP    13.3
PP    10.1
SBAR   6.7
ADVP   3.3
QP     2.5
SINV   2.0
ADJP   1.0
      ----
      98.5
Linguistic Analysis Strong Dependencies
Linguistic Analysis:
strong (in training), e.g.,
... arrests followed a “ Snake Day ” at Utrecht ...
— already 74.0% agreement with head-percolated trees
Linguistic Analysis Weak Dependencies
Linguistic Analysis:
weak (in inference), e.g.,
Maryland Club also distributes tea , which ...
— now 92.9% agreement with head-percolated trees
Linguistic Analysis Violations
Linguistic Analysis:
generalization:
— no path from the root may enter a fragment twice
— 95.0% agreement with head-percolated trees
simple violations: “seamless” quotations and even lists
Her recent report classifies the stock as a “hold.”
The company said its directors , management and
subsidiaries will remain long-term investors and ...
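The generalization "no path from the root may enter a fragment twice" can be checked by walking from the root to every word and counting fragment entries along the way. The sketch below assumes a well-formed tree in which no word attaches to punctuation. The function names, the punctuation inventory and the hand-chosen arcs are illustrative assumptions; in particular, the violating tree assumes the coordinated subject is headed by its last noun.

```python
# A sketch under stated assumptions, not the authors' implementation.
# heads[i] is the index of token i's parent, -1 for the root,
# and None for punctuation tokens (which are ignored).

# Assumption: an illustrative punctuation inventory.
PUNCTUATION = {",", ".", ";", ":", "?", "!", "--"}


def fragment_ids(tokens):
    """Map each token to its fragment number (None for punctuation)."""
    ids, frag, inside = [], -1, False
    for tok in tokens:
        if tok in PUNCTUATION:
            ids.append(None)
            inside = False
        else:
            if not inside:
                frag += 1
                inside = True
            ids.append(frag)
    return ids


def path_from_root(i, heads):
    """Token indices on the path from the root down to token i."""
    path = []
    while i != -1:
        path.append(i)
        i = heads[i]
    return path[::-1]


def enters_fragment_twice(tokens, heads):
    """True if some root-to-word path re-enters a fragment it already left."""
    ids = fragment_ids(tokens)
    for i in range(len(tokens)):
        if ids[i] is None:
            continue
        entered, previous = set(), None
        for word in path_from_root(i, heads):
            frag = ids[word]
            if frag != previous:
                if frag in entered:
                    return True
                entered.add(frag)
            previous = frag
    return False


# The elided tail of the slide's example ("and ...") is left out.
tokens = ("The company said its directors , management and subsidiaries "
          "will remain long-term investors").split()
# Hand-chosen arcs: the list is headed by "subsidiaries", so "directors"
# hangs off the second fragment while sitting in the first.
heads = [1, 2, -1, 8, 8, None, 8, 8, 9, 2, 9, 12, 10]
print(enters_fragment_twice(tokens, heads))

# Artificial variant in which "directors" attaches directly to "said".
other_heads = [1, 2, -1, 4, 2, None, 8, 8, 9, 2, 9, 12, 10]
print(enters_fragment_twice(tokens, other_heads))
```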
Motivation
Motivation: “Profiting from Markup”
..., whereas McCain is secure on the topic, Obama <a>[VP worries about winning the pro-Israel vote]</a>.
“Capitalizing on Punctuation”
— more common (particularly in long sentences)
— more uniform (better coverage of constructs)
The Problem Input/Output
Problem: Unsupervised Learning of Parsing

| NN | NNS | VBD | IN | NN | ♦ |
|---|---|---|---|---|---|
| Factory | payrolls | fell | in | September | . |

Input: Raw Text (Sentences, Tokens and POS-tags)
... By most measures, the nation’s industrial sector is now growing very slowly — if at all. Factory payrolls fell in September. So did the Federal Reserve ...
Output: Syntactic Structures (and a Probabilistic Grammar)
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 14 / 25
![Page 65: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/65.jpg)
Methodology Scoring
Scoring: Directed Dependency Accuracy

| NN | NNS | VBD | IN | NN | ♦ |
|---|---|---|---|---|---|
| Factory | payrolls | fell | in | September | . |

Directed score: 3/5 = 60% (right/left-branching baselines: 2/5 = 40%).
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 15 / 25
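The baseline figure on this slide can be reproduced with a short sketch. The transcript does not preserve the dependency arcs drawn on the slide, so the gold analysis below is an assumption (a conventional analysis of the sentence), and `directed_accuracy` is an illustrative helper name, not code from the talk. Heads are word positions 1 to 5, 0 marks the root, and the final period is left unscored.

```python
def directed_accuracy(gold_heads, guess_heads):
    """Return (correct, total): how many guessed heads match the gold heads."""
    if len(gold_heads) != len(guess_heads):
        raise ValueError("head lists must have the same length")
    correct = sum(g == p for g, p in zip(gold_heads, guess_heads))
    return correct, len(gold_heads)


words = ["Factory", "payrolls", "fell", "in", "September"]

# ASSUMED gold heads (not recoverable from the transcript):
# Factory -> payrolls, payrolls -> fell, fell -> root, in -> fell, September -> in
gold = [2, 3, 0, 3, 4]

n = len(words)
# Chain baselines: every word attaches to its neighbor on one side.
attach_to_next = [i + 1 for i in range(1, n)] + [0]  # [2, 3, 4, 5, 0]
attach_to_prev = [0] + list(range(1, n))             # [0, 1, 2, 3, 4]

for name, guess in [("attach to next word", attach_to_next),
                    ("attach to previous word", attach_to_prev)]:
    correct, total = directed_accuracy(gold, guess)
    print(f"{name}: {correct}/{total} = {100 * correct / total:.0f}%")
```

Under this assumed gold analysis both chain baselines get two of five heads right, which agrees with the 2/5 = 40% quoted on the slide.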
![Page 76: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/76.jpg)
Methodology Model
DMV: Dependency Model with Valence
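a head-outward model, with word classes and valence/adjacency (Klein and Manning, 2004)
[Diagram: head h generates arguments a1 and a2, then STOP]
$$P(t_h) = \prod_{dir \in \{L,R\}} P_{\mathrm{STOP}}(c_h, dir, \overbrace{\mathbf{1}_{n=0}}^{adj}) \prod_{i=1}^{n} P(t_{a_i}) \, P_{\mathrm{ATTACH}}(c_h, dir, c_{a_i}) \, \bigl(1 - P_{\mathrm{STOP}}(c_h, dir, \overbrace{\mathbf{1}_{i=1}}^{adj})\bigr)$$
where $n = |args(h, dir)|$
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 16 / 25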
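As a rough illustration of how the subtree probability on this slide is evaluated, here is a minimal recursive sketch. The function name, the dictionary layout and all parameter values are made up for the example; they are not trained DMV parameters, and the root attachment step is left out.

```python
def subtree_prob(h, tags, children, p_stop, p_attach):
    """P(t_h) for the subtree rooted at word index h.

    children[(h, d)] lists the arguments of h on side d ("L" or "R"),
    ordered from nearest to farthest (head-outward).
    p_stop is keyed by (head class, direction, adjacent?),
    p_attach by (head class, direction, argument class).
    """
    prob = 1.0
    for d in ("L", "R"):
        args = children.get((h, d), [])
        n = len(args)
        # Final STOP decision: adjacent only if no argument was generated.
        prob *= p_stop[(tags[h], d, n == 0)]
        for i, a in enumerate(args, start=1):
            # Decide not to stop (adjacent only before the first argument),
            prob *= 1.0 - p_stop[(tags[h], d, i == 1)]
            # pick the class of the argument,
            prob *= p_attach[(tags[h], d, tags[a])]
            # and generate the argument's own subtree.
            prob *= subtree_prob(a, tags, children, p_stop, p_attach)
    return prob


# Toy example: "payrolls fell", with "fell" (index 1) heading "payrolls" (index 0).
tags = ["NNS", "VBD"]
children = {(1, "L"): [0]}

# HYPOTHETICAL parameter values, chosen only to keep the arithmetic easy to follow.
p_stop = {
    ("VBD", "L", True): 0.25, ("VBD", "L", False): 0.5,
    ("VBD", "R", True): 0.5,
    ("NNS", "L", True): 0.5, ("NNS", "R", True): 0.5,
}
p_attach = {("VBD", "L", "NNS"): 0.5}

print(subtree_prob(1, tags, children, p_stop, p_attach))
```

With these numbers the left side of "fell" contributes 0.5 × 0.75 × 0.5 × 0.25 and the right side contributes 0.5, so the product is 0.0234375.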
![Page 79: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/79.jpg)
Methodology Learning
Learning: Viterbi EM
well-suited to long sentences, which are more punctuation-rich (Spitkovsky et al., CoNLL 2010)
fast, simple and easily admits constraints (Spitkovsky et al., ACL 2010)
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 17 / 25
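To make the shape of Viterbi (hard) EM concrete, here is a toy sketch on a deliberately simple model: a mixture of two biased coins rather than the DMV. Everything in it (the model, the data, the starting values, the function names) is an assumption chosen for illustration. The hard E-step commits to the single best hidden choice per example, and the M-step re-estimates parameters from those choices alone.

```python
import math


def log_lik(heads, flips, theta):
    """Log-likelihood of seeing `heads` heads in `flips` tosses of a coin with bias theta."""
    return heads * math.log(theta) + (flips - heads) * math.log(1.0 - theta)


def viterbi_em(head_counts, flips, thetas, max_iters=20):
    thetas = list(thetas)
    assignment = None
    for _ in range(max_iters):
        # Hard E-step: choose the single most likely coin for each sequence.
        new_assignment = [
            max(range(len(thetas)), key=lambda k: log_lik(h, flips, thetas[k]))
            for h in head_counts
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the estimates would not change either
        assignment = new_assignment
        # M-step: re-estimate each bias from the sequences assigned to that coin.
        for k in range(len(thetas)):
            mine = [h for h, z in zip(head_counts, assignment) if z == k]
            if mine:
                estimate = sum(mine) / (flips * len(mine))
                # Clamp away from 0 and 1 so the logarithms stay finite.
                thetas[k] = min(max(estimate, 1e-6), 1.0 - 1e-6)
    return thetas, assignment


# Made-up data: number of heads in six sequences of ten tosses each.
head_counts = [9, 8, 2, 1, 9, 2]
thetas, assignment = viterbi_em(head_counts, flips=10, thetas=[0.6, 0.4])
print(thetas, assignment)
```

In a parser, the hard E-step would be a best-parse search instead of a two-way comparison, and a constraint could be imposed simply by restricting which candidates that search is allowed to return.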
![Page 82: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/82.jpg)
Constraints
Constraints: Parser Induction
the model, i.e., projective trees (Klein and Manning, 2004)
- Dependency Model with Valence (DMV)
(((List (the fares (for ((flight) (number 891)))))) .)
partial bracketings (Pereira and Schabes, 1992)
- synchronous grammars (Alshawi and Douglas, 2000)
- linear-time parsing (Seginer, 2007)
- skewness of trees (Seginer, 2007)
- Zipfian distribution of words (Seginer, 2007)
- sparse posterior regularization (Ganchev et al., 2009)
- web markup-induced constraints (Spitkovsky et al., 2010)
- semantic cues (Naseem and Barzilay, 2011)
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 18 / 25
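One way to read a partial bracketing as a constraint is that a candidate constituent is ruled out if it crosses any given bracket. The sketch below parses the bracketed example from this slide into token spans and tests candidates against them. The helper names and the half-open span convention are assumptions made for illustration, not the formulation of any cited paper.

```python
def bracket_spans(text):
    """Parse a bracketed string into (tokens, spans); spans are half-open (start, end)."""
    tokens, spans, stack = [], [], []
    for piece in text.replace("(", " ( ").replace(")", " ) ").split():
        if piece == "(":
            stack.append(len(tokens))
        elif piece == ")":
            spans.append((stack.pop(), len(tokens)))
        else:
            tokens.append(piece)
    return tokens, spans


def crosses(span, bracket):
    """True if the two spans overlap without one containing the other."""
    (i, j), (k, l) = span, bracket
    return i < k < j < l or k < i < l < j


def is_consistent(span, brackets):
    """A candidate span is allowed only if it crosses no bracket."""
    return not any(crosses(span, b) for b in brackets)


tokens, brackets = bracket_spans(
    "(((List (the fares (for ((flight) (number 891)))))) .)"
)
print(tokens)
print(is_consistent((1, 3), brackets))  # "the fares"
print(is_consistent((2, 4), brackets))  # "fares for"
```

Here "the fares" sits inside every bracket that overlaps it and is allowed, while "fares for" straddles the left edge of the bracket around "for flight number 891" and is rejected.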
![Page 90: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/90.jpg)
Experimental Results Unlexicalized
Experimental Results: Unlexicalized
directed dependency accuracies for baselines, inference, training and an oracle:

| WSJ∞ (Section 23, all sentences) | Directed accuracy |
|---|---|
| Punctuation as Words | 41.7 (-10.3) |
| Standard Training | 52.0 |
| w/Constrained Inference | 54.0 (+2.0) |
| Constrained Training | 55.6 (+3.6) |
| w/Constrained Inference | 57.4 (+1.8) |
| Supervised DMV | 69.8 |
| w/Constrained Inference | 73.0 (+3.2) |

Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 19 / 25
![Page 97: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/97.jpg)
Experimental Results Lexicalized
Experimental Results: Lexicalized
directed dependency accuracies compared to previous state-of-the-art

| | WSJ∞ |
|---|---|
| (Spitkovsky et al., ACL 2010) | 50.4 |
| Lexicalized (Gillenwater et al., 2010) | 53.3 |
| Lexicalized (Blunsom and Cohn, 2010) | 55.7 |
| Unlexicalized | 57.4 |
| Lexicalized Constrained Training | 58.0 |
| w/Constrained Inference | 58.4 |

Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 20 / 25
![Page 104: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/104.jpg)
Experimental Results Without Gold Tags
Experimental Results: “Fully” Unsupervised
constraints sufficiently strong to abandon gold tags

| | WSJ∞ |
|---|---|
| (this work) | 58.4 |
| w/o Gold Tags | 58.2 |

using Clark’s (2000) unsupervised clusters
- constructed by Finkel and Manning (2009) for NER
http://nlp.stanford.edu/software/stanford-postagger-2008-09-28.tar.gz: models/egw.bnc.200
(Come see our poster at EMNLP!)
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 21 / 25
![Page 107: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/107.jpg)
Experimental Results Multi-Lingual
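Experimental Results: Multi-Lingual
further evaluation against CoNLL 2006/7 data sets
- results generalize across languages:

| Language | Year | Inference Only | Training & Inference |
|---|---|---|---|
| Arabic | 2006 | +0.1 | +1.1 |
| Arabic | 2007 | +0.9 | +2.6 |
| Basque | 2007 | +0.8 | +0.6 |
| Bulgarian | 2006 | +1.1 | +1.6 |
| Catalan | 2007 | +0.8 | +0.9 |
| Czech | 2006 | +0.9 | +3.0 |
| Czech | 2007 | +1.0 | +2.7 |
| Danish | 2006 | +0.9 | +0.2 |
| Dutch | 2006 | +1.0 | +3.0 |
| English | 2007 | +1.3 | +2.8 |
| German | 2006 | +0.8 | +1.6 |
| Greek | 2007 | +0.5 | +0.7 |
| Hungarian | 2007 | +0.4 | +1.4 |
| Italian | 2007 | +0.1 | -0.8 |
| Japanese | 2006 | +0.0 | +0.1 |
| Portuguese | 2006 | +0.7 | +0.8 |
| Slovenian | 2006 | +2.0 | +2.8 |
| Spanish | 2006 | +0.8 | +0.8 |
| Swedish | 2006 | +0.5 | +0.8 |
| Turkish | 2006 | +0.1 | +1.0 |
| Turkish | 2007 | +0.2 | +0.1 |
| Average: | | +0.7 | +1.3 |
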
Spitkovsky et al. (Stanford & Google) Punctuation: Making a Point CoNLL (2011-06-23) 22 / 25
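The two averages can be checked against the per-data-set gains. The sketch below assumes the average is an unweighted mean over the 21 listed data sets, which the slide does not state explicitly; the numbers are copied from the slide and the variable names are illustrative.

```python
# (data set, gain from inference only, gain from training & inference)
rows = [
    ("Arabic 2006", 0.1, 1.1), ("Arabic 2007", 0.9, 2.6),
    ("Basque 2007", 0.8, 0.6), ("Bulgarian 2006", 1.1, 1.6),
    ("Catalan 2007", 0.8, 0.9), ("Czech 2006", 0.9, 3.0),
    ("Czech 2007", 1.0, 2.7), ("Danish 2006", 0.9, 0.2),
    ("Dutch 2006", 1.0, 3.0), ("English 2007", 1.3, 2.8),
    ("German 2006", 0.8, 1.6), ("Greek 2007", 0.5, 0.7),
    ("Hungarian 2007", 0.4, 1.4), ("Italian 2007", 0.1, -0.8),
    ("Japanese 2006", 0.0, 0.1), ("Portuguese 2006", 0.7, 0.8),
    ("Slovenian 2006", 2.0, 2.8), ("Spanish 2006", 0.8, 0.8),
    ("Swedish 2006", 0.5, 0.8), ("Turkish 2006", 0.1, 1.0),
    ("Turkish 2007", 0.2, 0.1),
]

# Unweighted (macro) means over data sets -- an ASSUMPTION about how the slide averages.
inference_avg = sum(r[1] for r in rows) / len(rows)
training_avg = sum(r[2] for r in rows) / len(rows)

print(f"Inference Only: {inference_avg:+.1f}")
print(f"Training & Inference: {training_avg:+.1f}")
```

Rounded to one decimal place, the means come out to +0.7 and +1.3, matching the Average row. Italian 2007 is the only data set where constrained training lowers accuracy.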
![Page 113: Punctuation: Making a Point · — e.g., punctuation and capitalization raw word streams often difficult even for humans — e.g., transcribed utterances (Kim and Woodland, 2002)](https://reader036.vdocuments.site/reader036/viewer/2022071016/5fcfccea588b316e594960b0/html5/thumbnails/113.jpg)
Conclusion Thoughts
Thoughts:
extend existing parsers
  - no need to retrain models
  - supervised systems?
would prosody aid with induction from speech?
  - “as words” breaks n-grams (Kahn et al., 2005)
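The point about “as words” breaking n-grams can be sketched in a few lines. This is an illustration only: the break token `<BRK>`, the four-word stream, and the helper `bigrams` are assumptions made for the example, not anything taken from the talk or from Kahn et al. (2005). Once a break marker is inserted into the stream as if it were a word, the bigram that linked the words on either side of it is gone.

```python
def bigrams(tokens):
    """Return the list of adjacent token pairs in a token sequence."""
    return list(zip(tokens, tokens[1:]))


# Hypothetical word stream and break marker (illustrative only).
words = ["costs", "the", "institute", "concluded"]
with_break = ["costs", "<BRK>", "the", "institute", "concluded"]

plain = set(bigrams(words))
marked = set(bigrams(with_break))

# Word-to-word bigrams that disappear once the break is a token in the stream.
lost = plain - marked
# Bigrams that now involve the break token instead of a real word pair.
added = marked - plain

print("lost: ", sorted(lost))
print("added:", sorted(added))
```

Keeping the break information outside the token sequence, for example as a constraint on the parser, leaves the word-to-word contexts intact.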
Conclusion Summary
Summary:
punctuation helps dependency grammar induction
  - even better than markup...
a popular approach: powerful models
  - priors prevent overfitting
an alternative: overly simple models
  - constraints prevent underfitting
Conclusion Thanks! Questions?
Thanks!
Punctuation. It works...
Any questions?