Automatic Noun Compound Interpretation
Stephen Tratz, Eduard Hovy
University of Southern California/Information Sciences Institute
Noun Compound Definition
A head noun with one or more preceding noun modifiers
Examples: uranium rod, mountain cave, terrorist attack
Other names
– noun-noun compound
– noun sequence
– compound nominal
– complex nominal
The Problems

Problem 1: Relation between modifier and head nouns
– uranium rod (n1 substance-of n2)
– mountain cave (n1 location-of n2)
– terrorist attack (n1 performer-of n2)

Problem 2: Structure of long noun compounds
– ((aluminum soup) (pot cover))
– ((aluminum (soup pot)) cover)
– (aluminum ((soup pot) cover))
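The bracketing ambiguity grows quickly with compound length. A minimal illustrative sketch (not from the talk) that enumerates every binary bracketing of a word sequence:

```python
# Illustrative sketch: enumerate all binary bracketings of a noun compound.
def bracketings(words):
    """Return all binary bracketings of a word sequence as nested tuples."""
    if len(words) == 1:
        return [words[0]]
    results = []
    for i in range(1, len(words)):          # split point between left and right
        for left in bracketings(words[:i]):
            for right in bracketings(words[i:]):
                results.append((left, right))
    return results

for tree in bracketings(["aluminum", "soup", "pot", "cover"]):
    print(tree)
```

For a four-word compound this yields five bracketings (the Catalan number C3), including the three shown above.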
The Need
Needed for natural language understanding
– Question answering
– Recognition of textual entailment
– Summarization
– Summarization evaluation
– Etc.
Solution

A taxonomy of relations that occur between nouns in noun compounds
– Wide coverage of relations
– Good definitions
– High inter-annotator agreement

An automatic classification method
– For supervised approaches, this requires a sufficiently large annotated dataset
Weaknesses of earlier solutions
Limited-quality/usefulness relations
– Using an unlimited number of relations
– Using a few ambiguous relations, such as prepositions or a handful of simple verbs (e.g., BE, HAVE, CAUSE), which still need to be disambiguated!
Definitions sometimes not provided
Low inter-annotator agreement
Limited annotated data (a problem for supervised classification)
Dataset
Dataset of 17.5k noun compounds from:
– a large dataset extracted using mutual information plus part-of-speech tagging
– the WSJ portion of the Penn Treebank
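As a rough illustration of the mutual-information side of that extraction, a minimal pointwise mutual information (PMI) sketch over adjacent token pairs; the token list and scoring details are assumptions, not the authors' actual pipeline:

```python
import math
from collections import Counter

# Hypothetical sketch: score adjacent word pairs by pointwise mutual
# information, PMI(x, y) = log( p(x, y) / (p(x) * p(y)) ).
def pmi_scores(tokens):
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, nb = len(tokens), len(tokens) - 1
    return {
        (x, y): math.log((c / nb) / ((unigrams[x] / n) * (unigrams[y] / n)))
        for (x, y), c in bigrams.items()
    }
```

High-PMI adjacent noun pairs (after part-of-speech filtering) are plausible noun compound candidates.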
Dataset Comparison

| Size | Work |
|------:|------|
| 17509 | Tratz and Hovy, 2010 |
| 2169 | Kim and Baldwin, 2005 |
| 2031* | Girju, 2007 |
| 1660 | Rosario and Hearst, 2001 |
| 1443 | Ó Séaghdha and Copestake, 2007 |
| 505* | Barker and Szpakowicz, 1998 |
| 600* | Nastase and Szpakowicz, 2003 |
| 395 | Vanderwende, 1994 |
| 385 | Lauer, 1995 |
Our Semantic Relations

Relation taxonomy
– 43 relations
– Relatively fine-grained
– Defined using sentences
– Have rough mappings to relations used in well-known noun compound research papers
Relation examples

Substance/Material/Ingredient Of (uranium rod)
– n1 is one of the primary physical substances/materials/ingredients that n2 is made out of/from

Location Of (mountain cave)
– n1 is the location / geographic scope where n2 is at, near, from, generally found, or occurs.
All The Relations

– Communicator of Communication
– Performer of Act/Activity
– Creator/Provider/Cause Of
– Perform/Engage_In
– Create/Provide/Sell
– Obtain/Access/Seek
– Modify/Process/Change
– Mitigate/Oppose/Destroy
– Organize/Supervise/Authority
– Propel
– Protect/Conserve
– Transport/Transfer/Trade
– Traverse/Visit
– Possessor + Owned/Possessed
– Experiencer + Cognition/Mental
– Employer + Employee/Volunteer
– Consumer + Consumed
– User/Recipient + Used/Received
– Owned/Possessed + Possession
– Experiencer + Experiencer
– Thing Consumed + Consumer
– Thing/Means Used + User
– Time [Span] + X
– X + Time [Span]
– Location/Geographic Scope of X
– Whole + Part/Member Of
– Substance/Material/Ingredient + Whole
– Part/Member + Collection/Configuration/Series
– X + Spatial Container/Location/Bounds
– Topic of Communication/Imagery/Info
– Topic of Plan/Deal/Arrangement/Rules
– Topic of Observation/Study/Evaluation
– Topic of Cognition/Emotion
– Topic of Expert
– Topic of Situation
– Topic of Event/Process
– Topic/Thing + Attribute
– Topic/Thing + Attribute Value Characteristic Of
– Coreferential
– Partial Attribute Transfer
– Measure + Whole
– Highly Lexicalized / Fixed Pair
– Other
Taxonomy Creation

Taxonomy created by
– Inspecting a large number of examples
– Comparing relations to those in the literature
– Refining relations using Mechanical Turk
  – Upload data for annotation
  – Analyze annotations
  – Make changes
  – Repeat (5x)
Inter-annotator agreement study
What:
– Calculate the level of agreement between two or more sets of annotations
Why:
– Human agreement typically represents an upper bound on machine performance
How:
– Used Amazon's Mechanical Turk service to collect a set of annotations
Reasons for using Turk
• Inexpensive
– Paid $0.08 per decision
• Low startup time
– Don't have to wait long for people to start working
• Relatively fast turnaround
Problems with using Turk
• Mixed annotator quality
• No training for the annotators
• No guarantee of native English speakers
• Different number of annotations per Turker
– Can't force someone to annotate everything
– Problem for 3+ annotator agreement formula (Fleiss' Kappa)
Solution: Combine Turkers
• Requested 10 annotations per compound
• Calculated a weight for each Turker based upon his/her level of agreement with other Turkers
– Average percentage of annotations that agreed with the Turker
• Used weights to create a single set of annotations
• Ignored Turkers who performed fewer than 3 annotations
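The combination scheme above can be sketched as follows; the data structures and function names are hypothetical, not the authors' code:

```python
from collections import defaultdict

# Hypothetical sketch of weighted-vote combination of Turker annotations.
# `annotations` maps turker_id -> {compound: relation_label}.

def turker_weights(annotations, min_annotations=3):
    """Weight = average fraction of co-annotators agreeing with this Turker."""
    weights = {}
    for tid, labels in annotations.items():
        if len(labels) < min_annotations:
            continue  # ignore Turkers with fewer than 3 annotations
        agree, total = 0, 0
        for item, label in labels.items():
            for other, other_labels in annotations.items():
                if other != tid and item in other_labels:
                    total += 1
                    agree += (other_labels[item] == label)
        weights[tid] = agree / total if total else 0.0
    return weights

def combine(annotations, weights):
    """Pick, for each compound, the label with the highest total Turker weight."""
    votes = defaultdict(lambda: defaultdict(float))
    for tid, labels in annotations.items():
        for item, label in labels.items():
            votes[item][label] += weights.get(tid, 0.0)
    return {item: max(tally, key=tally.get) for item, tally in votes.items()}
```

Weighting votes by each Turker's agreement with the others damps the influence of low-quality annotators without discarding their labels outright.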
Agreement Scores
• Calculated raw agreement
– # agreements / # decisions
• Cohen's Kappa
– Adjusts for chance agreement
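A minimal sketch of the two scores, assuming two parallel lists of labels from a pair of annotators:

```python
from collections import Counter

def raw_agreement(a, b):
    """# agreements / # decisions for two parallel label lists."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Raw agreement adjusted for chance agreement:
    kappa = (p_o - p_e) / (1 - p_e), where p_e is the expected
    agreement given each annotator's label distribution."""
    po = raw_agreement(a, b)
    ca, cb = Counter(a), Counter(b)
    n = len(a)
    pe = sum(ca[label] * cb[label] for label in set(a) | set(b)) / (n * n)
    return (po - pe) / (1 - pe)
```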
Agreement Results

| Id | Agree% | κ | κ* | κ** |
|----------|------|------|------|------|
| Combined | 0.59 | 0.57 | 0.61 | 0.67 |
| Auto | 0.51 | 0.47 | 0.47 | 0.45 |
Individual Turker Agreement (vs author, N >= 15)

[Chart: agreement with the author for individual Turkers with 15 or more annotations, on a 0–0.8 scale, with the Auto and Combined levels shown for reference]
Turker vote weight vs Agreement

[Scatter plot: Turker vote weight (x-axis, 0.0–0.7) vs. agreement with the author (y-axis, 0.0–0.8)]

• Correlation between vote weight and agreement with the author, for Turkers who performed 15 or more annotations: 0.92
Comparison to other studies
Automatic Classification Method

Maximum Entropy classifier
– SVMmulticlass gave similar performance after optimizing the C parameter

Large number of boolean features extracted for each word from
– WordNet
– Roget's Thesaurus
– Web 1T Corpus
– the spelling of the words
Features Used

– Synonyms
– Hypernyms
– Definition words
– Lexicographer Ids
– Link types (e.g., part-of)
– List of different types of part-of-speech entries
– All parts-of
– Prefixes, suffixes
– Roget's division information
– Last letters of the word
– Trigrams and 4-grams from Web 1T corpus
– Some combinations of features (e.g., shared hypernyms)
– A handful of others
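As an illustration of the spelling-based boolean features (prefixes, suffixes, last letters), a hypothetical extractor; the feature names and affix lengths are assumptions, not the authors' exact feature set:

```python
# Hypothetical sketch: boolean spelling features for one word.
def spelling_features(word, max_affix=4, max_last=3):
    """Return the set of active spelling features for `word`."""
    feats = set()
    for i in range(1, min(max_affix, len(word)) + 1):
        feats.add(f"prefix:{word[:i]}")   # leading character n-grams
        feats.add(f"suffix:{word[-i:]}")  # trailing character n-grams
    for i in range(1, min(max_last, len(word)) + 1):
        feats.add(f"last{i}:{word[-i:]}") # last letters of the word
    return feats
```

Each feature becomes one boolean dimension in the classifier's input vector; a word either has the feature active or not.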
Cross-validation experiments
Performed one-feature-type-only and all-but-one experiments for the different types of features
Most useful features
– Hypernyms
– Definition words
– Synonyms
– Web 1T trigrams and 4-grams
Conclusion

Novel taxonomy
– 43 fine-grained relations
– Defined using sentences with placeholders for the nouns
– Achieved relatively high inter-annotator agreement given the difficulty of the task

Largest annotated dataset
– Over 8 times larger than the next largest

Automatic classification method
– Achieves performance approximately .10 below human inter-annotator agreement
Future Work
Address structural issues of longer (3+ word) compounds
Merge relation set with The Preposition Project (Litkowski, 2002) relations for prepositions
Integrate into a dependency parser
The End
Thank you for listening. Questions?