Linguistically Rich Statistical Models of Language

Joseph Smarr, M.S. Candidate
Symbolic Systems Program
Advisor: Christopher D. Manning
December 5th, 2002
Grand Vision
- Talk to your computer like another human (HAL, Star Trek, etc.)
- Ask your computer a question, and it finds the answer: "Who's speaking at this week's SymSys Forum?"
- Computer can read and summarize text for you: "What's the cutting edge in NLP these days?"
We’re Not There (Yet)
- Turns out behaving intelligently is difficult
- What does it take to achieve the grand vision?
  - General Artificial Intelligence problems: knowledge representation, common-sense reasoning, etc.
  - Language-specific problems: complexity, ambiguity, and flexibility of language
- Always underestimated because language is so easy for us!
Are There Useful Sub-Goals?
- The grand vision is still too hard, but we can solve simpler problems that are still valuable:
  - Filter news for stories about new tech gadgets
  - Take the SSP talk email and add it to my calendar
  - Dial my cell phone by speaking my friend's name
  - Automatically reply to customer service e-mails
  - Find out which episode of The Simpsons is on tonight
- Two approaches to understanding language:
  - Theory-driven: Theoretical Linguistics
  - Task-driven: Natural Language Processing
Theoretical Linguistics vs. NLP
Theoretical Linguistics
- Goal: understand people's knowledge of language
- Method: rich logical representations of language's hidden structure and meaning
- Guiding principles:
  - Separation of (hidden) knowledge of language and (observable) performance
  - Grammaticality is categorical (all or none)
  - Describe what are possible and impossible utterances

Natural Language Processing
- Goal: develop practical tools for analyzing speech / text
- Method: simple, robust models of everyday language use that are sufficient to perform tasks
- Guiding principles:
  - Exploit (empirical) regularities and patterns in examples of language in text collections
  - Sentence "goodness" is gradient (better or worse)
  - Deal with the utterances you're given, good or bad
Theoretical Linguistics vs. NLP
[Table: side-by-side comparison of the linguistics and NLP approaches]
Linguistic Puzzle
- When dropping an argument, why do some verbs keep the subject and some keep the object?
  - "John sang the song" → "John sang" (subject kept)
  - "John broke the vase" → "The vase broke" (object kept)
- Not just "quirkiness of language": similar patterns show up in other languages, and this seems to involve deep aspects of verb meaning
- Rules to account for this phenomenon:
  - Two classes of verbs (unergative & unaccusative)
  - The remaining argument must be realized as subject
  - Exception: imperatives ("Open the pod bay doors, HAL")
- Different goals lead to the study of different problems. In NLP:
  - Need to recognize this as a command
  - Need to figure out what specific action to take
  - Irrelevant how you'd say it in French
- Describing language vs. working with language, but both tasks clearly share many sub-problems
Theoretical Linguistics vs. NLP
- Potential for much synergy between linguistics and NLP
- However, historically they have remained quite distinct:
  - Chomsky (founder of generative grammar): "It must be recognized that the notion 'probability of a sentence' is an entirely useless one, under any known interpretation of this term."
  - Karttunen (founder of finite state technologies at Xerox), on linguists' reaction to NLP: "Not interested. You do not understand Theory. Go away you geek."
  - Jelinek (former head of IBM speech project): "Every time I fire a linguist, the performance of our speech recognition system goes up."
Potential Synergies
- Lexical acquisition (unknown words): statistically infer new lexical entries from context
- Modeling "naturalness" and "conventionality": use corpus data to weight constructions
- Dealing with ungrammatical utterances: find the "most similar / most likely" correction
- Richer patterns for finding information in text: use argument structure / semantic dependencies
- More powerful models for speech recognition: progressively build the parse tree while listening
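The "most similar correction" idea can be sketched with plain string similarity. The phrase list and the `difflib`-based matcher below are illustrative inventions, not part of the talk; a real system would rank candidates with a learned language model rather than edit distance alone.

```python
# Sketch of "most similar correction" for ungrammatical input, using
# Python's standard-library difflib over a toy list of known-good phrasings.
import difflib

known_phrases = ["the vase broke", "john sang the song", "open the pod bay doors"]

def most_similar_correction(utterance, candidates):
    """Return the candidate closest to the (possibly ungrammatical) input."""
    matches = difflib.get_close_matches(utterance.lower(), candidates, n=1, cutoff=0.0)
    return matches[0] if matches else None

print(most_similar_correction("the vase brok", known_phrases))  # "the vase broke"
```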
Finding Information in Text
- The US Government has sponsored lots of research in "information extraction" from news articles
  - Find mentions of terrorists and which locations they're targeting
  - Find which companies are being acquired by which others, and for how much
- Progress has been driven by simplifying the models used
  - Early work used rich linguistic parsers, but was unable to robustly handle natural text
  - Modern work is mainly finite-state patterns: regular expressions are very practical and successful
Web Information Extraction
- How much does that textbook cost on Amazon?
- Learn patterns for finding relevant fields:
  - Concept: Book
  - Title: Foundations of Statistical Natural Language Processing
  - Author(s): Christopher D. Manning & Hinrich Schütze
  - Price: $58.45 (matched by the learned pattern "Our Price: $##.##")
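The "$##.##" field pattern corresponds directly to a regular expression. A minimal sketch, where the page snippet is a made-up stand-in for a real product page:

```python
# The "Our Price: $##.##" pattern as a regular expression; the `page`
# string below is an invented stand-in for an Amazon product page.
import re

page = """
Title: Foundations of Statistical Natural Language Processing
Author(s): Christopher D. Manning & Hinrich Schütze
Our Price: $58.45
"""

price_pattern = re.compile(r"Our Price:\s*\$(\d+\.\d{2})")
match = price_pattern.search(page)
print(match.group(1))  # "58.45"
```

In practice such patterns are learned from labeled examples rather than hand-written, but the resulting extractors are exactly this kind of finite-state matcher.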
Improving IE Performance on Natural Text Documents
- How can we scale IE back up for natural text? We need to look elsewhere for regularities to exploit
- Idea: consider grammatical structure
  - Run a shallow parser on each sentence
  - Flatten the output into a sequence of "typed chunks"
- Example of a tagged sentence:

  "Uba2p | is located largely | in | the nucleus."
  NP_SEG | VP_SEG | PP_SEG | NP_SEG
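The flattening step might look like the sketch below. The chunk boundaries here are hand-written for this one sentence, not produced by a real shallow parser:

```python
# Sketch of flattening shallow-parser output into a sequence of typed chunks.
# The chunk segmentation is hand-specified for illustration only.
chunks = [
    ("NP_SEG", ["Uba2p"]),
    ("VP_SEG", ["is", "located", "largely"]),
    ("PP_SEG", ["in"]),
    ("NP_SEG", ["the", "nucleus", "."]),
]

def flatten(chunks):
    """Emit one (token, chunk-type) pair per word."""
    return [(tok, label) for label, toks in chunks for tok in toks]

for tok, label in flatten(chunks):
    print(f"{tok}\t{label}")
```

The resulting token/chunk-type sequence can then feed a finite-state extractor just like raw tokens would, but with grammatical type information attached.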
Power of Linguistic Features
[Chart: adding linguistic chunk features yields performance increases of 21%, 45%, and 65% across tasks]
Linguistically Rich(er) IE
- Exploit more grammatical structure for patterns, e.g. Tim Grow's work on IE with PCFGs
[Parse tree: a PCFG analysis of "First Union Corp will acquire Sheland Bank Inc for three million dollars", with semantic role annotations propagated through the tree: {pur} (purchaser) on "First Union Corp", {acq} (acquired company) on "Sheland Bank Inc", and {amt} (amount) on "three million dollars"]
Classifying Unknown Words
- Which of the following is the name of a city?
  - Cotrimoxazole
  - Wethersfield
  - Alien Fury: Countdown to Invasion
- Most linguistic grammars assume a fixed lexicon
- How do humans learn to deal with new words?
  - Context ("I spent a summer living in Wethersfield")
  - Makeup of the word itself ("phonesthetics")
- Idea: learn distinguishing letter sequences
What’s in a Name?
- Distinguishing letter sequences, e.g. "oxa" (suggests a drug name) vs. "field" (suggests a place name)
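Finding such distinguishing sequences amounts to comparing character n-gram counts across name classes. A minimal sketch with toy word lists (the lists and the trigram choice are illustrative assumptions):

```python
# Sketch: count character trigrams per name class to surface sequences
# (like "oxa" vs. "field") that distinguish the classes. Toy word lists.
from collections import Counter

def char_ngrams(word, n=3):
    word = f"^{word.lower()}$"          # mark word boundaries
    return [word[i:i + n] for i in range(len(word) - n + 1)]

drugs = ["cotrimoxazole", "amoxicillin", "naproxen"]
places = ["wethersfield", "springfield", "hartford"]

drug_counts = Counter(g for w in drugs for g in char_ngrams(w))
place_counts = Counter(g for w in places for g in char_ngrams(w))

# Trigrams far more frequent in one class are candidate distinguishers.
print("oxa:", drug_counts["oxa"], "vs", place_counts["oxa"])
print("eld:", drug_counts["eld"], "vs", place_counts["eld"])
```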
Generative Model of PNPs
- Length n-gram model and word model:

  P(pnp|c) = P_n-gram(word-lengths(pnp)) × ∏_{wᵢ ∈ pnp} P(wᵢ | word-length(wᵢ))

- Word model: mixture of a character n-gram model and a common-word model:

  P(wᵢ|len) = λ_len · P_n-gram(wᵢ|len)^(k/len) + (1 − λ_len) · P_word(wᵢ|len)

- N-gram models: deleted interpolation

  P_0-gram(symbol|history) = uniform distribution
  P_n-gram(s|h) = λ_C(h) · P_empirical(s|h) + (1 − λ_C(h)) · P_(n−1)-gram(s|h)
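The deleted-interpolation recursion can be sketched as follows. For simplicity this sketch replaces the history-count-dependent weight λ_C(h) with a fixed constant, and omits the length-conditioned word model; both simplifications are mine, not from the slides:

```python
# Sketch of deleted interpolation for a character n-gram model: back off
# recursively toward a uniform 0-gram distribution. lambda(h) is simplified
# to a fixed constant rather than a function of the history count.
from collections import defaultdict

ALPHABET = "abcdefghijklmnopqrstuvwxyz$"   # "$" marks end-of-word

class InterpolatedNgram:
    def __init__(self, order=3, lam=0.7):
        self.order, self.lam = order, lam
        self.counts = [defaultdict(int) for _ in range(order + 1)]
        self.totals = [defaultdict(int) for _ in range(order + 1)]

    def train(self, word):
        word = word.lower() + "$"
        for i, s in enumerate(word):
            for n in range(1, self.order + 1):
                h = word[max(0, i - (n - 1)):i]
                self.counts[n][(h, s)] += 1
                self.totals[n][h] += 1

    def prob(self, s, h, n=None):
        if n is None:
            n = self.order
        if n == 0:
            return 1.0 / len(ALPHABET)          # uniform base case
        h = h[-(n - 1):] if n > 1 else ""
        total = self.totals[n][h]
        empirical = self.counts[n][(h, s)] / total if total else 0.0
        return self.lam * empirical + (1 - self.lam) * self.prob(s, h, n - 1)

m = InterpolatedNgram()
for w in ["wethersfield", "springfield"]:
    m.train(w)
print(m.prob("d", "el"))   # high: "eld" occurs in both training words
```

Because the recursion bottoms out in a uniform distribution, every symbol gets nonzero probability even after unseen histories, which is the point of the smoothing.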
Experimental Results
[Chart: classification accuracies ranging from 88.11% to 98.93% across pairwise, one-vs-all, and n-way discrimination tasks over drug, NYSE company, movie, place, and person name categories]
Knowledge of Frequencies
- Linguistics traditionally assumes that knowledge of language doesn't involve counting
- Letter frequencies are clearly an important source of knowledge for unknown words
- Similarly, we saw before that there are regular patterns to exploit in grammatical information
- Take-home point: combining statistical NLP methods with richer linguistic representations is a big win!
Language is Ambiguous!
- "Ban on Nude Dancing on Governor's Desk" (from a Georgia newspaper column discussing current legislation)
- "Lebanese chief limits access to private parts" (talking about an Army general's initiative)
- "Death may ease tension" (an article about the death of Colonel Jean-Claude Paul in Haiti)
- "Iraqi Head Seeks Arms"
- "Juvenile Court to Try Shooting Defendant"
- "Teacher Strikes Idle Kids"
- "Stolen Painting Found By Tree"
Language is Ambiguous!
- "Local HS Dropouts Cut in Half"
- "Obesity Study Looks for Larger Test Group"
- "British Left Waffles on Falkland Islands"
- "Red Tape Holds Up New Bridges"
- "Man Struck by Lightning Faces Battery Charge"
- "Clinton Wins on Budget, but More Lies Ahead"
- "Hospitals Are Sued by 7 Foot Doctors"
- "Kids Make Nutritious Snacks"
Coping With Ambiguity
- Categorical grammars like HPSG provide many possible analyses for sentences: 455 parses for "List the sales of the products produced in 1973 with the products produced in 1972." (Martin et al., 1987)
- In most cases, only one interpretation is intended
- The initial solution was hand-coded preferences among rules
  - Hard to manage as the number of rules increases
  - Need to capture interactions among rules
Statistical HPSG Parse Selection
- HPSG provides deep analyses of sentence structure and meaning, useful for NLP tasks like question answering
- Need to solve the disambiguation problem to make using these richer representations practical
- Idea: learn statistical preferences among constructions from a hand-disambiguated collection of sentences
- Result: the correct analysis is chosen >80% of the time
- StatNLP methods + linguistic representation = win
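The parse-selection idea can be sketched as scoring each candidate analysis by the constructions it uses, with weights learned from the hand-disambiguated treebank. The rule names, weights, and candidates below are invented for illustration:

```python
# Sketch of statistical parse selection: each candidate analysis is a bag of
# construction/rule names, scored with weights learned from disambiguated
# data. All names and numbers here are hypothetical.
weights = {"subj-head": 1.2, "head-comp": 0.8, "extraposition": -0.5, "np-attach": 0.3}

def score(parse_rules):
    """Log-linear-style score: sum of learned weights for rules used."""
    return sum(weights.get(r, 0.0) for r in parse_rules)

candidates = [
    ["subj-head", "head-comp", "np-attach"],      # PP attached to NP
    ["subj-head", "head-comp", "extraposition"],  # extraposed reading
]

best = max(candidates, key=score)
print(best)
```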
Towards Semantic Extraction
- HPSG provides a representation of meaning: who did what to whom?
- Computers need meaning to do inference
- Can we extend information extraction methods to extract meaning representations from pages?
- Current project: IE for the Semantic Web
  - Large project to build rich ontologies that describe the content of web pages for intelligent agents
  - Use IE to extract new instances of concepts from web pages (as opposed to manual labeling)
  - e.g. student(Joseph), univ(Stanford), at(Joseph, Stanford)
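The extracted predicates can be represented as plain tuples that an inference engine could consume. A minimal sketch, with the extraction step itself stubbed out by a hand-written result:

```python
# Sketch: extracted meaning as predicate tuples, mirroring the
# student(Joseph), univ(Stanford), at(Joseph, Stanford) example.
# The extraction itself is stubbed; a real system would produce `facts`.
facts = {
    ("student", ("Joseph",)),
    ("univ", ("Stanford",)),
    ("at", ("Joseph", "Stanford")),
}

def query(predicate, facts):
    """Return argument tuples for all facts with the given predicate name."""
    return [args for pred, args in facts if pred == predicate]

print(query("at", facts))  # [('Joseph', 'Stanford')]
```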
Towards the Grand Vision?
- Collaboration between theoretical linguistics and NLP is an important step forward: practical tools with sophisticated language power
- How can we ever teach computers enough about language and the world?
  - Hawking: Moore's Law is sufficient
  - Moravec: mobile robots must learn like children
  - Kurzweil: reverse-engineer the human brain
- The experts agree: Symbolic Systems is the future!
Upcoming Convergence Courses
- Ling 139M  Machine Translation           (Winter)
- Ling 239E  Grammar Engineering           (Winter)
- CS 276B    Text Information Retrieval    (Winter)
- Ling 239A  Parsing and Generation        (Spring)
- CS 224N    Natural Language Processing   (Spring)

Get Involved!!