1 LING 6932 Spring 2007
LING 6932 Topics in Computational Linguistics
Hana Filip
Lecture 1: Introduction to Field, History, Quick Review of Regular Expressions, Start of Finite Automata
Today 1/11 Week 1
Overview and history of the field
  Knowledge of language
  The role of ambiguity
  Models and Algorithms
  Eliza, Turing, and conversational agents
  History of speech and language processing
Administration
  Overview of course topics
  1 week on each topic
  http://plaza.ufl.edu/hfilip/ (later also WebCT)
Regular expressions
Finite State Automata
Deterministic Recognition of Finite State Automata
Computational Linguistics and Natural Language Processing
What is it?
  Getting computers to perform useful tasks involving human languages for:
    – Enabling human-machine communication
    – Improving human-human communication
    – Doing stuff with language objects
Examples:
    – Question Answering http://www.humana-military.com/
    – Machine Translation http://www.google.com/language_tools
    – Spoken Conversational Agents http://www.cs.rochester.edu/research/trains/
      The Trains Project: James Allen (University of Rochester)
Kinds of knowledge needed?
Consider the following interaction with HAL 9000, the computer from 2001: A Space Odyssey.
  2001: A Space Odyssey (1968) is a science fiction novel by Arthur C. Clarke which was developed concurrently with Stanley Kubrick's film version.
  HAL = Heuristically programmed ALgorithmic computer
Dave: Open the pod bay doors, Hal.
HAL: I’m sorry Dave, I’m afraid I can’t do that.
Knowledge needed to build HAL?
Speech recognition and synthesis
  Dictionaries (how words are pronounced)
  Phonetics (how to recognize/produce each sound of English)
Natural language understanding
  Knowledge of the English words involved
    – What they mean - lexical semantics
    – What the smallest meaning-bearing units of words are - morphology
    – How they combine (what is a “pod bay door”?)
  Knowledge of syntactic structure
    – I’m I do, Sorry that afraid Dave I’m can’t
  Knowledge of compositional semantics
    – How the meaning of a complex expression is determined by its structure and the meanings of its constituents
What’s needed?
Dialog and pragmatic knowledge
  “open the door” is a REQUEST (as opposed to a STATEMENT or information-question)
  It is polite to respond, even if you’re planning to kill someone.
  It is polite to pretend to want to be cooperative (I’m afraid, I can’t…)
  What is “that” in “I can’t do that”?
Even a system to book airline flights needs much of this kind of knowledge
Modern conversational agents
Switchboard Receptionists
  Implications: need to know how a native or non-native speaker of English is likely to pronounce a random American name.
Airline and other simple dialogue systems
  Lots of fabulously fun pragmatic issues
    – Grounding
    – Error handling
In general
  Now that there are commercial installations
  Big incentive to use linguistic or any other kind of knowledge if it actually improves performance
Eliza: Weizenbaum (1966)
http://i5.nyu.edu/~mm64/x52.9265/january1966.html
User: You are like my father in some ways.
ELIZA: What resemblance do you see
User: You are not very aggressive but I think you don’t want me to notice that.
ELIZA: What makes you think I am not aggressive
User: You don’t argue with me
ELIZA: Why do you think I don’t argue with you
User: You are afraid of me
ELIZA: Does it please you to believe I am afraid of you
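Replies like these can come from pattern matching and substitution alone, which ties in with the regular-expression review later in this lecture. Below is a minimal sketch, assuming a handful of hypothetical rules written to reproduce the turns above. The rule list, the pronoun table, and the fallback reply are illustrative inventions, not Weizenbaum's actual script.

```python
import re

# Hypothetical rules, written only to reproduce the transcript above.
# Order matters: more specific patterns come first.
RULES = [
    (re.compile(r"^you are like (.*?)\.?$", re.I),
     "What resemblance do you see"),
    (re.compile(r"^you are not (?:very )?(\w+)", re.I),
     "What makes you think I am not {0}"),
    (re.compile(r"^you don['’]t (.*?)\.?$", re.I),
     "Why do you think I don't {0}"),
    (re.compile(r"^you are (.*?)\.?$", re.I),
     "Does it please you to believe I am {0}"),
]

# Swap first- and second-person words in the echoed fragment.
SWAPS = {"me": "you", "my": "your", "i": "you", "you": "me", "your": "my"}


def reflect(fragment):
    """Replace each pronoun in the fragment with its counterpart."""
    return " ".join(SWAPS.get(w.lower(), w) for w in fragment.split())


def respond(utterance):
    """Return the reply of the first rule whose pattern matches."""
    text = utterance.strip()
    for pattern, template in RULES:
        m = pattern.match(text)
        if m:
            return template.format(reflect(m.group(1)))
    return "Please go on"  # assumed default when no rule applies


for line in ["You don't argue with me", "You are afraid of me"]:
    print("User: ", line)
    print("ELIZA:", respond(line))
```

The program has no model of what the user means: it matches a surface pattern and echoes part of the input back with pronouns swapped.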
Question Answering
What does “door” mean?
What year was Abraham Lincoln born?
How many states were in the United States when Lincoln was born?
Was there a military draft during the Hoover administration?
What do US scientists think about whether human cloning should be legal?
Modern QA systems
Still in infancy
Simple factoid questions beginning to work OK
Annual government-sponsored “bakeoff” called TREC
  TREC = Text REtrieval Conference
  … to encourage research in information retrieval from large text collections
  http://trec.nist.gov/
Machine Translation
Chinese gloss: Dai-yu alone on bed top think-of-with-gratitude Bao-chai again listen to window outside bamboo tip plantain leaf of on-top rain sound sigh drop clear cold penetrate curtain not feeling again fall down tears come
Hawkes translation: As she lay there alone, Dai-yu’s thoughts turned to Bao-chai… Then she listened to the insistent rustle of the rain on the bamboos and plantains outside her window. The coldness penetrated the curtains of her bed. Almost without noticing it she had begun to cry.
Machine Translation
The Story of the Stone or the Dream of the Red Chamber (Cao Xueqin 1792)
classic novel from the Qing dynasty, considered the greatest work of Chinese fiction
Issues: (“Language Differences”)
  Sentence segmentation
  Zero anaphoric pronouns
  Coding of tense/aspect
    – Penetrate -> penetrated
  Stylistic differences across languages
    – Bamboo tip plantain leaf -> bamboos and plantains
  Cultural knowledge
    – Curtain -> curtains of her bed
Ambiguity
Computational linguists are obsessed with ambiguity
Ambiguity is a fundamental problem of computational linguistics
Resolving ambiguity is a crucial goal
Ambiguity
Find at least 5 meanings of this sentence:
  I made her duck
  I cooked waterfowl for her benefit (to eat)
  I cooked waterfowl belonging to her
  I created the (plaster?) duck she owns
  I caused her to quickly lower her head or body
  I waved my magic wand and turned her into undifferentiated waterfowl
  At least one other meaning that’s inappropriate for gentle company.
Ambiguity is Pervasive
I caused her to quickly lower her head or body
  Lexical category: “duck” can be a N or V
I cooked waterfowl belonging to her.
  Lexical category: “her” can be a possessive (“of her”) or dative (“for her”) pronoun
I made the (plaster) duck statue she owns
  Lexical Semantics: “make” can mean “create” or “cook”
Lexical disambiguation
  part-of-speech tagging
  word sense disambiguation
Syntactic disambiguation: “her duck”
  two syntactic phrases: NP VP
  one syntactic phrase: [Det N]NP
Ambiguity is Pervasive
Grammar: “Make” can be:
  Transitive: (verb has a noun direct object)
    – I cooked [waterfowl belonging to her]
  Ditransitive: (verb has 2 noun objects)
    – I made [her] (into) [undifferentiated waterfowl]
  Action-transitive (verb has a direct object and another verb)
    – I caused [her] [to move her body]
Ambiguity is Pervasive
Phonetics!
  I mate or duck
  I’m eight or duck
  Eye maid; her duck
  Aye mate, her duck
  I maid her duck
  I’m aid her duck
  I mate her duck
  I’m ate her duck
  I’m ate or duck
  I mate or duck
Models and Algorithms
Models: formalisms used to capture the various kinds of linguistic structure.
  State machines (FSA, transducers, Markov models)
  Formal rule systems (context-free grammars, feature systems)
  Logic (predicate calculus, inference)
  Probabilistic versions of all of these + others (Gaussian mixture models, probabilistic relational models, etc.)
Algorithms used to manipulate representations to create structure.
Hidden Markov Chain
A hidden Markov model (HMM) is a statistical model in which the system being modeled is assumed to be a Markov process with unknown parameters, and the challenge is to determine the hidden parameters from the observable parameters.
x: hidden states (weather: sunny, cloudy, rainy)
y: observable outputs (your friend: walking in the park, shopping, cleaning the apartment)
a: transition probabilities
b: output probabilities
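To make the weather example concrete, here is a minimal sketch that recovers the most likely sequence of hidden weather states from a friend's observed activities. It uses the Viterbi algorithm, which the slide does not name, and every probability in it is invented for illustration because the slide gives no numbers. The dictionaries `START`, `TRANS`, and `EMIT` are hypothetical names for the initial, transition (a), and output (b) probabilities.

```python
STATES = ["sunny", "cloudy", "rainy"]

# All numbers below are made up for this sketch.
START = {"sunny": 0.5, "cloudy": 0.3, "rainy": 0.2}
TRANS = {  # a: P(next state | current state)
    "sunny": {"sunny": 0.6, "cloudy": 0.3, "rainy": 0.1},
    "cloudy": {"sunny": 0.3, "cloudy": 0.4, "rainy": 0.3},
    "rainy": {"sunny": 0.2, "cloudy": 0.3, "rainy": 0.5},
}
EMIT = {  # b: P(observed activity | state)
    "sunny": {"walk": 0.7, "shop": 0.2, "clean": 0.1},
    "cloudy": {"walk": 0.3, "shop": 0.4, "clean": 0.3},
    "rainy": {"walk": 0.1, "shop": 0.3, "clean": 0.6},
}


def viterbi(observations):
    """Most probable hidden state sequence for a non-empty observation list."""
    # best[t][s] = (probability of the best path ending in s at time t,
    #               the state that precedes s on that path)
    best = [{s: (START[s] * EMIT[s][observations[0]], None) for s in STATES}]
    for obs in observations[1:]:
        column = {}
        for s in STATES:
            prob, prev = max((best[-1][p][0] * TRANS[p][s], p) for p in STATES)
            column[s] = (prob * EMIT[s][obs], prev)
        best.append(column)
    # Trace back from the most probable final state.
    state = max(STATES, key=lambda s: best[-1][s][0])
    path = [state]
    for column in reversed(best[1:]):
        state = column[state][1]
        path.append(state)
    return path[::-1]


print(viterbi(["walk", "clean"]))
```

With different invented numbers the decoded sequence would change; the point is only that the hidden states x are inferred from the observations y through a and b.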
Context-free Grammar
Chomsky 1957 “Syntactic Structures”
In linguistics and computer science, a context-free grammar (CFG) is a formal grammar in which every production rule is of the form
V -> w
where V is a nonterminal symbol and w is a string consisting of terminals and/or non-terminals.
The term "context-free" expresses the fact that the non-terminal V can always be replaced by w, regardless of the context in which it occurs. A formal language is context-free if there is a context-free grammar that generates it.
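As a small illustration of rules of the form V -> w, the sketch below counts how many parse trees a toy grammar assigns to the earlier example “I made her duck”. The grammar is an assumption made for this sketch (it is not from the lecture), it is written in Chomsky normal form, and it covers only three of the readings listed earlier. The counting procedure is a standard CKY chart, and the category names `VX`, `SC`, and `VI` are hypothetical helpers.

```python
from collections import defaultdict

# Toy grammar in Chomsky normal form (assumed for illustration only).
BINARY = [  # rules V -> X Y
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),   # made [her duck]: one noun-phrase object
    ("VP", "V", "SC"),   # made [her duck]: clause reading, she ducks
    ("VP", "VX", "NP"),  # made [her] [duck]: two objects
    ("VX", "V", "NP"),
    ("SC", "NP", "VI"),
    ("NP", "Det", "N"),
]
LEXICAL = {  # rules V -> terminal, indexed by the terminal
    "I": ["NP"],
    "made": ["V"],
    "her": ["NP", "Det"],
    "duck": ["NP", "N", "VI"],
}


def count_parses(words, root="S"):
    """Count the distinct parse trees rooted in `root` that span all words."""
    n = len(words)
    # chart[i][j][A] = number of ways category A derives words[i:j]
    chart = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n + 1)]
    for i, word in enumerate(words):
        for category in LEXICAL.get(word, []):
            chart[i][i + 1][category] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    chart[i][j][parent] += chart[i][k][left] * chart[k][j][right]
    return chart[0][n][root]


print(count_parses("I made her duck".split()))
```

Each rule applies wherever its left-hand symbol occurs, regardless of the neighbouring words. That is the "context-free" property, and it is also why one string can receive several structures.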
Language, Thought, Understanding
A Gedanken Experiment: Turing Test
  1950 paper "Computing Machinery and Intelligence"
  Alan Turing (1912-1954): founder of computer science, mathematician, philosopher, cryptographer
Question “can a machine think” is not operational.
Operational version:
  2 people and a computer
  Interrogator talks to contestant and computer via teletype
  Task of machine is to convince interrogator it is human
  Task of contestant is to convince interrogator she and not machine is human.
History: foundational insights 1940s-1950s
Automaton:
  Turing 1936
  McCulloch-Pitts neuron (1943)
    – http://diwww.epfl.ch/mantra/tutorial/english/mcpits/html/
  Kleene (1951/1956)
  Shannon (1948) link between automata and Markov models
  Chomsky (1956)/Backus (1959)/Naur (1960): CFG
Probabilistic/Information-theoretic models
  Shannon (1948)
  Bell Labs speech recognition (1952)
History: the two camps: 1957-1970
Symbolic
  Zellig Harris 1958 Transformation and Discourse Analysis Project - first parser?
    – Cascade of finite-state transducers
  Chomsky
  AI workshop at Dartmouth (McCarthy, Minsky, Shannon, Rochester)
  Newell and Simon: Logic Theorist, General Problem Solver
Statistical
  Bledsoe and Browning (1959): Bayesian OCR
  Mosteller and Wallace (1964): Bayesian authorship attribution
  Denes (1959): ASR combining grammar and acoustic probability
Four paradigms: 1970-1983
Stochastic
  Hidden Markov Model 1972
    – Independent application of Baker (CMU) and Jelinek/Bahl/Mercer lab (IBM) following work of Baum and colleagues at IDA
Logic-based
  Colmerauer (1970, 1975) Q-systems
  Definite Clause Grammars (Pereira and Warren 1980)
  Kay (1979) functional grammar, Bresnan and Kaplan (1982) unification
Natural language understanding
  Winograd (1972) SHRDLU
  Schank and Abelson (1977) scripts, story understanding
  Influence of case-role work of Fillmore (1968) via Simmons (1973), Schank.
Discourse Modeling
  Grosz and colleagues: discourse structure and focus
  Perrault and Allen (1980) BDI model
Empiricism and Finite State Redux: 1983-1993
Finite State Models
  Kaplan and Kay (1981): Phonology/Morphology
  Church (1980): Syntax
Return of Probabilistic Models:
  Corpora created for language tasks
  Early statistical versions of NLP applications (parsing, tagging, machine translation)
  Increased focus on methodological rigor:
    – Can’t test your hypothesis on the data you used to build it!
    – Training sets and test sets
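The training/test discipline mentioned above can be sketched in a few lines. This is a minimal illustration, assuming a simple random hold-out; the function name `train_test_split`, the 20% default, and the fixed seed are choices made for this sketch, not something the lecture specifies.

```python
import random


def train_test_split(items, test_fraction=0.2, seed=0):
    """Shuffle a copy of the data and hold out a fraction for testing."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = max(1, int(len(shuffled) * test_fraction))
    # Train on the first return value; evaluate only on the second.
    return shuffled[n_test:], shuffled[:n_test]


sentences = [f"sentence {i}" for i in range(10)]  # placeholder corpus
train, test = train_test_split(sentences)
print(len(train), "training items,", len(test), "held-out test items")
```

Any model built from `train` is then scored only on `test`, so the reported performance is not measured on the data the model was built from.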
The field comes together: 1994-2006
Statistical models standard
  ACL conference:
    – 1990: 39 articles, 1 statistical
    – 2003: 62 articles, 48 statistical
Machine learning techniques key
Information retrieval meets NLP
Unified field:
  NLP, Machine Translation (MT), Automatic Speech Recognition (ASR), Text-to-Speech (TTS), Dialog, Information Retrieval (IR)
Some brief demos
Machine Translation
  http://translate.google.com/translate_t
Text-To-Speech:
  http://www-306.ibm.com/software/pervasive/tech/demos/tts.shtml
Question Answering (LCC):
  http://www.languagecomputer.com/demos/question_answering/internet_demo/more_examples.html