FOUNDATIONS OF LST COURSE ✩ WS2005/06 German Research Center for Artificial Intelligence GmbH
HANS USZKOREIT 2006
FOUNDATIONS OF LANGUAGE SCIENCE AND TECHNOLOGY
Introduction
© 2006 Hans Uszkoreit
THE MIRACLE
Language is the Medium
Of course, language can also be transmitted as text.
WHAT HAPPENS IN BETWEEN?
[Figure: between the sound waves and the activation of concepts lies the grammar: phonology/morphology, a syntactic analysis of "Sue gave Paul an old penny." (categories S, NP, VP, V, Det, A, N), and semantic interpretation]
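THREE TRADITIONS
[Figure: three analyses of "Sue gave Paul an old penny."
Phrase-structure Grammar: a constituent tree with the categories S, NP, VP, V, Det, A, N.
Dependency Grammar: dependencies among Sue, give, Paul, old and penny, with arcs labeled Act, Goal and Obj.
Categorial Grammar: lexical categories Sue := NP, gave := ((S\NP)/NP)/NP, Paul := NP, an := NP/N, old := N/N, penny := N. These combine to N (old penny), NP (an old penny), (S\NP)/NP (gave Paul), S\NP (gave Paul an old penny) and finally S.]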
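The categorial analysis can be replayed mechanically. Below is a minimal sketch that assumes only two combination rules: forward application (X/Y Y ⇒ X) and backward application (Y X\Y ⇒ X). The encoding of categories as nested tuples and the helper names `fwd`, `bwd`, `combine`, `show` and `derive` are hypothetical choices for this illustration.

```python
from typing import Optional, Tuple, Union

# A category is an atomic string ("S", "NP", "N") or a triple
# (result, slash, argument); ("S", "\\", "NP") encodes S\NP.
Cat = Union[str, Tuple]


def fwd(result: Cat, arg: Cat) -> Tuple:
    """X/Y: a functor that takes its argument Y from the right."""
    return (result, "/", arg)


def bwd(result: Cat, arg: Cat) -> Tuple:
    """X\\Y: a functor that takes its argument Y from the left."""
    return (result, "\\", arg)


def combine(left: Cat, right: Cat) -> Optional[Cat]:
    """Forward application (X/Y Y => X) or backward application (Y X\\Y => X)."""
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None  # the two categories cannot combine


def show(cat: Cat) -> str:
    """Render a category in slash notation, e.g. ((S\\NP)/NP)/NP."""
    if isinstance(cat, str):
        return cat

    def part(c: Cat) -> str:
        # complex sub-categories are wrapped in parentheses
        return show(c) if isinstance(c, str) else f"({show(c)})"

    result, slash, arg = cat
    return f"{part(result)}{slash}{part(arg)}"


LEXICON = {
    "Sue": "NP",
    "Paul": "NP",
    "gave": fwd(fwd(bwd("S", "NP"), "NP"), "NP"),  # ((S\NP)/NP)/NP
    "an": fwd("NP", "N"),                          # NP/N
    "old": fwd("N", "N"),                          # N/N
    "penny": "N",
}


def derive() -> list:
    """Combine the words in the order of the derivation on the slide."""
    old_penny = combine(LEXICON["old"], LEXICON["penny"])   # N
    an_old_penny = combine(LEXICON["an"], old_penny)         # NP
    gave_paul = combine(LEXICON["gave"], LEXICON["Paul"])    # (S\NP)/NP
    vp = combine(gave_paul, an_old_penny)                    # S\NP
    sentence = combine(LEXICON["Sue"], vp)                   # S
    return [old_penny, an_old_penny, gave_paul, vp, sentence]


for step in derive():
    print(show(step))
```

Each step of `derive` corresponds to one line of the categorial derivation: N, NP, (S\NP)/NP, S\NP and finally S. All grammatical knowledge sits in the lexicon; the two application rules are the same for every word.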
Grammar
Phrase-structure Grammar
[Figure: phrase-structure tree for "Sue gave Paul an old penny." with the categories S, NP, VP, V, Det, A, N]
S → NP VP
VP → V NP NP
V → gave
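The slide lists only three rules (S → NP VP, VP → V NP NP, V → gave). The sketch below extends them into a toy grammar so that the whole sentence can be parsed. Every rule beyond those three is a hypothetical addition for illustration, for instance NP → Det N and N → A N for "an old penny". The parser is a naive top-down recursive one, not a particular parser from the literature.

```python
from typing import Iterator, List, Tuple, Union

# Toy context-free grammar. S -> NP VP, VP -> V NP NP and V -> gave come from
# the slide; all other rules are hypothetical additions so the whole sentence
# is covered. Symbols without rules are terminals (words).
GRAMMAR = {
    "S": [["NP", "VP"]],
    "VP": [["V", "NP", "NP"]],
    "V": [["gave"]],
    "NP": [["Sue"], ["Paul"], ["Det", "N"]],
    "N": [["A", "N"], ["penny"]],
    "Det": [["an"]],
    "A": [["old"]],
}

Tree = Union[str, Tuple[str, list]]


def parse(symbol: str, words: List[str], i: int) -> Iterator[Tuple[Tree, int]]:
    """Yield (tree, j) for every way `symbol` can derive words[i:j] (top-down)."""
    if symbol not in GRAMMAR:  # terminal: must match the next word
        if i < len(words) and words[i] == symbol:
            yield symbol, i + 1
        return
    for rhs in GRAMMAR[symbol]:
        for children, j in parse_sequence(rhs, words, i):
            yield (symbol, children), j


def parse_sequence(symbols: List[str], words: List[str], i: int):
    """Yield (subtrees, j) for deriving `symbols` one after another from position i."""
    if not symbols:
        yield [], i
        return
    for first, mid in parse(symbols[0], words, i):
        for rest, j in parse_sequence(symbols[1:], words, mid):
            yield [first] + rest, j


def parses(sentence: str) -> List[Tree]:
    """Return all trees rooted in S that cover the whole sentence."""
    words = sentence.rstrip(".").split()
    return [tree for tree, j in parse("S", words, 0) if j == len(words)]


def bracket(tree: Tree) -> str:
    """Render a tree as a labeled bracketing."""
    if isinstance(tree, str):
        return tree
    label, children = tree
    return f"[{label} " + " ".join(bracket(c) for c in children) + "]"


for tree in parses("Sue gave Paul an old penny."):
    print(bracket(tree))
```

A naive top-down parser like this one loops forever on left-recursive rules such as NP → NP PP. The toy grammar avoids them for that reason. Practical systems use chart parsing instead.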
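Grammar
Transformation Grammar
[Figure: the tree for "Sue gave Paul an old penny." beside a tree for the question "What did Sue give Paul ____ ?". In the question tree, the questioned noun phrase (NP-Q) and the auxiliary (Aux) precede the subject, and a gap (____) marks the original object position; node labels include S, IP, NP, VP, V, Aux, NP-Q]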
Grammar
Unification Grammar
[Figure: the tree for "Sue gave Paul an old penny." beside a feature structure for the noun phrase "an old penny"]
PHON  /an old penny/
SYN   [ CAT      NP
        HEAD     [ CASE objective, NUMBER sing, PERSON third ]
        VALENCE  vstruc ]
SEM   [ QUANT  exist
        VAR    X1
        RESTR  [ REL old', VAR X1, ARG penny' ] ]
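Unification Grammar builds feature structures like this one by unification. Two structures combine if they agree on every feature they share, and the result carries the information of both. Below is a minimal sketch that represents feature structures as nested Python dicts with atomic string values. It ignores reentrancy (structure sharing), which real unification formalisms rely on. The two partial structures attributed to the determiner and the noun are hypothetical.

```python
def unify(a, b):
    """Return the unification of two feature structures, or None on a clash.

    Feature structures are nested dicts; leaves are atomic strings.
    Reentrancy (shared substructures) is deliberately not modeled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy, so neither input is modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values somewhere below
                result[feature] = merged
            else:
                result[feature] = value  # information only b has
        return result
    return a if a == b else None  # atoms unify only if identical


# Hypothetical partial descriptions contributed by "an" and by "penny".
from_det = {"SYN": {"CAT": "NP", "HEAD": {"NUMBER": "sing"}},
            "SEM": {"QUANT": "exist", "VAR": "X1"}}
from_noun = {"SYN": {"HEAD": {"NUMBER": "sing", "PERSON": "third"}},
             "SEM": {"VAR": "X1", "RESTR": {"ARG": "penny'"}}}

np = unify(from_det, from_noun)
print(np)

# A plural requirement clashes with NUMBER sing, so unification fails.
print(unify(np, {"SYN": {"HEAD": {"NUMBER": "plur"}}}))
```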
Size
How large is the grammar?
Let's start with the lexicon.
How Many Words?
Estimates for English
Shakespeare actively used 29,000 word forms, mapping to about 25,000 headwords.
Common estimates of the vocabulary of a college graduate: 20,000 words active, 25,000 words passive
David Crystal's estimate: 60,000 words active, 75,000 words passive
Total Size of English Vocabulary
1 million words without special scientific and technical terms
2 million words including all scientific and technical terms
A one-million-word corpus of American English exhibits about 38,000 headwords.
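These figures count different units: running words in a corpus, distinct word forms (gave, gives, given) and headwords (give). The sketch below illustrates the distinction. The lemma table and the sample text are hypothetical; real lemmatization requires a morphological analyzer.

```python
import re
from typing import Tuple

# Hypothetical mini lemma table mapping inflected forms to their headword.
# Forms not listed are treated as their own headword.
LEMMA = {"gave": "give", "gives": "give", "given": "give", "has": "have",
         "older": "old", "pennies": "penny"}


def count_vocabulary(text: str) -> Tuple[int, int, int]:
    """Return (running words, distinct word forms, distinct headwords)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    forms = set(tokens)
    headwords = {LEMMA.get(form, form) for form in forms}
    return len(tokens), len(forms), len(headwords)


text = "Sue gave Paul an old penny. Paul gives Sue older pennies. Sue has given Paul the penny."
print(count_vocabulary(text))  # (17, 12, 8)
```

The gap between forms and headwords grows with morphological richness. That is why counts of word forms and counts of headwords are not directly comparable.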
Size of a Grammar
LinGO English Resource Grammar
(60% coverage of newspaper texts)
8,000 types
100,000 lines of code
average feature structure > 300 nodes
The Tower of Babel
How Many Languages ?
According to Ethnologue: 6,809 languages
230 in Europe, 2,197 in Asia; 832 in Papua New Guinea alone
Bible translations exist for 2,200 languages
250 language families (such as the Indo-European languages)
Transdisciplinary Interests
[Figure: psychology, linguistics and computer science as overlapping disciplines, with psycholinguistics, computational linguistics and AI in their overlaps]
© 2000 Hans Uszkoreit
Motivations
[Figure: three motivations for CL (computational linguistics): engineering, cognition, linguistics]
Motivations
[Figure: the three motivations paired with their goals: engineering with language technology applications, cognition with models of human language processing, linguistics with models of grammar]
Central Questions of Language Research
LINGUISTIC KNOWLEDGE: What are the contents and structures of our linguistic knowledge?
LANGUAGE PROCESSING: How do we produce and comprehend linguistic utterances?
LANGUAGE ACQUISITION: How does a child learn its mother tongue?
LANGUAGE CHANGE: How do languages (dialects, sociolects) emerge, change and evolve?
Areas of Linguistics
❏ According to levels of linguistic description
• Phonetics
• Phonology
• Morphology
• Syntax
• Semantics
• Pragmatics/Discourse
❏ According to aspects of human language
• Psycholinguistics
• Neurolinguistics
• Historical Linguistics
• Sociolinguistics
• Ethnolinguistics
• Dialectology
• Applied Linguistics
• Mathematical Linguistics
• Computational Linguistics
Combinations
❏ According to levels of linguistic description
• Computational Phonetics
• Computational Phonology
• Computational Morphology
• Computational Syntax
• Computational Semantics
• Computational Pragmatics
❏ According to aspects of human language
• Computational Psycholinguistics
• Computational Neurolinguistics
• Computational Historical Linguistics
• Computational Sociolinguistics
• Computational Ethnolinguistics ???
• Computational Dialectology
• Computational Applied Linguistics / Applied Computational Linguistics
• Computational Mathematical Linguistics (funny)
Levels of Description
[Figure: levels from the acoustic form or written form, via the phonetic or graphemic representation, morpho-phonological processing, the syntactic representation and the semantic representation, up to the representation of the full meaning]
Levels of Processing
[Figure: the levels of description (acoustic or written form, phonetic or graphemic representation, syntactic representation, semantic representation, representation of the full meaning) connected by phonetic or orthographic processing, morpho-phonological processing, syntactic processing, semantic processing, and pragmatic processing (knowledge processing)]
Text-to-Speech System
[Figure: the levels-of-processing diagram applied to a text-to-speech system, which maps the written form to the acoustic form]
Why is Language Hard for Machines?
Why do we need deep processing for simple text-to-speech conversion?
(1) The student will read the paper. (/riːd/)
(2) The students have read the paper. (/rɛd/)
(3) Will the students read the paper? (/riːd/)
(4) Have any citizens of good will read the paper? (/rɛd/)
(5) Have the executors of the will read the paper? (/rɛd/)
(6) Have the students who will arrive next week read the paper yet? (/rɛd/)
(7) Please have the students read the paper. (/riːd/)
(8) Have the students read the paper? (/rɛd/)
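A shallow heuristic shows why these examples are hard. The sketch below uses a hypothetical rule that is not taken from any real text-to-speech system. It looks for the closest auxiliary-like word to the left of "read" and picks /riːd/ after "will" and /rɛd/ after a form of "have".

```python
import re

PRESENT = "/riːd/"  # read as in "will read"
PAST = "/rɛd/"      # read as in "have read"

# Hypothetical naive rule: the closest auxiliary-looking word to the left of
# "read" decides the pronunciation.
AUXILIARIES = {"will": PRESENT, "have": PAST, "has": PAST, "had": PAST}


def naive_read_pronunciation(sentence: str) -> str:
    tokens = re.findall(r"[a-z]+", sentence.lower())
    position = tokens.index("read")
    for token in reversed(tokens[:position]):  # scan leftwards from "read"
        if token in AUXILIARIES:
            return AUXILIARIES[token]
    return PRESENT  # default when no auxiliary is found


# The eight sentences from the slide with their correct pronunciations.
EXAMPLES = [
    ("The student will read the paper.", PRESENT),
    ("The students have read the paper.", PAST),
    ("Will the students read the paper?", PRESENT),
    ("Have any citizens of good will read the paper?", PAST),
    ("Have the executors of the will read the paper?", PAST),
    ("Have the students who will arrive next week read the paper yet?", PAST),
    ("Please have the students read the paper.", PRESENT),
    ("Have the students read the paper?", PAST),
]

correct = 0
for sentence, gold in EXAMPLES:
    guess = naive_read_pronunciation(sentence)
    correct += guess == gold
    print("ok   " if guess == gold else "WRONG", guess, sentence)
print(f"{correct}/{len(EXAMPLES)} correct")
```

The rule gets (1), (2), (3) and (8) right and fails on (4) to (7). It cannot tell the noun "will" from the auxiliary. It does not see that "will" in (6) belongs to a relative clause. It treats the causative "have" in (7) as a perfect auxiliary. Getting all eight right requires a syntactic analysis.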
Speech Translation
[Figure: the levels-of-processing diagram applied to speech translation]
Ambiguity I
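phonetic (homophony): their / there, toe / tow
orthographic (homography): read (/riːd/) / read (/rɛd/), undoable ("not doable") / undoable ("can be undone")
lexical (homonymy): bank (financial institution) / bank (side of a river), ball (round object) / ball (formal dance)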
Ambiguity II
syntactic
With the naked eye she couldn't see much. So she watched the man with a telescope.
vs. She couldn't watch all suspects. So she watched the man with a telescope.
semantic
The three selected special agents speak two foreign languages nearly without an accent. Namely French and Russian.
vs. The three selected special agents speak two foreign languages nearly without an accent. But only two of them master Russian.
pragmatic
Could you translate this text? I need it tomorrow.
vs. Could you translate this text? I even wonder if anybody could do it.
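The syntactic pair turns on PP attachment. "with a telescope" can modify the verb phrase (she uses the telescope) or the noun phrase "the man" (he has it). The sketch below counts the parse trees with the CKY algorithm under a hypothetical toy grammar in Chomsky normal form. Both readings surface as separate trees.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form (binary rules only).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP modifies the verb phrase: she uses the telescope
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP modifies the noun phrase: the man has the telescope
    ("PP", "P", "NP"),
]
LEXICON = {"she": "NP", "watched": "V", "the": "Det", "a": "Det",
           "man": "N", "telescope": "N", "with": "P"}


def count_parses(sentence: str, root: str = "S") -> int:
    """Count the distinct parse trees for the sentence with the CKY algorithm."""
    words = sentence.lower().rstrip(".").split()
    n = len(words)
    # chart[(i, j)][A] = number of trees rooted in A that cover words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(words):
        chart[(i, i + 1)][LEXICON[word]] += 1
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    chart[(i, j)][parent] += chart[(i, k)][left] * chart[(k, j)][right]
    return chart[(0, n)][root]


print(count_parses("She watched the man with a telescope."))
```

The grammar alone licenses both trees. Only the preceding context sentence, as in the two variants above, tells a reader which one is intended.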
Lexical Ambiguity
Certain readings are less preferred than others:
Where is a bank?
Do you like plants?
The preference can be influenced by context.
The goalkeeper opened the ball. vs. The mayor opened the ball.
The astronomer married a star. vs. The movie director married a star.
© 2004 Hans Uszkoreit
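One simple way to let context shift the preference between readings is a simplified Lesk-style overlap count. The sense whose signature words overlap most with the sentence wins. Below is a minimal sketch; the two sense signatures for "star" are hypothetical, hand-picked word lists, not entries from a real lexicon.

```python
import re

# Hypothetical sense signatures: words that tend to co-occur with each sense.
# Dict order matters: on a tie, the first sense listed wins.
SENSES = {
    "star (celestial body)": {"astronomer", "telescope", "sky", "galaxy", "planet"},
    "star (famous performer)": {"movie", "director", "film", "actor", "celebrity"},
}


def preferred_sense(sentence: str) -> str:
    """Pick the sense whose signature shares the most words with the sentence."""
    context = set(re.findall(r"[a-z]+", sentence.lower()))
    return max(SENSES, key=lambda sense: len(SENSES[sense] & context))


print(preferred_sense("The astronomer married a star."))
print(preferred_sense("The movie director married a star."))
```

When the context offers no evidence, `max` returns the first listed sense. That acts as a fixed default preference, comparable to the preferred readings of "bank" and "plants" above.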
Ambiguity (a pathological example)
„Früher stellten die Frauen der Inseln am Wochenende Kopftücher mit Blumenmotiven her, die ihre Männer an den folgenden Montagen auf dem Markt im Zentrum der Hauptinsel verkauften.“
in the past produced the women of the islands on the weekends scarfs with flower patterns that their husbands on the following Mondays on the market in the center of the main island sold.
In the past the women of the islands produced scarfs with flower patterns on the weekends that were sold by their husbands on the following Mondays on the market in the center of the main island.
The sentence exhibits a total of 13 lexical, syntactic and anaphoric ambiguities:
2 x 2 x 2 x 3 x 3 x 2 x 4 x 2 x 4 x 2 x 2 x 7 x 2 = 258,048 readings
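Assume the 13 ambiguities can be resolved independently of one another. Then the number of readings is the product of the number of alternatives at each ambiguous point, which is where the total comes from. A quick check of the arithmetic:

```python
from math import prod

# Alternatives per ambiguous point, in the order given above.
ALTERNATIVES = [2, 2, 2, 3, 3, 2, 4, 2, 4, 2, 2, 7, 2]

readings = prod(ALTERNATIVES)  # independent choices multiply
print(len(ALTERNATIVES), readings)  # 13 258048
```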