university of washingtonssli.ee.washington.edu/ws07/notes/ling-intro-slides.pdf · survey of areas...
TRANSCRIPT
Big questions · Survey of areas of linguistics · Summary · The lab
Linguistics in a nutshell
by hook or by crook
Jeremy G. Kahn
Signal, Speech & Language Interpretation Laboratory
Department of Linguistics
University of Washington
22 June 2008 / Workshop 2007
Kahn Linguistics brushup
Outline
1 Linguistics: big questions
    Linguistics charter
    What linguists look at
    Linguistics’ role
2 Survey of areas of linguistics
    Phonetics & Phonology
    Morphology and syntax
    Semantics
3 The lab
Business information
Linguistics introductions
By necessity, incomplete
Apologies:
    my personal speaking style
    guessing about level of preparation
Caveat: I’m a computational linguist
Caveat: I have an engineering bias
Goal: informality. Questions are good
Thanks to Don Baumer (Linguistics) for letting me crib slides & examples
Charter · What linguists look at · Linguistics’ role
What is linguistics?
Scientific study of human language
How is language organized?
How is it used?
General questions about Language (capital L)
What do all languages have in common?
How can we describe how Language (or languages) works?
How can we describe how a language works?
Language & communication
All communications have:
mode or medium : speech, gesture, olfaction, etc
semanticity : meaning carried
pragmatic function : intention carried
some also have:
interchangeability (send *and* receive)
cultural transmission : learned from other users
arbitrariness : non-iconicity
discreteness : "compositionality"
displacement : discuss things that aren’t here
productivity : new ways to organize it
Where do computer languages differ from human languages?
What makes language interesting?
Language is creative, but constrained
“Seattle is rainy.” – well-formed
* “rainy Seattle is.” – ill-formed
“I like caffeinated drinks without bubbles.”
* “Bubbles without drinks caffeinated like I”
Not just word order:
“pronk” could be an English word (in fact, it is)
“przak” could not be (how do you know?)
Constraint and creativity
Linguists like to say language is “rule-governed”.
Statistically-minded engineers might quibble...
Engineering way of looking at it (thanks Shannon):
sender wants to have symbol for every idea
recipient won’t have those symbols
compositionality and productivity allow novelty and communication
Language as a part of the human OS
Language:
not literacy.
major advantage over chimpanzees (e.g. displacement)
we’ve got specialist wetware
Competent language use
No school required
No explicit instruction required
Most humans competent in one language before age 3
What do we mean when we say “competent”?
Competence and Performance
Big idea in modern linguistics
Competence : what a native user of a language knows.
    ability to produce & comprehend language
    system or knowledge (“grammar”) that supports that
    largely subconscious
    learned (first-language) without effort
Performance : what language users do
    often fully competent
    not always: speech errors, typos, “brain-o’s”
What’s so neat about competence?
Many modern linguists care about competence more than performance.
Their view (Chomsky):
your competence is a window on the underlying structure of your grammar
your performance includes a bunch of messy wetware
These (self-proclaimed “theoretical”) linguists are very very interested in trying to figure out what the OS is from the behavior of the code.
Grammaticality and meaningfulness
“Meaningful” and “grammatical” not synonymous:
Grammatical, but meaningless : ‘Colorless green ideas sleep furiously.’ (Noam Chomsky)
Ungrammatical, but meaningful : ‘Around the survivors, a perimeter create.’ (Yoda, Episode 2)
What’s all this about grammar, then?
Descriptive grammar : an attempt to describe the acceptability judgments (or patterns of use/competence) of a speaker.
Prescriptive grammar : explicit instructions on how one should write (or speak); the language police.
Linguistics is not about prescriptive grammar.
We don’t tell you how you should.
We try to describe how you do.
Dogma: All human languages, stigmatized or not, are equally expressive.
Linguistics and semi-supervised learning
Humans do it: we get very little explicit labeling of our language data, yet we learn without instruction:
    what words and parts of words mean
    how to pronounce words we read
    how to understand sophisticated sentence constructions (“respectively”)
    and more. . .
It’s not all hard-coded (“universal grammar”): patterns often language-specific
Linguistics and semi-supervised learning
The corpora are out there:
    the web
    email (Enron emails!)
    newsgroups
also speech corpora:
    radio
    television
    podcasts
All mostly unlabeled but enormous
Natural language problems: perfect for semi-supervised work.
Phonetics & Phonology · Morphology and syntax · Semantics
Overview of the different parts of language (different parts of "grammar")
Phonetics - how sounds are made and perceived
Phonology - function and patterning of sounds
Morphology - structure of words
Syntax - analysis of sentence structure (word order)
Semantics - meaning (words to meaning)
Other areas of linguistic study:
Historical linguistics - language evolution and creation
Pragmatics - what else is intended and performed
Typology - language classification and differences
Psycholinguistics - neurobiological basis for language
Language acquisition
Sociolinguistics - language’s influence on and indication of social status and behavior
Writing systems - . . . a mess
and more. . .
We’ll not cover those here
Phonetics
Phonetics: the study of linguistic speech sounds
    articulatory
    auditory (perceptual)
    acoustic
Problems phonetics works with:
    no "spaces" between words: but we perceive them
    sounds are in a continuous (acoustic) space, but we chunk them into the (discrete) space of the language’s segments
Tools phoneticians use:
    spectrogram readers
    human listening
    transcription system (usually the International Phonetic Alphabet, IPA)
Why use IPA?
Spelling is not pronunciation
Probably obvious to non-native English speakers
Some languages have cleaner spelling-sound relationships (Spanish, Korean), but:
    Even a “clean” alphabetic language (e.g. Spanish) doesn’t have a 1:1 relationship between characters and phonetic segments:
        “corazon” and “quesadilla” have the same initial sound
    English is alphabetic, but with even noisier mappings
        “this” vs. “thought”
        English voicing of interdental (tongue-between-teeth) fricative: not represented in orthography ever.
This is why we use IPA.
More on phonetics
Lots more available on phonetics:
articulatory names (parts of the speech system)
classification system
learning the IPA
“supra-segmentals”: articulations across multiple segments (e.g., pitch shapes)
. . . and still not even touching the perceptual or acoustic domain
Phonology
Phonology:
Study of inventory of sounds in a language
How sounds pattern together or contrast
Minimal pair (research tool):
‘had’ vs. ‘hat’ : /t/ and /d/ are contrastive in English
‘steel’ vs. ‘stale’ vs. ‘stool’ : /i/, /e/, /u/ are contrastive
Contrastive sounds are phonemes: minimal units of sound
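As a rough illustration of the minimal-pair tool, here is a small Python sketch that scans a word list for pairs differing in exactly one segment. The lexicon and its simplified transcriptions are made up for this example, and the function name `minimal_pairs` is hypothetical; real work would start from a proper pronunciation dictionary.

```python
from itertools import combinations

# Hypothetical toy lexicon: each word maps to a tuple of phoneme symbols.
# The transcriptions are simplified and only meant for illustration.
LEXICON = {
    "had": ("h", "æ", "d"),
    "hat": ("h", "æ", "t"),
    "steel": ("s", "t", "i", "l"),
    "stale": ("s", "t", "e", "l"),
    "stool": ("s", "t", "u", "l"),
}


def minimal_pairs(lexicon):
    """Return (word1, word2, sound1, sound2) for words differing in exactly one segment."""
    pairs = []
    for (w1, p1), (w2, p2) in combinations(sorted(lexicon.items()), 2):
        if len(p1) != len(p2):
            continue  # this sketch only compares words of equal length
        diffs = [(a, b) for a, b in zip(p1, p2) if a != b]
        if len(diffs) == 1:
            pairs.append((w1, w2, diffs[0][0], diffs[0][1]))
    return pairs


for w1, w2, a, b in minimal_pairs(LEXICON):
    print(f"{w1} / {w2}: /{a}/ vs /{b}/")
```

Each pair it reports is evidence that the two differing sounds are contrastive in the toy data.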
Phonology (2)
Complementary distribution: two sounds appear in consistently different environments (never the same).
[pʰ] ‘pit’
[p] ‘spit’
[pʰ], [p] not phonemically different: allophones of /p/
Glossing over much more in phonology. . .
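One way to picture complementary distribution is as a check that two sounds never share an environment. The sketch below assumes a made-up list of observations in which the "environment" is just the preceding segment ("#" for a word boundary); the data and function names are hypothetical, and real phonological environments are much richer.

```python
from collections import defaultdict

# Hypothetical toy data: (phone, preceding segment) observations.
# "#" marks a word boundary; "pʰ" is aspirated p, "p" is unaspirated p.
OBSERVATIONS = [
    ("pʰ", "#"),  # pit
    ("pʰ", "#"),  # pan
    ("p", "s"),   # spit
    ("p", "s"),   # span
    ("t", "#"),   # tip
    ("d", "#"),   # dip
]


def environments(observations):
    """Map each phone to the set of environments it was observed in."""
    envs = defaultdict(set)
    for phone, left in observations:
        envs[phone].add(left)
    return envs


def in_complementary_distribution(a, b, observations):
    """True if both phones were observed and their environments never overlap."""
    envs = environments(observations)
    return bool(envs[a]) and bool(envs[b]) and envs[a].isdisjoint(envs[b])


print(in_complementary_distribution("pʰ", "p", OBSERVATIONS))
print(in_complementary_distribution("t", "d", OBSERVATIONS))
```

In this toy data the two p sounds never share an environment, while t and d both occur word-initially, so only the first pair passes the check.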
An aside for the deaf
Sign languages (e.g. American Sign Language) have phonology as well.
Handshapes and gestures are essentially phonemic
Different sign languages have different choices about how to cluster handshapes: different phonemes
I am not an expert, but I know it’s an open research area.
Morphology
Morphology is:
    the study of words
    the rules (patterns) of word formation
Word : a minimal free form. Can appear
    in isolation
    in multiple positions
“The hunter pursued the bears.”
    is “-er” a word? No. (constrained after “hunt”)
    is “the hunter” a word? No. (not minimal)
    wait: what is “-er” then?
Morpheme : the smallest part of a word carrying meaning
Some morphemes can’t stand alone (affixes): (prefix, suffix, infix, circumfix)
Syntax
Lexicon : a dictionary (form and category)
Lexical category : (also “content word”). “Open class”, e.g.
    Noun (rabbit, bicycle)
    Verb (die, love, walk)
    Adjective (red, tall, frivolous)
    Adverb (often, very)
Grammatical category : (also “function word”). “Closed class”, e.g.
    Preposition (with, on, of, for)
    Conjunction (and, or, because)
    Determiner (our, the, this, many)
    Auxiliary (will, can, may)
Syntax
Some words are ambiguous (especially open-class). Consider “comb”.
How to tell what category it is? Some examples:
meaning : acting as a person/place/thing? probably NOUN
inflection : if you can add ‘-ed’ or ‘-ing’ to it? probably VERB
distribution : if it appears after a degree word (e.g. “very”): probably ADJ
(Computational linguistics: “part-of-speech tagging”)
Morphology ties to syntax.
Back to morphology
Nope, not done:
Not just words in the lexicon: also morphemes:
closed-class (function) morphemes :
    prepositions & articles (function words)
    inflectional morphemes: don’t change class
open-class morphemes :
    usually stand-alone (nouns, verbs, etc)
    also ‘-ly’, ‘-er’, ‘anti-’ derivational morpheme (may change class of stem)
Word formation in English
Inflectional morphemes (no class change)
    -s      third person singular present
    -ed     past tense
    -ing    progressive
    -en     past participle
    -s      plural
    -’s     possessive
    -er     comparative
    -est    superlative
Derivational affixes (class change)
    input                      result
    happy [adj] + -ness        happiness [n]
    beauty [n] + -ful          beautiful [adj]
    beautiful [adj] + -ly      beautifully [adv]
    stable [adj] + -ize        stabilize [v]
Subtleties in morphology
Perverse cases, even in English:
recursive-ish morphology:
    input                          result
    beauty [n] + -ful + -ness      beautifulness [n]
English has roughly one (rather rude, emphatic) infix:
    input                          result
    -****ing- + Massachusetts      "Massa-****ing-chusetts"
Comp ling task : stemming, morphological analysis (v. important in other languages, e.g. Czech)
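To make the morphological analysis task concrete, here is a toy Python sketch that peels derivational suffixes off a word and tracks the class changes. The stem list, suffix table, single y/i spelling rule, and the function name `analyze` are all assumptions for illustration; a real analyzer needs far more spelling rules and a real lexicon.

```python
# Hypothetical toy data, for illustration only.
STEMS = {"beauty": "n", "happy": "adj", "stable": "adj"}
# suffix -> (class it attaches to, class it produces)
SUFFIXES = {"ful": ("n", "adj"), "ness": ("adj", "n"), "ly": ("adj", "adv")}


def analyze(word):
    """Return (stem, [suffixes], class), or None if no analysis is found."""
    if word in STEMS:
        return word, [], STEMS[word]
    for suffix, (needs, gives) in SUFFIXES.items():
        if word.endswith(suffix) and len(word) > len(suffix):
            base = word[: -len(suffix)]
            # Try the base as written, and with the y -> i spelling
            # change undone (beauti- -> beauty).
            candidates = [base]
            if base.endswith("i"):
                candidates.append(base[:-1] + "y")
            for cand in candidates:
                result = analyze(cand)
                # The suffix only attaches if the base has the right class.
                if result is not None and result[2] == needs:
                    stem, suffixes, _ = result
                    return stem, suffixes + [suffix], gives
    return None


print(analyze("beautifulness"))
print(analyze("happiness"))
```

The recursion mirrors the "recursive-ish" stacking of suffixes: each suffix is stripped only if what remains analyzes to the class that suffix requires.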
Back to syntax
Review: some words are ambiguous (“comb”): what to do?
meaning
inflection
distribution
Distribution could be a lot:
Constituent : grammatical unit; part of larger unit
    sentence = noun phrase (NP) + verb phrase (VP)
    noun phrase (NP) = determiner + noun
    noun is a (minimal) constituent
Note recursion is possible.
Phrases and ambiguity
How does phrase structure help with ambiguity?
[S [NP [Det the] [N men]] [VP [V comb] [NP [Det their] [N hair]]]]
[S [NP [Det the] [N men]] [VP [V share] [NP [Det a] [N comb]]]]
Note that structure resolves lexical ambiguity: whether “comb” is noun or verb
Syntax and structural ambiguity
Another kind of ambiguity:
The woman shot the man with the gun.
Who has the gun? (she shot him with it):
[S [NP The woman] [VP [V shot] [NP [Det the] [N man]] [PP with the gun]]]
Who has the gun? (he had it):
[S [NP The woman] [VP [V shot] [NP [Det the] [N man] [PP with the gun]]]]
No ambiguity about the meaning of any word
two different kinds of attachment for “with the gun”
PP attachment? messy. POS? fairly easy.
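The two attachments can be made concrete with a small chart parser. The sketch below is a minimal CKY-style parser in Python over a toy grammar written for this example; the rule set, the lexicon, and the function name `parses` are assumptions. The grammar is in binary form, so the verb-attachment reading comes out as VP → VP PP rather than a flat VP with three children.

```python
from collections import defaultdict

# A toy grammar in Chomsky normal form (an assumption for this sketch).
BINARY_RULES = [
    ("S", "NP", "VP"),
    ("NP", "Det", "N"),
    ("NP", "NP", "PP"),  # PP attaches to the noun phrase (he had the gun)
    ("VP", "V", "NP"),
    ("VP", "VP", "PP"),  # PP attaches to the verb phrase (she used the gun)
    ("PP", "P", "NP"),
]
LEXICON = {
    "the": ["Det"], "woman": ["N"], "man": ["N"], "gun": ["N"],
    "shot": ["V"], "with": ["P"],
}


def parses(words):
    """Return every bracketed S tree the toy grammar assigns to the words."""
    n = len(words)
    # chart[(i, j)][label] holds the bracketed trees spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(list))
    for i, word in enumerate(words):
        for tag in LEXICON.get(word.lower(), []):
            chart[(i, i + 1)][tag].append(f"[{tag} {word}]")
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY_RULES:
                    for lt in chart[(i, k)][left]:
                        for rt in chart[(k, j)][right]:
                            chart[(i, j)][parent].append(f"[{parent} {lt} {rt}]")
    return chart[(0, n)]["S"]


trees = parses("The woman shot the man with the gun".split())
for tree in trees:
    print(tree)
```

The parser finds two trees for the sentence, one per attachment. Every word has exactly one category in the lexicon, so the ambiguity is purely structural.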
Semantics
Two major areas within the study of language meaning:
Lexical semantics : meaning of individual morphemes
Compositional semantics : (or “phrasal semantics”): how meaning gets built up from pieces
Lexical semantics
synonymy : “means (almost) the same thing”: (angry, mad), (vomit, puke)
homonymy : “same form, unrelated meanings”: (pass [abstain], pass [succeed])
antonymy : “opposite meaning”
hyponymy (hypernymy) : A is a hyponym of B (A is a special case of B; B is a hypernym of A; B is a generalization of A)
    poodle ; dog ; animal
    sprint ; run ; move
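Hyponymy chains like these are easy to picture as a lookup table from each word to its immediate hypernym. The Python sketch below uses a made-up four-entry taxonomy and hypothetical function names; it is only meant to show that the relation is transitive and one-directional.

```python
# Hypothetical toy taxonomy: each word points to its immediate hypernym.
HYPERNYM = {"poodle": "dog", "dog": "animal", "sprint": "run", "run": "move"}


def hypernyms(word):
    """Return the chain of ever more general terms above a word."""
    chain = []
    while word in HYPERNYM:
        word = HYPERNYM[word]
        chain.append(word)
    return chain


def is_hyponym(a, b):
    """True if a is a special case of b, directly or through intermediate steps."""
    return b in hypernyms(a)


print(hypernyms("poodle"))
print(is_hyponym("sprint", "move"))
print(is_hyponym("dog", "poodle"))
```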
Compositional semantics
sense (intension) : the meaning of a word/phrase as a function (e.g., “rabbit” is a function from items to boolean value)
reference (extension) : which thing(s) in the world the function (word, phrase) picks out (the set of rabbits)
Example:
“Jeremy”
“today’s linguistics tutor”
Same reference (extension), different sense
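The function view of sense can be sketched directly in Python: a sense is a predicate over items, and its extension is whatever the predicate picks out in a given universe. The two-person universe, the `Person` fields, and the function names below are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    name: str
    tutors_linguistics_today: bool


# Hypothetical toy universe for illustration.
UNIVERSE = [Person("Jeremy", True), Person("Alex", False)]


# Two senses (intensions): different functions from items to booleans.
def jeremy(x):
    return x.name == "Jeremy"


def todays_linguistics_tutor(x):
    return x.tutors_linguistics_today


def extension(sense, universe):
    """The reference of a sense: the set of items it is true of."""
    return {x for x in universe if sense(x)}


same_reference = extension(jeremy, UNIVERSE) == extension(todays_linguistics_tutor, UNIVERSE)
print(same_reference)
```

The two functions compute different things, and in another universe (a different tutor, say) their extensions would come apart; in this one they pick out the same individual.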
Compositional semantics
Dealing with sentences. Sentences are boolean functions on the universe.
“I like cheese”
“I live in Seattle”
Same reference (TRUE), different sense (different function).
Summarizing
Lots of areas of linguistic research.
Most of these are becoming approachable computationally
None are very easy
But:
these represent what linguists think is going on in natural language
not necessarily what is needed: these classes may not relate to task at hand in computation
Emotion detection task, revisited
What can we add to the emotion detection task?
Class together words (let’s use POS)
sequence of classes might be interesting
The lab
1 Read the datafiles; extract text, write out datafile.tok
2 Invoke the Ratnaparkhi tagger on the tokenized text: datafile.maxHpos
3 Read the .maxHpos file and pull out just the tags (clean up the punctuation so it doesn’t break BoosTexter). Create datafile.pos, which must end with space-comma
4 Paste together datafile.pos with datafile.orig
5 Rerun the emotion detection, but this time with the extra sequence information
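Step 3 might look roughly like the Python sketch below. It assumes the tagger output puts one sentence per line as space-separated word_TAG tokens, and it replaces any tag made only of punctuation with a placeholder; both the input format and the `PUNCT` placeholder are assumptions for this example, not something the lab materials specify. The function name `tags_only` is hypothetical.

```python
def tags_only(tagged_line, placeholder="PUNCT"):
    """Turn one line of word_TAG tokens into a tag sequence ending in ' ,'."""
    tags = []
    for token in tagged_line.split():
        # Split on the LAST underscore so words containing '_' still work.
        _, sep, tag = token.rpartition("_")
        # Tokens without a tag, or tags with no letters or digits
        # (such as ',' or '.'), become the placeholder.
        if not sep or not any(ch.isalnum() for ch in tag):
            tag = placeholder
        tags.append(tag)
    # The lab requires each datafile.pos line to end with space-comma.
    return " ".join(tags) + " ,"


print(tags_only("The_DT dog_NN barks_VBZ ._."))
```

Applied line by line to the .maxHpos file, this would produce the lines of datafile.pos, ready to be pasted together with datafile.orig in step 4.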