introduction to computational linguisticscourses.washington.edu/ling472/slides/0327-overview.pdf ·...
TRANSCRIPT
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Introduction to Computational LinguisticsModeling Language for CompLing & NLP
Course overview
Olga Zamaraeva (2018)Based on Dickinson, Brew, & Meurers (2013) and Bender (prev. years)
University of WashingtonMarch 27, 2018
1 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
But first: Emergency procedures
https://www.washington.edu/uwem/plans-and-procedures/uw-emergency-procedures/
2 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Computational Linguistics & NLP
CompLing == NLP? CompLing ⊃ NLP? CompLing ∩ NLP = ∅?
3 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Computational Linguistics & NLP
CompLing == NLP? CompLing ⊃ NLP? CompLing ∩ NLP = ∅?
I Goal: A survey of the space that CompLing&NLP live in
I Linguistic perspective
I Use working definitions but not try to decide which isbetter etc.
4 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Definitions
I Computational Linguistics & Natural LanguageProcessing
I Analyzing language material to solve a variety of tasks
I Natural Language Processing (NLP)I Focus: algorithms* (applicability, effectiveness)I *(Learning “world” through language?)
I speech recognitionI machine translationI parsingI dialog systemsI ...and more
I Computational LinguisticsI Focus: study of language with computational means
I documenting understudied languagesI rigorous testing of syntactic theoriesI building typological databasesI semantic ontologiesI modeling human linguistic behaviorI ...and more 5 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Prerequisites and alternatives
I PrerequisitesI Basic linguistic theory (LING200)I Basic programming skills are a big plus (e.g. CSE142)I Basic probabilityI Maturity when it comes to dealing with open source
software
I Alternatives: pure NLP perspectiveI CSE447 (NLP)I LING570/572 (and 571)I Require strong probability, programming, and basic
machine learning background
6 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Course Logistics
I Canvas: https://canvas.uw.edu/ (CSE472 Sp18)I Public website: http://courses.washington.edu/ling472/I Lecture T/Th, Section FI Readings: Jurafsky&Martin, Language&ComputersI Homeworks (almost weekly)I Project (first milestone: 4/8)I Write ups weigh 50% of the grade (both project AND
homework)I Grading Scale: 95% = 4.0, 62% = 0.7 (generic LING
dept. scale)I For some assignments, showing effort will get you a
long wayI Write ups are graded for content AND quality of writing
7 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Course Logistics: Project
I Project: Error Analysis for a NLP packageI Pick a package+dataset (e.g. here:
https://aclanthology.coli.uni-saarland.de/ )I Look for a list of packages, e.g. https:
//aclanthology.coli.uni-saarland.de/events/acl-2017 orhere: or here: http://aclweb.org/anthology/P/P17/
I Look for a publication that comes with software anddataset:
I Find a package that you can actually runI Reproduce the experiment described in the publicationI Perform error analysisI Write it up
8 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Modeling language
I If we want to do anything with language, we need a wayto represent language.
I What does it mean to model language?I Do linguists need to model language?I Do computer scientists need to model language?
I How close is a model to the real thing?I Does it matter to linguists?I Does it matter to computer scientists? To engineers?
9 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Modeling language: statistical vs. symbolicmethods
I Statistical methods involve training a model on a bodyof data so it can predict the most probablelabel/structure/etc for new data
I Symbolic methods involve knowledge engineering, orhand-coding of linguistic knowledge which is thenapplied to tasks
I Statistical methods provide robustness, symbolicmethods precision
I Statistical and symbolic methods can be and often arecombined in both NLP and CompLing
10 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Modeling Language: where to start?
I We can interact with the computer in several ways:I write or read textI speak or listen to speech
I Computer has to have some way to representI textI speech
I Next, we can talk about modeling:I sounds of language (phonetics & phonology)I words (morphology)I sentence structure (syntax)I meaning (semantics)
11 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Relating texts to languages
I Texts represent a fragment of a language, in terms ofgrammar and lexicon
I Texts often represent standard varietiesI What would it mean to limit data to standard varieties?
picture of Dirk Hovy giving his talk: https://www.youtube.com/watch?v=gcfKvy4NrOE
12 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Unwritten languages
Many languages have never been written down. Of the≈7000 spoken languages, ≈3000 have do not have writingsystems.
Some examples:
I Salar, a Turkic language in China.I Gugu Badhun, a language in Australia.I Southeastern Pomo, a language in California
(See: http://www.ethnologue.com/ and http://www.sil.org/mexico/ilv/iinfoilvmexico.htm)
13 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Sounds as language
We want to be able to encode any spoken language
I The International Phonetic Alphabet (IPA)I Interactive example chart: http://web.uvic.ca/ling/
resources/ipa/charts/IPAlab/IPAlab.htm
picture from: https://home.cc.umanitoba.ca/ robh/howto.html
14 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Applications of speech encoding
Many applications for encoding speech, e.g.:I Spoken dialog systems, i.e. speak with a computer (and
have it speak back).I Helping speech pathologists diagnose problems
Mapping sounds to symbols (alphabet), and vice versa:
I Automatic Speech Recognition (ASR): sounds to textI Text-to-Speech Synthesis (TTS): texts to sounds
15 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Machine Learning
I A large family of algorithms that train as they runI Require large amounts of training dataI Learn to output predictions for previously unseen dataI Require data to be in a numeric vector format
16 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Machine Learning: Variance and Bias
I A key problem in ML, scribbled on this slide by CarlosGuestrin’s own hand!
17 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Statistical beginnings: N-grams
I A basic technique to turn a document into a vector ofnumbers
Material based upon chapter 5 of Jurafsky and Martin 2000
18 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Regular Expressions
I sequence of characters that defines a (search) patternI ([0-9]{3})[0-9]{3}-[0-9]{4}I matches phone numbersI this is our first topic and you can play with regexes here:I https://regex101.com/
19 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Finite State Automata
colou?r
c o l o r
u
20 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Regular Languages
...are part of the Chomsky Hierarchy:
(picture by J. Finkelstein for Wikipedia)
21 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Morphology and Phonology
Verbal position classes in Finnish:
voice tense per partstem
22 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Finite State Transducers
Witches and Wizards:
t : t
w : w i : i
c : c ε : e
z : z a : a r : r d : d
s : PL
h : h
23 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Syntax
S
NP
I
VP
saw NP
the astronomer PP
with the telescope
S
NP
I
VP
saw NP
the astronomer
PP
with the telescope
24 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Parsing
CKY demo: http://lxmls.it.pt/2015/cky.html
25 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Syntactic theory
I ...is a complex thingI If you want get a sense what it is about:
read The Linguistics Wars by R. Harris
picture from:
https://koine-greek.com/2017/05/22/a-brief-history-of-syntactic-theory-parallel-contraint-based-syntax/
26 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Grammar engineering
I In linguistic terms, creating formal,machine-and-human-readable grammar descriptions
I Like NLP grammars, linguistically-engineeredgrammars also map strings to syntactic and semanticrepresentations
I but they are also language descriptions grounded inlinguistic theory
I Demo: http://erg.delph-in.net/logonI Demo:
http://matrix.ling.washington.edu/customize/matrix.cgi
27 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Semantics
I Formal semantics is couched in formal logicI “Every dog chases some cat”I ∀ x dog(x) ∃ y cat(y) chase(x,y)I ∃ y cat(y) ∀ x dog(x) chase(x,y)
picture source: https://imgur.com/a/XBHub
28 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Computational Semantics
...may take different forms, e.g. AMR (abstract meaningrepresentation, Banarescu et al., 2013)
29 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Computational Semantics
...may take different forms, e.g. MRS (minimal recursionsemantics, Copestake et al., 2005)
Every dog chases some cat.
(1)
mrsLTOP h1INDEX e3
RELS⟨
every q relLBL h4ARG0 x6RSTR h7BODY h5
,
dog n rel
LBL h8ARG0 x6
,
chase v relLBL h2ARG0 e3ARG1 x6ARG2 x9
,
some q relLBL h10ARG0 x9RSTR h12BODY h11
,
cat n rel
LBL h13ARG0 x9
⟩
HCONS⟨
qeqHARG h1LARG h2
,qeqHARG h7LARG h8
,qeqHARG h12LARG h13
⟩
30 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Semantic parsing
I Mapping a string to a semantic representationI Latest technique: Word embeddingsI Words are converted to vectors of numbersI (cf. N-grams are documents converted to vectors of
numbers)
picture credit:
https://www.quora.com/What-does-the-word-embedding-mean-in-the-context-of-Machine-Learning
31 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Deep learning
One of the most successful techniques used in NLP inrecent years.
picture from: https://mapr.com/blog/demystifying-ai-ml-dl/
32 / 33
LING472
Modeling Languagefor CompLing &
NLPCourse overview
EmergencyProcedures
Course overviewand logistics
Modeling Language
Text as language
Speech aslanguage
Relating written andspoken language
Statisticalbeginnings
Formal Beginnings
Morphology andPhonology
Syntax and Parsing
Semantics
Representationlearning
Ethics and NLP
Ethics and NLP
I What and Why does NLP research do?I What is the researcher’s responsibility?
pictures from Dirk Hovy’s talk: https://www.youtube.com/watch?v=gcfKvy4NrOE33 / 33