introduction to computational linguisticscourses.washington.edu/ling472/slides/0327-overview.pdf ·...

33
LING472 Modeling Language for CompLing & NLP Course overview Emergency Procedures Course overview and logistics Modeling Language Text as language Speech as language Relating written and spoken language Statistical beginnings Formal Beginnings Morphology and Phonology Syntax and Parsing Semantics Representation learning Ethics and NLP Introduction to Computational Linguistics Modeling Language for CompLing & NLP Course overview Olga Zamaraeva (2018) Based on Dickinson, Brew, & Meurers (2013) and Bender (prev. years) University of Washington March 27, 2018 1 / 33

Upload: nguyennga

Post on 19-Aug-2018

215 views

Category:

Documents


0 download

TRANSCRIPT

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Introduction to Computational LinguisticsModeling Language for CompLing & NLP

Course overview

Olga Zamaraeva (2018)Based on Dickinson, Brew, & Meurers (2013) and Bender (prev. years)

University of WashingtonMarch 27, 2018

1 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

But first: Emergency procedures

https://www.washington.edu/uwem/plans-and-procedures/uw-emergency-procedures/

2 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Computational Linguistics & NLP

CompLing == NLP? CompLing ⊃ NLP? CompLing ∩ NLP = ∅?

3 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Computational Linguistics & NLP

CompLing == NLP? CompLing ⊃ NLP? CompLing ∩ NLP = ∅?

I Goal: A survey of the space that CompLing&NLP live in

I Linguistic perspective

I Use working definitions but not try to decide which isbetter etc.

4 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Definitions

I Computational Linguistics & Natural LanguageProcessing

I Analyzing language material to solve a variety of tasks

I Natural Language Processing (NLP)I Focus: algorithms* (applicability, effectiveness)I *(Learning “world” through language?)

I speech recognitionI machine translationI parsingI dialog systemsI ...and more

I Computational LinguisticsI Focus: study of language with computational means

I documenting understudied languagesI rigorous testing of syntactic theoriesI building typological databasesI semantic ontologiesI modeling human linguistic behaviorI ...and more 5 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Prerequisites and alternatives

I PrerequisitesI Basic linguistic theory (LING200)I Basic programming skills are a big plus (e.g. CSE142)I Basic probabilityI Maturity when it comes to dealing with open source

software

I Alternatives: pure NLP perspectiveI CSE447 (NLP)I LING570/572 (and 571)I Require strong probability, programming, and basic

machine learning background

6 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Course Logistics

I Canvas: https://canvas.uw.edu/ (CSE472 Sp18)I Public website: http://courses.washington.edu/ling472/I Lecture T/Th, Section FI Readings: Jurafsky&Martin, Language&ComputersI Homeworks (almost weekly)I Project (first milestone: 4/8)I Write ups weigh 50% of the grade (both project AND

homework)I Grading Scale: 95% = 4.0, 62% = 0.7 (generic LING

dept. scale)I For some assignments, showing effort will get you a

long wayI Write ups are graded for content AND quality of writing

7 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Course Logistics: Project

I Project: Error Analysis for a NLP packageI Pick a package+dataset (e.g. here:

https://aclanthology.coli.uni-saarland.de/ )I Look for a list of packages, e.g. https:

//aclanthology.coli.uni-saarland.de/events/acl-2017 orhere: or here: http://aclweb.org/anthology/P/P17/

I Look for a publication that comes with software anddataset:

I Find a package that you can actually runI Reproduce the experiment described in the publicationI Perform error analysisI Write it up

8 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Modeling language

I If we want to do anything with language, we need a wayto represent language.

I What does it mean to model language?I Do linguists need to model language?I Do computer scientists need to model language?

I How close is a model to the real thing?I Does it matter to linguists?I Does it matter to computer scientists? To engineers?

9 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Modeling language: statistical vs. symbolicmethods

I Statistical methods involve training a model on a bodyof data so it can predict the most probablelabel/structure/etc for new data

I Symbolic methods involve knowledge engineering, orhand-coding of linguistic knowledge which is thenapplied to tasks

I Statistical methods provide robustness, symbolicmethods precision

I Statistical and symbolic methods can be and often arecombined in both NLP and CompLing

10 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Modeling Language: where to start?

I We can interact with the computer in several ways:I write or read textI speak or listen to speech

I Computer has to have some way to representI textI speech

I Next, we can talk about modeling:I sounds of language (phonetics & phonology)I words (morphology)I sentence structure (syntax)I meaning (semantics)

11 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Relating texts to languages

I Texts represent a fragment of a language, in terms ofgrammar and lexicon

I Texts often represent standard varietiesI What would it mean to limit data to standard varieties?

picture of Dirk Hovy giving his talk: https://www.youtube.com/watch?v=gcfKvy4NrOE

12 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Unwritten languages

Many languages have never been written down. Of the≈7000 spoken languages, ≈3000 have do not have writingsystems.

Some examples:

I Salar, a Turkic language in China.I Gugu Badhun, a language in Australia.I Southeastern Pomo, a language in California

(See: http://www.ethnologue.com/ and http://www.sil.org/mexico/ilv/iinfoilvmexico.htm)

13 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Sounds as language

We want to be able to encode any spoken language

I The International Phonetic Alphabet (IPA)I Interactive example chart: http://web.uvic.ca/ling/

resources/ipa/charts/IPAlab/IPAlab.htm

picture from: https://home.cc.umanitoba.ca/ robh/howto.html

14 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Applications of speech encoding

Many applications for encoding speech, e.g.:I Spoken dialog systems, i.e. speak with a computer (and

have it speak back).I Helping speech pathologists diagnose problems

Mapping sounds to symbols (alphabet), and vice versa:

I Automatic Speech Recognition (ASR): sounds to textI Text-to-Speech Synthesis (TTS): texts to sounds

15 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Machine Learning

I A large family of algorithms that train as they runI Require large amounts of training dataI Learn to output predictions for previously unseen dataI Require data to be in a numeric vector format

16 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Machine Learning: Variance and Bias

I A key problem in ML, scribbled on this slide by CarlosGuestrin’s own hand!

17 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Statistical beginnings: N-grams

I A basic technique to turn a document into a vector ofnumbers

Material based upon chapter 5 of Jurafsky and Martin 2000

18 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Regular Expressions

I sequence of characters that defines a (search) patternI ([0-9]{3})[0-9]{3}-[0-9]{4}I matches phone numbersI this is our first topic and you can play with regexes here:I https://regex101.com/

19 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Finite State Automata

colou?r

c o l o r

u

20 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Regular Languages

...are part of the Chomsky Hierarchy:

(picture by J. Finkelstein for Wikipedia)

21 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Morphology and Phonology

Verbal position classes in Finnish:

voice tense per partstem

22 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Finite State Transducers

Witches and Wizards:

t : t

w : w i : i

c : c ε : e

z : z a : a r : r d : d

s : PL

h : h

23 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Syntax

S

NP

I

VP

saw NP

the astronomer PP

with the telescope

S

NP

I

VP

saw NP

the astronomer

PP

with the telescope

24 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Parsing

CKY demo: http://lxmls.it.pt/2015/cky.html

25 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Syntactic theory

I ...is a complex thingI If you want get a sense what it is about:

read The Linguistics Wars by R. Harris

picture from:

https://koine-greek.com/2017/05/22/a-brief-history-of-syntactic-theory-parallel-contraint-based-syntax/

26 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Grammar engineering

I In linguistic terms, creating formal,machine-and-human-readable grammar descriptions

I Like NLP grammars, linguistically-engineeredgrammars also map strings to syntactic and semanticrepresentations

I but they are also language descriptions grounded inlinguistic theory

I Demo: http://erg.delph-in.net/logonI Demo:

http://matrix.ling.washington.edu/customize/matrix.cgi

27 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Semantics

I Formal semantics is couched in formal logicI “Every dog chases some cat”I ∀ x dog(x) ∃ y cat(y) chase(x,y)I ∃ y cat(y) ∀ x dog(x) chase(x,y)

picture source: https://imgur.com/a/XBHub

28 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Computational Semantics

...may take different forms, e.g. AMR (abstract meaningrepresentation, Banarescu et al., 2013)

29 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Computational Semantics

...may take different forms, e.g. MRS (minimal recursionsemantics, Copestake et al., 2005)

Every dog chases some cat.

(1)

mrsLTOP h1INDEX e3

RELS⟨

every q relLBL h4ARG0 x6RSTR h7BODY h5

,

dog n rel

LBL h8ARG0 x6

,

chase v relLBL h2ARG0 e3ARG1 x6ARG2 x9

,

some q relLBL h10ARG0 x9RSTR h12BODY h11

,

cat n rel

LBL h13ARG0 x9

HCONS⟨

qeqHARG h1LARG h2

,qeqHARG h7LARG h8

,qeqHARG h12LARG h13

30 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Semantic parsing

I Mapping a string to a semantic representationI Latest technique: Word embeddingsI Words are converted to vectors of numbersI (cf. N-grams are documents converted to vectors of

numbers)

picture credit:

https://www.quora.com/What-does-the-word-embedding-mean-in-the-context-of-Machine-Learning

31 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Deep learning

One of the most successful techniques used in NLP inrecent years.

picture from: https://mapr.com/blog/demystifying-ai-ml-dl/

32 / 33

LING472

Modeling Languagefor CompLing &

NLPCourse overview

EmergencyProcedures

Course overviewand logistics

Modeling Language

Text as language

Speech aslanguage

Relating written andspoken language

Statisticalbeginnings

Formal Beginnings

Morphology andPhonology

Syntax and Parsing

Semantics

Representationlearning

Ethics and NLP

Ethics and NLP

I What and Why does NLP research do?I What is the researcher’s responsibility?

pictures from Dirk Hovy’s talk: https://www.youtube.com/watch?v=gcfKvy4NrOE33 / 33