si485i : nlp - usna · who cares about nlp? ... ... system commands 13 . what nlp topics did we...

32
SI425 : NLP Missing Topics and the Future

Upload: phamtruc

Post on 29-May-2018

224 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

SI425 : NLP Missing Topics and the Future

Page 2: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Who cares about NLP?

• NLP has expanded quickly

• Most top-tier universities now have NLP faculty (Stanford,

Cornell, Berkeley, MIT, UPenn, CMU, Hopkins, etc)

• Commercial NLP hiring: Google, Microsoft, IBM,

Amazon, LinkedIn, Yahoo

• Web startups in Silicon Valley are eating up NLP

students

• Navy, DoD, NSA, NIH: all funding NLP research

2

Page 3: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Speech Recognition

3

Page 4: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Speech Recognition

4

Page 5: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Machine Translation

5

Page 6: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Machine Translation

6

Start at ~6min in.

http://www.youtube.com/watch?feature=player_embedded&v=Nu

-nlQqFCKg

Page 7: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Machine Translation

• IBM Models (1 through 5)

• Neural Network Translation

7

Page 8: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Machine Translation

• How to model translations?

• Words: P( casa | house )

• Spurious words: P( a | null )

• Fertility: Pn( 1 | house )

• English word translates to one Spanish word

• Distortion: Pd( 5 | 2 )

• The 2nd English word maps to the 5th Spanish word

Page 9: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Distortion

• Encourage translations to follow the diagonal…

• P( 4 | 4 ) * P( 5 | 5 ) * …

Page 10: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Learning Translations

• Huge corpus of “aligned sentences”.

• Europarl

• Corpus of European Parliamant proceedings

• The EU is mandated to translate into all 21 official languages

• 21 languages, (semi-) aligned to each other

• P( casa | house ) = (count all casa/house pairs!)

• Pd( 2 | 5 ) = (count all sentences where 2nd word

went to 5th word)

Page 11: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Machine Translation Technology

• Hand-held devices for military

• Speak english -> recognition -> translation -> generate Urdu

• Translate web documents

• Education technology?

• Doesn’t yet receive much of a focus

Page 12: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Dialogue Systems

12

Do you think

Anakin likes me? I don’t care.

Page 13: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Dialogue Systems

• Why? Heavy interest in human-robot communication.

• UAVs require teams of 5+ people for each operating

machine • Goal: reduce the number of people

• Give computer high-level dialogue commands, rather than low-level

system commands

13

Page 14: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

• Dialogue Systems

• Dialogue is a fascinating topic. Not only do we need

to understand language, but now discourse cues: • Questions require replies

• Imperatives/Commands

• Acknowledgments: “ok”

• Back-channels: “uh huh”, “mm hmm”

• Belief-Desire-Intention (BDI) Model

• Beliefs: you maintain a set of facts about the world

• Desires: things you want to become true in the world

• Intentions: desires that you are taking action on

14

Page 15: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Neural Networks

• The field started shifting ~2012 with real movement in

just the past 2 years.

• Neural networks now becoming the default type of

classifier.

15

Page 16: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Word Embeddings

• Our features in this class were largely binary on/off

• [ he 1, said 1, pizza 0, denied 1, ate 0, … ]

• Now we represent a unigram as a vector of real

numbers.

• Just like distributional learning!!!

• But: the vector is learned instead of derived from context

word counts

16

Page 17: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Distributional to Embeddings

Distributional word vector:

Word Embedding:

cell [ 1.47, 0.42, 1.44, 0.53, 0.69, 1.02, 0.42, -0.43 … ]

Page 18: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Word Embeddings as Features

• The features are no longer n-grams, but embeddings of

the n-grams.

• Before: [ he 1, said 1, pizza 0, denied 1, ate 0, … ]

• Now: [ 1.47, 0.42, 1.44, 0.53, 0.69, 1.02, 0.42, -0.43 … ]

• Sum up the word embeddings for each unigram

• OR just concatenate them into an even longer vector!

18

Page 19: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Basic Neural Network

• We learned multi-class logistic regression (MaxEnt)

• This is a one-layer neural network!

19

Dickens

Austen

Shakespeare

thou

hast

opened

the

box

quicketh

Page 20: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Basic Neural Network

• Do several regressions at once. Decide later.

20

thou

hast

opened

the

box

quicketh

Dickens

Austen

Shakespeare

New hidden layer!!!

Page 21: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

Basic Neural Network

• Do several regressions at once. Decide later.

21

Dickens

Austen

Shakespeare

Word

Embeddings

As Input

Page 22: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

22

Page 23: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

I intentionally made some of our labs ambiguous

23

Page 24: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

I intentionally made some of our labs ambiguous

Under-defined tasks with unclear expected results

24

Page 25: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

I intentionally made some of our labs ambiguous

Under-defined tasks with unclear expected results

• Secret 2:

25

Page 26: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

I intentionally made some of our labs ambiguous

Under-defined tasks with unclear expected results

• Secret 2:

I tried to teach you skills that have nothing to do with NLP

26

Page 27: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

I intentionally made some of our labs ambiguous

Under-defined tasks with unclear expected results

• Secret 2:

I tried to teach you skills that have nothing to do with NLP

Experimentation

Error Analysis

27

Page 28: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

I intentionally made some of our labs ambiguous

Under-defined tasks with unclear expected results

• Secret 2:

I tried to teach you skills that have nothing to do with NLP

Experimentation

Error Analysis

• Secret 3:

28

Page 29: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

El Fin

• Secret 1:

I intentionally made some of our labs ambiguous

Under-defined tasks with unclear expected results

• Secret 2:

I tried to teach you skills that have nothing to do with NLP

Experimentation

Error Analysis

• Secret 3:

I appreciate the hard work you put into the class

29

Page 30: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

30

Page 31: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

Unsupervised Learning

31

Page 32: SI485i : NLP - USNA · Who cares about NLP? ...  ... system commands 13 . What NLP topics did we miss?

What NLP topics did we miss?

Unsupervised Learning

• Most of this semester used data

that had human labels.

• Bootstrapping was our main counter-

example: it is mostly unsupervised.

• Many many algorithms being

researched to learn language and

knowledge without humans, only

using text.

32