SI425 : NLP Missing Topics and the Future
Who cares about NLP?
• NLP has expanded quickly
• Most top-tier universities now have NLP faculty (Stanford, Cornell, Berkeley, MIT, UPenn, CMU, Hopkins, etc.)
• Commercial NLP hiring: Google, Microsoft, IBM, Amazon, LinkedIn, Yahoo
• Web startups in Silicon Valley are eating up NLP students
• Navy, DoD, NSA, NIH: all funding NLP research
What NLP topics did we miss?
• Speech Recognition
What NLP topics did we miss?
• Machine Translation
Start at ~6 min in:
http://www.youtube.com/watch?feature=player_embedded&v=Nu-nlQqFCKg
What NLP topics did we miss?
• Machine Translation
• IBM Models (1 through 5)
• Neural Network Translation
Machine Translation
• How to model translations?
• Words: P( casa | house )
• Spurious words: P( a | NULL )
• Fertility: Pn( 1 | house )
• The English word “house” translates to exactly one Spanish word
• Distortion: Pd( 5 | 2 )
• The 2nd English word maps to the 5th Spanish word
Distortion
• Encourage translations to follow the diagonal…
• Pd( 4 | 4 ) * Pd( 5 | 5 ) * …
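To make these pieces concrete, here is a toy sketch (not from the lecture; the probability tables, values, and function name are all invented) of scoring one candidate alignment with word-translation and distortion probabilities:

```python
# Toy IBM-style alignment scoring. t_prob holds P(spanish | english) and
# d_prob holds Pd(spanish_position | english_position); all values invented.
from math import log

t_prob = {("casa", "house"): 0.8, ("la", "the"): 0.7}
d_prob = {(1, 1): 0.4, (2, 2): 0.4}

def alignment_log_score(eng, spa, alignment):
    """Sum log P(spa_j | eng_i) + log Pd(j | i) over aligned word pairs
    (j = Spanish position, i = English position, 1-indexed)."""
    score = 0.0
    for j, i in alignment:
        score += log(t_prob.get((spa[j - 1], eng[i - 1]), 1e-9))
        score += log(d_prob.get((j, i), 1e-9))
    return score

# "the house" -> "la casa": 1->1 and 2->2, following the diagonal.
print(alignment_log_score(["the", "house"], ["la", "casa"], [(1, 1), (2, 2)]))
```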
Learning Translations
• Huge corpus of “aligned sentences”
• Europarl: a corpus of European Parliament proceedings
• The EU is mandated to translate into all 21 official languages
• 21 languages, (semi-)aligned to each other
• P( casa | house ) = count all casa/house pairs! (see the EM sketch below)
• Pd( 2 | 5 ) = count all sentences where the 5th English word went to the 2nd Spanish word
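In code, that counting idea is the heart of IBM Model 1's EM loop. A minimal sketch, assuming a two-pair toy corpus (real training would run over millions of Europarl sentence pairs):

```python
# Minimal IBM Model 1 EM sketch: learn t(f | e) ~ P(spanish | english)
# from aligned sentence pairs. Toy data; Europarl supplies the real corpus.
from collections import defaultdict

pairs = [("the house".split(), "la casa".split()),
         ("the".split(), "la".split())]

t = defaultdict(lambda: defaultdict(lambda: 0.25))   # uniform start

for _ in range(10):                        # EM iterations
    count = defaultdict(lambda: defaultdict(float))
    total = defaultdict(float)
    for eng, spa in pairs:                 # E-step: expected alignment counts
        for f in spa:
            norm = sum(t[f][e] for e in eng)
            for e in eng:
                c = t[f][e] / norm
                count[f][e] += c
                total[e] += c
    for f in count:                        # M-step: renormalize the counts
        for e in count[f]:
            t[f][e] = count[f][e] / total[e]

print(round(t["casa"]["house"], 3))        # approaches 1.0: "la" anchors "the"
```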
Machine Translation Technology
• Hand-held devices for the military
• Speak English -> recognition -> translation -> generate Urdu
• Translate web documents
• Education technology?
• Doesn’t yet receive much focus
What NLP topics did we miss?
• Dialogue Systems
[Cartoon: “Do you think Anakin likes me?” “I don’t care.”]
What NLP topics did we miss?
• Dialogue Systems
• Why? Heavy interest in human-robot communication.
• UAVs require teams of 5+ people for each operating machine
• Goal: reduce the number of people
• Give the computer high-level dialogue commands, rather than low-level system commands
What NLP topics did we miss?
• Dialogue Systems
• Dialogue is a fascinating topic. Not only do we need to understand language, but now also discourse cues:
• Questions require replies
• Imperatives/Commands
• Acknowledgments: “ok”
• Back-channels: “uh huh”, “mm hmm”
• Belief-Desire-Intention (BDI) Model (a toy sketch follows this list)
• Beliefs: you maintain a set of facts about the world
• Desires: things you want to become true in the world
• Intentions: desires that you are taking action on
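As a purely illustrative picture of BDI state (the class and method names here are hypothetical, not from any real dialogue toolkit):

```python
# Hypothetical BDI agent state: beliefs, desires, and intentions as sets.
from dataclasses import dataclass, field

@dataclass
class BDIState:
    beliefs: set = field(default_factory=set)     # facts held about the world
    desires: set = field(default_factory=set)     # states the agent wants true
    intentions: set = field(default_factory=set)  # desires being acted on

    def commit(self, desire):
        """Promote a desire to an intention once the agent acts on it."""
        if desire in self.desires:
            self.desires.discard(desire)
            self.intentions.add(desire)

agent = BDIState(beliefs={"uav_airborne"}, desires={"survey_area"})
agent.commit("survey_area")
print(agent.intentions)   # {'survey_area'}
```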
Neural Networks
• The field started shifting around 2012, with real momentum in just the past two years.
• Neural networks are now becoming the default type of classifier.
Word Embeddings
• Our features in this class were largely binary on/off
• [ he 1, said 1, pizza 0, denied 1, ate 0, … ]
• Now we represent a unigram as a vector of real numbers.
• Just like distributional learning!!!
• But: the vector is learned instead of derived from context word counts
Distributional to Embeddings
• Distributional word vector: [count-based vector figure not reproduced]
• Word embedding: cell → [ 1.47, 0.42, 1.44, 0.53, 0.69, 1.02, 0.42, -0.43 … ]
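A toy contrast, assuming a made-up corpus: the distributional vector falls out of counting context words, while the embedding's numbers come from training a model.

```python
# Distributional vector: *derived* by counting context words around "cell".
# Embedding: *learned* dense numbers with no direct count interpretation.
from collections import Counter

tokens = "the cell divides . the cell membrane . call my cell phone".split()
context_counts = Counter()
for i, w in enumerate(tokens):
    if w == "cell":
        for j in (i - 1, i + 1):             # one-word context window
            if 0 <= j < len(tokens):
                context_counts[tokens[j]] += 1

print(context_counts)                 # Counter({'the': 2, 'divides': 1, ...})
embedding = [1.47, 0.42, 1.44, 0.53]  # learned, not counted
```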
Word Embeddings as Features
• The features are no longer n-grams, but embeddings of the n-grams.
• Before: [ he 1, said 1, pizza 0, denied 1, ate 0, … ]
• Now: [ 1.47, 0.42, 1.44, 0.53, 0.69, 1.02, 0.42, -0.43 … ]
• Sum up the word embeddings for each unigram
• OR just concatenate them into an even longer vector! (see the sketch below)
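Both options in a few lines (toy 4-dimensional embeddings with invented values; real ones typically have 50-300 dimensions):

```python
# Turning per-word embeddings into one feature vector: sum vs. concatenate.
import numpy as np

emb = {"he":     np.array([0.2, -0.1, 0.5, 0.0]),
       "denied": np.array([1.1,  0.3, -0.4, 0.7]),
       "it":     np.array([0.0,  0.2, 0.1, -0.3])}

sentence = ["he", "denied", "it"]

summed = sum(emb[w] for w in sentence)               # fixed length, ignores order
concat = np.concatenate([emb[w] for w in sentence])  # longer vector, keeps order

print(summed.shape, concat.shape)                    # (4,) (12,)
```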
Basic Neural Network
• We learned multi-class logistic regression (MaxEnt)
• This is a one-layer neural network!
[Diagram: a one-layer network mapping input words (thou, hast, opened, the, box, quicketh) to author classes (Dickens, Austen, Shakespeare)]
Basic Neural Network
• Do several regressions at once. Decide later.
[Diagram: the same network with a new hidden layer between the input words and the author classes]
Basic Neural Network
• Do several regressions at once. Decide later. (a numpy sketch follows)
[Diagram: the same network, now with word embeddings as input]
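A minimal numpy sketch of that picture (dimensions, weights, and data are all invented): each hidden unit is one "regression" over the embedding input, and the output layer decides among the author classes afterward.

```python
# One-hidden-layer network = "several regressions at once, decide later".
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)            # input: e.g. a summed word embedding
W1 = rng.normal(size=(8, 4))      # hidden layer: 8 regressions over the input
W2 = rng.normal(size=(3, 8))      # output layer: 3 author classes

def softmax(z):
    z = z - z.max()               # stabilize before exponentiating
    e = np.exp(z)
    return e / e.sum()

hidden = np.tanh(W1 @ x)          # each hidden unit: one "regression"
probs = softmax(W2 @ hidden)      # "decide later": combine the hidden units

print(dict(zip(["Dickens", "Austen", "Shakespeare"], probs.round(3))))
```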
El Fin
• Secret 1:
I intentionally made some of our labs ambiguous
Under-defined tasks with unclear expected results
• Secret 2:
I tried to teach you skills that have nothing to do with NLP
Experimentation
Error Analysis
• Secret 3:
I appreciate the hard work you put into the class
What NLP topics did we miss?
Unsupervised Learning
• Most of this semester used data that had human labels.
• Bootstrapping was our main counter-example: it is mostly unsupervised. (a rough sketch follows below)
• Many, many algorithms are being researched to learn language and knowledge without humans, using only text.
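A rough, runnable caricature of that bootstrapping loop (the corpus and pattern rules here are toy inventions, not a real system from the course):

```python
# Bootstrapping sketch: seed instances -> learn context patterns ->
# harvest new instances -> repeat. Toy data and deliberately naive rules.
import re

corpus = ("cities such as London and Paris . "
          "cities such as Berlin and Madrid . "
          "Berlin is a city .")

seeds = {"London"}
patterns = set()

for _ in range(2):
    # 1. Learn patterns: the word immediately preceding a known instance.
    for m in re.finditer(r"(\w+) (\w+)", corpus):
        if m.group(2) in seeds:
            patterns.add(m.group(1))
    # 2. Apply patterns: capitalized words following a learned context word.
    for m in re.finditer(r"(\w+) ([A-Z]\w+)", corpus):
        if m.group(1) in patterns:
            seeds.add(m.group(2))

print(seeds)   # {'London', 'Berlin'}: grew without any human labels
```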