
  • SI425: NLP - Missing Topics and the Future

  • Who cares about NLP?

    NLP has expanded quickly

    Most top-tier universities now have NLP faculty (Stanford, Cornell, Berkeley, MIT, UPenn, CMU, Hopkins, etc.)

    Commercial NLP hiring: Google, Microsoft, IBM, Amazon, LinkedIn, Yahoo

    Web startups in Silicon Valley are eating up NLP students

    Navy, DoD, NSA, NIH: all funding NLP research

  • What NLP topics did we miss?

    Speech Recognition


  • What NLP topics did we miss?

    Machine Translation

    Start at ~6 min in:

    http://www.youtube.com/watch?feature=player_embedded&v=Nu-nlQqFCKg

  • What NLP topics did we miss?

    Machine Translation

    IBM Models (1 through 5)

    Neural Network Translation

  • Machine Translation

    How do we model translations?

    Words: P( casa | house )

    Spurious words: P( a | null )

    Fertility: Pn( 1 | house )

    The English word "house" translates to exactly one Spanish word

    Distortion: Pd( 5 | 2 )

    The 2nd English word maps to the 5th Spanish word

    (A toy scoring sketch follows the next slide.)

  • Distortion

    Encourage translations to follow the diagonal:

    Pd( 4 | 4 ) * Pd( 5 | 5 ) * ...
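    To make these factors concrete, here is a toy Python sketch that scores one alignment as the product of word-translation, fertility, and distortion probabilities, in the spirit of the IBM models. All probability values, table names, and the helper function are invented for illustration, not taken from a real trained model.

    from math import prod

    # Toy probability tables (invented numbers for illustration only).
    p_word = {("casa", "house"): 0.8, ("la", "the"): 0.7, ("a", None): 0.1}
    p_fert = {("house", 1): 0.9, ("the", 1): 0.95}
    p_dist = {(1, 1): 0.4, (2, 2): 0.4, (5, 2): 0.05}

    def alignment_score(pairs):
        """pairs: list of (spanish, english, fertility, spanish_pos, english_pos).
        Score = product of translation * fertility * distortion per pair."""
        return prod(
            p_word.get((f, e), 1e-6) * p_fert.get((e, n), 1e-6) * p_dist.get((j, i), 1e-6)
            for f, e, n, j, i in pairs
        )

    # "the house" -> "la casa", a monotone (diagonal) alignment
    print(alignment_score([("la", "the", 1, 1, 1), ("casa", "house", 1, 2, 2)]))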

  • Learning Translations

    Huge corpus of aligned sentences.

    Europarl

    Corpus of European Parliament proceedings

    The EU is mandated to translate into all 21 official languages

    21 languages, (semi-)aligned to each other

    P( casa | house ) = (count all casa/house pairs!)

    Pd( 5 | 2 ) = (count all sentences where the 2nd word went to the 5th word)
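    A minimal sketch of the counting idea, assuming the word alignments are already given (real systems induce them with EM, as in IBM Model 1). The tiny corpus and all names here are invented; distortion counts would be collected the same way over aligned positions.

    from collections import Counter

    # Word-aligned sentence pairs: each entry is a list of (english, spanish) pairs.
    aligned = [
        [("the", "la"), ("house", "casa")],
        [("the", "la"), ("white", "blanca"), ("house", "casa")],
    ]

    pair_counts, e_counts = Counter(), Counter()
    for sentence in aligned:
        for e, f in sentence:
            pair_counts[(f, e)] += 1
            e_counts[e] += 1

    def p_translate(f, e):
        """Relative-frequency estimate: count(f, e) / count(e)."""
        return pair_counts[(f, e)] / e_counts[e]

    print(p_translate("casa", "house"))  # 1.0 in this toy corpus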

  • Machine Translation Technology

    Hand-held devices for the military

    Speak English -> recognition -> translation -> generate Urdu

    Translate web documents

    Education technology?

    Doesn't yet receive much focus

  • What NLP topics did we miss?

    Dialogue Systems

    "Do you think Anakin likes me?" "I don't care."

  • What NLP topics did we miss?

    Dialogue Systems

    Why? Heavy interest in human-robot communication.

    UAVs require teams of 5+ people for each operating machine.

    Goal: reduce the number of people.

    Give the computer high-level dialogue commands, rather than low-level system commands.

  • What NLP topics did we miss?

    Dialogue Systems

    Dialogue is a fascinating topic. Not only do we need to understand language, but now discourse cues too:

    Questions require replies

    Imperatives/Commands

    Acknowledgments: ok

    Back-channels: uh huh, mm hmm

    Belief-Desire-Intention (BDI) Model

    Beliefs: you maintain a set of facts about the world

    Desires: things you want to become true in the world

    Intentions: desires that you are taking action on
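    As a data structure, the BDI model is small. A minimal sketch, with invented class and field names (a real BDI agent would add plans and a deliberation loop):

    from dataclasses import dataclass, field

    @dataclass
    class BDIAgent:
        beliefs: set = field(default_factory=set)     # facts held about the world
        desires: set = field(default_factory=set)     # states the agent wants true
        intentions: set = field(default_factory=set)  # desires it is acting on

        def commit(self, desire):
            """Promote a desire to an intention (start acting on it)."""
            if desire in self.desires:
                self.intentions.add(desire)

    agent = BDIAgent(beliefs={"uav_airborne"}, desires={"reach_waypoint_4"})
    agent.commit("reach_waypoint_4")
    print(agent.intentions)  # {'reach_waypoint_4'}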

  • Neural Networks

    The field started shifting ~2012, with real movement in just the past two years.

    Neural networks are now becoming the default type of classifier.

  • Word Embeddings

    Our features in this class were largely binary on/off:

    [ he:1, said:1, pizza:0, denied:1, ate:0, ... ]

    Now we represent a unigram as a vector of real numbers.

    Just like distributional learning!!!

    But: the vector is learned instead of derived from context word counts.
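    A small sketch contrasting the two representations. The vocabulary is taken from the slide's example; the embedding values are random stand-ins for learned vectors.

    import numpy as np

    vocab = ["he", "said", "pizza", "denied", "ate"]

    # Binary on/off features: 1 if the word appears in the text, else 0.
    text = {"he", "said", "denied"}
    binary = np.array([1.0 if w in text else 0.0 for w in vocab])

    # Word embeddings: each word maps to a dense real-valued vector instead.
    embeddings = {w: np.random.randn(8) for w in vocab}  # stand-in for learned values

    print(binary)            # [1. 1. 0. 1. 0.]
    print(embeddings["he"])  # an 8-dimensional real-valued vector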

  • Distributional to Embeddings

    [Figure: the distributional count vector for "cell" shown next to its learned embedding]

    Word embedding of "cell": [ 1.47, 0.42, 1.44, 0.53, 0.69, 1.02, 0.42, -0.43 ]

  • Word Embeddings as Features

    The features are no longer n-grams, but embeddings of the n-grams.

    Before: [ he:1, said:1, pizza:0, denied:1, ate:0, ... ]

    Now: [ 1.47, 0.42, 1.44, 0.53, 0.69, 1.02, 0.42, -0.43 ]

    Sum up the word embeddings for each unigram, OR just concatenate them into an even longer vector!
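    A sketch of both options, assuming 8-dimensional embeddings with random stand-in values:

    import numpy as np

    words = ["he", "said", "denied"]
    emb = {w: np.random.randn(8) for w in words}  # stand-ins for learned vectors

    summed = sum(emb[w] for w in words)               # one 8-dim vector, order ignored
    concat = np.concatenate([emb[w] for w in words])  # 24-dim vector, order kept

    print(summed.shape, concat.shape)  # (8,) (24,)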

  • Basic Neural Network

    We learned multi-class logistic regression (MaxEnt).

    This is a one-layer neural network!

    [Diagram: word inputs (thou, hast, opened, the, box, quicketh) connected directly to class outputs (Dickens, Austen, Shakespeare)]

  • Basic Neural Network

    Do several regressions at once. Decide later.

    New hidden layer!!!

    [Diagram: the same word inputs now feed a hidden layer, which feeds the class outputs (Dickens, Austen, Shakespeare)]

  • Basic Neural Network

    Do several regressions at once. Decide later.

    [Diagram: word embeddings as input to the hidden layer, then the class outputs (Dickens, Austen, Shakespeare)]
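    A minimal numpy sketch of the network in these diagrams: summed word embeddings in, one hidden layer, softmax over the three authors. All weights are random stand-ins rather than trained values; dropping the hidden layer (scoring W2 directly on x) recovers plain MaxEnt.

    import numpy as np

    rng = np.random.default_rng(0)
    DIM, HIDDEN, CLASSES = 8, 16, 3     # embedding size, hidden units, authors
    authors = ["Dickens", "Austen", "Shakespeare"]

    emb = {w: rng.normal(size=DIM) for w in ["thou", "hast", "opened", "the", "box"]}
    W1, b1 = rng.normal(size=(HIDDEN, DIM)), np.zeros(HIDDEN)
    W2, b2 = rng.normal(size=(CLASSES, HIDDEN)), np.zeros(CLASSES)

    def predict(words):
        x = sum(emb[w] for w in words)          # input: summed word embeddings
        h = np.tanh(W1 @ x + b1)                # hidden layer: "several regressions at once"
        z = W2 @ h + b2                         # one score per author
        p = np.exp(z - z.max()); p /= p.sum()   # softmax: "decide later"
        return dict(zip(authors, p))

    print(predict(["thou", "hast", "opened", "the", "box"]))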

  • El Fin

    Secret 1:

    I intentionally made some of our labs ambiguous

    Under-defined tasks with unclear expected results

    Secret 2:

    I tried to teach you skills that have nothing to do with NLP

    Experimentation

    Error Analysis

    Secret 3:

    I appreciate the hard work you put into the class


  • What NLP topics did we miss?

    Unsupervised Learning

    Most of this semester used data that had human labels.

    Bootstrapping was our main counter-example: it is mostly unsupervised.

    Many, many algorithms are being researched to learn language and knowledge without humans, using only text.
