A Whirlwind Tour of Natural Language Processing Mark Sammons Cognitive Computation Group, UIUC


Page 1:

A Whirlwind Tour of Natural Language Processing

Mark Sammons Cognitive Computation Group, UIUC

Page 2:

Who Cares about NLP?

…Eddie Izzard,

that’s who…

(Those of a sensitive disposition toward explicit language should probably cover their ears…)

Page 3:

Remember Star Trek? HAL in 2001? The Heart of Gold in Hitch-hiker’s Guide…?

Grand Vision of Artificial Intelligence: computers that actively communicate.

A substantial effort has been devoted to achieving AI. But how do we decide whether a machine is smart?

IBM’s Deep Blue plays a mean game of chess… but is it intelligent?

Early idea of evaluation: the Turing Test
- If a human can’t tell that it’s a machine…

AI philosophy: is the *appearance* of intelligent behavior the same as intelligence?

General assumption: NLP is AI-complete (a play on the concept of NP-completeness) – i.e. we need Intelligence to properly solve NLP

Page 4:

More Realistically… where does NLP help?

Already here:
- Context-sensitive spelling and grammar checkers in text editors
- Machine Translation, e.g. in web browsers
- Automated phone trees (by some definition of “help”)
- Web search

Under development:
- Better Machine Translation
- Better search
- Voice command, e.g. in cars

Page 5:

Outline

Why NLP is hard

NLP domains: Speech vs. Text

Attacking NLP problems

Linguistics: building explanatory models

Statistics: data-driven approaches

Machine Learning & NLP

NLP Problems and Solutions

Page 6:

Why is NLP so hard?

Meaning

Language

Ambiguity

Variability

Page 7:

Variability

Example: Relation Extraction: “Works for”

Jim Carpenter works for the U.S. Government.

The American government employed Jim Carpenter.

Jim Carpenter was fired by the US Government.

Jim Carpenter worked in a number of important positions. … As a press liaison for the IRS, he made contacts in the White House.

Top Russian interior minister Yevgeny Topolov met yesterday with his US counterpart, Jim Carpenter.

Former US Secretary of Defense Jim Carpenter spoke today…

Page 8:

Context Sensitive Paraphrasing [3]

He used a Phillips head to tighten the screw.

The bank owner tightened security after a spate of local crimes.

The Federal Reserve will aggressively tighten monetary policy.

…
Loosen, Strengthen, Step up, Toughen, Improve, Fasten, Impose, Intensify, Ease, Beef up, Simplify, Curb, Reduce

Ambiguity

Page 9:

Domain Size

Ideal goal: must handle all well-formed strings of text
Problem: infinite domain

Sequential modifiers:

I saw Martin Sheen in a movie
I saw Martin Sheen in a movie in Paris
I saw Martin Sheen in a movie in Paris in the Spring
I saw Martin Sheen in a movie in Paris in the Spring with my friend
…

Unbounded relative clauses:

I saw Martin Sheen, who was with a friend I knew from high school, which was well known for its long, storied history of …, in a movie…

Page 10:

Outline

Why NLP is hard

NLP domains: Speech vs. Text

Attacking NLP problems

Linguistics: building explanatory models

Statistics: data-driven approaches

Machine Learning & NLP

NLP Problems and Solutions

Page 11:

Speech Recognition

NOT “voice recognition”

How hard can it be?

First image: “Fix the wing”. Second image: the same utterance in a noisy airport maintenance environment.

Page 12:

Speech Recognition – yup, it’s hard…

“Yuhgudda unnuhstahn sheeguhnuhbeeyah, yunoewaah, dissappointed.”

“You’ve got to understand she’s going to be, ah, you know, ah, disappointed.”

Difficult to recognize words and word boundaries (multiple variations for a single word)
Even given word boundaries, utterances are ill-formed (compared to text)

Hesitations, repetitions, fragmentary sentences, self-interruptions, poor word choice, sound quality…

LBJ/Mansfield audio sample

Page 13:

Development and Evaluation for Speech Recognition

Switchboard (and other) corpora
- Large set of phone conversations
- Audio signals aligned with transcriptions of utterances (phone sequences)
- Dictionaries aligning words with phone sequence equivalents

Typically, machine learning approaches are applied:
- Signal processing techniques extract features from signals
- Statistical methods relate these features to particular phones – create a model
- Analyze new signals, use the model to identify plausible phone sequences
- Choose the most likely sequence given another statistical model
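The last two steps – scoring plausible phone sequences and choosing the most likely one – are classically implemented with dynamic programming over a hidden Markov model. A minimal Viterbi sketch follows; the phones, observation symbols, and every probability are invented for illustration, not taken from any real acoustic model:

```python
import math

# Toy HMM: hidden states are phones, observations are acoustic feature labels.
# All probabilities below are invented for illustration.
states = ["f", "ih", "k", "s"]
start_p = {"f": 0.5, "ih": 0.2, "k": 0.2, "s": 0.1}
trans_p = {
    "f":  {"f": 0.1, "ih": 0.6, "k": 0.2, "s": 0.1},
    "ih": {"f": 0.1, "ih": 0.2, "k": 0.4, "s": 0.3},
    "k":  {"f": 0.1, "ih": 0.2, "k": 0.2, "s": 0.5},
    "s":  {"f": 0.3, "ih": 0.3, "k": 0.2, "s": 0.2},
}
emit_p = {
    "f":  {"A": 0.7, "B": 0.2, "C": 0.1},
    "ih": {"A": 0.1, "B": 0.7, "C": 0.2},
    "k":  {"A": 0.2, "B": 0.1, "C": 0.7},
    "s":  {"A": 0.6, "B": 0.2, "C": 0.2},
}

def viterbi(obs):
    """Return the most probable phone sequence for an observation sequence."""
    # V[t][state] = (log prob of best path ending in state at time t, backpointer)
    V = [{s: (math.log(start_p[s]) + math.log(emit_p[s][obs[0]]), None)
          for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prev = max(states, key=lambda p: V[t-1][p][0] + math.log(trans_p[p][s]))
            V[t][s] = (V[t-1][prev][0] + math.log(trans_p[prev][s])
                       + math.log(emit_p[s][obs[t]]), prev)
    # Trace back from the best final state.
    last = max(states, key=lambda s: V[-1][s][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(V[t][path[-1]][1])
    return list(reversed(path))

print(viterbi(["A", "B", "C", "A"]))  # → ['f', 'ih', 'k', 's']
```

Real recognizers work with continuous acoustic features and vastly larger state spaces, but the decoding idea is the same.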

Page 14:

Speech Recognition System (Courtesy of ComputerWorld…)

Page 15:

The State of the Art in Speech-to-Text Translation

Current performance on known tasks:
- 98% word accuracy for dictation (very controlled circumstances)

State of the art for spontaneous speech:
- News broadcast: ~90%
- Switchboard (phone conversations): ~80%

A lot of work even to get to a clean text representation of signal

Notice that I haven’t even begun to address tasks like search using this input

(Note also that there are many other research directions in speech processing – e.g. speaker identification)

Page 16:

What about Text?

A lot of overlap

If you can solve NLP in text, and can accurately parse speech into text, the two problems are the same

Text domain has some nice characteristics

Paragraph, sentence, and word segmentation already present

Well-formed utterances (in many/most sub-domains)

Little regional variation

Most information is already in the form of text

Page 17:

Outline

Why NLP is hard

NLP domains: Speech vs. Text

Attacking NLP problems

Linguistics: building explanatory models

Statistics: data-driven approaches

Machine Learning & NLP

NLP Problems and Solutions

Page 18:

Linguistics

Linguists: meaning through structure + lexical knowledge

“Colorless green ideas sleep furiously”

Discover the rules of language (a grammar)
- Prescriptive grammar: rules describe what you shouldn’t do.
- Generative grammar: a finite set of rules that can generate all possible strings in a language, and only those strings that are valid in that language [3]
- “Generate” here means “assign a structural description to”
- Attempts to move beyond simplistic linear models, where words are dependent only on previous words

Page 19:

Divide and Conquer: Morphology

Consider the sub-problem of recognizing well-formed variations of words

Popular method: Finite State Automata/Transducers

Automaton: recognizes patterns
Transducer: maps from an input pattern to an output pattern – e.g. indicates whether a noun is plural

Page 20:

Morphology Example: plurals [5]

[Figure: a finite-state transducer over states q0, q1, q2. A regular noun stem or irregular singular noun (output: N) takes q0 to q1; the suffix -s (output: +PL) takes q1 to q2; an irregular plural noun (output: N +PL) takes q0 directly to q2.]
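The mapping such a transducer computes can be approximated in a few lines. The sketch below uses a tiny invented lexicon and hard-codes the regular -s rule; it illustrates only the surface-to-lexical mapping, not a genuine finite-state implementation:

```python
# A minimal sketch of a morphological analyzer for English plurals.
# The lexicon and the output tag conventions are illustrative assumptions.
REGULAR = {"cat", "dog", "screw"}
IRREG_SG = {"goose", "mouse"}
IRREG_PL = {"geese", "mice"}

def analyze(word):
    """Map a surface form to a lexical form, e.g. 'cats' -> 'cat +N +PL'."""
    if word in IRREG_SG:
        return word + " +N +SG"
    if word in IRREG_PL:
        # Here the irregular plural maps to its own surface form;
        # a real transducer would output the singular stem instead.
        return word + " +N +PL"
    if word in REGULAR:
        return word + " +N +SG"
    if word.endswith("s") and word[:-1] in REGULAR:
        return word[:-1] + " +N +PL"
    return None  # not recognized by this toy grammar

print(analyze("cats"))   # → cat +N +PL
print(analyze("geese"))  # → geese +N +PL
print(analyze("dog"))    # → dog +N +SG
```

A production-quality analyzer would compose many such rules into a single transducer rather than chain if-statements.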

Page 21:

Basic Generative Grammar: Context-Free Grammar

Accomplishes the goal of a finite description of infinite domain, at least for syntactic structure

Generate parse trees, decompose into constituents, infer generative rules:

S => NP VP
VP => V VP
VP => VP PP
VP => V ADJP
NP => PRO
PRO => He
V => wants
PP => to
…

[4]
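“Generate” can be made concrete: pick a rule for each nonterminal and recurse. A sketch of random derivation from a toy CFG follows; the grammar and lexicon are invented for illustration, loosely echoing the rules above:

```python
import random

# A toy context-free grammar: nonterminal -> list of possible expansions.
GRAMMAR = {
    "S":   [["NP", "VP"]],
    "NP":  [["PRO"], ["DT", "NN"]],
    "VP":  [["V", "NP"], ["V"]],
    "PRO": [["He"], ["She"]],
    "DT":  [["the"], ["a"]],
    "NN":  [["man"], ["tree"]],
    "V":   [["wants"], ["climbed"]],
}

def generate(symbol="S"):
    """Expand a symbol by recursively choosing a random rule for each nonterminal."""
    if symbol not in GRAMMAR:          # terminal word
        return [symbol]
    words = []
    for sym in random.choice(GRAMMAR[symbol]):
        words.extend(generate(sym))
    return words

print(" ".join(generate()))  # prints a random sentence licensed by the grammar
```

Every string this prints is "grammatical" with respect to GRAMMAR, which is exactly the sense in which a generative grammar defines a language.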

Page 22:

Context-Free Grammar

Drawbacks to CFGs:
- Real natural language may not be context-free
- Hard to model some phenomena, e.g. limits on nesting:

The cat ran away.

The cat the dog bit ran away.
The cat the dog the horse kicked bit ran away.

- Phenomena like agreement, morphology, and long-distance dependencies require a very complex set of rules
- What about unseen words/phrases/sentences?
- Given a sentence, there may be multiple ways to explain it:

I pointed to the man with the crutch.

Page 23:

That doesn’t deter Real Linguists…

A range of formalisms has been developed
- Different ways of tackling the composition of words, phrases, clauses
- Trade-off between the importance of sentence structure and individual words
- Strong emphasis on generality, particularly across languages

Typically much more involved than the simplistic CFG in the previous example

There is ongoing work to encode a hand-written grammar of English – the English Resource Grammar
- Uses Head-driven Phrase Structure Grammar (HPSG)
- Explains syntax via a Typed Feature Structure model

Page 24:

HPSG sample Feature Structure (for one word)

Page 25:

General Points

Much work on analyzing languages for structure

Wide range of theories; all have some descriptive power

All assume close relation between structure and meaning

We will see CFGs again later…

Page 26:

Outline

Why NLP is hard

NLP domains: Speech vs. Text

Attacking NLP problems: 4 research strands

Linguistics: building explanatory models

Statistics: data-driven approaches

Machine Learning & NLP

NLP Problems and Solutions

Page 27:

Data-Driven Approaches

Consider a partially completed sentence…

We can capture some measure of this intuitive restriction on word choice using probabilities
- Bigrams, trigrams, n-grams
- Effect of adding complexity in terms of storage requirements? 50,000² = 2.5 billion

We can estimate these probabilities directly from a corpus (body of text): p(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})

Applications: spelling checker, augmentative communication systems, speech processing…
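The estimation formula goes directly into code. A sketch with a tiny invented corpus (a real model would be trained on millions of words):

```python
from collections import Counter

# Estimate bigram probabilities p(w_n | w_{n-1}) = C(w_{n-1} w_n) / C(w_{n-1})
# from a tiny illustrative corpus.
corpus = "the man saw the dog and the dog saw the man".split()

unigrams = Counter(corpus)
bigrams = Counter(zip(corpus, corpus[1:]))

def p(w, prev):
    """Maximum-likelihood estimate of p(w | prev)."""
    return bigrams[(prev, w)] / unigrams[prev]

print(p("dog", "the"))  # C("the dog") / C("the") = 2/4 = 0.5
print(p("man", "the"))  # 2/4 = 0.5
```

Note that any bigram absent from the corpus gets probability zero – the "unseen sequences" drawback discussed below motivates smoothing techniques.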

Page 28:

N-gram model samples

The following sentences were generated using n-gram models trained on Shakespeare’s works (~885,000 words, ~29,000 types) [5]:

1-gram: Every enter now severally so, let
2-gram: What means sir. I confess she? Then all sorts, he is trim, captain.
3-gram: This shall forbid it should be branded, if renown made it empty.
4-gram: Enter Leonato’s brother Antonio, and the rest, but seek the weary beds of people sick.
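The generation experiment can be sketched in miniature: train a bigram model and sample successor words until an end-of-sentence marker. The stand-in corpus below is invented, not Shakespeare:

```python
import random
from collections import defaultdict

# Tiny stand-in training corpus with <s>/</s> sentence boundary markers.
corpus = ("<s> what means sir </s> <s> what shall we do </s> "
          "<s> he is trim sir </s>").split()

successors = defaultdict(list)
for prev, cur in zip(corpus, corpus[1:]):
    successors[prev].append(cur)   # duplicates preserve the empirical distribution

def sample_sentence(max_len=12):
    """Sample words from the bigram model until </s> (or max_len words)."""
    word, out = "<s>", []
    while len(out) < max_len:
        word = random.choice(successors[word])
        if word == "</s>":
            break
        out.append(word)
    return " ".join(out)

print(sample_sentence())  # prints a short sampled word sequence
```

With a large corpus like Shakespeare's works, exactly this procedure (with higher-order n-grams) produces the progressively more coherent samples above.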

Page 29:

N-Gram Modeling

What’s it good for?
Determine plausibility of a new sentence:

The man spoke briefly…

The dog spoke briefly…

The spoke briefly man…

The wheel spoke briefly…

Given N-gram models of two domains, identify most likely source:

ACENOR stocks caught fire today on word of a take-over….

Teen pop sensation Tilde Greengrass roared into Austin today…

Teen Angst Poetry and Band Names…
Drawbacks: how to handle unseen sequences?

Page 30:

Computational Linguistics

We just used very elementary statistics to make some potentially interesting discoveries about language

In fact, given the right resources, we can use statistics to build automated resources for linguistic analysis…
Part-of-speech tagging:

(DT the) (NN man) (VBD climbed) (IN up) (DT the ) (NN tree)

Phrase boundary detection & phrase labeling

(NP the man) (VP climbed) (PP up the tree)

Parsing….
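As a taste of how such analyzers are built, here is a most-frequent-tag baseline for part-of-speech tagging. The tiny hand-labeled training sample is invented; real taggers are trained on large annotated corpora and use richer features than the word identity alone:

```python
from collections import Counter, defaultdict

# Tiny invented training sample, tags in the Penn Treebank style
# used in the example above.
tagged = [("the", "DT"), ("man", "NN"), ("climbed", "VBD"),
          ("up", "IN"), ("the", "DT"), ("tree", "NN"),
          ("the", "DT"), ("dog", "NN"), ("saw", "VBD"), ("the", "DT")]

counts = defaultdict(Counter)
for word, tag_label in tagged:
    counts[word][tag_label] += 1

def tag(sentence, default="NN"):
    """Tag each word with its most frequent training tag (default for unseen words)."""
    return [(w, counts[w].most_common(1)[0][0] if w in counts else default)
            for w in sentence]

print(tag("the dog climbed the tree".split()))
# → [('the', 'DT'), ('dog', 'NN'), ('climbed', 'VBD'), ('the', 'DT'), ('tree', 'NN')]
```

Even this crude baseline scores surprisingly well on real data because many words are unambiguous; the hard residue of ambiguous words is what statistical taggers target.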

Page 31:

Parsing Revisited

We saw earlier an outline of a Context-Free Grammar model of language:
S => NP VP

VP => VP PP
NP => NP PP
NP => DT NN

(NP I) (VP saw) (NP the man) (PP with the telescope)
(NP I) (VP saw) (NP the man) (PP with the book)

Two valid parses for each… are they equally valid?

Page 32:

Probabilistic CFGs

In the n-gram modeling example, we derived probabilities based on a corpus. Can we do the same for CFG rules?
- Not the same problem: for n-gram modeling, the words alone were sufficient
- Need a corpus with additional information – the parse trees
- Given such a corpus, we can use statistical analysis to derive the rules themselves, and the relative probabilities of rules.

This pattern – applying statistical methods to a labeled data set to extract a predictive model – is common in Machine Learning.
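The counting step can be sketched directly. Given rules read off a treebank (the rule instances below are invented), each rule's probability is its count normalized by the count of its left-hand side:

```python
from collections import Counter

# Invented rule instances as they might be read off a tiny treebank.
observed_rules = [
    ("S", ("NP", "VP")), ("S", ("NP", "VP")), ("S", ("NP", "VP")),
    ("VP", ("V", "NP")), ("VP", ("V", "NP")),
    ("VP", ("VP", "PP")),
    ("NP", ("DT", "NN")), ("NP", ("NP", "PP")), ("NP", ("DT", "NN")),
]

rule_counts = Counter(observed_rules)
lhs_counts = Counter(lhs for lhs, _ in observed_rules)

def rule_prob(lhs, rhs):
    """p(lhs -> rhs) = C(lhs -> rhs) / C(lhs)."""
    return rule_counts[(lhs, rhs)] / lhs_counts[lhs]

print(rule_prob("VP", ("V", "NP")))   # 2/3
print(rule_prob("NP", ("DT", "NN")))  # 2/3
```

A probabilistic parser then scores a whole parse tree as the product of its rule probabilities, which is how the two PP-attachment parses above can be ranked.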

Page 33:

Outline

Why NLP is hard

NLP domains: Speech vs. Text

Attacking NLP problems: 4 research strands

Linguistics: building explanatory models

Statistics: data-driven approaches

Machine Learning & NLP

NLP Problems and Solutions

Page 34:

Machine Learning: Classification

[Figure: supervised classification. A learning algorithm consumes training examples D = {(x, y)} and produces a classifier h: X -> Y; given a new input x, the classifier outputs a label y.]

Page 35:

Machine Learning (supervised)

Given some labeled data, and assuming some set of models, find the model that best maps each example to its label.

Statistically: represent examples using some abstraction (a set of features), compute the relation between features and labels.
- Choice of model affects best possible performance.
- Complex model: may get better results (more expressive), but requires much more data to train (and labeled data is expensive)

Simple model: fewer parameters, so less expressive, but easier to learn

Some examples…
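As a concrete instance of a simple model, here is a perceptron, one of the classic learning algorithms used in NLP. The binary feature vectors and labels below are invented toy data:

```python
# A minimal supervised learner: a perceptron over binary feature vectors.
def train_perceptron(examples, epochs=10):
    """Learn weights w and bias b from (features, label) pairs, labels +1/-1."""
    n = len(examples[0][0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        for x, y in examples:
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:          # misclassified: update toward y
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
    return w, b

def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1

# Toy task: the label is +1 exactly when the first feature is on.
data = [([1, 0], 1), ([1, 1], 1), ([0, 1], -1), ([0, 0], -1)]
w, b = train_perceptron(data)
print([predict(w, b, x) for x, _ in data])  # → [1, 1, -1, -1]
```

In NLP the feature vector typically encodes things like surrounding words and their tags, and the same update rule scales to millions of sparse features.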

Page 36:

Outline

Why NLP is hard

NLP domains: Speech vs. Text

Attacking NLP problems: 4 research strands

Linguistics: building explanatory models

Logic: defining meaning and reasoning

Statistics: data-driven approaches

Machine Learning & NLP

NLP Problems and Solutions

Page 37:

NLP Problems and Solutions (focused)

Part-of-Speech tagging
Context Sensitive Spelling Correction
Named Entity Recognition
Relation detection
Comma Resolution
Verb and Noun Phrase Chunking
Prepositional Phrase Attachment
Coreference Resolution
Statistical Parsing
Semantic Role Labeling
Emotion and Subjectivity detection

Page 38:

Example: Named Entity Recognition

Entities are inherently ambiguous (e.g. JFK can be both a location and a person depending on the context)
- Can appear in various forms; can be nested
- Using lists is not sufficient: new entities are always being introduced

A lot of Machine Learning work – significant overfitting

Key difficulties – adaptation to:
- New domains/corpora
- Slightly new definitions of an entity
- New languages
- New types of entities

How to reduce the requirements on the resources needed to produce a semantic categorization for a new domain/new language/new type of entities?


Page 39:

Grand Challenges

Machine Translation

Message Understanding (Information Extraction)

Question Answering

Information Retrieval & Data Mining

Textual Entailment

Page 40:

Textual Entailment

Work at the level of meaning
Frame the task of understanding text as recognizing when two text fragments mean the same thing (one meaning ‘contains’ the other)

Dagan and Glickman, 2004 pose this problem as Recognizing Textual Entailment.

Now we can recast many problems in terms of TE:

The American government employed Jim Carpenter. Top Russian interior minister Yevgeny Topolov met yesterday with his US counterpart, Jim Carpenter.

Former US Secretary of Defence Jim Carpenter spoke today…

Jim Carpenter works for the U.S. Government.

?

Page 41:

PASCAL RTE Challenges (2004-present)

Move away from strict definition (Chierchia & McConnell-Ginet, 2001 [6]):

A text T entails a hypothesis H if H is true in every circumstance (possible world) in which T is true

‘Applied’ Definition (Dagan & Glickman, 2004 [7])

T entails H (T → H) if humans reading T will infer that H is most likely true

800 development, 800 test pairs for each challenge
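Under the applied definition, even a crude baseline can be written down: predict entailment when most hypothesis words occur in the text. This word-overlap sketch (the 0.75 threshold is an arbitrary assumption) is far weaker than real RTE systems, but it makes the task concrete:

```python
# A crude lexical-overlap baseline for Recognizing Textual Entailment:
# predict "entailed" when most hypothesis words appear in the text.
# Real RTE systems go far beyond word overlap.
def entails(text, hypothesis, threshold=0.75):
    t = set(text.lower().split())
    h = set(hypothesis.lower().split())
    overlap = len(h & t) / len(h)
    return overlap >= threshold

print(entails("Google files for its long awaited IPO.",
              "Google files an IPO."))  # → True
```

Such baselines fail exactly where TE is interesting: pair 1 in the table below has high word overlap (“Washington”, “Normandy”) yet is not entailed.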

Page 42:

Some Examples (2nd RTE Challenge)

1. TEXT: Reagan attended a ceremony in Washington to commemorate the landings in Normandy.
   HYPOTHESIS: Washington is located in Normandy.  TASK: IE  ENTAILMENT: False

2. TEXT: Google files for its long awaited IPO.
   HYPOTHESIS: Google goes public.  TASK: IR  ENTAILMENT: True

3. TEXT: … a shootout at the Guadalajara airport in May, 1993, that killed Cardinal Juan Jesus Posadas Ocampo and six others.
   HYPOTHESIS: Cardinal Juan Jesus Posadas Ocampo died in 1993.  TASK: QA  ENTAILMENT: True

4. TEXT: The SPD got just 21.5% of the vote in the European Parliament elections, while the conservative opposition parties polled 44.5%.
   HYPOTHESIS: The SPD is defeated by the opposition parties.  TASK: IE  ENTAILMENT: True

Page 43:

Incomplete List of Citations
1. Peter Bell and Simon King. Sparse Gaussian graphical models for speech recognition. In Proc. Interspeech 2007, Antwerp, Belgium, August 2007.
2. Connor & Roth, ECML 2007.
3. Chomsky, Noam (1957, 2002). Syntactic Structures. Mouton de Gruyter, 13.
4. Image courtesy of Bill Wilson, Univ. New South Wales, Australia. http://www.cse.unsw.edu.au/~billw/
5. Jurafsky and Martin. Speech and Language Processing, Prentice-Hall, 2000.
6. Chierchia & McConnell-Ginet. Meaning and Grammar: An Introduction to Semantics (rev. 2nd ed.), 2000.
7. Dagan & Glickman, 2004. Probabilistic textual entailment: Generic applied modeling of language variability. PASCAL workshop on Text Understanding and Mining, 2004.

Some slides came from Prof. Dan Roth, University of Illinois.
