
Ling 570

Day #3

Stemming, Probabilistic Automata, Markov Chains/Model


MORPHOLOGY AND FSTS


FST as Translator (last class)

FR: ce bill met de le baume sur une blessure

EN: this bill puts balm on a sore wound


FST Application Examples

• Case folding: He said → he said

• Tokenization: “He ran.” → “ He ran . ”

• POS tagging: They can fish → PRO VERB NOUN


FST Application Examples

• Pronunciation: B AH T EH R → B AH DX EH R

• Morphological generation: Fox + s → Foxes

• Morphological analysis: cats → cat + s


Roadmap

• Motivation: Representing words

• A little (mostly English) morphology

• Stemming

The Lexicon

• Goal: Represent all the words in a language

• Approach? Enumerate all words?
– Doable for English; typical for ASR (Automatic Speech Recognition)
– English is morphologically relatively impoverished

• Other languages? Wildly impractical
» Turkish: 40,000 forms/verb, e.g.,
uygarlaştıramadıklarımızdanmışsınızcasına
“(behaving) as if you are among those whom we could not civilize”

Morphological Parsing

• Goal: Take a surface word form and generate a linguistic structure of component morphemes

• A morpheme is the minimal meaning-bearing unit in a language.
– Stem: the morpheme that forms the central meaning unit in a word
– Affix: prefix, suffix, infix, circumfix

• Prefix: e.g., possible → impossible
• Suffix: e.g., walk → walking
• Infix: e.g., hingi → humingi (Tagalog)
• Circumfix: e.g., sagen → gesagt (German)

Surface Variation & Morphology

• Searching (à la Bing) for documents about:
– Televised sports

• Many possible surface forms:
– Televised, television, televise, …
– Sports, sport, sporting, …

• How can we match?
– Convert surface forms to a common base form
• Via stemming or morphological analysis

Two Perspectives

• Stemming:
– writing → write (or writ)
– Beijing → Beije

• Morphological Analysis:
– writing → write+V+prog
– cats → cat+N+pl
– writes → write+V+3rdPers+Sg

Stemming

• A simple type of morphological analysis
• Supports matching using the base form
– e.g., Television, televised, televising → televise
• Most popular: the Porter stemmer

• Task: Given a surface form, produce the base form
– Typically removes suffixes

• Model:
– Rule cascade
– No lexicon!


Stemming

• Used in many NLP/IR applications
• For building equivalence classes (demonstrated below):
– Connect, Connected, Connecting, Connection, Connections → same class; suffixes irrelevant

• Porter stemmer: simple and efficient
– Website: http://www.tartarus.org/~martin/PorterStemmer
– On patas: ~/dropbox/12-13/570/porter
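A quick way to reproduce that equivalence class is NLTK's PorterStemmer (using NLTK here is my assumption; the course copy on patas is Porter's original distribution):

```python
from nltk.stem import PorterStemmer

stemmer = PorterStemmer()
words = ["Connect", "Connected", "Connecting", "Connection", "Connections"]

# All five surface forms collapse to the same stem, "connect".
print({w: stemmer.stem(w) for w in words})
```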

Porter Stemmer

• Rule cascade (see the sketch below):
– Rule form: (condition) PATT1 → PATT2
• E.g., (stem contains vowel) ING → ε
• ATIONAL → ATE
– Rule partial order:
• Step 1a: -s
• Step 1b: -ed, -ing
• Steps 2–4: derivational suffixes
• Step 5: cleanup

• Pros: Simple, fast, buildable for a variety of languages
• Cons: Overaggressive and underaggressive
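To make the rule-cascade idea concrete, here is a minimal sketch in Python. It implements only a handful of Porter-style rules; the rule inventory, step grouping, and the `has_vowel` condition are simplified for illustration (the real algorithm has many more rules and a measure-based condition system):

```python
import re

def has_vowel(stem: str) -> bool:
    """Rule condition: does the candidate stem contain a vowel?"""
    return bool(re.search(r"[aeiou]", stem))

# (condition, suffix pattern, replacement) triples; within a step the first
# matching rule fires and the step ends. Only a few illustrative rules shown.
STEP1A = [
    (None, r"sses$", "ss"),
    (None, r"ies$",  "i"),
    (None, r"ss$",   "ss"),   # no-op, but blocks the bare -s rule below
    (None, r"s$",    ""),
]
STEP1B = [
    (has_vowel, r"ing$", ""),
    (has_vowel, r"ed$",  ""),
]
STEP2 = [
    (None, r"ational$", "ate"),
]

def apply_step(word, rules):
    for cond, patt, repl in rules:
        m = re.search(patt, word)
        if m:
            stem = word[:m.start()]
            if cond is None or cond(stem):
                return stem + repl
            return word        # pattern matched but condition failed
    return word

def toy_stem(word):
    """Cascade: each step's output feeds the next step."""
    for step in (STEP1A, STEP1B, STEP2):
        word = apply_step(word, step)
    return word

print(toy_stem("walking"), toy_stem("ponies"), toy_stem("relational"))
# walk poni relate
```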


STEMMING & EVAL


Evaluating Performance

• Measures of stemming performance rely on metrics used in IR:
– Precision: the proportion of selected items the system got right
• precision = tp / (tp + fp)
• # of correct answers / # of answers given
– Recall: the proportion of the target items the system selected
• recall = tp / (tp + fn)
• # of correct answers / # of possible correct answers
– Rule of thumb: as precision increases, recall drops, and vice versa

• These metrics are widely adopted in statistical NLP


Precision and Recall

• Take a given stemming task:
– Suppose there are 100 words that could be stemmed
– A stemmer gets 52 of these right (tp)
– But it inadvertently stems 10 others (fp)

• Precision = 52 / (52 + 10) ≈ 0.84

• Recall = 52 / (52 + 48) = 0.52

• Note: it is easy to get a precision of 1.0. Why?
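The arithmetic from this slide as a small Python check. The F-measure line is an addition on my part (the harmonic mean of P and R, which also appears in the tokenizer comparison on the next slide):

```python
def precision_recall_f(tp, fp, fn):
    """precision = tp/(tp+fp); recall = tp/(tp+fn); F is their harmonic mean."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f = 2 * p * r / (p + r)
    return p, r, f

# 100 stemmable words: 52 stemmed correctly (tp), 10 non-target words
# stemmed anyway (fp), and 100 - 52 = 48 target words missed (fn).
p, r, f = precision_recall_f(tp=52, fp=10, fn=48)
print(f"P={p:.2f} R={r:.2f} F={f:.2f}")   # P=0.84 R=0.52 F=0.64
```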


[Slide shows a token-by-token comparison of a baseline and four tokenizers on the sentence “After coming close to a partial settlement a year ago, shareholders who filed civil suits against Ivan F. Boesky & Co. L.P. …”; the tokenizers differ on punctuation and clitics such as “F.”, “Co.”, “L.P.”, and “Drexel’s”. Scores against the baseline:]

Tokenizer 1: Precision 0.827586, Recall 0.888889, F-Measure 0.858237548
Tokenizer 2: Precision 0.961538, Recall 0.925926, F-Measure 0.943732194
Tokenizer 3: Precision 0.928571, Recall 0.962963, F-Measure 0.945767196
Tokenizer 4: Precision 1.000000, Recall 1.000000, F-Measure 1.000000

WEIGHTED AUTOMATA & MARKOV CHAINS

PFA Definition

• A Probabilistic Finite-State Automaton is a 6-tuple:
– A set of states Q
– An alphabet Σ
– A set of transitions δ ⊆ Q × Σ × Q
– Initial state probabilities: I: Q → ℝ+
– Transition probabilities: P: δ → ℝ+
– Final state probabilities: F: Q → ℝ+

PFA Recap

• Subject to constraints: the initial probabilities sum to one, Σq I(q) = 1, and for every state q the outgoing mass sums to one, F(q) + Σa,q′ P(q,a,q′) = 1

• Computing sequence probabilities: multiply the initial probability of the start state, the probabilities of the transitions taken, and the final probability of the ending state (summing over paths if more than one accepts the string)

PFA Example

• Example:
– I(q0) = 1, I(q1) = 0
– F(q0) = 0, F(q1) = 0.2
– P(q0,a,q1) = 1; P(q1,b,q1) = 0.8

– P(abⁿ) = I(q0) · P(q0,a,q1) · P(q1,b,q1)ⁿ · F(q1)
         = 1 · 1 · 0.8ⁿ · 0.2 = 0.8ⁿ · 0.2
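A small sketch of this example in Python. The dictionary encoding is mine; it suffices here because the machine is deterministic, so each string has at most one accepting path:

```python
# Initial, final, and transition probabilities for the example PFA.
I = {"q0": 1.0, "q1": 0.0}
F = {"q0": 0.0, "q1": 0.2}
P = {("q0", "a"): ("q1", 1.0),   # (state, symbol) -> (next state, prob)
     ("q1", "b"): ("q1", 0.8)}

def sequence_prob(seq, start="q0"):
    """I(start) * product of transition probs along the path * F(end)."""
    prob, state = I[start], start
    for sym in seq:
        state, p = P[(state, sym)]
        prob *= p
    return prob * F[state]

for n in range(4):
    seq = "a" + "b" * n
    print(seq, sequence_prob(seq))   # equals 0.8**n * 0.2
```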

Markov Chain

• A Markov Chain is a special case of a PFA in which the input sequence uniquely determines which states the automaton will go through.

• Markov Chains cannot represent inherently ambiguous problems
– But they can assign probabilities to unambiguous sequences

Markov Chain for Words

Markov Chain for Pronunciation

• Observations: 0/1

Markov Chain for Walking through Groningen

Markov Chain: “First-order observable Markov Model”

• A set of states Q = q1, q2, …, qN; the state at time t is qt

• Transition probabilities:
– A set of probabilities A = a01, a02, …, an1, …, ann
– Each aij represents the probability of transitioning from state i to state j
– The set of these is the transition probability matrix A

• Distinguished start and final states q0, qF

• The current state depends only on the previous state
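Written out, that last bullet is the standard first-order Markov assumption (notation as in the slides; the formula itself is added here for reference):

P(qi | q1 q2 … qi−1) = P(qi | qi−1)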

Markov Models

• The parameters of a MM can be arranged in matrices

• The A-matrix for the set of transition probabilities:

      [ p11 p12 … p1j ]
  A = [ p21 p22 … p2j ]
      [ …             ]

• What’s missing? Starting probabilities.
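A concrete sketch of both pieces in Python/NumPy; the three states and the numbers are made up for illustration:

```python
import numpy as np

states = ["the", "duck", "died"]

# Transition matrix A: A[i, j] = P(next = states[j] | current = states[i]);
# each row must sum to 1.
A = np.array([
    [0.0, 1.0, 0.0],   # "the"  -> "duck"
    [0.3, 0.0, 0.7],   # "duck" -> "the" or "died"
    [1.0, 0.0, 0.0],   # "died" -> "the"
])

# The missing piece: starting probabilities pi, pi[i] = P(q1 = states[i]).
pi = np.array([0.8, 0.2, 0.0])

assert np.allclose(A.sum(axis=1), 1.0) and np.isclose(pi.sum(), 1.0)
```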

Markov Models

• Exercise (see the sketch below):
– Build the transition probability matrix over this set of data:

The duck died.
The car killed the duck.
The duck died under her car.
We duck under the car.
We retrieve the poor duck.

– Build the starting probability matrix
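One way to do the exercise programmatically: a sketch that estimates the probabilities by relative frequency. Lowercasing everything and keeping the period as a crude end-of-sentence token are my choices, not the slides':

```python
from collections import Counter, defaultdict

corpus = [
    "the duck died .",
    "the car killed the duck .",
    "the duck died under her car .",
    "we duck under the car .",
    "we retrieve the poor duck .",
]

starts = Counter()
trans = defaultdict(Counter)
for sent in corpus:
    words = sent.split()
    starts[words[0]] += 1
    for prev, cur in zip(words, words[1:]):
        trans[prev][cur] += 1

# Maximum-likelihood (relative frequency) estimates.
start_prob = {w: c / sum(starts.values()) for w, c in starts.items()}
trans_prob = {p: {w: c / sum(nxt.values()) for w, c in nxt.items()}
              for p, nxt in trans.items()}

print(start_prob)           # {'the': 0.6, 'we': 0.4}
print(trans_prob["duck"])   # {'died': 0.4, '.': 0.4, 'under': 0.2}
```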

Markov Models

• Exercise (scored in the sketch below):
– Given your model, what’s the probability of each of the following sentences?

The duck died under her car.
We duck under the car.
The duck under the car.
We retrieve killed the duck.
We the poor duck died.
We retrieve the poor duck under the car.

– For a given start state (The, We), what’s the most likely string (of the above)?
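And a sketch of the scoring step, reusing start_prob and trans_prob from the previous snippet. Unseen transitions get probability 0, so the ungrammatical strings score 0 under this model:

```python
def sentence_prob(sentence):
    """P(w1..wn) = P_start(w1) * product of P(wi | wi-1); 0 for unseen events."""
    words = sentence.lower().replace(".", " .").split()
    prob = start_prob.get(words[0], 0.0)
    for prev, cur in zip(words, words[1:]):
        prob *= trans_prob.get(prev, {}).get(cur, 0.0)
    return prob

for s in ["The duck died under her car.",
          "We duck under the car.",
          "The duck under the car."]:
    print(f"{sentence_prob(s):.6f}  {s}")   # the first scores 0.020000
```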

HMMs

• Next class
