Detailed lecture outline: Natural Language Processing (XLNNTN)


APPROVED BY THE DEPARTMENT
Head of Department
Ngô Hữu Phúc

DETAILED LECTURE OUTLINE
(For teaching periods)

Course: NATURAL LANGUAGE PROCESSING
Course group: .....................
Department: Computer Science
Faculty (Institute): Information Technology (CNTT)

On behalf of the course group
Hà Chí Trung

Information about the course group
No.   Lecturer            Academic rank            Degree
1     Hà Chí Trung        Senior Lecturer (GVC)    PhD (TS)
3     Nguyễn Trung Tín    TG                       PhD (TS)

Office: office hours, Department of Computer Science, 13th floor, Building S4, Military Technical Academy.
Contact address: Department of Computer Science, Faculty of Information Technology, Military Technical Academy, 236 Hoàng Quốc Việt.
Phone, email: 01685582102, [email protected];

Lecture 01: Overview of natural language processing
Chapter I, sections:
Periods: 1-3. Week: 1
- Objectives and requirements
Objectives: provide the most general understanding of the course; master the basic concepts and problems of natural language processing and the mathematical foundations on which the course rests.
Requirements: students must review the fundamentals of discrete mathematics and of programming, and study and revise formal language and grammar theory on their own.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:

1. Why study NLP?
2. Applications of natural language processing
3. The problems of NLP
4. Course content

1. Why study NLP?
NLP is a branch of artificial intelligence that focuses on applications involving human language. Within artificial intelligence, natural language processing is one of the hardest areas, because it requires understanding the meaning of language - the most refined tool of thought and communication.
Modern NLP algorithms are built on the achievements of machine learning, in particular statistical machine learning. Studying modern NLP algorithms requires an understanding of several different fields, including linguistics, computer science, and probability and statistics.

2. Why is NLP hard?
Ambiguity
"At last, a computer that understands you like your mother"
1. (*) It understands you as well as your mother understands you
2. It understands (that) you like your mother
3. It understands you as well as it understands your mother
1 and 3: Does this mean well, or poorly?

Ambiguity at Many Levels
At the acoustic level (speech recognition):
1. "... a computer that understands you like your mother"
2. "... a computer that understands you lie cured mother"

At the syntactic level:
Different structures lead to different interpretations.
"Ông già đi rất nhanh" ("ông già | đi rất nhanh": the old man walks very fast; "ông | già đi rất nhanh": he is aging very fast)

At the semantic (meaning) level:
Two definitions of "mother":
- a woman who has given birth to a child

- a stringy slimy substance consisting of yeast cells and bacteria; it is added to cider or wine to produce vinegar

At the semantic (meaning) level:
"They put money in the bank"
= buried in mud?
"I saw her duck with a telescope"

At the discourse (multi-clause) level:
Alice says they've built a computer that understands you like your mother. But she
- doesn't know any details
- doesn't understand me at all
This is an instance of anaphora, where "she" co-refers with some other discourse entity.
Example: "Ông già đi rất nhanh."

3. Applications of natural language processing
NLP is one of the spearhead fields of the information society.

1. Terminological resources construction
Purpose: building dictionaries of domain terminology; term lists used in factories and enterprises; large dictionaries for document indexing systems; bilingual terminology dictionaries for translation, etc.; collecting terminology from text corpora.
Approach: identifying words and noun phrases; identifying groups of words that frequently co-occur (collocations).

2. Information retrieval / information extraction
Purpose: finding the documents relevant to a query; ranking the documents found.

Approach: document indexing (indexation); query processing (normalization, finding equivalent terms, etc.); ranking the retrieved results (assessing the relevance of a document to the query).

3. Text summarization
Purpose: automatic generation of text summaries.
Approach: automatic text understanding, reduction and summary generation; identifying the salient text units, selecting the corresponding passages, and assembling the summary; filtering the summary by semantically classifying sentences according to linguistic structures.

4. Machine translation
Purpose: fully automatic translation; machine-aided translation.
Approach: analysis of the source text (error correction, normalization, simplification, linguistic annotation); fully automatic translation (feasible for texts in narrow domains) or semi-automatic translation (with human intervention on the source or target language); post-editing the translation.

5. Automatic text comprehension
Purpose: recognizing the topics of a text; establishing the relations between sentences (causal structure, temporal sequences, pronouns, etc.).
Approach: analyzing the structure of the text in order to establish the relations between its components; analyzing topics, actions, characters, clause structure, etc.

6. Automatic text generation
Purpose: generating text for translation systems; generating text for human-machine dialogue systems; generating text that describes numerical data.
Approach: analyzing content at a deep level: semantic networks, concepts; organizing the deep content into the clauses to be expressed; building syntax trees and adjusting word morphology.

7. Human-machine dialogue
Purpose: building human-machine communication systems.
Approach: input preprocessing: speech recognition; automatic text understanding (paying particular attention to reference resolution); automatic text generation; speech synthesis.

1.2. The problems of NLP
1. Monolingual processing
Analyzing text level by level:
- Lexical level (lexical/morpho-syntactic analysis)
- Syntax (syntactic analysis/parsing)
- Semantics (semantic analysis)

- Pragmatics

2. Multilingual processing
Building tools for:
- Multilingual alignment
- Machine translation aids
- Multilingual information retrieval

1.3. Language resources for NLP
1. Importance
Tools and resources in NLP:
Tools and methods: general in nature, applicable to many languages.
Resources: specific to each language; very costly to build, which leads to the need to share and exchange language resources.
Large corpus "banks": LDC (Linguistic Data Consortium), ELDA (Evaluations and Language resources Distribution Agency), OLAC (Open Language Archives Community), etc.

2. Monolingual processing
Lexicon:
- Morphological information (morphology)
- Syntactic information (syntax)
- Semantic information (semantics), ontologies
Grammar:
- Grammar formalisms
Corpora:
- Raw corpora
- Linguistically annotated corpora: words, parts of speech, grammatical structure, etc.

3. Multilingual processing
Multilingual dictionaries:
- Bilingual dictionaries
- Multilingual dictionaries
Grammar:
- Bilingual grammars
Multilingual/parallel corpora:
- Raw multilingual corpora

- Aligned multilingual corpora, with or without linguistic annotation
- Translation memories

1.4. Standardization
1. The need to standardize language resources
The need to exchange corpora: consistent representation; standard encoding.
Standardization activities: projects aiming at standards (EAGLES, TEI, etc.); the ISO TC 37/SC 4 project.

2. Aspects of standardization
Representation models: dictionaries; corpus annotation, etc.
Terminology and data categories: standard terminology; DCR (Data Category Registry).
Encoding languages: XML; RDF (Resource Description Framework), OWL (Web Ontology Language), etc.

4. Course content
1. Overview of natural language processing (1 lecture)
2. Supplementary concepts and terminology in NLP (1 lecture)
3. Language models and smoothing techniques (1 lecture)
4. The tagging problem and hidden Markov models (2 lectures)
5. Statistical parsing (2 lectures)
6. Machine translation (2 lectures)
7. Log-linear models (2 lectures)
8. Conditional random fields and global linear models (2 lectures)
9. Unsupervised/semi-supervised learning in NLP (2 lectures)

- Discussion topics
1. Distinguishing different forms of language; the similarities and differences between programming languages and natural languages.
- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 1.

2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 1.
3. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 1.
- Notes: prerequisite courses: artificial intelligence, data structures and algorithms, basic programming.

Lecture 02: Supplementary concepts and terminology in NLP
Chapter I, sections:
Periods: 1-3. Week: 2
- Objectives and requirements
Objectives: provide the basic concepts and terminology of natural language processing; the problems posed in natural language processing and its applications.
Requirements: students master these concepts as a prerequisite for following the remaining lectures of the course.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:

2.1. A summary of the characteristics of Vietnamese
1. History of the development of Vietnamese
Development: Austroasiatic family, Mon-Khmer branch, Eastern Mon-Khmer grouping, Viet-Muong group (A. Haudricourt, 1953); contact with the languages of the region, especially the Tai languages; during the period of Chinese domination, borrowing from Chinese (roughly 70% of the Vietnamese vocabulary is of Chinese origin); during the French colonial period, borrowing from French and "grammatical calquing" from European languages.

Typology of Vietnamese
Language types:
Inflectional (flexional) languages
- Changes in word form express grammatical relations
- Word formation: roots and affixes combine tightly
- One affix may express several grammatical meanings
- Examples: English, French, Russian

Agglutinating languages
- New words are formed by attaching affixes to a root
- The root can stand on its own
- Each affix expresses exactly one meaning
- Examples: Turkish, Japanese, Korean

Polysynthetic languages
- Special word units that can form a whole sentence
- Have properties of both inflectional and agglutinating languages
- Examples: some languages of the Caucasus region

Isolating languages
- Words do not inflect
- Grammatical relations are expressed by word order or by function words (tool words)
- The basic unit is the syllable, which coincides with the morpheme
- Examples: Chinese, Thai and Vietnamese are isolating languages.

Writing system and phonology
Writing system
- Based on the Latin alphabet
- The orthography is a phonetic transcription
- Standardization rules are not consistently respected ("i" or "y", "qui" or "quy", transliteration of foreign words)
Phonology
- A standard sound system for common Vietnamese (not yet recorded in dictionaries)
- Regional pronunciations
- (See also http://www.vietlex.com)

Vietnamese words and parts of speech
Words in the Vietnamese dictionary (Vietnam Lexicography Centre)
Simple words: monosyllabic words, some polysyllabic words
Compound words: polysyllabic words
- Head-modifier combination (semantic subordination): "xe đạp"
- Coordinate combination (semantic coordination): "quần áo", "non nước", "giang sơn"
- Reduplication: "trăng trắng"
- Idiomatic expressions (quán ngữ): "u b u bu"

Parts of speech in the Vietnamese dictionary

- Noun, verb, adjective, pronoun, adverb, conjunction (linking word), modal word, interjection
- Conversion between categories (category mutation) is common

Grammar
Phrase formation
- The head-modifier order plays the main role
- Function words are used to express plurality, tense relations, dependency and coordination
- Reduplication and intonation are used to change shades of meaning
Sentence formation
- The usual order is S-V-O
- Topic-prominent (topic-comment) order: "Cây lá to. Nhà xây rồi."

2.2. Lexical analysis
Some terms
Word
- Morpheme, stem, lexeme, lemma
Part of speech (POS)
- Word category: noun, verb, adjective, etc.
- Word morphology: inflectional forms

Lexical analysis of Vietnamese
Word segmentation: ambiguity caused by polysyllabic words; which tools currently exist?
POS tagging: defining the tagset; resolving ambiguity caused by conversion between categories and by words with several senses; word morphology cannot be relied on; which tools currently exist?

2.3. Syntactic analysis
Chunking:
- Shallow parsing
- Two approaches: rule-based (regular grammars) and statistical (as a tagging problem)
- Depends on the results of word segmentation and POS tagging

- Which tools currently exist?
Parsing:
- Constituency parsing and dependency parsing
- Two approaches: statistical and rule-based
- Which tools currently exist?

2.4. Semantic analysis

- Discussion topics
2. Experience with compiling and debugging when programming in the Turbo C and Visual C++ environments.
3. Similarities and differences between programming languages and natural languages.
4. Similarities and differences between a compiler and a human translator.
- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- Exercises
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 2.
- Review questions
- Notes: prerequisite courses: discrete mathematics, data structures and algorithms, basic programming.

Lecture 03: Language models and smoothing techniques
Chapter I, sections:
Periods: 1-3. Week: 3
- Objectives and requirements
Objectives: provide knowledge about modelling, i.e., about models for representing natural language.
Requirements: master the language representation models used in machine learning.

- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. The language modelling problem
2. N-gram models (bigram, trigram)
3. Evaluating language models
4. Smoothing techniques
4.1. Linear interpolation
4.2. Discounting methods
4.3. Back-off

1. The language modelling problem
Language models are used in a great many areas of natural language processing, such as spell checking, machine translation and word segmentation. Studying language models is therefore a prerequisite for studying those areas.
Language models can be approached in several ways, but they are mainly built as N-gram models.
A language model is a probability distribution over texts. Put simply, a language model tells us how probable it is that a given sentence (or phrase) belongs to a language.
Example 1: applying a language model to Vietnamese:
P[hôm qua là thứ năm] = 0.001
P[năm thứ hôm là qua] = 0
Example 2: We have some (finite) vocabulary, say V = {the, a, man, telescope, Beckham, two, ...}
We have an (infinite) set of strings, V†:
the STOP
a STOP
the fan STOP
the fan saw Beckham STOP
the fan saw saw STOP
the fan saw Beckham play for Real Madrid STOP

We have a training sample of example sentences in English.
We need to learn a probability distribution p, i.e., p is a function that satisfies:
sum_{x in V†} p(x) = 1, and p(x) >= 0 for all x in V†
p(the STOP) = 10^-12
p(the fan STOP) = 10^-8
p(the fan saw Beckham STOP) = 2 x 10^-8
p(the fan saw saw STOP) = 10^-15
p(the fan saw Beckham play for Real Madrid STOP) = 2 x 10^-9

2. N-gram models (bigram, trigram)
The task of a language model is to give the probability of a sentence w1 w2 ... wm. Applying the chain rule (P(AB) = P(A) * P(B|A)) repeatedly:
P(w1 w2 ... wm) = P(w1) * P(w2 | w1) * P(w3 | w1 w2) * ... * P(wm | w1 ... wm-1)
With this formula, a language model would need an enormous amount of memory to store the probabilities of all word sequences shorter than m. This is clearly impossible, since m is the length of arbitrary natural-language texts and can grow without bound. To be able to compute the probability of a text with an acceptable amount of memory, we use a Markov approximation of order n:
P(wm | w1, w2, ..., wm-1) = P(wm | wm-n, ..., wm-1)
Under the Markov approximation, the probability of a word wm is taken to depend only on the n words immediately preceding it (wm-n ... wm-1), rather than on the whole preceding sequence (w1 ... wm-1). The probability of a text is then computed as:
P(w1 w2 ... wm) = P(w1) * P(w2 | w1) * ... * P(wn | w1 ... wn-1) * P(wn+1 | w1 ... wn) * ... * P(wm | wm-n ... wm-1)
With this formula, we can build a language model from statistics over word sequences of at most n+1 words. Such a model is called an N-gram language model.
An N-gram is a contiguous subsequence of n elements of a given sequence of elements.
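To make the chain rule and the Markov approximation concrete, here is a minimal Python sketch (added for illustration only; the toy corpus and the function names are my own, not part of the outline) that estimates bigram probabilities by maximum likelihood and scores a sentence:

from collections import defaultdict

def train_bigram(sentences):
    # Count unigrams and bigrams, padding each sentence with <s> and STOP.
    uni, bi = defaultdict(int), defaultdict(int)
    for s in sentences:
        words = ["<s>"] + s.split() + ["STOP"]
        for w1, w2 in zip(words, words[1:]):
            uni[w1] += 1
            bi[(w1, w2)] += 1
    return uni, bi

def sentence_prob(sentence, uni, bi):
    # P(w1 ... wm) ~ product of P(wi | wi-1), estimated by relative frequencies.
    words = ["<s>"] + sentence.split() + ["STOP"]
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        if uni[w1] == 0:
            return 0.0  # unseen history: probability 0 without smoothing
        p *= bi[(w1, w2)] / uni[w1]
    return p

uni, bi = train_bigram(["the fan saw Beckham", "the fan saw the man"])
print(sentence_prob("the fan saw Beckham", uni, bi))

The zero returned for any unseen history is exactly the sparse-data problem discussed in the next section, which smoothing is meant to fix.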

3. Evaluating language models
When an N-gram model is used with the raw probability formula, the uneven distribution of N-grams in the training corpus can lead to inaccurate estimates. When the N-grams are sparsely distributed, many N-grams do not appear at all or appear only a few times, and the estimates for sentences containing those N-grams will be poor.
If V is the size of the vocabulary, then V^N different N-grams can be generated from it. In practice, however, the N-grams that are meaningful and actually occur make up only a tiny fraction.
Example: Vietnamese has somewhat more than 5,000 distinct syllables, so the total number of possible 3-grams is 5,000^3 = 125,000,000,000. However, only about 1,500,000 3-grams are actually observed in corpus statistics. A great many 3-grams therefore never occur, or occur only very rarely.
When computing the probability of a sentence, we very often encounter N-grams that never appeared in the training data at all. This makes the probability of the whole sentence equal to 0, even though the sentence may be perfectly well formed both grammatically and semantically. To overcome this, smoothing (estimation) techniques are used.

4. Smoothing techniques
To overcome the sparse distribution of N-grams described above, smoothing methods adjust the raw statistics so as to estimate the probabilities of N-grams more accurately ("more smoothly"). Smoothing methods re-estimate the probabilities of N-grams by:
- Assigning a non-zero value to N-grams with probability 0 (those that never occurred).
- Adjusting the probabilities of N-grams with non-zero probability (those observed in the statistics) to suitable values, so that the total probability remains unchanged.
Smoothing methods can be divided into the following kinds:
- Discounting: reduce (slightly) the probability of N-grams whose probability is greater than 0, in order to compensate the N-grams that do not appear in the training set.
- Back-off: compute the probability of N-grams that do not appear in the training set from shorter N-grams whose probability is greater than 0.
- Interpolation: compute the probability of all N-grams from the probabilities of shorter N-grams.
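As a rough illustration of the discounting and back-off ideas, the following deliberately simplified sketch (my own, not from the outline; real Katz or Kneser-Ney back-off normalizes the redistributed mass over unseen words only, which this sketch skips) subtracts a fixed discount from seen bigram counts and gives the freed probability mass to a unigram back-off. It assumes the uni and bi count dictionaries from the earlier bigram sketch:

def backoff_bigram_prob(w1, w2, uni, bi, total, d=0.5):
    # Seen bigram: discounted relative frequency.
    if bi[(w1, w2)] > 0:
        return (bi[(w1, w2)] - d) / uni[w1]
    # Unseen bigram: back off to the unigram distribution,
    # scaled by the probability mass freed by discounting.
    seen = [w for (a, w) in bi if a == w1]
    alpha = d * len(seen) / uni[w1] if uni[w1] > 0 else 1.0
    return alpha * uni[w2] / total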

- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 3.
2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 2.
- Review questions
- Notes: prerequisite courses: discrete mathematics, data structures and algorithms, basic programming.

Lecture 04: Formal languages and finite automata
Chapter 4, sections:
Periods: 1-3. Weeks: 4, 5
- Objectives and requirements
Objectives: understand the basic concepts and the tools for working with languages, namely context-free grammars and finite automata.
Requirements: master formal language theory, the different forms of finite automata, and their applications in language processing.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. An introduction to the parsing problem
2. Context-free grammars
3. A brief(!) sketch of the syntax of English
4. Examples of ambiguous structures
5. Probabilistic Context-Free Grammars (PCFGs)
6. The CKY algorithm for parsing with PCFGs
7. Lexicalization of a treebank

8. Lexicalized probabilistic context-free grammars
9. Parameter estimation in lexicalized probabilistic context-free grammars
10. Accuracy of lexicalized probabilistic context-free grammars

1. Syntax
The purpose of parsing: checking whether a sentence is grammatical; identifying the phrases (syntagms) and the dependency relations between them, which are needed to build the meaning of the sentence.
Example: "Con mèo con đang xơi một con chuột cống to béo."
[[[Con [mèo]] con]NP [[đang xơi] [một [[con [chuột cống]] to béo]]NP]VP]S.

Lexicon and grammar:
Lexicon: the lexicon contains all the words of the language; it must contain the phonological, morphological, syntactic and semantic information of each word.
Grammar: grammatical categories (parts of speech, phrases, etc.); rules (phonological, morphological, syntactic, semantic, pragmatic).
The lexicon and the grammar complement each other.

2. Formal grammars
A grammar G is an ordered 4-tuple G = <Σ, Δ, S, P>, where:
o Σ - an alphabet, called the basic alphabet (the terminal symbols);
o Δ, with Δ ∩ Σ = ∅, called the auxiliary alphabet (the nonterminal symbols);
o S - the start symbol (start variable);
o P - the set of production rules of the form α → β, with α, β ∈ (Σ ∪ Δ)*, where α contains at least one nonterminal symbol (the productions are sometimes called rewrite rules).
Notational conventions for presenting grammars. In this course we use:
o capital letters A, B, C, ... to denote variables, with S being the start symbol;
o X, Y, Z, ... to denote symbols that may be either terminals or variables;
o a, b, c, d, e, ... to denote terminal letters;
o u, v, w, x, y, z, ... to denote strings of terminal letters;
o α, β, γ, ... to denote strings of variables and/or terminal symbols.

The concepts of direct derivation, indirect derivation, derivation trees, equivalent grammars, and the language generated by a grammar.
Avram Noam Chomsky proposed a classification of grammars based on the properties of their production rules (1956):
Type 0 - unrestricted grammars (UG): no constraints are imposed on the set of productions;
Type 1 - context-sensitive grammars (CSG): every production of G has the form α → β with |α| ≤ |β|;
Type 2 - context-free grammars (CFG): every production has the form A → α, where A is a single variable and α is a string of symbols in (Σ ∪ Δ)*;
Type 3 - regular grammars (RG): every production is right-linear or left-linear.
Right-linear: A → aB or A → a;
Left-linear: A → Ba or A → a;
where A, B are single variables and a is a terminal symbol (possibly empty).
If L0, L1, L2, L3 denote the classes of languages generated by grammars of types 0, 1, 2 and 3 respectively, then L3 ⊂ L2 ⊂ L1 ⊂ L0.
L0, L1 - classes of languages recognized by Turing machines
L2 - the class of algebraic (context-free) languages, recognized by pushdown automata
L3 - the class of languages recognized by finite-state automata

Formal grammars for parsing: the linguistic meaning of G = <Σ, Δ, S, P>:
Σ represents the vocabulary of the language;
Δ represents the grammatical categories: the sentence, the phrases (noun phrase, verb phrase, etc.) and the parts of speech (noun, verb, etc.);
the start symbol S corresponds to the sentence category;
the set of productions P represents the syntactic rules. Rules that contain at least one terminal symbol (a word) are called lexical rules; the other rules are called phrase rules.
Every word of the vocabulary (the dictionary) is described by a set of productions containing that word on their right-hand side.
Every derivation tree (parse tree) describes the analysis of a phrase into its immediate constituents.

Ambiguity in language and grammar:
Lexical ambiguity: a word has several parts of speech.
Phrase ambiguity: a phrase can be analyzed into sub-phrases in several different ways ("Ông già đi nhanh quá").

3. Regular languages and automata
An automaton is an abstract machine (a model of computation) with a simple structure and mode of operation, but with the ability to recognize languages.
Finite automata (FA) - a finite model of computation: it has a start and an end, and every component has a fixed finite size that cannot grow during the computation;
It operates in discrete steps;
In general, the output produced by an FA depends on both the current and the previous input. When a model of computation uses a memory, that memory is assumed to be unbounded;
The distinction between the different kinds of automata is based mainly on what information can be placed in the memory;
Definition: a DFA is a 5-tuple A = (Q, Σ, δ, q0, F), where:
1. Q: a non-empty, finite set of states (p, q, ...);
2. Σ: the input alphabet (a, b, c, ...);
3. δ: D → Q, the transition function (mapping), with D ⊆ Q × Σ, meaning that δ(p, a) = q or δ(p, a) = ∅, where p, q ∈ Q and a ∈ Σ;
4. q0 ∈ Q: the start state;
5. F ⊆ Q: the set of final (finish) states.
When D = Q × Σ we say that A is a complete DFA.
Definition: a nondeterministic finite automaton (NFA) is defined by a 5-tuple A = (Q, Σ, δ, q0, F), where:
1. Q - a finite set of states;
2. Σ - a finite alphabet;
3. δ - the state transition mapping, δ: Q × Σ → 2^Q;
4. q0 ∈ Q is the start state;
5. F ⊆ Q is the set of final states.
The mapping δ is multi-valued (not single-valued), which is why A is called nondeterministic.
Definition: an NFA with ε-transitions (NFAε) is a 5-tuple A = (Q, Σ, δ, q0, F), where:

1. Q: a finite set of states;
2. Σ: a finite alphabet;
3. δ: Q × (Σ ∪ {ε}) → 2^Q;
4. q0 is the start state;
5. F ⊆ Q is the set of final states.

Regular expressions
Definition: regular expressions are defined recursively as follows:
1. ∅ is a regular expression, with L(∅) = ∅.
ε is a regular expression, with L(ε) = {ε}.
For every a ∈ Σ, a is a regular expression, with L(a) = {a}.
2. If r and s are regular expressions, then:
(r) is a regular expression, with L((r)) = L(r);
r + s is a regular expression, with L(r + s) = L(r) ∪ L(s);
r.s is a regular expression, with L(r.s) = L(r).L(s);
r* is a regular expression, with L(r*) = L(r)*.
3. Nothing is a regular expression unless it is built by rules 1 and 2.
* Further reading on regular expressions:
1. Jeffrey E. F. Friedl. Mastering Regular Expressions, 2nd Edition. O'Reilly & Associates, Inc. 2002.
2. http://www.regular-expressions.info/
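Regular expressions as used in programming libraries extend this formal definition with convenience operators. A small illustrative sketch (my own example, not from the outline) that tokenizes a sentence with Python's re module:

import re

# A crude word tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    return TOKEN.findall(text)

print(tokenize("At last, a computer that understands you like your mother!"))
# ['At', 'last', ',', 'a', 'computer', ..., 'mother', '!']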

Theorem 1: If L is the set accepted by some NFA, then there exists a DFA that accepts L.
General algorithm for constructing a DFA from an NFA (the subset construction):
Suppose the NFA A = (Q, Σ, δ, q0, F) accepts L. The algorithm builds a DFA A' = (Q', Σ, δ', q0', F') accepting L as follows:
o Q' = 2^Q; an element of Q' is written [q0, q1, ..., qi] with q0, q1, ..., qi ∈ Q;
o q0' = [q0];
o F' is the set of states of Q' that contain at least one final state from the set F of A;
o the transition function: δ'([q1, q2, ..., qi], a) = [p1, p2, ..., pj] if and only if δ({q1, q2, ..., qi}, a) = {p1, p2, ..., pj};
o finally, rename the states [q0, q1, ..., qi].
Theorem 2: If L is accepted by an NFAε, then L is also accepted by an NFA without ε-transitions.
Algorithm: suppose we have an NFAε A = (Q, Σ, δ, q0, F) accepting L. We build the NFA A' = (Q, Σ, δ', q0, F') as follows:
o F' = F ∪ {q0} if the ε-closure of q0 contains at least one state in F; otherwise F' = F;
o δ'(q, a) = δ*(q, a).
Corollary: If L is accepted by an NFAε, then there exists a DFA accepting L.
Algorithm for building the corresponding DFA:
1. Compute T = ε-closure(q0); T is initially unmarked;
2. Add T to the state set Q' (of the DFA);
3. while (there is an unmarked state T in Q') {
3.1. mark T;
3.2. foreach (input symbol a) {
U := ε-closure(δ(T, a));
if (U is not yet in the state set Q') {
add U to Q';
U is initially unmarked;
}
δ'[T, a] := U;
}
}
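The subset construction above can be written compactly in Python. This sketch is my own illustration (the dictionary-based NFA representation and the example automaton are assumptions, not material from the lecture); it handles ε-transitions via the ε-closure, exactly as in the pseudocode:

def eclose(states, delta):
    # Epsilon-closure: all states reachable through epsilon ('') transitions.
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ''), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def nfa_to_dfa(delta, q0, finals, alphabet):
    # delta: dict mapping (state, symbol) -> set of states; '' is epsilon.
    start = eclose({q0}, delta)
    dfa_delta, unmarked, states = {}, [start], {start}
    while unmarked:
        T = unmarked.pop()
        for a in alphabet:
            moved = set().union(*(delta.get((q, a), set()) for q in T)) if T else set()
            U = eclose(moved, delta)
            dfa_delta[(T, a)] = U
            if U not in states:
                states.add(U)
                unmarked.append(U)
    dfa_finals = {S for S in states if S & finals}
    return dfa_delta, start, dfa_finals

# Example: a small NFA over {a, b} with states 0, 1, 2 and final state 2.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
print(nfa_to_dfa(delta, 0, {2}, {'a', 'b'})[2])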

Theorem 3: If r is a regular expression, then there exists an NFAε accepting L(r).
(Proof: in the lecture, using Thompson's construction.)
Theorem 4: If L is accepted by a DFA, then L is denoted by some regular expression.
Proof:
Let L be accepted by the DFA A = ({q1, q2, ..., qn}, Σ, δ, q1, F).
Define R^k_ij = {x | δ(qi, x) = qj, and if δ(qi, y) = ql for a proper prefix y of x (y ≠ x) then l ≤ k}; in other words, R^k_ij is the set of all strings that take the automaton from state i to state j without passing through any state numbered higher than k.
Recursive definition of R^k_ij:
R^k_ij = R^(k-1)_ik (R^(k-1)_kk)* R^(k-1)_kj ∪ R^(k-1)_ij
We prove the following lemma (by induction on k): for every R^k_ij there exists a regular expression denoting R^k_ij.
Base case k = 0: R^0_ij is a finite set of one-symbol strings or ε.
Suppose the lemma holds for k-1, i.e., there exist regular expressions r^(k-1)_lm such that L(r^(k-1)_lm) = R^(k-1)_lm.
Then for R^k_ij we can choose the regular expression
r^k_ij = (r^(k-1)_ik)(r^(k-1)_kk)*(r^(k-1)_kj) + r^(k-1)_ij,
and the lemma is proved.
Remark: L is the union of the sets R^n_1j over the final states. Hence L can be denoted by the regular expression
r = r^n_1j1 + r^n_1j2 + ... + r^n_1jp, where F = {qj1, qj2, ..., qjp}.

4. Finite-state transducers (finite automata with output)
Applications of finite-state transducers:
Segmenting a text into sentences, and a sentence into words;
Analyzing a word into its morphemes (for inflectional languages);
POS tagging.

Implementing a parser for context-free grammars: recursive transition networks. Each rule is represented by a transducer; the grammar as a whole is a transducer whose input is the sentence to be parsed and whose output is the bracketed syntactic analysis of that sentence.

5. Context-free grammars and parsing
Parsing algorithms:
The general principle
The strategies:
Bottom-up parsing (recognition): the parsing process is guided by the input sentence (the productions are applied from right to left);
Top-down parsing (prediction): the parsing process is guided by hypotheses (the productions are applied from left to right);
Combinations of the two strategies.
To avoid repeating computations, a table (chart) is used to memorize intermediate results.
Limitations of context-free grammars:
The main limitations:
The resulting tree does not express the semantic constraints in the parsed sentence.
The diversity of syntactic structures requires a very large number of grammar rules, with no way to represent the relations between them. Chomsky proposed transformational grammar, but that formalism was also heavily criticized on linguistic grounds; moreover, its computational complexity goes back up to that of type-0 grammars.
As a result, many new grammar formalisms have appeared.

Parsing (Syntactic Structure)
INPUT:
Boeing is located in Seattle.
OUTPUT: [parse tree figure]

Syntactic Formalisms
Work in formal syntax goes back to Chomsky's PhD thesis in the 1950s.
Examples of current formalisms: minimalism, lexical functional grammar (LFG), head-driven phrase-structure grammar (HPSG), tree adjoining grammars (TAG), categorial grammars.
Data for Parsing Experiments
Penn WSJ Treebank = 50,000 sentences with associated trees
Usual set-up: 40,000 training sentences, 2,400 test sentences
An example tree: [treebank parse tree figure]
The Information Conveyed by Parse Trees
(1) Part of speech for each word
(N = noun, V = verb, DT = determiner)
(2) Phrases

Noun Phrases (NP): "the burglar", "the apartment"
Verb Phrases (VP): "robbed the apartment"
Sentences (S): "the burglar robbed the apartment"
(3) Useful Relationships
=> "the burglar" is the subject of "robbed"
An Example Application: Machine Translation
English word order is subject - verb - object
Japanese word order is subject - object - verb
English: IBM bought Lotus
Japanese: IBM Lotus bought
English: Sources said that IBM bought Lotus yesterday
Japanese: Sources yesterday IBM Lotus bought that said

Context-Free Grammars
A Context-Free Grammar for English
N = {S, NP, VP, PP, DT, Vi, Vt, NN, IN}
S = S
Σ = {sleeps, saw, man, woman, telescope, the, with, in}
Note: S = sentence, VP = verb phrase, NP = noun phrase, PP = prepositional phrase, DT = determiner, Vi = intransitive verb, Vt = transitive verb, NN = noun, IN = preposition
Left-Most Derivations

For example: [S], [NP VP], [D N VP], [the N VP], [the man VP], [the man Vi], [the man sleeps]
Representation of a derivation as a tree: [derivation tree figure]
An Example
DERIVATION            RULE USED
S                     S -> NP VP
NP VP                 NP -> DT N
DT N VP               DT -> the
the N VP              N -> dog
the dog VP            VP -> VB
the dog VB            VB -> laughs
the dog laughs
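To see how a leftmost derivation expands the leftmost nonterminal at every step, here is a small sketch (my own illustration; the function name and dictionary encoding are not from the lecture) that prints the derivation of "the dog laughs" from the toy grammar in the table:

# Toy CFG from the example above; each nonterminal maps to the production used.
rules = {
    "S":  ["NP", "VP"],
    "NP": ["DT", "N"],
    "DT": ["the"],
    "N":  ["dog"],
    "VP": ["VB"],
    "VB": ["laughs"],
}

def leftmost_derivation(start="S"):
    sent = [start]
    steps = [sent[:]]
    while any(sym in rules for sym in sent):
        # Expand the leftmost nonterminal.
        i = next(i for i, sym in enumerate(sent) if sym in rules)
        sent = sent[:i] + rules[sent[i]] + sent[i + 1:]
        steps.append(sent[:])
    return steps

for step in leftmost_derivation():
    print(" ".join(step))
# S / NP VP / DT N VP / the N VP / the dog VP / the dog VB / the dog laughs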

Properties of CFGs
A CFG defines a set of possible derivations
A string s is in the language defined by the CFG if there is at least one derivation that yields s
Each string in the language generated by the CFG may have more than one derivation ("ambiguity")
An Example of Ambiguity

The Problem with Parsing: Ambiguity
INPUT: She announced a program to promote safety in trucks and vans
POSSIBLE OUTPUTS: [candidate parse trees, figure omitted]

A Brief Overview of English Syntax
Parts of Speech (tags from the Brown corpus):
Nouns
o NN = singular noun, e.g., man, dog, park
o NNS = plural noun, e.g., telescopes, houses, buildings
o NNP = proper noun, e.g., Smith, Gates, IBM
Determiners
o DT = determiner, e.g., the, a, some, every
Adjectives
o JJ = adjective, e.g., red, green, large, idealistic

- Discussion topics
1. Similarities and differences between programming languages and natural languages.
2. Top-down and bottom-up parsing strategies.
- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 4.

Lecture 05: Statistics-based language processing
Chapter 6, sections:
Periods: 1-3. Week: 6
- Objectives and requirements
Objectives: study statistical methods as applied to language processing.
Requirements: understand the methods, know how to apply them to build programs, and be able to write some simple applications.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. The tagging problem
2. Generative models, and the noisy-channel model, for supervised learning
3. Hidden Markov Model (HMM) taggers
a. Basic definitions
b. Parameter estimation
c. The Viterbi algorithm

Language analysis:
The approaches
Rule-based: build the system model from a set of linguistic rules.
Statistical: build the system model from a set of probabilities for the "events" that may occur.
Hybrid models combine the two approaches.
Corpus-based analysis now dominates.

Corpus annotation and knowledge discovery:
Annotation of word units
POS annotation
Annotation of phrases and sentence structure
Semantic annotation
Co-reference annotation
Bilingual annotation
...

Machine learning methods:
N-grams and Markov models
SVM (Support Vector Machines)
CRF (Conditional Random Fields)
Neural networks
Transformation-based learning: the Brill method
Classification using decision trees
...

N-grams:
An n-gram is a text segment of length n (n words).
N-gram information reveals certain characteristics of a language, but it does not capture the structure of the language.
Finding and using n-grams is fast and easy.
N-grams can be used in many NLP applications:
Predicting the next word of an utterance from the n-1 preceding words;
Useful in spell checking, language approximation, etc.
Example: [figure omitted]

The Markov model: a bigram model is also called a first-order Markov model.
This model is essentially a weighted finite-state automaton: the states are words, and each arc between two states carries a probability.
Trigrams: choosing n = 3 (trigrams) gives a better approximation. In general trigrams are short, which allows probabilities to be estimated relatively accurately from the observed data.
The larger n becomes, the less accurate the probability estimates are (because of the lack of data), and the memory cost also grows.
Training an n-gram model:
Probabilities are estimated from the training corpus (maximum likelihood estimation, MLE) by computing relative frequencies: [formula omitted]

Note that the training corpus must be chosen according to the data to which the resulting n-gram model will be applied.
Smoothing techniques: the problem when training an n-gram model is data sparseness: there may be n-grams whose estimated probability is 0.
Smoothing: turn the zero probabilities into non-zero ones, i.e., adjust the estimated probabilities to account for data that has not been observed.
Data sparseness:
Data sparseness: roughly 50% of the word types occur only once.
Zipf's law: the frequency of a word is inversely proportional to its rank in the frequency table.
Example: add-one smoothing. To estimate bigram probabilities, add 1 to every count in the numerator and add the number of word types in the corpus to the denominator (so that the probabilities still sum to 1).
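A minimal sketch of this add-one (Laplace) case for bigrams (my own illustration; uni and bi are count dictionaries as in the earlier sketches, and V is the number of word types):

def addone_bigram_prob(w1, w2, uni, bi, V):
    # P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V)
    return (bi.get((w1, w2), 0) + 1) / (uni.get(w1, 0) + V)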

- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- Exercises
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 5.
2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 4.

Lecture 06: The tagging problem and hidden Markov models
Chapter 5, sections:
Periods: 1-3. Weeks: 7, 8
- Objectives and requirements
Objectives: study the part-of-speech tagging problem and hidden Markov models.
Requirements: understand the role of POS tagging in language processing, the methods and models for POS tagging, and hidden Markov models.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:

Markov Processes
Consider a sequence of random variables X1, X2, ..., Xn.
Each random variable can take any value in a finite set V.
For now we assume the length n is fixed (e.g., n = 100).
Our goal: model
P(X1 = x1, X2 = x2, ..., Xn = xn)

First-Order Markov Processes:
P(X1 = x1, X2 = x2, ..., Xn = xn)
= P(X1 = x1) * prod_{i=2..n} P(Xi = xi | X1 = x1, ..., X_{i-1} = x_{i-1})
= P(X1 = x1) * prod_{i=2..n} P(Xi = xi | X_{i-1} = x_{i-1})
The first-order Markov assumption: for any i in {2 ... n}, for any x1 ... xi,
P(Xi = xi | X1 = x1, ..., X_{i-1} = x_{i-1}) = P(Xi = xi | X_{i-1} = x_{i-1})

Second-Order Markov Processes:
P(X1 = x1, X2 = x2, ..., Xn = xn)
= P(X1 = x1) * P(X2 = x2 | X1 = x1) * prod_{i=3..n} P(Xi = xi | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
= prod_{i=1..n} P(Xi = xi | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
(For convenience we assume x0 = x_{-1} = *, where * is a special "start" symbol.)

Modeling Variable Length Sequences
We would like the length of the sequence, n, to also be a random variable.
A simple solution: always define Xn = STOP, where STOP is a special symbol.
Then use a Markov process as before:
P(X1 = x1, X2 = x2, ..., Xn = xn) = prod_{i=1..n} P(Xi = xi | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
(For convenience we assume x0 = x_{-1} = *, where * is a special "start" symbol.)

Trigram Language Models:
A trigram language model consists of:
1. A finite set V
2. A parameter q(w | u, v) for each trigram u, v, w such that w in V ∪ {STOP}, and u, v in V ∪ {*}
For any sentence x1 ... xn, where xi in V for i = 1 ... (n-1) and xn = STOP, the probability of the sentence under the trigram language model is
p(x1 ... xn) = prod_{i=1..n} q(xi | x_{i-2}, x_{i-1})
where we define x0 = x_{-1} = *.

An Example
For the sentence "the dog barks STOP" we would have
p(the dog barks STOP) = q(the | *, *) x q(dog | *, the) x q(barks | the, dog) x q(STOP | dog, barks)

The Trigram Estimation Problem
Remaining estimation problem:
q(wi | w_{i-2}, w_{i-1})
For example: q(laughs | the, dog)
A natural estimate (the "maximum likelihood estimate"):
q(wi | w_{i-2}, w_{i-1}) = Count(w_{i-2}, w_{i-1}, wi) / Count(w_{i-2}, w_{i-1})
q(laughs | the, dog) = Count(the, dog, laughs) / Count(the, dog)

Sparse Data Problems:
Say our vocabulary size is N = |V|; then there are N^3 parameters in the model.
e.g., N = 20,000 => 20,000^3 = 8 x 10^12 parameters.

Evaluating a Language Model: Perplexity
We have some test data, m sentences:
s1, s2, s3, ..., sm
We could look at the probability of the test data under our model, prod_{i=1..m} p(si). Or, more conveniently, the log probability:
log prod_{i=1..m} p(si) = sum_{i=1..m} log p(si)
In fact the usual evaluation measure is perplexity:
Perplexity = 2^(-l), where l = (1/M) * sum_{i=1..m} log2 p(si)
and M is the total number of words in the test data.

Some Intuition about Perplexity
Say we have a vocabulary V, with N = |V| + 1, and a model that predicts
q(w | u, v) = 1/N
for all w in V ∪ {STOP} and all u, v in V ∪ {*}.
It is easy to calculate the perplexity in this case:
Perplexity = 2^(-l), where l = log2(1/N)
=> Perplexity = N
Perplexity is a measure of the "effective branching factor".

Typical Values of Perplexity:
Results from Goodman ("A bit of progress in language modeling"), where |V| = 50,000:
A trigram model, p(x1 ... xn) = prod_{i=1..n} q(xi | x_{i-2}, x_{i-1}): Perplexity = 74
A bigram model, p(x1 ... xn) = prod_{i=1..n} q(xi | x_{i-1}): Perplexity = 137
A unigram model, p(x1 ... xn) = prod_{i=1..n} q(xi): Perplexity = 955
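A small sketch of how perplexity would be computed for a test corpus (my own illustration; sent_logprob2 stands for any function returning log2 p(s) under the model being evaluated):

import math

def perplexity(test_sentences, sent_logprob2, total_words):
    # l = (1/M) * sum_i log2 p(s_i);  perplexity = 2^(-l)
    l = sum(sent_logprob2(s) for s in test_sentences) / total_words
    return 2 ** (-l)

# Sanity check: a uniform model over N outcomes has perplexity N.
N = 1000
print(perplexity([["w"] * 10], lambda s: len(s) * math.log2(1.0 / N), 10))  # ~1000.0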

Some History:
Shannon conducted experiments on the entropy of English, i.e., how good are people at the perplexity game?
C. Shannon. Prediction and entropy of printed English. Bell Systems Technical Journal, 30:50-64, 1951.
Chomsky (in Syntactic Structures (1957)):
"Second, the notion 'grammatical' cannot be identified with 'meaningful' or 'significant' in any semantic sense. Sentences (1) and (2) are equally nonsensical, but any speaker of English will recognize that only the former is grammatical.
(1) Colorless green ideas sleep furiously.
(2) Furiously sleep ideas green colorless.
Third, the notion 'grammatical in English' cannot be identified in any way with the notion 'high order of statistical approximation to English'. It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not."

Sparse Data Problems
A natural estimate (the "maximum likelihood estimate"):
q(wi | w_{i-2}, w_{i-1}) = Count(w_{i-2}, w_{i-1}, wi) / Count(w_{i-2}, w_{i-1})
q(laughs | the, dog) = Count(the, dog, laughs) / Count(the, dog)
Say our vocabulary size is N = |V|; then there are N^3 parameters in the model.
e.g., N = 20,000 => 20,000^3 = 8 x 10^12 parameters.

The Bias-Variance Trade-Off
Trigram maximum-likelihood estimate:
q_ML(wi | w_{i-2}, w_{i-1}) = Count(w_{i-2}, w_{i-1}, wi) / Count(w_{i-2}, w_{i-1})
Bigram maximum-likelihood estimate:

q_ML(wi | w_{i-1}) = Count(w_{i-1}, wi) / Count(w_{i-1})
Unigram maximum-likelihood estimate:
q_ML(wi) = Count(wi) / Count()

Linear Interpolation:
Take our estimate q(wi | w_{i-2}, w_{i-1}) to be:
q(wi | w_{i-2}, w_{i-1}) = λ1 * q_ML(wi | w_{i-2}, w_{i-1}) + λ2 * q_ML(wi | w_{i-1}) + λ3 * q_ML(wi)
where λ1 + λ2 + λ3 = 1 and λi >= 0 for all i.
Our estimate correctly defines a distribution (define V' = V ∪ {STOP}):
sum_{w in V'} q(w | u, v)
= sum_{w in V'} [λ1 * q_ML(w | u, v) + λ2 * q_ML(w | v) + λ3 * q_ML(w)]
= λ1 * sum_w q_ML(w | u, v) + λ2 * sum_w q_ML(w | v) + λ3 * sum_w q_ML(w)
= λ1 + λ2 + λ3
= 1
(We can also show that q(w | u, v) >= 0 for all w in V'.)
How to estimate the λ values?
Hold out part of the training set as "validation" data.
Define c'(w1, w2, w3) to be the number of times the trigram (w1, w2, w3) is seen in the validation set.
Choose λ1, λ2, λ3 to maximize
L(λ1, λ2, λ3) = sum_{w1,w2,w3} c'(w1, w2, w3) * log q(w3 | w1, w2)
such that λ1 + λ2 + λ3 = 1 and λi >= 0 for all i, and where
q(wi | w_{i-2}, w_{i-1}) = λ1 * q_ML(wi | w_{i-2}, w_{i-1}) + λ2 * q_ML(wi | w_{i-1}) + λ3 * q_ML(wi)
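A minimal sketch of interpolated estimation with a crude grid search for the λ values on validation counts (my own illustration; a real implementation would use EM or a finer search, and q3, q2, q1 and validation_counts are assumed stand-ins for the maximum-likelihood estimates and held-out trigram counts defined above):

import itertools
import math

def interp_q(w, u, v, q3, q2, q1, lambdas):
    # q(w | u, v) = l1*q_ML(w|u,v) + l2*q_ML(w|v) + l3*q_ML(w)
    l1, l2, l3 = lambdas
    return l1 * q3(w, u, v) + l2 * q2(w, v) + l3 * q1(w)

def choose_lambdas(validation_counts, q3, q2, q1, step=0.1):
    # validation_counts: dict mapping (u, v, w) -> count in the held-out data.
    # Maximize sum of count * log q(w | u, v) over a coarse grid of lambdas.
    best, best_ll = None, float("-inf")
    grid = [i * step for i in range(int(1 / step) + 1)]
    for l1, l2 in itertools.product(grid, repeat=2):
        if l1 + l2 > 1:
            continue
        lams = (l1, l2, 1 - l1 - l2)
        ll = 0.0
        for (u, v, w), c in validation_counts.items():
            p = interp_q(w, u, v, q3, q2, q1, lams)
            if p <= 0:
                ll = float("-inf")
                break
            ll += c * math.log(p)
        if ll > best_ll:
            best, best_ll = lams, ll
    return best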

Discounting Methods

Summary
Three steps in deriving the language model probabilities:
1. Expand p(w1 ... wn) using the chain rule.
2. Make Markov independence assumptions:
p(wi | w1, w2, ..., w_{i-2}, w_{i-1}) = p(wi | w_{i-2}, w_{i-1})
3. Smooth the estimates using low-order counts.
Other methods used to improve language models:
o "Topic" or "long-range" features.
o Syntactic models.
It's generally hard to improve on trigram models though!!

Tagging Problems, and Hidden Markov Models
1. The tagging problem
2. Generative models, and the noisy-channel model, for supervised learning
3. Hidden Markov Model (HMM) taggers
a. Basic definitions
b. Parameter estimation
c. The Viterbi algorithm

Part-of-Speech Tagging
INPUT:
Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
OUTPUT:
Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ,/, as/P their/POSS CEO/N Alan/N Mulally/N announced/V first/ADJ quarter/N results/N ./.
N = Noun
V = Verb
P = Preposition
Adv = Adverb
Adj = Adjective

Named Entity Recognition
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
OUTPUT: Profits soared at [Company Boeing Co.], easily topping forecasts on [Location Wall Street], as their CEO [Person Alan Mulally] announced first quarter results.

Named Entity Extraction as Tagging
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
OUTPUT:
Profits/NA soared/NA at/NA Boeing/SC Co./CC ,/NA easily/NA topping/NA forecasts/NA on/NA Wall/SL Street/CL ,/NA as/NA their/NA CEO/NA Alan/SP Mulally/CP announced/NA first/NA quarter/NA results/NA ./NA
NA = No entity
SC = Start Company
CC = Continue Company
SL = Start Location
CL = Continue Location

Our Goal
Training set:
1 Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
2 Mr./NNP Vinken/NNP is/VBZ chairman/NN of/IN Elsevier/NNP N.V./NNP ,/, the/DT Dutch/NNP publishing/VBG group/NN ./.
3 Rudolph/NNP Agnew/NNP ,/, 55/CD years/NNS old/JJ and/CC chairman/NN of/IN Consolidated/NNP Gold/NNP Fields/NNP PLC/NNP ,/, was/VBD named/VBN a/DT nonexecutive/JJ director/NN of/IN this/DT British/JJ industrial/JJ conglomerate/NN ./.
...
38,219 It/PRP is/VBZ also/RB pulling/VBG 20/CD people/NNS out/IN of/IN Puerto/NNP Rico/NNP ,/, who/WP were/VBD helping/VBG Huricane/NNP Hugo/NNP victims/NNS ,/, and/CC sending/VBG them/PRP to/TO San/NNP Francisco/NNP instead/RB ./.
From the training set, induce a function/algorithm that maps new sentences to their tag sequences.

Two Types of Constraints
Influential/JJ members/NNS of/IN the/DT House/NNP Ways/NNP and/CC Means/NNP Committee/NNP introduced/VBD legislation/NN that/WDT would/MD restrict/VB how/WRB the/DT new/JJ savings-and-loan/NN bailout/NN agency/NN can/MD raise/VB capital/NN ./.
"Local": e.g., "can" is more likely to be a modal verb MD rather than a noun NN

"Contextual": e.g., a noun is much more likely than a verb to follow a determiner
Sometimes these preferences are in conflict:
The trash can is in the garage

Supervised Learning Problems
We have training examples (x(i), y(i)) for i = 1 ... m. Each x(i) is an input, each y(i) is a label.
The task is to learn a function f mapping inputs x to labels f(x).
Conditional models:
o Learn a distribution p(y | x) from training examples
o For any test input x, define f(x) = argmax_y p(y | x)

Generative Models:
We have training examples (x(i), y(i)) for i = 1 ... m. The task is to learn a function f mapping inputs x to labels f(x).
Generative models:
o Learn a distribution p(x, y) from training examples
o Often we have p(x, y) = p(y) p(x | y)
Note: we then have
p(y | x) = p(y) p(x | y) / p(x)
where p(x) = sum_y p(y) p(x | y)

Decoding with Generative Models
We have training examples (x(i), y(i)) for i = 1 ... m. The task is to learn a function f mapping inputs x to labels f(x).
Generative models:
o Learn a distribution p(x, y) from training examples
o Often we have p(x, y) = p(y) p(x | y)
Output from the model:
f(x) = argmax_y p(y | x)
= argmax_y p(y) p(x | y) / p(x)
= argmax_y p(y) p(x | y)

Hidden Markov Models

We have an input sentence x = x1, x2, ..., xn (xi is the i-th word in the sentence).
We have a tag sequence y = y1, y2, ..., yn (yi is the i-th tag in the sentence).
We'll use an HMM to define
p(x1, x2, ..., xn, y1, y2, ..., yn)
for any sentence x1, x2, ..., xn and tag sequence y1, y2, ..., yn of the same length.
Then the most likely tag sequence for x is
argmax_{y1 ... yn} p(x1, x2, ..., xn, y1, y2, ..., yn)

Trigram Hidden Markov Models (Trigram HMMs)
For any sentence x1 ... xn, where xi in V for i = 1 ... n, and any tag sequence y1 ... y_{n+1}, where yi in S for i = 1 ... n and y_{n+1} = STOP, the joint probability of the sentence and tag sequence is:
p(x1 ... xn, y1 ... y_{n+1}) = prod_{i=1..n+1} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..n} e(xi | yi)
where we have assumed that y0 = y_{-1} = *.
Parameters of the model:
q(s | u, v) for any s in S ∪ {STOP}, u, v in S ∪ {*}
e(x | s) for any s in S, x in V

An Example:
If we have n = 3, x1 ... x3 equal to the sentence "the dog laughs", and y1 ... y4 equal to the tag sequence D N V STOP, then
p(x1 ... xn, y1 ... y_{n+1})
= q(D | *, *) x q(N | *, D) x q(V | D, N) x q(STOP | N, V) x e(the | D) x e(dog | N) x e(laughs | V)
STOP is a special tag that terminates the sequence.
We take y0 = y_{-1} = *, where * is a special padding symbol.

Why the Name?
p(x1 ... xn, y1 ... y_{n+1}) = prod_{i=1..n+1} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..n} e(xi | yi):
the first factor is a (second-order) Markov chain over the tag sequence, which stays hidden, and the second factor generates each observed word from its tag.
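A tiny numerical sketch of this joint probability (my own illustration with made-up parameter values, only to show how the q and e factors multiply):

q = {("D", "*", "*"): 0.8, ("N", "*", "D"): 0.9, ("V", "D", "N"): 0.7, ("STOP", "N", "V"): 0.6}
e = {("the", "D"): 0.6, ("dog", "N"): 0.1, ("laughs", "V"): 0.05}

def joint_prob(words, tags):
    # p(x, y) = prod q(y_i | y_{i-2}, y_{i-1}) * prod e(x_i | y_i), with y_0 = y_{-1} = *
    padded = ["*", "*"] + tags                      # tags includes the final STOP
    p = 1.0
    for i in range(2, len(padded)):
        p *= q[(padded[i], padded[i - 2], padded[i - 1])]
    for w, t in zip(words, tags):
        p *= e[(w, t)]
    return p

print(joint_prob(["the", "dog", "laughs"], ["D", "N", "V", "STOP"]))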

Smoothed Estimation:
q(s | u, v) = λ1 * Count(u, v, s)/Count(u, v) + λ2 * Count(v, s)/Count(v) + λ3 * Count(s)/Count()
with λ1 + λ2 + λ3 = 1 and, for all i, λi >= 0.

Dealing with Low-Frequency Words: An Example
Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
A common method is as follows:
Step 1: Split the vocabulary into two sets
Frequent words = words occurring >= 5 times in training
Low-frequency words = all other words
Step 2: Map low-frequency words into a small, finite set, depending on prefixes, suffixes etc.

The Viterbi Algorithm
Problem: for an input x1 ... xn, find
argmax_{y1 ... y_{n+1}} p(x1 ... xn, y1 ... y_{n+1})
where the argmax is taken over all sequences y1 ... y_{n+1} such that yi in S for i = 1 ... n and y_{n+1} = STOP.
We assume that p again takes the form
p(x1 ... xn, y1 ... y_{n+1}) = prod_{i=1..n+1} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..n} e(xi | yi)
Recall that we have assumed in this definition that y0 = y_{-1} = * and y_{n+1} = STOP.

Brute Force Search is Hopelessly Inefficient

The Viterbi Algorithm
Define n to be the length of the sentence.
Define S_k for k = -1 ... n to be the set of possible tags at position k:
S_{-1} = S_0 = {*}
S_k = S for k = 1 ... n
Define:

r(y_{-1}, y_0, y_1, ..., y_k) = prod_{i=1..k} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..k} e(xi | yi)
Define a dynamic programming table:
π(k, u, v) = maximum probability of a tag sequence ending in tags u, v at position k,
that is,
π(k, u, v) = max_{(y_{-1}, y_0, y_1, ..., y_k): y_{k-1} = u, y_k = v} r(y_{-1}, y_0, y_1, ..., y_k)

An Example: The man saw the dog with the telescope

A Recursive Definition:
Base case: π(0, *, *) = 1
Recursive definition:
For any k in {1 ... n}, for any u in S_{k-1} and v in S_k:
π(k, u, v) = max_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )

The Viterbi Algorithm
Input: a sentence x1 ... xn, parameters q(s | u, v) and e(x | s).
Initialization: set π(0, *, *) = 1.
Definition: S_{-1} = S_0 = {*}, S_k = S for k = 1 ... n.
Algorithm:
For k = 1 ... n
  For u in S_{k-1} and v in S_k
    π(k, u, v) = max_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )
Return: max_{u in S_{n-1}, v in S_n} ( π(n, u, v) * q(STOP | u, v) )

The Viterbi Algorithm with Backpointers
Input: a sentence x1 ... xn, parameters q(s | u, v) and e(x | s).
Initialization: set π(0, *, *) = 1.
Definition: S_{-1} = S_0 = {*}, S_k = S for k = 1 ... n.
Algorithm:
For k = 1 ... n
  For u in S_{k-1} and v in S_k
    π(k, u, v) = max_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )
    bp(k, u, v) = argmax_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )
Set (y_{n-1}, y_n) = argmax_{(u, v)} ( π(n, u, v) * q(STOP | u, v) )
For k = (n-2) ... 1: y_k = bp(k+2, y_{k+1}, y_{k+2})
Return the tag sequence y1 ... yn
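A compact Python sketch of the trigram-HMM Viterbi algorithm with backpointers (added for illustration; q and e are dictionaries of model parameters keyed as in the earlier joint-probability sketch, missing entries are treated as probability 0, and the toy values below are invented):

def viterbi(words, tagset, q, e):
    n = len(words)
    S = lambda k: {"*"} if k <= 0 else tagset          # S_{-1} = S_0 = {*}
    pi = {(0, "*", "*"): 1.0}
    bp = {}
    for k in range(1, n + 1):
        for u in S(k - 1):
            for v in S(k):
                best_w, best_p = None, 0.0
                for w in S(k - 2):
                    p = (pi.get((k - 1, w, u), 0.0)
                         * q.get((v, w, u), 0.0)
                         * e.get((words[k - 1], v), 0.0))
                    if p > best_p:
                        best_w, best_p = w, p
                pi[(k, u, v)] = best_p
                bp[(k, u, v)] = best_w
    # Best final tag pair, including the STOP transition.
    (u, v), _ = max((((u, v), pi.get((n, u, v), 0.0) * q.get(("STOP", u, v), 0.0))
                     for u in S(n - 1) for v in S(n)), key=lambda t: t[1])
    tags = [None] * (n + 1)
    tags[n - 1], tags[n] = u, v
    for k in range(n - 2, 0, -1):
        tags[k] = bp[(k + 2, tags[k + 1], tags[k + 2])]
    return tags[1:]

q = {("D", "*", "*"): 0.8, ("N", "*", "D"): 0.9, ("V", "D", "N"): 0.7, ("STOP", "N", "V"): 0.6}
e = {("the", "D"): 0.6, ("dog", "N"): 0.1, ("laughs", "V"): 0.05}
print(viterbi(["the", "dog", "laughs"], {"D", "N", "V"}, q, e))  # ['D', 'N', 'V']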

The Viterbi Algorithm: Running Time
O(n|S|^3) time to calculate q(s | u, v) * e(xk | s) for all k, s, u, v.
n|S|^2 entries in π to be filled in.
O(|S|) time to fill in one entry.
O(n|S|^3) time in total.

Pros and Cons
Hidden Markov model taggers are very simple to train (we just need to compile counts from the training corpus).
They perform relatively well (over 90% performance on named entity recognition).
The main difficulty is modelling
e(word | tag)
which can be very difficult if the "words" are complex.

- Discussion topics
Vietnamese corpora: the current state of affairs and approaches to building them.
- Student preparation
Study existing Vietnamese corpora and write a tagging program that uses a hidden Markov model.
- Exercises
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 6.
2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 4.
3. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 3.

Lecture 07: Machine translation
Chapter I, sections:
Periods: 1-3. Weeks: 9, 10
- Objectives and requirements
Objectives: study machine translation, its basic concepts and the most common techniques.
Requirements: understand the architecture and content of machine translation, its stages, and a number of basic techniques.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. Challenges in machine translation
2. Classical machine translation
3. A brief introduction to statistical MT
4. The IBM translation models
a. IBM Model 1
b. IBM Model 2
c. EM training of Models 1 and 2
5. Phrase-based translation
6. Decoding with phrase-based translation models

1. Challenges in machine translation
Challenges: Lexical Ambiguity

(Example from Dorr et al., 1999)
Example 1:
book the flight -> đặt chỗ
read the book -> đọc sách
Example 2:
the box was in the pen
the pen was on the table
Example 3:
kill a man -> giết
kill a process -> hủy

Challenges: Differing Word Orders
English word order is subject - verb - object
Japanese word order is subject - object - verb
English: IBM bought Lotus
Japanese: IBM Lotus bought
English: Sources said that IBM bought Lotus yesterday
Japanese: Sources yesterday IBM Lotus bought that said
Syntactic structure is not preserved across translations!

2. Classical machine translation
2.1. Direct Machine Translation
Translation is word-by-word.
Very little analysis of the source text (e.g., no syntactic or semantic analysis).
Relies on a large bilingual dictionary. For each word in the source language, the dictionary specifies a set of rules for translating that word.
After the words are translated, simple reordering rules are applied (e.g., move adjectives after nouns when translating from English to French).

An Example of a Set of Direct Translation Rules
(From Jurafsky and Martin, edition 2, chapter 25. Originally from a system from Panov 1960.)
Rules for translating "much" or "many" into Russian:
if preceding word is "how" return skol'ko
else if preceding word is "as" return stol'ko zhe
else if word is "much"
    if preceding word is "very" return nil
    else if following word is a noun return mnogo
else (word is "many")
    if preceding word is a preposition and following word is a noun return mnogii
    else return mnogo
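Rules of this kind are easy to express directly in code. The following sketch (my own illustration of the rule style, not code from the lecture or from Jurafsky and Martin) mirrors the decision list above; is_noun and is_preposition stand for whatever part-of-speech lookup is assumed to be available:

def translate_much_many(word, prev, nxt, is_noun, is_preposition):
    # Decision-list translation of 'much'/'many' into Russian (after Panov 1960).
    if prev == "how":
        return "skol'ko"
    if prev == "as":
        return "stol'ko zhe"
    if word == "much":
        if prev == "very":
            return None                      # 'very much': drop the word
        if nxt is not None and is_noun(nxt):
            return "mnogo"
    else:  # word is 'many'
        if is_preposition(prev) and nxt is not None and is_noun(nxt):
            return "mnogii"
    return "mnogo"

print(translate_much_many("many", "in", "cases", lambda w: True, lambda w: True))  # mnogii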

Some Problems with Direct Machine Translation
Lack of any analysis of the source language causes several problems, for example:
Difficult or impossible to capture long-range reorderings
English: Sources said that IBM bought Lotus yesterday
Japanese: Sources yesterday IBM Lotus bought that said
Words are translated without disambiguation of their syntactic role; e.g., "that" can be a complementizer or a determiner, and will often be translated differently for these two cases:
They said that ...
They like that ice-cream

2.2. Transfer-Based Approaches
Three phases in translation:
Analysis: analyze the source-language sentence; for example, build a syntactic analysis of the source-language sentence.
Transfer: convert the source-language parse tree to a target-language parse tree.
Generation: convert the target-language parse tree to an output sentence.

Transfer-Based Approaches
The "parse trees" involved can vary from shallow analyses to much deeper analyses (even semantic representations).
The transfer rules might look quite similar to the rules for direct translation systems. But they can now operate on syntactic structures.
It's easier with these approaches to handle long-distance reorderings.
The Systran systems are a classic example of this approach.

  • 47

    Japanese: Sources yesterday IBM Lotus bought that said

    2.3. Interlingua-Based Translation

    Two phases in translation:

    Analysis: Analyze the source language sentence into a (language-

    independent) representation of its meaning.

    Generation: Convert the meaning representation into an output sentence.

One Advantage: If we want to build a translation system that translates between n languages, we need to develop only n analysis and generation systems. With a transfer-based system, we'd need to develop O(n²) sets of translation rules.

    Disadvantage: What would a language-independent representation look

    like?

    Interlingua-Based Translation

    How to represent different concepts in an interlingua?

    Different languages break down concepts in quite different ways:


    German has two words for wall: one for an internal wall, one for a

    wall that is outside

    Japanese has two words for brother: one for an elder brother, one for

    a younger brother

    Spanish has two words for leg: pierna for a human's leg, pata for an

    animal's leg, or the leg of a table

An interlingua might end up simply being an intersection of these

    different ways of breaking down concepts, but that doesn't seem very

    satisfactory...

    3. A Brief Introduction to Statistical MT

    Parallel corpora are available in several language pairs

    Basic idea: use a parallel corpus as a training set of translation examples

    Classic example: IBM work on French-English translation, using the

    Canadian Hansards. (1.7 million sentences of 30 words or less in length).

The idea goes back to Warren Weaver (1949), who suggested applying statistical and cryptanalytic techniques to translation.

    4. The IBM Translation Models

    4.1. IBM Model 1

    Alignments

How do we model p(f | e)?

The English sentence e has l words e_1 … e_l; the French sentence f has m words f_1 … f_m.

An alignment a identifies which English word each French word originated from.

Formally, an alignment a is {a_1, …, a_m}, where each a_i ∈ {0, …, l}.

There are (l + 1)^m possible alignments.

Example: l = 6, m = 7

e = And the program has been implemented

f = Le programme a été mis en application

One alignment is {2, 3, 4, 5, 6, 6, 6}

Another (bad!) alignment is {1, 1, 1, 1, 1, 1, 1}

Alignments in the IBM Models

We'll define models for p(a | e, m) and p(f | a, e, m), giving

p(f, a | e, m) = p(a | e, m) p(f | a, e, m)

Also

p(f | e, m) = Σ_a p(a | e, m) p(f | a, e, m)

    A By-Product: Most Likely Alignments

Suppose we have a model p(f, a | e, m) = p(a | e, m) p(f | a, e, m); we can then compute

p(a | f, e, m) = p(f, a | e, m) / Σ_{a'} p(f, a' | e, m)

for any alignment a.

For a given (f, e) pair, we can also compute the most likely alignment,

a* = arg max_a p(a | f, e, m)

    Nowadays, the original IBM models are rarely (if ever) used for

    translation, but they are used for recovering alignments.

    An Example Alignment

French: le conseil a rendu son avis , et nous devons à présent adopter un nouvel avis sur la base de la première position .

English: the council has stated its position , and now , on the basis of the first position , we again have to give our opinion .

Alignment:

the/le council/conseil has/à stated/rendu its/son position/avis ,/, and/et now/présent ,/NULL on/sur the/le basis/base of/de the/la first/première position/position ,/NULL we/nous again/NULL have/devons to/a give/adopter our/nouvel opinion/avis ./.

    IBM Model 1: Alignments

In IBM Model 1 all alignments a are equally likely:

p(a | e, m) = 1 / (l + 1)^m

This is a major simplifying assumption, but it gets things started...

    IBM Model 1: Translation Probabilities

Next step: come up with an estimate for p(f | a, e, m).

In Model 1, this is:

p(f | a, e, m) = Π_{i=1}^{m} t(f_i | e_{a_i})

e.g., l = 6, m = 7

e = And the program has been implemented

f = Le programme a été mis en application

a = {2, 3, 4, 5, 6, 6, 6}

p(f | a, e, 7) = t(Le | the) × t(programme | program) × t(a | has) × t(été | been) × t(mis | implemented) × t(en | implemented) × t(application | implemented)

    IBM Model 1: The Generative Process

To generate a French string f from an English string e:

Step 1: Pick an alignment a with probability 1 / (l + 1)^m

Step 2: Pick the French words with probability

p(f | a, e, m) = Π_{i=1}^{m} t(f_i | e_{a_i})

The final result:

p(f, a | e, m) = p(a | e, m) p(f | a, e, m) = [1 / (l + 1)^m] Π_{i=1}^{m} t(f_i | e_{a_i})
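A tiny Python sketch of this generative story for the running example; the t(f | e) values below are made-up toy numbers, not trained parameters:

    # Toy evaluation of p(f, a | e, m) = 1/(l+1)^m * prod_i t(f_i | e_{a_i}).
    # The t(f|e) entries are illustrative values only.
    t = {
        ("Le", "the"): 0.5, ("programme", "program"): 0.4, ("a", "has"): 0.3,
        ("été", "been"): 0.3, ("mis", "implemented"): 0.2,
        ("en", "implemented"): 0.1, ("application", "implemented"): 0.1,
    }

    def model1_joint_prob(f, e, a):
        l, m = len(e), len(f)
        e_ext = ["NULL"] + e                   # English position 0 is the NULL word
        prob = 1.0 / (l + 1) ** m              # uniform p(a | e, m)
        for i in range(m):
            prob *= t.get((f[i], e_ext[a[i]]), 1e-9)
        return prob

    e = "And the program has been implemented".split()
    f = "Le programme a été mis en application".split()
    a = [2, 3, 4, 5, 6, 6, 6]                  # the alignment used above
    print(model1_joint_prob(f, e, a))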

    An Example Lexical Entry

English      French       Probability
Position     position     0.756715
Position     situation    0.0547918
Position     mesure       0.0281663
Position     vue          0.0169303
Position     point        0.0124795
Position     attitude     0.0108907

... de la situation au niveau des négociations de l ' ompi ...
... of the current position in the wipo negotiations ...

nous ne sommes pas en mesure de décider , ...
we are not in a position to decide , ...

... le point de vue de la commission face à ce problème complexe .
... the commission 's position on this complex problem .

    4.2. IBM Model 2

Only difference: we now introduce alignment or distortion parameters

q(j | i, l, m) = the probability that alignment variable a_i takes the value j, conditioned on the lengths l and m of the English and French sentences

Define:

p(a | e, m) = Π_{i=1}^{m} q(a_i | i, l, m)

where a = {a_1, …, a_m}

Gives:

p(f, a | e, m) = Π_{i=1}^{m} q(a_i | i, l, m) t(f_i | e_{a_i})

An Example:

l = 6, m = 7

e = And the program has been implemented

f = Le programme a été mis en application

a = {2, 3, 4, 5, 6, 6, 6}

p(a | e, 7) = q(2 | 1, 6, 7) × q(3 | 2, 6, 7) × q(4 | 3, 6, 7) × q(5 | 4, 6, 7) × q(6 | 5, 6, 7) × q(6 | 6, 6, 7) × q(6 | 7, 6, 7)

p(f | a, e, 7) = t(Le | the) × t(programme | program) × t(a | has) × t(été | been) × t(mis | implemented) × t(en | implemented) × t(application | implemented)

    IBM Model 2: The Generative Process

To generate a French string f from an English string e:

Step 1: Pick an alignment a = {a_1, …, a_m} with probability

Π_{i=1}^{m} q(a_i | i, l, m)

Step 2: Pick the French words with probability

p(f | a, e, m) = Π_{i=1}^{m} t(f_i | e_{a_i})

The final result:

p(f, a | e, m) = p(a | e, m) p(f | a, e, m) = Π_{i=1}^{m} q(a_i | i, l, m) t(f_i | e_{a_i})

    Recovering Alignments

If we have parameters q and t, we can easily recover the most likely alignment for any sentence pair.

Given a sentence pair e_1, e_2, …, e_l, f_1, f_2, …, f_m, define

a_i = arg max_{j ∈ {0…l}} q(j | i, l, m) t(f_i | e_j)   for i = 1 … m

e = And the program has been implemented

f = Le programme a été mis en application
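A minimal sketch of this argmax; q and t are assumed to be Python dictionaries holding already-trained Model 2 parameters, keyed as shown:

    # Recover a_i = argmax_{j in 0..l} q(j | i, l, m) * t(f_i | e_j) for i = 1..m.
    def most_likely_alignment(f, e, q, t):
        l, m = len(e), len(f)
        e_ext = ["NULL"] + e                   # English position 0 is NULL
        alignment = []
        for i in range(1, m + 1):              # French positions 1..m
            best_j, best_score = 0, -1.0
            for j in range(0, l + 1):          # candidate English positions 0..l
                score = q.get((j, i, l, m), 1e-9) * t.get((f[i - 1], e_ext[j]), 1e-9)
                if score > best_score:
                    best_j, best_score = j, score
            alignment.append(best_j)
        return alignment                       # one English position per French word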

    4.3. EM Training of Models 1 and 2

    The Parameter Estimation Problem

Input to the parameter estimation algorithm: (e^(k), f^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence.

Output: parameters t(f | e) and q(j | i, l, m).

A key challenge: we do not have alignments on our training examples, e.g.,

e^(100) = And the program has been implemented

f^(100) = Le programme a été mis en application

    Parameter Estimation if the Alignments are Observed

First, consider the case where alignments are observed in the training data, e.g.,

e^(100) = And the program has been implemented

f^(100) = Le programme a été mis en application

a^(100) = {2, 3, 4, 5, 6, 6, 6}

The training data is (e^(k), f^(k), a^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence, each a^(k) is an alignment.

Maximum-likelihood parameter estimates in this case are trivial:

t_ML(f | e) = c(e, f) / c(e),    q_ML(j | i, l, m) = c(j | i, l, m) / c(i, l, m)

Input: A training corpus (e^(k), f^(k), a^(k)) for k = 1 … n, where e^(k) = e_1^(k) … e_{l_k}^(k), f^(k) = f_1^(k) … f_{m_k}^(k), and a^(k) = a_1^(k) … a_{m_k}^(k).

Algorithm:

Set all counts c(…) = 0

For k = 1 … n

    For i = 1 … m_k, j = 0 … l_k

        c(e_j^(k), f_i^(k)) ← c(e_j^(k), f_i^(k)) + δ(k, i, j)

        c(e_j^(k)) ← c(e_j^(k)) + δ(k, i, j)

        c(j | i, l, m) ← c(j | i, l, m) + δ(k, i, j)

        c(i, l, m) ← c(i, l, m) + δ(k, i, j)

where δ(k, i, j) = 1 if a_i^(k) = j, 0 otherwise

Output: t_ML(f | e) = c(e, f) / c(e),    q_ML(j | i, l, m) = c(j | i, l, m) / c(i, l, m)
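A compact Python sketch of this counting procedure, assuming the corpus is given as a list of (e, f, a) triples with word lists e, f and an alignment list a (a[i-1] ∈ {0, …, l}):

    from collections import defaultdict

    def estimate_from_observed_alignments(corpus):
        """Maximum-likelihood t and q from fully aligned data (mirrors the algorithm above)."""
        c_ef, c_e = defaultdict(float), defaultdict(float)
        c_jilm, c_ilm = defaultdict(float), defaultdict(float)
        for e, f, a in corpus:
            l, m = len(e), len(f)
            e_ext = ["NULL"] + e
            for i in range(1, m + 1):          # French positions
                for j in range(0, l + 1):      # English positions (0 = NULL)
                    delta = 1.0 if a[i - 1] == j else 0.0
                    c_ef[(e_ext[j], f[i - 1])] += delta
                    c_e[e_ext[j]] += delta
                    c_jilm[(j, i, l, m)] += delta
                    c_ilm[(i, l, m)] += delta
        t = {(fw, ew): c / c_e[ew] for (ew, fw), c in c_ef.items() if c > 0}
        q = {k: c / c_ilm[(k[1], k[2], k[3])] for k, c in c_jilm.items() if c > 0}
        return t, q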

    Parameter Estimation with the EM Algorithm

The training examples are (e^(k), f^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence.

The algorithm is related to the algorithm for the case where alignments are observed, but with two key differences:

1. The algorithm is iterative. We start with some initial (e.g., random) choice for the q and t parameters. At each iteration we compute some counts based on the data together with our current parameter estimates. We then re-estimate our parameters with these counts, and iterate.

2. We use the following definition for δ(k, i, j) at each iteration:

δ(k, i, j) = q(j | i, l_k, m_k) t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} q(j' | i, l_k, m_k) t(f_i^(k) | e_{j'}^(k))

Input: A training corpus (e^(k), f^(k)) for k = 1 … n, where e^(k) = e_1^(k) … e_{l_k}^(k) and f^(k) = f_1^(k) … f_{m_k}^(k).

Initialization: Initialize the t(f | e) and q(j | i, l, m) parameters (e.g., to random values).

For s = 1 … S

    Set all counts c(…) = 0

    For k = 1 … n

        For i = 1 … m_k, j = 0 … l_k

            c(e_j^(k), f_i^(k)) ← c(e_j^(k), f_i^(k)) + δ(k, i, j)

            c(e_j^(k)) ← c(e_j^(k)) + δ(k, i, j)

            c(j | i, l, m) ← c(j | i, l, m) + δ(k, i, j)

            c(i, l, m) ← c(i, l, m) + δ(k, i, j)

        where δ(k, i, j) = q(j | i, l_k, m_k) t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} q(j' | i, l_k, m_k) t(f_i^(k) | e_{j'}^(k))

    Recalculate the parameters: t(f | e) = c(e, f) / c(e),    q(j | i, l, m) = c(j | i, l, m) / c(i, l, m)

    The EM Algorithm for IBM Model 1:

For s = 1 … S

    Set all counts c(…) = 0

    For k = 1 … n

        For i = 1 … m_k, j = 0 … l_k

            c(e_j^(k), f_i^(k)) ← c(e_j^(k), f_i^(k)) + δ(k, i, j)

            c(e_j^(k)) ← c(e_j^(k)) + δ(k, i, j)

            c(j | i, l, m) ← c(j | i, l, m) + δ(k, i, j)

            c(i, l, m) ← c(i, l, m) + δ(k, i, j)

        where

        δ(k, i, j) = [1 / (1 + l_k)] t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} [1 / (1 + l_k)] t(f_i^(k) | e_{j'}^(k)) = t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} t(f_i^(k) | e_{j'}^(k))

    Recalculate the parameters: t(f | e) = c(e, f) / c(e)

An example:

e^(100) = And the program has been implemented

f^(100) = Le programme a été mis en application
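A compact sketch of the Model 1 EM loop above; the corpus is assumed to be a list of (e, f) word-list pairs, and t(f | e) is initialized uniformly over co-occurring pairs (illustrative, not optimized):

    from collections import defaultdict

    def em_ibm_model1(corpus, iterations=10):
        # Initialize t(f|e) for all co-occurring word pairs (any constant works here).
        t = defaultdict(lambda: 1e-9)
        for e, f in corpus:
            for ew in ["NULL"] + e:
                for fw in f:
                    t[(fw, ew)] = 1.0
        for _ in range(iterations):
            c_ef, c_e = defaultdict(float), defaultdict(float)
            for e, f in corpus:
                e_ext = ["NULL"] + e
                for fw in f:
                    z = sum(t[(fw, ew)] for ew in e_ext)   # denominator over j' = 0..l
                    for ew in e_ext:
                        delta = t[(fw, ew)] / z            # delta(k, i, j) for Model 1
                        c_ef[(ew, fw)] += delta
                        c_e[ew] += delta
            for (ew, fw), c in c_ef.items():               # M-step: t(f|e) = c(e,f)/c(e)
                t[(fw, ew)] = c / c_e[ew]
        return t

    corpus = [("the house".split(), "la maison".split()),
              ("the book".split(), "le livre".split())]
    t = em_ibm_model1(corpus)
    print(round(t[("maison", "house")], 3))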

    Justification for the Algorithm

The training examples are (e^(k), f^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence.

The log-likelihood function:

L(t, q) = Σ_{k=1}^{n} log p(f^(k) | e^(k)) = Σ_{k=1}^{n} log Σ_a p(f^(k), a | e^(k))

The maximum-likelihood estimates are

arg max_{t, q} L(t, q)

The EM algorithm will converge to a local maximum of the log-likelihood function.


    Summary

Key ideas in the IBM translation models:

o Alignment variables

o Translation parameters, e.g., t(f | e)

o Distortion parameters, e.g., q(2 | 1, 6, 7)

The EM algorithm: an iterative algorithm for training the q and t parameters.

Once the parameters are trained, we can recover the most likely alignments on our training examples.

    5. Phrase-Based Translation

    1. Learning phrases from alignments

    2. A phrase-based model

    3. Decoding in phrase-based models

    Phrase-Based Models

    First stage in training a phrase-based model is extraction of a phrase-based

    (PB) lexicon

    A PB lexicon pairs strings in one language with strings in another

    language, e.g.,

nach Kanada ↔ in Canada

zur Konferenz ↔ to the conference

Morgen ↔ tomorrow

fliege ↔ will fly
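As a toy illustration of how such a PB lexicon might be stored and applied, here is a naive greedy longest-match lookup in Python; the extra ich → I entry and the whole greedy scheme are assumptions for the demo, and a real decoder would also score phrase segmentations, reordering, and a language model:

    # Toy phrase-based lexicon, keyed by (lower-cased) source phrases.
    pb_lexicon = {
        ("nach", "kanada"): "in Canada",
        ("zur", "konferenz"): "to the conference",
        ("morgen",): "tomorrow",
        ("fliege",): "will fly",
        ("ich",): "I",                       # extra entry, assumed for the demo
    }

    def greedy_phrase_lookup(words, lexicon, max_len=3):
        """Left-to-right longest-match lookup; not a real decoder (no reordering, no LM)."""
        out, i = [], 0
        while i < len(words):
            for span in range(min(max_len, len(words) - i), 0, -1):
                phrase = tuple(w.lower() for w in words[i:i + span])
                if phrase in lexicon:
                    out.append(lexicon[phrase])
                    i += span
                    break
            else:                            # nothing matched: copy the word through
                out.append(words[i])
                i += 1
        return " ".join(out)

    print(greedy_phrase_lookup("Morgen fliege ich nach Kanada zur Konferenz".split(), pb_lexicon))
    # word order is not handled, which is exactly what decoding has to solve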

    6. Decoding with Phrase-Based Translation Models

- Discussion topics

Machine translation for Vietnamese; existing systems and products.

- Student preparation

Install and study the machine translation libraries and modules provided by the instructor.

- References

1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 7.

2. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 4.

Lecture 08: Log-linear models

Chapter I, sections:

Periods: 1-3    Week: 11

- Objectives and requirements

Objective: Study log-linear models for machine translation.

Requirements: Master the technique and be able to implement it in code.

- Teaching format: Lectures, discussion, self-study, independent research

- Time: Lecturing: 2 periods; In-class discussion and exercises: 1 period; Self-study: 6 periods.

- Location: Lecture hall assigned by P2.

- Main content:

    1. The Language Modeling Problem

    2. Log-linear models

    3. Parameter estimation in log-linear models

    4. Smoothing/regularization in log-linear models

    5. Global Linear Model

    Log-linear models

Given

An input domain X and a finite set of labels Y

A set of m feature functions φ_k : X × Y → ℝ (very often these are indicator/binary functions φ_k : X × Y → {0, 1})

The feature vectors Φ(x, y) ∈ ℝ^m induced by the feature functions φ_k for any x ∈ X and y ∈ Y

learn a conditional probability P(y | x, W), where

W is a parameter vector of weights (W ∈ ℝ^m)

P(y | x, W) = exp(W · Φ(x, y)) / Σ_{y' ∈ Y} exp(W · Φ(x, y'))

log P(y | x, W) = W · Φ(x, y) - log Σ_{y' ∈ Y} exp(W · Φ(x, y')) [subtraction of a normalization term from a linear term]

Examples of feature functions for POS tagging:

φ_1(x, y) = 1 if the current word w_i is "the" and y = DT, 0 otherwise

φ_2(x, y) = 1 if the current word w_i ends in "ing" and y = VBG, 0 otherwise

φ_3(x, y) = 1 if <t_{i-2}, t_{i-1}, t_i> = <DT, JJ, Vt>, 0 otherwise

    Learning in this framework amounts then to learning the weights WML

    that maximize the likelihood of the training corpus.

    WML = argmax W m L(W) = argmax W m i = 1..n P( yi | xi)

    L(W) = i = 1..n log P( yi | xi)

    = i = 1..n W ( xi , yi) / - i = 1..n log y Y e W (xi , y)

    Note: Finding the parameters that maximize the likelihood/probability of

    some training corpus is a universal machine learning trick.

    Summary: we have cast the learning problem as an optimization problem.

    Several solutions exist for solving this problem:

    Gradient ascent

    Conjugate gradient methods

    Iterative scaling

    Improved iterative scaling
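A small sketch of the quantities involved: the conditional probability P(y | x, W) and the gradient of L(W) (empirical minus expected feature counts), plugged into plain gradient ascent; the two indicator features and the toy data are assumptions for illustration:

    import math

    def dot(phi_vec, w):
        return sum(p * wi for p, wi in zip(phi_vec, w))

    def p_y_given_x(x, y, w, labels, phi):
        # P(y | x, W) = exp(W . phi(x,y)) / sum_y' exp(W . phi(x,y'))
        scores = {yy: math.exp(dot(phi(x, yy), w)) for yy in labels}
        return scores[y] / sum(scores.values())

    def gradient(data, w, labels, phi):
        # dL/dw_k = sum_i [ phi_k(x_i, y_i) - sum_y P(y | x_i, W) phi_k(x_i, y) ]
        g = [0.0] * len(w)
        for x, y in data:
            obs = phi(x, y)
            for k in range(len(w)):
                g[k] += obs[k]
            for yy in labels:
                p = p_y_given_x(x, yy, w, labels, phi)
                feat = phi(x, yy)
                for k in range(len(w)):
                    g[k] -= p * feat[k]
        return g

    labels = ["DT", "VBG"]
    def phi(x, y):                            # two indicator features, as in the POS example
        return [1.0 if (x == "the" and y == "DT") else 0.0,
                1.0 if (x.endswith("ing") and y == "VBG") else 0.0]

    data = [("the", "DT"), ("running", "VBG")]
    w = [0.0, 0.0]
    for _ in range(50):                       # plain gradient ascent, step size 0.5
        w = [wi + 0.5 * gi for wi, gi in zip(w, gradient(data, w, labels, phi))]
    print([round(wi, 2) for wi in w])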

- Discussion topics

Building a programming library for machine translation.

- Student preparation

Build a module simulating Vietnamese-English translation.


- References

1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 7.

2. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 5.

- Review questions

- Notes: Prerequisite courses: discrete mathematics, data structures and algorithms, basic programming.

Lecture 09: Conditional random fields and global linear models

Chapter I, sections:

Periods: 1-3    Weeks: 12, 13

- Objectives and requirements

Objective: Study CRFs and GLMs.

Requirements: Master the models and know how to build applications based on the two kinds of models presented in the lecture.

- Teaching format: Lectures, discussion, self-study, independent research

- Time: Lecturing: 2 periods; In-class discussion and exercises: 1 period; Self-study: 6 periods.

- Location: Lecture hall assigned by P2.

- Main content:

A CRF (conditional random field) is a conditional probabilistic sequence model, trained to maximize a conditional likelihood. It is a framework for building probabilistic models to segment and label sequence data [1]. According to [3], a CRF, like a Markov random field, is an undirected graphical model in which each vertex represents a random variable whose distribution is to be inferred, and each edge represents a dependency between two random variables.

Figure 1: Chain structure of a CRF graph.

X is a random variable over the data sequence to be labeled, and Y is a random variable over the corresponding sequence of labels (or states). For example, X may range over natural language sentences (sequences of observed words), and Y over the sequences of part-of-speech tags assigned to the sentences in X (the tags come from a predefined tag set). A linear-chain CRF with parameters Λ = {λ_k} is given by the formula [2]:

p(y | x) = (1 / Z_x) exp( Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, x, t) )

where Z_x is a normalization factor that ensures the probabilities of all state sequences sum to 1 [4].

f_k(y_{t-1}, y_t, x, t) is a feature function, usually binary-valued but possibly real-valued, and λ_k is a learned weight associated with feature f_k. Feature functions can measure any state transition y_{t-1} → y_t together with the observation sequence x, centered on the current position t. For example, a feature function might have value 1 when y_{t-1} is the state TITLE, y_t is the state AUTHOR, and x_t is a word that appears in a lexicon of person names.

CRFs are usually trained by maximizing the likelihood of the training data using optimization techniques such as L-BFGS¹. Inference (with the learned model) is the task of finding the label sequence corresponding to an input observation sequence. For CRFs this is typically done with a dynamic-programming algorithm, most commonly Viterbi² (a dynamic-programming algorithm for finding the most likely sequence of hidden states) [5].

¹ http://en.wikipedia.org/wiki/L-BFGS
² http://en.wikipedia.org/wiki/Viterbi_algorithm
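A minimal Python sketch of Viterbi decoding for such a chain model; the score function is assumed to return Σ_k λ_k f_k(y_{t-1}, y_t, x, t) already summed into one number (here a hand-written toy), which is what a CRF implementation computes from its weighted features:

    def viterbi(x, states, score):
        """Most likely state sequence under additive (log-space) scores.

        score(prev_state, state, x, t) ~ sum_k lambda_k * f_k(prev_state, state, x, t);
        prev_state is None at t = 0.
        """
        n = len(x)
        best = [{} for _ in range(n)]          # best[t][s] = (best score, backpointer)
        for s in states:
            best[0][s] = (score(None, s, x, 0), None)
        for t in range(1, n):
            for s in states:
                best[t][s] = max(((best[t - 1][p][0] + score(p, s, x, t), p) for p in states),
                                 key=lambda v: v[0])
        last = max(states, key=lambda s: best[n - 1][s][0])
        path = [last]
        for t in range(n - 1, 0, -1):          # follow backpointers
            path.append(best[t][path[-1]][1])
        return list(reversed(path))

    # Toy usage with two states and a hand-written scoring function (assumed values).
    states = ["N", "V"]
    def toy_score(prev, s, x, t):
        emit = 1.0 if (x[t].endswith("s") and s == "V") else 0.5
        trans = 0.3 if (prev == "N" and s == "V") else 0.0
        return emit + trans

    print(viterbi("dogs runs".split(), states, toy_score))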

    Maximum Entropy Models

An equivalent approach to learning conditional probability models is this:

There are lots of conditional distributions out there, most of them very spiked, overfit, etc. Let Q be the set of distributions that can be specified in log-linear form:

Q = { p : p(y | x_i) = exp(W · Φ(x_i, y)) / Σ_{y' ∈ Y} exp(W · Φ(x_i, y')) }

We would like to learn a distribution that is as uniform as possible without violating any of the requirements imposed by the training data:

P = { p : Σ_{i=1…n} Φ(x_i, y_i) = Σ_{i=1…n} Σ_{y ∈ Y} p(y | x_i) Φ(x_i, y) }   (empirical count = expected count)

Here p is an n × |Y| vector defining p(y | x_i) for all i, y.

Note that a distribution that satisfies the above equality always exists:

p(y | x_i) = 1 if y = y_i, 0 otherwise

Because uniformity equates to high entropy, we can search for distributions that are both consistent with the requirements imposed by the data and have high entropy.

Entropy of a vector p: H(p) = - Σ_x p_x log p_x

Entropy is uncertainty, but also non-commitment.

    What do we want from a distribution P?

    o Minimize commitment = maximize entropy

    o Resemble some reference distribution (data)

    Solution: maximize entropy H, subject to constraints f.

    Adding constraints (features):

    o Lowers maximum entropy

    o Raises maximum likelihood

    o Brings the distribution further from uniform

    o Brings the distribution closer to a target distribution

Let's say we have the following event space:

    NN    NNS   NNP   NNPS  VBZ   VBD

and the following empirical data (counts):

    3     5     11    13    3     1

Maximize H:

    1/e   1/e   1/e   1/e   1/e   1/e

but we wanted probabilities: E[NN, NNS, NNP, NNPS, VBZ, VBD] = 1

    1/6   1/6   1/6   1/6   1/6   1/6

This is probably too uniform:

    NN    NNS   NNP   NNPS  VBZ   VBD
    1/6   1/6   1/6   1/6   1/6   1/6

We notice that N* are more common than V* in the real data, so we introduce a feature f_N = {NN, NNS, NNP, NNPS}, with E[f_N] = 32/36:

    8/36  8/36  8/36  8/36  2/36  2/36

Proper nouns are more frequent than common nouns, so we add f_P = {NNP, NNPS}, with E[f_P] = 24/36:

    4/36  4/36  12/36 12/36 2/36  2/36

We could keep refining the model, for example by adding a feature to distinguish singular vs. plural nouns, or verb types.

Fundamental theorem: It turns out that finding the maximum-likelihood solution to the optimization problem in Section 3 is the same as finding the maximum-entropy solution to the problem in Section 4.

    The maximum entropy solution can be written in log-linear form.

    Finding the maximum-likelihood solution also gives the maximum

    entropy solution.
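A small numerical check of the worked example above: the tables satisfy the stated constraints, and each added constraint lowers the entropy, as claimed (the distributions are copied from the tables, nothing is re-estimated):

    import math

    tags = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]
    f_N = {"NN", "NNS", "NNP", "NNPS"}
    f_P = {"NNP", "NNPS"}

    def entropy(p):
        return -sum(v * math.log(v) for v in p.values() if v > 0)

    def expectation(p, feature):               # E[f] = sum of p over tags in the feature set
        return sum(p[t] for t in feature)

    uniform  = {t: 1 / 6 for t in tags}
    with_fN  = {"NN": 8/36, "NNS": 8/36, "NNP": 8/36, "NNPS": 8/36, "VBZ": 2/36, "VBD": 2/36}
    with_fNP = {"NN": 4/36, "NNS": 4/36, "NNP": 12/36, "NNPS": 12/36, "VBZ": 2/36, "VBD": 2/36}

    print(expectation(with_fN, f_N))                               # 32/36, as required
    print(expectation(with_fNP, f_N), expectation(with_fNP, f_P))  # 32/36 and 24/36
    print(entropy(uniform), entropy(with_fN), entropy(with_fNP))   # strictly decreasing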

- Student preparation

Read the assigned lecture material in advance. Complete the exercises as assigned.

- References

1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 9.

2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 6.

3. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 5.

Lecture 10: Machine learning in NLP

Chapter I, sections:

Periods: 1-3    Weeks: 14, 15

- Objectives and requirements

Objective: Study machine learning methods in natural language processing and their applications.

Requirements: Understand the methods and their applications, and how to build programs that use them.

- Teaching format: Lectures, discussion, self-study, independent research

- Time: Lecturing: 2 periods; In-class discussion and exercises: 1 period; Self-study: 6 periods.

- Location: Lecture hall assigned by P2.

- Main content:

The concept of machine learning

Machine learning is a field of artificial intelligence concerned with developing techniques that allow computers to "learn". More specifically, machine learning is a method for creating computer programs by analyzing data sets. Machine learning is closely related to statistics, since both fields study data analysis; unlike statistics, however, machine learning focuses on the algorithmic and computational complexity of carrying out the analysis. Many inference problems are NP-hard, so part of machine learning is the study of tractable approximate inference algorithms.

Commonly used types of learning algorithms include:

Supervised learning:

A machine learning technique for building a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs. The output of the function may be a continuous value (regression) or a predicted class label for the input object (classification). The task of a supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e., pairs of inputs and corresponding outputs). To achieve this, the learner must generalize from the available data so that it can predict outputs for unseen situations in a "reasonable" way.

Supervised learning can produce two kinds of models. Most commonly, it produces a global model that maps input objects to desired outputs. In some cases, however, the mapping is realized as a set of local models, each based on an object's neighbors.

To solve a supervised learning problem (for example, handwriting recognition), several steps have to be considered:

Determine the type of training examples. Before anything else, decide what kind of data will serve as a training example; for instance, a single handwritten character, an entire handwritten word, or a whole line of handwriting.

Gather the training data. The training set must match the function to be built, so each input must be checked to have a corresponding output. Training data can be collected from many sources: measurements and computations, existing data sets, and so on.

Determine the input feature representation. The accuracy of the learned function depends heavily on how the input objects are represented. Typically, an input object is transformed into a feature vector containing a number of features that describe the object. The number of features should not be too large, to avoid a data explosion, but it must be large enough to predict the output accurately. If the representation describes the object in too much detail, the outputs may fragment into many small groups or labels, which makes it hard to discern the relationships between objects, to find the majority group (label) in the data set, or to predict a representative element for each group; noisy objects can still be labeled, but then the number of labels grows too large and becomes inversely proportional to the number of elements in each group. Conversely, a representation with too few features easily leads to objects being mislabeled or noisy objects being missed. Choosing a roughly correct number of features reduces the cost of evaluating the results after training and of incorporating new input data.

Determine the structure of the learned function and the corresponding learning algorithm. For example, the engineer may choose to use artificial neural networks or decision trees.

Complete the design. The designer runs the learning algorithm on the collected training set. The parameters of the learning algorithm may be tuned by optimizing performance on a subset of the training set (called a validation set) or through cross-validation. After learning and parameter tuning, the performance of the algorithm can be measured on a test set that is independent of the training set; a minimal sketch of such a validation split is shown below.
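To make the validation-set / cross-validation step concrete, a minimal k-fold splitting sketch in plain Python; train_fn and eval_fn are placeholders to be supplied by the caller:

    def k_fold_indices(n, k):
        """Split indices 0..n-1 into k roughly equal folds."""
        folds, start = [], 0
        for i in range(k):
            size = n // k + (1 if i < n % k else 0)
            folds.append(list(range(start, start + size)))
            start += size
        return folds

    def cross_validate(inputs, outputs, train_fn, eval_fn, k=5):
        """Average eval_fn score over k train/validation splits (for parameter tuning)."""
        scores = []
        for held_out in k_fold_indices(len(inputs), k):
            held = set(held_out)
            train_x = [x for i, x in enumerate(inputs) if i not in held]
            train_y = [y for i, y in enumerate(outputs) if i not in held]
            val_x = [inputs[i] for i in held_out]
            val_y = [outputs[i] for i in held_out]
            model = train_fn(train_x, train_y)
            scores.append(eval_fn(model, val_x, val_y))
        return sum(scores) / len(scores)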

Unsupervised learning:

A method that seeks a model that fits a set of observations. It differs from supervised learning in that the correct output for each input is not known in advance. In unsupervised learning the input is simply a collected data set. Unsupervised learning typically treats the input objects as a set of random variables, and a joint density model is then built for the data set.

Unsupervised learning can be combined with Bayesian inference to obtain the conditional probability of any random variable given the others.

Unsupervised learning is also useful for data compression: essentially, every data compression algorithm relies, explicitly or implicitly, on a probability distribution over a set of inputs.

Semi-supervised learning:

Combines labeled and unlabeled examples to produce an appropriate function or classifier.

Reinforcement learning:

The algorithm learns a policy of actions based on observations of its environment. Every action affects the environment, and the environment provides feedback that guides the learning algorithm.

Transduction:

Similar to supervised learning, but without explicitly constructing a function. Instead, it tries to predict new outputs from the training inputs, the training outputs, and the test inputs available during training.

Relearning:

An approach concerned with the algorithms a learner uses to predict outputs for cases it has never encountered before. When the designer builds such an algorithm, the learner is asked to predict an output for new inputs. To do this, the learner is given a finite number of training examples that illustrate the desired relation between input and output values. After successful learning, the learner computes a good approximation of the correct output even for examples it has not seen during training. Without additional assumptions this task cannot be solved, since unseen situations could have arbitrary outputs. The kind of assumption needed about the nature of the target function is called the inductive bias.

A more formal definition of inductive bias is based on mathematical logic: the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis produced by the learner. The result can be viewed as a rough description of the outputs over the entire set of objects.

When deciding to build a machine learning system, the designer needs to answer the following questions:

How does the system access its data? In other words: how can the learning system use the knowledge it has acquired from the training data?

If the learning program is embedded in a concrete environment and can take controlled actions on its input data, it can also update its knowledge during execution, as in reinforcement learning; alternatively it can do so by accumulating experience. The data may be encoded, or may contain many noisy objects, which requires the learner to be able to decode the data or to assess the noisy objects approximately in order to analyze them and obtain the best possible results. From this point of view, the learning program can be built on whichever model is appropriate, supervised or unsupervised, as the designer sees fit.

What should the program learn? What is the goal to be achieved?

Different kinds of functions can be defined inside a learning program. What these functions should do is determined by what one wants to obtain from the analysis. The goal can be described in terms of the outputs of the functions being used. We can approximate this goal through the training data set, or through the behavior of the learning program while it processes real data.

How can the data be summarized (described)? How can the right algebraic basis for the functions be determined so that the functions can be defined?

An inductive process can be constructed to approximately determine the characteristics of the target function. This process can be understood as a search for a hypothesis (or model) of the data, in a very large space of data or in the training set supplied by the designer. Choosing such an approximate description helps limit the amount of data needed and can reduce cost. This process can also be used to identify a representative of a group or of the whole data set.

Which algorithm can be applied?

Choosing a suitable algorithm is essential for building a learning program. Since the learning program should limit human intervention in the analysis, the algorithm must satisfy the following need: it should help the learner reach as good an approximation as required on a large, continuously updated data set. What counts as one pass, or as an acceptable level of confidence for the program, is determined case by case.

Using machine learning in natural language processing:

Today there is strong demand for applying the achievements of machine learning to natural language processing, and many different machine learning models have been applied in this field. Previously, large volumes of data had to be processed by hand, and the large number of rules used in natural languages added greatly to the workload. Most of the machine learning models that are applied keep their essential character, even though statistical rules are not always used, in order to discover typical rules from the collected data samples.

Example: consider the task of part-of-speech tagging, i.e., determining the correct part of speech for each word in a given sentence, often a sentence that has never been seen before. Machine-learning-based tagging methods usually proceed in two steps:

The first step, the training step, makes use of a tagged training data set, which includes