Detailed lecture outline: Natural Language Processing (XLNNTN)


APPROVED BY THE DEPARTMENT
Head of Department
Ngô Hữu Phúc

DETAILED LECTURE OUTLINE
(For teaching periods)

Course: NATURAL LANGUAGE PROCESSING
Course group: .....................
Department: Computer Science
Faculty (Institute): Information Technology (CNTT)

On behalf of the course group
Hà Chí Trung

Information about the course group
No.   Lecturer            Academic rank            Degree
1     Hà Chí Trung        Senior Lecturer (GVC)    PhD (TS)
3     Nguyễn Trung Tín    TG                       PhD (TS)

Office: office hours, Department of Computer Science, 13th floor, Building S4, Military Technical Academy.
Contact address: Department of Computer Science, Faculty of Information Technology, Military Technical Academy, 236 Hoàng Quốc Việt.
Phone, email: 01685582102, [email protected];

Lecture 01: Overview of natural language processing
Chapter I, sections:
Periods: 1-3. Week: 1
- Objectives and requirements
Objectives: provide the most general understanding of the course; master the basic concepts and problems of natural language processing and the mathematical foundations on which the course rests.
Requirements: students must review the fundamentals of discrete mathematics and of programming, and study and revise formal language and grammar theory on their own.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:

1. Why study NLP?
2. Applications of natural language processing
3. The problems of NLP
4. Course content

1. Why study NLP?
NLP is a branch of artificial intelligence that focuses on applications involving human language. Within artificial intelligence, natural language processing is one of the hardest areas, because it requires understanding the meaning of language - the most refined tool of thought and communication.
Modern NLP algorithms are built on the achievements of machine learning, in particular statistical machine learning. Studying modern NLP algorithms requires an understanding of several different fields, including linguistics, computer science, and probability and statistics.

2. Why is NLP hard?
Ambiguity
"At last, a computer that understands you like your mother"
1. (*) It understands you as well as your mother understands you
2. It understands (that) you like your mother
3. It understands you as well as it understands your mother
1 and 3: Does this mean well, or poorly?

Ambiguity at Many Levels
At the acoustic level (speech recognition):
1. "... a computer that understands you like your mother"
2. "... a computer that understands you lie cured mother"

At the syntactic level:
Different structures lead to different interpretations.
"Ông già đi rất nhanh" ("ông già | đi rất nhanh": the old man walks very fast; "ông | già đi rất nhanh": he is aging very fast)

At the semantic (meaning) level:
Two definitions of "mother":
- a woman who has given birth to a child

- a stringy slimy substance consisting of yeast cells and bacteria; it is added to cider or wine to produce vinegar

At the semantic (meaning) level:
"They put money in the bank"
= buried in mud?
"I saw her duck with a telescope"

At the discourse (multi-clause) level:
Alice says they've built a computer that understands you like your mother. But she
- doesn't know any details
- doesn't understand me at all
This is an instance of anaphora, where "she" co-refers with some other discourse entity.
Example: "Ông già đi rất nhanh."

3. Applications of natural language processing
NLP is one of the spearhead fields of the information society.

1. Terminological resources construction
Purpose: building dictionaries of domain terminology; term lists used in factories and enterprises; large dictionaries for document indexing systems; bilingual terminology dictionaries for translation, etc.; collecting terminology from text corpora.
Approach: identifying words and noun phrases; identifying groups of words that frequently co-occur (collocations).

2. Information retrieval / information extraction
Purpose: finding the documents relevant to a query; ranking the documents found.

Approach: document indexing (indexation); query processing (normalization, finding equivalent terms, etc.); ranking the retrieved results (assessing the relevance of a document to the query).

3. Text summarization
Purpose: automatic generation of text summaries.
Approach: automatic text understanding, reduction and summary generation; identifying the salient text units, selecting the corresponding passages, and assembling the summary; filtering the summary by semantically classifying sentences according to linguistic structures.

4. Machine translation
Purpose: fully automatic translation; machine-aided translation.
Approach: analysis of the source text (error correction, normalization, simplification, linguistic annotation); fully automatic translation (feasible for texts in narrow domains) or semi-automatic translation (with human intervention on the source or target language); post-editing the translation.

5. Automatic text comprehension
Purpose: recognizing the topics of a text; establishing the relations between sentences (causal structure, temporal sequences, pronouns, etc.).
Approach: analyzing the structure of the text in order to establish the relations between its components; analyzing topics, actions, characters, clause structure, etc.

6. Automatic text generation
Purpose: generating text for translation systems; generating text for human-machine dialogue systems; generating text that describes numerical data.
Approach: analyzing content at a deep level: semantic networks, concepts; organizing the deep content into the clauses to be expressed; building syntax trees and adjusting word morphology.

7. Human-machine dialogue
Purpose: building human-machine communication systems.
Approach: input preprocessing: speech recognition; automatic text understanding (paying particular attention to reference resolution); automatic text generation; speech synthesis.

1.2. The problems of NLP
1. Monolingual processing
Analyzing text level by level:
- Lexical level (lexical/morpho-syntactic analysis)
- Syntax (syntactic analysis/parsing)
- Semantics (semantic analysis)

- Pragmatics

2. Multilingual processing
Building tools for:
- Multilingual alignment
- Machine translation aids
- Multilingual information retrieval

1.3. Language resources for NLP
1. Importance
Tools and resources in NLP:
Tools and methods: general in nature, applicable to many languages.
Resources: specific to each language; very costly to build, which leads to the need to share and exchange language resources.
Large corpus "banks": LDC (Linguistic Data Consortium), ELDA (Evaluations and Language resources Distribution Agency), OLAC (Open Language Archives Community), etc.

2. Monolingual processing
Lexicon:
- Morphological information (morphology)
- Syntactic information (syntax)
- Semantic information (semantics), ontologies
Grammar:
- Grammar formalisms
Corpora:
- Raw corpora
- Linguistically annotated corpora: words, parts of speech, grammatical structure, etc.

3. Multilingual processing
Multilingual dictionaries:
- Bilingual dictionaries
- Multilingual dictionaries
Grammar:
- Bilingual grammars
Multilingual/parallel corpora:
- Raw multilingual corpora

- Aligned multilingual corpora, with or without linguistic annotation
- Translation memories

1.4. Standardization
1. The need to standardize language resources
The need to exchange corpora: consistent representation; standard encoding.
Standardization activities: projects aiming at standards (EAGLES, TEI, etc.); the ISO TC 37/SC 4 project.

2. Aspects of standardization
Representation models: dictionaries; corpus annotation, etc.
Terminology and data categories: standard terminology; DCR (Data Category Registry).
Encoding languages: XML; RDF (Resource Description Framework), OWL (Web Ontology Language), etc.

4. Course content
1. Overview of natural language processing (1 lecture)
2. Supplementary concepts and terminology in NLP (1 lecture)
3. Language models and smoothing techniques (1 lecture)
4. The tagging problem and hidden Markov models (2 lectures)
5. Statistical parsing (2 lectures)
6. Machine translation (2 lectures)
7. Log-linear models (2 lectures)
8. Conditional random fields and global linear models (2 lectures)
9. Unsupervised/semi-supervised learning in NLP (2 lectures)

- Discussion topics
1. Distinguishing different forms of language; the similarities and differences between programming languages and natural languages.
- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 1.

2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 1.
3. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 1.
- Notes: prerequisite courses: artificial intelligence, data structures and algorithms, basic programming.

Lecture 02: Supplementary concepts and terminology in NLP
Chapter I, sections:
Periods: 1-3. Week: 2
- Objectives and requirements
Objectives: provide the basic concepts and terminology of natural language processing; the problems posed in natural language processing and its applications.
Requirements: students master these concepts as a prerequisite for following the remaining lectures of the course.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:

2.1. A summary of the characteristics of Vietnamese
1. History of the development of Vietnamese
Development: Austroasiatic family, Mon-Khmer branch, Eastern Mon-Khmer grouping, Viet-Muong group (A. Haudricourt, 1953); contact with the languages of the region, especially the Tai languages; during the period of Chinese domination, borrowing from Chinese (roughly 70% of the Vietnamese vocabulary is of Chinese origin); during the French colonial period, borrowing from French and "grammatical calquing" from European languages.

Typology of Vietnamese
Language types:
Inflectional (flexional) languages
- Changes in word form express grammatical relations
- Word formation: roots and affixes combine tightly
- One affix may express several grammatical meanings
- Examples: English, French, Russian

Agglutinating languages
- New words are formed by attaching affixes to a root
- The root can stand on its own
- Each affix expresses exactly one meaning
- Examples: Turkish, Japanese, Korean

Polysynthetic languages
- Special word units that can form a whole sentence
- Have properties of both inflectional and agglutinating languages
- Examples: some languages of the Caucasus region

Isolating languages
- Words do not inflect
- Grammatical relations are expressed by word order or by function words (tool words)
- The basic unit is the syllable, which coincides with the morpheme
- Examples: Chinese, Thai and Vietnamese are isolating languages.

Writing system and phonology
Writing system
- Based on the Latin alphabet
- The orthography is a phonetic transcription
- Standardization rules are not consistently respected ("i" or "y", "qui" or "quy", transliteration of foreign words)
Phonology
- A standard sound system for common Vietnamese (not yet recorded in dictionaries)
- Regional pronunciations
- (See also http://www.vietlex.com)

Vietnamese words and parts of speech
Words in the Vietnamese dictionary (Vietnam Lexicography Centre)
Simple words: monosyllabic words, some polysyllabic words
Compound words: polysyllabic words
- Head-modifier combination (semantic subordination): "xe đạp"
- Coordinate combination (semantic coordination): "quần áo", "non nước", "giang sơn"
- Reduplication: "trăng trắng"
- Idiomatic expressions (quán ngữ): "u b u bu"

Parts of speech in the Vietnamese dictionary

- Noun, verb, adjective, pronoun, adverb, conjunction (linking word), modal word, interjection
- Conversion between categories (category mutation) is common

Grammar
Phrase formation
- The head-modifier order plays the main role
- Function words are used to express plurality, tense relations, dependency and coordination
- Reduplication and intonation are used to change shades of meaning
Sentence formation
- The usual order is S-V-O
- Topic-prominent (topic-comment) order: "Cây lá to. Nhà xây rồi."

2.2. Lexical analysis
Some terms
Word
- Morpheme, stem, lexeme, lemma
Part of speech (POS)
- Word category: noun, verb, adjective, etc.
- Word morphology: inflectional forms

Lexical analysis of Vietnamese
Word segmentation: ambiguity caused by polysyllabic words; which tools currently exist?
POS tagging: defining the tagset; resolving ambiguity caused by conversion between categories and by words with several senses; word morphology cannot be relied on; which tools currently exist?

2.3. Syntactic analysis
Chunking:
- Shallow parsing
- Two approaches: rule-based (regular grammars) and statistical (as a tagging problem)
- Depends on the results of word segmentation and POS tagging

- Which tools currently exist?
Parsing:
- Constituency parsing and dependency parsing
- Two approaches: statistical and rule-based
- Which tools currently exist?

2.4. Semantic analysis

- Discussion topics
2. Experience with compiling and debugging when programming in the Turbo C and Visual C++ environments.
3. Similarities and differences between programming languages and natural languages.
4. Similarities and differences between a compiler and a human translator.
- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- Exercises
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 2.
- Review questions
- Notes: prerequisite courses: discrete mathematics, data structures and algorithms, basic programming.

Lecture 03: Language models and smoothing techniques
Chapter I, sections:
Periods: 1-3. Week: 3
- Objectives and requirements
Objectives: provide knowledge about modelling, i.e., about models for representing natural language.
Requirements: master the language representation models used in machine learning.

- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. The language modelling problem
2. N-gram models (bigram, trigram)
3. Evaluating language models
4. Smoothing techniques
4.1. Linear interpolation
4.2. Discounting methods
4.3. Back-off

1. The language modelling problem
Language models are used in a great many areas of natural language processing, such as spell checking, machine translation and word segmentation. Studying language models is therefore a prerequisite for studying those areas.
Language models can be approached in several ways, but they are mainly built as N-gram models.
A language model is a probability distribution over texts. Put simply, a language model tells us how probable it is that a given sentence (or phrase) belongs to a language.
Example 1: applying a language model to Vietnamese:
P[hôm qua là thứ năm] = 0.001
P[năm thứ hôm là qua] = 0
Example 2: We have some (finite) vocabulary, say V = {the, a, man, telescope, Beckham, two, ...}
We have an (infinite) set of strings, V†:
the STOP
a STOP
the fan STOP
the fan saw Beckham STOP
the fan saw saw STOP
the fan saw Beckham play for Real Madrid STOP

We have a training sample of example sentences in English.
We need to learn a probability distribution p, i.e., p is a function that satisfies:
sum_{x in V†} p(x) = 1, and p(x) >= 0 for all x in V†
p(the STOP) = 10^-12
p(the fan STOP) = 10^-8
p(the fan saw Beckham STOP) = 2 x 10^-8
p(the fan saw saw STOP) = 10^-15
p(the fan saw Beckham play for Real Madrid STOP) = 2 x 10^-9

2. N-gram models (bigram, trigram)
The task of a language model is to give the probability of a sentence w1 w2 ... wm. Applying the chain rule (P(AB) = P(A) * P(B|A)) repeatedly:
P(w1 w2 ... wm) = P(w1) * P(w2 | w1) * P(w3 | w1 w2) * ... * P(wm | w1 ... wm-1)
With this formula, a language model would need an enormous amount of memory to store the probabilities of all word sequences shorter than m. This is clearly impossible, since m is the length of arbitrary natural-language texts and can grow without bound. To be able to compute the probability of a text with an acceptable amount of memory, we use a Markov approximation of order n:
P(wm | w1, w2, ..., wm-1) = P(wm | wm-n, ..., wm-1)
Under the Markov approximation, the probability of a word wm is taken to depend only on the n words immediately preceding it (wm-n ... wm-1), rather than on the whole preceding sequence (w1 ... wm-1). The probability of a text is then computed as:
P(w1 w2 ... wm) = P(w1) * P(w2 | w1) * ... * P(wn | w1 ... wn-1) * P(wn+1 | w1 ... wn) * ... * P(wm | wm-n ... wm-1)
With this formula, we can build a language model from statistics over word sequences of at most n+1 words. Such a model is called an N-gram language model.
An N-gram is a contiguous subsequence of n elements of a given sequence of elements.
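To make the chain rule and the Markov approximation concrete, here is a minimal Python sketch (added for illustration only; the toy corpus and the function names are my own, not part of the outline) that estimates bigram probabilities by maximum likelihood and scores a sentence:

from collections import defaultdict

def train_bigram(sentences):
    # Count unigrams and bigrams, padding each sentence with <s> and STOP.
    uni, bi = defaultdict(int), defaultdict(int)
    for s in sentences:
        words = ["<s>"] + s.split() + ["STOP"]
        for w1, w2 in zip(words, words[1:]):
            uni[w1] += 1
            bi[(w1, w2)] += 1
    return uni, bi

def sentence_prob(sentence, uni, bi):
    # P(w1 ... wm) ~ product of P(wi | wi-1), estimated by relative frequencies.
    words = ["<s>"] + sentence.split() + ["STOP"]
    p = 1.0
    for w1, w2 in zip(words, words[1:]):
        if uni[w1] == 0:
            return 0.0  # unseen history: probability 0 without smoothing
        p *= bi[(w1, w2)] / uni[w1]
    return p

uni, bi = train_bigram(["the fan saw Beckham", "the fan saw the man"])
print(sentence_prob("the fan saw Beckham", uni, bi))

The zero returned for any unseen history is exactly the sparse-data problem discussed in the next section, which smoothing is meant to fix.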

3. Evaluating language models
When an N-gram model is used with the raw probability formula, the uneven distribution of N-grams in the training corpus can lead to inaccurate estimates. When the N-grams are sparsely distributed, many N-grams do not appear at all or appear only a few times, and the estimates for sentences containing those N-grams will be poor.
If V is the size of the vocabulary, then V^N different N-grams can be generated from it. In practice, however, the N-grams that are meaningful and actually occur make up only a tiny fraction.
Example: Vietnamese has somewhat more than 5,000 distinct syllables, so the total number of possible 3-grams is 5,000^3 = 125,000,000,000. However, only about 1,500,000 3-grams are actually observed in corpus statistics. A great many 3-grams therefore never occur, or occur only very rarely.
When computing the probability of a sentence, we very often encounter N-grams that never appeared in the training data at all. This makes the probability of the whole sentence equal to 0, even though the sentence may be perfectly well formed both grammatically and semantically. To overcome this, smoothing (estimation) techniques are used.

4. Smoothing techniques
To overcome the sparse distribution of N-grams described above, smoothing methods adjust the raw statistics so as to estimate the probabilities of N-grams more accurately ("more smoothly"). Smoothing methods re-estimate the probabilities of N-grams by:
- Assigning a non-zero value to N-grams with probability 0 (those that never occurred).
- Adjusting the probabilities of N-grams with non-zero probability (those observed in the statistics) to suitable values, so that the total probability remains unchanged.
Smoothing methods can be divided into the following kinds:
- Discounting: reduce (slightly) the probability of N-grams whose probability is greater than 0, in order to compensate the N-grams that do not appear in the training set.
- Back-off: compute the probability of N-grams that do not appear in the training set from shorter N-grams whose probability is greater than 0.
- Interpolation: compute the probability of all N-grams from the probabilities of shorter N-grams.
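As a rough illustration of the discounting and back-off ideas, the following deliberately simplified sketch (my own, not from the outline; real Katz or Kneser-Ney back-off normalizes the redistributed mass over unseen words only, which this sketch skips) subtracts a fixed discount from seen bigram counts and gives the freed probability mass to a unigram back-off. It assumes the uni and bi count dictionaries from the earlier bigram sketch:

def backoff_bigram_prob(w1, w2, uni, bi, total, d=0.5):
    # Seen bigram: discounted relative frequency.
    if bi[(w1, w2)] > 0:
        return (bi[(w1, w2)] - d) / uni[w1]
    # Unseen bigram: back off to the unigram distribution,
    # scaled by the probability mass freed by discounting.
    seen = [w for (a, w) in bi if a == w1]
    alpha = d * len(seen) / uni[w1] if uni[w1] > 0 else 1.0
    return alpha * uni[w2] / total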

- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 3.
2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 2.
- Review questions
- Notes: prerequisite courses: discrete mathematics, data structures and algorithms, basic programming.

Lecture 04: Formal languages and finite automata
Chapter 4, sections:
Periods: 1-3. Weeks: 4, 5
- Objectives and requirements
Objectives: understand the basic concepts and the tools for working with languages, namely context-free grammars and finite automata.
Requirements: master formal language theory, the different forms of finite automata, and their applications in language processing.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. An introduction to the parsing problem
2. Context-free grammars
3. A brief(!) sketch of the syntax of English
4. Examples of ambiguous structures
5. Probabilistic Context-Free Grammars (PCFGs)
6. The CKY algorithm for parsing with PCFGs
7. Lexicalization of a treebank

8. Lexicalized probabilistic context-free grammars
9. Parameter estimation in lexicalized probabilistic context-free grammars
10. Accuracy of lexicalized probabilistic context-free grammars

1. Syntax
The purpose of parsing: checking whether a sentence is grammatical; identifying the phrases (syntagms) and the dependency relations between them, which are needed to build the meaning of the sentence.
Example: "Con mèo con đang xơi một con chuột cống to béo."
[[[Con [mèo]] con]NP [[đang xơi] [một [[con [chuột cống]] to béo]]NP]VP]S.

Lexicon and grammar:
Lexicon: the lexicon contains all the words of the language; it must contain the phonological, morphological, syntactic and semantic information of each word.
Grammar: grammatical categories (parts of speech, phrases, etc.); rules (phonological, morphological, syntactic, semantic, pragmatic).
The lexicon and the grammar complement each other.

2. Formal grammars
A grammar G is an ordered 4-tuple G = <Σ, Δ, S, P>, where:
o Σ - an alphabet, called the basic alphabet (the terminal symbols);
o Δ, with Δ ∩ Σ = ∅, called the auxiliary alphabet (the nonterminal symbols);
o S - the start symbol (start variable);
o P - the set of production rules of the form α → β, with α, β ∈ (Σ ∪ Δ)*, where α contains at least one nonterminal symbol (the productions are sometimes called rewrite rules).
Notational conventions for presenting grammars. In this course we use:
o capital letters A, B, C, ... to denote variables, with S being the start symbol;
o X, Y, Z, ... to denote symbols that may be either terminals or variables;
o a, b, c, d, e, ... to denote terminal letters;
o u, v, w, x, y, z, ... to denote strings of terminal letters;
o α, β, γ, ... to denote strings of variables and/or terminal symbols.

The concepts of direct derivation, indirect derivation, derivation trees, equivalent grammars, and the language generated by a grammar.
Avram Noam Chomsky proposed a classification of grammars based on the properties of their production rules (1956):
Type 0 - unrestricted grammars (UG): no constraints are imposed on the set of productions;
Type 1 - context-sensitive grammars (CSG): every production of G has the form α → β with |α| ≤ |β|;
Type 2 - context-free grammars (CFG): every production has the form A → α, where A is a single variable and α is a string of symbols in (Σ ∪ Δ)*;
Type 3 - regular grammars (RG): every production is right-linear or left-linear.
Right-linear: A → aB or A → a;
Left-linear: A → Ba or A → a;
where A, B are single variables and a is a terminal symbol (possibly empty).
If L0, L1, L2, L3 denote the classes of languages generated by grammars of types 0, 1, 2 and 3 respectively, then L3 ⊂ L2 ⊂ L1 ⊂ L0.
L0, L1 - classes of languages recognized by Turing machines
L2 - the class of algebraic (context-free) languages, recognized by pushdown automata
L3 - the class of languages recognized by finite-state automata

Formal grammars for parsing: the linguistic meaning of G = <Σ, Δ, S, P>:
Σ represents the vocabulary of the language;
Δ represents the grammatical categories: the sentence, the phrases (noun phrase, verb phrase, etc.) and the parts of speech (noun, verb, etc.);
the start symbol S corresponds to the sentence category;
the set of productions P represents the syntactic rules. Rules that contain at least one terminal symbol (a word) are called lexical rules; the other rules are called phrase rules.
Every word of the vocabulary (the dictionary) is described by a set of productions containing that word on their right-hand side.
Every derivation tree (parse tree) describes the analysis of a phrase into its immediate constituents.

Ambiguity in language and grammar:
Lexical ambiguity: a word has several parts of speech.
Phrase ambiguity: a phrase can be analyzed into sub-phrases in several different ways ("Ông già đi nhanh quá").

3. Regular languages and automata
An automaton is an abstract machine (a model of computation) with a simple structure and mode of operation, but with the ability to recognize languages.
Finite automata (FA) - a finite model of computation: it has a start and an end, and every component has a fixed finite size that cannot grow during the computation;
It operates in discrete steps;
In general, the output produced by an FA depends on both the current and the previous input. When a model of computation uses a memory, that memory is assumed to be unbounded;
The distinction between the different kinds of automata is based mainly on what information can be placed in the memory;
Definition: a DFA is a 5-tuple A = (Q, Σ, δ, q0, F), where:
1. Q: a non-empty, finite set of states (p, q, ...);
2. Σ: the input alphabet (a, b, c, ...);
3. δ: D → Q, the transition function (mapping), with D ⊆ Q × Σ, meaning that δ(p, a) = q or δ(p, a) = ∅, where p, q ∈ Q and a ∈ Σ;
4. q0 ∈ Q: the start state;
5. F ⊆ Q: the set of final (finish) states.
When D = Q × Σ we say that A is a complete DFA.
Definition: a nondeterministic finite automaton (NFA) is defined by a 5-tuple A = (Q, Σ, δ, q0, F), where:
1. Q - a finite set of states;
2. Σ - a finite alphabet;
3. δ - the state transition mapping, δ: Q × Σ → 2^Q;
4. q0 ∈ Q is the start state;
5. F ⊆ Q is the set of final states.
The mapping δ is multi-valued (not single-valued), which is why A is called nondeterministic.
Definition: an NFA with ε-transitions (NFAε) is a 5-tuple A = (Q, Σ, δ, q0, F), where:

1. Q: a finite set of states;
2. Σ: a finite alphabet;
3. δ: Q × (Σ ∪ {ε}) → 2^Q;
4. q0 is the start state;
5. F ⊆ Q is the set of final states.

Regular expressions
Definition: regular expressions are defined recursively as follows:
1. ∅ is a regular expression, with L(∅) = ∅.
ε is a regular expression, with L(ε) = {ε}.
For every a ∈ Σ, a is a regular expression, with L(a) = {a}.
2. If r and s are regular expressions, then:
(r) is a regular expression, with L((r)) = L(r);
r + s is a regular expression, with L(r + s) = L(r) ∪ L(s);
r.s is a regular expression, with L(r.s) = L(r).L(s);
r* is a regular expression, with L(r*) = L(r)*.
3. Nothing is a regular expression unless it is built by rules 1 and 2.
* Further reading on regular expressions:
1. Jeffrey E. F. Friedl. Mastering Regular Expressions, 2nd Edition. O'Reilly & Associates, Inc. 2002.
2. http://www.regular-expressions.info/
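Regular expressions as used in programming libraries extend this formal definition with convenience operators. A small illustrative sketch (my own example, not from the outline) that tokenizes a sentence with Python's re module:

import re

# A crude word tokenizer: runs of word characters, or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")

def tokenize(text):
    return TOKEN.findall(text)

print(tokenize("At last, a computer that understands you like your mother!"))
# ['At', 'last', ',', 'a', 'computer', ..., 'mother', '!']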

Theorem 1: If L is the set accepted by some NFA, then there exists a DFA that accepts L.
General algorithm for constructing a DFA from an NFA (the subset construction):
Suppose the NFA A = (Q, Σ, δ, q0, F) accepts L. The algorithm builds a DFA A' = (Q', Σ, δ', q0', F') accepting L as follows:
o Q' = 2^Q; an element of Q' is written [q0, q1, ..., qi] with q0, q1, ..., qi ∈ Q;
o q0' = [q0];
o F' is the set of states of Q' that contain at least one final state from the set F of A;
o the transition function: δ'([q1, q2, ..., qi], a) = [p1, p2, ..., pj] if and only if δ({q1, q2, ..., qi}, a) = {p1, p2, ..., pj};
o finally, rename the states [q0, q1, ..., qi].
Theorem 2: If L is accepted by an NFAε, then L is also accepted by an NFA without ε-transitions.
Algorithm: suppose we have an NFAε A = (Q, Σ, δ, q0, F) accepting L. We build the NFA A' = (Q, Σ, δ', q0, F') as follows:
o F' = F ∪ {q0} if the ε-closure of q0 contains at least one state in F; otherwise F' = F;
o δ'(q, a) = δ*(q, a).
Corollary: If L is accepted by an NFAε, then there exists a DFA accepting L.
Algorithm for building the corresponding DFA:
1. Compute T = ε-closure(q0); T is initially unmarked;
2. Add T to the state set Q' (of the DFA);
3. while (there is an unmarked state T in Q') {
3.1. mark T;
3.2. foreach (input symbol a) {
U := ε-closure(δ(T, a));
if (U is not yet in the state set Q') {
add U to Q';
U is initially unmarked;
}
δ'[T, a] := U;
}
}
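The subset construction above can be written compactly in Python. This sketch is my own illustration (the dictionary-based NFA representation and the example automaton are assumptions, not material from the lecture); it handles ε-transitions via the ε-closure, exactly as in the pseudocode:

def eclose(states, delta):
    # Epsilon-closure: all states reachable through epsilon ('') transitions.
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in delta.get((q, ''), set()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def nfa_to_dfa(delta, q0, finals, alphabet):
    # delta: dict mapping (state, symbol) -> set of states; '' is epsilon.
    start = eclose({q0}, delta)
    dfa_delta, unmarked, states = {}, [start], {start}
    while unmarked:
        T = unmarked.pop()
        for a in alphabet:
            moved = set().union(*(delta.get((q, a), set()) for q in T)) if T else set()
            U = eclose(moved, delta)
            dfa_delta[(T, a)] = U
            if U not in states:
                states.add(U)
                unmarked.append(U)
    dfa_finals = {S for S in states if S & finals}
    return dfa_delta, start, dfa_finals

# Example: a small NFA over {a, b} with states 0, 1, 2 and final state 2.
delta = {(0, 'a'): {0, 1}, (0, 'b'): {0}, (1, 'b'): {2}}
print(nfa_to_dfa(delta, 0, {2}, {'a', 'b'})[2])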

Theorem 3: If r is a regular expression, then there exists an NFAε accepting L(r).
(Proof: in the lecture, using Thompson's construction.)
Theorem 4: If L is accepted by a DFA, then L is denoted by some regular expression.
Proof:
Let L be accepted by the DFA A = ({q1, q2, ..., qn}, Σ, δ, q1, F).
Define R^k_ij = {x | δ(qi, x) = qj, and if δ(qi, y) = ql for a proper prefix y of x (y ≠ x) then l ≤ k}; in other words, R^k_ij is the set of all strings that take the automaton from state i to state j without passing through any state numbered higher than k.
Recursive definition of R^k_ij:
R^k_ij = R^(k-1)_ik (R^(k-1)_kk)* R^(k-1)_kj ∪ R^(k-1)_ij
We prove the following lemma (by induction on k): for every R^k_ij there exists a regular expression denoting R^k_ij.
Base case k = 0: R^0_ij is a finite set of one-symbol strings or ε.
Suppose the lemma holds for k-1, i.e., there exist regular expressions r^(k-1)_lm such that L(r^(k-1)_lm) = R^(k-1)_lm.
Then for R^k_ij we can choose the regular expression
r^k_ij = (r^(k-1)_ik)(r^(k-1)_kk)*(r^(k-1)_kj) + r^(k-1)_ij,
and the lemma is proved.
Remark: L is the union of the sets R^n_1j over the final states. Hence L can be denoted by the regular expression
r = r^n_1j1 + r^n_1j2 + ... + r^n_1jp, where F = {qj1, qj2, ..., qjp}.

4. Finite-state transducers (finite automata with output)
Applications of finite-state transducers:
Segmenting a text into sentences, and a sentence into words;
Analyzing a word into its morphemes (for inflectional languages);
POS tagging.

Implementing a parser for context-free grammars: recursive transition networks. Each rule is represented by a transducer; the grammar as a whole is a transducer whose input is the sentence to be parsed and whose output is the bracketed syntactic analysis of that sentence.

5. Context-free grammars and parsing
Parsing algorithms:
The general principle
The strategies:
Bottom-up parsing (recognition): the parsing process is guided by the input sentence (the productions are applied from right to left);
Top-down parsing (prediction): the parsing process is guided by hypotheses (the productions are applied from left to right);
Combinations of the two strategies.
To avoid repeating computations, a table (chart) is used to memorize intermediate results.
Limitations of context-free grammars:
The main limitations:
The resulting tree does not express the semantic constraints in the parsed sentence.
The diversity of syntactic structures requires a very large number of grammar rules, with no way to represent the relations between them. Chomsky proposed transformational grammar, but that formalism was also heavily criticized on linguistic grounds; moreover, its computational complexity goes back up to that of type-0 grammars.
As a result, many new grammar formalisms have appeared.

Parsing (Syntactic Structure)
INPUT:
Boeing is located in Seattle.
OUTPUT: [parse tree figure]

Syntactic Formalisms
Work in formal syntax goes back to Chomsky's PhD thesis in the 1950s.
Examples of current formalisms: minimalism, lexical functional grammar (LFG), head-driven phrase-structure grammar (HPSG), tree adjoining grammars (TAG), categorial grammars.
Data for Parsing Experiments
Penn WSJ Treebank = 50,000 sentences with associated trees
Usual set-up: 40,000 training sentences, 2,400 test sentences
An example tree: [treebank parse tree figure]
The Information Conveyed by Parse Trees
(1) Part of speech for each word
(N = noun, V = verb, DT = determiner)
(2) Phrases

Noun Phrases (NP): "the burglar", "the apartment"
Verb Phrases (VP): "robbed the apartment"
Sentences (S): "the burglar robbed the apartment"
(3) Useful Relationships
=> "the burglar" is the subject of "robbed"
An Example Application: Machine Translation
English word order is subject - verb - object
Japanese word order is subject - object - verb
English: IBM bought Lotus
Japanese: IBM Lotus bought
English: Sources said that IBM bought Lotus yesterday
Japanese: Sources yesterday IBM Lotus bought that said

Context-Free Grammars
A Context-Free Grammar for English
N = {S, NP, VP, PP, DT, Vi, Vt, NN, IN}
S = S
Σ = {sleeps, saw, man, woman, telescope, the, with, in}
Note: S = sentence, VP = verb phrase, NP = noun phrase, PP = prepositional phrase, DT = determiner, Vi = intransitive verb, Vt = transitive verb, NN = noun, IN = preposition
Left-Most Derivations

For example: [S], [NP VP], [D N VP], [the N VP], [the man VP], [the man Vi], [the man sleeps]
Representation of a derivation as a tree: [derivation tree figure]
An Example
DERIVATION            RULE USED
S                     S -> NP VP
NP VP                 NP -> DT N
DT N VP               DT -> the
the N VP              N -> dog
the dog VP            VP -> VB
the dog VB            VB -> laughs
the dog laughs
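To see how a leftmost derivation expands the leftmost nonterminal at every step, here is a small sketch (my own illustration; the function name and dictionary encoding are not from the lecture) that prints the derivation of "the dog laughs" from the toy grammar in the table:

# Toy CFG from the example above; each nonterminal maps to the production used.
rules = {
    "S":  ["NP", "VP"],
    "NP": ["DT", "N"],
    "DT": ["the"],
    "N":  ["dog"],
    "VP": ["VB"],
    "VB": ["laughs"],
}

def leftmost_derivation(start="S"):
    sent = [start]
    steps = [sent[:]]
    while any(sym in rules for sym in sent):
        # Expand the leftmost nonterminal.
        i = next(i for i, sym in enumerate(sent) if sym in rules)
        sent = sent[:i] + rules[sent[i]] + sent[i + 1:]
        steps.append(sent[:])
    return steps

for step in leftmost_derivation():
    print(" ".join(step))
# S / NP VP / DT N VP / the N VP / the dog VP / the dog VB / the dog laughs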

Properties of CFGs
A CFG defines a set of possible derivations
A string s is in the language defined by the CFG if there is at least one derivation that yields s
Each string in the language generated by the CFG may have more than one derivation ("ambiguity")
An Example of Ambiguity

The Problem with Parsing: Ambiguity
INPUT: She announced a program to promote safety in trucks and vans
POSSIBLE OUTPUTS: [candidate parse trees, figure omitted]

A Brief Overview of English Syntax
Parts of Speech (tags from the Brown corpus):
Nouns
o NN = singular noun, e.g., man, dog, park
o NNS = plural noun, e.g., telescopes, houses, buildings
o NNP = proper noun, e.g., Smith, Gates, IBM
Determiners
o DT = determiner, e.g., the, a, some, every
Adjectives
o JJ = adjective, e.g., red, green, large, idealistic

- Discussion topics
1. Similarities and differences between programming languages and natural languages.
2. Top-down and bottom-up parsing strategies.
- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 4.

Lecture 05: Statistics-based language processing
Chapter 6, sections:
Periods: 1-3. Week: 6
- Objectives and requirements
Objectives: study statistical methods as applied to language processing.
Requirements: understand the methods, know how to apply them to build programs, and be able to write some simple applications.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. The tagging problem
2. Generative models, and the noisy-channel model, for supervised learning
3. Hidden Markov Model (HMM) taggers
a. Basic definitions
b. Parameter estimation
c. The Viterbi algorithm

Language analysis:
The approaches
Rule-based: build the system model from a set of linguistic rules.
Statistical: build the system model from a set of probabilities for the "events" that may occur.
Hybrid models combine the two approaches.
Corpus-based analysis now dominates.

Corpus annotation and knowledge discovery:
Annotation of word units
POS annotation
Annotation of phrases and sentence structure
Semantic annotation
Co-reference annotation
Bilingual annotation
...

Machine learning methods:
N-grams and Markov models
SVM (Support Vector Machines)
CRF (Conditional Random Fields)
Neural networks
Transformation-based learning: the Brill method
Classification using decision trees
...

N-grams:
An n-gram is a text segment of length n (n words).
N-gram information reveals certain characteristics of a language, but it does not capture the structure of the language.
Finding and using n-grams is fast and easy.
N-grams can be used in many NLP applications:
Predicting the next word of an utterance from the n-1 preceding words;
Useful in spell checking, language approximation, etc.
Example: [figure omitted]

The Markov model: a bigram model is also called a first-order Markov model.
This model is essentially a weighted finite-state automaton: the states are words, and each arc between two states carries a probability.
Trigrams: choosing n = 3 (trigrams) gives a better approximation. In general trigrams are short, which allows probabilities to be estimated relatively accurately from the observed data.
The larger n becomes, the less accurate the probability estimates are (because of the lack of data), and the memory cost also grows.
Training an n-gram model:
Probabilities are estimated from the training corpus (maximum likelihood estimation, MLE) by computing relative frequencies: [formula omitted]

Note that the training corpus must be chosen according to the data to which the resulting n-gram model will be applied.
Smoothing techniques: the problem when training an n-gram model is data sparseness: there may be n-grams whose estimated probability is 0.
Smoothing: turn the zero probabilities into non-zero ones, i.e., adjust the estimated probabilities to account for data that has not been observed.
Data sparseness:
Data sparseness: roughly 50% of the word types occur only once.
Zipf's law: the frequency of a word is inversely proportional to its rank in the frequency table.
Example: add-one smoothing. To estimate bigram probabilities, add 1 to every count in the numerator and add the number of word types in the corpus to the denominator (so that the probabilities still sum to 1).
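A minimal sketch of this add-one (Laplace) case for bigrams (my own illustration; uni and bi are count dictionaries as in the earlier sketches, and V is the number of word types):

def addone_bigram_prob(w1, w2, uni, bi, V):
    # P(w2 | w1) = (count(w1, w2) + 1) / (count(w1) + V)
    return (bi.get((w1, w2), 0) + 1) / (uni.get(w1, 0) + V)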

- Student preparation
Review the material on formal language theory, finite automata and regular expressions.
- Exercises
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 5.
2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 4.

Lecture 06: The tagging problem and hidden Markov models
Chapter 5, sections:
Periods: 1-3. Weeks: 7, 8
- Objectives and requirements
Objectives: study the part-of-speech tagging problem and hidden Markov models.
Requirements: understand the role of POS tagging in language processing, the methods and models for POS tagging, and hidden Markov models.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:

Markov Processes
Consider a sequence of random variables X1, X2, ..., Xn.
Each random variable can take any value in a finite set V.
For now we assume the length n is fixed (e.g., n = 100).
Our goal: model
P(X1 = x1, X2 = x2, ..., Xn = xn)

First-Order Markov Processes:
P(X1 = x1, X2 = x2, ..., Xn = xn)
= P(X1 = x1) * prod_{i=2..n} P(Xi = xi | X1 = x1, ..., X_{i-1} = x_{i-1})
= P(X1 = x1) * prod_{i=2..n} P(Xi = xi | X_{i-1} = x_{i-1})
The first-order Markov assumption: for any i in {2 ... n}, for any x1 ... xi,
P(Xi = xi | X1 = x1, ..., X_{i-1} = x_{i-1}) = P(Xi = xi | X_{i-1} = x_{i-1})

Second-Order Markov Processes:
P(X1 = x1, X2 = x2, ..., Xn = xn)
= P(X1 = x1) * P(X2 = x2 | X1 = x1) * prod_{i=3..n} P(Xi = xi | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
= prod_{i=1..n} P(Xi = xi | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
(For convenience we assume x0 = x_{-1} = *, where * is a special "start" symbol.)

Modeling Variable Length Sequences
We would like the length of the sequence, n, to also be a random variable.
A simple solution: always define Xn = STOP, where STOP is a special symbol.
Then use a Markov process as before:
P(X1 = x1, X2 = x2, ..., Xn = xn) = prod_{i=1..n} P(Xi = xi | X_{i-2} = x_{i-2}, X_{i-1} = x_{i-1})
(For convenience we assume x0 = x_{-1} = *, where * is a special "start" symbol.)

Trigram Language Models:
A trigram language model consists of:
1. A finite set V
2. A parameter q(w | u, v) for each trigram u, v, w such that w in V ∪ {STOP}, and u, v in V ∪ {*}
For any sentence x1 ... xn, where xi in V for i = 1 ... (n-1) and xn = STOP, the probability of the sentence under the trigram language model is
p(x1 ... xn) = prod_{i=1..n} q(xi | x_{i-2}, x_{i-1})
where we define x0 = x_{-1} = *.

An Example
For the sentence "the dog barks STOP" we would have
p(the dog barks STOP) = q(the | *, *) x q(dog | *, the) x q(barks | the, dog) x q(STOP | dog, barks)

The Trigram Estimation Problem
Remaining estimation problem:
q(wi | w_{i-2}, w_{i-1})
For example: q(laughs | the, dog)
A natural estimate (the "maximum likelihood estimate"):
q(wi | w_{i-2}, w_{i-1}) = Count(w_{i-2}, w_{i-1}, wi) / Count(w_{i-2}, w_{i-1})
q(laughs | the, dog) = Count(the, dog, laughs) / Count(the, dog)

Sparse Data Problems:
Say our vocabulary size is N = |V|; then there are N^3 parameters in the model.
e.g., N = 20,000 => 20,000^3 = 8 x 10^12 parameters.

Evaluating a Language Model: Perplexity
We have some test data, m sentences:
s1, s2, s3, ..., sm
We could look at the probability of the test data under our model, prod_{i=1..m} p(si). Or, more conveniently, the log probability:
log prod_{i=1..m} p(si) = sum_{i=1..m} log p(si)
In fact the usual evaluation measure is perplexity:
Perplexity = 2^(-l), where l = (1/M) * sum_{i=1..m} log2 p(si)
and M is the total number of words in the test data.

Some Intuition about Perplexity
Say we have a vocabulary V, with N = |V| + 1, and a model that predicts
q(w | u, v) = 1/N
for all w in V ∪ {STOP} and all u, v in V ∪ {*}.
It is easy to calculate the perplexity in this case:
Perplexity = 2^(-l), where l = log2(1/N)
=> Perplexity = N
Perplexity is a measure of the "effective branching factor".

Typical Values of Perplexity:
Results from Goodman ("A bit of progress in language modeling"), where |V| = 50,000:
A trigram model, p(x1 ... xn) = prod_{i=1..n} q(xi | x_{i-2}, x_{i-1}): Perplexity = 74
A bigram model, p(x1 ... xn) = prod_{i=1..n} q(xi | x_{i-1}): Perplexity = 137
A unigram model, p(x1 ... xn) = prod_{i=1..n} q(xi): Perplexity = 955
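A small sketch of how perplexity would be computed for a test corpus (my own illustration; sent_logprob2 stands for any function returning log2 p(s) under the model being evaluated):

import math

def perplexity(test_sentences, sent_logprob2, total_words):
    # l = (1/M) * sum_i log2 p(s_i);  perplexity = 2^(-l)
    l = sum(sent_logprob2(s) for s in test_sentences) / total_words
    return 2 ** (-l)

# Sanity check: a uniform model over N outcomes has perplexity N.
N = 1000
print(perplexity([["w"] * 10], lambda s: len(s) * math.log2(1.0 / N), 10))  # ~1000.0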

Some History:
Shannon conducted experiments on the entropy of English, i.e., how good are people at the perplexity game?
C. Shannon. Prediction and entropy of printed English. Bell Systems Technical Journal, 30:50-64, 1951.
Chomsky (in Syntactic Structures (1957)):
"Second, the notion 'grammatical' cannot be identified with 'meaningful' or 'significant' in any semantic sense. Sentences (1) and (2) are equally nonsensical, but any speaker of English will recognize that only the former is grammatical.
(1) Colorless green ideas sleep furiously.
(2) Furiously sleep ideas green colorless.
Third, the notion 'grammatical in English' cannot be identified in any way with the notion 'high order of statistical approximation to English'. It is fair to assume that neither sentence (1) nor (2) (nor indeed any part of these sentences) has ever occurred in an English discourse. Hence, in any statistical model for grammaticalness, these sentences will be ruled out on identical grounds as equally 'remote' from English. Yet (1), though nonsensical, is grammatical, while (2) is not."

Sparse Data Problems
A natural estimate (the "maximum likelihood estimate"):
q(wi | w_{i-2}, w_{i-1}) = Count(w_{i-2}, w_{i-1}, wi) / Count(w_{i-2}, w_{i-1})
q(laughs | the, dog) = Count(the, dog, laughs) / Count(the, dog)
Say our vocabulary size is N = |V|; then there are N^3 parameters in the model.
e.g., N = 20,000 => 20,000^3 = 8 x 10^12 parameters.

The Bias-Variance Trade-Off
Trigram maximum-likelihood estimate:
q_ML(wi | w_{i-2}, w_{i-1}) = Count(w_{i-2}, w_{i-1}, wi) / Count(w_{i-2}, w_{i-1})
Bigram maximum-likelihood estimate:

q_ML(wi | w_{i-1}) = Count(w_{i-1}, wi) / Count(w_{i-1})
Unigram maximum-likelihood estimate:
q_ML(wi) = Count(wi) / Count()

Linear Interpolation:
Take our estimate q(wi | w_{i-2}, w_{i-1}) to be:
q(wi | w_{i-2}, w_{i-1}) = λ1 * q_ML(wi | w_{i-2}, w_{i-1}) + λ2 * q_ML(wi | w_{i-1}) + λ3 * q_ML(wi)
where λ1 + λ2 + λ3 = 1 and λi >= 0 for all i.
Our estimate correctly defines a distribution (define V' = V ∪ {STOP}):
sum_{w in V'} q(w | u, v)
= sum_{w in V'} [λ1 * q_ML(w | u, v) + λ2 * q_ML(w | v) + λ3 * q_ML(w)]
= λ1 * sum_w q_ML(w | u, v) + λ2 * sum_w q_ML(w | v) + λ3 * sum_w q_ML(w)
= λ1 + λ2 + λ3
= 1
(We can also show that q(w | u, v) >= 0 for all w in V'.)
How to estimate the λ values?
Hold out part of the training set as "validation" data.
Define c'(w1, w2, w3) to be the number of times the trigram (w1, w2, w3) is seen in the validation set.
Choose λ1, λ2, λ3 to maximize
L(λ1, λ2, λ3) = sum_{w1,w2,w3} c'(w1, w2, w3) * log q(w3 | w1, w2)
such that λ1 + λ2 + λ3 = 1 and λi >= 0 for all i, and where
q(wi | w_{i-2}, w_{i-1}) = λ1 * q_ML(wi | w_{i-2}, w_{i-1}) + λ2 * q_ML(wi | w_{i-1}) + λ3 * q_ML(wi)
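A minimal sketch of interpolated estimation with a crude grid search for the λ values on validation counts (my own illustration; a real implementation would use EM or a finer search, and q3, q2, q1 and validation_counts are assumed stand-ins for the maximum-likelihood estimates and held-out trigram counts defined above):

import itertools
import math

def interp_q(w, u, v, q3, q2, q1, lambdas):
    # q(w | u, v) = l1*q_ML(w|u,v) + l2*q_ML(w|v) + l3*q_ML(w)
    l1, l2, l3 = lambdas
    return l1 * q3(w, u, v) + l2 * q2(w, v) + l3 * q1(w)

def choose_lambdas(validation_counts, q3, q2, q1, step=0.1):
    # validation_counts: dict mapping (u, v, w) -> count in the held-out data.
    # Maximize sum of count * log q(w | u, v) over a coarse grid of lambdas.
    best, best_ll = None, float("-inf")
    grid = [i * step for i in range(int(1 / step) + 1)]
    for l1, l2 in itertools.product(grid, repeat=2):
        if l1 + l2 > 1:
            continue
        lams = (l1, l2, 1 - l1 - l2)
        ll = 0.0
        for (u, v, w), c in validation_counts.items():
            p = interp_q(w, u, v, q3, q2, q1, lams)
            if p <= 0:
                ll = float("-inf")
                break
            ll += c * math.log(p)
        if ll > best_ll:
            best, best_ll = lams, ll
    return best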

Discounting Methods

Summary
Three steps in deriving the language model probabilities:
1. Expand p(w1 ... wn) using the chain rule.
2. Make Markov independence assumptions:
p(wi | w1, w2, ..., w_{i-2}, w_{i-1}) = p(wi | w_{i-2}, w_{i-1})
3. Smooth the estimates using low-order counts.
Other methods used to improve language models:
o "Topic" or "long-range" features.
o Syntactic models.
It's generally hard to improve on trigram models though!!

Tagging Problems, and Hidden Markov Models
1. The tagging problem
2. Generative models, and the noisy-channel model, for supervised learning
3. Hidden Markov Model (HMM) taggers
a. Basic definitions
b. Parameter estimation
c. The Viterbi algorithm

Part-of-Speech Tagging
INPUT:
Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
OUTPUT:
Profits/N soared/V at/P Boeing/N Co./N ,/, easily/ADV topping/V forecasts/N on/P Wall/N Street/N ,/, as/P their/POSS CEO/N Alan/N Mulally/N announced/V first/ADJ quarter/N results/N ./.
N = Noun
V = Verb
P = Preposition
Adv = Adverb
Adj = Adjective

Named Entity Recognition
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
OUTPUT: Profits soared at [Company Boeing Co.], easily topping forecasts on [Location Wall Street], as their CEO [Person Alan Mulally] announced first quarter results.

Named Entity Extraction as Tagging
INPUT: Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
OUTPUT:
Profits/NA soared/NA at/NA Boeing/SC Co./CC ,/NA easily/NA topping/NA forecasts/NA on/NA Wall/SL Street/CL ,/NA as/NA their/NA CEO/NA Alan/SP Mulally/CP announced/NA first/NA quarter/NA results/NA ./NA
NA = No entity
SC = Start Company
CC = Continue Company
SL = Start Location
CL = Continue Location

Our Goal
Training set:
1 Pierre/NNP Vinken/NNP ,/, 61/CD years/NNS old/JJ ,/, will/MD join/VB the/DT board/NN as/IN a/DT nonexecutive/JJ director/NN Nov./NNP 29/CD ./.
2 Mr./NNP Vinken/NNP is/VBZ chairman/NN of/IN Elsevier/NNP N.V./NNP ,/, the/DT Dutch/NNP publishing/VBG group/NN ./.
3 Rudolph/NNP Agnew/NNP ,/, 55/CD years/NNS old/JJ and/CC chairman/NN of/IN Consolidated/NNP Gold/NNP Fields/NNP PLC/NNP ,/, was/VBD named/VBN a/DT nonexecutive/JJ director/NN of/IN this/DT British/JJ industrial/JJ conglomerate/NN ./.
...
38,219 It/PRP is/VBZ also/RB pulling/VBG 20/CD people/NNS out/IN of/IN Puerto/NNP Rico/NNP ,/, who/WP were/VBD helping/VBG Huricane/NNP Hugo/NNP victims/NNS ,/, and/CC sending/VBG them/PRP to/TO San/NNP Francisco/NNP instead/RB ./.
From the training set, induce a function/algorithm that maps new sentences to their tag sequences.

Two Types of Constraints
Influential/JJ members/NNS of/IN the/DT House/NNP Ways/NNP and/CC Means/NNP Committee/NNP introduced/VBD legislation/NN that/WDT would/MD restrict/VB how/WRB the/DT new/JJ savings-and-loan/NN bailout/NN agency/NN can/MD raise/VB capital/NN ./.
"Local": e.g., "can" is more likely to be a modal verb MD rather than a noun NN

"Contextual": e.g., a noun is much more likely than a verb to follow a determiner
Sometimes these preferences are in conflict:
The trash can is in the garage

Supervised Learning Problems
We have training examples (x(i), y(i)) for i = 1 ... m. Each x(i) is an input, each y(i) is a label.
The task is to learn a function f mapping inputs x to labels f(x).
Conditional models:
o Learn a distribution p(y | x) from training examples
o For any test input x, define f(x) = argmax_y p(y | x)

Generative Models:
We have training examples (x(i), y(i)) for i = 1 ... m. The task is to learn a function f mapping inputs x to labels f(x).
Generative models:
o Learn a distribution p(x, y) from training examples
o Often we have p(x, y) = p(y) p(x | y)
Note: we then have
p(y | x) = p(y) p(x | y) / p(x)
where p(x) = sum_y p(y) p(x | y)

Decoding with Generative Models
We have training examples (x(i), y(i)) for i = 1 ... m. The task is to learn a function f mapping inputs x to labels f(x).
Generative models:
o Learn a distribution p(x, y) from training examples
o Often we have p(x, y) = p(y) p(x | y)
Output from the model:
f(x) = argmax_y p(y | x)
= argmax_y p(y) p(x | y) / p(x)
= argmax_y p(y) p(x | y)

Hidden Markov Models

We have an input sentence x = x1, x2, ..., xn (xi is the i-th word in the sentence).
We have a tag sequence y = y1, y2, ..., yn (yi is the i-th tag in the sentence).
We'll use an HMM to define
p(x1, x2, ..., xn, y1, y2, ..., yn)
for any sentence x1, x2, ..., xn and tag sequence y1, y2, ..., yn of the same length.
Then the most likely tag sequence for x is
argmax_{y1 ... yn} p(x1, x2, ..., xn, y1, y2, ..., yn)

Trigram Hidden Markov Models (Trigram HMMs)
For any sentence x1 ... xn, where xi in V for i = 1 ... n, and any tag sequence y1 ... y_{n+1}, where yi in S for i = 1 ... n and y_{n+1} = STOP, the joint probability of the sentence and tag sequence is:
p(x1 ... xn, y1 ... y_{n+1}) = prod_{i=1..n+1} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..n} e(xi | yi)
where we have assumed that y0 = y_{-1} = *.
Parameters of the model:
q(s | u, v) for any s in S ∪ {STOP}, u, v in S ∪ {*}
e(x | s) for any s in S, x in V

An Example:
If we have n = 3, x1 ... x3 equal to the sentence "the dog laughs", and y1 ... y4 equal to the tag sequence D N V STOP, then
p(x1 ... xn, y1 ... y_{n+1})
= q(D | *, *) x q(N | *, D) x q(V | D, N) x q(STOP | N, V) x e(the | D) x e(dog | N) x e(laughs | V)
STOP is a special tag that terminates the sequence.
We take y0 = y_{-1} = *, where * is a special padding symbol.

Why the Name?
p(x1 ... xn, y1 ... y_{n+1}) = prod_{i=1..n+1} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..n} e(xi | yi):
the first factor is a (second-order) Markov chain over the tag sequence, which stays hidden, and the second factor generates each observed word from its tag.
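A tiny numerical sketch of this joint probability (my own illustration with made-up parameter values, only to show how the q and e factors multiply):

q = {("D", "*", "*"): 0.8, ("N", "*", "D"): 0.9, ("V", "D", "N"): 0.7, ("STOP", "N", "V"): 0.6}
e = {("the", "D"): 0.6, ("dog", "N"): 0.1, ("laughs", "V"): 0.05}

def joint_prob(words, tags):
    # p(x, y) = prod q(y_i | y_{i-2}, y_{i-1}) * prod e(x_i | y_i), with y_0 = y_{-1} = *
    padded = ["*", "*"] + tags                      # tags includes the final STOP
    p = 1.0
    for i in range(2, len(padded)):
        p *= q[(padded[i], padded[i - 2], padded[i - 1])]
    for w, t in zip(words, tags):
        p *= e[(w, t)]
    return p

print(joint_prob(["the", "dog", "laughs"], ["D", "N", "V", "STOP"]))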

Smoothed Estimation:
q(s | u, v) = λ1 * Count(u, v, s)/Count(u, v) + λ2 * Count(v, s)/Count(v) + λ3 * Count(s)/Count()
with λ1 + λ2 + λ3 = 1 and, for all i, λi >= 0.

Dealing with Low-Frequency Words: An Example
Profits soared at Boeing Co., easily topping forecasts on Wall Street, as their CEO Alan Mulally announced first quarter results.
A common method is as follows:
Step 1: Split the vocabulary into two sets
Frequent words = words occurring >= 5 times in training
Low-frequency words = all other words
Step 2: Map low-frequency words into a small, finite set, depending on prefixes, suffixes etc.

The Viterbi Algorithm
Problem: for an input x1 ... xn, find
argmax_{y1 ... y_{n+1}} p(x1 ... xn, y1 ... y_{n+1})
where the argmax is taken over all sequences y1 ... y_{n+1} such that yi in S for i = 1 ... n and y_{n+1} = STOP.
We assume that p again takes the form
p(x1 ... xn, y1 ... y_{n+1}) = prod_{i=1..n+1} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..n} e(xi | yi)
Recall that we have assumed in this definition that y0 = y_{-1} = * and y_{n+1} = STOP.

Brute Force Search is Hopelessly Inefficient

The Viterbi Algorithm
Define n to be the length of the sentence.
Define S_k for k = -1 ... n to be the set of possible tags at position k:
S_{-1} = S_0 = {*}
S_k = S for k = 1 ... n
Define:

r(y_{-1}, y_0, y_1, ..., y_k) = prod_{i=1..k} q(yi | y_{i-2}, y_{i-1}) * prod_{i=1..k} e(xi | yi)
Define a dynamic programming table:
π(k, u, v) = maximum probability of a tag sequence ending in tags u, v at position k,
that is,
π(k, u, v) = max_{(y_{-1}, y_0, y_1, ..., y_k): y_{k-1} = u, y_k = v} r(y_{-1}, y_0, y_1, ..., y_k)

An Example: The man saw the dog with the telescope

A Recursive Definition:
Base case: π(0, *, *) = 1
Recursive definition:
For any k in {1 ... n}, for any u in S_{k-1} and v in S_k:
π(k, u, v) = max_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )

The Viterbi Algorithm
Input: a sentence x1 ... xn, parameters q(s | u, v) and e(x | s).
Initialization: set π(0, *, *) = 1.
Definition: S_{-1} = S_0 = {*}, S_k = S for k = 1 ... n.
Algorithm:
For k = 1 ... n
  For u in S_{k-1} and v in S_k
    π(k, u, v) = max_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )
Return: max_{u in S_{n-1}, v in S_n} ( π(n, u, v) * q(STOP | u, v) )

The Viterbi Algorithm with Backpointers
Input: a sentence x1 ... xn, parameters q(s | u, v) and e(x | s).
Initialization: set π(0, *, *) = 1.
Definition: S_{-1} = S_0 = {*}, S_k = S for k = 1 ... n.
Algorithm:
For k = 1 ... n
  For u in S_{k-1} and v in S_k
    π(k, u, v) = max_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )
    bp(k, u, v) = argmax_{w in S_{k-2}} ( π(k-1, w, u) * q(v | w, u) * e(xk | v) )
Set (y_{n-1}, y_n) = argmax_{(u, v)} ( π(n, u, v) * q(STOP | u, v) )
For k = (n-2) ... 1: y_k = bp(k+2, y_{k+1}, y_{k+2})
Return the tag sequence y1 ... yn
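A compact Python sketch of the trigram-HMM Viterbi algorithm with backpointers (added for illustration; q and e are dictionaries of model parameters keyed as in the earlier joint-probability sketch, missing entries are treated as probability 0, and the toy values below are invented):

def viterbi(words, tagset, q, e):
    n = len(words)
    S = lambda k: {"*"} if k <= 0 else tagset          # S_{-1} = S_0 = {*}
    pi = {(0, "*", "*"): 1.0}
    bp = {}
    for k in range(1, n + 1):
        for u in S(k - 1):
            for v in S(k):
                best_w, best_p = None, 0.0
                for w in S(k - 2):
                    p = (pi.get((k - 1, w, u), 0.0)
                         * q.get((v, w, u), 0.0)
                         * e.get((words[k - 1], v), 0.0))
                    if p > best_p:
                        best_w, best_p = w, p
                pi[(k, u, v)] = best_p
                bp[(k, u, v)] = best_w
    # Best final tag pair, including the STOP transition.
    (u, v), _ = max((((u, v), pi.get((n, u, v), 0.0) * q.get(("STOP", u, v), 0.0))
                     for u in S(n - 1) for v in S(n)), key=lambda t: t[1])
    tags = [None] * (n + 1)
    tags[n - 1], tags[n] = u, v
    for k in range(n - 2, 0, -1):
        tags[k] = bp[(k + 2, tags[k + 1], tags[k + 2])]
    return tags[1:]

q = {("D", "*", "*"): 0.8, ("N", "*", "D"): 0.9, ("V", "D", "N"): 0.7, ("STOP", "N", "V"): 0.6}
e = {("the", "D"): 0.6, ("dog", "N"): 0.1, ("laughs", "V"): 0.05}
print(viterbi(["the", "dog", "laughs"], {"D", "N", "V"}, q, e))  # ['D', 'N', 'V']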

The Viterbi Algorithm: Running Time
O(n|S|^3) time to calculate q(s | u, v) * e(xk | s) for all k, s, u, v.
n|S|^2 entries in π to be filled in.
O(|S|) time to fill in one entry.
O(n|S|^3) time in total.

Pros and Cons
Hidden Markov model taggers are very simple to train (we just need to compile counts from the training corpus).
They perform relatively well (over 90% performance on named entity recognition).
The main difficulty is modelling
e(word | tag)
which can be very difficult if the "words" are complex.

- Discussion topics
Vietnamese corpora: the current state of affairs and approaches to building them.
- Student preparation
Study existing Vietnamese corpora and write a tagging program that uses a hidden Markov model.
- Exercises
- References
1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 6.
2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 4.
3. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 3.

Lecture 07: Machine translation
Chapter I, sections:
Periods: 1-3. Weeks: 9, 10
- Objectives and requirements
Objectives: study machine translation, its basic concepts and the most common techniques.
Requirements: understand the architecture and content of machine translation, its stages, and a number of basic techniques.
- Teaching format: lectures, discussion, self-study, independent research
- Time: lecturing: 2 periods; in-class discussion and exercises: 1 period; student self-study: 6 periods.
- Location: lecture hall assigned by P2.
- Main content:
1. Challenges in machine translation
2. Classical machine translation
3. A brief introduction to statistical MT
4. The IBM translation models
a. IBM Model 1
b. IBM Model 2
c. EM training of Models 1 and 2
5. Phrase-based translation
6. Decoding with phrase-based translation models

1. Challenges in machine translation
Challenges: Lexical Ambiguity

(Example from Dorr et al., 1999)
Example 1:
book the flight -> đặt chỗ
read the book -> đọc sách
Example 2:
the box was in the pen
the pen was on the table
Example 3:
kill a man -> giết
kill a process -> hủy

Challenges: Differing Word Orders
English word order is subject - verb - object
Japanese word order is subject - object - verb
English: IBM bought Lotus
Japanese: IBM Lotus bought
English: Sources said that IBM bought Lotus yesterday
Japanese: Sources yesterday IBM Lotus bought that said
Syntactic structure is not preserved across translations!

2. Classical machine translation
2.1. Direct Machine Translation
Translation is word-by-word.
Very little analysis of the source text (e.g., no syntactic or semantic analysis).
Relies on a large bilingual dictionary. For each word in the source language, the dictionary specifies a set of rules for translating that word.
After the words are translated, simple reordering rules are applied (e.g., move adjectives after nouns when translating from English to French).

An Example of a Set of Direct Translation Rules
(From Jurafsky and Martin, edition 2, chapter 25. Originally from a system from Panov 1960.)
Rules for translating "much" or "many" into Russian:
if preceding word is "how" return skol'ko
else if preceding word is "as" return stol'ko zhe
else if word is "much"
    if preceding word is "very" return nil
    else if following word is a noun return mnogo
else (word is "many")
    if preceding word is a preposition and following word is a noun return mnogii
    else return mnogo
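Rules of this kind are easy to express directly in code. The following sketch (my own illustration of the rule style, not code from the lecture or from Jurafsky and Martin) mirrors the decision list above; is_noun and is_preposition stand for whatever part-of-speech lookup is assumed to be available:

def translate_much_many(word, prev, nxt, is_noun, is_preposition):
    # Decision-list translation of 'much'/'many' into Russian (after Panov 1960).
    if prev == "how":
        return "skol'ko"
    if prev == "as":
        return "stol'ko zhe"
    if word == "much":
        if prev == "very":
            return None                      # 'very much': drop the word
        if nxt is not None and is_noun(nxt):
            return "mnogo"
    else:  # word is 'many'
        if is_preposition(prev) and nxt is not None and is_noun(nxt):
            return "mnogii"
    return "mnogo"

print(translate_much_many("many", "in", "cases", lambda w: True, lambda w: True))  # mnogii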

Some Problems with Direct Machine Translation
Lack of any analysis of the source language causes several problems, for example:
Difficult or impossible to capture long-range reorderings
English: Sources said that IBM bought Lotus yesterday
Japanese: Sources yesterday IBM Lotus bought that said
Words are translated without disambiguation of their syntactic role; e.g., "that" can be a complementizer or a determiner, and will often be translated differently for these two cases:
They said that ...
They like that ice-cream

2.2. Transfer-Based Approaches
Three phases in translation:
Analysis: analyze the source-language sentence; for example, build a syntactic analysis of the source-language sentence.
Transfer: convert the source-language parse tree to a target-language parse tree.
Generation: convert the target-language parse tree to an output sentence.

Transfer-Based Approaches
The "parse trees" involved can vary from shallow analyses to much deeper analyses (even semantic representations).
The transfer rules might look quite similar to the rules for direct translation systems. But they can now operate on syntactic structures.
It's easier with these approaches to handle long-distance reorderings.
The Systran systems are a classic example of this approach.

  • 47

    Japanese: Sources yesterday IBM Lotus bought that said

    2.3. Interlingua-Based Translation

    Two phases in translation:

    Analysis: Analyze the source language sentence into a (language-

    independent) representation of its meaning.

    Generation: Convert the meaning representation into an output sentence.

One Advantage: If we want to build a translation system that translates between n languages, we need to develop only n analysis and generation systems. With a transfer-based system, we'd need to develop O(n²) sets of translation rules.

    Disadvantage: What would a language-independent representation look

    like?

    Interlingua-Based Translation

    How to represent different concepts in an interlingua?

    Different languages break down concepts in quite different ways:


    German has two words for wall: one for an internal wall, one for a

    wall that is outside

    Japanese has two words for brother: one for an elder brother, one for

    a younger brother

    Spanish has two words for leg: pierna for a human's leg, pata for an

    animal's leg, or the leg of a table

An interlingua might end up simply being an intersection of these

    different ways of breaking down concepts, but that doesn't seem very

    satisfactory...

    3. A Brief Introduction to Statistical MT

    Parallel corpora are available in several language pairs

    Basic idea: use a parallel corpus as a training set of translation examples

    Classic example: IBM work on French-English translation, using the

    Canadian Hansards. (1.7 million sentences of 30 words or less in length).

The idea goes back to Warren Weaver (1949), who suggested applying statistical and cryptanalytic techniques to translation.

    4. The IBM Translation Models

    4.1. IBM Model 1

    Alignments

How do we model p(f | e)?

The English sentence e has l words e_1 … e_l; the French sentence f has m words f_1 … f_m.

An alignment a identifies which English word each French word originated from.

Formally, an alignment a is {a_1, …, a_m}, where each a_i ∈ {0, …, l}.

There are (l + 1)^m possible alignments.

Example: l = 6, m = 7

e = And the program has been implemented

f = Le programme a été mis en application

One alignment is {2, 3, 4, 5, 6, 6, 6}

Another (bad!) alignment is {1, 1, 1, 1, 1, 1, 1}

Alignments in the IBM Models

We'll define models for p(a | e, m) and p(f | a, e, m), giving

p(f, a | e, m) = p(a | e, m) p(f | a, e, m)

Also

p(f | e, m) = Σ_a p(a | e, m) p(f | a, e, m)

    A By-Product: Most Likely Alignments

Suppose we have a model p(f, a | e, m) = p(a | e, m) p(f | a, e, m); we can then compute

p(a | f, e, m) = p(f, a | e, m) / Σ_{a'} p(f, a' | e, m)

for any alignment a.

For a given (f, e) pair, we can also compute the most likely alignment,

a* = arg max_a p(a | f, e, m)

    Nowadays, the original IBM models are rarely (if ever) used for

    translation, but they are used for recovering alignments.

    An Example Alignment

French: le conseil a rendu son avis , et nous devons à présent adopter un nouvel avis sur la base de la première position .

English: the council has stated its position , and now , on the basis of the first position , we again have to give our opinion .

Alignment:

the/le council/conseil has/à stated/rendu its/son position/avis ,/, and/et now/présent ,/NULL on/sur the/le basis/base of/de the/la first/première position/position ,/NULL we/nous again/NULL have/devons to/a give/adopter our/nouvel opinion/avis ./.

    IBM Model 1: Alignments

In IBM Model 1 all alignments a are equally likely:

p(a | e, m) = 1 / (l + 1)^m

This is a major simplifying assumption, but it gets things started...

    IBM Model 1: Translation Probabilities

Next step: come up with an estimate for p(f | a, e, m).

In Model 1, this is:

p(f | a, e, m) = Π_{i=1}^{m} t(f_i | e_{a_i})

e.g., l = 6, m = 7

e = And the program has been implemented

f = Le programme a été mis en application

a = {2, 3, 4, 5, 6, 6, 6}

p(f | a, e, 7) = t(Le | the) × t(programme | program) × t(a | has) × t(été | been) × t(mis | implemented) × t(en | implemented) × t(application | implemented)

    IBM Model 1: The Generative Process

To generate a French string f from an English string e:

Step 1: Pick an alignment a with probability 1 / (l + 1)^m

Step 2: Pick the French words with probability

p(f | a, e, m) = Π_{i=1}^{m} t(f_i | e_{a_i})

The final result:

p(f, a | e, m) = p(a | e, m) p(f | a, e, m) = [1 / (l + 1)^m] Π_{i=1}^{m} t(f_i | e_{a_i})
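A tiny Python sketch of this generative story for the running example; the t(f | e) values below are made-up toy numbers, not trained parameters:

    # Toy evaluation of p(f, a | e, m) = 1/(l+1)^m * prod_i t(f_i | e_{a_i}).
    # The t(f|e) entries are illustrative values only.
    t = {
        ("Le", "the"): 0.5, ("programme", "program"): 0.4, ("a", "has"): 0.3,
        ("été", "been"): 0.3, ("mis", "implemented"): 0.2,
        ("en", "implemented"): 0.1, ("application", "implemented"): 0.1,
    }

    def model1_joint_prob(f, e, a):
        l, m = len(e), len(f)
        e_ext = ["NULL"] + e                   # English position 0 is the NULL word
        prob = 1.0 / (l + 1) ** m              # uniform p(a | e, m)
        for i in range(m):
            prob *= t.get((f[i], e_ext[a[i]]), 1e-9)
        return prob

    e = "And the program has been implemented".split()
    f = "Le programme a été mis en application".split()
    a = [2, 3, 4, 5, 6, 6, 6]                  # the alignment used above
    print(model1_joint_prob(f, e, a))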

    An Example Lexical Entry

English      French       Probability
Position     position     0.756715
Position     situation    0.0547918
Position     mesure       0.0281663
Position     vue          0.0169303
Position     point        0.0124795
Position     attitude     0.0108907

... de la situation au niveau des négociations de l ' ompi ...
... of the current position in the wipo negotiations ...

nous ne sommes pas en mesure de décider , ...
we are not in a position to decide , ...

... le point de vue de la commission face à ce problème complexe .
... the commission 's position on this complex problem .

    4.2. IBM Model 2

Only difference: we now introduce alignment or distortion parameters

q(j | i, l, m) = the probability that alignment variable a_i takes the value j, conditioned on the lengths l and m of the English and French sentences

Define:

p(a | e, m) = Π_{i=1}^{m} q(a_i | i, l, m)

where a = {a_1, …, a_m}

Gives:

p(f, a | e, m) = Π_{i=1}^{m} q(a_i | i, l, m) t(f_i | e_{a_i})

An Example:

l = 6, m = 7

e = And the program has been implemented

f = Le programme a été mis en application

a = {2, 3, 4, 5, 6, 6, 6}

p(a | e, 7) = q(2 | 1, 6, 7) × q(3 | 2, 6, 7) × q(4 | 3, 6, 7) × q(5 | 4, 6, 7) × q(6 | 5, 6, 7) × q(6 | 6, 6, 7) × q(6 | 7, 6, 7)

p(f | a, e, 7) = t(Le | the) × t(programme | program) × t(a | has) × t(été | been) × t(mis | implemented) × t(en | implemented) × t(application | implemented)

    IBM Model 2: The Generative Process

To generate a French string f from an English string e:

Step 1: Pick an alignment a = {a_1, …, a_m} with probability

Π_{i=1}^{m} q(a_i | i, l, m)

Step 2: Pick the French words with probability

p(f | a, e, m) = Π_{i=1}^{m} t(f_i | e_{a_i})

The final result:

p(f, a | e, m) = p(a | e, m) p(f | a, e, m) = Π_{i=1}^{m} q(a_i | i, l, m) t(f_i | e_{a_i})

    Recovering Alignments

If we have parameters q and t, we can easily recover the most likely alignment for any sentence pair.

Given a sentence pair e_1, e_2, …, e_l, f_1, f_2, …, f_m, define

a_i = arg max_{j ∈ {0…l}} q(j | i, l, m) t(f_i | e_j)   for i = 1 … m

e = And the program has been implemented

f = Le programme a été mis en application
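A minimal sketch of this argmax; q and t are assumed to be Python dictionaries holding already-trained Model 2 parameters, keyed as shown:

    # Recover a_i = argmax_{j in 0..l} q(j | i, l, m) * t(f_i | e_j) for i = 1..m.
    def most_likely_alignment(f, e, q, t):
        l, m = len(e), len(f)
        e_ext = ["NULL"] + e                   # English position 0 is NULL
        alignment = []
        for i in range(1, m + 1):              # French positions 1..m
            best_j, best_score = 0, -1.0
            for j in range(0, l + 1):          # candidate English positions 0..l
                score = q.get((j, i, l, m), 1e-9) * t.get((f[i - 1], e_ext[j]), 1e-9)
                if score > best_score:
                    best_j, best_score = j, score
            alignment.append(best_j)
        return alignment                       # one English position per French word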

    4.3. EM Training of Models 1 and 2

    The Parameter Estimation Problem

Input to the parameter estimation algorithm: (e^(k), f^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence.

Output: parameters t(f | e) and q(j | i, l, m).

A key challenge: we do not have alignments on our training examples, e.g.,

e^(100) = And the program has been implemented

f^(100) = Le programme a été mis en application

    Parameter Estimation if the Alignments are Observed

First, consider the case where alignments are observed in the training data, e.g.,

e^(100) = And the program has been implemented

f^(100) = Le programme a été mis en application

a^(100) = {2, 3, 4, 5, 6, 6, 6}

The training data is (e^(k), f^(k), a^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence, each a^(k) is an alignment.

Maximum-likelihood parameter estimates in this case are trivial:

t_ML(f | e) = c(e, f) / c(e),    q_ML(j | i, l, m) = c(j | i, l, m) / c(i, l, m)

Input: A training corpus (e^(k), f^(k), a^(k)) for k = 1 … n, where e^(k) = e_1^(k) … e_{l_k}^(k), f^(k) = f_1^(k) … f_{m_k}^(k), and a^(k) = a_1^(k) … a_{m_k}^(k).

Algorithm:

Set all counts c(…) = 0

For k = 1 … n

    For i = 1 … m_k, j = 0 … l_k

        c(e_j^(k), f_i^(k)) ← c(e_j^(k), f_i^(k)) + δ(k, i, j)

        c(e_j^(k)) ← c(e_j^(k)) + δ(k, i, j)

        c(j | i, l, m) ← c(j | i, l, m) + δ(k, i, j)

        c(i, l, m) ← c(i, l, m) + δ(k, i, j)

where δ(k, i, j) = 1 if a_i^(k) = j, 0 otherwise

Output: t_ML(f | e) = c(e, f) / c(e),    q_ML(j | i, l, m) = c(j | i, l, m) / c(i, l, m)
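A compact Python sketch of this counting procedure, assuming the corpus is given as a list of (e, f, a) triples with word lists e, f and an alignment list a (a[i-1] ∈ {0, …, l}):

    from collections import defaultdict

    def estimate_from_observed_alignments(corpus):
        """Maximum-likelihood t and q from fully aligned data (mirrors the algorithm above)."""
        c_ef, c_e = defaultdict(float), defaultdict(float)
        c_jilm, c_ilm = defaultdict(float), defaultdict(float)
        for e, f, a in corpus:
            l, m = len(e), len(f)
            e_ext = ["NULL"] + e
            for i in range(1, m + 1):          # French positions
                for j in range(0, l + 1):      # English positions (0 = NULL)
                    delta = 1.0 if a[i - 1] == j else 0.0
                    c_ef[(e_ext[j], f[i - 1])] += delta
                    c_e[e_ext[j]] += delta
                    c_jilm[(j, i, l, m)] += delta
                    c_ilm[(i, l, m)] += delta
        t = {(fw, ew): c / c_e[ew] for (ew, fw), c in c_ef.items() if c > 0}
        q = {k: c / c_ilm[(k[1], k[2], k[3])] for k, c in c_jilm.items() if c > 0}
        return t, q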

    Parameter Estimation with the EM Algorithm

The training examples are (e^(k), f^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence.

The algorithm is related to the algorithm for the case where alignments are observed, but with two key differences:

1. The algorithm is iterative. We start with some initial (e.g., random) choice for the q and t parameters. At each iteration we compute some counts based on the data together with our current parameter estimates. We then re-estimate our parameters with these counts, and iterate.

2. We use the following definition for δ(k, i, j) at each iteration:

δ(k, i, j) = q(j | i, l_k, m_k) t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} q(j' | i, l_k, m_k) t(f_i^(k) | e_{j'}^(k))

Input: A training corpus (e^(k), f^(k)) for k = 1 … n, where e^(k) = e_1^(k) … e_{l_k}^(k) and f^(k) = f_1^(k) … f_{m_k}^(k).

Initialization: Initialize the t(f | e) and q(j | i, l, m) parameters (e.g., to random values).

For s = 1 … S

    Set all counts c(…) = 0

    For k = 1 … n

        For i = 1 … m_k, j = 0 … l_k

            c(e_j^(k), f_i^(k)) ← c(e_j^(k), f_i^(k)) + δ(k, i, j)

            c(e_j^(k)) ← c(e_j^(k)) + δ(k, i, j)

            c(j | i, l, m) ← c(j | i, l, m) + δ(k, i, j)

            c(i, l, m) ← c(i, l, m) + δ(k, i, j)

        where δ(k, i, j) = q(j | i, l_k, m_k) t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} q(j' | i, l_k, m_k) t(f_i^(k) | e_{j'}^(k))

    Recalculate the parameters: t(f | e) = c(e, f) / c(e),    q(j | i, l, m) = c(j | i, l, m) / c(i, l, m)

    The EM Algorithm for IBM Model 1:

For s = 1 … S

    Set all counts c(…) = 0

    For k = 1 … n

        For i = 1 … m_k, j = 0 … l_k

            c(e_j^(k), f_i^(k)) ← c(e_j^(k), f_i^(k)) + δ(k, i, j)

            c(e_j^(k)) ← c(e_j^(k)) + δ(k, i, j)

            c(j | i, l, m) ← c(j | i, l, m) + δ(k, i, j)

            c(i, l, m) ← c(i, l, m) + δ(k, i, j)

        where

        δ(k, i, j) = [1 / (1 + l_k)] t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} [1 / (1 + l_k)] t(f_i^(k) | e_{j'}^(k)) = t(f_i^(k) | e_j^(k)) / Σ_{j'=0}^{l_k} t(f_i^(k) | e_{j'}^(k))

    Recalculate the parameters: t(f | e) = c(e, f) / c(e)

An example:

e^(100) = And the program has been implemented

f^(100) = Le programme a été mis en application
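A compact sketch of the Model 1 EM loop above; the corpus is assumed to be a list of (e, f) word-list pairs, and t(f | e) is initialized uniformly over co-occurring pairs (illustrative, not optimized):

    from collections import defaultdict

    def em_ibm_model1(corpus, iterations=10):
        # Initialize t(f|e) for all co-occurring word pairs (any constant works here).
        t = defaultdict(lambda: 1e-9)
        for e, f in corpus:
            for ew in ["NULL"] + e:
                for fw in f:
                    t[(fw, ew)] = 1.0
        for _ in range(iterations):
            c_ef, c_e = defaultdict(float), defaultdict(float)
            for e, f in corpus:
                e_ext = ["NULL"] + e
                for fw in f:
                    z = sum(t[(fw, ew)] for ew in e_ext)   # denominator over j' = 0..l
                    for ew in e_ext:
                        delta = t[(fw, ew)] / z            # delta(k, i, j) for Model 1
                        c_ef[(ew, fw)] += delta
                        c_e[ew] += delta
            for (ew, fw), c in c_ef.items():               # M-step: t(f|e) = c(e,f)/c(e)
                t[(fw, ew)] = c / c_e[ew]
        return t

    corpus = [("the house".split(), "la maison".split()),
              ("the book".split(), "le livre".split())]
    t = em_ibm_model1(corpus)
    print(round(t[("maison", "house")], 3))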

    Justification for the Algorithm

The training examples are (e^(k), f^(k)) for k = 1 … n. Each e^(k) is an English sentence, each f^(k) is a French sentence.

The log-likelihood function:

L(t, q) = Σ_{k=1}^{n} log p(f^(k) | e^(k)) = Σ_{k=1}^{n} log Σ_a p(f^(k), a | e^(k))

The maximum-likelihood estimates are

arg max_{t, q} L(t, q)

The EM algorithm will converge to a local maximum of the log-likelihood function.


    Summary

Key ideas in the IBM translation models:

o Alignment variables

o Translation parameters, e.g., t(f | e)

o Distortion parameters, e.g., q(2 | 1, 6, 7)

The EM algorithm: an iterative algorithm for training the q and t parameters.

Once the parameters are trained, we can recover the most likely alignments on our training examples.

    5. Phrase-Based Translation

    1. Learning phrases from alignments

    2. A phrase-based model

    3. Decoding in phrase-based models

    Phrase-Based Models

    First stage in training a phrase-based model is extraction of a phrase-based

    (PB) lexicon

    A PB lexicon pairs strings in one language with strings in another

    language, e.g.,

nach Kanada ↔ in Canada

zur Konferenz ↔ to the conference

Morgen ↔ tomorrow

fliege ↔ will fly
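As a toy illustration of how such a PB lexicon might be stored and applied, here is a naive greedy longest-match lookup in Python; the extra ich → I entry and the whole greedy scheme are assumptions for the demo, and a real decoder would also score phrase segmentations, reordering, and a language model:

    # Toy phrase-based lexicon, keyed by (lower-cased) source phrases.
    pb_lexicon = {
        ("nach", "kanada"): "in Canada",
        ("zur", "konferenz"): "to the conference",
        ("morgen",): "tomorrow",
        ("fliege",): "will fly",
        ("ich",): "I",                       # extra entry, assumed for the demo
    }

    def greedy_phrase_lookup(words, lexicon, max_len=3):
        """Left-to-right longest-match lookup; not a real decoder (no reordering, no LM)."""
        out, i = [], 0
        while i < len(words):
            for span in range(min(max_len, len(words) - i), 0, -1):
                phrase = tuple(w.lower() for w in words[i:i + span])
                if phrase in lexicon:
                    out.append(lexicon[phrase])
                    i += span
                    break
            else:                            # nothing matched: copy the word through
                out.append(words[i])
                i += 1
        return " ".join(out)

    print(greedy_phrase_lookup("Morgen fliege ich nach Kanada zur Konferenz".split(), pb_lexicon))
    # word order is not handled, which is exactly what decoding has to solve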

    6. Decoding with Phrase-Based Translation Models

- Discussion topics

Machine translation for Vietnamese; existing systems and products.

- Student preparation

Install and study the machine translation libraries and modules provided by the instructor.

- References

1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 7.

2. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 4.

Lecture 08: Log-linear models

Chapter I, sections:

Periods: 1-3    Week: 11

- Objectives and requirements

Objective: Study log-linear models for machine translation.

Requirements: Master the technique and be able to implement it in code.

- Teaching format: Lectures, discussion, self-study, independent research

- Time: Lecturing: 2 periods; In-class discussion and exercises: 1 period; Self-study: 6 periods.

- Location: Lecture hall assigned by P2.

- Main content:

    1. The Language Modeling Problem

    2. Log-linear models

    3. Parameter estimation in log-linear models

    4. Smoothing/regularization in log-linear models

    5. Global Linear Model

    Log-linear models

Given

An input domain X and a finite set of labels Y

A set of m feature functions φ_k : X × Y → ℝ (very often these are indicator/binary functions φ_k : X × Y → {0, 1})

The feature vectors Φ(x, y) ∈ ℝ^m induced by the feature functions φ_k for any x ∈ X and y ∈ Y

learn a conditional probability P(y | x, W), where

W is a parameter vector of weights (W ∈ ℝ^m)

P(y | x, W) = exp(W · Φ(x, y)) / Σ_{y' ∈ Y} exp(W · Φ(x, y'))

log P(y | x, W) = W · Φ(x, y) - log Σ_{y' ∈ Y} exp(W · Φ(x, y')) [subtraction of a normalization term from a linear term]

Examples of feature functions for POS tagging:

φ_1(x, y) = 1 if the current word w_i is "the" and y = DT, 0 otherwise

φ_2(x, y) = 1 if the current word w_i ends in "ing" and y = VBG, 0 otherwise

φ_3(x, y) = 1 if <t_{i-2}, t_{i-1}, t_i> = <DT, JJ, Vt>, 0 otherwise

    Learning in this framework amounts then to learning the weights WML

    that maximize the likelihood of the training corpus.

    WML = argmax W m L(W) = argmax W m i = 1..n P( yi | xi)

    L(W) = i = 1..n log P( yi | xi)

    = i = 1..n W ( xi , yi) / - i = 1..n log y Y e W (xi , y)

    Note: Finding the parameters that maximize the likelihood/probability of

    some training corpus is a universal machine learning trick.

    Summary: we have cast the learning problem as an optimization problem.

    Several solutions exist for solving this problem:

    Gradient ascent

    Conjugate gradient methods

    Iterative scaling

    Improved iterative scaling
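A small sketch of the quantities involved: the conditional probability P(y | x, W) and the gradient of L(W) (empirical minus expected feature counts), plugged into plain gradient ascent; the two indicator features and the toy data are assumptions for illustration:

    import math

    def dot(phi_vec, w):
        return sum(p * wi for p, wi in zip(phi_vec, w))

    def p_y_given_x(x, y, w, labels, phi):
        # P(y | x, W) = exp(W . phi(x,y)) / sum_y' exp(W . phi(x,y'))
        scores = {yy: math.exp(dot(phi(x, yy), w)) for yy in labels}
        return scores[y] / sum(scores.values())

    def gradient(data, w, labels, phi):
        # dL/dw_k = sum_i [ phi_k(x_i, y_i) - sum_y P(y | x_i, W) phi_k(x_i, y) ]
        g = [0.0] * len(w)
        for x, y in data:
            obs = phi(x, y)
            for k in range(len(w)):
                g[k] += obs[k]
            for yy in labels:
                p = p_y_given_x(x, yy, w, labels, phi)
                feat = phi(x, yy)
                for k in range(len(w)):
                    g[k] -= p * feat[k]
        return g

    labels = ["DT", "VBG"]
    def phi(x, y):                            # two indicator features, as in the POS example
        return [1.0 if (x == "the" and y == "DT") else 0.0,
                1.0 if (x.endswith("ing") and y == "VBG") else 0.0]

    data = [("the", "DT"), ("running", "VBG")]
    w = [0.0, 0.0]
    for _ in range(50):                       # plain gradient ascent, step size 0.5
        w = [wi + 0.5 * gi for wi, gi in zip(w, gradient(data, w, labels, phi))]
    print([round(wi, 2) for wi in w])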

- Discussion topics

Building a programming library for machine translation.

- Student preparation

Build a module simulating Vietnamese-English translation.


- References

1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 7.

2. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 5.

- Review questions

- Notes: Prerequisite courses: discrete mathematics, data structures and algorithms, basic programming.

Lecture 09: Conditional random fields and global linear models

Chapter I, sections:

Periods: 1-3    Weeks: 12, 13

- Objectives and requirements

Objective: Study CRFs and GLMs.

Requirements: Master the models and know how to build applications based on the two kinds of models presented in the lecture.

- Teaching format: Lectures, discussion, self-study, independent research

- Time: Lecturing: 2 periods; In-class discussion and exercises: 1 period; Self-study: 6 periods.

- Location: Lecture hall assigned by P2.

- Main content:

A CRF (conditional random field) is a conditional probabilistic sequence model, trained to maximize a conditional likelihood. It is a framework for building probabilistic models to segment and label sequence data [1]. According to [3], a CRF, like a Markov random field, is an undirected graphical model in which each vertex represents a random variable whose distribution is to be inferred, and each edge represents a dependency between two random variables.

Figure 1: Chain structure of a CRF graph.

X is a random variable over the data sequence to be labeled, and Y is a random variable over the corresponding sequence of labels (or states). For example, X may range over natural language sentences (sequences of observed words), and Y over the sequences of part-of-speech tags assigned to the sentences in X (the tags come from a predefined tag set). A linear-chain CRF with parameters Λ = {λ_k} is given by the formula [2]:

p(y | x) = (1 / Z_x) exp( Σ_t Σ_k λ_k f_k(y_{t-1}, y_t, x, t) )

where Z_x is a normalization factor that ensures the probabilities of all state sequences sum to 1 [4].

f_k(y_{t-1}, y_t, x, t) is a feature function, usually binary-valued but possibly real-valued, and λ_k is a learned weight associated with feature f_k. Feature functions can measure any state transition y_{t-1} → y_t together with the observation sequence x, centered on the current position t. For example, a feature function might have value 1 when y_{t-1} is the state TITLE, y_t is the state AUTHOR, and x_t is a word that appears in a lexicon of person names.

CRFs are usually trained by maximizing the likelihood of the training data using optimization techniques such as L-BFGS¹. Inference (with the learned model) is the task of finding the label sequence corresponding to an input observation sequence. For CRFs this is typically done with a dynamic-programming algorithm, most commonly Viterbi² (a dynamic-programming algorithm for finding the most likely sequence of hidden states) [5].

¹ http://en.wikipedia.org/wiki/L-BFGS
² http://en.wikipedia.org/wiki/Viterbi_algorithm
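A minimal Python sketch of Viterbi decoding for such a chain model; the score function is assumed to return Σ_k λ_k f_k(y_{t-1}, y_t, x, t) already summed into one number (here a hand-written toy), which is what a CRF implementation computes from its weighted features:

    def viterbi(x, states, score):
        """Most likely state sequence under additive (log-space) scores.

        score(prev_state, state, x, t) ~ sum_k lambda_k * f_k(prev_state, state, x, t);
        prev_state is None at t = 0.
        """
        n = len(x)
        best = [{} for _ in range(n)]          # best[t][s] = (best score, backpointer)
        for s in states:
            best[0][s] = (score(None, s, x, 0), None)
        for t in range(1, n):
            for s in states:
                best[t][s] = max(((best[t - 1][p][0] + score(p, s, x, t), p) for p in states),
                                 key=lambda v: v[0])
        last = max(states, key=lambda s: best[n - 1][s][0])
        path = [last]
        for t in range(n - 1, 0, -1):          # follow backpointers
            path.append(best[t][path[-1]][1])
        return list(reversed(path))

    # Toy usage with two states and a hand-written scoring function (assumed values).
    states = ["N", "V"]
    def toy_score(prev, s, x, t):
        emit = 1.0 if (x[t].endswith("s") and s == "V") else 0.5
        trans = 0.3 if (prev == "N" and s == "V") else 0.0
        return emit + trans

    print(viterbi("dogs runs".split(), states, toy_score))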

    Maximum Entropy Models

An equivalent approach to learning conditional probability models is this:

There are lots of conditional distributions out there, most of them very spiked, overfit, etc. Let Q be the set of distributions that can be specified in log-linear form:

Q = { p : p(y | x_i) = exp(W · Φ(x_i, y)) / Σ_{y' ∈ Y} exp(W · Φ(x_i, y')) }

We would like to learn a distribution that is as uniform as possible without violating any of the requirements imposed by the training data:

P = { p : Σ_{i=1…n} Φ(x_i, y_i) = Σ_{i=1…n} Σ_{y ∈ Y} p(y | x_i) Φ(x_i, y) }   (empirical count = expected count)

Here p is an n × |Y| vector defining p(y | x_i) for all i, y.

Note that a distribution that satisfies the above equality always exists:

p(y | x_i) = 1 if y = y_i, 0 otherwise

Because uniformity equates to high entropy, we can search for distributions that are both consistent with the requirements imposed by the data and have high entropy.

Entropy of a vector p: H(p) = - Σ_x p_x log p_x

Entropy is uncertainty, but also non-commitment.

    What do we want from a distribution P?

    o Minimize commitment = maximize entropy

    o Resemble some reference distribution (data)

    Solution: maximize entropy H, subject to constraints f.

    Adding constraints (features):

    o Lowers maximum entropy

    o Raises maximum likelihood

    o Brings the distribution further from uniform

    o Brings the distribution closer to a target distribution

Let's say we have the following event space:

    NN    NNS   NNP   NNPS  VBZ   VBD

and the following empirical data (counts):

    3     5     11    13    3     1

Maximize H:

    1/e   1/e   1/e   1/e   1/e   1/e

but we wanted probabilities: E[NN, NNS, NNP, NNPS, VBZ, VBD] = 1

    1/6   1/6   1/6   1/6   1/6   1/6

This is probably too uniform:

    NN    NNS   NNP   NNPS  VBZ   VBD
    1/6   1/6   1/6   1/6   1/6   1/6

We notice that N* are more common than V* in the real data, so we introduce a feature f_N = {NN, NNS, NNP, NNPS}, with E[f_N] = 32/36:

    8/36  8/36  8/36  8/36  2/36  2/36

Proper nouns are more frequent than common nouns, so we add f_P = {NNP, NNPS}, with E[f_P] = 24/36:

    4/36  4/36  12/36 12/36 2/36  2/36

We could keep refining the model, for example by adding a feature to distinguish singular vs. plural nouns, or verb types.

Fundamental theorem: It turns out that finding the maximum-likelihood solution to the optimization problem in Section 3 is the same as finding the maximum-entropy solution to the problem in Section 4.

    The maximum entropy solution can be written in log-linear form.

    Finding the maximum-likelihood solution also gives the maximum

    entropy solution.
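A small numerical check of the worked example above: the tables satisfy the stated constraints, and each added constraint lowers the entropy, as claimed (the distributions are copied from the tables, nothing is re-estimated):

    import math

    tags = ["NN", "NNS", "NNP", "NNPS", "VBZ", "VBD"]
    f_N = {"NN", "NNS", "NNP", "NNPS"}
    f_P = {"NNP", "NNPS"}

    def entropy(p):
        return -sum(v * math.log(v) for v in p.values() if v > 0)

    def expectation(p, feature):               # E[f] = sum of p over tags in the feature set
        return sum(p[t] for t in feature)

    uniform  = {t: 1 / 6 for t in tags}
    with_fN  = {"NN": 8/36, "NNS": 8/36, "NNP": 8/36, "NNPS": 8/36, "VBZ": 2/36, "VBD": 2/36}
    with_fNP = {"NN": 4/36, "NNS": 4/36, "NNP": 12/36, "NNPS": 12/36, "VBZ": 2/36, "VBD": 2/36}

    print(expectation(with_fN, f_N))                               # 32/36, as required
    print(expectation(with_fNP, f_N), expectation(with_fNP, f_P))  # 32/36 and 24/36
    print(entropy(uniform), entropy(with_fN), entropy(with_fNP))   # strictly decreasing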

- Student preparation

Read the assigned lecture material in advance. Complete the exercises as assigned.

- References

1. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, 2nd edition, Daniel Jurafsky and James Martin. Prentice Hall, 2008. Chapter 9.

2. Foundations of Statistical Natural Language Processing, Christopher Manning and Hinrich Schütze, MIT Press, 1999. Chapter 6.

3. Data Mining: Practical Machine Learning Tools and Techniques (3rd ed), Ian H. Witten and Eibe Frank, Morgan Kaufmann, 2005. Chapter 5.

Lecture 10: Machine learning in NLP

Chapter I, sections:

Periods: 1-3    Weeks: 14, 15

- Objectives and requirements

Objective: Study machine learning methods in natural language processing and their applications.

Requirements: Understand the methods and their applications, and how to build programs that use them.

- Teaching format: Lectures, discussion, self-study, independent research

- Time: Lecturing: 2 periods; In-class discussion and exercises: 1 period; Self-study: 6 periods.

- Location: Lecture hall assigned by P2.

- Main content:

The concept of machine learning

Machine learning is a field of artificial intelligence concerned with developing techniques that allow computers to "learn". More specifically, machine learning is a method for creating computer programs by analyzing data sets. Machine learning is closely related to statistics, since both fields study data analysis; unlike statistics, however, machine learning focuses on the algorithmic and computational complexity of carrying out the analysis. Many inference problems are NP-hard, so part of machine learning is the study of tractable approximate inference algorithms.

Commonly used types of learning algorithms include:

Supervised learning:

A machine learning technique for building a function from training data. The training data consist of pairs of input objects (typically vectors) and desired outputs. The output of the function may be a continuous value (regression) or a predicted class label for the input object (classification). The task of a supervised learner is to predict the value of the function for any valid input object after having seen a number of training examples (i.e., pairs of inputs and corresponding outputs). To achieve this, the learner must generalize from the available data so that it can predict outputs for unseen situations in a "reasonable" way.

Supervised learning can produce two kinds of models. Most commonly, it produces a global model that maps input objects to desired outputs. In some cases, however, the mapping is realized as a set of local models, each based on an object's neighbors.

To solve a supervised learning problem (for example, handwriting recognition), several steps have to be considered:

Determine the type of training examples. Before anything else, decide what kind of data will serve as a training example; for instance, a single handwritten character, an entire handwritten word, or a whole line of handwriting.

Gather the training data. The training set must match the function to be built, so each input must be checked to have a corresponding output. Training data can be collected from many sources: measurements and computations, existing data sets, and so on.

Determine the input feature representation. The accuracy of the learned function depends heavily on how the input objects are represented. Typically, an input object is transformed into a feature vector containing a number of features that describe the object. The number of features should not be too large, to avoid a data explosion, but it must be large enough to predict the output accurately. If the representation describes the object in too much detail, the outputs may fragment into many small groups or labels, which makes it hard to discern the relationships between objects, to find the majority group (label) in the data set, or to predict a representative element for each group; noisy objects can still be labeled, but then the number of labels grows too large and becomes inversely proportional to the number of elements in each group. Conversely, a representation with too few features easily leads to objects being mislabeled or noisy objects being missed. Choosing a roughly correct number of features reduces the cost of evaluating the results after training and of incorporating new input data.

Determine the structure of the learned function and the corresponding learning algorithm. For example, the engineer may choose to use artificial neural networks or decision trees.

Complete the design. The designer runs the learning algorithm on the collected training set. The parameters of the learning algorithm may be tuned by optimizing performance on a subset of the training set (called a validation set) or through cross-validation. After learning and parameter tuning, the performance of the algorithm can be measured on a test set that is independent of the training set; a minimal sketch of such a validation split is shown below.
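To make the validation-set / cross-validation step concrete, a minimal k-fold splitting sketch in plain Python; train_fn and eval_fn are placeholders to be supplied by the caller:

    def k_fold_indices(n, k):
        """Split indices 0..n-1 into k roughly equal folds."""
        folds, start = [], 0
        for i in range(k):
            size = n // k + (1 if i < n % k else 0)
            folds.append(list(range(start, start + size)))
            start += size
        return folds

    def cross_validate(inputs, outputs, train_fn, eval_fn, k=5):
        """Average eval_fn score over k train/validation splits (for parameter tuning)."""
        scores = []
        for held_out in k_fold_indices(len(inputs), k):
            held = set(held_out)
            train_x = [x for i, x in enumerate(inputs) if i not in held]
            train_y = [y for i, y in enumerate(outputs) if i not in held]
            val_x = [inputs[i] for i in held_out]
            val_y = [outputs[i] for i in held_out]
            model = train_fn(train_x, train_y)
            scores.append(eval_fn(model, val_x, val_y))
        return sum(scores) / len(scores)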

Unsupervised learning:

A method that seeks a model that fits a set of observations. It differs from supervised learning in that the correct output for each input is not known in advance. In unsupervised learning the input is simply a collected data set. Unsupervised learning typically treats the input objects as a set of random variables, and a joint density model is then built for the data set.

Unsupervised learning can be combined with Bayesian inference to obtain the conditional probability of any random variable given the others.

Unsupervised learning is also useful for data compression: essentially, every data compression algorithm relies, explicitly or implicitly, on a probability distribution over a set of inputs.

Semi-supervised learning:

Combines labeled and unlabeled examples to produce an appropriate function or classifier.

Reinforcement learning:

The algorithm learns a policy of actions based on observations of its environment. Every action affects the environment, and the environment provides feedback that guides the learning algorithm.

Transduction:

Similar to supervised learning, but without explicitly constructing a function. Instead, it tries to predict new outputs from the training inputs, the training outputs, and the test inputs available during training.

Relearning:

An approach concerned with the algorithms a learner uses to predict outputs for cases it has never encountered before. When the designer builds such an algorithm, the learner is asked to predict an output for new inputs. To do this, the learner is given a finite number of training examples that illustrate the desired relation between input and output values. After successful learning, the learner computes a good approximation of the correct output even for examples it has not seen during training. Without additional assumptions this task cannot be solved, since unseen situations could have arbitrary outputs. The kind of assumption needed about the nature of the target function is called the inductive bias.

A more formal definition of inductive bias is based on mathematical logic: the inductive bias is a logical formula that, together with the training data, logically entails the hypothesis produced by the learner. The result can be viewed as a rough description of the outputs over the entire set of objects.

When deciding to build a machine learning system, the designer needs to answer the following questions:

How does the system access its data? In other words: how can the learning system use the knowledge it has acquired from the training data?

If the learning program is embedded in a concrete environment and can take controlled actions on its input data, it can also update its knowledge during execution, as in reinforcement learning; alternatively it can do so by accumulating experience. The data may be encoded, or may contain many noisy objects, which requires the learner to be able to decode the data or to assess the noisy objects approximately in order to analyze them and obtain the best possible results. From this point of view, the learning program can be built on whichever model is appropriate, supervised or unsupervised, as the designer sees fit.

What should the program learn? What is the goal to be achieved?

Different kinds of functions can be defined inside a learning program. What these functions should do is determined by what one wants to obtain from the analysis. The goal can be described in terms of the outputs of the functions being used. We can approximate this goal through the training data set, or through the behavior of the learning program while it processes real data.

How can the data be summarized (described)? How can the right algebraic basis for the functions be determined so that the functions can be defined?

An inductive process can be constructed to approximately determine the characteristics of the target function. This process can be understood as a search for a hypothesis (or model) of the data, in a very large space of data or in the training set supplied by the designer. Choosing such an approximate description helps limit the amount of data needed and can reduce cost. This process can also be used to identify a representative of a group or of the whole data set.

Which algorithm can be applied?

Choosing a suitable algorithm is essential for building a learning program. Since the learning program should limit human intervention in the analysis, the algorithm must satisfy the following need: it should help the learner reach as good an approximation as required on a large, continuously updated data set. What counts as one pass, or as an acceptable level of confidence for the program, is determined case by case.

Using machine learning in natural language processing:

Today there is strong demand for applying the achievements of machine learning to natural language processing, and many different machine learning models have been applied in this field. Previously, large volumes of data had to be processed by hand, and the large number of rules used in natural languages added greatly to the workload. Most of the machine learning models that are applied keep their essential character, even though statistical rules are not always used, in order to discover typical rules from the collected data samples.

Example: consider the task of part-of-speech tagging, i.e., determining the correct part of speech for each word in a given sentence, often a sentence that has never been seen before. Machine-learning-based tagging methods usually proceed in two steps:

The first step, the training step, makes use of a tagged training data set, which includes