Natural Language Processing - Carnegie Mellon...
TRANSCRIPT
-
Natural Language Processing
Lecture 6: Information Theory; Spelling, Edit Distance, and Noisy
Channels
-
A Taste of Information Theory
• Shannon Entropy, H(p)
• Cross-entropy, H(p; q)
• Perplexity
-
Codebook
Name      Code
Biden     000
Sanders   001
Harris    010
YKW       011
Booker    100
Warren    101
Yang      110
Marianne  111
-
Codebook
Name      Code  Probability
Biden     000   1/4
Sanders   001   1/16
Harris    010   1/64
YKW       011   1/2
Booker    100   1/64
Warren    101   1/8
Yang      110   1/64
Marianne  111   1/64
-
Codebook
Name      Probability  New Code
Biden     1/4          10
Sanders   1/16         1110
Harris    1/64         111100
YKW       1/2          0
Booker    1/64         111101
Warren    1/8          110
Yang      1/64         111110
Marianne  1/64         111111
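A quick check of why the variable-length code is better than the fixed 3-bit code: its expected length exactly matches the Shannon entropy of the distribution. A minimal sketch using the probabilities and codes from the slide:

```python
import math

# Probabilities and variable-length codes from the slide's codebook
probs = {"Biden": 1/4, "Sanders": 1/16, "Harris": 1/64, "YKW": 1/2,
         "Booker": 1/64, "Warren": 1/8, "Yang": 1/64, "Marianne": 1/64}
codes = {"Biden": "10", "Sanders": "1110", "Harris": "111100", "YKW": "0",
         "Booker": "111101", "Warren": "110", "Yang": "111110",
         "Marianne": "111111"}

# Shannon entropy: H(p) = -sum_x p(x) log2 p(x)
entropy = -sum(p * math.log2(p) for p in probs.values())

# Expected length of the new code: sum_x p(x) * len(code(x))
expected_len = sum(probs[w] * len(codes[w]) for w in probs)

print(entropy, expected_len)  # both 2.0 bits, vs. 3 bits for the fixed code
```

Because every probability here is a power of 1/2, the code lengths can match the information content of each outcome exactly, so the expected length hits the entropy lower bound.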
-
A Taste of Information Theory
• Shannon Entropy, H(p)
• Cross-entropy, H(p; q)
• Perplexity
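The three quantities on this slide can be illustrated with a small worked example (the distributions below are toy numbers, not from the lecture):

```python
import math

# True distribution p and a (mismatched) model q over the same outcomes
p = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}
q = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}

def entropy(p):
    # H(p) = -sum_x p(x) log2 p(x)
    return -sum(px * math.log2(px) for px in p.values())

def cross_entropy(p, q):
    # H(p; q) = -sum_x p(x) log2 q(x); always >= H(p), equal iff q == p
    return -sum(p[x] * math.log2(q[x]) for x in p)

H = entropy(p)              # 1.75 bits
Hpq = cross_entropy(p, q)   # 2.0 bits: the model q pays extra bits
perplexity = 2 ** Hpq       # 4.0: effective number of equally likely choices
```

The gap between H(p; q) and H(p) is the price of modeling p with the wrong distribution q; perplexity just re-expresses cross-entropy as a branching factor.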
-
Three Spelling Problems
1. Detecting isolated non-words: "graffe", "exampel"
2. Fixing isolated non-words: "graffe" → "giraffe", "exampel" → "example"
3. Fixing errors in context: "I ate desert" → "I ate dessert", "It was written be me" → "It was written by me"
-
String edit distance
• How many letter changes to map A to B
• Substitutions
  E X A M P E L
  E X A M P L E    2 substitutions
• Insertions
  E X A P L E
  E X A M P L E    1 insertion
• Deletions
  E X A M M P L E
  E X A _ M P L E  1 deletion
-
Levenshtein Distance
\[
D_{0,0} = 0
\]
\[
D_{i,j} = \min \begin{cases}
D_{i-1,j} + \mathrm{inscost}(t_i) \\
D_{i,j-1} + \mathrm{delcost}(s_j) \\
D_{i-1,j-1} + \mathrm{substcost}(t_i, s_j)
\end{cases}
\]
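The recurrence can be implemented directly as a dynamic program. A minimal sketch with pluggable costs (the unit-cost defaults are an assumption; some formulations charge 2 for a substitution):

```python
def levenshtein(s, t,
                inscost=lambda c: 1,
                delcost=lambda c: 1,
                substcost=lambda a, b: 0 if a == b else 1):
    # D[i][j] = minimum cost of editing s[:j] into t[:i]
    D = [[0] * (len(s) + 1) for _ in range(len(t) + 1)]
    for j in range(1, len(s) + 1):          # first row: delete all of s
        D[0][j] = D[0][j - 1] + delcost(s[j - 1])
    for i in range(1, len(t) + 1):          # first column: insert all of t
        D[i][0] = D[i - 1][0] + inscost(t[i - 1])
    for i in range(1, len(t) + 1):
        for j in range(1, len(s) + 1):
            D[i][j] = min(
                D[i - 1][j] + inscost(t[i - 1]),               # insertion
                D[i][j - 1] + delcost(s[j - 1]),               # deletion
                D[i - 1][j - 1] + substcost(t[i - 1], s[j - 1]) # substitution
            )
    return D[len(t)][len(s)]
```

With unit costs this reproduces the slide's examples: "exampel" → "example" costs 2 (two substitutions), "exaple" → "example" costs 1 (one insertion), and "exammple" → "example" costs 1 (one deletion).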
-
String Edit Distance
-
String edit distance

    #   9
    L   8
    E   7
    P   6
    M   5
    M   4
    A   3
    X   2
    E   1
    #   0  1  2  3  4  5  6  7  8
        #  E  X  A  M  P  L  E  #
-
String edit distance

    #   9  8  7  6  5
    L   8  7  6  5  4
    E   7  6  5  4  3
    P   6  5  4  3  2
    M   5  4  3  2  1
    M   4  3  2  1  0  1
    A   3  2  1  0  1  2
    X   2  1  0  1  2  3
    E   1  0  1  2  3  4
    #   0  1  2  3  4  5  6  7  8
        #  E  X  A  M  P  L  E  #
-
String edit distance

    #   9  8  7  6  5  4  4  6  5
    L   8  7  6  5  4  3  3  5  7
    E   7  6  5  4  3  2  3  2  3
    P   6  5  4  3  2  1  2  3  4
    M   5  4  3  2  1  2  3  4  5
    M   4  3  2  1  0  1  2  3  4
    A   3  2  1  0  1  2  3  4  5
    X   2  1  0  1  2  3  4  5  6
    E   1  0  1  2  3  4  5  6  7
    #   0  1  2  3  4  5  6  7  8
        #  E  X  A  M  P  L  E  #
-
Levenshtein Distance (unit costs)
\[
D_{0,0} = 0
\]
\[
D_{i,j} = \min \begin{cases}
D_{i-1,j} + 1 \\
D_{i,j-1} + 1 \\
D_{i-1,j-1} + \mathrm{substcost}(t_i, s_j)
\end{cases}
\]
-
Levenshtein Distance with Transposition
\[
D_{0,0} = 0
\]
\[
D_{i,j} = \min \begin{cases}
D_{i-1,j} + \mathrm{inscost}(t_i) \\
D_{i,j-1} + \mathrm{delcost}(s_j) \\
D_{i-1,j-1} + \mathrm{substcost}(t_i, s_j) \\
D_{i-2,j-2} + \mathrm{transcost}(s_{j-1}, s_j) \quad \text{if } s_{j-1} = t_i \text{ and } s_j = t_{i-1}
\end{cases}
\]
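This is the restricted Damerau-Levenshtein recurrence: adjacent transpositions become a single edit. A minimal sketch assuming unit costs for all four operations:

```python
def damerau_levenshtein(s, t):
    # Edit distance with adjacent-transposition edits, all unit costs
    # (restricted Damerau-Levenshtein: each substring edited at most once)
    D = [[0] * (len(s) + 1) for _ in range(len(t) + 1)]
    for i in range(len(t) + 1):
        D[i][0] = i
    for j in range(len(s) + 1):
        D[0][j] = j
    for i in range(1, len(t) + 1):
        for j in range(1, len(s) + 1):
            cost = 0 if t[i - 1] == s[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,        # insertion
                          D[i][j - 1] + 1,        # deletion
                          D[i - 1][j - 1] + cost) # substitution
            # Transposition: s_{j-1} = t_i and s_j = t_{i-1}
            if (i > 1 and j > 1
                    and s[j - 2] == t[i - 1] and s[j - 1] == t[i - 2]):
                D[i][j] = min(D[i][j], D[i - 2][j - 2] + 1)
    return D[len(t)][len(s)]
```

This is what makes typos like "exmaple" cheap: plain Levenshtein charges 2 edits for the swapped letters, while the transposition rule charges 1.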
-
Three Spelling Problems
✓ Detecting isolated non-words
✓ Fixing isolated non-words
3. Fixing errors in context
-
Kernighan’s Model: A Noisy Channel
    source → channel
    "example" → "exmaple"
-
acress

c         freq(c)   p(t | c)                  %
actress   1343      p(delete t)               37
cress     0         p(delete a)               0
caress    4         p(transpose a & c)        0
access    2280      p(substitute r for c)     0
across    8436      p(substitute e for o)     18
acres     2879      p(delete s)               21
acres     2879      p(delete s)               23
-
How to choose between options
• Probabilities of edits
  – insertions, deletions, substitutions
  – transpositions
• Probability of the new word
-
Noisy Channel Model (General)
    source → channel → decode
      y    →    x

The source emits y, the channel corrupts it to x, and the decoder recovers y from x.
-
Probability model
• Most likely word given observation: argmax_W P(W | O)
• By Bayes' Rule this is equivalent to: argmax_W P(O | W) P(W) / P(O)
• Which is equivalent to argmax_W ( P(W) P(O | W) ) (denominator is constant)
• P(O | W) calculated from edit distance
• P(W) calculated from a language model
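Putting the two model components together, the decoder is just an argmax over candidates, usually scored in log space for numerical stability. A minimal sketch for the "acress" example; the probability values below are hypothetical placeholders, not numbers from the lecture or from Kernighan's paper:

```python
import math

# Hypothetical unigram LM probabilities P(W) (illustrative only)
LM = {"actress": 1e-5, "across": 6e-5, "acres": 2e-5}

# Hypothetical channel probabilities P(O = "acress" | W) (illustrative only)
CHANNEL = {"actress": 1.17e-4, "across": 9.3e-5, "acres": 2.1e-5}

def decode(channel, lm):
    # argmax_W  log P(W) + log P(O | W)
    return max(lm, key=lambda w: math.log(lm[w]) + math.log(channel[w]))

best = decode(CHANNEL, LM)
```

With these toy numbers the winner is "across"; in practice the ranking depends entirely on the corpus counts behind P(W) and the confusion-matrix estimates behind P(O | W).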