probabilistic spelling correction
TRANSCRIPT
Probabilistic Spelling Correction
CE-324: Modern Information Retrieval
Sharif University of Technology
M. Soleymani
Fall 2016
Most slides have been adapted from: Profs. Manning, Nayak & Raghavan lectures (CS-276, Stanford)
Applications of spelling correction
Spelling Tasks
Spelling Error Detection
Spelling Error Correction:
Autocorrect: hte → the
Suggest a correction
Suggestion lists
Types of spelling errors
Non-word Errors
graffe → giraffe
Real-word Errors
Typographical errors
three → there
Cognitive Errors (homophones)
piece → peace
too → two
your → you're
Real-word correction almost always needs to be context-sensitive
Spelling correction steps
For each word w, generate a candidate set (see the sketch after this list):
Find candidate words with similar pronunciations
Find candidate words with similar spellings
Choose the best candidate
By a "Weighted edit distance" or "Noisy Channel" approach
Context-sensitive: we have to consider whether the surrounding words "make sense"
"Flying form Heathrow to LAX" → "Flying from Heathrow to LAX"
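As an illustration not taken from the slides, a minimal Python sketch of spelling-based candidate generation; the dictionary VOCAB and the example words are hypothetical:

import string

VOCAB = {"from", "form", "the", "there", "three", "flying", "to"}  # hypothetical dictionary

def edits1(word):
    # All strings one insertion, deletion, substitution, or transposition away.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {L + R[1:] for L, R in splits if R}
    transposes = {L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1}
    replaces = {L + c + R[1:] for L, R in splits if R for c in letters}
    inserts = {L + c + R for L, R in splits for c in letters}
    return deletes | transposes | replaces | inserts

def candidates(word):
    # Keep only candidates that are real words (plus the word itself if known).
    return (edits1(word) | {word}) & VOCAB

print(candidates("thre"))   # e.g. {'the', 'there', 'three'}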
Candidate Testing: Damerau-Levenshtein edit distance
Minimal edit distance between two strings (see the sketch after this list), where edits are:
Insertion
Deletion
Substitution
Transposition of two adjacent letters
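A straightforward dynamic-programming sketch of this distance, not from the slides; this is the restricted variant that counts an adjacent transposition as a single edit:

def damerau_levenshtein(s, t):
    # d[i][j] = distance between s[:i] and t[:j]
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

assert damerau_levenshtein("acress", "actress") == 1  # one insertion
assert damerau_levenshtein("hte", "the") == 1         # one transposition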
Noisy channel intuition
Noisy channel
We see an observation 𝑥 of a misspelled word
Find the correct word 𝑤
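The decoder this implies, writing V for the vocabulary, is the standard noisy-channel rule:

\hat{w} = \arg\max_{w \in V} P(w \mid x) = \arg\max_{w \in V} \frac{P(x \mid w)\, P(w)}{P(x)} = \arg\max_{w \in V} P(x \mid w)\, P(w)

where P(w) is the language model (prior) and P(x|w) is the channel (error) model.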
Language Model
Take a big supply of words with T tokens:
P(w) = C(w) / T
where C(w) = # occurrences of w (see the sketch below)
Supply of words: your document collection
In other applications: you can take the supply to be typed queries (suitably filtered), when a static dictionary is inadequate
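A minimal Python sketch of this unigram prior; the corpus string is a placeholder, not from the slides:

from collections import Counter

def unigram_lm(corpus_text):
    # Unigram prior P(w) = C(w) / T over a whitespace-tokenized corpus.
    tokens = corpus_text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return lambda w: counts[w] / total

corpus = "two of the three versions were there two days later"  # hypothetical corpus
P = unigram_lm(corpus)
print(P("the"), P("two"))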
Unigram prior probability
Counts from 404,253,213 words in the Corpus of Contemporary American English (COCA)
Channel model probability
Error model probability, edit probability
Misspelled word x = x1, x2, x3, …, xm
Correct word w = w1, w2, w3, …, wn
P(x|w) = probability of the edit (deletion / insertion / substitution / transposition)
Calculating p(x|w)
Still a research question.
Can be estimated.
Some simple ways, e.g.:
Confusion matrix: a square 26×26 table that records how many times one letter was incorrectly used instead of another.
Usually there are four confusion matrices: deletion, insertion, substitution, and transposition.
Computing error probability: Confusion matrix
del[x,y]: count(xy typed as x)
ins[x,y]: count(x typed as xy)
sub[x,y]: count(y typed as x)
trans[x,y]: count(xy typed as yx)
Insertion and deletion conditioned on the previous character
Confusion matrix for substitution
The cell [o,e] in a substitution confusion matrix gives the count of times that e was substituted for o.
Channel model
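The equations on this slide are not in the transcript; the channel model these counts are normally plugged into (following Kernighan, Church, and Gale 1990) is, in LaTeX:

P(x \mid w) =
\begin{cases}
\dfrac{\mathrm{del}[w_{i-1}, w_i]}{\mathrm{count}(w_{i-1} w_i)} & \text{if deletion}\\[4pt]
\dfrac{\mathrm{ins}[w_{i-1}, x_i]}{\mathrm{count}(w_{i-1})} & \text{if insertion}\\[4pt]
\dfrac{\mathrm{sub}[x_i, w_i]}{\mathrm{count}(w_i)} & \text{if substitution}\\[4pt]
\dfrac{\mathrm{trans}[w_i, w_{i+1}]}{\mathrm{count}(w_i w_{i+1})} & \text{if transposition}
\end{cases}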
Smoothing probabilities: Add-1 smoothing
|A|: size of the character alphabet
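With add-1 smoothing the substitution case, for example, becomes (the other edit types are analogous):

P(x \mid w) = \dfrac{\mathrm{sub}[x_i, w_i] + 1}{\mathrm{count}(w_i) + |A|}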
Channel model for acress
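The candidate table for "acress" is not in the transcript. As an illustration only, a Python sketch of how the usual candidates (actress, cress, caress, access, across, acres) would be ranked; the probabilities below are made-up placeholders, not the real confusion-matrix or COCA values:

# Hypothetical numbers purely to show the ranking computation;
# the real values come from the confusion matrices and the unigram prior.
candidates = {
    #  word      (P(x|w),  P(w))
    "actress": (1e-4, 2.7e-5),
    "cress":   (8e-6, 1.0e-8),
    "caress":  (5e-6, 4.0e-7),
    "access":  (2e-6, 3.0e-5),
    "across":  (1e-5, 2.0e-4),
    "acres":   (6e-5, 3.0e-5),
}

ranked = sorted(candidates.items(),
                key=lambda kv: kv[1][0] * kv[1][1],  # P(x|w) * P(w)
                reverse=True)
for word, (channel, prior) in ranked:
    print(f"{word:8s}  P(x|w)*P(w) = {channel * prior:.3g}")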
Noisy channel for real-word spell correction
Given a sentence w1,w2,w3,…,wn
Generate a set of candidates for each word wi
Candidate(w1) = {w1, w'1, w''1, w'''1, …}
Candidate(w2) = {w2, w'2, w''2, w'''2, …}
Candidate(wn) = {wn, w'n, w''n, w'''n, …}
Choose the sequence W that maximizes P(W)
Incorporating context words: Context-sensitive spelling correction
Determining whether actress or across is appropriate will require looking at the context of use
A bigram language model conditions the probability of a word on (just) the previous word
𝑃(𝑤1…𝑤𝑛) = 𝑃(𝑤1)𝑃(𝑤2|𝑤1)…𝑃(𝑤𝑛|𝑤𝑛−1)
Incorporating context words
For unigram counts, 𝑃(𝑤𝑘) is always non-zero if our dictionary is derived from the document collection
This won't be true of 𝑃(𝑤𝑘|𝑤𝑘−1); we need to smooth:
add-1 smoothing on this conditional distribution
Interpolate a unigram and a bigram (see the sketch below):
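A Python sketch of the interpolated bigram model; the weight lam and the toy token list are assumptions, not from the slides:

from collections import Counter

def interpolated_bigram_lm(tokens, lam=0.75):
    # P_interp(w_k | w_{k-1}) = lam * P(w_k | w_{k-1}) + (1 - lam) * P(w_k)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(word, prev):
        p_uni = unigrams[word] / total
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni

    return prob

tokens = "a stellar and versatile actress whose performance was praised".split()
P = interpolated_bigram_lm(tokens)
print(P("actress", "versatile"))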
Using a bigram language model
Noisy channel for real-word spell correction
Simplification: One error per sentence
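The details of this slide are not in the transcript. Under the one-error-per-sentence assumption, the usual recipe is to build candidate sentences by changing at most one word at a time and keep the one the language model and channel model jointly score highest. A Python sketch under that assumption; the helpers candidates, channel_logp, and lm_logp are hypothetical stand-ins:

def correct_sentence(words, candidates, channel_logp, lm_logp):
    # words        : observed sentence as a list of words
    # candidates   : word -> iterable of candidate corrections (incl. the word itself)
    # channel_logp : (observed, intended) -> log P(x|w); log P(w|w) is the no-error case
    # lm_logp      : list of words -> log P(W) under the language model
    best_sentence, best_score = words, None
    for i, w in enumerate(words):
        for c in candidates(w):
            # Replace only position i; all other words are assumed correct.
            sent = words[:i] + [c] + words[i + 1:]
            score = lm_logp(sent) + channel_logp(w, c)
            if best_score is None or score > best_score:
                best_sentence, best_score = sent, score
    return best_sentence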
Where to get the probabilities
Language model
Unigram
Bigram
Channel model
Same as for non-word spelling correction
Plus need probability for no error, P(w|w)
Probability of no error
What is the channel probability for a correctly typed word?
P("the"|"the")
If you have a big corpus, you can estimate this percent correct
But this value depends strongly on the application
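As a rough rule of thumb (not from the transcript): if about one word in N typed words contains an error, then P(w|w) ≈ 1 − 1/N, e.g. roughly 0.90 for one error in 10 words and roughly 0.99 for one error in 100; the remaining mass 1 − P(w|w) is shared among the edit probabilities P(x|w) with x ≠ w.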
Peter Norvig’s “thew” example
Improvements to channel model
Allow richer edits (Brill and Moore 2000)
ent → ant
ph → f
le → al
Incorporate pronunciation into the channel (Toutanova and Moore 2002)
Incorporate the device into the channel
Not all Android phones need to have the same error model
But spell correction may be done at the system level