probabilistic spelling correction
TRANSCRIPT
Probabilistic Spelling Correction
CE-324: Modern Information Retrieval
Sharif University of Technology
M. Soleymani
Fall 2016
Most slides have been adapted from: Profs. Manning, Nayak & Raghavan lectures (CS-276, Stanford)
Applications of spelling correction
Spelling Tasks
Spelling Error Detection
Spelling Error Correction:
Autocorrect: hte → the
Suggest a correction
Suggestion lists
Types of spelling errors
Non-word Errors
graffe → giraffe
Real-word Errors
Typographical errors
three → there
Cognitive Errors (homophones)
piece → peace
too → two
your → you're
Real-word correction almost always needs to be context-sensitive
Spelling correction steps
For each word w, generate a candidate set (see the sketch after this list):
Find candidate words with similar pronunciations
Find candidate words with similar spellings
Choose the best candidate
By a "Weighted edit distance" or "Noisy Channel" approach
Context-sensitive: we have to consider whether the surrounding words "make sense"
"Flying form Heathrow to LAX" → "Flying from Heathrow to LAX"
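As an illustration not taken from the slides, a minimal Python sketch of spelling-based candidate generation; the dictionary VOCAB and the example words are hypothetical:

import string

VOCAB = {"from", "form", "the", "there", "three", "flying", "to"}  # hypothetical dictionary

def edits1(word):
    # All strings one insertion, deletion, substitution, or transposition away.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = {L + R[1:] for L, R in splits if R}
    transposes = {L + R[1] + R[0] + R[2:] for L, R in splits if len(R) > 1}
    replaces = {L + c + R[1:] for L, R in splits if R for c in letters}
    inserts = {L + c + R for L, R in splits for c in letters}
    return deletes | transposes | replaces | inserts

def candidates(word):
    # Keep only candidates that are real words (plus the word itself if known).
    return (edits1(word) | {word}) & VOCAB

print(candidates("thre"))   # e.g. {'the', 'there', 'three'}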
Candidate Testing: Damerau-Levenshtein edit distance
Minimal edit distance between two strings (see the sketch after this list), where edits are:
Insertion
Deletion
Substitution
Transposition of two adjacent letters
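A straightforward dynamic-programming sketch of this distance, not from the slides; this is the restricted variant that counts an adjacent transposition as a single edit:

def damerau_levenshtein(s, t):
    # d[i][j] = distance between s[:i] and t[:j]
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
            if (i > 1 and j > 1 and s[i - 1] == t[j - 2]
                    and s[i - 2] == t[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[m][n]

assert damerau_levenshtein("acress", "actress") == 1  # one insertion
assert damerau_levenshtein("hte", "the") == 1         # one transposition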
Noisy channel intuition
Noisy channel
We see an observation 𝑥 of a misspelled word
Find the correct word 𝑤
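The decoder this implies, writing V for the vocabulary, is the standard noisy-channel rule:

\hat{w} = \arg\max_{w \in V} P(w \mid x) = \arg\max_{w \in V} \frac{P(x \mid w)\, P(w)}{P(x)} = \arg\max_{w \in V} P(x \mid w)\, P(w)

where P(w) is the language model (prior) and P(x|w) is the channel (error) model.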
Language Model
Take a big supply of words with T tokens:
P(w) = C(w) / T
where C(w) = # occurrences of w (see the sketch below)
Supply of words: your document collection
In other applications: you can take the supply to be typed queries (suitably filtered), when a static dictionary is inadequate
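A minimal Python sketch of this unigram prior; the corpus string is a placeholder, not from the slides:

from collections import Counter

def unigram_lm(corpus_text):
    # Unigram prior P(w) = C(w) / T over a whitespace-tokenized corpus.
    tokens = corpus_text.lower().split()
    counts = Counter(tokens)
    total = len(tokens)
    return lambda w: counts[w] / total

corpus = "two of the three versions were there two days later"  # hypothetical corpus
P = unigram_lm(corpus)
print(P("the"), P("two"))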
Unigram prior probability
Counts from 404,253,213 words in the Corpus of Contemporary American English (COCA)
Channel model probability
Error model probability, edit probability
Misspelled word x = x1, x2, x3, …, xm
Correct word w = w1, w2, w3, …, wn
P(x|w) = probability of the edit (deletion / insertion / substitution / transposition)
Calculating p(x|w)
Still a research question.
Can be estimated.
Some simple ways, e.g.:
Confusion matrix: a square 26×26 table that records how many times one letter was incorrectly used instead of another.
Usually there are four confusion matrices: deletion, insertion, substitution, and transposition.
Computing error probability: Confusion matrix
del[x,y]: count(xy typed as x)
ins[x,y]: count(x typed as xy)
sub[x,y]: count(y typed as x)
trans[x,y]: count(xy typed as yx)
Insertion and deletion conditioned on the previous character
Confusion matrix for substitution
The cell [o,e] in a substitution confusion matrix gives the count of times that e was substituted for o.
Channel model
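The equations on this slide are not in the transcript; the channel model these counts are normally plugged into (following Kernighan, Church, and Gale 1990) is, in LaTeX:

P(x \mid w) =
\begin{cases}
\dfrac{\mathrm{del}[w_{i-1}, w_i]}{\mathrm{count}(w_{i-1} w_i)} & \text{if deletion}\\[4pt]
\dfrac{\mathrm{ins}[w_{i-1}, x_i]}{\mathrm{count}(w_{i-1})} & \text{if insertion}\\[4pt]
\dfrac{\mathrm{sub}[x_i, w_i]}{\mathrm{count}(w_i)} & \text{if substitution}\\[4pt]
\dfrac{\mathrm{trans}[w_i, w_{i+1}]}{\mathrm{count}(w_i w_{i+1})} & \text{if transposition}
\end{cases}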
Smoothing probabilities: Add-1 smoothing
|A|: size of the character alphabet
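With add-1 smoothing the substitution case, for example, becomes (the other edit types are analogous):

P(x \mid w) = \dfrac{\mathrm{sub}[x_i, w_i] + 1}{\mathrm{count}(w_i) + |A|}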
Channel model for acress
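The candidate table for "acress" is not in the transcript. As an illustration only, a Python sketch of how the usual candidates (actress, cress, caress, access, across, acres) would be ranked; the probabilities below are made-up placeholders, not the real confusion-matrix or COCA values:

# Hypothetical numbers purely to show the ranking computation;
# the real values come from the confusion matrices and the unigram prior.
candidates = {
    #  word      (P(x|w),  P(w))
    "actress": (1e-4, 2.7e-5),
    "cress":   (8e-6, 1.0e-8),
    "caress":  (5e-6, 4.0e-7),
    "access":  (2e-6, 3.0e-5),
    "across":  (1e-5, 2.0e-4),
    "acres":   (6e-5, 3.0e-5),
}

ranked = sorted(candidates.items(),
                key=lambda kv: kv[1][0] * kv[1][1],  # P(x|w) * P(w)
                reverse=True)
for word, (channel, prior) in ranked:
    print(f"{word:8s}  P(x|w)*P(w) = {channel * prior:.3g}")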
Noisy channel for real-word spell correction
Given a sentence w1,w2,w3,…,wn
Generate a set of candidates for each word wi
Candidate(w1) = {w1, w'1, w''1, w'''1, …}
Candidate(w2) = {w2, w'2, w''2, w'''2, …}
Candidate(wn) = {wn, w'n, w''n, w'''n, …}
Choose the sequence W that maximizes P(W)
Incorporating context words: Context-sensitive spelling correction
Determining whether actress or across is appropriate will require looking at the context of use
A bigram language model conditions the probability of a word on (just) the previous word
𝑃(𝑤1…𝑤𝑛) = 𝑃(𝑤1)𝑃(𝑤2|𝑤1)…𝑃(𝑤𝑛|𝑤𝑛−1)
Incorporating context words
For unigram counts, 𝑃(𝑤𝑘) is always non-zero if our dictionary is derived from the document collection
This won't be true of 𝑃(𝑤𝑘|𝑤𝑘−1); we need to smooth:
add-1 smoothing on this conditional distribution
Interpolate a unigram and a bigram (see the sketch below):
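A Python sketch of the interpolated bigram model; the weight lam and the toy token list are assumptions, not from the slides:

from collections import Counter

def interpolated_bigram_lm(tokens, lam=0.75):
    # P_interp(w_k | w_{k-1}) = lam * P(w_k | w_{k-1}) + (1 - lam) * P(w_k)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    total = len(tokens)

    def prob(word, prev):
        p_uni = unigrams[word] / total
        p_bi = bigrams[(prev, word)] / unigrams[prev] if unigrams[prev] else 0.0
        return lam * p_bi + (1 - lam) * p_uni

    return prob

tokens = "a stellar and versatile actress whose performance was praised".split()
P = interpolated_bigram_lm(tokens)
print(P("actress", "versatile"))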
Using a bigram language model
Noisy channel for real-word spell correction
Simplification: One error per sentence
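The details of this slide are not in the transcript. Under the one-error-per-sentence assumption, the usual recipe is to build candidate sentences by changing at most one word at a time and keep the one the language model and channel model jointly score highest. A Python sketch under that assumption; the helpers candidates, channel_logp, and lm_logp are hypothetical stand-ins:

def correct_sentence(words, candidates, channel_logp, lm_logp):
    # words        : observed sentence as a list of words
    # candidates   : word -> iterable of candidate corrections (incl. the word itself)
    # channel_logp : (observed, intended) -> log P(x|w); log P(w|w) is the no-error case
    # lm_logp      : list of words -> log P(W) under the language model
    best_sentence, best_score = words, None
    for i, w in enumerate(words):
        for c in candidates(w):
            # Replace only position i; all other words are assumed correct.
            sent = words[:i] + [c] + words[i + 1:]
            score = lm_logp(sent) + channel_logp(w, c)
            if best_score is None or score > best_score:
                best_sentence, best_score = sent, score
    return best_sentence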
Where to get the probabilities
Language model
Unigram
Bigram
Channel model
Same as for non-word spelling correction
Plus need probability for no error, P(w|w)
Probability of no error
What is the channel probability for a correctly typed word?
P("the"|"the")
If you have a big corpus, you can estimate this percent correct
But this value depends strongly on the application
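As a rough rule of thumb (not from the transcript): if about one word in N typed words contains an error, then P(w|w) ≈ 1 − 1/N, e.g. roughly 0.90 for one error in 10 words and roughly 0.99 for one error in 100; the remaining mass 1 − P(w|w) is shared among the edit probabilities P(x|w) with x ≠ w.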
Peter Norvig’s “thew” example
Improvements to channel model
Allow richer edits (Brill and Moore 2000)
ent → ant
ph → f
le → al
Incorporate pronunciation into the channel (Toutanova and Moore 2002)
Incorporate the device into the channel
Not all Android phones need to have the same error model
But spell correction may be done at the system level