mathematical foundationsrafea/csce569/slides/snlp-math-foundation… · 3/28/2004 mathematical...

28
3/28/2004 Mathematical Foundation 1 Mathematical Foundations Extracted from presentation on the following site http://nlp.stanford.edu/fsnlp/rosario-math- foundation.ppt Ahmed Rafea

Upload: others

Post on 26-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 1

Mathematical FoundationsExtracted from presentation on the following site http://nlp.stanford.edu/fsnlp/rosario-math-foundation.ppt

Ahmed Rafea

Page 2: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 2

Motivations n Statistical NLP aims to do statistical

inference for the field of NLn Statistical inference consists of

taking some data (generated in accordance with some unknown probability distribution) and then making some inference about this distribution.

Page 3: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 3

Motivations (Cont)n An example of statistical inference is

the task of language modeling (ex how to predict the next word given the previous words)

n In order to do this, we need a modelof the language.

n Probability theory helps us finding such model

Page 4: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 4

Probability Theoryn How likely it is that something will

happenn Sample space O is listing of all

possible outcome of an experimentn Event A is a subset of On Probability function (or distribution)

[ ]0,1O:P →

Page 5: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 5

Prior Probabilityn Prior probability: the probability

before we consider any additional knowledge

)(AP

Page 6: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 6

Conditional probabilityn Sometimes we have partial knowledge

about the outcome of an experimentn Conditional (or Posterior) Probabilityn Suppose we know that event B is truen The probability that A is true given

the knowledge about B is expressed by

)|( BAP

Page 7: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 7

Conditional probability (cont)

)()|()()|(),(

APABPBPBAPBAP

==

n Joint probability of A and B.n 2-dimensional table with a value in every

cell giving the probability of that specific state occurring

Page 8: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 8

Chain Rule

P(A,B) = P(A|B)P(B)= P(B|A)P(A)

P(A,B,C,D…) = P(A)P(B|A)P(C|A,B)P(D|A,B,C..)

Page 9: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 9

(Conditional) independencen Two events A e B are independent of

each other if P(A) = P(A|B)

n Two events A and B are conditionally independent of each other given C ifP(A|C) = P(A|B,C)

Page 10: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 10

Bayes’ Theoremn Bayes’ Theorem lets us swap the

order of dependence between eventsn We saw that n Bayes’ Theorem:

P(B)B)P(A,B)|P(A =

P(B)A)P(A)|P(BB)|P(A =

Page 11: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 11

Example n S:stiff neck, M: meningitisn P(S|M) =0.5, P(M) = 1/50,000

P(S)=1/20n I have stiff neck, should I worry?

0002.020/1

000,50/15.0)(

)()|()|(

=

=SP

MPMSPSMP

Page 12: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 12

Random Variablesn So far, event space that differs with

every problem we look atn Random variables (RV) X allow us to

talk about the probabilities of numerical values that are related to the event space

ℑ→Ωℜ→Ω

::

XX

Page 13: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 13

Expectation

xXA

ApxXpxp

x

x

=Ω∈====)(:

)()()(

ωω

n The Expectation is the mean or average of a RV

∑ ==x

xxpxE µ)()(

∑ =x

xp 1)( 1)(0 ≤≤ xp

Page 14: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 14

Variancen The variance of a RV is a measure of

whether the values of the RV tend to be consistent over trials or to vary a lot

n s is the standard deviation

222

2

)()(

)))((()(

σ=−=

−=

XEXE

XEXEXVar

Page 15: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 15

Back to the Language Modeln In general, for language events, P is

unknownn We need to estimate P, (or model M

of the language)n We’ll do this by looking at evidence

about what P must be based on a sample of data

Page 16: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 16

Estimation of P

n Frequentist statistics

n Bayesian statistics

Page 17: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 17

Frequentist Statistics n Relative frequency: proportion of times an

outcome u occurs

n C(u) is the number of times u occurs in N trials

n For the relative frequency tends to stabilize around some number: probability estimates

NC(u)fu =

∞→N

Page 18: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 18

Frequentist Statistics (cont)n Two different approach:

n Parametricn Non-parametric (distribution free)

Page 19: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 19

Parametric Methodsn Assume that some phenomenon in language

is acceptably modeled by one of the well-known family of distributions (such binomial, normal)

n We have an explicit probabilistic model of the process by which the data was generated, and determining a particular probability distribution within the family requires only the specification of a few parameters (less training data)

Page 20: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 20

Non-Parametric Methodsn No assumption about the underlying

distribution of the datan For ex, simply estimate P empirically

by counting a large number of random events is a distribution-free method

n Less prior information, more training data needed

Page 21: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 21

Entropyn X: discrete RV, p(X) n Entropy (or self-information)

n Entropy measures the amount of information in a RV; it’s the average length of the message needed to transmit an outcome of that variable using the optimal code

p(x)p(x)logH(X)H(p)Xx

2∑∈

−==

Page 22: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 22

Entropy (cont)

=

=

−=

p(x)1log E

p(x)1p(x)log

p(x)p(x)logH(X)

2

Xx2

Xx2

1p(X)0H(X)0H(X)

=⇔=

≥ i.e when the value of Xis determinate, hence providing no new information

Page 23: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 23

Entropy and Linguisticsn Entropy is measure of uncertainty.

The more we know about something the lower the entropy.

n If a language model captures more of the structure of the language, then the entropy should be lower.

n We can use entropy as a measure of the quality of our models

Page 24: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 24

The Noisy Channel Modeln The aim is to optimize in terms of

throughput and accuracy the communication of messages in the presence of noise in the channel

n Duality between compression (achieved by removing all redundancy) and transmission accuracy (achieved by adding controlled redundancy so that the input can be recovered in the presence of noise)

Page 25: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 25

The Noisy Channel Modeln Goal: encode the message in such a way

that it occupies minimal space while still containing enough redundancy to be able to detect and correct errors

W X W*Yencoder decoderChannel

p(y|x)messageinput to channel

Output fromchannel

Attempt to reconstruct message based on output

Page 26: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 26

The Noisy Channel Modeln Channel capacity: rate at which one can

transmit information through the channel with an arbitrary low probability of being unable to recover the input from the output

n

n We reach a channel capacity if we manage to design an input code X whose distribution p(X) maximizes I between input and output

Y)I(X;max Cp(X)

=

Page 27: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 27

Linguistics and the Noisy Channel Modeln In linguistic we can’t control the encoding

phase. We want to decode the output to give the most likely input.

i)|p(i)p(oargmax p(o)

i)|p(i)p(oargmax o)|p(iargmax Iiii

===ˆ

decoderNoisy Channel

p(o|I)I O I

Page 28: Mathematical Foundationsrafea/CSCE569/slides/snlp-math-foundation… · 3/28/2004 Mathematical Foundation 3 Motivations (Cont) nAn example of statistical inference is the task of

3/28/2004 Mathematical Foundation 28

The noisy Channel Model

n p(i) is the language model and is the channel probability

n Ex: Machine translation, optical character recognition, speech recognition

i)|p(i)p(oargmax p(o)

i)|p(i)p(oargmax o)|p(iargmax Iiii

===ˆ

i)|p(o