Statistical Estimation, Vasileios Hatzivassiloglou, University of Texas at Dallas
DESCRIPTION
Instance profiles. Given k observations of maximum length n, construct a |Σ|×n matrix A (profile) where entry A_ij is the estimated probability that the ith letter occurs in position j. One way to estimate A_ij is to count each letter occurring at this position (c_ij); then A_ij = c_ij / k. This is maximum likelihood estimation (MLE). The estimate becomes better as k increases.
TRANSCRIPT
Statistical Estimation
Vasileios Hatzivassiloglou, University of Texas at Dallas
2
Obama contract at intrade.com
[Figure: price chart of the Obama contract on the intrade.com prediction market]
3
Instance profiles
• Given k observations of maximum length n, construct a |Σ|×n matrix A (profile) where entry A_ij is the estimated probability that the ith letter occurs in position j
• One way to estimate A_ij is to count each letter occurring at this position (c_ij); then

$$A_{ij} = \frac{c_{ij}}{k}$$

• This is maximum likelihood estimation (MLE)
• Estimate becomes better as k increases
4
Example data
• 23 sample motif instances for the cyclic AMP receptor transcription factor (positions 3-9)
TTGTGGC TTTTGAT AAGTGTC ATTTGCA CTGTGAG
ATGCAAA GTGTTAA ATTTGAA TTGTGAT ATTTATT
ACGTGAT ATGTGAG TTGTGAG CTGTAAC CTGTGAA
TTGTGAC GCCTGAC TTGTGAT TTGTGAT GTGTGAA
CTGTGAC ATGAGAC TTGTGAG
5
Calculated profile
1 2 3 4 5 6 7
A 0.348 0.043 0.000 0.043 0.130 0.826 0.261
C 0.174 0.087 0.043 0.043 0.000 0.043 0.304
G 0.130 0.000 0.783 0.000 0.826 0.043 0.174
T 0.348 0.870 0.174 0.913 0.043 0.087 0.261
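The table above can be checked by recomputing the profile from the 23 instances. A minimal Python sketch (the `build_profile` helper and variable names are mine, not from the lecture):

```python
from collections import Counter

# The 23 motif instances (positions 3-9) from the example-data slide.
instances = [
    "TTGTGGC", "TTTTGAT", "AAGTGTC", "ATTTGCA", "CTGTGAG",
    "ATGCAAA", "GTGTTAA", "ATTTGAA", "TTGTGAT", "ATTTATT",
    "ACGTGAT", "ATGTGAG", "TTGTGAG", "CTGTAAC", "CTGTGAA",
    "TTGTGAC", "GCCTGAC", "TTGTGAT", "TTGTGAT", "GTGTGAA",
    "CTGTGAC", "ATGAGAC", "TTGTGAG",
]

def build_profile(instances, alphabet="ACGT"):
    """MLE profile: entry [letter][j] = c_ij / k."""
    k = len(instances)
    n = len(instances[0])
    profile = {letter: [0.0] * n for letter in alphabet}
    for j in range(n):
        counts = Counter(seq[j] for seq in instances)
        for letter in alphabet:
            profile[letter][j] = counts[letter] / k
    return profile

profile = build_profile(instances)
# Matches the calculated-profile slide, e.g. A at position 6 is 19/23 ≈ 0.826
print(round(profile["A"][5], 3))
```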
6
Probability of a motif
• Suppose that we consider M as a candidate motif consensus
• How do we find the best M given the observations in A?
• Assuming independence of positions,
$$P(M) = P(M_1 M_2 \cdots M_n) = A_{M_1,1}\, A_{M_2,2} \cdots A_{M_n,n}$$
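Under the independence assumption, P(M) is just the product of the profile entries along M, and the best M takes the most probable letter in each column. A sketch using the calculated profile from the previous slide (function and variable names are mine):

```python
# Profile values copied from the "Calculated profile" slide (rows A, C, G, T).
profile = {
    "A": [0.348, 0.043, 0.000, 0.043, 0.130, 0.826, 0.261],
    "C": [0.174, 0.087, 0.043, 0.043, 0.000, 0.043, 0.304],
    "G": [0.130, 0.000, 0.783, 0.000, 0.826, 0.043, 0.174],
    "T": [0.348, 0.870, 0.174, 0.913, 0.043, 0.087, 0.261],
}

def motif_probability(motif, profile):
    """P(M) = A[M1,1] * A[M2,2] * ... * A[Mn,n] (positions independent)."""
    p = 1.0
    for j, letter in enumerate(motif):
        p *= profile[letter][j]
    return p

# Best M: the most probable letter in each column. Note that position 1
# is a tie between A and T (0.348 each); max() keeps the first, A.
consensus = "".join(
    max("ACGT", key=lambda letter: profile[letter][j]) for j in range(7)
)
print(consensus)  # ATGTGAC
```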
7
Maximum likelihood estimation
• General method for estimating unknown parameters when we have
– a sample of values that depend on these parameters
– a formula specifying the probability of obtaining these values given the parameters

$$\hat{\theta} = \arg\max_{\theta} P(X_1, X_2, \ldots, X_n \mid \theta)$$
8
MLE example: three coins
• Suppose we have three coins with probability of heads ⅓, ½, and ⅔
• One of them is used to generate a series of 20 tosses and we observe 11 heads
• θ = the heads probability of the coin used in the experiment
• Binomial distribution for the number of heads
9
Binomial distribution
• Count of one of two possible outcomes in a series of independent events
• The probabilities of the two outcomes are constant across events
• An example of iid events (independent, identically distributed)
10
Binomial probability mass
• If the probability of one outcome (let’s call it A) is p and there are n events
– The probability of the other outcome is 1 − p
– The probability of obtaining a particular sequence of outcomes with m A’s is $p^m (1-p)^{n-m}$
– There are $\binom{n}{m}$ sequences with the same number m of outcomes A
• Overall

$$P(m \text{ A's in } n \text{ events}) = \binom{n}{m}\, p^m (1-p)^{n-m}$$
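The mass function is one line of Python; a minimal sketch using the standard library's `math.comb` (the function name `binomial_pmf` is mine):

```python
from math import comb

def binomial_pmf(m, n, p):
    """P(m A's in n events) = C(n, m) * p^m * (1-p)^(n-m)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

# The probabilities over all possible counts m sum to 1.
total = sum(binomial_pmf(m, 20, 0.5) for m in range(21))
print(round(binomial_pmf(11, 20, 0.5), 4), total)  # 0.1602 1.0
```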
11
MLE example: three coins

$$P(\text{11 heads out of 20 tosses} \mid \theta = \tfrac{1}{3}) = \binom{20}{11}\left(\tfrac{1}{3}\right)^{11}\left(\tfrac{2}{3}\right)^{9} = 0.0247$$

$$P(\text{11 heads out of 20 tosses} \mid \theta = \tfrac{1}{2}) = \binom{20}{11}\left(\tfrac{1}{2}\right)^{11}\left(\tfrac{1}{2}\right)^{9} = 0.1602$$

$$P(\text{11 heads out of 20 tosses} \mid \theta = \tfrac{2}{3}) = \binom{20}{11}\left(\tfrac{2}{3}\right)^{11}\left(\tfrac{1}{3}\right)^{9} = 0.0987$$

• Result: Choose θ = ½
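The three likelihoods can be checked numerically; a sketch (variable names are mine):

```python
from math import comb

def binomial_pmf(m, n, p):
    """P(m A's in n events) = C(n, m) * p^m * (1-p)^(n-m)."""
    return comb(n, m) * p**m * (1 - p) ** (n - m)

# Likelihood of the observed 11 heads in 20 tosses under each candidate coin.
candidates = [1 / 3, 1 / 2, 2 / 3]
likelihood = {p: binomial_pmf(11, 20, p) for p in candidates}
best = max(candidates, key=lambda p: likelihood[p])
print(best)  # 0.5
```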
12
MLE example: unknown coins
• θ can take any value between 0 and 1
• m heads in n tosses

$$P(m \text{ heads in } n \text{ tosses} \mid \theta) = \binom{n}{m}\, \theta^m (1-\theta)^{n-m}$$

• Find the maximizing θ by setting the derivative to zero and solving

$$\frac{dP}{d\theta} = 0$$
13
Setting the derivative to zero

$$\frac{dP}{d\theta} = \binom{n}{m}\left[m\,\theta^{m-1}(1-\theta)^{n-m} - (n-m)\,\theta^{m}(1-\theta)^{n-m-1}\right] = 0$$

Factoring out $\theta^{m-1}(1-\theta)^{n-m-1}$:

$$\theta^{m-1}(1-\theta)^{n-m-1}\left[m(1-\theta) - (n-m)\,\theta\right] = 0$$

This gives three solutions:

$$\theta = 0, \qquad \theta = 1, \qquad m(1-\theta) - (n-m)\,\theta = m - n\theta = 0 \;\Rightarrow\; \theta = \frac{m}{n}$$
14
MLE for binomial
• Of the three solutions, θ = 0 and θ = 1 result in P(X1,X2,...,Xn | θ) = 0, i.e., local minima
• On the other hand, for 0<θ<1, P(X1,X2,...,Xn | θ) > 0, so θ = m/n must be a local maximum
• Therefore the MLE estimate is $\hat{\theta} = \dfrac{m}{n}$
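The closed form can be sanity-checked by scanning a grid of θ values; a sketch for the 11-of-20 example (names are mine):

```python
from math import comb

def likelihood(theta, m=11, n=20):
    """P(m heads in n tosses | theta)."""
    return comb(n, m) * theta**m * (1 - theta) ** (n - m)

# The likelihood is unimodal on (0, 1); a fine grid scan should peak at m/n.
grid = [i / 1000 for i in range(1001)]
best = max(grid, key=likelihood)
print(best)  # 0.55, i.e. m/n = 11/20
```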
15
Properties of estimators
• The estimation error for a given sample is $\hat{X} - x$, where x is the unknown true value
• An estimator is a random variable
– because it depends on the sample
• The mean square error represents the overall quality of the estimation across all samples:

$$\mathrm{MSE}(\hat{X}) = \mathrm{E}\left[(\hat{X} - x)^2\right]$$
16
Expected values
• Recall that the expected value of a discrete random variable X is defined as

$$\mathrm{E}(X) = \sum_{x \,\in\, \text{all possible values of } X} x\, p(x)$$

• The expected value of a dependent random variable f(X) is

$$\mathrm{E}(f(X)) = \sum_{x \,\in\, \text{all possible values of } X} f(x)\, p(x)$$

• For continuous distributions, replace the sum with an integral
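For a concrete case, take a fair six-sided die; a sketch using exact fractions (the die example is mine, not from the slides):

```python
from fractions import Fraction

# Fair die: p(x) = 1/6 for x in {1, ..., 6}.
p = Fraction(1, 6)
values = range(1, 7)

e_x = sum(x * p for x in values)             # E(X) = 7/2
e_x_squared = sum(x**2 * p for x in values)  # E(f(X)) with f(x) = x^2, = 91/6
print(e_x, e_x_squared)  # 7/2 91/6
```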
17
Bias in estimation
• An estimator is unbiased if $\mathrm{E}(\hat{\theta}) = \theta$
• MLE is not necessarily unbiased
• Example: standard deviation
– Is the most commonly used measure of dispersion in a data set
– For a random variable X, it is defined as

$$\sigma = \sqrt{\mathrm{E}\left[(X - \mathrm{E}(X))^2\right]}$$
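The bias of MLE can be seen on a tiny example: the MLE variance estimator for two Bernoulli(½) observations. Enumerating all four equally likely samples exactly gives an expectation of 1/8, not the true variance 1/4 (the example is mine, not from the slides):

```python
from fractions import Fraction
from itertools import product

n = 2  # sample size
expected_s2 = Fraction(0)
for sample in product([0, 1], repeat=n):  # each sample has probability 1/4
    mean = Fraction(sum(sample), n)
    # MLE variance estimate: (1/n) * sum of squared deviations
    s2_mle = Fraction(1, n) * sum((x - mean) ** 2 for x in sample)
    expected_s2 += Fraction(1, 2**n) * s2_mle

# True variance of Bernoulli(1/2) is 1/4; E[s2_mle] = 1/8, so s2_mle is biased.
print(expected_s2)  # 1/8
```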
18
Estimators of standard deviation
• MLE estimator

$$s_{\mathrm{MLE}} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}(X_i - \bar{X})^2}$$

where $\bar{X} = \frac{1}{N}\sum_{i=1}^{N} X_i$

• $s_{\mathrm{MLE}}$ is biased
• “Almost unbiased” estimator

$$s_{\mathrm{AU}} = \sqrt{\frac{1}{N-1}\sum_{i=1}^{N}(X_i - \bar{X})^2}$$

($s_{\mathrm{AU}}^2$ is an unbiased estimator of σ²)
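A sketch comparing the two estimators on a small dataset (the data values are mine, chosen so the MLE result comes out even):

```python
from math import sqrt

def s_mle(xs):
    """Biased MLE estimator: divide the sum of squared deviations by N."""
    n = len(xs)
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / n)

def s_au(xs):
    """'Almost unbiased' estimator: divide by N - 1 instead."""
    n = len(xs)
    mean = sum(xs) / n
    return sqrt(sum((x - mean) ** 2 for x in xs) / (n - 1))

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]  # mean 5, squared deviations sum to 32
print(s_mle(data))  # 2.0  (sqrt(32/8))
print(s_au(data))   # sqrt(32/7), always >= s_mle
```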