Short Resume of Statistical Terms Fall 2012 By Yaohang Li, Ph.D.

Review
• Last Class
  – Introduction to Monte Carlo
• This Class
  – Important Statistics Terms
    • Random Events
      – Independence of Random Events
      – Axioms on Random Events
    • Random Variables
      – Independence of Random Variables
    • CDF
    • PDF
    • Expectation
      – Characteristics of Expectation
    • Moments of a Distribution
      – rth moment
      – rth central moment
    • Mean
    • Variance
    • Standard Deviation
    • Covariance
      – Characteristics of covariance
    • Review of Statistics and Probability Terms
    • Important Distributions
    • Central Limit Theorem
    • Estimand and Estimator
• Next Class
  – Monte Carlo for Integration

Random Events and Probability
• Random Event
  – An event that has a chance of happening
• Probability
  – A numerical measure of that chance
  – Lies between 0 and 1, inclusive
• Terminology
  – P(A): the probability that event A occurs
  – P(A+B+…): the probability that at least one of the events A, B, … occurs
  – P(AB…): the probability that all of the events A, B, … occur
  – P(A|B): the probability that event A occurs when it is known that event B occurs, called the conditional probability of A given B

Axioms in Probability

• P(A+B+…) ≤ P(A) + P(B) + …
  – If at most one of the events A, B, … can occur, they are called exclusive, and the equality holds
  – If at least one of the events A, B, … must occur, they are called exhaustive, and P(A+B+…) = 1
• P(AB) = P(A|B)P(B)
  – If P(A|B) = P(A), A and B are independent
    • The chance of A occurring is uninfluenced by the occurrence of B
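The terminology and axioms above can be checked exactly on a small sample space. This is a minimal sketch, assuming two fair six-sided dice (an illustrative choice, not from the slides); `prob` and `cond_prob` are hypothetical helper names, and exact fractions keep every comparison exact.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely outcomes of two fair dice (illustrative assumption).
OMEGA = list(product(range(1, 7), repeat=2))

def prob(event):
    """P(event): fraction of outcomes for which the predicate `event` is true."""
    return Fraction(sum(1 for w in OMEGA if event(w)), len(OMEGA))

def cond_prob(event, given):
    """P(event | given): restrict the sample space to outcomes where `given` holds."""
    kept = [w for w in OMEGA if given(w)]
    return Fraction(sum(1 for w in kept if event(w)), len(kept))

A = lambda w: w[0] == 6           # first die shows 6
B = lambda w: w[1] == 6           # second die shows 6
C = lambda w: w[0] + w[1] == 2    # total of 2; never occurs together with A

p_union = prob(lambda w: A(w) or B(w))    # P(A+B)
p_joint = prob(lambda w: A(w) and B(w))   # P(AB)
p_a_given_b = cond_prob(A, B)             # P(A|B)

assert p_union <= prob(A) + prob(B)                          # general inequality
assert prob(lambda w: A(w) or C(w)) == prob(A) + prob(C)     # exclusive: equality holds
assert prob(lambda w: A(w) or not A(w)) == 1                 # exhaustive: probability 1
assert p_joint == p_a_given_b * prob(B)                      # P(AB) = P(A|B)P(B)
assert p_a_given_b == prob(A)                                # so A and B are independent
print(p_union, p_joint, p_a_given_b)  # 11/36 1/36 1/6
```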

Random Variables and Distributions

• Random variable (η)
  – A number that characterizes a set of exclusive and exhaustive events
• Cumulative Distribution Function (CDF)
  – F(y) = P(η ≤ y)
  – The probability that the event which occurs has a value not exceeding a prescribed y
  – F(−∞) = 0 and F(+∞) = 1
  – F(y) is a non-decreasing function of y
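A minimal sketch of a CDF for a discrete random variable, assuming η is the score of a fair die (illustrative). It checks the properties listed above on a grid of points; `PMF` and `cdf` are hypothetical names.

```python
from fractions import Fraction

# A fair die: values 1..6, each with probability 1/6 (illustrative assumption).
PMF = {y: Fraction(1, 6) for y in range(1, 7)}

def cdf(y):
    """F(y) = P(eta <= y): add up the probability of every value not exceeding y."""
    return sum((p for v, p in PMF.items() if v <= y), Fraction(0))

grid = [x / 2 for x in range(-4, 20)]   # -2.0, -1.5, ..., 9.5
values = [cdf(y) for y in grid]

assert all(a <= b for a, b in zip(values, values[1:]))  # non-decreasing
assert cdf(-1e9) == 0 and cdf(1e9) == 1                 # F(-inf) = 0, F(+inf) = 1
print(cdf(3), cdf(3.5))  # 1/2 1/2  (a step function, flat between the points y_i)
```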

Expectation

• If g(η) is a function of η, the expectation (or mean value) of g is denoted and defined by

$$E\,g(\eta) = \int g(y)\, dF(y)$$

  – A Stieltjes integral, taken over all values of y
• Explanation
  – Continuous random events
    • F(y) is continuous and f(y) = dF(y)/dy is its derivative, so

$$E\,g(\eta) = \int g(y)\, f(y)\, dy$$

  – Discrete random events
    • F(y) is a step function and f_i is the height of the step at the point y_i, so

$$E\,g(\eta) = \sum_i g(y_i)\, f_i$$

• Probability Density Function (pdf)
  – f(y) (continuous case) and f_i (discrete case) are the probability density functions
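The two special cases of the Stieltjes integral can be computed directly. This sketch assumes a fair die for the discrete case and the uniform density f(y) = 1 on [0, 1] for the continuous case, with g(y) = y² in both; approximating the integral by the midpoint rule is a numerical choice made here, not part of the slide.

```python
from fractions import Fraction

def expect_discrete(g, points, weights):
    """E g(eta) = sum_i g(y_i) f_i for a step-function CDF with jumps f_i at y_i."""
    return sum(g(y) * f for y, f in zip(points, weights))

def expect_continuous(g, f, a, b, n=100_000):
    """E g(eta) = integral of g(y) f(y) dy over [a, b], approximated by the midpoint rule."""
    h = (b - a) / n
    return sum(g(a + (k + 0.5) * h) * f(a + (k + 0.5) * h) for k in range(n)) * h

# Fair die (discrete): E(eta^2) = (1 + 4 + 9 + 16 + 25 + 36) / 6 = 91/6.
die = expect_discrete(lambda y: y * y, range(1, 7), [Fraction(1, 6)] * 6)

# Uniform density f(y) = 1 on [0, 1] (continuous): E(eta^2) = 1/3.
unif = expect_continuous(lambda y: y * y, lambda y: 1.0, 0.0, 1.0)
print(die, round(unif, 6))  # 91/6 0.333333
```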

More on Expectation

• The statistical physicist uses another notation for expectation
  – Suppose p_i is the probability density function
• What if g(η) is a constant function?
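The slide's own formula did not survive extraction. A common convention, assumed here, writes the expectation with angle brackets, ⟨g⟩ = Σ_i g_i p_i, where g_i is the value of g in state i. Either way, the constant-function question has a one-line answer, because F rises from 0 to 1 (equivalently, the p_i sum to 1):

```latex
\langle g \rangle = \sum_i g_i\, p_i,
\qquad
g(\eta) \equiv c \;\Longrightarrow\;
E\,c = \int c \, dF(y) = c\,\bigl[F(+\infty) - F(-\infty)\bigr] = c\,(1 - 0) = c .
```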

Linear Combination of the Expectation Values
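The body of this slide did not survive extraction; presumably it states that expectation is linear, E{a g_1(η) + b g_2(η)} = a E g_1(η) + b E g_2(η), which follows from the linearity of the integral. The sketch below checks that identity exactly for a fair die, with the constants a, b and the functions g_1, g_2 chosen arbitrarily for illustration.

```python
from fractions import Fraction

PMF = {y: Fraction(1, 6) for y in range(1, 7)}  # fair die (assumed for illustration)

def E(g):
    """Expectation of g(eta) under PMF."""
    return sum(g(y) * p for y, p in PMF.items())

a, b = 3, -2
g1 = lambda y: y
g2 = lambda y: y ** 2

lhs = E(lambda y: a * g1(y) + b * g2(y))   # expectation of the linear combination
rhs = a * E(g1) + b * E(g2)                # linear combination of the expectations
assert lhs == rhs
print(lhs)  # 3*(7/2) - 2*(91/6) = -119/6
```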

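Multi-dimensional Distribution

• Multi-dimensional Random Variable
  – Represented using a vector η = (η_1, η_2, …, η_k)
• Multi-dimensional CDF
  – F(y) = P(η ≤ y)
  – η ≤ y means that each coordinate of η is not greater than the corresponding coordinate of y
• Expectation

$$E\,g(\boldsymbol{\eta}) = \int g(\mathbf{y})\, dF(\mathbf{y})$$

  – Continuous multidimensional events

$$E\,g(\boldsymbol{\eta}) = \int g(\mathbf{y})\, f(\mathbf{y})\, d\mathbf{y}$$

    • where

$$f(\mathbf{y}) = f(y_1, y_2, \ldots, y_k) = \frac{\partial^k F(y_1, y_2, \ldots, y_k)}{\partial y_1\, \partial y_2 \cdots \partial y_k}$$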
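A two-dimensional discrete example makes the coordinate-wise meaning of η ≤ y concrete. This sketch assumes η = (η_1, η_2) is the pair of scores of two fair dice; in this discrete case the integral becomes a sum over the 36 points, each weighted by its probability.

```python
from fractions import Fraction
from itertools import product

# Joint pmf of eta = (eta1, eta2): two fair dice, 36 equally likely points (assumed).
JOINT = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}

def joint_cdf(y):
    """F(y) = P(eta <= y), where <= must hold coordinate by coordinate."""
    return sum((p for w, p in JOINT.items() if all(wi <= yi for wi, yi in zip(w, y))),
               Fraction(0))

def expect(g):
    """E g(eta) = sum over points y of g(y) times the probability of y."""
    return sum(g(w) * p for w, p in JOINT.items())

print(joint_cdf((2, 3)))                # (2*3)/36 = 1/6
print(expect(lambda w: w[0] * w[1]))    # 49/4
```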

Independence of Random Variables

• Consider a set of exhaustive and exclusive events, each characterized by a pair of numbers η and ζ, for which F(y, z) is the distribution. G(y) is the CDF of η and H(z) is the CDF of ζ.
  – F(y, z) = P(η ≤ y, ζ ≤ z)
  – G(y) = P(η ≤ y)
  – H(z) = P(ζ ≤ z)
• If it so happens that
  – F(y, z) = G(y)H(z) for all y and z
  – then the random variables η and ζ are called independent
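The definition can be tested mechanically: build F, G and H from a joint distribution and compare F(y, z) with G(y)H(z). The sketch assumes two illustrative joint distributions, two independent fair dice and a pair with ζ = η; because these CDFs only jump at integers, checking the integer grid 0 to 7 is enough here. The helper names are hypothetical.

```python
import math
from fractions import Fraction
from itertools import product

def cdfs(joint):
    """Return F(y, z), G(y), H(z) for a joint pmf {(eta, zeta): probability}."""
    def F(y, z):
        return sum((p for (a, b), p in joint.items() if a <= y and b <= z), Fraction(0))
    def G(y):
        return F(y, math.inf)   # P(eta <= y)
    def H(z):
        return F(math.inf, z)   # P(zeta <= z)
    return F, G, H

def independent(joint, grid):
    """Test F(y, z) == G(y) H(z) at every (y, z) in the grid."""
    F, G, H = cdfs(joint)
    return all(F(y, z) == G(y) * H(z) for y in grid for z in grid)

grid = range(0, 8)
two_dice = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
same_die = {(k, k): Fraction(1, 6) for k in range(1, 7)}   # zeta = eta

print(independent(two_dice, grid), independent(same_die, grid))  # True False
```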

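Characteristics of Expectations

• Holds whether or not the random variables η_i are independent:

$$E\Bigl(\sum_i g_i(\eta_i)\Bigr) = \sum_i E\,g_i(\eta_i)$$

• Holds if the η_i are mutually independent (not in general otherwise):

$$E\Bigl(\prod_i g_i(\eta_i)\Bigr) = \prod_i E\,g_i(\eta_i)$$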
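Both rules can be checked exactly using two illustrative joint distributions (assumed here): two independent fair dice, and a pair where the second variable always equals the first. The sum rule holds for both; the product rule holds for the independent pair and fails for the dependent one.

```python
from fractions import Fraction
from itertools import product

def E(joint, g):
    """Expectation of g(eta1, eta2) under a joint pmf {(eta1, eta2): prob}."""
    return sum(g(a, b) * p for (a, b), p in joint.items())

indep = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}  # two dice
dep = {(k, k): Fraction(1, 6) for k in range(1, 7)}                  # eta2 = eta1

for joint in (indep, dep):
    # Sum rule: E(eta1 + eta2) = E eta1 + E eta2, with or without independence.
    assert E(joint, lambda a, b: a + b) == E(joint, lambda a, b: a) + E(joint, lambda a, b: b)

# Product rule: E(eta1 eta2) = E eta1 * E eta2 = (7/2)^2 for the independent pair ...
assert E(indep, lambda a, b: a * b) == Fraction(49, 4)
# ... but not for the dependent pair, where E(eta1 eta2) = E(eta1^2).
print(E(dep, lambda a, b: a * b))  # 91/6
```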

Moments of a Distribution

• rth moment of a distribution
  – E(η^r)
• First (principal) moment μ = E(η)
• rth central moment μ_r = E{(η − μ)^r}
• Most important moments
  – μ = E(η), known as the mean of η
    • Measure of location of a random variable
  – σ² = μ_2, known as the variance of η (usually abbreviated "var")
    • Measure of dispersion about the mean
  – σ: standard deviation
  – σ/μ: coefficient of variation
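The moments of a fair die (an illustrative assumption) can be computed exactly; the sketch returns the mean μ, the variance σ² = μ_2, the standard deviation σ and the coefficient of variation σ/μ. Note that μ_1 = 0 for any distribution, and that σ² also equals E(η²) − μ².

```python
import math
from fractions import Fraction

PMF = {y: Fraction(1, 6) for y in range(1, 7)}  # fair die (illustrative assumption)

def raw_moment(r):
    """rth moment E(eta^r)."""
    return sum(y ** r * p for y, p in PMF.items())

def central_moment(r):
    """rth central moment E{(eta - mu)^r}, with mu = E(eta)."""
    mu = raw_moment(1)
    return sum((y - mu) ** r * p for y, p in PMF.items())

mu = raw_moment(1)        # mean: 7/2
var = central_moment(2)   # variance sigma^2: 35/12
sigma = math.sqrt(var)    # standard deviation
cv = sigma / mu           # coefficient of variation sigma/mu
print(mu, var, round(sigma, 3), round(cv, 3))  # 7/2 35/12 1.708 0.488
```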

Covariance

• Definition of covariance (usually abbreviated "cov")
  – If η and ζ are random variables with means μ and ν, respectively, the quantity E{(η − μ)(ζ − ν)} is called the covariance of η and ζ
  – If η and ζ are independent, the covariance is 0
    • Why?
  – Also, cov(η, η) = var(η)
    • Why?
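One way to answer the two questions: if η and ζ are independent, the expectation of the product factorizes, so cov(η, ζ) = E(η − μ) E(ζ − ν) = 0 · 0 = 0; and setting ζ = η gives E{(η − μ)²}, which is the definition of var(η). The sketch below confirms both facts exactly for two fair dice (an illustrative assumption).

```python
from fractions import Fraction
from itertools import product

def cov(joint, f, g):
    """cov = E{(f - mu)(g - nu)} under a joint pmf over outcomes w."""
    mu = sum(f(w) * p for w, p in joint.items())
    nu = sum(g(w) * p for w, p in joint.items())
    return sum((f(w) - mu) * (g(w) - nu) * p for w, p in joint.items())

dice = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}
first = lambda w: w[0]
second = lambda w: w[1]

print(cov(dice, first, second))  # 0: the two dice are independent
print(cov(dice, first, first))   # 35/12: cov(eta, eta) = var(eta)
```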

Important Formula of Covariance

$$\operatorname{var}\Bigl(\sum_{i=1}^{k} \eta_i\Bigr) = \sum_{i=1}^{k} \sum_{j=1}^{k} \operatorname{cov}(\eta_i, \eta_j)$$
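The formula follows by expanding the square in var(Σ η_i) = E{(Σ (η_i − μ_i))²}. The sketch checks it exactly for two correlated variables built from two fair dice (an illustrative assumption: η_1 is the first die, η_2 the total of both); `cov` uses the equivalent form E(fg) − E f E g.

```python
from fractions import Fraction
from itertools import product

OUTCOMES = {w: Fraction(1, 36) for w in product(range(1, 7), repeat=2)}  # two fair dice

def E(h):
    """Expectation of h(w) over the 36 outcomes."""
    return sum(h(w) * p for w, p in OUTCOMES.items())

def cov(f, g):
    """cov(f, g) = E(fg) - E f E g."""
    return E(lambda w: f(w) * g(w)) - E(f) * E(g)

# Correlated variables (illustrative): eta1 = first die, eta2 = total of both dice.
etas = [lambda w: w[0], lambda w: w[0] + w[1]]

total = lambda w: sum(eta(w) for eta in etas)
lhs = cov(total, total)                                # var(eta1 + eta2)
rhs = sum(cov(ei, ej) for ei in etas for ej in etas)   # double sum of covariances
assert lhs == rhs
print(lhs)  # 175/12
```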

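Correlation Coefficient

• Definition

$$\rho = \frac{\operatorname{cov}(\eta, \zeta)}{\sqrt{\operatorname{var}\eta \,\operatorname{var}\zeta}}$$

  – Always between −1 and +1, inclusive
  – If ρ = 0, η and ζ are uncorrelated
  – If ρ < 0, they are negatively correlated
  – If ρ > 0, they are positively correlated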
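A minimal sketch of the correlation coefficient for paired data, using population (divide-by-n) moments; the three data sets are made up so that they land exactly on ρ = 1, ρ = −1 and ρ = 0.

```python
import math

def correlation(xs, ys):
    """rho = cov / sqrt(var * var), with population (1/n) moments of paired data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    vx = sum((x - mx) ** 2 for x in xs) / n
    vy = sum((y - my) ** 2 for y in ys) / n
    return cxy / math.sqrt(vx * vy)

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2 * x + 1 for x in xs]))  # 1.0  (positive, exactly linear)
print(correlation(xs, [10 - x for x in xs]))     # -1.0 (negative, exactly linear)
print(correlation(xs, [1, -1, 0, -1, 1]))        # 0.0  (uncorrelated)
```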

Important Distributions

• Uniform Distribution
• Exponential Distribution
• Binomial Distribution
• Poisson Distribution
• Normal Distribution

Uniform Distribution

• Uniform Distribution (Rectangular Distribution)
  – A distribution with constant probability density over an interval
  – Mean?
  – Variance?
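For the uniform distribution on an interval [a, b] (parametrization assumed here, density 1/(b − a)), the standard answers are mean (a + b)/2 and variance (b − a)²/12. The sketch compares these with a simulated sample; the interval and seed are arbitrary.

```python
import random
import statistics

a, b = 2.0, 5.0                  # illustrative interval [a, b]
mean_exact = (a + b) / 2         # 3.5
var_exact = (b - a) ** 2 / 12    # 0.75

rng = random.Random(12345)       # fixed seed so the run is repeatable
sample = [rng.uniform(a, b) for _ in range(200_000)]
print(mean_exact, var_exact)     # 3.5 0.75
print(round(statistics.fmean(sample), 2), round(statistics.pvariance(sample), 2))
```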

Exponential Distribution

• Exponential Distribution
  – Mean 1/λ
  – Variance 1/λ²
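Assuming the rate parametrization f(y) = λe^(−λy) for y ≥ 0, which is consistent with the mean and variance above, the CDF is F(y) = 1 − e^(−λy). Solving u = F(y) for y gives inverse-transform sampling, a standard way to turn uniform random numbers into exponential ones in Monte Carlo work. The sketch checks the sample mean and variance against 1/λ and 1/λ²; λ = 2 and the seed are arbitrary.

```python
import math
import random
import statistics

def sample_exponential(lam, rng):
    """Inverse-transform sampling: solve u = F(y) = 1 - exp(-lam*y) for y."""
    u = rng.random()                  # uniform on [0, 1)
    return -math.log(1.0 - u) / lam   # 1 - u lies in (0, 1], so the log is defined

lam = 2.0
rng = random.Random(2012)
ys = [sample_exponential(lam, rng) for _ in range(200_000)]
print(statistics.fmean(ys), 1 / lam)            # sample mean vs 1/lambda = 0.5
print(statistics.pvariance(ys), 1 / lam ** 2)   # sample variance vs 1/lambda^2 = 0.25
```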

Binomial Distribution

• Binomial Distribution
  – Discrete probability distribution P_p(n|N) of obtaining exactly n successes out of N Bernoulli trials
  – Each Bernoulli trial is true with probability p and false with probability q = 1 − p

$$P_p(n \mid N) = \binom{N}{n} p^n q^{N-n} = \frac{N!}{n!\,(N-n)!}\, p^n q^{N-n}$$
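The pmf can be evaluated exactly with Python's `math.comb`. This sketch uses illustrative values N = 10 and p = 3/10 and checks that the probabilities sum to 1 and that the mean and variance equal Np and Npq (standard results, not stated on the slide).

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, N, p):
    """P_p(n|N) = C(N, n) p^n q^(N-n), with q = 1 - p."""
    q = 1 - p
    return comb(N, n) * p ** n * q ** (N - n)

N, p = 10, Fraction(3, 10)
pmf = [binomial_pmf(n, N, p) for n in range(N + 1)]
mean = sum(n * pn for n, pn in enumerate(pmf))
var = sum((n - mean) ** 2 * pn for n, pn in enumerate(pmf))
print(sum(pmf), mean, var)  # 1 3 21/10  (mean Np, variance Npq)
```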

Poisson Distribution

• Poisson Distribution
  – The limit of the Binomial Distribution as N → ∞ with the mean Np = ν held fixed
  – Mean is ν
  – Variance is ν

$$P_\nu(n) = \lim_{N \to \infty} P_B(n) = \frac{\nu^n e^{-\nu}}{n!}$$
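A numerical look at the limit: hold Np = ν fixed while N grows and watch the binomial probability approach ν^n e^(−ν)/n!. The values ν = 3 and n = 2 are illustrative.

```python
import math

def binomial_pmf(n, N, p):
    """C(N, n) p^n (1 - p)^(N - n)."""
    return math.comb(N, n) * p ** n * (1 - p) ** (N - n)

def poisson_pmf(n, nu):
    """P_nu(n) = nu^n e^(-nu) / n!"""
    return nu ** n * math.exp(-nu) / math.factorial(n)

nu, n = 3.0, 2
for N in (10, 100, 10_000):
    # Hold Np = nu fixed while N grows.
    print(N, binomial_pmf(n, N, nu / N))
print("limit", poisson_pmf(n, nu))  # 9 e^-3 / 2, about 0.224
```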

Normal Distribution

• Normal Distribution (Gaussian Distribution)
  – Bell curve
  – De Moivre developed the normal distribution as an approximation to the binomial distribution
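De Moivre's approximation can be seen numerically by comparing the binomial pmf with a normal density of the same mean Np and variance Npq. This sketch assumes N = 400 and p = 1/2 (illustrative) and reports the largest pointwise difference.

```python
import math

def binomial_pmf(n, N, p):
    """C(N, n) p^n (1 - p)^(N - n)."""
    return math.comb(N, n) * p ** n * (1 - p) ** (N - n)

def normal_pdf(x, mu, sigma):
    """Gaussian density with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

N, p = 400, 0.5
mu, sigma = N * p, math.sqrt(N * p * (1 - p))   # mean Np, sd sqrt(Npq)
worst = max(abs(binomial_pmf(n, N, p) - normal_pdf(n, mu, sigma)) for n in range(N + 1))
print(binomial_pmf(200, N, p), normal_pdf(200, mu, sigma), worst)
```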

Normal Distribution in Data Analysis

• For normally distributed data, about 68.27% of the values lie within one SD on either side of the mean (±1 SD)
• About 95.45% lie within two SD on either side of the mean (±2 SD)
• About 99.73% lie within three SD on either side of the mean (±3 SD)
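These percentages come from the normal CDF: for a normal variable, P(|X − μ| ≤ kσ) = erf(k/√2). A minimal sketch using `math.erf` from Python's standard library; rounded to two decimals, it gives 68.27%, 95.45% and 99.73%.

```python
import math

def fraction_within(k):
    """P(|X - mu| <= k*sigma) for a normal X, equal to erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"±{k} SD: {100 * fraction_within(k):.2f}%")
# prints 68.27%, 95.45% and 99.73% on three lines
```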

Central Limit Theorem

• Central Limit Theorem
  – The sum of n independent, identically distributed random variables has an approximately normal distribution when n is large
  – The random variables themselves may follow an arbitrary distribution (with finite variance)
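A classic illustration, assumed here rather than taken from the slides: add n = 12 independent Uniform(0, 1) numbers and standardize the sum. Even though each term is uniform, the standardized sum behaves much like a standard normal variable; the sketch checks its mean, standard deviation and the fraction of values within ±1.

```python
import random
import statistics

rng = random.Random(7)

def standardized_sum(n):
    """Sum of n independent Uniform(0,1) draws, shifted and scaled to mean 0, variance 1."""
    s = sum(rng.random() for _ in range(n))
    return (s - n / 2) / (n / 12) ** 0.5

n = 12
zs = [standardized_sum(n) for _ in range(100_000)]
within_1sd = sum(abs(z) <= 1 for z in zs) / len(zs)
print(round(statistics.fmean(zs), 3), round(statistics.pstdev(zs), 3), round(within_1sd, 3))
# a standard normal variable would give 0, 1 and about 0.683
```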

Central Limit Theorem in Practice

• In practice
  – n = 10 is a reasonably large number
  – n = 25 is rather large (effectively infinite)

Estimation

• Monte Carlo Computation
  – Goal: estimating the unknown numerical value of some parameter of some distribution
    • The parameter is called an estimand
• Sample
  – The available data (may consist of a number of observed random variables)
  – The number of observations in the sample is called the sample size
• Estimand: the mean
  – Sample mean: (η_1 + η_2 + … + η_n)/n
  – Weighted average: (w_1η_1 + w_2η_2 + … + w_nη_n)/(w_1 + w_2 + … + w_n)
    • May be a better estimator
• Connection between the sample and the estimand
  – The estimand is a parameter of the distribution of the random variables constituting the sample
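One situation where a weighted average beats the plain mean is when the observations have different, known variances. This sketch assumes inverse-variance weights w_i = 1/σ_i² (a common choice, not specified on the slide) and compares the spread of the two estimators over many simulated experiments; all numbers are illustrative.

```python
import random
import statistics

rng = random.Random(42)
theta = 10.0                             # true mean (the estimand), illustrative
sigmas = [0.5, 1.0, 2.0, 4.0]            # observation i has standard deviation sigmas[i]
weights = [1 / s ** 2 for s in sigmas]   # assumed choice: inverse-variance weights

def one_experiment():
    """Draw one sample and return (plain mean, weighted average)."""
    etas = [rng.gauss(theta, s) for s in sigmas]
    plain = sum(etas) / len(etas)
    weighted = sum(w * e for w, e in zip(weights, etas)) / sum(weights)
    return plain, weighted

runs = [one_experiment() for _ in range(50_000)]
var_plain = statistics.pvariance([p for p, _ in runs])
var_weighted = statistics.pvariance([w for _, w in runs])
print(var_plain, var_weighted)  # theory: about 1.33 and about 0.19
```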

Sampling Distribution

• Parent Distribution
  – We can represent the sample by a vector η with coordinates η_1, η_2, η_3, …, η_n
  – The distribution of η_1, η_2, η_3, …, η_n is called the Parent Distribution
  – To estimate the estimand θ (a parameter of the Parent Distribution), we use some function t(η)
    • t is an estimator
• Sampling Distribution
  – η is a random variable, so is t(η)
    • If we repeated the experiment, we should expect to get a different value of η
  – Since η varies from experiment to experiment, t(η) has a distribution, called the sampling distribution
  – If t(η) is to be close to θ, then the sampling distribution ought to be closely concentrated around θ
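The sampling distribution can be made visible by simulating the repeated experiment. This sketch assumes a Uniform(0, 1) parent distribution, so θ = 0.5, takes t to be the sample mean with n = 25, and collects t(η) from many repetitions.

```python
import random
import statistics

rng = random.Random(2015)
n = 25        # sample size
theta = 0.5   # estimand: mean of the Uniform(0,1) parent (assumed)

def t(sample):
    """Estimator: the sample mean."""
    return sum(sample) / len(sample)

# Repeat the "experiment" many times; each repetition gives a new value of t(eta).
ts = [t([rng.random() for _ in range(n)]) for _ in range(20_000)]
print(statistics.fmean(ts), statistics.pstdev(ts))
# concentrated around theta = 0.5, with spread near (1/12 / 25) ** 0.5, about 0.058
```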

Measuring Sampling Distribution

• The bias of t
  – The difference between the average value of t(η) and θ: β = E{t(η) − θ}
  – t is an unbiased estimator if β = 0
• The sampling variance of t
  – σ_t² = var{t(η)} = E{[t(η) − E t(η)]²} = E{[t − θ − β]²}
• If β and σ_t² are small, t is a good estimator
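Bias can be computed exactly for a small case by enumerating every possible sample. This sketch assumes a fair-die parent distribution (so θ = σ² = 35/12) and samples of size n = 2, and compares two variance estimators: dividing the sum of squared deviations by n (biased) and by n − 1 (unbiased).

```python
from fractions import Fraction
from itertools import product

VALUES = range(1, 7)       # fair die as parent distribution (assumed)
theta = Fraction(35, 12)   # estimand: the parent variance
n = 2

def var_divide_by(sample, d):
    """Sum of squared deviations from the sample mean, divided by d."""
    m = Fraction(sum(sample), len(sample))
    return sum((x - m) ** 2 for x in sample) / d

samples = list(product(VALUES, repeat=n))   # all 36 equally likely samples
p = Fraction(1, len(samples))

for d in (n, n - 1):
    mean_t = sum(var_divide_by(s, d) * p for s in samples)
    bias = mean_t - theta                   # beta = E{t(eta)} - theta
    print(f"divide by {d}: E t = {mean_t}, bias = {bias}")
# divide by 2: E t = 35/24, bias = -35/24
# divide by 1: E t = 35/12, bias = 0
```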

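Important Estimators

• Mean of the parent distribution

$$\bar{\eta} = (\eta_1 + \eta_2 + \cdots + \eta_n)/n$$

  – standard error

$$\sigma/\sqrt{n}$$

• Variance of the parent distribution

$$s^2 = \bigl[(\eta_1 - \bar{\eta})^2 + (\eta_2 - \bar{\eta})^2 + \cdots + (\eta_n - \bar{\eta})^2\bigr]/(n-1)$$

  – standard error (approximately, for a normal parent distribution)

$$s^2\,(2/n)^{0.5}$$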
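A minimal sketch of these estimators on a made-up data set. In practice σ is unknown, so the sketch substitutes s for σ in the standard error of the mean; the standard error of s² uses the approximation s²(2/n)^0.5, which holds for a normal parent distribution. The helper name `estimate` is hypothetical.

```python
import math
import statistics

def estimate(etas):
    """Sample mean, its standard error, sample variance s^2, and an approximate s.e. of s^2."""
    n = len(etas)
    mean = sum(etas) / n
    s2 = sum((e - mean) ** 2 for e in etas) / (n - 1)   # divisor n - 1
    se_mean = math.sqrt(s2 / n)                          # sigma/sqrt(n), sigma estimated by s
    se_s2 = s2 * math.sqrt(2 / n)                        # s^2 (2/n)^0.5, normal-theory approximation
    return mean, se_mean, s2, se_s2

data = [4.1, 3.9, 4.4, 4.0, 3.6, 4.2, 4.3, 3.8]   # made-up observations
mean, se_mean, s2, se_s2 = estimate(data)
assert math.isclose(s2, statistics.variance(data))   # the stdlib also divides by n - 1
print(round(mean, 3), round(se_mean, 3), round(s2, 4), round(se_s2, 4))
```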

Efficiency

• Goal of Monte Carlo work
  – Obtain a respectably small standard error in the final result
  – More random samples can lead to better accuracy
    • Not very rewarding: the standard error falls only as σ/√n, so halving it requires four times as many samples
  – Variance Reduction Methods
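The reason more samples are not very rewarding is the σ/√n law. This sketch estimates the integral of x² over [0, 1] (exact value 1/3, an illustrative choice) with n, 4n and 16n samples; each fourfold increase should roughly halve the estimated standard error.

```python
import math
import random

def mc_estimate(n, rng):
    """Monte Carlo estimate of the integral of x^2 over [0, 1] (exact 1/3) and its standard error."""
    xs = [rng.random() ** 2 for _ in range(n)]
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, math.sqrt(s2 / n)

rng = random.Random(29)
for n in (10_000, 40_000, 160_000):
    est, se = mc_estimate(n, rng)
    print(n, round(est, 5), round(se, 5))
# each fourfold increase in n roughly halves the standard error
```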

Summary

• Important Statistics Terms
  – Random Events
    • Independence of Random Events
    • Axioms on Random Events
  – Random Variables
    • Independence of Random Variables
  – CDF
  – PDF
  – Expectation
    • Characteristics of Expectation
  – Moments of a Distribution
    • rth moment
    • rth central moment
  – Mean
  – Variance
  – Standard Deviation
  – Covariance
    • Characteristics of covariance
  – Correlation Coefficient

Summary (Cont.)

• Important Distributions
  – Uniform Distribution
  – Exponential Distribution
  – Binomial Distribution
  – Poisson Distribution
  – Normal Distribution
• Estimation
  – Sample
  – Estimand
  – Parent Distribution
  – Sampling Distribution
  – Estimator
• Important estimators
  – Buffon's Needle

What I want you to do

• Review the slides
• Review basic probability/statistics concepts
• Work on your Assignment 1