2. foundations of inferential...


Post on 14-Mar-2020

Page 1: 2. Foundations of inferential statisticsa1.phobos.apple.com/us/r30/CobaltPublic/v4/dd/67/ae/dd67ae7c-3cf0-ee95-97e1-c04dc3f2d...2. Foundations of inferential statistics 3. Estimation

1. Descriptive statistics
2. Foundations of inferential statistics
3. Estimation and confidence intervals
4. Testing statistical hypotheses
5. Regression analysis

2.1 Random variable
2.2 Univariate distribution functions
2.3 Population measures
2.4 Random sample

2. Foundations of inferential statistics

Learning objectives

In the first topic we saw how to describe a sample, something that has already happened. The second topic describes a possible future event using random variables. When you have completed this topic you will be able to

explain what a random variable is;

describe characteristics of a random variable using probability distributions and population measures;

view a data set as realizations of independent and identically distributed (i.i.d.) random variables.

23 / 221 Veronika Czellar HEC Paris Statistics

2.1 Random variable

Definition: random variable

A random variable (r.v.) X is the outcome of an experiment that hasn't happened yet.

Examples

Consider the experiment of flipping a coin. Denote by X the number of heads. X is a random variable.

Denote by Y the number of hours an HEC student spends exercising per week. Y is a random variable.

Notation: random variables are denoted with uppercase letters (e.g. $X$). Lowercase $x$ represents a realization of $X$.

Definitions

The set of all possible realizations $x$ of $X$ is denoted $\mathcal{X}$ and is called the range of the random variable.

The random variable $X$ is called discrete if $\mathcal{X}$ is countable, and continuous if $\mathcal{X}$ is uncountable.

Examples: discrete or continuous?

1 The number of heads at the flip of a coin.

2 The number of hours HEC students spend exercising per week.

We now wish to measure how likely each value is to occur.

2.2 Univariate distribution functions

2.2.1 Discrete random variables

Consider a discrete r.v. $X$ with range $\mathcal{X}$.

Definition: pmf

The probability mass function (pmf), $f(x)$, gives the probability that $X$ takes the value $x$:

$$f(x) = P(X = x) \geq 0 .$$

The pmf is a function $f : \mathcal{X} \to [0, 1]$ satisfying

$$\sum_{x \in \mathcal{X}} f(x) = 1 .$$

In our example of flipping a coin, the probability of landing heads may be 50%. However, some coins may be biased. For instance, the Belgian euro coin has been accused of unfair flipping (New Scientist, Jan. 4, 2002).

In general, the flip of a coin can be modeled by a Bernoulli distribution.

Definition: Bernoulli distribution (Jacob Bernoulli)

A random variable $X$ has a Bernoulli distribution if

$$X = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } 1 - p . \end{cases}$$

Hence, the pmf is

$$f(0) = 1 - p \quad \text{and} \quad f(1) = p .$$
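The Bernoulli pmf can be sketched in a few lines of Python. This is only an illustration: the function names `bernoulli_pmf` and `simulate_bernoulli`, the seed, and the choice p = 0.5 are assumptions made for the example and do not come from the slides.

```python
import random

def bernoulli_pmf(x, p):
    """pmf of a Bernoulli(p) r.v.: f(1) = p, f(0) = 1 - p, and 0 elsewhere."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def simulate_bernoulli(p, n, seed=0):
    """Draw n realizations (1 = heads, 0 = tails) with P(X = 1) = p."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [1 if rng.random() < p else 0 for _ in range(n)]

p = 0.5  # assumed fair coin; a biased coin would use another value
# The pmf sums to 1 over the range {0, 1}.
print(bernoulli_pmf(0, p) + bernoulli_pmf(1, p))
# The share of heads in many simulated flips should be close to p.
flips = simulate_bernoulli(p, 10_000)
print(sum(flips) / len(flips))
```

The simulated share of heads is a sample quantity, while p is a property of the distribution, which is the sample/population distinction this topic builds towards.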

Notation: $X \sim f$, if $X$ has a pmf given by $f$.

Definition

The cumulative distribution function (cdf) of $X$ is:

$$F_X(x) = P(X \leq x) = \sum_{y \leq x} f(y) .$$
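As a small sketch of this definition, the cdf of a discrete r.v. can be obtained by summing the pmf over all values not exceeding x. The dictionary representation of the pmf, the helper name `discrete_cdf`, and the value p = 0.3 are assumptions chosen for illustration.

```python
def discrete_cdf(pmf, x):
    """F(x) = sum of f(y) over all y in the range with y <= x.

    `pmf` maps each value in the range to its probability.
    """
    return sum(prob for y, prob in pmf.items() if y <= x)

p = 0.3  # assumed probability of heads
bernoulli = {0: 1 - p, 1: p}

# The cdf is a step function: 0 below the range, then it jumps at 0 and at 1.
for x in (-1, 0, 0.5, 1, 2):
    print(x, discrete_cdf(bernoulli, x))
```

Note that the cdf is defined for every real x, not only for values in the range, which is why x = 0.5 is a valid argument here.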

We now extend these definitions to the case of continuous random variables...

Problem

The probability of occurrence of a single value (e.g. the probability that the number of hours an HEC student spends exercising per week is exactly 3.1 hours) will typically be zero for a continuous random variable.

2.2.2 Continuous random variables

Consider a continuous r.v. $X$ with range $\mathcal{X} \subseteq \mathbb{R}$.

Definitions

The probability density function (pdf) is a function $f : \mathcal{X} \to \mathbb{R}_+$, satisfying:

1. $P(a \leq X \leq b) = \int_a^b f(x)\,dx$, the area under $f(x)$ between $a$ and $b$;

2. $\int_{\mathcal{X}} f(x)\,dx = 1$.

The cumulative distribution function (cdf) of $X$ is:

$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x} f(y)\,dy .$$

Definition: normal distribution

A r.v. $X$ is normally distributed, denoted $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\mu \in \mathbb{R}$ and $\sigma > 0$, if its pdf is given by

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}} , \quad x \in \mathbb{R} .$$

The normal distribution with parameter values $\mu = 0$ and $\sigma = 1$ is called the standard normal distribution.
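The following sketch evaluates the normal pdf and checks the two defining properties of a pdf numerically. The midpoint rule, the helper names `normal_pdf` and `integrate`, and the integration limits of ±10 are assumptions made for this illustration, not part of the course material.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """pdf of N(mu, sigma^2) evaluated at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def integrate(f, a, b, n=100_000):
    """Approximate the integral of f over [a, b] with the midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Property 2: the total area under the pdf is 1 (the tails beyond +/-10 are negligible).
total = integrate(normal_pdf, -10.0, 10.0)
# Property 1: P(a <= X <= b) is the area under the pdf between a and b.
prob = integrate(normal_pdf, -1.0, 1.0)
print(total, prob)
```

For the standard normal, the second value approximates P(-1 ≤ X ≤ 1), the probability of falling within one standard deviation of the mean.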

[Figure: histogram titled "Male height in U.S. (20+)". Horizontal axis: male heights in centimeters, from 150 to 200. Vertical axis: frequency, from 0 to 100.]

Sample mean and standard deviation: 176.3 cm and 7.65 cm. Almost all men are within three standard deviations of the mean, i.e. in [153.35, 199.25] cm. However...

Shaquille O’Neal: 216cm.
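To see how unusual such a height is, one can reproduce the interval [153.35, 199.25] cm as mean ± 3 standard deviations and compute a z-score for 216 cm. The tail probability below additionally assumes that heights follow a normal distribution with the sample mean and standard deviation as parameters, which is a modeling assumption made for this sketch and not a claim from the slides.

```python
from statistics import NormalDist

mean_cm, sd_cm = 176.3, 7.65  # sample mean and standard deviation from the histogram

# Interval of mean +/- 3 standard deviations.
lower, upper = mean_cm - 3 * sd_cm, mean_cm + 3 * sd_cm
print(round(lower, 2), round(upper, 2))

# How many standard deviations above the mean is 216 cm?
z = (216 - mean_cm) / sd_cm
print(round(z, 2))

# Probability of exceeding 216 cm under the assumed normal model.
tail = 1 - NormalDist(mean_cm, sd_cm).cdf(216)
print(tail)
```

A height more than five standard deviations above the mean is extremely unlikely under this model, yet such observations do occur.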

Alert

The cdf of a normal distribution will play an important role, but its computation is challenging: the integral has no closed-form expression, so only numerical integration techniques are available.

We can compute the cdf $F_X(x)$ using either a Z-table or Excel.

Excel: use

NORMDIST(x, μ, σ, 1) for the cdf of $\mathcal{N}(\mu, \sigma^2)$;

NORMDIST(x, μ, σ, 0) for the pdf of $\mathcal{N}(\mu, \sigma^2)$.
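For readers without Excel, a rough Python counterpart can be written with the standard library. The function `normdist` below is a hypothetical stand-in that mirrors the argument order of the Excel call; it is not an Excel or Python built-in. It relies on the identity $\Phi(z) = \tfrac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$ for the standard normal cdf.

```python
import math

def normdist(x, mu, sigma, cumulative):
    """Hypothetical Python stand-in for Excel's NORMDIST(x, mu, sigma, cumulative).

    cumulative truthy -> cdf of N(mu, sigma^2) at x
    cumulative falsy  -> pdf of N(mu, sigma^2) at x
    """
    z = (x - mu) / sigma
    if cumulative:
        # Standard normal cdf expressed through the error function.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

print(normdist(0.0, 0.0, 1.0, 1))  # cdf at the mean: 0.5
print(normdist(0.0, 0.0, 1.0, 0))  # pdf at the mean
```

The error function is itself evaluated numerically by the library, consistent with the remark that no closed-form expression exists.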

Example

Consider $X \sim \mathcal{N}(0, 1)$. Calculate $F_X(0.8)$, first using a table, then using Excel.

If $\mu \neq 0$ and/or $\sigma \neq 1$, we can standardize the r.v. as follows.

Theorem

Consider $X \sim \mathcal{N}(\mu, \sigma^2)$. Then,

$$\frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1) .$$

Example

Calculate $F_X(0.8)$ for $X \sim \mathcal{N}(0.5, 2^2)$ using a table, then Excel.
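Both examples can be checked with a short sketch that applies the standardization theorem: compute z = (x − μ)/σ, then look z up in the standard normal cdf. Here Python's `statistics.NormalDist` plays the role of the Z-table; the helper name `normal_cdf_via_standardization` is an assumption made for illustration.

```python
from statistics import NormalDist

def normal_cdf_via_standardization(x, mu, sigma):
    """F_X(x) for X ~ N(mu, sigma^2), computed as Phi((x - mu) / sigma)."""
    z = (x - mu) / sigma          # standardize
    return NormalDist().cdf(z)    # standard normal cdf, as read from a Z-table

# X ~ N(0, 1): no standardization needed, z = 0.8.
print(round(normal_cdf_via_standardization(0.8, 0.0, 1.0), 4))  # 0.7881

# X ~ N(0.5, 2^2): z = (0.8 - 0.5) / 2 = 0.15.
print(round(normal_cdf_via_standardization(0.8, 0.5, 2.0), 4))  # 0.5596

# Cross-check against the cdf of N(0.5, 2^2) evaluated directly.
direct = NormalDist(mu=0.5, sigma=2.0).cdf(0.8)
print(abs(direct - normal_cdf_via_standardization(0.8, 0.5, 2.0)) < 1e-12)
```

Note that the second parameter of $\mathcal{N}(0.5, 2^2)$ is the variance, so the standard deviation used for standardizing is 2, not 4.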

Sample | Population
Observation $x_i$ | Random variable $X$
Histograms | pdf, pmf, cdf
Sample mean $\bar{x}$ | ?
Sample variance $s^2$ | ?
Sample quantiles $\hat{q}_\alpha$ | ?
