parametric test

Sampling and Hypothesis Testing(I) in MATLAB Kajal Rai [email protected]

Upload: chinnannan-periasamy

Post on 14-Jul-2015


Page 1: Parametric test

Sampling and Hypothesis Testing(I)

in MATLAB

Kajal Rai

[email protected]

Page 2: Parametric test

Contents:

• Sampling

• Hypothesis Test

• Types of parametric test

• One sample t-test

• Paired t-test

• Tailed t-test

• Two sample t-test

• z-test

• F-test

• Difference between t-test, z-test and F-test

Page 3: Parametric test

Sampling:

Sampling is the technique to be used in selecting the

items for the sample from the population.

Simple Random Sampling: In which each and every

unit of the population has an equal opportunity of being

selected in the sample.

Can be done with or without replacement.

If done with replacement, then each item has a

probability of 1/N of being drawn at each selection.

If done without replacement, then the first item has a

probability of 1/N, second item has 1/(N-1) and so on of

being drawn.

Page 4: Parametric test

Random Sampling in MATLAB

y = randsample(n,k) returns a vector of k values sampled uniformly at random, without replacement, from the integers 1 to n.

y = randsample(population,k) returns a vector

of k values sampled uniformly at random,

without replacement, from the values in the

vector population.

Page 5: Parametric test

Random Sampling in MATLAB cntd…

y = randsample(n,k,replacement) or

y = randsample(population,k,replacement)

returns a sample taken with replacement

if replacement is true, or without replacement

if replacement is false. By default it is false.
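
The three calling forms described so far can be tried with a minimal sketch like the one below. The seed, the sizes n and k, and the example population vector are assumptions chosen only for illustration.

```matlab
rng(0);                               % fix the seed so the run is repeatable (assumption)
n = 10;  k = 4;                       % illustrative sizes, not from the slides

y1 = randsample(n, k);                % k distinct integers from 1..n (no replacement)

population = [2 3 5 7 11 13 17];      % hypothetical population vector
y2 = randsample(population, k);       % k distinct values taken from the vector

y3 = randsample(n, 20, true);         % with replacement, so more than n draws are allowed
```

Without replacement, k cannot exceed the population size; with replacement it can, which is why the last call may ask for 20 draws from 10 integers.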

Page 6: Parametric test
Page 7: Parametric test

Random Sampling in MATLAB cntd…

y = randsample(n,k,true,w) or

y = randsample(population,k,true,w) returns a weighted sample taken with replacement, using a vector of positive weights w whose length is n. The probability that the integer i is selected for an entry of y is w(i)/sum(w), so w need not sum to 1, although it is often given as a vector of probabilities.

randsample does not support weighted sampling without replacement.

Page 8: Parametric test

Generate a random sequence of the characters A, C, G,

and T, with replacement, according to the specified

probabilities.
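
A short sketch of this example follows. The transcript does not preserve the probabilities used on the slide, so the weights and the sequence length below are assumed values for illustration.

```matlab
rng(1);                                  % seed chosen only for repeatability (assumption)
w = [0.15 0.35 0.35 0.15];               % assumed weights for A, C, G, T
seq = randsample('ACGT', 48, true, w);   % 48 characters drawn with replacement
disp(seq)
```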

Page 9: Parametric test

Hypothesis Tests:

A hypothesis test is a procedure for determining if an assertion about a characteristic of a population is correct.

In hypothesis testing, the goal is to see if there is sufficient statistical evidence to reject a presumed null hypothesis in favor of the alternative hypothesis [1].

The null hypothesis is usually denoted H0, while the alternative hypothesis is usually denoted H1.

Page 10: Parametric test

Types of parametric test:

• One sample t-test: The one-sample t-test is used when we want to know whether our sample comes from a particular population but we do not have full population information available to us. It is used when we don't know the population variance.

• Paired t-test: A paired t-test looks at the difference between paired values in two samples, takes into account the variation of values within each sample, and produces a single number known as a t-value.

• Two sample t-test: Used to compare responses from two groups. These two groups can come from different experimental treatments, or different "populations".

• z-test: It is an appropriate parametric statistical procedure when there is one sample that is being compared to a population with a known mean and standard deviation.

• F-test: The F-test is designed to test if two population variances are equal.

Page 11: Parametric test

One sample t-test:

[h,p,ci,stats] = ttest(X,M) performs a t-test of the hypothesis that the data in X come from a distribution with mean M.

h = 1 indicates that the null hypothesis is rejected at the 5% significance level, h = 0 that it is not; p is the p-value of the test.

ci is a 100*(1-ALPHA)% confidence interval for the true mean of X.

stats is a structure with the following fields:

'tstat' -- the value of the test statistic

'df' -- the degrees of freedom of the test

'sd' -- the estimated population standard deviation.

Page 12: Parametric test

One sample t-test example:

• Ex: Specimens of copper wire drawn from a large lot have the following breaking strengths (in kg weight):

• 578, 572, 570, 568, 572, 578, 570, 572, 596, 544

• Test (using Student's t-statistic) whether the mean breaking strength of the lot may be taken to be 578 kg weight (test at the 5 per cent level of significance).
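
The example can be run roughly as follows. This is a sketch using the data listed above; the variable names are illustrative.

```matlab
% Breaking strengths of the copper wire specimens (kg weight), from the example
x = [578 572 570 568 572 578 570 572 596 544]';

% H0: the mean breaking strength of the lot is 578 (default alpha = 0.05)
[h, p, ci, stats] = ttest(x, 578);

fprintf('h = %d, p = %.4f, t = %.4f, df = %d\n', h, p, stats.tstat, stats.df);
```

Worked by hand, the sample mean is 572 and the standard deviation about 12.72, giving t of about -1.49 on 9 degrees of freedom. Since |t| is below the two-tailed critical value 2.262, the test should return h = 0, so the hypothesis that the mean is 578 is not rejected.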

Page 13: Parametric test
Page 14: Parametric test
Page 15: Parametric test

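t-test with a chosen significance level:

[h,p,ci,stats] = TTEST(...,ALPHA) performs the test at the significance level (100*ALPHA)%. ALPHA must be a scalar between 0 and 1. Recent MATLAB releases take it as a name-value pair: ttest(...,'Alpha',ALPHA).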
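
As a sketch, the copper wire data from the one-sample example can be retested at the 1% level. The name-value form 'Alpha' is assumed to be available in the MATLAB release in use.

```matlab
x = [578 572 570 568 572 578 570 572 596 544]';

[h05, p05, ci05] = ttest(x, 578);                  % default: 5% level
[h01, p01, ci01] = ttest(x, 578, 'Alpha', 0.01);   % 1% level

% The p-value does not depend on alpha; only the decision h and the
% width of the confidence interval do.
fprintf('95%% CI: [%.2f, %.2f]\n', ci05(1), ci05(2));
fprintf('99%% CI: [%.2f, %.2f]\n', ci01(1), ci01(2));
```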

Page 16: Parametric test

Paired t-test:

A paired t-test looks at the difference between

paired values in two samples, takes into

account the variation of values within each

sample, and produces a single number known

as a t-value.

Page 17: Parametric test

Paired t-test in MATLAB:

H = TTEST(X,Y) performs a paired T-test of the

hypothesis that two matched samples, in the

vectors X and Y, come from distributions with

equal means. The difference X-Y is assumed to

come from a normal distribution with unknown

variance.

X and Y must have the same length.

Page 18: Parametric test

Example: Paired t-test

• Memory capacity of 9 students was tested before and after training. State at the 5 percent level of significance whether the training was effective, given the following scores:

• Before: 10, 15, 9, 3, 7, 12, 16, 17, 4

• After: 12, 17, 8, 5, 6, 11, 18, 20, 3

• Take the score before training as X and the score after training as Y, and take as the null hypothesis that the mean of the differences is zero.
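
A minimal sketch of this paired test, using the scores listed above (variable names are illustrative):

```matlab
before = [10 15 9 3 7 12 16 17 4]';    % X: scores before training
after  = [12 17 8 5 6 11 18 20 3]';    % Y: scores after training

% Paired t-test, H0: the mean of the differences X - Y is zero
[h, p, ci, stats] = ttest(before, after);

fprintf('h = %d, p = %.4f, t = %.4f, df = %d\n', h, p, stats.tstat, stats.df);
```

By hand, the differences X - Y have mean of about -0.78 and standard deviation of about 1.72, so t is about -1.36 on 8 degrees of freedom, which is inside the critical region boundary of ±2.306 and gives h = 0.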

Page 19: Parametric test

We fail to reject H0 and conclude that the difference in scores before and after training is insignificant, i.e., it can be attributed to sampling fluctuations. Hence the data give no evidence that the training was effective.

Page 20: Parametric test

Tailed t-test:

A one- or two-tailed t-test is determined by

whether the total area of α is placed in one tail or

divided equally between the two tails.

The one-tailed t-test is performed if the results

are interesting only if they turn out in a particular

direction.

The two-tailed t-test is performed if the results

would be interesting in either direction.

Page 21: Parametric test

One-Tailed t-Test:

There are two different one-tailed t-tests, one for each tail.

In a one-tailed t-test, all the area associated with α is placed in either one tail or the other. Selection of the tail depends upon which direction t would be (+ or -) if the results of the experiment came out as expected.

The selection of the tail must be made before the experiment is conducted and analyzed.

Test to see whether one mean was higher than the other.

Page 22: Parametric test

One-tailed t-test in the positive direction

The value tcrit would be positive. For example when α is set to .05

with ten degrees of freedom (df=10), tcrit would be equal to

+1.812.

Page 23: Parametric test

One-tailed t-test in the negative direction

The value tcrit would be negative. For example, when α is set to .05 with ten degrees of freedom (df=10), tcrit would be equal to -1.812.

Page 24: Parametric test

Two-Tailed t-Test:

A two-tailed t-test divides α in half, placing half in each tail. The null hypothesis in this case is a particular value, and the alternative hypothesis allows a departure from it in either direction, positive or negative. The critical value of t, tcrit, is written with both a plus and minus sign (±). For example, the critical value of t when there are ten degrees of freedom (df=10) and α is set to .05 is tcrit = ±2.228.

We would use a two-tailed test to see if two means are different from each other (i.e., from different populations) or come from the same population.
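
The critical values quoted for the one-tailed and two-tailed cases can be reproduced with the inverse t cumulative distribution function. The sketch below uses tinv from the Statistics Toolbox; the variable names are illustrative.

```matlab
alpha = 0.05;  df = 10;

tcrit_right = tinv(1 - alpha, df);      % one-tailed, positive direction
tcrit_left  = tinv(alpha, df);          % one-tailed, negative direction
tcrit_two   = tinv(1 - alpha/2, df);    % two-tailed, used as +/- tcrit_two

fprintf('%.3f  %.3f  %.3f\n', tcrit_right, tcrit_left, tcrit_two);
```

Putting all of α in one tail gives a critical value closer to zero (1.812) than splitting it between two tails (2.228), which is why a one-tailed test rejects more readily in its chosen direction.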

Page 25: Parametric test

Tailed t-test in MATLAB

H = TTEST(...,TAIL) performs the test against the alternative hypothesis specified by TAIL:

'both' -- "mean is not M" (two-tailed test)

'right' -- "mean is greater than M" (right-tailed test)

'left' -- "mean is less than M" (left-tailed test)
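
As a sketch, the three alternatives can be compared on the copper wire data from the one-sample example. The name-value form 'Tail' is assumed to be available in the MATLAB release in use.

```matlab
x = [578 572 570 568 572 578 570 572 596 544]';

[h_both,  p_both ] = ttest(x, 578);                    % H1: mean is not 578
[h_left,  p_left ] = ttest(x, 578, 'Tail', 'left');    % H1: mean is less than 578
[h_right, p_right] = ttest(x, 578, 'Tail', 'right');   % H1: mean is greater than 578

fprintf('both: %.4f  left: %.4f  right: %.4f\n', p_both, p_left, p_right);
```

Because the t statistic is negative here, the two-tailed p-value is twice the left-tailed one, and the left- and right-tailed p-values sum to 1.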

Page 26: Parametric test

One tailed t-test in MATLAB

Page 27: Parametric test

Two-tailed test in MATLAB

Page 28: Parametric test

Two sample t-test

H = TTEST2(X,Y) performs a T-test of the hypothesis that two independent samples, in the vectors X and Y, come from distributions with equal means, and returns the result of the test in H.

H=0 indicates that the null hypothesis ("means are equal") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level.

The data are assumed to come from normal distributions with unknown, but equal, variances.

X and Y can have different lengths.

Page 29: Parametric test

Example:

• A group of seven-week-old chickens reared on a high protein diet weigh 12, 15, 11, 16, 14, 14, and 16; a second group of five chickens, similarly treated except that they receive a low protein diet, weigh 8, 10, 14, 10 and 13. Test at the 5 percent level whether there is significant evidence that additional protein has increased the weight of the chickens. (For the hand calculation, use an assumed mean of 10 for the sample of 7 and an assumed mean of 8 for the sample of 5 chickens.)

• Take as the null hypothesis that additional protein has not increased the weight of the chickens.
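
A minimal sketch of this test with the weights listed above. The right-tailed alternative is chosen here because the question asks whether protein has increased the weight; variable names are illustrative.

```matlab
high = [12 15 11 16 14 14 16]';   % high protein diet
low  = [8 10 14 10 13]';          % low protein diet

% H0: equal means; H1: mean(high) > mean(low); pooled (equal) variances
[h, p, ci, stats] = ttest2(high, low, 'Tail', 'right');

fprintf('h = %d, p = %.4f, t = %.4f, df = %d\n', h, p, stats.tstat, stats.df);
```

By hand, the means are 14 and 11 and the pooled variance is 4.6, so t is about 2.39 on 10 degrees of freedom, above the one-tailed critical value 1.812.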

Page 30: Parametric test

Two sample t-test in MATLAB

we reject H0 and conclude that additional protein has increased the weight of chickens, at 5

per cent level of significance.

Page 31: Parametric test

Two sample t-test in MATLAB cntd…

H = TTEST2(X,Y,ALPHA,TAIL,VARTYPE) allows

you to specify the type of test. When VARTYPE is

'equal', TTEST2 performs the default test assuming

equal variances.

When VARTYPE is 'unequal', TTEST2 performs the

test assuming that the two samples come from normal

distributions with unknown and unequal variances.

This is known as the Behrens-Fisher problem.
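
As a sketch, the chicken weights from the two-sample example can be retested without the equal-variance assumption. The name-value forms 'Tail' and 'Vartype' are assumed to be available in the MATLAB release in use.

```matlab
high = [12 15 11 16 14 14 16]';   % high protein diet
low  = [8 10 14 10 13]';          % low protein diet

% Same right-tailed test, but with separate variance estimates per group
[h, p, ci, stats] = ttest2(high, low, 'Tail', 'right', 'Vartype', 'unequal');

fprintf('h = %d, p = %.4f, t = %.4f, df = %.2f\n', h, p, stats.tstat, stats.df);
```

With unequal variances the degrees of freedom are approximated and are generally not an integer; here they come out a little above 7 instead of the 10 used by the pooled test.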

Page 32: Parametric test

z-test in MATLAB

A z-test is used for testing the mean of a population or comparing the means of two populations, with large (n ≥ 30) samples when we know the population standard deviation.

H = ZTEST(X,M,SIGMA) performs a Z-test of the hypothesis that the data in the vector X come from a distribution with mean M, and returns the result of the test in H.

H=0 indicates that the null hypothesis ("mean is M") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level.

The data are assumed to come from a normal distribution with standard deviation SIGMA.

Page 33: Parametric test

Example:

• A dog food manufacturer has created a new Super Vitamin Enriched Puppy Chow, specially designed for the active and growing Doberman Pinscher.

• A sample of 10 Doberman puppies was fed nothing but Super Vitamin Enriched Puppy Chow. When these dogs reached adulthood, they weighed 27.5, 33.5, 36.8, 39.5, 40.5, 42.5, 40.0, 22.9, 39.8 and 40.8 kg. The normal adult weight is taken as 39.7 kg on average (M), with σ = 6.2 kg.

• Did Puppy Chow make them grow especially big? Test with α = .05.

• H0: The puppy chow did not make the dogs grow larger than normal.

• H1: The puppy chow did make the dogs grow larger than normal.

Page 34: Parametric test

We fail to reject H0: the sample mean (36.38 kg) is below M, so there is no significant evidence that Super Vitamin Enriched Puppy Chow makes Doberman Pinschers grow larger than normal.
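
The z-test for this example can be sketched as follows, reading 39.7 kg as the hypothesized mean M and 6.2 kg as the known σ. The right-tailed alternative is an assumption based on the question being whether the dogs grew larger.

```matlab
w = [27.5 33.5 36.8 39.5 40.5 42.5 40.0 22.9 39.8 40.8]';   % adult weights (kg)

M = 39.7;  sigma = 6.2;

% H0: mean = M; H1: mean > M, with known population standard deviation
[h, p, ci, zval] = ztest(w, M, sigma, 'Tail', 'right');

fprintf('h = %d, p = %.4f, z = %.4f\n', h, p, zval);
```

By hand, z = (36.38 - 39.7) / (6.2/√10), which is about -1.69. A negative z cannot support a "greater than" alternative, so h = 0.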

Page 35: Parametric test

F-test:

• The F-test is used to compare the variances of two independent samples.

• This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time.

• It is also used for judging the significance of multiple correlation coefficients.

Page 36: Parametric test

F-test in MATLAB

• H = vartest2(X,Y) performs an F test of the hypothesis that two independent samples, in the vectors X and Y, come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.

• The result is H=0 if the null hypothesis ("variances are equal") cannot be rejected at the 5% significance level, or H=1 if the null hypothesis can be rejected at the 5% level.

• X and Y can have different lengths.

Page 37: Parametric test

Example:

• Two random samples drawn from two normal populations are:

• Sample1: 20 16 26 27 23 22 18 24 25 19

• Sample2: 27 33 42 35 32 34 38 28 41 43 30 37

• Test at the 5% significance level whether the two populations have the same variance.

• We take the null hypothesis that the two populations from which the samples have been drawn have the same variance.
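
A minimal sketch of this F-test with the two samples listed above (variable names are illustrative):

```matlab
s1 = [20 16 26 27 23 22 18 24 25 19]';
s2 = [27 33 42 35 32 34 38 28 41 43 30 37]';

% H0: equal variances; H1: different variances (two-tailed, alpha = 0.05)
[h, p, ci, stats] = vartest2(s1, s2);

fprintf('h = %d, p = %.4f, F = %.4f, df = (%d, %d)\n', ...
        h, p, stats.fstat, stats.df1, stats.df2);
```

By hand, the sample variances are 120/9 ≈ 13.33 and 314/11 ≈ 28.55, so the statistic var(s1)/var(s2) is about 0.467 with (9, 11) degrees of freedom.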

Page 38: Parametric test

Since the p-value is more than 0.05, we fail to reject the null hypothesis and conclude that the samples are consistent with having been drawn from two populations with the same variance.

Page 39: Parametric test

Difference between t-test, z-test and F-test:

• t-test: Used for testing the mean of one population against a standard, or comparing the means of two populations, when the population standard deviation is unknown and the sample is small (n < 30).

• z-test: Used for testing the mean of a population against a standard, or comparing the means of two populations, with large (n ≥ 30) samples when the population standard deviation is known. It is also used for testing the proportion of some characteristic against a standard proportion, or comparing the proportions of two populations.

• F-test: Used to compare the variances of two populations. The samples can be any size. It is the basis of ANOVA.

Page 40: Parametric test

References:

Kothari, C.R., 1985, Research Methodology: Methods and Techniques, New Delhi, Wiley Eastern Limited.

Gupta, S.P., 2009, Statistical Methods, eighth revised edition.

http://www.mathworks.in/help/stats/ztest.html#btriieq

http://www.math.uah.edu/stat/hypothesis/Introduction.html

http://www.mathworks.in/products/statistics/description7.html

eHow, How to Do a T-Test in MATLAB, http://www.ehow.com/how_12211819_ttest-matlab.html#ixzz2WSQ6BN6o

Page 41: Parametric test

THANK YOU