parametric test

Sampling and Hypothesis Testing (I) in MATLAB

Kajal Rai

Contents:

• Sampling

• Hypothesis Test

• Types of parametric test

• One sample t-test

• Paired t-test

• Tailed t-test

• Two sample t-test

• z-test

• F-test

• Difference between t-test, z-test and F-test

Sampling:

Sampling is the technique used to select the items of a sample from the population.

Simple Random Sampling: A scheme in which every unit of the population has an equal chance of being selected in the sample.

It can be done with or without replacement.

If done with replacement, each item has a probability of 1/N of being drawn at each selection.

If done without replacement, each item has a probability of 1/N of being drawn at the first selection, each remaining item has a probability of 1/(N-1) at the second selection, and so on.

Random Sampling in MATLAB

y = randsample(n,k) returns a vector of k values sampled uniformly at random, without replacement, from the integers 1 to n.

y = randsample(population,k) returns a vector of k values sampled uniformly at random, without replacement, from the values in the vector population.

Random Sampling in MATLAB cntd…

y = randsample(n,k,replacement) or y = randsample(population,k,replacement) returns a sample taken with replacement if replacement is true, or without replacement if replacement is false. The default is false.
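The calling forms described above can be sketched as follows. The seed, N = 10, k = 5, and the example population vector are illustrative assumptions, not values from the slides; randsample is part of the Statistics and Machine Learning Toolbox.

```matlab
% Illustrative values only: the seed, N, k, and population are assumptions.
rng(1);                                 % fix the seed so the draws are repeatable

N = 10;                                 % sample from the integers 1..N
k = 5;                                  % number of draws

yNoRep = randsample(N, k);              % without replacement (default): no repeats
yRep   = randsample(N, k, true);        % with replacement: repeats are possible

population = [12 15 11 16 14 8 10];     % sample from an explicit vector of values
yPop = randsample(population, 3);       % 3 values, without replacement

% (:)' forces a row layout for display, whatever shape was returned
disp(yNoRep(:)');
disp(yRep(:)');
disp(yPop(:)');
```

Because yNoRep is drawn without replacement, its k entries are always distinct, while yRep may contain the same integer more than once.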

Random Sampling in MATLAB cntd…

y = randsample(n,k,true,w) or y = randsample(population,k,true,w) returns a weighted sample taken with replacement, using a vector of positive weights w, whose length is n. The probability that the integer i is selected for an entry of y is w(i)/sum(w). Usually, w is a vector of probabilities.

randsample does not support weighted sampling without replacement.

Generate a random sequence of the characters A, C, G, and T, with replacement, according to the specified probabilities.
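A minimal sketch of such a weighted draw is given here. The probabilities in w, the sequence length of 48, and the seed are assumed for illustration; the values used on the original slide are not available.

```matlab
% Assumed for illustration: seed, sequence length, and the weights in w.
rng(0);                               % repeatable draws

bases = 'ACGT';                       % population of characters
w     = [0.15 0.35 0.35 0.15];        % selection probability of A, C, G, T
nDraw = 48;                           % length of the generated sequence

% Weighted sampling is only available with replacement (third argument true)
seq = randsample(bases, nDraw, true, w);

disp(seq);
```

With these weights, C and G should each appear roughly 35% of the time over a long sequence, and A and T roughly 15% each.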

Hypothesis Tests:

A hypothesis test is a procedure for determining whether an assertion about a characteristic of a population is reasonable.

In hypothesis testing, the goal is to see whether there is sufficient statistical evidence to reject a presumed null hypothesis in favor of the alternative hypothesis [1].

The null hypothesis is usually denoted H0, while the alternative hypothesis is usually denoted H1.

Types of parametric test:

• One sample t-test: The one-sample t-test is used when we want to know whether our sample comes from a particular population but we do not have full population information available to us. It is used when the population variance is unknown.

• Paired t-test: A paired t-test looks at the difference between paired values in two samples, takes into account the variation of values within each sample, and produces a single number known as a t-value.

• Two sample t-test: Used to compare responses from two groups. These two groups can come from different experimental treatments, or different "populations".

• z-test: It is an appropriate parametric statistical procedure when there is one sample that is being compared to a population with a known mean and standard deviation.

• F-test: The F-test is designed to test if two population variances are equal.

One sample t-test:

[h,p,ci,stats] = ttest(X,M) performs a t-test of the hypothesis that the data in X come from a distribution with mean M.

h is 1 if the null hypothesis is rejected at the 5% significance level and 0 otherwise; p is the p-value of the test.

ci is a 100*(1-ALPHA)% confidence interval for the true mean of X.

stats is a structure with the following fields:

'tstat' -- the value of the test statistic

'df' -- the degrees of freedom of the test

'sd' -- the estimated population standard deviation.

One sample t-test example:

• Ex: Specimens of copper wire drawn from a large lot have the following breaking strengths (in kg. weight):

• 578, 572, 570, 568, 572, 578, 570, 572, 596, 544

• Test (using Student’s t-statistic) whether the mean breaking strength of the lot may be taken to be 578 kg. weight (test at the 5 per cent level of significance).
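The copper wire example can be checked with a sketch like the one below. It uses only the data and hypothesized mean given above; the variable names are arbitrary.

```matlab
% Breaking strengths (kg. weight) of the 10 copper wire specimens
x = [578 572 570 568 572 578 570 572 596 544];

% H0: the lot mean is 578; two-sided alternative at the default 5% level
[h, p, ci, stats] = ttest(x, 578);

fprintf('h = %d, p = %.4f\n', h, p);
fprintf('t = %.4f, df = %d, sd = %.4f\n', stats.tstat, stats.df, stats.sd);
fprintf('95%% CI for the mean: [%.2f, %.2f]\n', ci(1), ci(2));
```

For these data the sample mean is 572 and t is about -1.49 on 9 degrees of freedom, which lies inside the two-tailed critical value of ±2.262. The test therefore returns h = 0: the hypothesis that the mean breaking strength is 578 kg. weight is not rejected.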

t-test with own significance level:

[h,p,ci,stats] = ttest(...,ALPHA) performs the test at the (100*ALPHA)% significance level. ALPHA must be a scalar between 0 and 1.
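As a sketch of a user-chosen significance level, the copper wire data from the earlier example are tested here at the 1% level. Current MATLAB documentation specifies the level through the 'Alpha' name-value argument, which is the form used below; the choice of 0.01 is an assumption for illustration.

```matlab
% Copper wire breaking strengths from the one-sample example
x = [578 572 570 568 572 578 570 572 596 544];

% Test H0: mean = 578 at the 1% level (assumed level, for illustration)
[h01, p01, ci01] = ttest(x, 578, 'Alpha', 0.01);

% Default 5% level, for comparison
[h05, p05, ci05] = ttest(x, 578);

fprintf('alpha = 0.01: h = %d, p = %.4f, CI = [%.2f, %.2f]\n', h01, p01, ci01(1), ci01(2));
fprintf('alpha = 0.05: h = %d, p = %.4f, CI = [%.2f, %.2f]\n', h05, p05, ci05(1), ci05(2));
```

The p-value does not depend on ALPHA; only the decision h and the width of the confidence interval change, with the 99% interval wider than the 95% one.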

Paired t-test:

A paired t-test looks at the difference between paired values in two samples, takes into account the variation of values within each sample, and produces a single number known as a t-value.

Paired t-test in MATLAB:

H = ttest(X,Y) performs a paired t-test of the hypothesis that two matched samples, in the vectors X and Y, come from distributions with equal means. The difference X-Y is assumed to come from a normal distribution with unknown variance.

X and Y must have the same length.

Example: Paired t-test

• The memory capacity of 9 students was tested before and after training. State at the 5 percent level of significance whether the training was effective, given the following scores:

• Before: 10, 15, 9, 3, 7, 12, 16, 17, 4

• After: 12, 17, 8, 5, 6, 11, 18, 20, 3

• Take the score before training as X and the score after training as Y, and take as the null hypothesis that the mean of the differences is zero.

The test does not reject H0, so we conclude that the difference in scores before and after training is insignificant, i.e., it can be attributed to sampling fluctuations. Hence the data give no significant evidence that the training was effective.
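The paired example can be reproduced with a sketch along these lines, using the before and after scores exactly as listed; the variable names are arbitrary.

```matlab
% Memory scores of the same 9 students
before = [10 15 9 3 7 12 16 17 4];     % X: before training
after  = [12 17 8 5 6 11 18 20 3];     % Y: after training

% Paired t-test: H0 is that the mean of X - Y is zero (two-sided, 5% level)
[h, p, ci, stats] = ttest(before, after);

fprintf('mean difference = %.4f\n', mean(before - after));
fprintf('h = %d, p = %.4f, t = %.4f, df = %d\n', h, p, stats.tstat, stats.df);
```

Here t is about -1.36 on 8 degrees of freedom, inside the two-tailed critical value of ±2.306, so h = 0.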

Tailed t-test:

A one- or two-tailed t-test is determined by whether the total area of α is placed in one tail or divided equally between the two tails.

The one-tailed t-test is performed if the results are interesting only if they turn out in a particular direction.

The two-tailed t-test is performed if the results would be interesting in either direction.

One-Tailed t-Test:

There are two different one-tailed t-tests, one for each tail.

In a one-tailed t-test, all the area associated with α is placed in either one tail or the other. Selection of the tail depends upon which direction t would be (+ or -) if the results of the experiment came out as expected.

The selection of the tail must be made before the experiment is conducted and analyzed.

A one-tailed test is used to see whether one mean is higher than the other.

One-tailed t-test in the positive direction

The value tcrit would be positive. For example, when α is set to .05 with ten degrees of freedom (df=10), tcrit would be equal to +1.812.

One-tailed t-test in the negative direction

The value tcrit would be negative. For example, when α is set to .05 with ten degrees of freedom (df=10), tcrit would be equal to -1.812.

Two-Tailed t-Test:

A two-tailed t-test divides α in half, placing half in each tail. The null hypothesis in this case is a particular value, and the alternative hypothesis allows a departure in either direction, positive or negative. The critical value of t, tcrit, is written with both a plus and a minus sign (±). For example, the critical value of t when there are ten degrees of freedom (df=10) and α is set to .05 is tcrit = ±2.228.

We would use a two-tailed test to see whether two means are different from each other (i.e., from different populations) or the same (from the same population).
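The critical values quoted above (±1.812 one-tailed and ±2.228 two-tailed, for df = 10 and α = .05) can be recovered from the inverse t cumulative distribution function. This is a minimal sketch using tinv from the Statistics and Machine Learning Toolbox; the variable names are arbitrary.

```matlab
alpha = 0.05;     % significance level
df    = 10;       % degrees of freedom

% One-tailed tests put all of alpha in a single tail
tcritRight = tinv(1 - alpha, df);       % positive direction
tcritLeft  = tinv(alpha, df);           % negative direction

% A two-tailed test puts alpha/2 in each tail
tcritTwo = tinv(1 - alpha/2, df);       % use as +/- tcritTwo

fprintf('one-tailed: %+.3f and %+.3f\n', tcritRight, tcritLeft);
fprintf('two-tailed: +/- %.3f\n', tcritTwo);
```

Splitting α between the tails pushes the two-tailed critical value further from zero than the one-tailed value, which is why a two-tailed test needs a larger |t| to reject at the same α.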

Tailed t-test in MATLAB

H = ttest(...,TAIL) performs the test against the alternative hypothesis specified by TAIL:

'both' -- "mean is not M" (two-tailed test)

'right' -- "mean is greater than M" (right-tailed test)

'left' -- "mean is less than M" (left-tailed test)

One tailed t-test in MATLAB
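A minimal sketch of a one-tailed call, reusing the memory scores from the paired t-test example in these slides. The choice of 'left' is an assumption about what "effective" means: if training raises scores, the mean of X - Y (before minus after) is negative. The 'Tail' name-value form is the one given in current MATLAB documentation.

```matlab
% Memory scores from the paired example
before = [10 15 9 3 7 12 16 17 4];     % X
after  = [12 17 8 5 6 11 18 20 3];     % Y

% H0: mean(X - Y) = 0   vs   H1: mean(X - Y) < 0  (left-tailed, 5% level)
[h, p, ~, stats] = ttest(before, after, 'Tail', 'left');

fprintf('h = %d, p = %.4f, t = %.4f, df = %d\n', h, p, stats.tstat, stats.df);
```

The statistic is about -1.36, which does not fall below the one-tailed critical value of -1.860 for 8 degrees of freedom, so h = 0 even with all of α placed in the left tail.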

Two-tailed test in MATLAB
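A minimal sketch of a two-tailed call, reusing the copper wire data from the one-sample example in these slides. It also runs both one-tailed tests to show how the three p-values relate; the variable names are arbitrary.

```matlab
% Copper wire breaking strengths (kg. weight)
x = [578 572 570 568 572 578 570 572 596 544];

% Two-tailed test of H0: mean = 578 ('both' is also the default)
[hBoth, pBoth] = ttest(x, 578, 'Tail', 'both');

% The two one-tailed versions of the same test
[~, pLeft]  = ttest(x, 578, 'Tail', 'left');
[~, pRight] = ttest(x, 578, 'Tail', 'right');

fprintf('two-tailed: h = %d, p = %.4f\n', hBoth, pBoth);
fprintf('left p = %.4f, right p = %.4f\n', pLeft, pRight);
```

The two-tailed p-value equals twice the smaller of the two one-tailed p-values, which reflects α being split equally between the tails.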

Two sample t-test

H = ttest2(X,Y) performs a t-test of the hypothesis that two independent samples, in the vectors X and Y, come from distributions with equal means, and returns the result of the test in H.

H=0 indicates that the null hypothesis ("means are equal") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level.

The data are assumed to come from normal distributions with unknown, but equal, variances.

X and Y can have different lengths.

Example:

• A group of seven-week-old chickens reared on a high protein diet weigh 12, 15, 11, 16, 14, 14, and 16; a second group of five chickens, similarly treated except that they receive a low protein diet, weigh 8, 10, 14, 10 and 13. Test at the 5 percent level whether there is significant evidence that additional protein has increased the weight of the chickens. (For hand calculation, use an assumed mean of 10 for the sample of 7 and an assumed mean of 8 for the sample of 5.)

• Take as the null hypothesis that additional protein has not increased the weight of the chickens.

Two sample t-test in MATLAB

We reject H0 and conclude that additional protein has increased the weight of the chickens, at the 5 per cent level of significance.
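The chicken example can be checked with a sketch like the following. The right-tailed alternative is chosen because the question is whether the high protein group is heavier; the variable names are arbitrary.

```matlab
% Weights of the two independent groups of chickens
highProtein = [12 15 11 16 14 14 16];    % n = 7
lowProtein  = [8 10 14 10 13];           % n = 5

% H0: equal means   vs   H1: mean(highProtein) > mean(lowProtein)
% Default assumption: equal (pooled) variances, 5% level
[h, p, ci, stats] = ttest2(highProtein, lowProtein, 'Tail', 'right');

fprintf('h = %d, p = %.4f\n', h, p);
fprintf('t = %.4f, df = %d, pooled sd = %.4f\n', stats.tstat, stats.df, stats.sd);
```

The group means are 14 and 11, and t is about 2.39 on 10 degrees of freedom. This exceeds the one-tailed critical value of 1.812, so h = 1.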

Two sample t-test in MATLAB cntd…

H = ttest2(X,Y,ALPHA,TAIL,VARTYPE) allows you to specify the type of test. When VARTYPE is 'equal', ttest2 performs the default test assuming equal variances.

When VARTYPE is 'unequal', ttest2 performs the test assuming that the two samples come from normal distributions with unknown and unequal variances. This is known as the Behrens-Fisher problem.
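As a sketch of the difference between the two settings, the chicken weights from the two-sample example are tested under both variance assumptions. The 'Vartype' name-value form follows current MATLAB documentation; the two-sided alternative is used here only to keep the comparison simple.

```matlab
% Chicken weights from the two-sample example
x = [12 15 11 16 14 14 16];    % high protein diet
y = [8 10 14 10 13];           % low protein diet

% Pooled-variance test (default) and unequal-variance test
[hEq, pEq, ~, sEq] = ttest2(x, y, 'Vartype', 'equal');
[hUn, pUn, ~, sUn] = ttest2(x, y, 'Vartype', 'unequal');

fprintf('equal:   t = %.4f, df = %.4f, p = %.4f\n', sEq.tstat, sEq.df, pEq);
fprintf('unequal: t = %.4f, df = %.4f, p = %.4f\n', sUn.tstat, sUn.df, pUn);
```

Under 'unequal' the degrees of freedom are approximated and are generally not an integer (about 7.3 here, against 10 for the pooled test), so the same data give a somewhat more conservative result.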

z-test in MATLAB

A z-test is used for testing the mean of a population or comparing the means of two populations, with large (n ≥ 30) samples when we know the population standard deviation.

H = ztest(X,M,SIGMA) performs a z-test of the hypothesis that the data in the vector X come from a distribution with mean M, and returns the result of the test in H.

H=0 indicates that the null hypothesis ("mean is M") cannot be rejected at the 5% significance level. H=1 indicates that the null hypothesis can be rejected at the 5% level.

The data are assumed to come from a normal distribution with standard deviation SIGMA.

Example:

• A dog food manufacturer created a new Super Vitamin Enriched Puppy Chow, specially designed for the active and growing Doberman Pinscher.

• A sample of 10 Doberman puppies was fed nothing but Super Vitamin Enriched Puppy Chow. When these dogs reached adulthood, their weights (in kg) were 27.5, 33.5, 36.8, 39.5, 40.5, 42.5, 40.0, 22.9, 39.8, 40.8. The weights are compared against a population mean M = 39.7 kg with σ = 6.2 kg.

• Did Puppy Chow make them grow especially big? Test with α = .05.

• H0: The puppy chow did not make the dogs grow larger than normal.

• H1: The puppy chow did make the dogs grow larger than normal.

The sample mean, 36.38 kg, lies below M, so the test does not reject H0: these data give no significant evidence that Super Vitamin Enriched Puppy Chow makes Doberman Pinschers grow larger.
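A sketch of this test in MATLAB, using the ten weights, M = 39.7 and σ = 6.2 from the example. The right-tailed alternative ("mean is greater than M") is an assumption that matches the question of whether the dogs grew larger; the variable names are arbitrary.

```matlab
% Adult weights (kg) of the 10 dogs in the sample
w = [27.5 33.5 36.8 39.5 40.5 42.5 40.0 22.9 39.8 40.8];

M     = 39.7;    % hypothesized population mean (kg)
sigma = 6.2;     % known population standard deviation (kg)

% H0: mean = M   vs   H1: mean > M  (right-tailed, 5% level)
[h, p, ci, zval] = ztest(w, M, sigma, 'Tail', 'right');

fprintf('sample mean = %.2f\n', mean(w));
fprintf('h = %d, p = %.4f, z = %.4f\n', h, p, zval);
```

The sample mean is 36.38 kg, giving z of about -1.69. Since z is negative, a right-tailed test cannot reject, and h = 0.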

F-test:

• The F-test is used to compare the variances of two independent samples.

• This test is also used in the context of analysis of variance (ANOVA) for judging the significance of more than two sample means at one and the same time.

• It is also used for judging the significance of multiple correlation coefficients.

F-test in MATLAB

• H = vartest2(X,Y) performs an F test of the hypothesis that two independent samples, in the vectors X and Y, come from normal distributions with the same variance, against the alternative that they come from normal distributions with different variances.

• The result is H=0 if the null hypothesis ("variances are equal") cannot be rejected at the 5% significance level, or H=1 if the null hypothesis can be rejected at the 5% level.

• X and Y can have different lengths.

Example:

• Two random samples drawn from two normal populations are:

• Sample1: 20 16 26 27 23 22 18 24 25 19

• Sample2: 27 33 42 35 32 34 38 28 41 43 30 37

• Test at the 5% significance level whether the two populations have the same variance.

• We take the null hypothesis that the two populations from which the samples have been drawn have the same variance.

Since the p-value is more than 0.05, we do not reject the null hypothesis: the samples are consistent with having been drawn from two populations with the same variance.
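The F-test example can be checked with a sketch like the one below, using the two samples exactly as listed; the variable names are arbitrary.

```matlab
% The two samples from the example
s1 = [20 16 26 27 23 22 18 24 25 19];           % n = 10
s2 = [27 33 42 35 32 34 38 28 41 43 30 37];     % n = 12

% H0: equal variances   vs   H1: different variances (two-sided, 5% level)
[h, p, ci, stats] = vartest2(s1, s2);

fprintf('var(s1) = %.4f, var(s2) = %.4f\n', var(s1), var(s2));
fprintf('h = %d, p = %.4f\n', h, p);
fprintf('F = %.4f, df1 = %d, df2 = %d\n', stats.fstat, stats.df1, stats.df2);
```

The sample variances are about 13.33 and 28.55, so the reported F statistic is their ratio, about 0.467, on (9, 11) degrees of freedom, and h = 0.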

Difference between t-test, z-test and F-test:

• t-test: A t-test is used for testing the mean of one population against a standard or comparing the means of two populations, when you do not know the populations’ standard deviation and when you have a limited sample (n < 30).

• z-test: A z-test is used for testing the mean of a population versus a standard, or comparing the means of two populations, with large (n ≥ 30) samples when we know the population standard deviation. It is also used for testing the proportion of some characteristic versus a standard proportion, or comparing the proportions of two populations.

• F-test: An F-test is used to compare two populations’ variances. The samples can be any size. It is the basis of ANOVA.

References:

Kothari, C.R., 1985, Research Methodology: Methods and Techniques, New Delhi, Wiley Eastern Limited.

Gupta, S.P., 2009, Statistical Methods, eighth revised edition.

http://www.mathworks.in/help/stats/ztest.html#btriieq

http://www.math.uah.edu/stat/hypothesis/Introduction.html

http://www.mathworks.in/products/statistics/description7.html

How to Do a T-Test in MATLAB, eHow, http://www.ehow.com/how_12211819_ttest-matlab.html#ixzz2WSQ6BN6o


THANK YOU
