hypothesis testing

Upload: jinoy-p-mathew

Post on 18-Dec-2015


DESCRIPTION

Research Methodology Notes for MBA

TRANSCRIPT

Hypothesis Testing

Hypothesis

A hypothesis is an assumption or a statement that may or may not be true. It is tested on the basis of information obtained from a sample. Instead of asking, for example, what the mean assessed value of an apartment in a multistoried building is, one may be interested in knowing whether or not the assessed value equals some particular value, say Rs. 80 lakh. Example: is a new drug more effective than the existing drug, based on the sample data? Example: is the proportion of smokers in a class different from 0.30?

Null Hypothesis

A hypothesis that is proposed with the intent of being rejected is called a null hypothesis. This requires that we hypothesize the opposite of what we wish to prove. For example, if we want to show that sales and advertisement expenditure are related, we formulate the null hypothesis that they are not related. If we want to prove that the average wage of skilled workers in town 1 is greater than that in town 2, we formulate the null hypothesis that there is no difference in the average wages of skilled workers in the two towns. A null hypothesis is denoted by H0.

Alternative Hypothesis

Rejection of the null hypothesis leads to the acceptance of the alternative hypothesis. Rejecting the null hypothesis indicates that the relationship between variables (e.g., sales and advertisement expenditure), the difference between means (e.g., wages of skilled workers in towns 1 and 2), or the difference between proportions is statistically significant. Acceptance of the null hypothesis (more precisely, failure to reject it) indicates that the observed differences could be due to chance. The alternative hypothesis is denoted by H1.

One-tailed & Two-tailed tests

A test is called one-tailed if the null hypothesis is rejected only when the value of the test statistic falls in one specified tail of its sampling distribution. A test is called two-tailed if the null hypothesis is rejected when the value of the test statistic falls in either of the two tails of its sampling distribution.

Two-tailed test Example

Consider a soft drink bottling plant which dispenses soft drinks in bottles of 300 ml capacity. The bottling is done by an automatic plant. Overfilling a bottle (liquid content more than 300 ml) means a huge loss to the company, given the large volume of sales. Underfilling means the customers are getting less than 300 ml of the drink when they are paying for 300 ml, which could damage the company's reputation. The company wants to avoid both overfilling and underfilling. Therefore, it would prefer to test whether the mean content of the bottles is different from 300 ml. With μ denoting the mean content of the bottles, this could be written as: H0: μ = 300 ml; H1: μ ≠ 300 ml. The hypotheses stated above are called two-tailed hypotheses.

One-tailed test Example

If the concern is the overfilling of bottles, the hypotheses could be stated as: H0: μ = 300 ml; H1: μ > 300 ml. Such hypotheses are called one-tailed hypotheses, and the researcher would be interested in the upper (right-hand) tail of the distribution. If the concern is loss of reputation of the company (underfilling of the bottles), the hypotheses may be stated as: H0: μ = 300 ml; H1: μ < 300 ml. These are also one-tailed hypotheses, and the researcher would be interested in the lower (left-hand) tail of the distribution.
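The difference between the two kinds of test can be sketched in a few lines of Python. This is only an illustration under assumptions not made in the notes: the test statistic is taken to follow the standard normal distribution, and the function names critical_values and rejects_h0 are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def critical_values(alpha, tail):
    """Return the boundary (or boundaries) of the critical region for Z.

    tail is "two" (H1: mu != mu0), "upper" (H1: mu > mu0)
    or "lower" (H1: mu < mu0).
    """
    if tail == "two":
        z = STD_NORMAL.inv_cdf(1 - alpha / 2)  # alpha/2 in each tail
        return (-z, z)
    if tail == "upper":
        return (STD_NORMAL.inv_cdf(1 - alpha),)  # all of alpha in the right tail
    if tail == "lower":
        return (STD_NORMAL.inv_cdf(alpha),)  # all of alpha in the left tail
    raise ValueError("tail must be 'two', 'upper' or 'lower'")


def rejects_h0(z, alpha, tail):
    """True if the observed Z statistic falls in the critical region."""
    if tail == "two":
        lower, upper = critical_values(alpha, tail)
        return z < lower or z > upper
    (bound,) = critical_values(alpha, tail)
    return z > bound if tail == "upper" else z < bound


if __name__ == "__main__":
    z_observed = 1.8  # hypothetical value of the test statistic
    for tail in ("two", "upper", "lower"):
        print(tail, critical_values(0.05, tail), rejects_h0(z_observed, 0.05, tail))
```

With the same significance level, the same observed statistic can be rejected by an upper-tailed test and not by a two-tailed one, because the two-tailed test splits the rejection probability between both tails.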

Type 1 & Type 2 Error

Type-1 error: rejecting H0 when it is true. Its probability is denoted by α. In quality control (QC) it is termed the producer's risk, because it is the probability of rejecting a good lot. Type-2 error: accepting H0 when it is false. Its probability is denoted by β. In QC it is termed the consumer's risk, because it is the probability of accepting a bad lot. The expression (1 − β) is called the power of the test. To decrease the risk of committing both types of errors, one may increase the sample size.

Decision based on sample    H0 is true          H0 is false
Reject H0                   Type-1 error        Correct decision
Accept H0                   Correct decision    Type-2 error
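The effect of sample size on the Type-2 error can be illustrated with a small calculation. The sketch below assumes an upper-tailed Z test with a known population standard deviation, which the notes do not specify; the function name power_upper_tailed and all numbers in the usage part are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_upper_tailed(mu0, mu1, sigma, n, alpha=0.05):
    """Power (1 - beta) of an upper-tailed Z test of H0: mu = mu0.

    mu1 is the true mean under H1 and sigma is the (assumed known)
    population standard deviation.
    """
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)    # critical value for the chosen alpha
    shift = (mu1 - mu0) * sqrt(n) / sigma      # true mean measured in standard errors
    beta = STD_NORMAL.cdf(z_alpha - shift)     # P(accept H0 when H1 is true)
    return 1 - beta


if __name__ == "__main__":
    # Hypothetical bottling figures: H0 mean 300 ml, true mean 302 ml, sigma 5 ml.
    for n in (25, 100):
        power = power_upper_tailed(300, 302, 5, n)
        print(f"n = {n}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With α held fixed, a larger n shrinks the standard error, so β falls and the power rises.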

Formulation of Hypothesis

While designing any hypothesis, there are a few criteria that the researcher must fulfill. It must be formulated in a simple, clear, and declarative form; a broad hypothesis might not be empirically testable. It should test only one relationship between only two variables at a time. For example: "Consumer liking for the electronic advertisement for the new diet drink will have a positive impact on brand awareness of the drink", or "High organizational commitment will lead to lower turnover intention". A hypothesis must be measurable and quantifiable. The validation of the hypothesis would necessarily involve testing the statistical significance of the hypothesized relation.

Testing of Hypothesis

The following steps are followed in the testing of a hypothesis:

1. Setting up the hypotheses H0 and H1.
2. Setting up a suitable significance level α. The level of significance denotes the probability of rejecting the null hypothesis when it is true. The value of α varies from problem to problem, but it is usually taken as either 5% or 1%.
3. Determination of a test statistic. This could be the Z, t, F or χ² statistic; which one is used depends on the assumptions made.
4. Determination of the critical region. Before a sample is drawn from the population, it is very important to specify the values of the test statistic that will lead to rejection or acceptance of the null hypothesis. The set of values that leads to rejection of the null hypothesis is called the critical region. Given α, the optimal critical region for a two-tailed test consists of an area of α/2 in the right-hand tail of the distribution plus an area of α/2 in the left-hand tail.
5. Computing the value of the test statistic.
6. Inference. H0 is rejected or accepted depending on whether the computed value falls in the rejection region or the acceptance region.
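The steps can be traced in a short worked example. It is a sketch only: it assumes a two-tailed Z test with a known population standard deviation, applied to the bottling example with made-up sample figures. The function name z_test_two_tailed and every number in the usage part are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Two-tailed Z test of H0: mu = mu0 against H1: mu != mu0."""
    std_normal = NormalDist()
    # Steps 1 and 2: the hypotheses and alpha are fixed by the arguments.
    # Step 3: the Z statistic, used here because sigma is assumed known.
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: critical region |Z| > z_crit, with alpha/2 in each tail.
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Step 5: the computed statistic, plus its two-tailed p-value.
    p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Step 6: inference.
    return {"z": z, "z_crit": z_crit, "p_value": p_value,
            "reject_h0": abs(z) > z_crit}


if __name__ == "__main__":
    # Hypothetical sample: 36 bottles, mean content 301.5 ml, sigma 4 ml.
    result = z_test_two_tailed(sample_mean=301.5, mu0=300, sigma=4, n=36)
    print(result)
```

Here Z = (301.5 − 300) / (4 / 6) = 2.25, which lies beyond the 5% two-tailed critical value of about 1.96, so H0 would be rejected for this made-up sample.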

Degrees of Freedom (d.f.)

The d.f. is the number of values in a calculation that are free to vary. Suppose that we know the mean of certain data is 25 and that the values are 20, 10, 50, and one unknown value, x. Then (20 + 10 + 50 + x)/4 = 25, so x = 20; no value is free to vary. Now suppose that we know the mean of a data set is 25, with values 20, 10, and two unknown values, say x and y. The mean gives (20 + 10 + x + y)/4 = 25, so 30 + x + y = 100, or x + y = 70. With this we obtain y = 70 − x. Once we choose a value for x, the value for y is determined. This shows that there is 1 d.f. Now consider a sample of size 100. If we know that the mean of this sample is 20, but do not know any of the data values, then there are 99 d.f. All values must add up to a total of 20 × 100 = 2000. Once we have the values of 99 elements in the data set, the last one can be determined. If the size of the given sample is n, then the d.f. will be (n − 1). In a contingency table (CT) the d.f. is calculated in a slightly different manner. If the order of the CT is r × c, then the d.f. will be (r − 1)(c − 1), where r = number of rows and c = number of columns.
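The arithmetic above can be checked with a few helper functions. This is a minimal sketch; the names last_value, df_sample and df_contingency are hypothetical and chosen only for illustration.

```python
def last_value(known_values, mean, n):
    """Value that is forced once the mean and the other n - 1 values are known."""
    if len(known_values) != n - 1:
        raise ValueError("exactly n - 1 values must be known")
    return mean * n - sum(known_values)


def df_sample(n):
    """Degrees of freedom of a sample of size n with a known mean."""
    return n - 1


def df_contingency(rows, cols):
    """Degrees of freedom of an r x c contingency table."""
    return (rows - 1) * (cols - 1)


if __name__ == "__main__":
    print(last_value([20, 10, 50], mean=25, n=4))  # the unknown x in the first example
    print(df_sample(100))                          # sample of size 100
    print(df_contingency(3, 4))                    # a 3 x 4 contingency table
```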

χ²-test

The chi-square (χ²) test is widely used in research. To use the chi-square test, data are required in the form of frequencies. Data expressed as percentages or proportions can also be used, provided they can be converted into frequencies. The majority of the applications of chi-square are with discrete data. Unlike the normal and t distributions, the chi-square distribution is not symmetric. The values of a chi-square statistic are greater than or equal to zero. The shape of a chi-square distribution depends upon the degrees of freedom. As the degrees of freedom increase, the distribution tends to the normal distribution.

χ²-test applications

Goodness of fit: the χ²-test is used to find out how well a theoretical distribution fits the empirical (observed) distribution obtained from the sample data. Independence of attributes: the χ²-test is used to find out whether two or more attributes are associated or not. Equality of more than two population proportions: for example, the interest may be in determining whether, in an organization, the proportion of satisfied employees is the same in four categories, viz., class I, class II, class III and class IV employees.
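A test of independence of attributes can be sketched as follows. The notes give no formula or data, so this example assumes the usual statistic, the sum of (observed − expected)² / expected over all cells, with expected = row total × column total / grand total. The function name, the observed counts, and the small table of 5% critical values (copied from standard chi-square tables for 1 to 4 d.f.) are all illustrative.

```python
# Upper 5% critical values of the chi-square distribution for 1 to 4 d.f.,
# taken from standard statistical tables.
CHI2_CRIT_5PCT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def chi_square_independence(observed):
    """Return (chi-square statistic, d.f.) for an r x c table of frequencies."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected frequency if the two attributes were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, df


if __name__ == "__main__":
    # Hypothetical 2 x 2 table of frequencies for two attributes.
    table = [[30, 20],
             [20, 30]]
    statistic, df = chi_square_independence(table)
    print(statistic, df, statistic > CHI2_CRIT_5PCT[df])
```

For this made-up table every expected frequency is 25, so the statistic is 4 × (5² / 25) = 4.0 with 1 d.f., which exceeds 3.841; the hypothesis of independence would be rejected at the 5% level.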