Psychological Statistics-III
Irshadiya College of Commerce and Social Sciences, Feroke Page 1
Study Notes on
PSYCHOLOGICAL STATISTICS
SEMESTER-III
B.Sc. COUNSELING PSYCHOLOGY
Prepared by:
Noushad P.K
Lecturer in Commerce
Irshadiya College of Commerce and Social Sciences, Feroke
Department of Psychology,
Irshadiya College of Commerce and Social Sciences
Feroke, Calicut-6736 31
MODULE –I
CORRELATION ANALYSIS
MEANING AND DEFINITION
It is a statistical tool used to describe the relationship or interdependence between two or more variables. Two or more variables are said to be correlated, if the change in one variable results in a corresponding change in the other variable. That is, when two or more variables move together (in same direction or in opposite direction), we say they are correlated. For instance, when the price of a commodity increases, its supply goes up. On the basis of the theory of correlation, one can study the comparative changes occurring in two related phenomena and their cause-effect relationship.
A.M Tuttle defines “Correlation as an analysis of the association between two or more variables”. In the words of Simpson and Kafka “Correlation analysis deals with the association between two or more variables”.
SIGNIFICANCE OF CORRELATION ANALYSIS
1. It helps to find a single figure to measure the relationship between the variables.
2. It helps to understand economic behaviour.
3. It can be used as a basis for the study of regression.
4. It helps to reduce the range of uncertainty associated with decision making.
5. It helps to know whether the correlation is significant or not. This is possible by comparing the correlation co-efficient with six times its probable error (6 PE). If ‘r’ is more than 6 PE, the correlation is significant.
CORRELATION AND CAUSATION
Correlation often suggests that the two variables have a cause-effect relationship. For example, the relationship between price and demand, price and supply, rate of interest and savings, etc.
Correlation does not always imply cause-effect relationship. For example, a higher degree of correlation between yield per acre of rice and tea may be due to the fact that both are related to the amount of rainfall.
There may be a higher degree of correlation between the variables, but it may be difficult to pinpoint as to which is the effect. For example, increase in price leads to decrease in demand. Here change in price is the cause and change in demand is the effect. But it is also possible that increased demand is due to other reasons like growth of population.
Two series may also show a high degree of correlation purely by chance. For example, during the last decade there has been a significant increase in both the sale of newspapers and crime. We can establish a correlation between these two variables, but there is no cause-effect relationship between them. Such illogical correlations are known as nonsense (spurious) correlations.
CLASSIFICATION OF CORRELATION
Positive and Negative Correlation
Correlation can be either positive or negative. Whether correlation is positive or negative would depend upon the direction in which the variables are moving.
When the values of two variables move in the same direction, correlation is said to be positive. That is, an increase in the value of one variable results in an increase in the value of the other variable, or a decrease in the value of one variable leads to a decrease in the other. Example: correlation between price and supply.
Price:  10 20 30 40 50
Supply: 80 100 150 170 200
When the values of two variables move in opposite directions, correlation is said to be negative. That is, an increase in the value of one variable results in a decrease in the value of the other variable. Example: correlation between price and demand.
Price:  5 10 15 20 25
Demand: 16 10 8 6 2
Linear and Non-linear Correlation
Correlation may be linear or non-linear. Here the distinction is based upon the constancy of the ratio of change between the variables.
When the amount of change in one variable leads to a constant ratio of change in the other variable, correlation is said to be linear. In such a case, if the values of the variables are plotted on graph paper, a straight line is obtained. Example: in the series below, supply is always five times the price, so the ratio of change between the two variables is constant.
Price:  10 15 30 60
Supply: 50 75 150 300
When the amount of change in one variable does not bring the same ratio of change in the other variable, that is, the ratio is variable instead of constant, the correlation is said to be non-linear. In such a case, we obtain a curve if the values of the variables are plotted on graph paper. That is why it is also known as curvilinear correlation. For example,
X: 2 4 6 10 15
Y: 8 10 18 22 26
Simple, Partial and Multiple Correlations
In a correlation analysis, if only two variables are studied (of which one is independent and the other is dependent), the correlation is said to be simple. For example, the correlation between price and demand.
Multiple correlation studies the relationship between a dependent variable and two or more independent variables. For example, the correlation between yield with both rainfall and temperature
In partial correlation, we measure the correlation between a dependent variable and one particular independent variable assuming that all other independent variables remain constant. For example, there are three variables- yield, rainfall and temperature. And each is related with the other. Then, the relationship between yield and rainfall (assuming the temperature is constant) is the partial correlation
METHODS OF STUDYING CORRELATION
1. Graphic method
a) Scatter diagram
b) Correlation graph
2. Algebraic methods/Mathematical methods/Statistical methods/Co-efficient of correlation methods
a) Karl Pearson’s Co-efficient of correlation
b) Spearman’s Rank correlation method
c) Concurrent deviation method
SCATTER DIAGRAM
It is also known as dot chart. It is a graphical method of studying correlation between two variables. It is a visual aid to show the presence or absence of correlation between two variables.
In scatter diagram, one of the variables is shown on the X-axis and the other on Y-axis. Each pair of values is plotted by means of a dot mark. If these dot marks show some trends either upward or downward, the two variables are said
to be correlated. If the plotted dots do not show any trend, the two variables are not correlated. The greater the scatter of the dots, the lower is the relationship
Merits of Scatter Diagram Method
1. It is a simple method.
2. It is a non-mathematical method.
3. It is very easy to understand.
4. It is not affected by the size of extreme values.
5. It is usually the first step in correlation analysis.
Demerits of Scatter Diagram Method
1. It gives only a rough idea about the correlation between variables.
2. Further algebraic treatment is not possible.
3. The exact degree of correlation between the variables cannot be easily determined.
4. If the number of pairs of variables is either very big or very small, the method is not easy.
CORRELATION GRAPH METHOD
Under correlation graph method the individual values of the two variables are plotted on a graph paper. Then dots relating to these variables are joined separately so as to get two curves. By examining the direction and closeness of the two curves, we can infer whether the variables are related or not. If both the curves are moving in the same direction (either upward or downward) correlation is said to be positive. If the curves are moving in the opposite directions, correlation is said to be negative.
Merits of Correlation Graph Method
1. It is a simple method.
2. It does not require mathematical calculations.
3. It is very easy to understand.
Demerits of Correlation Graph Method:
1. A numerical value of correlation cannot be calculated.
2. It is only a pictorial presentation of the relationship between variables.
3. It is not possible to establish the exact degree of relationship between the variables.
MATHEMATICAL/STATISTICAL CORRELATION/CO-EFFICIENT OF
CORRELATION
It is an algebraic method of measuring correlation. It measures the degree
or extent of correlation between two variables. It is symbolically denoted by r.
The value of the correlation co-efficient can never exceed +1 or fall below -1. That is, +1 and -1 are
the limits of this co-efficient. It covers:
1. Karl Pearson’s co-efficient of correlation
2. Spearman’s rank correlation
3. Concurrent deviation
Karl Pearson’s Co-Efficient of Correlation
It was developed by the reputed statistician and biologist Prof. Karl
Pearson. It is also known as the product moment correlation co-efficient.
Assumptions
1. There is a linear relationship between variables.
2. The variables are affected by a large number of independent causes so as to
form a normal distribution.
3. There is a cause-effect relationship between the variables.
Properties
1. It has a well-defined formula.
2. It is a pure number and is independent of the units of measurement.
3. It lies between -1 and +1.
4. It is the geometric mean of the two regression co-efficients.
5. It does not change with reference to change of origin or change of scales.
6. Co-efficient of correlation between x and y is same as that between y
and x
Methods
a) When deviations are taken from actual mean
b) When deviations are taken from assumed mean
When deviations are taken from actual mean
STEPS:
1. Take the deviations of the X series from the mean of X, denoted by x or dx.
2. Square these deviations and get the total. That is, $\sum x^2$ or $\sum dx^2$.
3. Take the deviations of the Y series from the mean of Y, denoted by y or dy.
4. Square these deviations and get the total. That is, $\sum y^2$ or $\sum dy^2$.
5. Multiply the deviations of the x and y series, and get the total. That is, $\sum xy$ or $\sum dx\,dy$.
6. Apply the formula and find the correlation co-efficient.
$$r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$$
Where, $x = X - \bar{X}$
$y = Y - \bar{Y}$
OR
$$r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$$
Where, $dx = X - \bar{X}$
$dy = Y - \bar{Y}$
Illustration-1:
Find Karl Pearson’s co-efficient of correlation from the following data.
X: 8 4 10 2 6
Y: 9 11 5 8 7
Solution:
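Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$.
X      Y      x      y      x²     y²     xy
8      9      2      1      4      1      2
4      11     -2     3      4      9      -6
10     5      4      -3     16     9      -12
2      8      -4     0      16     0      0
6      7      0      -1     0      1      0
∑X = 30, ∑Y = 40, ∑x² = 40, ∑y² = 20, ∑xy = -16
Arithmetic Mean of X, $\bar{X} = \frac{\sum X}{N} = \frac{30}{5} = 6$
Arithmetic Mean of Y, $\bar{Y} = \frac{\sum Y}{N} = \frac{40}{5} = 8$
Correlation, $r = \frac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \frac{-16}{\sqrt{40 \times 20}} = \frac{-16}{\sqrt{800}} = \frac{-16}{28.28} = -0.566$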
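The steps of the actual-mean method can also be checked with a short program. The following is only a minimal sketch in Python; the function name `pearson_r` is a hypothetical name chosen for illustration and is not part of the notes. It uses the data of Illustration-1, and its result should agree with the worked answer to two decimal places.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r using deviations taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]              # x = X - mean of X
    dy = [y - mean_y for y in ys]              # y = Y - mean of Y
    sum_xy = sum(a * b for a, b in zip(dx, dy))
    sum_x2 = sum(a * a for a in dx)
    sum_y2 = sum(b * b for b in dy)
    return sum_xy / sqrt(sum_x2 * sum_y2)

# Data of Illustration-1
X = [8, 4, 10, 2, 6]
Y = [9, 11, 5, 8, 7]
r = pearson_r(X, Y)
print(round(r, 3))   # -0.566
```

The negative sign shows that, in this data, Y tends to fall as X rises.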
Illustration-2:
Calculate Karl Pearson’s co-efficient of correlation from the following data.
X: 2 3 4 5 6 7 8
Y: 4 5 6 12 9 7 6
Solution:
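Here $dx = X - \bar{X}$ and $dy = Y - \bar{Y}$.
X      Y      dx     dy     dx²    dy²    dx.dy
2      4      -3     -3     9      9      9
3      5      -2     -2     4      4      4
4      6      -1     -1     1      1      1
5      12     0      5      0      25     0
6      9      1      2      1      4      2
7      7      2      0      4      0      0
8      6      3      -1     9      1      -3
∑X = 35, ∑Y = 49, ∑dx² = 28, ∑dy² = 44, ∑dx.dy = 13
Arithmetic Mean of X, $\bar{X} = \frac{\sum X}{N} = \frac{35}{7} = 5$
Arithmetic Mean of Y, $\bar{Y} = \frac{\sum Y}{N} = \frac{49}{7} = 7$
Correlation, $r = \frac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \frac{13}{\sqrt{28 \times 44}} = \frac{13}{\sqrt{1232}} = \frac{13}{35.10} = 0.37$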
When deviations are taken from assumed mean
STEPS:
1. Take the deviations of the X series from the assumed mean of X, denoted by dx.
2. Square these deviations and get the total. That is, $\sum dx^2$.
3. Take the deviations of the Y series from the assumed mean of Y, denoted by dy.
4. Square these deviations and get the total. That is, $\sum dy^2$.
5. Multiply the deviations of the x and y series, and get the total. That is, $\sum dx\,dy$.
6. Apply the formula and find the correlation co-efficient.
$$r = \frac{\sum dx\,dy - \frac{(\sum dx)(\sum dy)}{N}}{\sqrt{\sum dx^2 - \frac{(\sum dx)^2}{N}} \cdot \sqrt{\sum dy^2 - \frac{(\sum dy)^2}{N}}}$$
Where, dx = X - Assumed Mean of X
dy = Y - Assumed Mean of Y
N = Number of pairs of observations
Illustration-3:
Calculate Karl Pearson’s co-efficient of correlation from the following data.
X: 5, 10, 5, 11, 12, 4, 3, 2, 7, 6
Y: 1, 6, 2, 8, 5, 1, 4, 6, 5, 2
Solution:
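The assumed mean (A) is taken as 7 for X and 5 for Y, so dx = X - 7 and dy = Y - 5.
X      Y      dx     dy     dx²    dy²    dx.dy
5      1      -2     -4     4      16     8
10     6      3      1      9      1      3
5      2      -2     -3     4      9      6
11     8      4      3      16     9      12
12     5      5      0      25     0      0
4      1      -3     -4     9      16     12
3      4      -4     -1     16     1      4
2      6      -5     1      25     1      -5
7      5      0      0      0      0      0
6      2      -1     -3     1      9      3
∑dx = -5, ∑dy = -10, ∑dx² = 109, ∑dy² = 62, ∑dx.dy = 43
Correlation, $r = \frac{43 - \frac{(-5)(-10)}{10}}{\sqrt{109 - \frac{(-5)^2}{10}} \cdot \sqrt{62 - \frac{(-10)^2}{10}}}$
$= \frac{43 - 5}{\sqrt{106.5} \cdot \sqrt{52}}$
$= \frac{38}{10.32 \times 7.21}$
$= \frac{38}{74.42}$
= 0.51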
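The assumed-mean (short-cut) method can be sketched in the same way. This is only an illustrative sketch; the function name `pearson_r_assumed_mean` and its parameter names are hypothetical. It applies the formula with the correction terms (∑dx)(∑dy)/N, (∑dx)²/N and (∑dy)²/N to the data of Illustration-3, with assumed means 7 and 5. Computed this way, the data give r ≈ 0.51.

```python
from math import sqrt

def pearson_r_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Karl Pearson's r using deviations taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]           # dx = X - assumed mean of X
    dy = [y - assumed_y for y in ys]           # dy = Y - assumed mean of Y
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = sum_dxdy - sum_dx * sum_dy / n
    denominator = sqrt(sum_dx2 - sum_dx ** 2 / n) * sqrt(sum_dy2 - sum_dy ** 2 / n)
    return numerator / denominator

# Data of Illustration-3
X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 6]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]
r = pearson_r_assumed_mean(X, Y, assumed_x=7, assumed_y=5)
print(round(r, 2))   # 0.51
```

Because the correction terms remove the effect of the assumed means, any other choice of assumed means gives the same value of r. This is the property that r does not change with a change of origin.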
Merits
1. It gives an idea about the co-variation of the two series
2. It indicates the direction of relationship also
3. It provides a numerical measurement of co-efficient of correlation
4. It can be used for further algebraic treatment
5. It gives a single figure to explain the accurate degree of correlation
between two variables
Demerits
1. It assumes a linear relationship between the variables. But, in real
situations, it may not be so.
2. A high degree of correlation does not mean that a close relation exists
between variables.
3. Difficult to calculate.
4. It is unduly affected by extreme values.
PROBABLE ERROR
The quantity $\frac{1 - r^2}{\sqrt{N}}$ is known as the standard error of the correlation co-efficient. Usually, the correlation co-efficient is calculated from samples. For
different samples drawn from the same population, the co-efficient of correlation
may vary. But, the numerical value of such variation is expected to be less than
the probable error. It is a statistical measure which measures reliability and
dependability of the values of co-efficient of correlation. If probable error is
‘added to’ or ‘subtracted from’ the co-efficient of correlation, it would give two
such limits within which we can reasonably expect the value of co-efficient of
correlation to vary.
The probable error of the co-efficient of correlation can be obtained by
applying the formula:
Probable Error = $0.6745 \times \frac{1 - r^2}{\sqrt{N}}$, where N is the number of pairs of observations
If the value of r is less than the probable error, it is not at all significant. If
the value of r is more than six times of the probable error, it is significant. (If the
Probable Error is not much and if the value of r is 0.5 or more, it is generally
considered to be significant)
Uses
1. It is used to determine the limits within which the population correlation
co-efficient may be expected to lie.
2. It can be used to test whether an observed value of the sample correlation co-efficient indicates a significant correlation in the population.
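The probable-error rule can be illustrated with a small calculation. The sketch below assumes the usual formula PE = 0.6745 × (1 − r²)/√N. The values r = 0.8 and N = 25 are hypothetical and not taken from the notes. The label "inconclusive" for the range between PE and 6 PE, and the use of the absolute value of r so that negative co-efficients can be checked too, are also assumptions made for illustration.

```python
from math import sqrt

def probable_error(r, n):
    """Probable error of r for n pairs of observations."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)

def interpret(r, n):
    """Apply the rule: r < PE is not significant, r > 6 PE is significant."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"          # assumption: the notes give no label here

r, n = 0.8, 25                     # hypothetical values for illustration
pe = probable_error(r, n)
print(round(pe, 4))                # 0.0486
print(round(r - pe, 3), round(r + pe, 3))   # limits r - PE and r + PE
print(interpret(r, n))             # significant
```

Here 6 PE is about 0.29, which is well below r = 0.8, so the correlation is treated as significant.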
Spearman’s Rank Correlation
Karl Pearson’s correlation co-efficient is used to measure the correlation
between variables which are normally distributed. If population is not normal, or
the shape of the distribution is not known, Rank correlation is used. There are
many occasions whereby the value of certain variables cannot be measured in
quantitative form. For example, intelligence, beauty, character, morality, honesty,
etc. Rank correlation is used to study the association between such variables. It is a
method used to study the correlation between attributes. It was developed by
the British psychologist Charles Edward Spearman in 1904.
Cases
a) Ranks are not repeating
b) Repeated ranks/Tie in rank
Ranks are not repeating
STEPS:
1. Assign ranks to attributes
2. Compare the difference of ranks which is denoted by D
3. Calculate $\sum D^2$
4. Apply the formula, and find correlation
$$R = 1 - \frac{6 \sum D^2}{N^3 - N}$$
Where, N = Number of pairs of observations
Illustration-4:
Find out Spearman’s rank correlation co-efficient from the following data
X: 60, 34, 40, 50, 45, 41, 22, 43, 42, 66, 64, 46
Y: 75, 32, 34, 40, 45, 33, 12, 30, 36, 72, 41, 57
Solution:
X      Y      R1     R2     D      D²
60     75     3      1      2      4
34     32     11     10     1      1
40     34     10     8      2      4
50     40     4      6      -2     4
45     45     6      4      2      4
41     33     9      9      0      0
22     12     12     12     0      0
43     30     7      11     -4     16
42     36     8      7      1      1
66     72     1      2      -1     1
64     41     2      5      -3     9
46     57     5      3      2      4
∑D² = 48
Rank Correlation, $R = 1 - \frac{6 \sum D^2}{N^3 - N}$
$= 1 - \frac{6 \times 48}{12^3 - 12}$
$= 1 - \frac{288}{1716}$
= 1 - 0.168
= 0.832
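The ranking steps for the case without ties can be sketched as follows. This is a minimal illustration; the function names `ranks_desc` and `spearman_no_ties` are hypothetical. Rank 1 is given to the largest value, as in the solution table, and the X series is used as it appears in that table (third value 40).

```python
def ranks_desc(values):
    """Rank 1 = largest value; assumes no value is repeated."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_no_ties(xs, ys):
    """Spearman's R = 1 - 6*sum(D^2) / (N^3 - N) for untied ranks."""
    n = len(xs)
    r1, r2 = ranks_desc(xs), ranks_desc(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Data of Illustration-4
X = [60, 34, 40, 50, 45, 41, 22, 43, 42, 66, 64, 46]
Y = [75, 32, 34, 40, 45, 33, 12, 30, 36, 72, 41, 57]
R = spearman_no_ties(X, Y)
print(ranks_desc(X))   # [3, 11, 10, 4, 6, 9, 12, 7, 8, 1, 2, 5]
print(round(R, 3))     # 0.832
```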
Repeated ranks/Tie in rank
STEPS:
1. Assign ranks to attributes
2. Compare the difference of ranks which is denoted by D
3. Calculate $\sum D^2$
4. Calculate $m^3 - m$ for each group of tied values, where m is the number of items sharing the same rank
5. Apply the formula, and find correlation
$$R = 1 - \frac{6\left[\sum D^2 + \frac{1}{12}\sum (m^3 - m)\right]}{N^3 - N}$$
Illustration-5:
Find out Spearman’s rank correlation co-efficient from the following data
X: 68 64 75 50 64 80 75 40 55 64
Y: 62 58 68 45 81 60 68 48 50 70
Solution:
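X      Y      R1     R2     D      D²
68     62     4      5      -1     1
64     58     6      7      -1     1
75     68     2.5    3.5    -1     1
50     45     9      10     -1     1
64     81     6      1      5      25
80     60     1      6      -5     25
75     68     2.5    3.5    -1     1
40     48     10     9      1      1
55     50     8      8      0      0
64     70     6      2      4      16
∑D² = 72
m = 2 (75 occurs twice in X), m³ – m = 2³ – 2 = 8 – 2 = 6
m = 3 (64 occurs three times in X), m³ – m = 3³ – 3 = 27 – 3 = 24
m = 2 (68 occurs twice in Y), m³ – m = 2³ – 2 = 8 – 2 = 6
Rank Correlation, $R = 1 - \frac{6\left[72 + \frac{1}{12}(6 + 24 + 6)\right]}{10^3 - 10}$
$= 1 - \frac{6 \times 75}{990}$
$= 1 - \frac{450}{990}$
= 1 - 0.45
= 0.55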
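When ranks are tied, each tied item receives the average of the ranks it would otherwise occupy, and (m³ − m)/12 is added to ∑D² for every tied group. A minimal sketch of this procedure follows; the function names are hypothetical and chosen only for illustration. It uses the data of Illustration-5.

```python
from collections import Counter

def average_ranks_desc(values):
    """Rank 1 = largest value; tied values share the average of their ranks."""
    positions = {}
    for pos, v in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(v, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (m^3 - m) / 12 over every group of m tied values."""
    return sum(m ** 3 - m for m in Counter(values).values() if m > 1) / 12

def spearman_with_ties(xs, ys):
    n = len(xs)
    r1, r2 = average_ranks_desc(xs), average_ranks_desc(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)

# Data of Illustration-5
X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
R = spearman_with_ties(X, Y)
print(average_ranks_desc(X))   # [4.0, 6.0, 2.5, 9.0, 6.0, 1.0, 2.5, 10.0, 8.0, 6.0]
print(round(R, 2))             # 0.55
```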
Merits
1. In this method, the sum of the differences between R1 and R2 is always
equal to zero. So it provides a check on the calculation.
2. It does not assume normality in the universe from which the samples have
been drawn.
3. It is easy to understand and apply.
4. It is the way of studying correlation between qualitative data which
cannot be measured in quantitative terms.
Demerits
1. It cannot be measured in two-way frequency tables.
2. It can be conveniently used only when n is small.
3. Further algebraic treatment is not possible.
4. It is only approximate measure as the actual values are not used.
CO-EFFICIENT OF DETERMINATION
It is the square of co-efficient of correlation. It is more useful to measure
the percentage variation in the dependent variables in relation to the
independent variable.
Co-efficient of determination = $r^2$
Or
$r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$
The co-efficient of determination is a more useful and better measure for
interpreting the value of r. It states what percentage of the variation in the
dependent variable is explained by the independent variable. If the value of r is
0.8, we cannot conclude that 80% of the value of the variation in the dependent
variable is due to the variation in the independent variable. The co-efficient of
determination in this case is r2 = 0.64 which implies that only 64% of variation in
the dependent variable has been explained by the independent variable and the
remaining 36% of variation is due to other factors.
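The meaning of r² as explained variation can be checked numerically. The sketch below is an illustration only: it assumes a least-squares regression line of Y on X (not derived in this module) and reuses the five pairs of values from Illustration-1 of Module I. It compares r² with the ratio of explained variation to total variation.

```python
X = [8, 4, 10, 2, 6]
Y = [9, 11, 5, 8, 7]
n = len(X)
mean_x = sum(X) / n
mean_y = sum(Y) / n

sxx = sum((x - mean_x) ** 2 for x in X)                       # sum of x^2
syy = sum((y - mean_y) ** 2 for y in Y)                       # sum of y^2 (total variation)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(X, Y))  # sum of xy

r = sxy / (sxx * syy) ** 0.5

# Assumed least-squares line of Y on X: predicted Y = a + b*X
b = sxy / sxx
a = mean_y - b * mean_x
explained = sum((a + b * x - mean_y) ** 2 for x in X)         # explained variation
total = syy

print(round(r ** 2, 2))              # 0.32
print(round(explained / total, 2))   # 0.32
```

In this data r is about -0.57, yet only 32% of the variation in Y is explained by X. This is why r² is the safer figure to interpret.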
MODULE-II
NON-PARAMETRIC TESTS
DEFINITION OF STATISTICS
The word statistics originated from the Latin word Status or the
Italian word Statista which means political state. According to Dr. S.P Gupta,
”Statistics is the science of collection, organization, presentation, analysis and
interpretation of numerical data”. Statistics can be divided into two branches:
1. Descriptive Statistics
2. Inferential Statistics
Descriptive Statistics deals with collection of data, its presentation in various
forms (tables, graphs, diagrams, etc.) and finding averages and other measures
which would describe the data. It refers to statistical techniques used to
summarize and describe a data set and the statistics (measures) used in such
summaries.
Inferential Statistics deals with techniques used for analysis of data, making
the estimates and drawing conclusions from limited information taken on sample
basis and testing the reliability of estimates. It is the body of statistical techniques
that deals with the question “how reliable are the conclusions or estimates that we
derive from a set of data?”
PARAMETER AND STATISTIC
Parameter is a function of population values. It is a statistical measure
derived from the population. For example, arithmetic mean of a population.
Statistic is a function of sample values. It is a statistical measure derived
from the sample. For example, arithmetic mean of a sample.
Inferential statistics tries to predict the unknown parameter from the
known statistic.
TESTING OF HYPOTHESIS
Hypothesis is a statement subject to verification. It is an assumption made
about a population parameter. Lundberg defines hypothesis as a “tentative
generalization, the validity of which remains to be tested”. It is tentative, because
its veracity can be evaluated only after it has been tested empirically. It is stated
as an affirmative statement.
Hypothesis may be null hypothesis or alternative hypothesis and
directional hypothesis or non-directional hypothesis.
Null hypothesis is the original hypothesis. It states that there is no
significant difference between sample and population regarding a particular
matter under consideration. It is denoted by H0. For example, H0: There is no
significant mean difference in the mechanical aptitude between boys and girls
(µ1=µ2). Any hypothesis other than a null hypothesis is called alternative
hypothesis. It is denoted by H1. For example, H1: There is significant difference in
the mechanical aptitude between boys and girls (µ1≠µ2)
The statement ‘boys are better than girls in mechanical aptitude’ is a
directional hypothesis, as there is a clear indication of direction of change. But the
statement ‘boys and girls differ in mechanical aptitude’ is a non-directional
hypothesis, as there is no indication of direction of change.
PROCEDURE FOR TESTING HYPOTHESIS
Following are the various steps in the test of hypothesis.
1. Set-up hypotheses
Normally, the researcher has to set two types of hypotheses, viz; a null
hypothesis and an alternative hypothesis.
2. Set-up a suitable level of significance
The probability of rejecting a null hypothesis when it is true is known as
the level of significance. In other words, it is the probability of Type-I
error. Generally the level of significance is fixed at 5% or 1% (0.05 or 0.01).
Level of significance is denoted by α
3. Decide a test criterion.
The test criterion may be z-test, F-test, χ²-test, etc.
4. Determine the degree of freedom
Degree of freedom is defined as the number of independent observations
which is obtained by subtracting the number of constraints from the total
number of observations. That is, degree of freedom = Total number of
observations – Number of constraints.
5. Calculation of test statistic
Test statistic can be calculated by using the formula, Difference/Standard
Error
6. Obtain table value
Table value is obtained by considering both the level of significance and
the degree of freedom.
7. Making decision
The decision may be either to accept or to reject the null hypothesis. If the
calculated value of test statistic is more than the table value, we reject H0
and accept H1. If the calculated value of test statistic is less than the table
value, we accept H0 and reject H1.
TYPE-I AND TYPE-II ERROR
While testing a hypothesis, the decision is to accept or reject a hypothesis.
Therefore, there are four possibilities of decisions:
1. Accepting a null hypothesis when it is true
2. Rejecting a null hypothesis when it is false
3. Rejecting a null hypothesis when it is true
4. Accepting a null hypothesis when it is false
The first and second cases are correct and the third and fourth cases are
errors. The third case is known as Type-I error and the fourth case is known as
Type-II error. That is the error of rejecting H0 when it is true is Type-I error, and
the error of accepting H0 when it is false is Type-II error. Type-II error is often
regarded as the more serious error.
The probability of Type-I error, that is rejecting a null hypothesis when it is
true is known as level of significance. As the probability of Type-I error decreases,
probability of Type-II error increases and vice versa.
REJECTION REGION AND ACCEPTANCE REGION
The entire area under a normal curve may be divided into two parts. They
are:
1. Rejection region, and
2. Acceptance region
Rejection Region
It is the area which corresponds to the predetermined level of
significance. If the computed value of the test statistic falls in the rejection region,
we reject the null hypothesis. Rejection region is also known as critical region. It is
denoted by α.
Acceptance Region
It is the area which corresponds to 1-α. If the computed value of the test
statistic falls in the acceptance region, we accept the null hypothesis
TWO-TAILED TEST AND ONE-TAILED TEST
Two-tailed Test
A two tailed test is one in which we reject the null hypothesis, if the
computed value of the test statistic is significantly greater than or lower than the
table value. In two-tailed test, the rejection region is represented by both tails,
that is left and right tails. If we are testing the hypothesis at 5% level of
significance, the size of the acceptance region is 0.95 and the size of the rejection
region is 0.05 on both sides together. So, if the computed value of test statistic
falls either in the left tail or in the right tail, the null hypothesis is rejected.
For example, if we want to test the null hypothesis that the average
height of people in the population is 156 cm. Then the rejection would be on both
sides, since the null hypothesis is rejected if the average height in the sample is
much more than 156 cm or much less than 156 cm.
One-tailed Tests
In one-tailed test, the rejection region is represented by one tail, which
may be either left tail or right tail. For example, if we want to test the null
hypothesis that the average height of people in the population is 156 cm against
the alternative that it is more than 156 cm, then the rejection area would be on
the right tail only, since the null hypothesis is rejected only if the average
height in the sample is much more than 156 cm.
Similarly, if the alternative is that the average height of people in the
population is less than 156 cm, then the rejection area would be on the left tail
only, since the null hypothesis is rejected only if the average height in the
sample is much less than 156 cm.
TEST STATISTICS
The decision to accept or to reject a null hypothesis is made on the basis
of a statistic computed from the sample. Such a statistic is called test statistic. Test
statistic can be classified into two groups:
1. Parametric tests, and
2. Non parametric tests
Parametric Tests
The statistical tests based on the assumption that the population or
population parameter is normally distributed are called parametric tests. The
important parametric tests are:
a) z-test
b) F-test
c) t-test
Non Parametric Tests
It is a test which is not concerned with testing of parameters and does not
depend on the particular form of the distribution of the population. It can be
defined as a distribution free statistical test where assumptions are fewer than
those associated with parametric test. It is used when the researcher concludes
that a parametric test is not applicable.
Assumptions of Non-Parametric Test
1. The sample observations are independent
2. The variables are continuous
3. Sample drawn is a random sample
4. Observations are measured on ordinal scale
Merits of Non-Parametric Test
1. Simple and easy to apply
2. There is no assumption about the probability distribution of the
population
3. No restriction regarding the size of sample
4. It can be used even if the sample is small
Types of Non Parametric Tests
1. Chi-square test (χ²-test)
2. Sign test
3. Signed rank test
4. Rank sum test
5. Runs test
CHI-SQUARE TEST (χ²-TEST)
It is a statistical test which explains the significance of difference between
a set of observed frequencies and a set of corresponding theoretical frequencies
under certain assumptions. It is a test which is not concerned with testing of
parameters and does not depend on the particular form of the distribution of the
population. It was developed by Prof. Karl Pearson in 1900.
Characteristics of Chi-Square Test
1. It is a non-parametric test
2. It is a distribution-free test
3. It is easy to evaluate chi-square test statistic
4. It analyses the difference between a set of observed frequencies and a set
of corresponding expected frequencies
Uses/Applications of Chi-Square Test
1. It is useful for the test of independence of attributes: Chi-square test can
be used to find out whether two attributes are associated or not.
2. It is useful for the test of goodness of fit: Chi-square test can be used to
ascertain how well the theoretical distribution fit the data.
3. It is useful for the testing of homogeneity: Test of homogeneity is
concerned with whether different samples come from the same
population.
4. It is useful for testing a given population variance: It helps to test
whether a given population variance is acceptable on the basis of samples
drawn from that population.
CHI-SQUARE TEST FOR GOODNESS OF FIT
Chi-square test is used for testing hypotheses related to sample
proportions with respect to the corresponding population proportions. The chi-square
test for goodness of fit determines how well the obtained sample proportions fit
the population proportions specified by the null hypothesis.
Steps:
1. Set-up hypotheses
In test of goodness of fit, the hypotheses will be set as follows.
H0: There is goodness of fit between expected frequencies and observed
frequencies.
H1: There is no goodness of fit between expected frequencies and
observed frequencies.
2. Set-up a suitable level of significance
Generally, the level of significance is fixed at 5% or 1%.
3. Decide a test criterion.
The test criterion will be the chi-square test
4. Determine the degree of freedom
Degree of freedom = C - 1, where C stands for the number of categories
5. Calculation of test statistic
Test statistic, $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where, O is the observed frequencies, and E is the expected frequencies
6. Obtain table value
Table value is obtained by considering both level of significance and
degree of freedom.
7. Making decision
The decision may be either to accept or to reject the null hypothesis. If the
calculated value of test statistic is more than the table value, we reject H0
and accept H1. If the calculated value of test statistic is less than the table
value, we accept H0 and reject H1
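The goodness-of-fit steps above can be sketched as follows. The data are hypothetical and not taken from the notes: a die rolled 60 times, with every face expected 10 times under H0. The table value 11.070 is the chi-square table entry for the 5% level with 5 degrees of freedom. The function name `chi_square` is likewise only illustrative.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 13, 7]   # hypothetical counts for faces 1 to 6
expected = [10] * 6                # H0: all faces equally likely (60 / 6)

chi2 = chi_square(observed, expected)
df = len(observed) - 1             # C - 1
table_value = 11.070               # chi-square table, 5% level, 5 d.f.

decision = "reject H0" if chi2 > table_value else "accept H0"
print(round(chi2, 2), df, decision)   # 2.8 5 accept H0
```

Since the calculated value is less than the table value, the observed frequencies are consistent with the expected ones in this hypothetical case.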
CHI-SQUARE TEST OF INDEPENDENCE
Chi-square test is used for testing whether two variables are associated or
not.
Steps:
1. Set-up hypotheses
In test of independence, the hypotheses will be set as follows.
H0: The two attributes are independent.
H1: The two attributes are dependent.
2. Set-up a suitable level of significance
Generally, the level of significance is fixed at 5% or 1%.
3. Determine the degree of freedom
Degree of freedom = (R - 1)(C - 1), where R stands for the number of rows, and C
stands for the number of columns.
4. Decide a test criterion.
The test criterion will be the chi-square test
5. Calculation of test statistic
Test statistic, $\chi^2 = \sum \frac{(O - E)^2}{E}$
Where, O is the observed frequencies, and E is the expected frequencies
6. Obtain table value
Table value is obtained by considering both level of significance and
degree of freedom.
7. Making decision
The decision may be either to accept or to reject the null hypothesis. If the
calculated value of test statistic is more than the table value, we reject H0
and accept H1. If the calculated value of test statistic is less than the table
value, we accept H0 and reject H1
Contingency Table:
A contingency table is a frequency table in which a sample from the
population is classified according to two or more attributes, each of which is divided into
two or more classes. When there are only two divisions for each attribute, the
contingency table is known as a 2 × 2 contingency table. The frequencies appearing in
the contingency table are known as cell frequencies.
For example, a 2 × 2 contingency table based on the two attributes
smoking and drinking is:
                Smokers    Not smokers    Total
Drinkers        40         30             70
Not drinkers    4          24             28
Total           44         54             98
Calculation of Expected Frequencies:
Let a, b, c and d be the observed frequencies, shown in the form
of a contingency table as follows:
          Column 1    Column 2    Total
Row 1     a           b           f1
Row 2     c           d           f2
Total     f3          f4          N
Then the expected frequency of each cell is (row total × column total)/N. For the cells a, b, c and d respectively, the expected frequencies are:
(f1 × f3)/N, (f1 × f4)/N, (f2 × f3)/N, (f2 × f4)/N, OR
(a + b)(a + c)/N, (a + b)(b + d)/N, (c + d)(a + c)/N, (c + d)(b + d)/N
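The expected frequencies and the test of independence can be sketched for the smoking and drinking table given earlier. This is a minimal illustration; the function names are hypothetical. The table value 3.841 is the chi-square table entry for the 5% level with 1 degree of freedom. No continuity correction is applied, since the notes do not mention one.

```python
def expected_frequencies(table):
    """Expected cell frequency = (row total x column total) / N."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_square_independence(table):
    """Return the chi-square statistic and the degree of freedom (R-1)(C-1)."""
    expected = expected_frequencies(table)
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

# Rows: drinkers, not drinkers; columns: smokers, not smokers
table = [[40, 30],
         [4, 24]]
chi2, df = chi_square_independence(table)
table_value = 3.841                # chi-square table, 5% level, 1 d.f.

decision = "reject H0" if chi2 > table_value else "accept H0"
print(round(chi2, 2), df, decision)   # 14.85 1 reject H0
```

The calculated value is well above the table value, so for these figures the hypothesis that smoking and drinking are independent is rejected.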
SIGN TEST
Sign test is used to test whether the two populations are identical or not.
It is used in the situations, where t-test cannot be used. It is based on the direction
of the plus or minus signs of observations, and not on their numerical magnitudes.
The sign test may be:
1. One Sample Sign Test, and
2. Two Sample Sign Test
One Sample Sign Test
It is a very simple non-parametric test applicable when:
1) Sample is taken from a continuous population
2) P (Sample value < Mean) = ½ and P (Sample value > Mean) = ½
Steps:
1. Set-up hypotheses
In One Sample Sign Test, the hypotheses will be set as follows.
H0: P = ½
H1: P ≠ ½
2. Set-up a suitable level of significance
Generally, the level of significance is fixed at 5% or 1%.
3. Decide a test criterion.
The test criterion will be One Sample Sign Test
4. Determine the degree of freedom
Degree of freedom is infinity.
5. Calculation of test statistic
Test statistic = (p – P)/S.E
Where, p is the proportion of plus signs out of the total signs, P = ½, and
S.E = √ [(PQ)/n], where Q = 1 – P and n is the total number of signs
6. Obtain table value
Table value is obtained by considering both level of significance and
degree of freedom.
7. Making decision
The decision may be either to accept or to reject the null hypothesis. If the
calculated value of test statistic is more than the table value, we reject H0
and accept H1. If the calculated value of test statistic is less than the table
value, we accept H0 and reject H1
Illustration-1:
In four rounds of golf, the scores of 11 professionals are 202, 210, 200, 203, 193,
203, 204, 195, 199, 202, and 201. Use the one sample sign test at 5% level of
significance to test the null hypothesis that the professional golfers' average is 204.
Solution
X x-204
202 -
210 +
200 -
203 -
193 -
203 -
204 ….
195 -
199 -
202 -
201 -
H0: μ = 204
H1: μ ≠ 204
The level of significance is fixed at 5%.
The test criterion is One Sample Sign Test
Degree of freedom is infinity.
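Test statistic = (p – P)/S.E
Where, p = 1/10 (one plus sign out of 10 signs; the score equal to 204 gets no sign), and P = ½
S.E = √ [(PQ)/n], that is, √ [(½ X ½)/10] = √ 0.025 = 0.158
Therefore, test statistic = (0.1 – 0.5)/0.158 = -2.53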
Table value at 5% level of significance and infinity degree of freedom is 1.96
As the calculated value of test statistic is more than (numerically) the table value,
we reject H0 and accept H1. That is, H1: μ ≠ 204
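The working of Illustration-1 can be checked with a short Python sketch. It is only a sketch of the large-sample form of the test used in these notes; the variable names are chosen for the example. A score equal to 204 gets no sign and is left out of n.

```python
import math

scores = [202, 210, 200, 203, 193, 203, 204, 195, 199, 202, 201]
hypothesised = 204  # value stated in H0

# Keep only the non-zero differences; a score equal to 204 carries no sign.
signs = [1 if x > hypothesised else -1 for x in scores if x != hypothesised]
n = len(signs)            # number of signs actually counted
plus = signs.count(1)     # number of plus signs

p = plus / n              # observed proportion of plus signs
P = 0.5                   # proportion of plus signs under H0
se = math.sqrt(P * (1 - P) / n)
z = (p - P) / se

table_value = 1.96        # 5% level, as given in the text
reject_h0 = abs(z) > table_value

print(n, plus, round(z, 2), reject_h0)
```

The comparison uses the absolute value of the statistic, which is what "numerically" means in the decision step.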
Two Sample Sign Test
Suppose X and Y are two variables and n values of each are known. Then we
get n pairs of values, the first value of each pair being a value of X and the second
that of Y. That is, if (x1, y1) is a pair, then x1 belongs to X and y1 belongs to Y.
In such cases, each pair can be replaced by a + or – sign. If, in a pair, the first
value is greater than the second value, we put a + sign. If the first value is less than the second
value, we put a – sign. If both are equal, the pair concerned is discarded.
Steps:
1. Set-up hypotheses
In Two Sample Sign Test, the hypotheses will be set as follows.
H0: P = ½
H1: P ≠ ½
2. Set-up a suitable level of significance
Generally, the level of significance is fixed at 5% or 1%.
3. Decide a test criterion.
The test criterion will be Two Sample Sign Test
4. Determine the degree of freedom
Degree of freedom is infinity.
5. Calculation of test statistic
Test statistic = (p – ½) / √ [(½ X ½)/n]
Where, p is the proportion of plus signs out of the total signs, and n is the
number of pairs compared.
6. Obtain table value
Table value is obtained by considering both level of significance and
degree of freedom.
7. Making decision
The decision may be either to accept or to reject the null hypothesis. If the
calculated value of test statistic is more than the table value, we reject H0
and accept H1. If the calculated value of test statistic is less than the table
value, we accept H0 and reject H1
Illustration-2:
The following are the numbers of tickets issued by two salesmen on 11 days.
Salesman I: 7, 10, 14, 12, 6, 9, 11, 13, 7, 6, 10
Salesman II: 10, 13, 14, 11, 10, 7, 15, 11, 10, 9, 8
Use the two sample sign test at 1% level of significance to test the null hypothesis
that, on the average, the two salesmen issue an equal number of tickets.
Solution
X Y Sign
7 10 -
10 13 -
14 14 …..
12 11 +
6 10 -
9 7 +
11 15 -
13 11 +
7 10 -
6 9 -
10 8 +
H0: P = ½
H1: P ≠ ½
The level of significance is fixed at 1%.
The test criterion is Two Sample Sign Test
Degree of freedom is infinity.
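Test statistic = (p – ½) / √ [(½ X ½)/n]
Where, p = 4/10 (4 plus signs out of 10 signs; the tied pair is discarded), and n = 10
= (0.4 – 0.5) / √ [(0.5 X 0.5)/10]
= -0.1/0.158
= -0.63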
Table value at 1% level of significance and infinity degree of freedom is 2.576
As the calculated value of test statistic is less than (numerically) the table value,
we accept H0, that is, P = ½: on the average the two salesmen issue an equal number of tickets.
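The same working can be written as a small Python function. This is a sketch under the assumptions of these notes (large-sample form, tied pairs discarded); the function name paired_sign_test is hypothetical and chosen only for illustration.

```python
import math

def paired_sign_test(x, y):
    """Return (n, plus, z) for the two sample sign test.

    A pair gets + when the first value is larger, - when it is smaller,
    and is dropped when both values are equal.
    """
    signs = [1 if a > b else -1 for a, b in zip(x, y) if a != b]
    n = len(signs)
    plus = signs.count(1)
    p = plus / n
    z = (p - 0.5) / math.sqrt(0.5 * 0.5 / n)
    return n, plus, z

salesman_1 = [7, 10, 14, 12, 6, 9, 11, 13, 7, 6, 10]
salesman_2 = [10, 13, 14, 11, 10, 7, 15, 11, 10, 9, 8]

n, plus, z = paired_sign_test(salesman_1, salesman_2)
table_value = 2.576       # 1% level, as given in the text
reject_h0 = abs(z) > table_value

print(n, plus, round(z, 2), reject_h0)
```

With the data of Illustration-2 this gives 10 usable pairs, 4 plus signs and a statistic of about -0.63, so H0 is not rejected at the 1% level.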
SIGNED RANK TEST/WILCOXON MATCHED-PAIRS TEST
Signed Rank Test was originally proposed by Frank Wilcoxon in 1945. It is
a test that takes into account both the sign and the magnitude of the differences between paired
observations. It can be used instead of the t-test to test a null hypothesis in cases
where the population does not conform to the normal distribution.
CASE-I: When number of matched pairs (n) is less than 25
Steps:
1. Set up hypothesis
In Signed Rank Test, the hypothesis will be set as follows.
H0: There is no significant difference between two samples.
H1: There is significant difference between two samples.
2. Set up a suitable level of significance
Generally, the level of significance is fixed at 5% or 1%.
3. Determine a test criterion
The test criterion will be Signed Rank Test.
4. Determine the degree of freedom
Degree of freedom is n.
5. Calculation of test statistic
Test statistic, T, is the lower of the two sums of ranks (the sum of ranks with + sign and the sum of ranks with – sign), taken without sign.
6. Obtain table value
Table value is obtained by considering both level of significance and
degree of freedom
7. Making decision
The decision may be either to accept or to reject the null hypothesis. In this
test a small value of T is evidence against H0. If the calculated value of test
statistic is less than or equal to the table value, we reject the H0 and accept H1.
If the calculated value of test statistic is more than the table value, we accept
the H0 and reject H1.
Illustration-3:
Given below are 16 pairs of values showing the performance of two machines.
Test whether there is a difference between the performances. Use the Wilcoxon
Matched-Pairs Test.
Machine A
73 43 47 53 58 47 52 58 38 61 56 56 34 55 65 75
Machine B
51 41 43 41 47 32 24 58 43 53 52 57 44 57 40 68
Solution
Machine A Machine B d (Difference) Rank (+) Rank (–)
73 51 22 13 …
43 41 2 2.5 …
47 43 4 4.5 …
53 41 12 11 …
58 47 11 10 …
47 32 15 12 …
52 24 28 15 …
58 58 0 … …
38 43 -5 … -6
61 53 8 8 …
56 52 4 4.5 …
56 57 -1 … -1
34 44 -10 … -9
55 57 -2 … -2.5
65 40 25 14 …
75 68 7 7 …
TOTAL 101.5 -18.5
H0: There is no significant difference between the performances of two machines.
H1: There is significant difference between the performances of two machines.
Set up a suitable level of significance
Here, the level of significance is fixed at 5%.
The test criterion is Signed Rank Test.
Degree of freedom is n. That is 15. As d=0 for the 8th pair, it is not considered.
Test statistic, T=18.5 (Lower of 101.5 and 18.5).
Table value at 5% level of significance and 15 degree of freedom is 25
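In the Signed Rank Test with a small number of pairs, H0 is rejected when the
calculated T is less than or equal to the table value. Since the calculated value
of test statistic (18.5) is less than the table value (25), we reject the H0 and
accept H1. That is, there is a significant difference between the performances of
the two machines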
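The ranking in Illustration-3 can be reproduced with the Python sketch below. It is an illustrative sketch only: the function name signed_rank_sums is hypothetical, zero differences are dropped, and equal absolute differences share the average of the ranks they occupy. The decision line follows the usual convention for Wilcoxon tables, in which H0 is rejected when T does not exceed the table value.

```python
def signed_rank_sums(x, y):
    """Return (n, sum of + ranks, sum of - ranks) for paired values."""
    diffs = [a - b for a, b in zip(x, y) if a != b]   # drop zero differences
    ordered = sorted(abs(d) for d in diffs)

    # Give tied absolute differences the average of the ranks they occupy.
    rank_of = {}
    i = 0
    while i < len(ordered):
        j = i
        while j < len(ordered) and ordered[j] == ordered[i]:
            j += 1
        rank_of[ordered[i]] = (i + 1 + j) / 2   # mean of ranks i+1 .. j
        i = j

    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)
    w_minus = sum(rank_of[abs(d)] for d in diffs if d < 0)
    return len(diffs), w_plus, w_minus

machine_a = [73, 43, 47, 53, 58, 47, 52, 58, 38, 61, 56, 56, 34, 55, 65, 75]
machine_b = [51, 41, 43, 41, 47, 32, 24, 58, 43, 53, 52, 57, 44, 57, 40, 68]

n, w_plus, w_minus = signed_rank_sums(machine_a, machine_b)
T = min(w_plus, w_minus)

table_value = 25                 # n = 15, 5% level, as given in the text
reject_h0 = T <= table_value     # small T counts against H0

print(n, w_plus, w_minus, T, reject_h0)
```

A quick check on the ranking is that the two rank sums together equal n(n + 1)/2, here 15 X 16/2 = 120.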
CASE-II: When number of matched pairs (n) is greater than 25
Steps:
1. Set up hypothesis
In Signed Rank Test, the hypothesis will be set as follows.
H0: There is no significant difference between two samples.
H1: There is significant difference between two samples.
2. Set up a suitable level of significance
Generally, the level of significance is fixed at 5% or 1%.
3. Determine a test criterion
The test criterion will be Signed Rank Test.
4. Determine the degree of freedom
Degree of freedom is infinity.
5. Calculation of test statistic
Test statistic, Z = (T – µ)/σ, where T is the lower of the two sums of ranks (taken
without sign), µ = [n (n+1)]/4, and σ = √ {[n (n+1) (2n + 1)]/24}
6. Obtain table value
Table value is obtained by considering both level of significance and
degree of freedom
7. Making decision
The decision may be either to accept or to reject the null hypothesis. If the
calculated value of test statistic is more than (numerically) the table value, we reject
the H0 and accept H1. If the calculated value of test statistic is less than (numerically) the
table value, we accept the H0 and reject H1.
Illustration-4:
The following are the weights, in kilograms, of 26 babies before and after they
stayed on a diet for some weeks.
Before: 7.0, 3.5, 2.1, 1.6, 7.5, 6.3, 7.0, 5.4, 7.7, 8.2, 6.8, 1.9, 1.3, 7.2, 7.8, 1.7, 2.4,
3.5, 4.5, 8.0, 1.5, 2.0, 5.8, 6.5, 3.5, 5.2
After: 7.9, 6.2, 9.0, 3.7, 3.5, 1.4, 2.6, 3.2, 9.0, 5.4, 8.5, 4.4, 8.3, 9.0, 9.2, 3.2, 3.4,
2.8, 3.4, 7.9, 3.5, 3.2, 6.2, 6.3, 3.0, 6.8
Solution
Before After d (Difference) Rank (+) Rank (–)
7.0 7.9 -0.9 … -6
3.5 6.2 -2.7 … -20
2.1 9.0 -6.9 … -25
1.6 3.7 -2.1 … -17
7.5 3.5 +4.0 22 …
6.3 1.4 +4.9 24 …
7.0 2.6 +4.4 23 …
5.4 3.2 2.2 18 …
7.7 9.0 -1.3 … -10
8.2 5.4 +2.8 21 …
6.8 8.5 -1.7 … -14
1.9 4.4 -2.5 … -19
1.3 8.3 -7.0 … -26
7.2 9.0 -1.8 … -15
7.8 9.2 -1.4 … -11
1.7 3.2 -1.5 … -12
2.4 3.4 -1.0 … -7
3.5 2.8 +0.7 5 …
4.5 3.4 +1.1 8 …
8.0 7.9 +0.1 1 …
1.5 3.5 -2.0 … -16
2.0 3.2 -1.2 … -9
5.8 6.2 -0.4 … -3
6.5 6.3 +0.2 2 …
3.5 3.0 +0.5 4 …
5.2 6.8 -1.6 … -13
TOTAL 128 -223
H0: There is no significant difference between the weights of babies before and
after the diet.
H1: There is significant difference between the weights of babies before and after
the diet.
Here, the level of significance is fixed at 5%.
The test criterion is Signed Rank Test.
Degree of freedom is infinity.
Test statistic, Z= (T-µ)/σ
T = 128, that is, the lower of 128 and 223
µ = [n (n+1)]/4, that is = [26 X (26 + 1)]/4, = 175.5
σ = √ {[n (n+1) (2n + 1)]/24}, that is = √ {[26 (26+1) (2 X 26 + 1)]/24}, = 39.37
Therefore Z= (128 – 175.5)/39.37, that is = -1.21
Table value at 5% level of significance and infinity degree of freedom is 1.96
Since the calculated value of test statistic is less than (numerically) the table value, we accept
the H0. That is, there is no significant difference between the weights of babies
before and after the diet.
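The figures of Illustration-4 can be verified with the following Python sketch of the large-sample form, Z = (T – µ)/σ with µ = n(n+1)/4 and σ = √{n(n+1)(2n+1)/24}. The helper average_rank is a hypothetical name used only here, and the differences are rounded to one decimal place (an assumption made to avoid floating-point noise in the ranking).

```python
import math

before = [7.0, 3.5, 2.1, 1.6, 7.5, 6.3, 7.0, 5.4, 7.7, 8.2, 6.8, 1.9, 1.3,
          7.2, 7.8, 1.7, 2.4, 3.5, 4.5, 8.0, 1.5, 2.0, 5.8, 6.5, 3.5, 5.2]
after = [7.9, 6.2, 9.0, 3.7, 3.5, 1.4, 2.6, 3.2, 9.0, 5.4, 8.5, 4.4, 8.3,
         9.0, 9.2, 3.2, 3.4, 2.8, 3.4, 7.9, 3.5, 3.2, 6.2, 6.3, 3.0, 6.8]

# d = before - after, rounded to one decimal; zero differences are dropped.
diffs = [round(b - a, 1) for b, a in zip(before, after)]
diffs = [d for d in diffs if d != 0]
n = len(diffs)

abs_sorted = sorted(abs(d) for d in diffs)

def average_rank(value):
    """Rank of an absolute difference; ties share the average rank."""
    first = abs_sorted.index(value) + 1
    count = abs_sorted.count(value)
    return first + (count - 1) / 2

w_plus = sum(average_rank(abs(d)) for d in diffs if d > 0)
w_minus = sum(average_rank(abs(d)) for d in diffs if d < 0)
T = min(w_plus, w_minus)

mu = n * (n + 1) / 4
sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
z = (T - mu) / sigma

table_value = 1.96            # 5% level, as given in the text
reject_h0 = abs(z) > table_value

print(n, w_plus, w_minus, round(mu, 1), round(sigma, 2), round(z, 2), reject_h0)
```

Here the decision compares the absolute value of Z with the normal table value, unlike the small-sample case, where T itself is compared with a table of critical values.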
MANN–WHITNEY–WILCOXON U TEST
It is also called the Wilcoxon rank-sum test (WRS), or Wilcoxon–Mann–
Whitney test. It is a nonparametric test of the null hypothesis that two samples
come from the same population against an alternative hypothesis, especially that
a particular population tends to have larger values than the other. It can be
applied on unknown distributions contrary to t-test which has to be applied only
on normal distributions, and it is nearly as efficient as the t-test on normal
distributions.
Let x1, x2, ….., xn1 be the n1 values of X variable and y1, y2, ….., yn2 be the n2 values
of Y variable. Let the values of X form a sample independent of the sample formed
by values of Y. We want to test whether the two samples have come from two
identical populations. Let the probability function of X be f1 (x) and that of Y be f2
(y).
Steps:
1. Set up hypothesis
In Mann–Whitney–Wilcoxon U Test, the hypothesis will be set as follows:
H0: The populations are identical, that is f1 (x) = f2 (y)
H1: The populations are not identical, that is f1 (x) ≠ f2 (y).
2. Set up a suitable level of significance
Generally, the level of significance is fixed at 5% or 1%.
3. Determine a test criterion
The test criterion will be Mann–Whitney–Wilcoxon U Test.
4. Determine the degree of freedom
Degree of freedom is infinity.
5. Calculation of test statistic
Test statistic, t = (µ – U)/S.E, where µ = (n1 X n2)/2, U = n1 X n2 + [n1 X (n1 +
1)/2] – R1, R1 is the sum of ranks assigned to the values in the first sample, and
S.E = √ {[n1 X n2 (n1 + n2 + 1)]/12}
6. Obtain table value
Table value is obtained by considering both level of significance and
degree of freedom
7. Making decision
The decision may be either to accept or to reject the null hypothesis. If the
calculated value of test statistic is more than the table value, we reject
the H0 and accept H1. If the calculated value of test statistic is less than the
table value, we accept the H0 and reject H1.
Illustration-5:
There are two samples. The first contains the observations 54, 39, 70, 58, 47, 40, 74,
49, 74, 75, 61, and 79. The second contains 45, 41, 62, 53, 33, 45, 71, 42, 68, 73, 54,
and 73. Apply the Mann–Whitney–Wilcoxon U Test at 5% level of significance to test
whether they come from populations with the same mean.
Solution
Values (Ascending order) Rank Sample (I or II)
33 1 II
39 2 I
40 3 I
41 4 II
42 5 II
45 6.5 II
45 6.5 II
47 8 I
49 9 I
53 10 II
54 11.5 II
54 11.5 I
58 13 I
61 14 I
62 15 II
68 16 II
70 17 I
71 18 II
73 19.5 II
73 19.5 II
74 21.5 I
74 21.5 I
75 23 I
79 24 I
H0: The populations are identical, that is f1 (x) = f2 (y)
H1: The populations are not identical, that is f1 (x) ≠ f2 (y).
The level of significance is fixed at 5%.
The test criterion is Mann–Whitney–Wilcoxon U Test.
Degree of freedom is infinity.
Test statistic, t = (µ - U)/S.E
µ = (n1 X n2)/2, that is, = (12 X 12)/2 = 72
R1 = Sum of ranks assigned to the values in sample I, that is = 167.5
U = n1 X n2 + [n1 X (n1 + 1)/2] – R1, that is, 12 X 12 + [12 X (12 + 1)/2] – 167.5, = 54.5
S.E = √ {[n1 X n2 (n1 + n2 + 1)]/12}, that is, √ {[12 X 12 (12 + 12 + 1)]/12}, = 17.32
Therefore, t = (72–54.5)/17.32 = 1.01
Table value at 5% level of significance and degree of freedom infinity is 1.96
As the calculated value of test statistic is less than the table value, we accept the
H0 and reject H1, that is, the populations are identical, that is f1 (x) = f2 (y)
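The pooled ranking and the statistic of Illustration-5 can be checked with the Python sketch below. It follows the formulas used in these notes (U computed from R1, the rank sum of the first sample, and a normal approximation for the decision); the helper name average_rank is hypothetical.

```python
import math

sample_1 = [54, 39, 70, 58, 47, 40, 74, 49, 74, 75, 61, 79]
sample_2 = [45, 41, 62, 53, 33, 45, 71, 42, 68, 73, 54, 73]

# Rank all values together in ascending order.
pooled = sorted(sample_1 + sample_2)

def average_rank(value):
    """Rank of a value in the pooled list; ties share the average rank."""
    first = pooled.index(value) + 1
    count = pooled.count(value)
    return first + (count - 1) / 2

n1, n2 = len(sample_1), len(sample_2)
r1 = sum(average_rank(v) for v in sample_1)      # R1: rank sum of sample I

u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
mu = n1 * n2 / 2
se = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
t = (mu - u) / se

table_value = 1.96            # 5% level, as given in the text
reject_h0 = abs(t) > table_value

print(r1, u, mu, round(se, 2), round(t, 2), reject_h0)
```

The tied values (45, 54, 73 and 74) each receive the average of the two ranks they occupy, which is how the ranks 6.5, 11.5, 19.5 and 21.5 in the table arise.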
********************