
Psychological Statistics-III

Irshadiya College of Commerce and Social Sciences, Feroke

Study Notes on

PSYCHOLOGICAL STATISTICS

SEMESTER-III

B.Sc. COUNSELING PSYCHOLOGY

Prepared by:

Noushad P.K

Lecturer in Commerce


Department of Psychology,

Irshadiya College of Commerce and Social Sciences

Feroke, Calicut-673631

MODULE –I

CORRELATION ANALYSIS

MEANING AND DEFINITION

It is a statistical tool used to describe the relationship or interdependence between two or more variables. Two or more variables are said to be correlated, if the change in one variable results in a corresponding change in the other variable. That is, when two or more variables move together (in same direction or in opposite direction), we say they are correlated. For instance, when the price of a commodity increases, its supply goes up. On the basis of the theory of correlation, one can study the comparative changes occurring in two related phenomena and their cause-effect relationship.

A.M. Tuttle defines correlation as “an analysis of the association between two or more variables”. In the words of Simpson and Kafka, “Correlation analysis deals with the association between two or more variables”.

SIGNIFICANCE OF CORRELATION ANALYSIS

1. It helps to find a single figure to measure the relationship between the variables.

2. It helps to understand economic behaviour.

3. It can be used as a basis for the study of regression.

4. It helps to reduce the range of uncertainty associated with decision making.

5. It helps to know whether the correlation is significant or not. This is possible by comparing the correlation co-efficient with six times the probable error (6 PE). If ‘r’ is more than 6 PE, the correlation is significant.

CORRELATION AND CAUSATION

Correlation usually implies that the two variables have a cause-effect relationship. For example, the relationship between price and demand, price and supply, rate of interest and savings, etc.

Correlation does not always imply cause-effect relationship. For example, a higher degree of correlation between yield per acre of rice and tea may be due to the fact that both are related to the amount of rainfall.

There may be a high degree of correlation between the variables, but it may be difficult to pinpoint which is the cause and which is the effect. For example, an increase in price leads to a decrease in demand. Here change in price is the cause and change in demand is the effect. But it is also possible that a change in demand is due to other reasons, like growth of population.



Two series may also show a high degree of correlation purely by chance. For example, during the last decade there has been a significant increase in both the sale of newspapers and crime. We can establish a correlation between these two variables, but there exists no cause-effect relationship between them. Such illogical correlations are known as nonsense correlation or spurious correlation.

CLASSIFICATION OF CORRELATION

Positive and Negative Correlation

Correlation can be either positive or negative. Whether correlation is positive or negative would depend upon the direction in which the variables are moving.

When the values of two variables move in the same direction, correlation is said to be positive. That is, an increase in the value of one variable results in an increase in the value of the other variable, or a decrease in the value of one variable leads to a decrease in the value of the other. Example: correlation between price and supply.

Price:   10   20   30   40   50
Supply:  80  100  150  170  200

When the values of two variables move in opposite directions, correlation is said to be negative. That is, an increase in the value of one variable results in a decrease in the value of the other variable. Example: correlation between price and demand.

Price:    5   10   15   20   25
Demand:  16   10    8    6    2

Linear and Non-linear Correlation

Correlation may be linear or non-linear. Here the distinction is based upon the constancy of the ratio of change between the variables.

When a change in one variable leads to a constant ratio of change in the other variable, correlation is said to be linear. In such a case, if the values of the variables are plotted on graph paper, a straight line is obtained. Example: if every rise of 10% in price leads to a rise of 15% in supply, the correlation is linear. In the data below, supply is always five times the price.

Price:   10   15   30   60
Supply:  50   75  150  300

When a change in one variable does not bring the same ratio of change in the other variable, that is, when the ratio is variable instead of constant, the correlation is said to be non-linear. In such a case, we obtain a curve if the values of the variables are plotted on graph paper. That is why it is also known as curvilinear correlation. For example,

X:   2   4   6  10  15
Y:   8  10  18  22  26
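The constant-ratio idea can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function names `change_ratios` and `is_linear` are illustrative. It computes the ratio of change in the second variable to change in the first between successive pairs and treats the relationship as linear when all the ratios agree.

```python
def change_ratios(xs, ys):
    """Ratio of change in y to change in x between successive pairs."""
    pairs = list(zip(xs, ys))
    return [(y2 - y1) / (x2 - x1)
            for (x1, y1), (x2, y2) in zip(pairs, pairs[1:])]


def is_linear(xs, ys, tol=1e-9):
    """True when every successive ratio of change is the same."""
    ratios = change_ratios(xs, ys)
    return all(abs(r - ratios[0]) <= tol for r in ratios)


# Price and supply data from the linear example
print(change_ratios([10, 15, 30, 60], [50, 75, 150, 300]))
print(is_linear([10, 15, 30, 60], [50, 75, 150, 300]))

# X and Y data from the non-linear example
print(change_ratios([2, 4, 6, 10, 15], [8, 10, 18, 22, 26]))
print(is_linear([2, 4, 6, 10, 15], [8, 10, 18, 22, 26]))
```

For the price and supply data every ratio is 5, whereas for the X and Y data the ratios differ from pair to pair.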

Simple, Partial and Multiple Correlations

In a correlation analysis, if only two variables are studied (of which one is independent and the other is dependent), the correlation is said to be simple. For example, the correlation between price and demand.

Multiple correlation studies the relationship between a dependent variable and two or more independent variables. For example, the correlation between yield with both rainfall and temperature

In partial correlation, we measure the correlation between a dependent variable and one particular independent variable assuming that all other independent variables remain constant. For example, there are three variables- yield, rainfall and temperature. And each is related with the other. Then, the relationship between yield and rainfall (assuming the temperature is constant) is the partial correlation

METHODS OF STUDYING CORRELATION

1. Graphic method

a) Scatter diagram

b) Correlation graph

2. Algebraic methods/Mathematical methods/Statistical methods/Co-efficient of correlation methods

a) Karl Pearson’s Co-efficient of correlation

b) Spearman’s Rank correlation method

c) Concurrent deviation method

SCATTER DIAGRAM

It is also known as dot chart. It is a graphical method of studying correlation between two variables. It is a visual aid to show the presence or absence of correlation between two variables.

In a scatter diagram, one of the variables is shown on the X-axis and the other on the Y-axis. Each pair of values is plotted by means of a dot mark. If these dot marks show some trend, either upward or downward, the two variables are said to be correlated. If the plotted dots do not show any trend, the two variables are not correlated. The greater the scatter of the dots, the lower is the relationship.

Merits of Scatter Diagram Method

1. It is a simple method.

2. It is a non-mathematical method.

3. It is very easy to understand.

4. It is not affected by the size of extreme values.

5. It is usually the first step in correlation analysis.

Demerits of Scatter Diagram Method

1. It gives only a rough idea about the correlation between variables.

2. Further algebraic treatment is not possible.

3. The exact degree of correlation between the variables cannot be easily determined.

4. If the number of pairs of variables is either very big or very small, the method is not easy.

CORRELATION GRAPH METHOD

Under correlation graph method the individual values of the two variables are plotted on a graph paper. Then dots relating to these variables are joined separately so as to get two curves. By examining the direction and closeness of the two curves, we can infer whether the variables are related or not. If both the curves are moving in the same direction (either upward or downward) correlation is said to be positive. If the curves are moving in the opposite directions, correlation is said to be negative.

Merits of Correlation Graph Method

1. It is a simple method.

2. It does not require mathematical calculations.

3. It is very easy to understand.

Demerits of Correlation Graph Method:

1. A numerical value of correlation cannot be calculated.

2. It is only a pictorial presentation of the relationship between variables.

3. It is not possible to establish the exact degree of relationship between the variables.

MATHEMATICAL/STATISTICAL CORRELATION/CO-EFFICIENT OF CORRELATION

It is an algebraic method of measuring correlation. It measures the degree or extent of correlation between two variables. It is symbolically denoted by r. The value of the correlation co-efficient can never be more than +1 or less than -1. That is, +1 and -1 are the limits of this co-efficient. It covers:

1. Karl Pearson’s co-efficient of correlation

2. Spearman’s rank correlation

3. Concurrent deviation

Karl Pearson’s Co-Efficient of Correlation

It was developed by the reputed statistician and biologist Prof. Karl Pearson. It is also known as the product moment correlation co-efficient.

Assumptions

1. There is a linear relationship between variables.

2. The variables are affected by a large number of independent causes so as to form a normal distribution.

3. There is a cause-effect relationship between the variables.

Properties

1. It has a well-defined formula.

2. It is a pure number and is independent of the units of measurement.

3. It lies between -1 and +1.

4. It is the geometric mean of the two regression co-efficients.

5. It does not change with a change of origin or a change of scale.

6. The co-efficient of correlation between x and y is the same as that between y and x.

Methods

a) When deviations are taken from actual mean

b) When deviations are taken from assumed mean

When deviations are taken from actual mean

STEPS:

1. Take the deviations of the x series from the mean of x, denoted by x or dx.

2. Square these deviations and get the total, that is, $\sum x^2$ or $\sum dx^2$.

3. Take the deviations of the y series from the mean of y, denoted by y or dy.

4. Square these deviations and get the total, that is, $\sum y^2$ or $\sum dy^2$.

5. Multiply the deviations of the x and y series, and get the total, that is, $\sum xy$ or $\sum dx\,dy$.

6. Apply the formula and find the correlation co-efficient.

$r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}}$

Where, $x = X - \bar{X}$ and $y = Y - \bar{Y}$

OR

$r = \dfrac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}}$

Where, $dx = X - \bar{X}$ and $dy = Y - \bar{Y}$

Illustration-1:

Find Karl Pearson’s co-efficient of correlation from the following data.

X: 8 4 10 2 6

Y: 9 11 5 8 7

Solution:

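  X     Y     x     y    x²    y²    xy
  8     9     2     1     4     1     2
  4    11    -2     3     4     9    -6
 10     5     4    -3    16     9   -12
  2     8    -4     0    16     0     0
  6     7     0    -1     0     1     0
 30    40     0     0    40    20   -16   (totals)

Here $x = X - \bar{X}$ and $y = Y - \bar{Y}$, so $\sum x^2 = 40$, $\sum y^2 = 20$ and $\sum xy = -16$.

Arithmetic Mean of X, $\bar{X} = \dfrac{\sum X}{N} = \dfrac{30}{5} = 6$

Arithmetic Mean of Y, $\bar{Y} = \dfrac{\sum Y}{N} = \dfrac{40}{5} = 8$

Correlation, $r = \dfrac{\sum xy}{\sqrt{\sum x^2 \cdot \sum y^2}} = \dfrac{-16}{\sqrt{40 \times 20}} = \dfrac{-16}{\sqrt{800}} = \dfrac{-16}{28.28} = -0.566$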

Illustration-2:

Calculate Karl Pearson’s co-efficient of correlation, from the following data?

X: 2 3 4 5 6 7 8

Y: 4 5 6 12 9 7 6

Solution:

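  X     Y    dx    dy   dx²   dy²  dx.dy
  2     4    -3    -3     9     9     9
  3     5    -2    -2     4     4     4
  4     6    -1    -1     1     1     1
  5    12     0     5     0    25     0
  6     9     1     2     1     4     2
  7     7     2     0     4     0     0
  8     6     3    -1     9     1    -3
 35    49     0     0    28    44    13   (totals)

Here $dx = X - \bar{X}$ and $dy = Y - \bar{Y}$, so $\sum dx^2 = 28$, $\sum dy^2 = 44$ and $\sum dx\,dy = 13$.

Arithmetic Mean of X, $\bar{X} = \dfrac{\sum X}{N} = \dfrac{35}{7} = 5$

Arithmetic Mean of Y, $\bar{Y} = \dfrac{\sum Y}{N} = \dfrac{49}{7} = 7$

Correlation, $r = \dfrac{\sum dx\,dy}{\sqrt{\sum dx^2 \cdot \sum dy^2}} = \dfrac{13}{\sqrt{28 \times 44}} = \dfrac{13}{\sqrt{1232}} = \dfrac{13}{35.10} = 0.37$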
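The actual-mean method can also be sketched in code. The following Python snippet is an illustrative sketch, not part of the original notes; the function name `pearson_actual_mean` is hypothetical. It follows the six steps of the method and is applied to the data of Illustration-1 and to the X and Y values used in the solution table of Illustration-2.

```python
from math import sqrt


def pearson_actual_mean(xs, ys):
    """Karl Pearson's r using deviations taken from the actual means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    dx = [x - mean_x for x in xs]              # deviations of the x series
    dy = [y - mean_y for y in ys]              # deviations of the y series
    sum_dx2 = sum(d * d for d in dx)           # total of squared x deviations
    sum_dy2 = sum(d * d for d in dy)           # total of squared y deviations
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)


r1 = pearson_actual_mean([8, 4, 10, 2, 6], [9, 11, 5, 8, 7])
r2 = pearson_actual_mean([2, 3, 4, 5, 6, 7, 8], [4, 5, 6, 12, 9, 7, 6])
print(round(r1, 3), round(r2, 2))
```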

When deviations are taken from assumed mean

STEPS:

1. Take the deviations of the x series from the assumed mean of x, denoted by dx.

2. Square these deviations and get the total, that is, $\sum dx^2$.

3. Take the deviations of the y series from the assumed mean of y, denoted by dy.

4. Square these deviations and get the total, that is, $\sum dy^2$.

5. Multiply the deviations of the x and y series, and get the total, that is, $\sum dx\,dy$.

6. Apply the formula and find the correlation co-efficient.

$r = \dfrac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{N}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{N}} \cdot \sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{N}}}$

Where, dx = X - Assumed Mean of X

dy = Y - Assumed Mean of Y

N = Number of pairs of observations

Illustration-3:

Calculate Karl Pearson’s co-efficient of correlation from the following data.

X: 5, 10, 5, 11, 12, 4, 3, 2, 7, 6

Y: 1, 6, 2, 8, 5, 1, 4, 6, 5, 2

Solution:

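The assumed mean of X is taken as 7 and the assumed mean of Y as 5.

  X     Y    dx    dy   dx²   dy²  dx.dy
  5     1    -2    -4     4    16     8
 10     6     3     1     9     1     3
  5     2    -2    -3     4     9     6
 11     8     4     3    16     9    12
 12     5     5     0    25     0     0
  4     1    -3    -4     9    16    12
  3     4    -4    -1    16     1     4
  2     6    -5     1    25     1    -5
  7     5     0     0     0     0     0
  6     2    -1    -3     1     9     3
             -5   -10   109    62    43   (totals)

Here dx = X - 7 and dy = Y - 5, so $\sum dx = -5$, $\sum dy = -10$, $\sum dx^2 = 109$, $\sum dy^2 = 62$ and $\sum dx\,dy = 43$.

Correlation, $r = \dfrac{43 - \dfrac{(-5)(-10)}{10}}{\sqrt{109 - \dfrac{(-5)^2}{10}} \cdot \sqrt{62 - \dfrac{(-10)^2}{10}}}$

$= \dfrac{43 - 5}{\sqrt{106.5} \cdot \sqrt{52}}$

$= \dfrac{38}{10.32 \times 7.21}$

$= \dfrac{38}{74.42}$

= 0.51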
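The assumed-mean formula can be sketched in the same way. The Python snippet below is illustrative only and not part of the original notes; the function name `pearson_assumed_mean` and its parameter names are hypothetical. It uses the data of Illustration-3 with assumed means 7 and 5, and also shows that a different choice of assumed means gives the same co-efficient.

```python
from math import sqrt


def pearson_assumed_mean(xs, ys, assumed_x, assumed_y):
    """Karl Pearson's r using deviations taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = sum_dxdy - (sum_dx * sum_dy) / n
    denominator = (sqrt(sum_dx2 - sum_dx ** 2 / n)
                   * sqrt(sum_dy2 - sum_dy ** 2 / n))
    return numerator / denominator


X = [5, 10, 5, 11, 12, 4, 3, 2, 7, 6]
Y = [1, 6, 2, 8, 5, 1, 4, 6, 5, 2]
r = pearson_assumed_mean(X, Y, assumed_x=7, assumed_y=5)
r_other = pearson_assumed_mean(X, Y, assumed_x=0, assumed_y=0)
print(round(r, 2), round(r_other, 2))
```

Because the correction terms remove the effect of the assumed means, any convenient values may be chosen for them.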

Merits

1. It gives an idea about the co-variation of the two series

2. It indicates the direction of relationship also

3. It provides a numerical measurement of co-efficient of correlation

4. It can be used for further algebraic treatment

5. It gives a single figure to explain the accurate degree of correlation

between two variables

Demerits

1. It assumes a linear relationship between the variables. But, in real

situations, it may not be so.

2. A high degree of correlation does not mean that a close relation exists

between variables.

3. Difficult to calculate.

4. It is unduly affected by extreme values.

PROBABLE ERROR

The quantity $\dfrac{1 - r^2}{\sqrt{N}}$ is known as the standard error of the correlation co-efficient. Usually, the correlation co-efficient is calculated from samples. For different samples drawn from the same population, the co-efficient of correlation may vary. But the numerical value of such variation is expected to be less than the probable error. It is a statistical measure of the reliability and dependability of the value of the co-efficient of correlation. If the probable error is ‘added to’ or ‘subtracted from’ the co-efficient of correlation, it gives two limits within which we can reasonably expect the value of the co-efficient of correlation to vary.

The probable error of the co-efficient of correlation can be obtained by applying the formula:

Probable Error = $0.6745 \times \dfrac{1 - r^2}{\sqrt{N}}$

Where, r is the co-efficient of correlation and N is the number of pairs of observations.



If the value of r is less than the probable error, it is not at all significant. If

the value of r is more than six times of the probable error, it is significant. (If the

Probable Error is not much and if the value of r is 0.5 or more, it is generally

considered to be significant)
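The probable error rule can be sketched as follows. This Python snippet is an illustration, not part of the original notes; the function names are hypothetical. The label "inconclusive" for the case between PE and 6 PE and the use of the absolute value of r for negative co-efficients are assumptions made here. The sample values r = 0.8 with N = 64 are hypothetical, while r = 0.51 with N = 10 corresponds to Illustration-3.

```python
from math import sqrt


def probable_error(r, n):
    """Probable error of r: 0.6745 * (1 - r^2) / sqrt(N)."""
    return 0.6745 * (1 - r ** 2) / sqrt(n)


def interpret(r, n):
    """Apply the PE and 6 PE rule of thumb to a correlation co-efficient."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    return "inconclusive"  # assumed label for the in-between case


print(round(probable_error(0.8, 64), 4), interpret(0.8, 64))
print(round(probable_error(0.51, 10), 4), interpret(0.51, 10))
```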

Uses

1. It is used to determine the limits within which the population correlation

co-efficient may be expected to lie.

2. It can be used to test whether an observed value of the sample correlation co-efficient indicates a significant correlation in the population.

Spearman’s Rank Correlation

Karl Pearson’s correlation co-efficient is used to measure the correlation between variables which are normally distributed. If the population is not normal, or the shape of the distribution is not known, rank correlation is used. There are many occasions when the value of certain variables cannot be measured in quantitative form, for example, intelligence, beauty, character, morality, honesty, etc. Rank correlation is used to study the association between such variables. It is a method used to study the correlation between attributes. It was developed by the British psychologist Charles Edward Spearman in 1904.

Cases

a) Ranks are not repeating

b) Repeated ranks/Tie in rank

Ranks are not repeating

STEPS:

1. Assign ranks to attributes

2. Compute the difference of ranks, which is denoted by D

3. Calculate $\sum D^2$

4. Apply the formula, and find correlation

$R = 1 - \dfrac{6 \sum D^2}{N^3 - N}$

Where, N is the number of pairs of observations.

Illustration-4:

Find out Spearman’s rank correlation co-efficient from the following data

X: 60, 34, 40, 50, 45, 41, 22, 43, 42, 66, 64, 46

Y: 75, 32, 34, 40, 45, 33, 12, 30, 36, 72, 41, 57

Solution:

  X     Y    R1    R2     D    D²
 60    75     3     1     2     4
 34    32    11    10     1     1
 40    34    10     8     2     4
 50    40     4     6    -2     4
 45    45     6     4     2     4
 41    33     9     9     0     0
 22    12    12    12     0     0
 43    30     7    11    -4    16
 42    36     8     7     1     1
 66    72     1     2    -1     1
 64    41     2     5    -3     9
 46    57     5     3     2     4

$\sum D^2 = 48$

Rank Correlation, $R = 1 - \dfrac{6 \sum D^2}{N^3 - N}$

$= 1 - \dfrac{6 \times 48}{12^3 - 12}$

$= 1 - \dfrac{288}{1716}$

= 1 - 0.168

= 0.832
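The rank correlation steps for untied data can be sketched in code. The Python snippet below is illustrative and not part of the original notes; the function names are hypothetical. It assigns rank 1 to the largest value, as in the solution table, and uses the X values that appear in that table.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no value is repeated."""
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_no_ties(xs, ys):
    """Spearman's R = 1 - 6 * sum(D^2) / (N^3 - N) for untied ranks."""
    n = len(xs)
    r1 = ranks_descending(xs)
    r2 = ranks_descending(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


X = [60, 34, 40, 50, 45, 41, 22, 43, 42, 66, 64, 46]
Y = [75, 32, 34, 40, 45, 33, 12, 30, 36, 72, 41, 57]
R = spearman_no_ties(X, Y)
print(ranks_descending(X))
print(round(R, 3))
```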

Repeated ranks/Tie in rank

STEPS:

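1. Assign ranks to attributes. Repeated values share the average of the ranks they would otherwise occupy.

2. Compute the difference of ranks, which is denoted by D

3. Calculate $\sum D^2$

4. Calculate $m^3 - m$ for each set of repeated values, where m is the number of times the value is repeated

5. Apply the formula, and find correlation

$R = 1 - \dfrac{6\left[\sum D^2 + \dfrac{1}{12}(m^3 - m) + \dfrac{1}{12}(m^3 - m) + \cdots\right]}{N^3 - N}$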

Illustration-5:

Find out Spearman’s rank correlation co-efficient from the following data

X: 68 64 75 50 64 80 75 40 55 64

Y: 62 58 68 45 81 60 68 48 50 70

Solution:

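  X     Y     R1     R2     D    D²
 68    62     4      5     -1     1
 64    58     6      7     -1     1
 75    68     2.5    3.5   -1     1
 50    45     9     10     -1     1
 64    81     6      1      5    25
 80    60     1      6     -5    25
 75    68     2.5    3.5   -1     1
 40    48    10      9      1     1
 55    50     8      8      0     0
 64    70     6      2      4    16

$\sum D^2 = 72$

For 75 in X: m = 2, $m^3 - m = 2^3 - 2 = 8 - 2 = 6$

For 64 in X: m = 3, $m^3 - m = 3^3 - 3 = 27 - 3 = 24$

For 68 in Y: m = 2, $m^3 - m = 2^3 - 2 = 8 - 2 = 6$

Rank Correlation, $R = 1 - \dfrac{6\left[72 + \dfrac{1}{12}(6) + \dfrac{1}{12}(24) + \dfrac{1}{12}(6)\right]}{10^3 - 10}$

$= 1 - \dfrac{6 \times 75}{990}$

$= 1 - \dfrac{450}{990}$

= 1 - 0.45

= 0.55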
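The tie correction can be sketched in code as well. The Python snippet below is illustrative and not part of the original notes; the function names are hypothetical. Repeated values receive the average of the ranks they occupy, and one correction term of (m³ - m)/12 is added for every set of repeated values in either series.

```python
from collections import Counter


def average_ranks(values):
    """Rank 1 goes to the largest value; tied values share the average rank."""
    ordered = sorted(values, reverse=True)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1          # first rank occupied by v
        count = ordered.count(v)              # number of times v is repeated
        rank_of[v] = first + (count - 1) / 2  # average of the occupied ranks
    return [rank_of[v] for v in values]


def spearman_with_ties(xs, ys):
    """Spearman's R with the (m^3 - m) / 12 correction for repeated ranks."""
    n = len(xs)
    r1, r2 = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(r1, r2))
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n ** 3 - n)


X = [68, 64, 75, 50, 64, 80, 75, 40, 55, 64]
Y = [62, 58, 68, 45, 81, 60, 68, 48, 50, 70]
R = spearman_with_ties(X, Y)
print(average_ranks(X))
print(round(R, 2))
```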

Merits

1. In this method, the sum of the differences between R1 and R2 is always

equal to zero. So it provides a check on the calculation.

2. It does not assume normality in the universe from which the samples have been drawn.

3. It is easy to understand and apply.

4. It is the way of studying correlation between qualitative data which

cannot be measured in quantitative terms.

Demerits

1. It cannot be measured in two-way frequency tables.

2. It can be conveniently used only when n is small.

3. Further algebraic treatment is not possible.

4. It is only approximate measure as the actual values are not used.

CO-EFFICIENT OF DETERMINATION

It is the square of the co-efficient of correlation. It measures the percentage of variation in the dependent variable that is explained by the independent variable.

Co-efficient of determination = $r^2$

Or

= Explained variance / Total variance

The co-efficient of determination is a more useful and better measure for interpreting the value of r. It states what percentage of the variation in the dependent variable is explained by the independent variable. If the value of r is 0.8, we cannot conclude that 80% of the variation in the dependent variable is due to the variation in the independent variable. The co-efficient of determination in this case is $r^2 = 0.64$, which implies that only 64% of the variation in the dependent variable has been explained by the independent variable and the remaining 36% of the variation is due to other factors.
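The arithmetic of the r = 0.8 example can be reproduced with a short Python sketch. This is illustrative only and not part of the original notes; the function name is hypothetical.

```python
def explained_and_unexplained(r):
    """Return the explained and unexplained variation as percentages."""
    determination = r ** 2
    explained = round(determination * 100, 2)
    unexplained = round((1 - determination) * 100, 2)
    return explained, unexplained


explained, unexplained = explained_and_unexplained(0.8)
print(explained, unexplained)
```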



MODULE-II

NON-PARAMETRIC TESTS

DEFINITION OF STATISTICS

The word statistics originated from the Latin word Status or the Italian word Statista, which means political state. According to Dr. S.P. Gupta, “Statistics is the science of collection, organization, presentation, analysis and interpretation of numerical data”. Statistics can be divided into two branches:

1. Descriptive Statistics

2. Inferential Statistics

Descriptive Statistics deals with collection of data, its presentation in various forms (tables, graphs, diagrams, etc.) and finding averages and other measures which would describe the data. It refers to statistical techniques used to summarize and describe a data set and the statistics (measures) used in such summaries.

Inferential Statistics deals with techniques used for analysis of data, making estimates and drawing conclusions from limited information taken on a sample basis, and testing the reliability of the estimates. It is the body of statistical techniques that deals with the question “how reliable are the conclusions or estimates that we derive from a set of data?”

PARAMETER AND STATISTIC

Parameter is a function of population values. It is a statistical measure

derived from the population. For example, arithmetic mean of a population.

Statistic is a function of sample values. It is a statistical measure derived

from the sample. For example, arithmetic mean of a sample.

Inferential statistics tries to predict the unknown parameter from the

known statistic.

TESTING OF HYPOTHESIS

Hypothesis is a statement subject to verification. It is an assumption made

about a population parameter. Lundberg defines hypothesis as a “tentative

generalization, the validity of which remains to be tested”. It is tentative, because

its veracity can be evaluated only after it has been tested empirically. It is stated

as an affirmative statement.

Hypothesis may be null hypothesis or alternative hypothesis and

directional hypothesis or non-directional hypothesis.

Null hypothesis is the original hypothesis. It states that there is no

significant difference between sample and population regarding a particular

matter under consideration. It is denoted by H0. For example, H0: There is no

significant mean difference in the mechanical aptitude between boys and girls

(µ1=µ2). Any hypothesis other than a null hypothesis is called alternative

hypothesis. It is denoted by H1. For example, H1: There is significant difference in

the mechanical aptitude between boys and girls (µ1≠µ2)

The statement ‘boys are better than girls in mechanical aptitude’ is a

directional hypothesis, as there is a clear indication of direction of change. But the

statement ‘boys and girls differ in mechanical aptitude’ is a non-directional

hypothesis, as there is no indication of direction of change.

PROCEDURE FOR TESTING HYPOTHESIS

Following are the various steps in the test of hypothesis.

1. Set-up hypotheses

Normally, the researcher has to set two types of hypotheses, viz; a null

hypothesis and an alternative hypothesis.

2. Set-up a suitable level of significance

The probability of rejecting a null hypothesis when it is true is known as

the level of significance. In other words, it is the probability of Type-I

error. Generally the level of significance is fixed at 5% or 1% (0.05 or 0.01).

Level of significance is denoted by α

3. Decide a test criterion.

The test criterion may be z-test, F-test, χ²-test, etc.

4. Determine the degree of freedom

Degree of freedom is defined as the number of independent observations

which is obtained by subtracting the number of constraints from the total

number of observations. That is, degree of freedom = Total number of

observations – Number of constraints.



5. Calculation of test statistic

Test statistic can be calculated by using the formula, Difference/Standard

Error

6. Obtain table value

Table value is obtained by considering both the level of significance and

the degree of freedom.

7. Making decision

The decision may be either to accept or to reject the null hypothesis. If the

calculated value of test statistic is more than the table value, we reject H0

and accept H1. If the calculated value of test statistic is less than the table

value, we accept H0 and reject H1.
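The decision rule in step 7 can be written as a small Python sketch. This is illustrative only and not part of the original notes; the function name is hypothetical, the sample values passed to it are hypothetical, and comparing the absolute value of the test statistic with the table value is an assumption suited to a two-tailed test.

```python
def decide(calculated, table_value):
    """Compare the test statistic (numerically) with the table value."""
    if abs(calculated) > table_value:
        return "reject H0 and accept H1"
    return "accept H0 and reject H1"


# Hypothetical calculated values compared with 1.96, the two-tailed
# table value of z at the 5% level of significance
print(decide(-2.53, 1.96))
print(decide(0.63, 1.96))
```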

TYPE-I AND TYPE-II ERROR

While testing a hypothesis, the decision is to accept or reject a hypothesis.

Therefore, there are four possibilities of decisions:

1. Accepting a null hypothesis when it is true

2. Rejecting a null hypothesis when it is false

3. Rejecting a null hypothesis when it is true

4. Accepting a null hypothesis when it is false

The first and second cases are correct decisions, and the third and fourth cases are errors. The third case is known as Type-I error and the fourth case is known as Type-II error. That is, the error of rejecting H0 when it is true is Type-I error, and the error of accepting H0 when it is false is Type-II error. Type-II error is often regarded as the more serious error.

The probability of Type-I error, that is rejecting a null hypothesis when it is

true is known as level of significance. As the probability of Type-I error decreases,

probability of Type-II error increases and vice versa.

REJECTION REGION AND ACCEPTANCE REGION

The entire area under a normal curve may be divided into two parts. They

are:

1. Rejection region, and

2. Acceptance region

Rejection Region

It is the area which corresponds to the predetermined level of

significance. If the computed value of the test statistic falls in the rejection region,

we reject the null hypothesis. Rejection region is also known as critical region. It is

denoted by α.

Acceptance Region

It is the area which corresponds to 1-α. If the computed value of the test

statistic falls in the acceptance region, we accept the null hypothesis

TWO-TAILED TEST AND ONE-TAILED TEST

Two-tailed Test

A two tailed test is one in which we reject the null hypothesis, if the

computed value of the test statistic is significantly greater than or lower than the

table value. In two-tailed test, the rejection region is represented by both tails,

that is left and right tails. If we are testing the hypothesis at 5% level of

significance, the size of the acceptance region is 0.95 and the size of the rejection

region is 0.05 on both sides together. So, if the computed value of test statistic

falls either in the left tail or in the right tail, the null hypothesis is rejected.

For example, if we want to test the null hypothesis that the average

height of people in the population is 156 cm. Then the rejection would be on both

sides, since the null hypothesis is rejected if the average height in the sample is

much more than 156 cm or much less than 156 cm.

One-tailed Tests

In a one-tailed test, the rejection region is represented by one tail, which may be either the left tail or the right tail. For example, if we want to test the null hypothesis that the average height of people in the population is more than 156 cm, then the rejection area would be on the left tail only, since the null hypothesis is rejected if the average height in the sample is much less than 156 cm.

Similarly, if we want to test the null hypothesis that the average height of people in the population is less than 156 cm, then the rejection area would be on the right tail only, since the null hypothesis is rejected if the average height in the sample is much more than 156 cm.



TEST STATISTICS

The decision to accept or to reject a null hypothesis is made on the basis

of a statistic computed from the sample. Such a statistic is called test statistic. Test

statistic can be classified into two groups:

1. Parametric tests, and

2. Non parametric tests

Parametric Tests

The statistical tests based on the assumption that the population or

population parameter is normally distributed are called parametric tests. The

important parametric tests are:

a) z-test

b) F-test

c) t-test

Non Parametric Tests

It is a test which is not concerned with testing of parameters and does not

depend on the particular form of the distribution of the population. It can be

defined as a distribution free statistical test where assumptions are fewer than

those associated with parametric test. It is used when the researcher concludes

that a parametric test is not applicable.

Assumptions of Non-Parametric Test

1. The sample observations are independent

2. The variables are continuous

3. Sample drawn is a random sample

4. Observations are measured on ordinal scale

Merits of Non-Parametric Test

1. Simple and easy to apply

2. There is no assumption about the probability distribution of the

population

3. No restriction regarding the size of sample

4. It can be used even if the sample is small

Types of Non Parametric Tests

1. Chi-square test (χ²-test)

2. Sign test

3. Signed rank test

4. Rank sum test

5. Runs test

CHI-SQUARE TEST (χ²-TEST)

It is a statistical test which explains the significance of the difference between a set of observed frequencies and a set of corresponding theoretical frequencies under certain assumptions. It is a test which is not concerned with testing of parameters and does not depend on the particular form of the distribution of the population. It was developed by Prof. Karl Pearson in 1900.

Characteristics of Chi-Square Test

1. It is a non-parametric test

2. It is a distribution-free test

3. It is easy to evaluate chi-square test statistic

4. It analyses the difference between a set of observed frequencies and a set

of corresponding expected frequencies

Uses/Applications of Chi-Square Test

1. It is useful for the test of independence of attributes: Chi-square test can

be used to find out whether two attributes are associated or not.

2. It is useful for the test of goodness of fit: Chi-square test can be used to

ascertain how well the theoretical distribution fit the data.

3. It is useful for the testing of homogeneity: Test of homogeneity is

concerned with whether different samples come from the same

population.

4. It is useful for testing a given population variance: It helps to test whether a given population variance is acceptable on the basis of samples drawn from that population.

CHI-SQUARE TEST FOR GOODNESS OF FIT

Chi-square test is used for testing hypotheses related to sample proportions with respect to the corresponding population proportions. The chi-square test for goodness of fit determines how well the obtained sample proportions fit the population proportions specified by the null hypothesis.

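Steps:

1. Set-up hypotheses

In test of goodness of fit, the hypotheses will be set as follows.

H0: There is goodness of fit between expected frequencies and observed frequencies.

H1: There is no goodness of fit between expected frequencies and observed frequencies.

2. Set-up a suitable level of significance

Generally, the level of significance is fixed at 5% or 1%.

3. Decide a test criterion

The test criterion will be the chi-square test.

4. Determine the degree of freedom

Degree of freedom = C - 1, where C stands for the number of categories.

5. Calculation of test statistic

Test statistic, $\chi^2 = \sum \dfrac{(O - E)^2}{E}$

Where, O is the observed frequencies, and E is the expected frequencies

6. Obtain table value

Table value is obtained by considering both level of significance and degree of freedom.

7. Making decision

If the calculated value of the test statistic is more than the table value, we reject H0 and accept H1. If the calculated value of the test statistic is less than the table value, we accept H0 and reject H1.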
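The goodness-of-fit steps can be sketched in Python. The snippet below is illustrative and not part of the original notes; the function name is hypothetical and the data (a die thrown 60 times, with 10 expected in each of the six categories) are hypothetical. The table value 11.07 is the chi-square value at the 5% level of significance for 5 degrees of freedom.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [8, 12, 10, 9, 11, 10]   # hypothetical observed frequencies
expected = [10] * 6                 # equal expected frequencies under H0
chi2 = chi_square_statistic(observed, expected)
df = len(observed) - 1              # C - 1 degrees of freedom
table_value = 11.07                 # chi-square table value, 5% level, df = 5

decision = "reject H0" if chi2 > table_value else "accept H0"
print(round(chi2, 2), df, decision)
```

With these hypothetical figures the calculated value is well below the table value, so the null hypothesis of goodness of fit would be accepted.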

CHI-SQUARE TEST OF INDEPENDENCE

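Chi-square test is used for testing whether two attributes are associated or not.

Steps:

1. Set-up hypotheses

In test of independence, the hypotheses will be set as follows.

H0: The two attributes are independent.

H1: The two attributes are dependent.

2. Set-up a suitable level of significance

Generally, the level of significance is fixed at 5% or 1%.

3. Decide a test criterion

The test criterion will be the chi-square test.

4. Determine the degree of freedom

Degree of freedom = (R - 1)(C - 1), where R stands for the number of rows, and C stands for the number of columns.

5. Calculation of test statistic

Test statistic, $\chi^2 = \sum \dfrac{(O - E)^2}{E}$

Where, O is the observed frequencies, and E is the expected frequencies

6. Obtain table value

Table value is obtained by considering both level of significance and degree of freedom.

7. Making decision

If the calculated value of the test statistic is more than the table value, we reject H0 and accept H1. If the calculated value of the test statistic is less than the table value, we accept H0 and reject H1.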

Contingency Table:

A contingency table is a frequency table in which a sample from the population is classified according to two or more attributes, each of which is divided into two or more classes. When there are only two divisions for each attribute, the contingency table is known as a 2×2 contingency table. The frequencies appearing in the contingency table are known as cell frequencies.

For example, a 2X2 contingency table based on the two attributes

smoking and drinking is:

                Smokers   Not smokers   Total
Drinkers           40          30         70
Not drinkers        4          24         28
Total              44          54         98

Calculation of Expected Frequencies:

Let a, b, c and d be the observed frequencies, shown in the form of a contingency table as follows:

           Column 1   Column 2   Total
Row 1         a          b         f1
Row 2         c          d         f2
Total         f3         f4        N

Then the expected frequencies are:

(f1 × f3)/N, (f1 × f4)/N, (f2 × f3)/N, (f2 × f4)/N

OR

$\dfrac{(a + b)(a + c)}{N}$, $\dfrac{(a + b)(b + d)}{N}$, $\dfrac{(c + d)(a + c)}{N}$, $\dfrac{(c + d)(b + d)}{N}$
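The expected frequencies and the resulting test of independence can be sketched in Python. The snippet is illustrative and not part of the original notes; the function names are hypothetical. It applies the rule (row total × column total)/N to the smoking and drinking table given earlier; the notes themselves do not carry this example through, so the calculation is offered only as a sketch. The table value 3.841 is the chi-square value at the 5% level for 1 degree of freedom.

```python
def expected_frequencies(observed):
    """Expected cell frequency = (row total * column total) / N."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def chi_square(observed):
    """Sum of (O - E)^2 / E over every cell of the contingency table."""
    expected = expected_frequencies(observed)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))


# Rows: drinkers, not drinkers; columns: smokers, not smokers
observed = [[40, 30], [4, 24]]
expected = expected_frequencies(observed)
chi2 = chi_square(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (R - 1)(C - 1)
table_value = 3.841                                  # 5% level, df = 1

print([[round(e, 2) for e in row] for row in expected])
print(round(chi2, 2), df, chi2 > table_value)
```

Since the calculated value exceeds the table value, the hypothesis that smoking and drinking are independent would be rejected for this table.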

SIGN TEST

Sign test is used to test whether the two populations are identical or not.

It is used in the situations, where t-test cannot be used. It is based on the direction

of the plus or minus signs of observations, and not on their numerical magnitudes.

The sign test may be:

1. One Sample Sign Test, and

2. Two Sample Sign Test

One Sample Sign Test

It is a very simple non-parametric test applicable when:

1) Sample is taken from a continuous population

2) P (Sample value < Mean) = ½ and P (Sample value > Mean) = ½

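Steps:

1. Set-up hypotheses

In One Sample Sign Test, the hypotheses will be set as follows.

H0: P = ½

H1: P ≠ ½

2. Set-up a suitable level of significance

Generally, the level of significance is fixed at 5% or 1%.

3. Decide a test criterion

The test criterion will be One Sample Sign Test

4. Determine the degree of freedom

Degree of freedom is infinity.

5. Calculation of test statistic

Test statistic = (p - P)/S.E

Where, p is the proportion of plus signs out of the total signs, P = ½, and S.E = $\sqrt{\dfrac{PQ}{n}}$, where Q = 1 - P and n is the total number of signs

6. Obtain table value

Table value is obtained by considering both level of significance and degree of freedom.

7. Making decision

If the calculated value of the test statistic is more than the table value, we reject H0 and accept H1. If the calculated value of the test statistic is less than the table value, we accept H0 and reject H1.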

Illustration-1:

In a four round golf play scores of 11 professionals are 202, 210, 200, 203, 193,

203, 204, 195, 199, 202, and 201. Use one sample sign test at 5% level of

significance to test the null hypothesis that professional golfer’s average is 204.

Solution

  X    Sign of (X - 204)
202    -
210    +
200    -
203    -
193    -
203    -
204    (tie, discarded)
195    -
199    -
202    -
201    -

H0: μ = 204

H1: μ ≠ 204

The level of significance is fixed at 5%.

The test criterion is One Sample Sign Test


Test statistic = (p-P)/S.E

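Where, p = 1/10 (one plus sign out of the ten signs that remain after the tie is discarded), and P = ½

S.E = $\sqrt{\dfrac{PQ}{n}} = \sqrt{\dfrac{\frac{1}{2} \times \frac{1}{2}}{10}} = \sqrt{0.025} = 0.158$

Therefore, test statistic = $\dfrac{0.1 - 0.5}{0.158} = -2.53$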

Table value at 5% level of significance and infinity degree of freedom is 1.96

As the calculated value of test statistic is more than (numerically) the table value,

we reject H0 and accept H1. That is, H1: μ ≠ 204
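The calculation in Illustration-1 can be reproduced with a short Python sketch. The function name `one_sample_sign_test` is an assumption made for illustration. The code simply follows the formula (p - P)/S.E given in the steps, dropping any value equal to the hypothesised average.

```python
import math


def one_sample_sign_test(values, hypothesised):
    """Return (plus signs, minus signs, test statistic) for the one sample sign test."""
    plus = sum(1 for v in values if v > hypothesised)
    minus = sum(1 for v in values if v < hypothesised)
    n = plus + minus                 # values equal to the hypothesised value are dropped
    p = plus / n                     # observed proportion of plus signs
    big_p = 0.5                      # P under H0
    se = math.sqrt(big_p * (1 - big_p) / n)
    return plus, minus, (p - big_p) / se


scores = [202, 210, 200, 203, 193, 203, 204, 195, 199, 202, 201]
plus, minus, z = one_sample_sign_test(scores, 204)
print(plus, minus, round(z, 2))      # 1 9 -2.53
```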

Two Sample Sign Test

Suppose X and Y are two variables and n values of each are known. Then we get n pairs of values, the first value of each pair being a value of X and the second that of Y. That is, if (x1, y1) is a pair, then x1 belongs to X and y1 belongs to Y.

In such cases, each pair can be replaced by a + or – sign. If the first value in a pair is greater than the second value, we put a + sign. If the first value is less than the second value, we put a – sign. If both values are equal, the pair concerned is discarded.

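Steps:

1. Set up hypothesis

In Two Sample Sign Test, the hypotheses will be set as follows.

H0: P = ½

H1: P ≠ ½

2. Set up a suitable level of significance

3. Determine a test criterion

The test criterion will be Two Sample Sign Test.

4. Determine the degree of freedom

Degree of freedom is infinity.

5. Compute the test statistic

Test statistic = (p - ½)/√[(½ X ½)/n]

Where, p is the proportion of plus signs out of the total signs, and n is the number of pairs compared (pairs with equal values being discarded).

6. Obtain the table value at the chosen level of significance and degree of freedom

7. Making decision

If the calculated value of the test statistic is (numerically) more than the table value, we reject H0 and accept H1. Otherwise, we accept H0.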





Illustration-2:

The following are the numbers of tickets issued by two salesmen on 11 days.

Salesman I: 7, 10, 14, 12, 6, 9, 11, 13, 7, 6, 10

Salesman II: 10, 13, 14, 11, 10, 7, 15, 11, 10, 9, 8

Use the Two Sample Sign Test at 1% level of significance to test the null hypothesis that, on the average, the two salesmen issue an equal number of tickets.

Solution

X Y Sign

7 10 -

10 13 -



14 14 …..

12 11 +

6 10 -

9 7 +

11 15 -

13 11 +

7 10 -

6 9 -

10 8 +

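H0: P = ½

H1: P ≠ ½

The level of significance is fixed at 1%.

The test criterion is Two Sample Sign Test.

Test statistic = (p - ½)/√[(½ X ½)/n]

Where, p = 4/10 (4 plus signs out of 10 signs, the tied pair being discarded), and n = 10

= (0.4 - 0.5)/√(0.25/10)

= -0.1/0.158

= -0.63

Table value at 1% level of significance and infinity degree of freedom is 2.58.

As the calculated value of test statistic is less than (numerically) the table value,

we accept H0, that is H0: P = ½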
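The sign assignment and the test statistic of Illustration-2 can be sketched in Python as follows. The function name `two_sample_sign_test` is an assumption made for illustration. Tied pairs are discarded, as the description of the test requires.

```python
import math


def two_sample_sign_test(xs, ys):
    """Return (list of signs, test statistic) for the two sample sign test."""
    signs = []
    for x, y in zip(xs, ys):
        if x > y:
            signs.append("+")
        elif x < y:
            signs.append("-")
        # equal values: the pair is discarded
    n = len(signs)                       # number of pairs actually compared
    p = signs.count("+") / n             # proportion of plus signs
    z = (p - 0.5) / math.sqrt(0.5 * 0.5 / n)
    return signs, z


salesman_1 = [7, 10, 14, 12, 6, 9, 11, 13, 7, 6, 10]
salesman_2 = [10, 13, 14, 11, 10, 7, 15, 11, 10, 9, 8]
signs, z = two_sample_sign_test(salesman_1, salesman_2)
print("".join(signs), round(z, 2))       # --+-+-+--+ -0.63
```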

SIGNED RANK TEST/WILCOXON MATCHED-PAIRS TEST

Signed Rank Test was originally proposed by Frank Wilcoxon in 1945. It is a test that takes into account both the signs and the magnitudes of the differences between paired observations. It can be used instead of the t-test to test a null hypothesis in cases where the population does not conform to the normal distribution.

CASE-I: When number of matched pairs (n) is less than 25

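Steps:

1. Set up hypothesis

In Signed Rank Test, the hypothesis will be set as follows.

H0: There is no significant difference between two samples.

H1: There is significant difference between two samples.

2. Set up a suitable level of significance

3. Determine a test criterion

The test criterion will be Signed Rank Test.

4. Determine the degree of freedom

Degree of freedom is n, the number of pairs whose difference is not zero.

5. Compute the test statistic

Rank the differences by their absolute values, leaving out pairs with zero difference, and attach to each rank the sign of its difference. Test statistic, T is the lower of the two sums of ranks (positive and negative), taken without sign.

6. Obtain the table value at the chosen level of significance and degree of freedom

7. Making decision

If the calculated value of T is less than or equal to the table value, we reject H0 and accept H1. If the calculated value of T is more than the table value, we accept H0 and reject H1.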

Illustration-3:

Given below are 16 pairs of values showing the performance of two machines. Test whether there is a difference between the performances, using the Wilcoxon Matched-Pairs Test.

Machine A

73 43 47 53 58 47 52 58 38 61 56 56 34 55 65 75

Machine B

51 41 43 41 47 32 24 58 43 53 52 57 44 57 40 68

Solution

Machine A Machine B Difference (d) Rank with sign (+) Rank with sign (-)



73 51 22 13 …

43 41 2 2.5 …

47 43 4 4.5 …

53 41 12 11 …

58 47 11 10 …

47 32 15 12 …

52 24 28 15 …

58 58 0 … …

38 43 -5 … -6

61 53 8 8 …

56 52 4 4.5 …

56 57 -1 … -1

34 44 -10 … -9

55 57 -2 … -2.5

65 40 25 14 …

75 68 7 7 …

TOTAL 101.5 -18.5

H0: There is no significant difference between the performances of two machines.

H1: There is significant difference between the performances of two machines.

Here, the level of significance is fixed at 5%.

The test criterion is Signed Rank Test.

Degree of freedom is n, that is, 15. As d = 0 for the 8th pair, it is not considered.

Test statistic, T = 18.5 (lower of 101.5 and 18.5).

Table value at 5% level of significance and 15 degree of freedom is 25.

In the Signed Rank Test for small samples, H0 is rejected when the calculated value of T is less than or equal to the table value. Since the calculated value (18.5) is less than the table value (25), we reject H0 and accept H1. That is, there is significant difference between the performances of two

machines
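The ranking in Illustration-3 is easy to get wrong by hand because of the tied differences, so a small Python sketch may help as a check. The helper names `average_ranks` and `signed_rank_sums` are assumptions made for illustration. Tied absolute differences receive the mean of the ranks they occupy, and zero differences are left out.

```python
def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2      # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def signed_rank_sums(first, second):
    """Return (sum of positive ranks, sum of negative ranks, n)."""
    diffs = [a - b for a, b in zip(first, second) if a != b]   # drop d = 0
    ranks = average_ranks([abs(d) for d in diffs])
    positive = sum(r for d, r in zip(diffs, ranks) if d > 0)
    negative = sum(r for d, r in zip(diffs, ranks) if d < 0)
    return positive, negative, len(diffs)


machine_a = [73, 43, 47, 53, 58, 47, 52, 58, 38, 61, 56, 56, 34, 55, 65, 75]
machine_b = [51, 41, 43, 41, 47, 32, 24, 58, 43, 53, 52, 57, 44, 57, 40, 68]
positive, negative, n = signed_rank_sums(machine_a, machine_b)
t = min(positive, negative)
print(positive, negative, t, n)              # 101.5 18.5 18.5 15
```

The two rank sums should add up to n(n + 1)/2, here 15 X 16/2 = 120.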

CASE-II: When number of matched pairs (n) is greater than 25

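Steps:

1. Set up hypothesis

In Signed Rank Test, the hypothesis will be set as follows.

H0: There is no significant difference between two samples.

H1: There is significant difference between two samples.

2. Set up a suitable level of significance

3. Determine a test criterion

The test criterion will be Signed Rank Test.

4. Determine the degree of freedom

Degree of freedom is infinity.

5. Compute the test statistic

Test statistic, Z = (T - µ)/σ, where T is the lower of the two sums of ranks with sign, µ = [n (n+1)]/4, and σ = √{[n (n+1) (2n + 1)]/24}

6. Obtain the table value at the chosen level of significance and degree of freedom

7. Making decision

If the calculated value of Z is (numerically) more than the table value, we reject H0 and accept H1. Otherwise, we accept H0.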





Illustration-4:

The following are the weights in kilograms of 26 babies, before and after they stayed on a diet for some weeks.

Before: 7.0, 3.5, 2.1, 1.6, 7.5, 6.3, 7.0, 5.4, 7.7, 8.2, 6.8, 1.9, 1.3, 7.2, 7.8, 1.7, 2.4, 3.5, 4.5, 8.0, 1.5, 2.0, 5.8, 6.5, 3.5, 5.2

After: 7.9, 6.2, 9.0, 3.7, 3.5, 1.4, 2.6, 3.2, 9.0, 5.4, 8.5, 4.4, 8.3, 9.0, 9.2, 3.2, 3.4, 2.8, 3.4, 7.9, 3.5, 3.2, 6.2, 6.3, 3.0, 6.8



Solution

Before After Difference (d) Rank with sign (+) Rank with sign (-)

7.0 7.9 -0.9 … -6

3.5 6.2 -2.7 … -20

2.1 9.0 -6.9 … -25

1.6 3.7 -2.1 … -17

7.5 3.5 +4.0 22 …

6.3 1.4 +4.9 24 …

7.0 2.6 +4.4 23 …

5.4 3.2 2.2 18 …

7.7 9.0 -1.3 … -10

8.2 5.4 +2.8 21 …

6.8 8.5 -1.7 … -14

1.9 4.4 -2.5 … -19

1.3 8.3 -7.0 … -26

7.2 9.0 -1.8 … -15

7.8 9.2 -1.4 … -11

1.7 3.2 -1.5 … -12

2.4 3.4 -1.0 … -7

3.5 2.8 +0.7 5 …

4.5 3.4 +1.1 8 …

8.0 7.9 +0.1 1 …

1.5 3.5 -2.0 … -16

2.0 3.2 -1.2 … -9

5.8 6.2 -0.4 … -3

6.5 6.3 +0.2 2 …

3.5 3.0 +0.5 4 …

5.2 6.8 -1.6 … -13

TOTAL 128 -223

H0: There is no significant difference between the weights of babies before and

after the diet.

H1: There is significant difference between the weights of babies before and after

the diet.

Here, the level of significance is fixed at 5%.

The test criterion is Signed Rank Test.


Test statistic, Z = (T - µ)/σ

T = 128, that is, the lower of 128 and 223

µ = [n (n+1)]/4, that is = [26 X (26 + 1)]/4, = 175.5

σ = √{[n (n+1) (2n + 1)]/24}, that is = √{[26 (26+1) (2 X 26 + 1)]/24}, = 39.37

Therefore Z = (128 – 175.5)/39.37, that is = -1.21

Table value at 5% level of significance and infinity degree of freedom is 1.96.

Since the calculated value of test statistic is less than (numerically) the table value, we accept the H0. That is, there is no significant difference between the weights of babies

before and after the diet.
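The large-sample calculation of Illustration-4 can be checked with a minimal Python sketch. The function name `wilcoxon_z` is an assumption made for illustration. It takes the rank sum T and the number of pairs n as already worked out in the table and applies the formulas for µ and σ.

```python
import math


def wilcoxon_z(t, n):
    """Return (mu, sigma, Z) for the large-sample signed rank test."""
    mu = n * (n + 1) / 4
    sigma = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return mu, sigma, (t - mu) / sigma


# T = 128 (lower of the two rank sums) and n = 26 pairs, as in the table.
mu, sigma, z = wilcoxon_z(128, 26)
print(mu, round(sigma, 2), round(z, 2))      # 175.5 39.37 -1.21
```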

MANN–WHITNEY–WILCOXON U TEST

It is also called the Wilcoxon rank-sum test (WRS), or Wilcoxon–Mann–Whitney test. It is a nonparametric test of the null hypothesis that two samples come from the same population against an alternative hypothesis, especially that one population tends to have larger values than the other. Unlike the t-test, which assumes normal distributions, it can be applied when the distributions are unknown, and it is nearly as efficient as the t-test on normal distributions.

Let x1, x2, ….., xn1 be the values of variable X and y1, y2, ….., yn2 be the values of variable Y, where n1 and n2 are the sizes of the two samples. Let the values of X form a sample independent of the sample formed by the values of Y. We want to test whether the two samples have come from two identical populations. Let the probability function of X be f1 (x) and that of Y be f2 (y).



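Steps:

1. Set up hypothesis

In Mann–Whitney–Wilcoxon U Test, the hypothesis will be set as follows:

H0: The populations are identical, that is f1 (x) = f2 (y)

H1: The populations are not identical, that is f1 (x) ≠ f2 (y).

2. Set up a suitable level of significance

3. Determine a test criterion

The test criterion will be Mann–Whitney–Wilcoxon U Test.

4. Determine the degree of freedom

Degree of freedom is infinity.

5. Compute the test statistic

Test statistic, t = (µ - U)/S.E, where µ = (n1 X n2)/2, U = n1 X n2 + [n1 X (n1 + 1)/2] – R1, and S.E = √{[n1 X n2 (n1 + n2 + 1)]/12}. Here R1 is the sum of ranks assigned to the values of the first sample when the values of both samples are ranked together.

6. Obtain the table value at the chosen level of significance and degree of freedom

7. Making decision

If the calculated value of the test statistic is (numerically) more than the table value, we reject H0 and accept H1. Otherwise, we accept H0.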





Illustration-5:

There are two samples. The first contains the observations 54, 39, 70, 58, 47, 40, 74, 49, 74, 75, 61, and 79. The second contains 45, 41, 62, 53, 33, 45, 71, 42, 68, 73, 54, and 73. Apply the Mann–Whitney–Wilcoxon U Test at 5% level of significance to test whether they come from populations with the same mean.

Solution

Values (Ascending order) Rank Sample (I or II)

33 1 II

39 2 I

40 3 I

41 4 II

42 5 II

45 6.5 II

45 6.5 II

47 8 I

49 9 I

53 10 II

54 11.5 II

54 11.5 I

58 13 I

61 14 I

62 15 II

68 16 II

70 17 I

71 18 II

73 19.5 II

73 19.5 II

74 21.5 I

74 21.5 I

75 23 I

79 24 I

H0: The populations are identical, that is f1 (x) = f2 (y)

H1: The populations are not identical, that is f1 (x) ≠ f2 (y).


The test criterion is Mann–Whitney–Wilcoxon U Test.


Test statistic, t = (µ - U)/S.E

µ = (n1 X n2)/2, that is, = (12 X 12)/2 = 72

R1 = Sum of ranks assigned to the values in sample I, that is = 167.5

U = n1 X n2 + [n1 X (n1 + 1)/2] – R1, that is, 12 X 12 + [12 X (12 + 1)/2] – 167.5, = 54.5

S.E = √{[n1 X n2 (n1 + n2 + 1)]/12}, that is, √{[12 X 12 (12 + 12 + 1)]/12}, = 17.32

Therefore, t = (72 – 54.5)/17.32 = 1.01

Table value at 5% level of significance and degree of freedom infinity is 1.96

As the calculated value of test statistic is less than the table value, we accept the

H0 and reject H1, that is, the populations are identical, that is f1 (x) = f2 (y)
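The joint ranking and the U statistic of Illustration-5 can be reproduced with the Python sketch below. The helper names `average_ranks` and `mann_whitney` are assumptions made for illustration. The two samples are pooled, tied values share the mean of their ranks, and the formulas for U, µ and S.E are applied as stated in the steps.

```python
import math


def average_ranks(values):
    """Rank values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + 1 + j + 1) / 2      # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney(sample1, sample2):
    """Return (R1, U, test statistic) using the normal approximation."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))   # rank the pooled values
    r1 = sum(ranks[:n1])                                   # rank sum of sample I
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mu = n1 * n2 / 2
    se = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, u, (mu - u) / se


sample_1 = [54, 39, 70, 58, 47, 40, 74, 49, 74, 75, 61, 79]
sample_2 = [45, 41, 62, 53, 33, 45, 71, 42, 68, 73, 54, 73]
r1, u, t = mann_whitney(sample_1, sample_2)
print(r1, u, round(t, 2))                    # 167.5 54.5 1.01
```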

********************
