sahasri singar academy

Sahasri Singar Academy

CA | CMA | CS

Statistics – Vol 1

CA | CMA

Foundation

CMA CS Yamuna Sridhar

Price:

For users who are benefited, pay to…

Account holder name: Singar Educational and Charitable Trust

Account number: 1262 1150 0000 9481

IFSC code: KVBL0001262

Bank name: Karur Vysya Bank

CALL OR VISIT FOR COPIES

Published by

SINGAR BOOKS AND PUBLICATIONS

Head Office: 32-B, Vivekananda Nagar, Ramalinga Nagar, Woriur, Trichy 620 003, TN

Branch Office: 76/1, New Street, Valluvar Kottam High Road, Nungambakkam, Chennai – 600 034

Ph: Trichy: 93451 22645 | Chennai: 93453 96855

www.singaracademy.in | singaracademy@gmail.com

Content

1 Statistical Description of Data

2 Measures of Central Tendency (Averages) and Dispersion

3 Probability

4 Correlation and Regression

SSA Statistics 1.1

1. STATISTICAL DESCRIPTION OF DATA

Introduction of Statistics

Language | Word for statistics
Latin | Status
Italian | Statista
German | Statistik
French | Statistique

Application of statistics: Economics, Business Management, and Commerce and Industry, covering both qualitative and quantitative information.

Limitations of Statistics: (1) Statistics deals with aggregates; (2) Statistics is concerned with quantitative data; (3) Future projections are possible only under a specific set of conditions; and (4) Statistical inferences are built upon random sampling.

Statistics is also described as the 'science of counting' or the 'science of averages'.

Statistics: Definition

Plural sense: data collected for statistical analysis

Singular sense: collecting, analysing and presenting data, and drawing inferences from them

Data (plural sense)

Quantitative (variable)

Discrete (countable, isolated values): no. of petals in a flower, no. of misprints in a book, annual income of a person, marks of a student

Continuous (any value in an interval): height, weight

Qualitative (attribute): gender of a baby, nationality, colour of a flower, drinking habit

Statistics: Definition, Singular Sense (collecting data)

Primary source: data collected fresh

Secondary source: data already collected

Primary source: methods of collection

Interview
Personal (e.g. natural calamity)
Indirect (e.g. road accident)
Telephone: cheapest and quickest, but inconsistent

Mailed questionnaire: widest coverage, maximum non-response

Observation: with instruments (e.g. heights of students collected using a scale)

Questionnaire with enumerators

Secondary source (already collected)

International

National (government source), e.g. religion data from a census report

Quasi-government

Statistics: Definition, Singular Sense (analysing and presenting)

Classified data
Chronological (temporal, time series)
Geographical (spatial)
Qualitative (ordinal)
Quantitative (cardinal)

Mode of presentation
Textual
Tabular (best and most accurate method)
Diagrammatic (attractive; trends are easily noticed): diagrams, charts, pictures

Drawing inference


COLLECTION OF DATA

Scrutiny of Data: verification of the accuracy and internal consistency of the data; consistency can be checked against a number of related series.

(a) Textual presentation

(b) Tabular presentation or tabulation (two types: simple (uni-variate) and complex (bi-variate))

[Figure: parts of a table, illustrated with a table titled "Students opting for CA and college" with the columns CA Students, College Students and Total. The labelled parts include the table number, the box head, the body and the footnote.]

Abscissa and Ordinate

The abscissa is the horizontal ("x") value in a pair of coordinates: it shows how far along the point is. It is always written first in an ordered pair of coordinates such as (12, 5); in this example, the value 12 is the abscissa. The second value, 5, shows how far up or down the point is and is called the ordinate.

(c) Diagrammatic representation of data: line diagram or historiagram, bar diagram, pie chart

Line diagram or historiagram (a graph showing the relationship between two variables)

1 Line diagram: pairs of values $(t, y_t)$, where $y_t$ is the time series, plotted in the $t$–$y_t$ plane.

Year | Profit
2002 | 50
2003 | 80
2004 | 130
2005 | 90
2006 | 150

2 Logarithmic or ratio chart (for a time series with a wide range of values): plot $\log y_t$, not $y_t$.

Year | Profit | Profit as a power of 10
2002 | 10 | $10^1$
2003 | 100 | $10^2$
2004 | 1000 | $10^3$
2005 | 10000 | $10^4$
2006 | 100000 | $10^5$

3 Multiple line chart (two or more related time series when the variables are expressed in the same unit)

First series | Second series
50 | 250
80 | 375
130 | 500
90 | 425
150 | 600

4 Multiple-axis line chart (two or more related time series when the variables are expressed in different units)

Bar diagram (bars are rectangles, usually of equal width and varying length)

Horizontal bar diagram (qualitative data or data varying over space)

Age | No.
18 | 130
20 | 150

Vertical bar diagram (quantitative data or time series data)

Year | Profit
2002 | 50
2003 | 80
2004 | 130
2005 | 90
2006 | 150

Multiple or grouped bar diagram (compares two or more related series)

Year | Profit | Sales
2002 | 50 | 250
2003 | 80 | 375
2004 | 130 | 500
2005 | 90 | 425
2006 | 150 | 600

Component or sub-divided bar diagram (data with multiple components; one row of component values per bar)

50 | 100 | 50 | 50
80 | 150 | 100 | 45
130 | 200 | 100 | 70
90 | 150 | 150 | 35
150 | 225 | 150 | 75

Divided or percentage bar diagram: compares the different components of a variable and shows the relation of each component to the whole (a pie diagram is an alternative). Each row below gives the total followed by its components.

250 | 50 | 100 | 50 | 50
375 | 80 | 150 | 100 | 45
500 | 130 | 200 | 100 | 70
425 | 90 | 150 | 150 | 35
600 | 150 | 225 | 150 | 75

Pie Diagram or Circle Diagram

(compares different components and their relation to the total)

Particulars | Rupees | Degrees
Material | 100 | $\frac{100}{250} \times 360° = 144°$
Labour | 50 | $\frac{50}{250} \times 360° = 72°$
Expenses | 50 | $\frac{50}{250} \times 360° = 72°$
Profit | 50 | $\frac{50}{250} \times 360° = 72°$
Total | 250 | 360°
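The angle of each pie sector is the component's share of the total multiplied by 360°. A minimal sketch of that calculation follows; the function name `pie_angles` and the dictionary layout are illustrative assumptions, not part of the notes.

```python
def pie_angles(components):
    """Return the central angle (in degrees) of each component of a total."""
    total = sum(components.values())
    return {name: value / total * 360 for name, value in components.items()}


# Cost components from the pie diagram table (in rupees).
costs = {"Material": 100, "Labour": 50, "Expenses": 50, "Profit": 50}
angles = pie_angles(costs)

for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
```

The angles always add up to 360°, which is a quick check on the arithmetic.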

Frequency: the number of observations falling within a class

FREQUENCY DISTRIBUTION

A tabular representation of observed statistical data on a measurable characteristic, usually in ascending order (by individual value or by group of values).

For the frequency distribution of a variable: $\text{No. of class intervals} \times \text{class length} \cong \text{Range}$

Class Limit (CL):

1 Minimum value of a class interval: lower class limit (LCL)

2 Maximum value of a class interval: upper class limit (UCL)

Class Boundary (CB): the actual limits of a class interval, i.e. the lower class boundary (LCB) and the upper class boundary (UCB)

Data classification | Variable (usually) | Example
Overlapping classification (mutually exclusive; excludes UCL and includes LCL) | Continuous | 10–20, 20–30, 30–40, ...
Non-overlapping classification (mutually inclusive) | Grouped | 0–9, 10–19, 20–29, ...

Class mark (mid-point) $= \frac{LCL + UCL}{2} = \frac{LCB + UCB}{2}$

Frequency Distribution

Quantitative
Discrete variable, e.g. distribution of shares; classification: mutually inclusive
Continuous variable (grouped frequency distribution), e.g. distribution of profits; classification: mutually exclusive

Qualitative (attribute), e.g. nationality and drinking habit

For a non-overlapping (mutually inclusive) classification:

1 Lower class boundary (LCB): $LCB = LCL - \frac{D}{2}$

2 Upper class boundary (UCB): $UCB = UCL + \frac{D}{2}$

where $D$ is the gap between the UCL of a class and the LCL of the next class.

Width or size of a class interval (class length) $= UCB - LCB$

Cumulative frequency: less-than type (the usual one) and more-than type; each accumulates to the total frequency.

$\text{Frequency density of a class interval} = \frac{\text{frequency of the class}}{\text{class length}}$

$\text{Relative frequency of a class interval} = \frac{\text{frequency of the class}}{\text{total frequency}}$; multiplied by 100 it gives the percentage frequency.

Relative frequency lies between 0 and 1.

Graphical Representation of a Frequency Distribution

1. Histogram or area diagram:

a. the mode can be located from it

b. most commonly drawn with class boundaries

c. unequal class widths are acceptable

d. frequency density is plotted on the vertical axis

e. looks like a vertical bar chart

f. comparison among the class frequencies is possible

2. Frequency polygon:

a. used for a simple (ungrouped) frequency distribution, or

b. for a grouped frequency distribution, plotted against the class mid-points, provided the classes have a common width

c. gives an approximate idea of the shape of the frequency curve

3. Ogives or cumulative frequency graphs:

a. a graphical representation of a cumulative frequency distribution

b. a line diagram

c. two types (less than and more than)

d. used for finding the median and quartiles (the less-than ogive alone is sufficient)

Frequency Curve (the limiting form of a histogram or frequency polygon)

Bell-shaped curve (uni-modal): height, weight, marks, profit, etc.

U-shaped curve (may be uni-modal or bi-modal)

J-shaped curve (uni-modal): profit of a company

Mixed curve (bi-modal)

Illustration: Consider

Interval | Frequency | Less-than ogive: UCL | Less-than cf | More-than ogive: LCL | More-than cf
0 – 20 | 5 | 20 | 5 | 0 | 60
20 – 40 | 10 | 40 | 15 | 20 | 55
40 – 60 | 25 | 60 | 40 | 40 | 45
60 – 80 | 15 | 80 | 55 | 60 | 20
80 – 100 | 5 | 100 | 60 | 80 | 5
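The two cumulative frequency columns of the illustration, together with the relative frequency and frequency density of each class, can be reproduced with a short sketch. The variable names are illustrative assumptions; only the class limits and frequencies come from the table.

```python
from itertools import accumulate

# Class limits and frequencies from the illustration.
lower = [0, 20, 40, 60, 80]
upper = [20, 40, 60, 80, 100]
freq = [5, 10, 25, 15, 5]

total = sum(freq)

# Less-than type: running total from the first class downwards.
less_than_cf = list(accumulate(freq))

# More-than type: running total from the last class upwards.
more_than_cf = list(accumulate(reversed(freq)))[::-1]

# Relative frequency (share of the total) and frequency density (per unit of class length).
relative = [f / total for f in freq]
density = [f / (u - l) for f, l, u in zip(freq, lower, upper)]

for row in zip(lower, upper, freq, less_than_cf, more_than_cf):
    print(row)
```

The less-than values are plotted against the upper class limits and the more-than values against the lower class limits to draw the two ogives.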

1. Tally marks determine the class frequencies.

2. The class mark (a representative value of the class interval) is its mid-point or mid-value.

3. A class with zero frequency is called an empty class.

4. A cumulative frequency distribution is used for finding the number of observations less than (or more than) any given value.

5. Cumulative frequency usually refers to the less-than type.

6. Class boundaries are the most extreme values which would ever be included in a class interval.

7. When one (or both) ends of a class are not specified, it is called an open-end class.


2. MEASURES OF CENTRAL TENDENCY (AVERAGES) & DISPERSION

DEFINITION OF CENTRAL TENDENCY / AVERAGES:

Central tendency is the tendency of the observations to cluster around a central value. An average summarises this value and helps in assessing performance and making comparisons.

X | f | Pattern of frequency
00-19 | 1 | Minimum
20-39 | 3 | Gradually increasing
40-59 | 7 | Maximum
60-79 | 2 | Gradually decreasing
80-99 | 1 | Minimum

X: any variable (height, weight, marks, profits, wages, and so on)

f: frequency (repetitiveness, i.e. the number of times a value occurs)

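List of Formulae

Weighted average

Arithmetic mean: $\bar{X} = \frac{\sum wX}{\sum w}$

Geometric mean: $G = \left(X_1^{w_1} \times X_2^{w_2} \times \dots \times X_n^{w_n}\right)^{1/\sum w}$ or $G = \text{Antilog}\left(\frac{\sum w \log X}{\sum w}\right)$

Harmonic mean: $H = \frac{\sum w}{\sum \frac{w}{X}}$

Combined mean

Arithmetic mean: $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$

Geometric mean: $G = \text{Antilog}\left(\frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}\right)$

Harmonic mean: $H = \frac{n_1 + n_2}{\frac{n_1}{H_1} + \frac{n_2}{H_2}}$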

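Measures of Central Tendency (Averages)

Means: arithmetic (usual cases; direct method), geometric (comparisons: ratios, proportions and percentages) and harmonic (two units together, e.g. speed = distance / time)

Partition values (arrange the items in ascending order): median ($Me$) and fractiles ($F_e$)

Mode ($Mo$)

Individual observations

Arithmetic mean: $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum X}{n}$

Geometric mean: $GM = (X_1 \cdot X_2 \cdots X_n)^{1/n}$ or $GM = \text{Antilog}\left(\frac{\sum \log X}{n}\right)$

Harmonic mean: $HM = \frac{n}{\sum \frac{1}{X}}$

Median: if n is odd, $Me = \left(\frac{n+1}{2}\right)^{th}$ obs (i.e. the middle obs); if n is even, $Me = \frac{\left(\frac{n}{2}\right)^{th} \text{obs} + \left(\frac{n}{2}+1\right)^{th} \text{obs}}{2}$

Fractile: $F_e = \left(\frac{e(n+1)}{F}\right)^{th}$ obs, where $F$ is the number of parts (4 for quartiles, 8 for octiles, 10 for deciles, 100 for percentiles)

Mode: $Mo$ = most usual value (location method)

Discrete series ($N = \sum f$)

Arithmetic mean: $\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$

Geometric mean: $G = \left(X_1^{f_1} \cdot X_2^{f_2} \cdots X_n^{f_n}\right)^{1/N}$ or $G = \text{Antilog}\left(\frac{\sum f \log X}{N}\right)$

Harmonic mean: $HM = \frac{N}{\sum \frac{f}{X}}$

Median: $Me$ = size of the item whose $cf > \frac{N+1}{2}$

Fractile: $F_e$ = size of the item whose $cf > \frac{e(N+1)}{F}$

Mode: regular frequency, $Mo$ by the location method; irregular frequency, $Mo$ by the grouping method

Continuous / grouped frequency (interpolation method), with class mid-points $m$

Arithmetic mean: $\bar{X} = \frac{\sum fm}{\sum f} = \frac{\sum fm}{N}$

Geometric mean: $G = \left(m_1^{f_1} \cdot m_2^{f_2} \cdots m_n^{f_n}\right)^{1/N}$ or $G = \text{Antilog}\left(\frac{\sum f \log m}{N}\right)$

Harmonic mean: $HM = \frac{N}{\sum \frac{f}{m}}$

Median: $Me = l_1 + \left(\frac{\frac{N}{2} - N_l}{N_u - N_l}\right) \times C$ or $Me = l + \frac{\frac{N}{2} - m}{f} \times c$

Fractile: $F_e = l_1 + \left(\frac{\frac{eN}{F} - N_l}{N_u - N_l}\right) \times C$ or $F_e = l + \frac{\frac{eN}{F} - m}{f} \times c$

Mode: $Mo = l_1 + \left(\frac{f_0 - f_{-1}}{2f_0 - f_{-1} - f_1}\right) \times C$

Here $l_1$ (or $l$) is the lower boundary of the class concerned, $N_l$ and $N_u$ are the less-than cumulative frequencies at its lower and upper boundaries, $m$ in the second form is the cumulative frequency of the preceding class, $f$ is the frequency of the class, $C$ (or $c$) is the class length, and $f_0$, $f_{-1}$, $f_1$ are the frequencies of the modal class and of the classes before and after it.

1. Indirect / Shortcut / Assumed Mean (A) Method: Deviation method ($d = X - A$): $\bar{X} = A + \frac{\sum d}{n}$; Step-deviation method ($d = \frac{X - A}{C}$): $\bar{X} = A + \frac{\sum d}{n} \times C$

2. Empirical relationship (thumb rule): if the mode is ill-defined (in the case of a moderately skewed distribution): $\bar{X} - Mo = 3(\bar{X} - Me)$ or $Mo = 3Me - 2\bar{X}$

3. Fractiles: Quartiles (Q), Octiles (O), Deciles (D) and Percentiles (P)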

Measures of Dispersion

(i) Range
Absolute: $R = L - S$
Relative: Coefficient of range $= \frac{L - S}{L + S} \times 100$

(ii) Quartile Deviation (otherwise, semi-interquartile range)
Absolute: $QD = \frac{Q_3 - Q_1}{2}$; Interquartile range $= Q_3 - Q_1$
Relative: Coefficient of quartile deviation $= \frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$

(iii) Mean Deviation (MD) about A ($A = \bar{X}$, $Me$ or $Mo$)
Individual: $MD_A = \frac{1}{n} \sum \lvert X - A \rvert$
Discrete: $MD_A = \frac{1}{N} \sum f \lvert X - A \rvert$
Continuous: $MD_A = \frac{1}{N} \sum f \lvert m - A \rvert$
Relative: Coefficient of mean deviation $= \frac{MD_A}{A} \times 100$

(iv) Standard Deviation (s)
Individual: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ or $s = \sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$
Discrete: $s = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}}$ or $s = \sqrt{\frac{\sum fX^2}{N} - \bar{X}^2}$
Continuous: $s = \sqrt{\frac{\sum f(m - \bar{X})^2}{N}}$ or $s = \sqrt{\frac{\sum fm^2}{N} - \bar{X}^2}$
Variance $= s^2$
Relative: Coefficient of variation $CV = \frac{s}{\bar{X}} \times 100$

Shortcut: $s = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$, where $d = X - A$ (for individual and discrete series); for continuous series $d = \frac{m - A}{C}$ and the result is multiplied by $C$.

Comparison

No. | Absolute Measure | Relative Measure
1 | Dependent on the unit | Independent of the unit
2 | Not considered for comparison | Considered for comparison
3 | Less difficult to compute than a relative measure | Difficult to compute and comprehend

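INDIVIDUAL OBSERVATIONS

Question 1: From the individual observations 3, 6, 48 and 24, find the following:

Measures of Averages: Arithmetic Mean, Geometric Mean, Harmonic Mean, Median, Fractiles ($Q_1$, $Q_3$, $O_6$, $D_7$ and $P_{75}$)

Measures of Dispersion (Absolute Measure | Relative Measure)
Range | Coefficient of Range
Quartile Deviation | Coefficient of Quartile Deviation
Mean Deviation | Coefficient of Mean Deviation
Standard Deviation / Variance | Coefficient of Variation

Answer: (observations in ascending order: 3, 6, 24, 48; n = 4)

Measures of Averages

Mean | Formula | Calculation | Answer
AM | $\bar{X} = \frac{\sum X}{n}$ | $\frac{3 + 6 + 24 + 48}{4}$ | 20.25
GM | $GM = (X_1 \times X_2 \times \dots \times X_n)^{1/n}$ | $(3 \times 6 \times 24 \times 48)^{1/4} = (3^4 \cdot 4^4)^{1/4}$ | 12
HM | $HM = \frac{n}{\sum \frac{1}{X}}$ | $\frac{4 \times 48}{16 + 8 + 2 + 1}$ | 7.11

GM using logarithms:

X | 3 | 6 | 24 | 48
log X | 0.4771 | 0.7782 | 1.3802 | 1.6812

$\sum \log X = 4.3167$

Formula | Calculation | Answer
$GM = \text{Antilog}\left(\frac{\sum \log X}{n}\right)$ | $\text{Antilog}\left(\frac{4.3167}{4}\right) = \text{Antilog}(1.0792)$ | 12

Positional Average

Measure | Formula | Calculation | Answer
$Me$ | size of $\left(\frac{n+1}{2}\right)^{th}$ obs | size of 2.5th obs = 2nd obs + 0.5 (3rd obs – 2nd obs) = 6 + 0.5(24 – 6) | 15
$Q_1$ | size of $\left(\frac{n+1}{4}\right)^{th}$ obs | size of 1.25th obs = 1st obs + 0.25 (2nd obs – 1st obs) = 3 + 0.25(6 – 3) | 3.75
$Q_3$ | size of $\left(\frac{3(n+1)}{4}\right)^{th}$ obs | size of 3.75th obs = 3rd obs + 0.75 (4th obs – 3rd obs) = 24 + 0.75(48 – 24) | 42
$P_{75}$ | size of $\left(\frac{75(n+1)}{100}\right)^{th}$ obs | size of 3.75th obs | 42
$O_6$ | size of $\left(\frac{6(n+1)}{8}\right)^{th}$ obs | size of 3.75th obs | 42
$D_7$ | size of $\left(\frac{7(n+1)}{10}\right)^{th}$ obs | size of 3.5th obs = 3rd obs + 0.5 (4th obs – 3rd obs) = 24 + 0.5(48 – 24) | 36

Note: $Q_3 = O_6 = P_{75}$

Mode is ill-defined (since every observation appears the same number of times). Hence, the empirical relation is used to arrive at $Mo$.

Measure | Formula | Calculation | Answer
$Mo$ | Mean − Mode = 3(Mean − Median) | 20.25 − Mode = 3(20.25 – 15) | 4.5

Measures of Dispersion (Absolute and Relative)

No. | Measure | Formula | Calculation | Answer
1 | Range (R) | $L - S$ | 48 – 3 | 45
 | Coefficient of Range | $\frac{L - S}{L + S}$ | $\frac{48 - 3}{48 + 3}$ | 0.8824 (88.24%)
2 | Quartile Deviation (QD) | $\frac{Q_3 - Q_1}{2}$ | $\frac{42 - 3.75}{2}$ | 19.125
 | Coefficient of Quartile Deviation | $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ | $\frac{42 - 3.75}{42 + 3.75}$ | 0.8361 (83.61%)
3 | Mean Deviation ($MD_{\bar{X}}$) | $\frac{1}{n} \sum \lvert X - \bar{X} \rvert$ | $\frac{63}{4}$ | 15.75
 | Coefficient of MD | $\frac{MD_{\bar{X}}}{\text{Mean}}$ | $\frac{15.75}{20.25}$ | 0.7778 (77.78%)
4 | Standard Deviation (s) | $\sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ | $\sqrt{\frac{1284.75}{4}}$ | 17.921
 | | $\sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$ | $\sqrt{\frac{2925}{4} - \left(\frac{81}{4}\right)^2}$ | 17.921
 | Variance, Var(X) | $s^2$ | $\frac{1284.75}{4}$ | 321.19
 | Coefficient of Variation (CV) | $\frac{s}{\bar{X}} \times 100$ | $\frac{17.921}{20.25} \times 100$ | 88.50%

Working note

X | $\lvert X - \bar{X} \rvert$ | $X - \bar{X}$ | $(X - \bar{X})^2$ | $X^2$
3 | 17.25 | -17.25 | 297.5625 | 9
6 | 14.25 | -14.25 | 203.0625 | 36
24 | 3.75 | 3.75 | 14.0625 | 576
48 | 27.75 | 27.75 | 770.0625 | 2304
Total | 63 | 0 | 1284.75 | 2925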
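The averages and the deviation-based measures of Question 1 can be cross-checked numerically. This is a minimal sketch using only the standard library; the variable names are illustrative, and the standard deviation uses the divisor n, as in the formula list.

```python
import math

data = [3, 6, 24, 48]
n = len(data)

# Measures of averages.
am = sum(data) / n                      # arithmetic mean
gm = math.prod(data) ** (1 / n)         # geometric mean: n-th root of the product
hm = n / sum(1 / x for x in data)       # harmonic mean

# Measures of dispersion about the arithmetic mean.
data_range = max(data) - min(data)
mean_dev = sum(abs(x - am) for x in data) / n
variance = sum((x - am) ** 2 for x in data) / n
sd = math.sqrt(variance)
cv = sd / am * 100                      # coefficient of variation, in percent

print(f"AM={am}, GM={gm:.2f}, HM={hm:.2f}")
print(f"R={data_range}, MD={mean_dev}, s={sd:.3f}, CV={cv:.2f}%")
```

The ordering HM ≤ GM ≤ AM (7.11 ≤ 12 ≤ 20.25) holds here, as it does for any set of positive observations.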

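Question 2: Find Median, $Q_1$, $Q_3$, $O_6$, $D_7$, $P_{75}$ for the observations: 1, 3, 6, 24, 48.

Answer: (n = 5)

Positional Average

Measure | Formula | Calculation | Answer
$Me$ | size of $\left(\frac{n+1}{2}\right)^{th}$ obs | size of 3rd obs | 6
$Q_1$ | size of $\left(\frac{n+1}{4}\right)^{th}$ obs | size of 1.5th obs = 1st obs + 0.5 (2nd obs – 1st obs) = 1 + 0.5(3 – 1) | 2
$Q_3$ | size of $\left(\frac{3(n+1)}{4}\right)^{th}$ obs | size of 4.5th obs = 4th obs + 0.5 (5th obs – 4th obs) = 24 + 0.5(48 – 24) | 36
$P_{75}$ | size of $\left(\frac{75(n+1)}{100}\right)^{th}$ obs | size of 4.5th obs | 36
$O_6$ | size of $\left(\frac{6(n+1)}{8}\right)^{th}$ obs | size of 4.5th obs | 36
$D_7$ | size of $\left(\frac{7(n+1)}{10}\right)^{th}$ obs | size of 4.2th obs = 4th obs + 0.2 (5th obs – 4th obs) = 24 + 0.2(48 – 24) | 28.8

Note: $Q_3 = O_6 = P_{75}$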
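The positional rule used in Questions 1 and 2 (locate the e(n+1)/F-th observation and interpolate between its neighbours) can be sketched as one small function. The name `fractile` and its parameters are illustrative assumptions; clamping to the smallest or largest observation when the position falls outside 1..n is also an assumption, since the notes do not cover that case.

```python
def fractile(data, e, parts):
    """e-th fractile out of `parts` equal parts, using the e(n+1)/parts position rule."""
    xs = sorted(data)
    n = len(xs)
    pos = e * (n + 1) / parts      # 1-based position, possibly fractional
    k = int(pos)                   # whole part: the k-th observation
    frac = pos - k                 # fractional part used for interpolation
    if k < 1:
        return xs[0]               # assumption: clamp below the first observation
    if k >= n:
        return xs[-1]              # assumption: clamp above the last observation
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])


q1_data = [3, 6, 48, 24]           # Question 1
q2_data = [1, 3, 6, 24, 48]        # Question 2

for name, data in (("Question 1", q1_data), ("Question 2", q2_data)):
    print(
        name,
        "Me =", fractile(data, 1, 2),
        "Q1 =", fractile(data, 1, 4),
        "Q3 =", fractile(data, 3, 4),
        "D7 =", round(fractile(data, 7, 10), 2),
    )
```

Because $Q_3$, $O_6$ and $P_{75}$ all map to the same position (3/4 = 6/8 = 75/100), the function returns the same value for each of them.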

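Question 3: Discrete Frequency Distribution

x | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19
f | 8 | 15 | 20 | 100 | 98 | 95 | 90 | 75 | 50 | 30

Answer:

No. | Measure | Formula | Calculation | Answer
1 | Arithmetic Mean ($\bar{x}$) | $\frac{\sum fX}{N}$ | $\frac{8767}{581}$ | 15.09
2 | Geometric Mean ($GM$) | $\text{Antilog}\left(\frac{\sum f \log X}{N}\right)$ | $\text{Antilog}\left(\frac{682.4203}{581}\right)$ | 14.95
3 | Harmonic Mean ($HM$) | $\frac{N}{\sum \frac{f}{X}}$ | $\frac{581}{39.25}$ | 14.802

Working Note:

X | f | fX | log X | f log X | f/X
10 | 8 | 80 | 1.0000 | 8.0000 | 0.800
11 | 15 | 165 | 1.0414 | 15.6210 | 1.364
12 | 20 | 240 | 1.0792 | 21.5840 | 1.667
13 | 100 | 1300 | 1.1139 | 111.3900 | 7.692
14 | 98 | 1372 | 1.1461 | 112.3178 | 7.000
15 | 95 | 1425 | 1.1761 | 111.7295 | 6.333
16 | 90 | 1440 | 1.2041 | 108.3690 | 5.625
17 | 75 | 1275 | 1.2304 | 92.2800 | 4.412
18 | 50 | 900 | 1.2553 | 62.7650 | 2.778
19 | 30 | 570 | 1.2788 | 38.3640 | 1.579
Total | 581 | 8767 | | 682.4203 | 39.25

Positional Average

Measure | Formula | Calculation | Answer
$Me$ | size of $\left(\frac{N+1}{2}\right)^{th}$ obs | size of 291st obs (i.e. cf > 291) | 15
$Q_1$ | size of $\left(\frac{1(N+1)}{4}\right)^{th}$ obs | size of 145.5th obs (i.e. cf > 145.5) | 14
$Q_3$ | size of $\left(\frac{3(N+1)}{4}\right)^{th}$ obs | size of 436.5th obs (i.e. cf > 436.5) | 17
$P_{75}$ | size of $\left(\frac{75(N+1)}{100}\right)^{th}$ obs | size of 436.5th obs | 17
$O_6$ | size of $\left(\frac{6(N+1)}{8}\right)^{th}$ obs | size of 436.5th obs | 17
$D_7$ | size of $\left(\frac{7(N+1)}{10}\right)^{th}$ obs | size of 407.4th obs (i.e. cf > 407.4) | 16

Note: $Q_3 = O_6 = P_{75}$

Working Notes

X | f | cf
10 | 8 | 8
11 | 15 | 23
12 | 20 | 43
13 | 100 | 143
14 | 98 | 241
15 | 95 | 336
16 | 90 | 426
17 | 75 | 501
18 | 50 | 551
19 | 30 | 581

Mode: Since there is a sudden increase in frequency from 20 to 100, we obtain the mode from a grouping table.

Grouping Table

Column | Grouping | Group totals (values of X covered)
(1) | Original frequency | 8, 15, 20, 100, 98, 95, 90, 75, 50, 30
(2) | Grouping in twos | 23 (10–11), 120 (12–13), 193 (14–15), 165 (16–17), 80 (18–19)
(3) | Leaving the first and grouping the rest in twos | 35 (11–12), 198 (13–14), 185 (15–16), 125 (17–18)
(4) | Grouping in threes | 43 (10–12), 293 (13–15), 215 (16–18)
(5) | Leaving the first and grouping in threes | 135 (11–13), 283 (14–16), 155 (17–19)
(6) | Leaving the first and second and grouping in threes | 218 (12–14), 260 (15–17)

The highest frequency total in each of the six columns of the grouping table is identified and analysed (tally marks):

Column | Highest total | Values of X covered
(1) | 100 | 13
(2) | 193 | 14, 15
(3) | 198 | 13, 14
(4) | 293 | 13, 14, 15
(5) | 283 | 14, 15, 16
(6) | 260 | 15, 16, 17

X | 13 | 14 | 15 | 16 | 17
Tally count | 3 | 4 | 4 | 2 | 1

Mode is ill-defined or bi-modal (since "14" and "15" occur an equal number of times). Hence, the empirical relation is used to arrive at $Mo$.

Measure | Formula | Calculation | Answer
$Mo$ | Mean − Mode = 3(Mean − Median) | 15.09 − Mode = 3(15.09 – 15) | 14.82

Points to Ponder:

Under the Location Method, Mode = 13 (as the highest frequency is 100).

Under the Grouping Method, the mode is ill-defined.

But under the Empirical Relationship, Mode = 14.82, which raises the question of accuracy.

Measures of Dispersion (Absolute and Relative)

No. | Measure | Formula | Calculation | Answer
1 | Range (R) | $L - S$ | 19 − 10 | 9
 | Coefficient of Range | $\frac{L - S}{L + S}$ | $\frac{19 - 10}{19 + 10}$ | 0.31
2 | Quartile Deviation (QD) | $\frac{Q_3 - Q_1}{2}$ | $\frac{17 - 14}{2}$ | 1.5
 | Coefficient of Quartile Deviation | $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ | $\frac{17 - 14}{17 + 14}$ | 0.0968
3 | Mean Deviation ($MD_{\bar{X}}$) | $\frac{1}{N} \sum f \lvert X - \bar{X} \rvert$ | $\frac{976.19}{581}$ | 1.68
 | Coefficient of MD | $\frac{MD_{\bar{X}}}{\text{Mean}}$ | $\frac{1.68}{15.09}$ | 0.1113
4 | Standard Deviation (s) | $\sqrt{\frac{\sum f(X - \bar{X})^2}{N}}$ | $\sqrt{\frac{2433.3461}{581}}$ | 2.05
 | Variance, Var(X) | $s^2$ | $\frac{2433.3461}{581}$ | 4.19
 | Coefficient of Variation (CV) | $\frac{s}{\bar{x}} \times 100$ | $\frac{2.05}{15.09} \times 100$ | 13.56%

Working Note: (for MD and for SD, with $\bar{X} = 15.09$)

X | f | $\lvert X - \bar{X} \rvert$ | $f \lvert X - \bar{X} \rvert$ | $X - \bar{X}$ | $f(X - \bar{X})^2$
10 | 8 | 5.09 | 40.72 | -5.09 | 207.2648
11 | 15 | 4.09 | 61.35 | -4.09 | 250.9215
12 | 20 | 3.09 | 61.80 | -3.09 | 190.9620
13 | 100 | 2.09 | 209.00 | -2.09 | 436.8100
14 | 98 | 1.09 | 106.82 | -1.09 | 116.4338
15 | 95 | 0.09 | 8.55 | -0.09 | 0.7695
16 | 90 | 0.91 | 81.90 | 0.91 | 74.5290
17 | 75 | 1.91 | 143.25 | 1.91 | 273.6075
18 | 50 | 2.91 | 145.50 | 2.91 | 423.4050
19 | 30 | 3.91 | 117.30 | 3.91 | 458.6430
∑ | 581 | | 976.19 | | 2433.3461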
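For a discrete frequency distribution, each positional average is the first value of X whose cumulative frequency reaches the required position. A minimal sketch for the data of Question 3 follows; the helper name `positional` is an illustrative assumption.

```python
from bisect import bisect_left
from itertools import accumulate

x = list(range(10, 20))                           # 10, 11, ..., 19
f = [8, 15, 20, 100, 98, 95, 90, 75, 50, 30]

N = sum(f)                                        # total frequency
cf = list(accumulate(f))                          # less-than cumulative frequencies


def positional(e, parts):
    """Value of X whose cumulative frequency first reaches e(N+1)/parts."""
    position = e * (N + 1) / parts
    return x[bisect_left(cf, position)]


median = positional(1, 2)
q1 = positional(1, 4)
q3 = positional(3, 4)
d7 = positional(7, 10)

print(N, cf)
print(median, q1, q3, d7)
```

`bisect_left` returns the index of the first cumulative frequency that is not smaller than the position, which is the lookup done by eye in the cf column.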

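Question 4: Continuous Frequency Distribution:

Marks | 01-10 | 11-20 | 21-30 | 31-40 | 41-50 | 51-60 | 61-70 | 71-80 | 81-90 | 91-100
Number of Students | 3 | 7 | 13 | 17 | 12 | 10 | 8 | 8 | 6 | 6

Also verify the empirical relation.

Answer:

Measure | Formula | Calculation | Answer
Arithmetic Mean (Direct Method) | $\frac{\sum fm}{N}$ | $\frac{4375}{90}$ | 48.61
Arithmetic Mean (Shortcut Method, A = 45.5) | $A + \frac{\sum fd}{N} \times c$ | $45.5 + \frac{28}{90} \times 10$ | 48.61
Arithmetic Mean (Shortcut Method, A = 55.5) | $A + \frac{\sum fd}{N} \times c$ | $55.5 + \frac{-62}{90} \times 10$ | 48.61
Geometric Mean, GM | $\text{Antilog}\left(\frac{\sum f \log m}{N}\right)$ | $\text{Antilog}\left(\frac{145.5550}{90}\right)$ | 41.43
Harmonic Mean, HM | $\frac{N}{\sum \frac{f}{m}}$ | $\frac{90}{2.7908}$ | 32.25

Working Note: ($d = \frac{m - 45.5}{10}$)

Class boundaries | m | f | fm | d | fd | log m | f log m | f/m
0.5 – 10.5 | 5.5 | 3 | 16.5 | -4 | -12 | 0.7404 | 2.2212 | 0.5455
10.5 – 20.5 | 15.5 | 7 | 108.5 | -3 | -21 | 1.1903 | 8.3321 | 0.4516
20.5 – 30.5 | 25.5 | 13 | 331.5 | -2 | -26 | 1.4065 | 18.2845 | 0.5098
30.5 – 40.5 | 35.5 | 17 | 603.5 | -1 | -17 | 1.5502 | 26.3534 | 0.4789
40.5 – 50.5 | 45.5 | 12 | 546.0 | 0 | 0 | 1.6580 | 19.8960 | 0.2637
50.5 – 60.5 | 55.5 | 10 | 555.0 | 1 | 10 | 1.7443 | 17.4430 | 0.1802
60.5 – 70.5 | 65.5 | 8 | 524.0 | 2 | 16 | 1.8162 | 14.5296 | 0.1221
70.5 – 80.5 | 75.5 | 8 | 604.0 | 3 | 24 | 1.8779 | 15.0232 | 0.1060
80.5 – 90.5 | 85.5 | 6 | 513.0 | 4 | 24 | 1.9320 | 11.5920 | 0.0702
90.5 – 100.5 | 95.5 | 6 | 573.0 | 5 | 30 | 1.9800 | 11.8800 | 0.0628
Total | | 90 | 4375.0 | | 28 | | 145.5550 | 2.7908

Positional Average and Mode

Measure | Formula | Calculation | Answer
$Me$ | $l + \frac{\frac{N}{2} - m}{f} \times c$ | $40.5 + \frac{45 - 40}{12} \times 10$ | 44.67
$Q_1$ | $l + \frac{\frac{N}{4} - m}{f} \times c$ | $20.5 + \frac{22.5 - 10}{13} \times 10$ | 30.12
$Q_3$ | $l + \frac{\frac{3N}{4} - m}{f} \times c$ | $60.5 + \frac{67.5 - 62}{8} \times 10$ | 67.38
$O_6$ | $l + \frac{\frac{6N}{8} - m}{f} \times c$ | same as $Q_3$ | 67.38
$P_{75}$ | $l + \frac{\frac{75N}{100} - m}{f} \times c$ | same as $Q_3$ | 67.38
$D_7$ | $l + \frac{\frac{7N}{10} - m}{f} \times c$ | $60.5 + \frac{63 - 62}{8} \times 10$ | 61.75
$Mo$ | $l_1 + \left(\frac{f_0 - f_{-1}}{2f_0 - f_{-1} - f_1}\right) \times C$ | $30.5 + \left(\frac{17 - 13}{2 \times 17 - 13 - 12}\right) \times 10$ | 34.94

$Q_3 = O_6 = P_{75} = 67.38$

The modal class is (30.5 − 40.5), since 17 is the highest frequency.

Working Note

X | f | cf
0.5 – 10.5 | 3 | 3
10.5 – 20.5 | 7 | 10
20.5 – 30.5 | 13 | 23
30.5 – 40.5 | 17 | 40
40.5 – 50.5 | 12 | 52
50.5 – 60.5 | 10 | 62
60.5 – 70.5 | 8 | 70
70.5 – 80.5 | 8 | 78
80.5 – 90.5 | 6 | 84
90.5 – 100.5 | 6 | 90

Graphical Method: Ogive Curves for Positional Average:

Marks | Number of Students | Less-than ogive: UCL | Less-than cf | More-than ogive: LCL | More-than cf
0.5 – 10.5 | 3 | 10.5 | 3 | 0.5 | 90 (= ∑f)
10.5 – 20.5 | 7 | 20.5 | 10 | 10.5 | 87
20.5 – 30.5 | 13 | 30.5 | 23 | 20.5 | 80
30.5 – 40.5 | 17 | 40.5 | 40 | 30.5 | 67
40.5 – 50.5 | 12 | 50.5 | 52 | 40.5 | 50
50.5 – 60.5 | 10 | 60.5 | 62 | 50.5 | 38
60.5 – 70.5 | 8 | 70.5 | 70 | 60.5 | 28
70.5 – 80.5 | 8 | 80.5 | 78 | 70.5 | 20
80.5 – 90.5 | 6 | 90.5 | 84 | 80.5 | 12
90.5 – 100.5 | 6 | 100.5 | 90 (= ∑f) | 90.5 | 6

Verification of Empirical relation:

Mean – Mode = 3 (Mean – Median)

(i.e.,) 48.61 – 34.94 = 3 (48.61 – 44.67)

13.67 = 3 (3.94)

13.67 = 11.82, which is not true; the relation holds only approximately.

Graphical Method

$Mo = 35$ (graphical method)

$Q_1 = 30$, $Q_2 = 45$ and $Q_3 = 67$

Measures of Dispersion (Absolute and Relative)

No. | Measure | Formula | Calculation | Answer
1 | Range (R) | $L - S$ | 100 − 1 | 99
 | Other way (class boundaries) | | 100.5 − 0.5 | 100
 | Coefficient of Range | $\frac{L - S}{L + S}$ | $\frac{100 - 1}{100 + 1}$ | 0.98
2 | Quartile Deviation (QD) | $\frac{Q_3 - Q_1}{2}$ | $\frac{67.38 - 30.12}{2}$ | 18.63
 | Coefficient of Quartile Deviation | $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ | $\frac{67.38 - 30.12}{67.38 + 30.12}$ | 0.38
3 | Mean Deviation ($MD_{\bar{X}}$) | $\frac{1}{N} \sum f \lvert m - \bar{X} \rvert$ | $\frac{1843.54}{90}$ | 20.48
 | Coefficient of MD | $\frac{MD_{\bar{X}}}{\text{Mean}}$ | $\frac{20.48}{48.61}$ | 0.42
4 | Standard Deviation (s) | $\sqrt{\frac{\sum f(m - \bar{X})^2}{N}}$ | $\sqrt{\frac{53128.89}{90}}$ | 24.30
 | Variance, Var(X) | $s^2$ | $\frac{53128.89}{90}$ | 590.32
 | Coefficient of Variation (CV) | $\frac{s}{\bar{X}} \times 100$ | $\frac{24.30}{48.61} \times 100$ | 49.98%

Working Notes

Class boundaries | m | f | $\lvert m - \bar{X} \rvert$ | $f \lvert m - \bar{X} \rvert$ | $(m - \bar{X})^2$ | $f(m - \bar{X})^2$
0.5 – 10.5 | 5.5 | 3 | 43.11 | 129.33 | 1858.4721 | 5575.4163
10.5 – 20.5 | 15.5 | 7 | 33.11 | 231.77 | 1096.2721 | 7673.9047
20.5 – 30.5 | 25.5 | 13 | 23.11 | 300.43 | 534.0721 | 6942.9373
30.5 – 40.5 | 35.5 | 17 | 13.11 | 222.87 | 171.8721 | 2921.8257
40.5 – 50.5 | 45.5 | 12 | 3.11 | 37.32 | 9.6721 | 116.0652
50.5 – 60.5 | 55.5 | 10 | 6.89 | 68.90 | 47.4721 | 474.7210
60.5 – 70.5 | 65.5 | 8 | 16.89 | 135.12 | 285.2721 | 2282.1768
70.5 – 80.5 | 75.5 | 8 | 26.89 | 215.12 | 723.0721 | 5784.5768
80.5 – 90.5 | 85.5 | 6 | 36.89 | 221.34 | 1360.8721 | 8165.2326
90.5 – 100.5 | 95.5 | 6 | 46.89 | 281.34 | 2198.6721 | 13192.0326
Total | | 90 | | 1843.54 | | 53128.889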
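The interpolation formulas for grouped data in Question 4 can be checked with a short sketch. The names `fractile`, `boundaries` and `freq` are illustrative assumptions; the code assumes equal class lengths and a modal class that is neither the first nor the last class, which is true for this data.

```python
# Class boundaries 0.5, 10.5, ..., 100.5 and the class frequencies of Question 4.
boundaries = [0.5 + 10 * i for i in range(11)]
freq = [3, 7, 13, 17, 12, 10, 8, 8, 6, 6]

N = sum(freq)
c = 10                                            # common class length

# Arithmetic mean from the class mid-points (direct method).
mids = [(lo + hi) / 2 for lo, hi in zip(boundaries, boundaries[1:])]
mean = sum(f * m for f, m in zip(freq, mids)) / N


def fractile(position):
    """Interpolate the value at a cumulative position, e.g. N/2 for the median."""
    cum = 0                                       # cumulative frequency of preceding classes
    for lo, f in zip(boundaries, freq):
        if cum + f >= position:
            return lo + (position - cum) / f * c
        cum += f
    raise ValueError("position exceeds the total frequency")


median = fractile(N / 2)
q1 = fractile(N / 4)
q3 = fractile(3 * N / 4)
d7 = fractile(7 * N / 10)

# Mode: interpolate inside the class with the highest frequency.
k = freq.index(max(freq))
f0, f_prev, f_next = freq[k], freq[k - 1], freq[k + 1]
mode = boundaries[k] + (f0 - f_prev) / (2 * f0 - f_prev - f_next) * c

print(round(mean, 2), round(median, 2), round(q1, 2), round(q3, 2), round(mode, 2))
```

Comparing `mean - mode` with `3 * (mean - median)` shows how closely the empirical relation is satisfied for this distribution.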

PROPERTIES: (A) MEASURES OF AVERAGES / CENTRAL TENDENCY

Arithmetic Mean

Property 1: If all the observations assumed by a variable are equal to a constant, say k, then the AM is also k.

Illustration: Consider 2, 2, 2

Property | Calculation | Answer
$\bar{X} = \frac{k + k + \dots + k}{n} = k$ | $\frac{2 + 2 + 2}{3}$ | 2

Property 2: (a) The algebraic sum of the deviations of a set of observations from their AM is zero. And (b) the sum of the squares of the deviations taken from the mean ($\bar{X}$) is always a minimum compared to the deviations taken from any other assumed mean ($A$).

Illustration: Consider (X): 2, 3, 4

Property | Formula | Calculation | Answer
Mean | $\bar{X} = \frac{\sum X}{n}$ | $\frac{2 + 3 + 4}{3}$ | 3
(a) $\sum (X - \bar{X}) = 0$, or $\sum f(X - \bar{X}) = 0$ with frequencies | $\sum (X - \bar{X})$ | (2 − 3) + (3 − 3) + (4 − 3) | 0
(b) $\sum (X - \bar{X})^2 \le \sum (X - A)^2$ | $\sum (X - \bar{X})^2$ | $(2 - 3)^2 + (3 - 3)^2 + (4 - 3)^2$ | 2
 | $\sum (X - A)^2$, where $A = 4$ | $(2 - 4)^2 + (3 - 4)^2 + (4 - 4)^2$ | 5

Property 3: AM is affected by a change of origin (+/−) and / or scale (×/÷), i.e., if $y = a + bx$, then the AM of y is given by $\bar{y} = a + b\bar{x}$ (where a is the change of origin and b is the change of scale).

Illustration: Consider (X) = 2, 3, 4

No. | Data | Transformation | $\bar{Y} = \frac{\sum Y}{n}$ | $\bar{y} = a + b\bar{x}$
1 | X = 2, 3, 4 | (original data) | $\bar{X} = \frac{2 + 3 + 4}{3} = 3$ |
2 | Y = 4, 5, 6 | Change of origin (a = 2), being Y = X + 2 | $\frac{4 + 5 + 6}{3} = 5$ | 2 + 1 × 3 = 5
3 | Y = 0, 1, 2 | Change of origin (a = −2), being Y = X − 2 | $\frac{0 + 1 + 2}{3} = 1$ | −2 + 1 × 3 = 1
4 | Y = 4, 6, 8 | Change of scale (b = 2), being Y = X × 2 | $\frac{4 + 6 + 8}{3} = 6$ | 0 + 2 × 3 = 6
5 | Y = 1, 1.5, 2 | Change of scale ($b = \frac{1}{2}$), being $Y = X \times \frac{1}{2}$ | $\frac{1 + 1.5 + 2}{3} = 1.5$ | $0 + \frac{1}{2} \times 3 = 1.5$
6 | Y = 7, 9, 11 | Change of origin and change of scale (a = 3 and b = 2), being Y = 3 + 2 × X | $\frac{7 + 9 + 11}{3} = 9$ | 3 + 2 × 3 = 9

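Property 4: If there are two groups containing $n_1$ and $n_2$ observations and $\bar{x}_1$ and $\bar{x}_2$ as the respective arithmetic means, then the combined AM is given by $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$

Illustration: Group I: $n_1 = 5$, $\bar{x}_1 = 9$; Group II: $n_2 = 15$, $\bar{x}_2 = 5$

Combined mean | Calculation | Answer
$\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ | $\frac{(5 \times 9) + (15 \times 5)}{5 + 15}$ | 6

Points to Ponder:

1 In the case of "n" number of groups, Combined mean $\bar{x}_{1 \dots n} = \frac{\sum n_i \bar{x}_i}{\sum n_i}$

2 If the sizes of the groups are the same, then the combined mean is the average of the group means.

Explanation: If $n_1 = n_2 = n$, then $\bar{x}_{1+2} = \frac{n\bar{x}_1 + n\bar{x}_2}{n + n} = \frac{n(\bar{x}_1 + \bar{x}_2)}{2n} = \frac{\bar{x}_1 + \bar{x}_2}{2}$

Illustration

No. | Formula | Calculation | Answer
1 | $X_1 = 2, 3, 4$: $\bar{x}_1 = \frac{\sum X_1}{n}$ | $\frac{2 + 3 + 4}{3}$ | 3
2 | $X_2 = 4, 5, 6$: $\bar{x}_2 = \frac{\sum X_2}{n}$ | $\frac{4 + 5 + 6}{3}$ | 5
3 | $\bar{x}_{1+2} = \frac{n\bar{x}_1 + n\bar{x}_2}{n + n}$ | $\frac{3 \times 3 + 3 \times 5}{3 + 3}$ | 4
 | $\bar{x}_{1+2} = \frac{n(\bar{x}_1 + \bar{x}_2)}{2n}$ | $\frac{3(3 + 5)}{2 \times 3}$ | 4
 | $\bar{x}_{1+2} = \frac{\bar{x}_1 + \bar{x}_2}{2}$ | $\frac{3 + 5}{2}$ | 4

3 If the averages are the same, then the combined mean is the average itself.

Explanation: If $\bar{x}_1 = \bar{x}_2 = \bar{x}$, then $\bar{x}_{12} = \frac{n_1\bar{x} + n_2\bar{x}}{n_1 + n_2} = \frac{\bar{x}(n_1 + n_2)}{n_1 + n_2} = \bar{x}$

Illustration

No. | Formula | Calculation | Answer
1 | $X_1 = 2, 3, 4$: $\bar{x}_1 = \frac{\sum X_1}{n_1}$ | $\frac{2 + 3 + 4}{3}$ | 3
2 | $X_2 = 4, 2$: $\bar{x}_2 = \frac{\sum X_2}{n_2}$ | $\frac{4 + 2}{2}$ | 3
3 | $\bar{x}_{1+2} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ | $\frac{3 \times 3 + 2 \times 3}{3 + 2}$ | 3
 | $\frac{\bar{x}(n_1 + n_2)}{n_1 + n_2}$ | $\frac{3(3 + 2)}{2 + 3}$ | 3
 | $\bar{x}_{1+2} = \bar{x}_1 = \bar{x}_2$ | | 3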

Geometric Mean

Property 1: Transformation in terms of the log function

$GM = \text{Antilog}\left(\frac{1}{n} \sum \log x\right)$ or $\log GM = \frac{1}{n} \sum \log x$

Property 2: If all the observations assumed by a variable are equal to a constant, say $k > 0$, then the GM of the observations is also k.

Property | Illustration | Calculation | Answer
$(k \times k \times \dots \times k)^{1/n} = k$ | Consider: 2, 2, 2 | $(2 \times 2 \times 2)^{1/3}$ | 2

Property 3: GM of the product of two variables is the product of their GMs.

Property 4: GM of the ratio of two variables is the ratio of the GMs of the two variables.

Illustration | Formula | Calculation | Answer
X = 3, 6, 12 | $GM = (X_1 \times X_2 \times \dots \times X_n)^{1/n}$ | $(3 \times 6 \times 12)^{1/3}$ | 6
Y = 1, 2, 4 | | $(1 \times 2 \times 4)^{1/3}$ | 2
Z = 3, 12, 48 | | $(3 \times 12 \times 48)^{1/3}$ | 12
Property 3, being $Z = X \times Y$ | $GM_Z = GM_X \times GM_Y$ | 6 × 2 | 12
Z = 3, 3, 3 | | $(3 \times 3 \times 3)^{1/3}$ | 3
Property 4, being $Z = \frac{X}{Y}$ | $GM_Z = \frac{GM_X}{GM_Y}$ | $\frac{6}{2}$ | 3

Harmonic Mean:

Property 1: If all the observations taken by a variable are equal to a constant, say k, then the HM of the observations is also k.

Property | Illustration | Calculation | Answer
$\frac{n}{\frac{1}{k} + \dots + \frac{1}{k}} = k$ | X = 2, 2, 2 | $\frac{3}{\frac{1}{2} + \frac{1}{2} + \frac{1}{2}}$ | 2

Property 2: If there are two groups containing $n_1$ and $n_2$ observations and $H_1$ and $H_2$ as the respective harmonic means, then the combined HM is given by $H_{12} = \frac{n_1 + n_2}{\frac{n_1}{H_1} + \frac{n_2}{H_2}}$

Illustration: Group I: $n_1 = 15$, $H_1 = 3$; Group II: $n_2 = 10$, $H_2 = 2$

Combined H.M. | Calculation | Answer
$H_{12} = \frac{n_1 + n_2}{\frac{n_1}{H_1} + \frac{n_2}{H_2}}$ | $\frac{15 + 10}{\frac{15}{3} + \frac{10}{2}}$ | 2.5

Median:

Property 1: If x and y are two variables related by $Y = a + bX$ for any two constants a and b, then the median of y is given by $Y_{Me} = a + bX_{Me}$

(i.e., the median is affected by a change of origin (+/−) and / or scale (×/÷))

Illustration: Consider (X) = 2, 3, 4

No. | Data | Transformation | Median = $\left(\frac{n+1}{2}\right)^{th}$ obs = 2nd obs | $Y_{Me} = a + bX_{Me}$
1 | X = 2, 3, 4 | (original data) | 3 |
2 | Y = 4, 5, 6 | Change of origin (a = 2), being Y = X + 2 | 5 | 2 + 1 × 3 = 5
3 | Y = 0, 1, 2 | Change of origin (a = −2), being Y = X − 2 | 1 | −2 + 1 × 3 = 1
4 | Y = 4, 6, 8 | Change of scale (b = 2), being Y = X × 2 | 6 | 0 + 2 × 3 = 6
5 | Y = 1, 1.5, 2 | Change of scale ($b = \frac{1}{2}$), being $Y = X \times \frac{1}{2}$ | 1.5 | $0 + \frac{1}{2} \times 3 = 1.5$
6 | Y = 7, 9, 11 | Change of origin and change of scale (a = 3 and b = 2), being Y = 3 + 2 × X | 9 | 3 + 2 × 3 = 9

Property 2: For a set of observations, the sum of absolute deviations is a minimum when the deviations are taken from the median.

Illustration: Consider (X): 0.5, 3, 4

Measure | Calculation | Answer
$Me = \left(\frac{n+1}{2}\right)^{th}$ obs | $\left(\frac{3+1}{2}\right)^{th}$ obs = 2nd obs | 3
$\bar{X} = \frac{\sum X}{n}$ | $\frac{0.5 + 3 + 4}{3}$ | 2.5
(a) $\sum \lvert X - \bar{X} \rvert$ | $\lvert 0.5 - 2.5 \rvert + \lvert 3 - 2.5 \rvert + \lvert 4 - 2.5 \rvert$ | 4
(b) $\sum \lvert X - Me \rvert$ | $\lvert 0.5 - 3 \rvert + \lvert 3 - 3 \rvert + \lvert 4 - 3 \rvert$ | 3.5

Property: (b) < (a)

Mode:

Property 1: If $Y = a + bX$, then $Y_{Mo} = a + bX_{Mo}$

(i.e., the mode is affected by a change of origin (+/−) and / or scale (×/÷))

Illustration: Consider (X) = 2, 3, 3, 4

No. | Data | Transformation | Mode (most usual value) | $Y_{Mo} = a + bX_{Mo}$
1 | X = 2, 3, 3, 4 | (original data) | 3 |
2 | Y = 4, 5, 5, 6 | Change of origin (a = 2), being Y = X + 2 | 5 | 2 + 1 × 3 = 5
3 | Y = 0, 1, 1, 2 | Change of origin (a = −2), being Y = X − 2 | 1 | −2 + 1 × 3 = 1
4 | Y = 4, 6, 6, 8 | Change of scale (b = 2), being Y = X × 2 | 6 | 0 + 2 × 3 = 6
5 | Y = 1, 1.5, 1.5, 2 | Change of scale ($b = \frac{1}{2}$), being $Y = X \times \frac{1}{2}$ | 1.5 | $0 + \frac{1}{2} \times 3 = 1.5$
6 | Y = 7, 9, 9, 11 | Change of origin and change of scale (a = 3 and b = 2), being Y = 3 + 2 × X | 9 | 3 + 2 × 3 = 9

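(B) MEASURES OF DISPERSION: PROPERTY

Property 1: If all the observations assumed by a variable are constant, then the measure of dispersion = 0, i.e. Range (R) = 0, Mean Deviation (MD) = 0 and Standard Deviation (s) = 0.

Illustration: Consider (X): 2, 2, 2

Measure | Calculation | Answer
$\bar{X} = \frac{\sum X}{n}$ | $\frac{2 + 2 + 2}{3}$ | 2
Range = L – S | 2 − 2 | 0
$MD_{\bar{X}} = \frac{1}{n} \sum \lvert X - \bar{X} \rvert$ | $\frac{\lvert 2 - 2 \rvert + \lvert 2 - 2 \rvert + \lvert 2 - 2 \rvert}{3}$ | 0
$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ | $\sqrt{\frac{\sum (X - 2)^2}{3}}$ | 0

Property 2: Dispersion is affected by a change of scale, but not of origin:

$R_y = 0 + \lvert b \rvert \times R_x$

$MD_{\bar{y}} = 0 + \lvert b \rvert \times MD_{\bar{x}}$

$s_y = 0 + \lvert b \rvert \times s_x$

Property 3: Mean deviation takes its minimum value when A = Median, i.e. $MD_{Me} = \frac{1}{n} \sum \lvert X - Me \rvert$ is the minimum.

Property 4: Combined SD

$s_{12} = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$, where $d_1 = \bar{x}_1 - \bar{x}_{12}$ and $d_2 = \bar{x}_2 - \bar{x}_{12}$

Note: If $\bar{x}_1 = \bar{x}_2$, then $\bar{x}_1 = \bar{x}_2 = \bar{x}_{12}$, so $d_1 = 0$ and $d_2 = 0$.

$\therefore s_{12} = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}$

Illustration: $n_1 = 5$, $n_2 = 15$, $\bar{x}_1 = 9$, $\bar{x}_2 = 5$, $s_1 = 0.8$, $s_2 = 0.5$, $\bar{x}_{12} = 6$

$d_1 = \bar{x}_1 - \bar{x}_{12} = 9 - 6 = 3$

$d_2 = \bar{x}_2 - \bar{x}_{12} = 5 - 6 = -1$

Calculation: $s_{12} = \sqrt{\frac{5 \times (0.8)^2 + 15 \times (0.5)^2 + 5 \times 3^2 + 15 \times (-1)^2}{5 + 15}} = \sqrt{\frac{66.95}{20}} = 1.83$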
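The combined mean and combined SD formulas can be sketched as two small functions and applied to the illustration's figures. The function names are illustrative assumptions; the inputs ($n_1 = 5$, $\bar{x}_1 = 9$, $s_1 = 0.8$, $n_2 = 15$, $\bar{x}_2 = 5$, $s_2 = 0.5$) are those used in the notes.

```python
import math


def combined_mean(n1, m1, n2, m2):
    """Size-weighted mean of two group means."""
    return (n1 * m1 + n2 * m2) / (n1 + n2)


def combined_sd(n1, m1, s1, n2, m2, s2):
    """Combined SD: within-group variances plus squared shifts of each group mean."""
    m12 = combined_mean(n1, m1, n2, m2)
    d1, d2 = m1 - m12, m2 - m12
    pooled = n1 * s1**2 + n2 * s2**2 + n1 * d1**2 + n2 * d2**2
    return math.sqrt(pooled / (n1 + n2))


m12 = combined_mean(5, 9, 15, 5)
s12 = combined_sd(5, 9, 0.8, 15, 5, 0.5)
print(m12, round(s12, 2))
```

When the two group means coincide, both shifts are zero and the function reduces to the simpler pooled formula given in the note.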

Problem for SD under change of scale and origin

Formulas used: $\bar{Y} = \frac{\sum Y}{n} = a + b\bar{x}$; $R = L - S$; $MD = \frac{\sum \lvert Y - \bar{Y} \rvert}{n}$; $s = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}$

No. | Data | Transformation | Mean | Range | MD about the mean | SD
1 | X = 2, 3, 4 | (original data) | $\frac{2 + 3 + 4}{3} = 3$ | 4 − 2 = 2 | $\frac{\sum \lvert X - 3 \rvert}{3} = 0.67$ | $\sqrt{\frac{\sum (X - 3)^2}{3}} = 0.82$
2 | Y = 4, 5, 6 | Change of origin (a = 2), Y = X + 2 | 2 + 1 × 3 = 5 | 6 − 4 = 2 | $\frac{\sum \lvert Y - 5 \rvert}{3} = 0.67$ | $\sqrt{\frac{\sum (Y - 5)^2}{3}} = 0.82$
3 | Y = 0, 1, 2 | Change of origin (a = −2), Y = X − 2 | −2 + 1 × 3 = 1 | 2 − 0 = 2 | $\frac{\sum \lvert Y - 1 \rvert}{3} = 0.67$ | $\sqrt{\frac{\sum (Y - 1)^2}{3}} = 0.82$
4 | Y = 4, 6, 8 | Change of scale (b = 2), Y = X × 2 | 0 + 2 × 3 = 6 | 8 − 4 = 4 | $\frac{\sum \lvert Y - 6 \rvert}{3} = 1.33$ | $\sqrt{\frac{\sum (Y - 6)^2}{3}} = 1.63$
5 | Y = 1, 1.5, 2 | Change of scale ($b = \frac{1}{2}$), $Y = X \times \frac{1}{2}$ | $0 + \frac{1}{2} \times 3 = 1.5$ | 2 − 1 = 1 | $\frac{\sum \lvert Y - 1.5 \rvert}{3} = 0.33$ | $\sqrt{\frac{\sum (Y - 1.5)^2}{3}} = 0.41$
6 | Y = 7, 9, 11 | Change of origin and change of scale (a = 3 and b = 2), Y = 3 + 2 × X | 3 + 2 × 3 = 9 | 11 − 7 = 4 | $\frac{\sum \lvert Y - 9 \rvert}{3} = 1.33$ | $\sqrt{\frac{\sum (Y - 9)^2}{3}} = 1.63$
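The effect of a change of origin and scale on the mean and on the measures of dispersion can be verified numerically. A minimal sketch for X = 2, 3, 4 and Y = 3 + 2X follows; the helper `summary` is an illustrative assumption, and the SD uses the divisor n (`statistics.pstdev`), as in the formula list.

```python
import statistics


def summary(values):
    """Mean, range, mean deviation about the mean, and SD (divisor n)."""
    mean = statistics.fmean(values)
    return {
        "mean": mean,
        "range": max(values) - min(values),
        "md": sum(abs(v - mean) for v in values) / len(values),
        "sd": statistics.pstdev(values),
    }


x = [2, 3, 4]
a, b = 3, 2                        # change of origin and change of scale
y = [a + b * v for v in x]         # 7, 9, 11

sx, sy = summary(x), summary(y)
print(sx)
print(sy)
```

The mean of Y picks up both a and b, while the range, MD and SD are multiplied by |b| only and are unaffected by a.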

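Coefficient of Variation (CV): $CV = \frac{s}{\bar{x}} \times 100$

Illustration: Group I: $\bar{x}_1 = 9$, $s_1 = 0.8$; Group II: $\bar{x}_2 = 5$, $s_2 = 0.5$; $\bar{x}_{12} = 6$

Calculation:

$CV(I) = \frac{0.8}{9} \times 100 = 8.89\%$

$CV(II) = \frac{0.5}{5} \times 100 = 10\%$

Comparison: $CV(I) = 8.89\% < CV(II) = 10\%$

Group I (lower CV): more stable, more consistent, less variable, less dispersed

Group II (higher CV): less stable, less consistent, more variable, more dispersed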

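EXTRA PROBLEMS

Comparison between Arithmetic Mean and Geometric Mean

Question 1: Find the average rate of return.

Year | 1 | 2 | 3
Rate of Return (r %) | 10% | 60% | 20%

Answer: The average rate of return is given by the GM of the growth factors (1 + r).

Mean | Formula | Calculation | Answer
GM | $(X_1 \times X_2 \times \dots \times X_n)^{1/n}$ | $(1.10 \times 1.60 \times 1.20)^{1/3}$ | 1.283 or 128.3%, i.e. a rate of 28.3%
AM | $\bar{X} = \frac{\sum X}{n}$ | $\frac{1.10 + 1.60 + 1.20}{3}$ | 1.3 or 130%, i.e. a rate of 30%, which is not possible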
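Why the GM, and not the AM, gives the average rate of return can be seen by compounding each candidate rate over the three years. A minimal sketch, with illustrative variable names:

```python
import math

rates = [0.10, 0.60, 0.20]
factors = [1 + r for r in rates]            # growth factors 1.10, 1.60, 1.20

growth = math.prod(factors)                 # actual growth over the three years
gm = growth ** (1 / len(factors))           # geometric mean of the growth factors
am = sum(factors) / len(factors)            # arithmetic mean of the growth factors

print(f"actual growth  = {growth:.4f}")
print(f"GM compounded  = {gm ** 3:.4f}  (average rate {gm - 1:.1%})")
print(f"AM compounded  = {am ** 3:.4f}  (average rate {am - 1:.1%})")
```

Compounding the GM for three years reproduces the actual growth, whereas compounding the AM overstates it.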

Comparison between Arithmetic Mean and Harmonic Mean

Question 2: An aeroplane covered a distance of 800 miles at four different speeds of 100, 200, 300 and 400 miles per hour (m.p.h.) for the first, second, third and fourth quarter of the distance. Find the average speed in miles per hour.

Answer: Since equal distances are covered at each speed, the average speed is given by the H.M. of the given set of data.

$HM = \frac{n}{\sum \frac{1}{X}} = \frac{4}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300} + \frac{1}{400}} = 192$ m.p.h.

$AM: \bar{X} = \frac{\sum X}{n} = \frac{100 + 200 + 300 + 400}{4} = 250$ m.p.h., which is not true.
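The harmonic-mean answer can be confirmed from first principles (total distance divided by total time). A minimal sketch, assuming four equal legs of 200 miles each:

```python
from statistics import harmonic_mean

speeds = [100, 200, 300, 400]              # m.p.h. on each quarter of the trip
total_distance = 800
leg = total_distance / len(speeds)         # 200 miles per leg

total_time = sum(leg / s for s in speeds)  # hours spent on all four legs
average_speed = total_distance / total_time

print(round(average_speed, 2), round(harmonic_mean(speeds), 2))
```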

Combined Mean

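Question 3: Two groups of students reported mean weights of 160 kg and 150 kg respectively. Find out when the mean weight of both the groups together would be 155 kg.

Answer:

Given data: number of students $N_1$ and $N_2$; $\bar{X}_1 = 160$ kg, $\bar{X}_2 = 150$ kg; combined mean $\bar{X}_{12} = 155$ kg.

Formula: $\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$

Calculation:
$155 = \frac{160N_1 + 150N_2}{N_1 + N_2}$
$155N_1 + 155N_2 = 160N_1 + 150N_2$
$5N_2 = 5N_1$

Answer: $N_1 = N_2$, i.e. the combined mean is 155 kg when the two groups have the same number of students.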

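Question 4: Show that for any two numbers a and b, standard deviation is given by $\frac{|a - b|}{2}$.

Answer: For two numbers a and b, AM is given by $\bar{X} = \frac{a + b}{2}$.

The variance is
$s^2 = \frac{\sum (X_i - \bar{X})^2}{2} = \frac{\left(a - \frac{a+b}{2}\right)^2 + \left(b - \frac{a+b}{2}\right)^2}{2} = \frac{\frac{(a-b)^2}{4} + \frac{(a-b)^2}{4}}{2} = \frac{(a - b)^2}{4}$

$\Rightarrow s = \frac{|a - b|}{2}$

(The absolute sign is taken, as SD cannot be negative).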

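Question 5: Prove that for the first n natural numbers, SD is $\sqrt{\frac{n^2 - 1}{12}}$.

Answer: For the first n natural numbers, AM is given by

$\bar{X} = \frac{1 + 2 + \dots + n}{n} = \frac{n(n + 1)}{2n} = \frac{n + 1}{2}$

$\therefore SD = \sqrt{\frac{\sum X_i^2}{n} - \bar{X}^2} = \sqrt{\frac{1^2 + 2^2 + 3^2 + \dots + n^2}{n} - \left(\frac{n + 1}{2}\right)^2}$

$= \sqrt{\frac{n(n + 1)(2n + 1)}{6n} - \left(\frac{n + 1}{2}\right)^2} = \sqrt{\frac{(n + 1)(2n + 1)}{6} - \frac{(n + 1)^2}{4}}$

$= \sqrt{(n + 1)\left(\frac{2n + 1}{6} - \frac{n + 1}{4}\right)} = \sqrt{\frac{(n + 1)(4n + 2 - 3n - 3)}{12}} = \sqrt{\frac{(n + 1)(n - 1)}{12}} = \sqrt{\frac{n^2 - 1}{12}}$

Thus, SD of first n natural numbers is $SD = \sqrt{\frac{n^2 - 1}{12}}$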
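Both closed-form results (SD of two numbers and SD of the first n natural numbers) can be spot-checked against a direct computation. A short sketch, with `sd_first_n` as a hypothetical helper name and the population SD (divisor n) assumed throughout:

```python
from math import sqrt, isclose
from statistics import pstdev

def sd_first_n(n):
    """Closed-form SD of 1, 2, ..., n (population SD, divisor n)."""
    return sqrt((n * n - 1) / 12)

# Compare the closed form with a direct calculation for several n.
checks = [isclose(pstdev(list(range(1, n + 1))), sd_first_n(n))
          for n in range(2, 50)]

# For two numbers a and b the SD reduces to |a - b| / 2.
a, b = 3, 11
two_number_sd = pstdev([a, b])
print(all(checks), two_number_sd, abs(a - b) / 2)
```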


COMPARISON BETWEEN MEASURES OF CENTRAL TENDENCY N

1 Well defined Yes Yes Yes Yes

No (when the

number of

observations is

small, then use

Empirical

Relationship)

Yes Yes A may be

X̅, 𝑀𝑒, 𝑀𝑜 Yes

Easy to calculate &

simple to

understand

Yes No No Yes

Location Method,

but not Grouping

method

Yes Yes Yes No

3 Based on all the

items Yes

Yes (but able

to find only

for Positive

Values)

positive

values

and no

“0”)

No No No No Yes Yes

capable of further

mathematical

treatment

Yes (Useful

calculation of

Numbers)

Yes (but only in

Mean Deviation,

no combined

Median)

No (But in case

of Quality

control and

stock market

fluctuations)

No (Useful for

Economists and

Businessmen and

in public reports)

5 Good basis for

comparison Yes Not much Yes

6 Necessary for

arrange of data No No No Yes No ------Not on Discussion-----

7 Affected by extreme

values Yes

Yes (Not

much Yes No No Yes No Less than SD Yes


compared to

Not Precise – Mis-

leading impressions

(E.g. Average

number of persons

is 1.5 which is not

possible)

Yes (except

when Median

lies in between

two values)

Yes (except on

continuous series) ------Not on Discussion-----

9 Location

(Inspection) Method No No No

Yes (on

arrangement) Yes ------Not on Discussion-----

10 Graphical Method Yes (using Ogive

Curves) ------Not on Discussion-----

Calculated in the

case of open end

class intervals

No No No Yes Yes No Yes Based on “A” No

Affected by

sampling

fluctuations

(least) No No Yes Yes Yes Yes Yes

affected

Affected by Change

of origin Yes Yes Yes Yes Yes No No No No

Affected by Change

of Scale Yes Yes Yes Yes Yes Yes Yes Yes Yes


Explanations to Formulae:

1. Geometric Mean

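Logarithmic formulae of Geometric Mean

Individual Observation:
$GM = \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$
$\log GM = \log \sqrt[n]{x_1 \times x_2 \times \dots \times x_n}$
$= \frac{1}{n}\log(x_1 \times x_2 \times \dots \times x_n)$
$= \frac{1}{n}(\log x_1 + \log x_2 + \dots + \log x_n)$
$= \frac{1}{n}\sum \log x$
$GM = \text{Antilog}\left(\frac{1}{n}\sum \log x\right)$

Discrete (with $N = \sum f$):
$GM = \sqrt[N]{x_1^{f_1} \times x_2^{f_2} \times \dots \times x_n^{f_n}}$
$\log GM = \log \sqrt[N]{x_1^{f_1} \times x_2^{f_2} \times \dots \times x_n^{f_n}}$
$= \frac{1}{N}\left[\log\left(x_1^{f_1} \times x_2^{f_2} \times \dots \times x_n^{f_n}\right)\right]$
$= \frac{1}{N}\left[\log x_1^{f_1} + \log x_2^{f_2} + \dots + \log x_n^{f_n}\right]$
$= \frac{1}{N}\left[f_1 \log x_1 + f_2 \log x_2 + \dots + f_n \log x_n\right]$
$= \frac{1}{N}\sum f \log x$
$GM = \text{Antilog}\left(\frac{1}{N}\sum f \log x\right)$

Continuous (with $m$ the mid-value of each class):
$GM = \sqrt[N]{m_1^{f_1} \times m_2^{f_2} \times \dots \times m_n^{f_n}}$
$\log GM = \log \sqrt[N]{m_1^{f_1} \times m_2^{f_2} \times \dots \times m_n^{f_n}}$
$= \frac{1}{N}\left[\log\left(m_1^{f_1} \times m_2^{f_2} \times \dots \times m_n^{f_n}\right)\right]$
$= \frac{1}{N}\left[\log m_1^{f_1} + \log m_2^{f_2} + \dots + \log m_n^{f_n}\right]$
$= \frac{1}{N}\left[f_1 \log m_1 + f_2 \log m_2 + \dots + f_n \log m_n\right]$
$= \frac{1}{N}\sum f \log m$
$GM = \text{Antilog}\left(\frac{1}{N}\sum f \log m\right)$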


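Standard Deviation:

$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$

$\sum (X - \bar{X})^2 = \sum [X^2 - 2X\bar{X} + \bar{X}^2]$

$\sum (X - \bar{X})^2 = \sum X^2 - \sum (2X\bar{X}) + \sum \bar{X}^2$

$\sum (X - \bar{X})^2 = \sum X^2 - 2\bar{X}\sum X + n\bar{X}^2$

$\sum (X - \bar{X})^2 = \sum X^2 - 2\frac{\sum X}{n}\sum X + n \cdot \frac{\sum X}{n} \cdot \frac{\sum X}{n}$

$\sum (X - \bar{X})^2 = \sum X^2 - \frac{2(\sum X)^2}{n} + \frac{(\sum X)^2}{n}$

$\sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}(2 - 1)$

$\sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$

$\frac{\sum (X - \bar{X})^2}{n} = \frac{n\sum X^2 - (\sum X)^2}{n^2} = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{\sum X^2}{n} - \bar{X}^2$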
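The equivalence of the definitional and shortcut forms of the standard deviation can be verified numerically. A minimal sketch, with both function names chosen only for this example:

```python
from math import sqrt, isclose

def sd_definition(xs):
    """SD from deviations about the mean: sqrt(sum((x - mean)^2) / n)."""
    n = len(xs)
    m = sum(xs) / n
    return sqrt(sum((x - m) ** 2 for x in xs) / n)

def sd_shortcut(xs):
    """SD from the derived form: sqrt(sum(x^2) / n - mean^2)."""
    n = len(xs)
    m = sum(xs) / n
    return sqrt(sum(x * x for x in xs) / n - m * m)

data = [2, 3, 4]
print(round(sd_definition(data), 2), round(sd_shortcut(data), 2))
```

The shortcut form needs only the running totals of x and x squared, which is why it is convenient for hand calculation.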


Graphical Method

Weighted Average:

1. Calculate goodwill using weighted average method:

Profit 20,000 10,000 (7000)

Weight 3 2 1

Missing Frequency:

1. Given N = 581 and Mean = 15. Find the missing frequencies.

x 10 11 12 13 14 15 16 17 18 19

f 8 15 x 100 98 95 y 75 50 30

2. Given Mean = 47, Median = 45, Mode = 35 and N= 90. Find the missing frequencies.

Marks 01-10 11-20 21-30 31-40 41-50 51-60 61-70 71-80 81-90 91-100

Number of Students 3 7 x 17 12 y 8 8 6 6


3. Probability

Introduction:

Everyday terms: 'probably', 'in all likelihood', 'chance', 'odds in favour', 'odds against'.

A branch of Mathematics

An integral part of Statistics

Applied in testing of hypotheses and in estimation

First application: by a group of mathematicians in Europe about three hundred years ago, to enhance their chances of winning in different games of gambling.

Development by Mathematicians & Statisticians

Abraham De Moivre and Pierre-Simon de Laplace of France, Reverend Thomas Bayes and R.A. Fisher of England, and Chebyshev, Markov, Khinchin and Kolmogorov of Russia.

Divisions

Subjective

Dependent on personal judgement and experience, influenced by the personal belief, attitude

& bias.

Helpful in the field of uncertainty & in the area of decision making management.

Objective: The measure based on a recorded observation rather than a subjective estimate.

Definition/Terms

Experiment: An experiment may be described as a performance that produces certain results.

Random Experiment: An experiment is defined to be random if the results of the experiment

depend on chance only.

Trial: The result is known only after the experiment is done.

Events: The results or outcomes of a random experiment are known as events.

Sometimes events may be combination of outcomes.

Sample Space (S): The set of all events (Hence, applicability of set theory)

Example

Experiment: Tossing of a coin
Random Experiment: Tossing of "any" coin
Trial: "Tossing"
Events: Head (H) and Tail (T)
Sample Space: S = {H, T}


Types of Events

I. Simple / Elementary events: cannot be decomposed further.
   Example: Toss a coin, S = {H, T}
   Composite / Compound events: can be decomposed into two or more events.
   Example: Toss two coins, S = {HH, TT, TH, HT}

II. Mutually Exclusive Events / Incompatible Events:
   • Not more than one event can occur simultaneously.
   • Happening of one excludes the happening of the other.
   • Occurrence of one event implies the non-occurrence of the other events.
   Example (on tossing a coin): if H occurs, T does not occur.
   Exhaustive events: one of the events in the sample space must necessarily occur.
   Example (on tossing a coin): either H or T occurs.
   Equally Likely Events / Mutually Symmetric Events / Equi-probable Events: no event is expected to occur more frequently than the other events.
   Example (on tossing a coin): H and T have equal chances of occurrence.

III. Finite Events: n (number of events) is finite.
   Example: Tossing 1 coin, n = 2
   Infinite Events: n is infinite.
   Example: Tossing a coin continuously, $n = \infty$

IV. Unbiased Events: getting the events on performing the experiment.
   Example: On tossing a coin, either H or T will definitely turn up.
   Biased Events: not getting the events on performing the experiment.
   Example: On tossing a coin onto a sandy floor, the coin may land with one side showing H and the other side showing T.

V. Sure event: P(A) = 1
   Impossible event: P(A) = 0
   Example (on tossing a coin): let A be getting H or T, then P(A) = 1; let B be getting neither H nor T, then P(B) = 0.

VI. Dependent Events: the event depends on the previous trials.
   Independent Events: the event does not depend on the previous trials.
   Example: A box contains 5 balls. If the first ball drawn is not replaced, then drawing the second ball is a dependent event. If the first ball drawn is replaced, then drawing the second ball is an independent event.


Different Definitions on probability

I. Classical Definition / A Priori Definition

Let a random experiment result in n (finite) equally likely elementary events, of which $n_A (\le n)$ are favourable to A.

Then, $P(A) = \frac{n_A}{n} = \frac{\text{No. of equally likely events favourable to A}}{\text{Total no. of equally likely events}}$

Also, let the sample space be made up of $m (\le n)$ composite events that are mutually exclusive, exhaustive and equally likely, of which $m_A (\le m)$ are favourable to A.

Then, $P(A) = \frac{m_A}{m} = \frac{\text{No. of mutually exclusive, exhaustive and equally likely events favourable to A}}{\text{Total no. of mutually exclusive, exhaustive and equally likely events}}$

Points to Ponder:

1. Indebted to Bernoulli and Laplace.

2. Based on prior knowledge.

Demerits / Limitations

1. Applicable only when n is finite.

2. Assumption: events must be equally likely (equi-probable).

3. Limited applications (events that are certain): coin tossing, dice throwing.

4. Inapplicable in the field of uncertainty or where there is no prior knowledge.

Properties

1. $0 \le P(A) \le 1$

$P(A) = 0$: impossible event, and $P(A) = 1$: sure event

2. Complementary Event:

$A'$ / $A^c$ / $\bar{A}$: non-occurrence of event A

Points to Ponder:

• A & A' are mutually exclusive

• $P(A) + P(A') = 1$

$P(A') = 1 - \frac{m_A}{m} = \frac{m - m_A}{m}$

3. Odds in favour of A = $m_A : (m - m_A)$

Odds against A = $(m - m_A) : m_A$

Question 1: A coin is tossed three times. What is the probability of getting (a) exactly 2 heads, (b) at least 2 heads?

Answer: All the elementary events, when a coin is tossed three times, are

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

n = 8

(a) $n_A$ = number of elementary events with exactly two heads = 3

$P(A) = \frac{n_A}{n} = \frac{\text{No. of equally likely events favourable to A}}{\text{Total no. of equally likely events}} = \frac{3}{8}$

(b) $n_A$ = number of elementary events with at least two heads = 4

$P(A) = \frac{n_A}{n} = \frac{\text{No. of equally likely events favourable to A}}{\text{Total no. of equally likely events}} = \frac{4}{8} = \frac{1}{2}$
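The classical definition lends itself to brute-force enumeration. The sketch below lists the sample space for three tosses and counts favourable outcomes; the helper `prob` is a hypothetical name used only here.

```python
from itertools import product
from fractions import Fraction

# Sample space for three tosses of a coin: HHH, HHT, ..., TTT
space = ["".join(outcome) for outcome in product("HT", repeat=3)]

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    favourable = sum(1 for s in space if event(s))
    return Fraction(favourable, len(space))

p_exactly_two = prob(lambda s: s.count("H") == 2)
p_at_least_two = prob(lambda s: s.count("H") >= 2)
print(len(space), p_exactly_two, p_at_least_two)
```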

Question 2: A dice is rolled twice. What is the probability of getting a difference of 2 points?

Answer: If an experiment results in p outcomes and if the experiment is repeated q times, then the total number of outcomes is $p^q$. In the present case, since a dice results in 6 outcomes and the dice is rolled twice, the total no. of outcomes or elementary events is $6^2$ or 36. We assume that the dice is unbiased, which ensures that all these 36 elementary events are equally likely.

Now a difference of 2 points in the uppermost faces of the dice thrown twice can occur in the following cases:

1st Throw   2nd Throw   Difference
    1           3           2
    2           4           2
    3           5           2
    4           6           2
    3           1           2
    4           2           2
    5           3           2
    6           4           2

Thus denoting the event of getting a difference of 2 points by A, we find that the no. of outcomes favourable to A, from the above table, is 8. By the classical definition of probability, we get

$P(A) = \frac{8}{36} = \frac{2}{9}$

Question 3: Two dice are thrown simultaneously. Find the probability that the sum of points on the

two dice would be 7 or more.

Answer: If two dice are thrown then, as explained in the last problem, the total no. of elementary events is $6^2$ or 36. Now a total of 7 or more, i.e. 7 or 8 or 9 or 10 or 11 or 12, can occur only in the following combinations:

SUM = 7: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)
SUM = 8: (2, 6), (3, 5), (4, 4), (5, 3), (6, 2)
SUM = 9: (3, 6), (4, 5), (5, 4), (6, 3)
SUM = 10: (4, 6), (5, 5), (6, 4)
SUM = 11: (5, 6), (6, 5)
SUM = 12: (6, 6)

Thus the no. of favourable outcomes is 21. Letting A stand for getting a total of 7 points or more, we have $P(A) = \frac{21}{36} = \frac{7}{12}$
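Both two-dice questions above (a difference of 2 points, and a sum of 7 or more) can be checked by enumerating the 36 equally likely outcomes. A minimal sketch:

```python
from itertools import product
from fractions import Fraction

# All 36 ordered outcomes (first throw, second throw) of two fair dice
rolls = list(product(range(1, 7), repeat=2))

diff_two = [(a, b) for a, b in rolls if abs(a - b) == 2]
sum_ge_seven = [(a, b) for a, b in rolls if a + b >= 7]

p_diff_two = Fraction(len(diff_two), len(rolls))
p_sum_ge_seven = Fraction(len(sum_ge_seven), len(rolls))
print(len(diff_two), p_diff_two)
print(len(sum_ge_seven), p_sum_ge_seven)
```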


Question 4: What is the chance of picking a spade or an ace not of spade from a pack of 52 cards?

Answer: A pack of 52 cards contain 13 Spades, 13 Hearts, 13 Clubs and 13 Diamonds. Each of these

groups of 13 cards has an ace. Hence the total number of elementary events is 52 out of which 13 + 3 or

16 are favourable to the event A representing picking a Spade or an ace not of Spade. Thus we have

$P(A) = \frac{16}{52} = \frac{4}{13}$

Question 5: Find the probability that a 4 digit number comprising the digits 2, 5, 6 and 7 would be

divisible by 4.

Answer: Since there are four digits, all distinct, the total number of four digit numbers that can be

formed without any restriction is 4! or 4 × 3 × 2 × 1 or 24. Now a four digit number would be divisible

by 4 if the number formed by the last two digits is divisible by 4. This could happen when the four

digit number ends with 52 or 56 or 72 or 76. If we fix the last two digits by 52, and then the 1st two

places of the four digit number can be filled up using the remaining 2 digits in 2! or 2 ways. Thus there

are 2 four digit numbers that end with 52. Proceeding in this manner, we find that the number of four

digit numbers that are divisible by 4 is 4 × 2 or 8. If (A) denotes the event that any four digit number

using the given digits would be divisible by 4, then we have

$P(A) = \frac{8}{24} = \frac{1}{3}$

Question 6: A committee of 7 members is to be formed from a group comprising 8 gentlemen and 5

ladies. What is the probability that the committee would comprise:

a. 2 ladies,

b. at least 2 ladies.

Answer: Since there are altogether 8 + 5 or 13 persons, a committee comprising 7 members can be formed in

$^{13}C_7$ or $\frac{13!}{7!\,6!}$ or $\frac{13 \times 12 \times 11 \times 10 \times 9 \times 8 \times 7!}{7! \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}$ or $11 \times 12 \times 13$ (= 1716) ways.

a. When the committee is formed taking 2 ladies out of 5 ladies, the remaining (7 – 2) or 5 committee members are to be selected from 8 gentlemen. Now 2 out of 5 ladies can be selected in $^5C_2$ ways and 5 out of 8 gentlemen can be selected in $^8C_5$ ways. Thus, if A denotes the event of having the committee with 2 ladies, then A can occur in $^5C_2 \times {}^8C_5$ or $\frac{5 \times 4}{2 \times 1} \times \frac{8 \times 7 \times 6}{3 \times 2}$ or $10 \times 56$ ways.

Thus $P(A) = \frac{10 \times 56}{11 \times 12 \times 13} = \frac{140}{429}$

b. Since the minimum number of ladies is 2, we can have the following combinations:

Population: 5L, 8G
Sample: 2L + 5G
   or 3L + 4G
   or 4L + 3G
   or 5L + 2G

Thus if B denotes the event of having at least two ladies in the committee, then B can occur in

$^5C_2 \times {}^8C_5 + {}^5C_3 \times {}^8C_4 + {}^5C_4 \times {}^8C_3 + {}^5C_5 \times {}^8C_2 = 560 + 700 + 280 + 28 = 1568$ ways.

Hence $P(B) = \frac{1568}{11 \times 12 \times 13} = \frac{392}{429}$
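The committee counts can be reproduced with `math.comb`. This is a sketch under the question's figures (8 gentlemen, 5 ladies, committee of 7); the helper `ways` is a name introduced only for illustration.

```python
from math import comb
from fractions import Fraction

LADIES, GENTLEMEN, SIZE = 5, 8, 7
total = comb(LADIES + GENTLEMEN, SIZE)       # all possible committees

def ways(ladies):
    """Number of committees with exactly this many ladies."""
    return comb(LADIES, ladies) * comb(GENTLEMEN, SIZE - ladies)

p_two_ladies = Fraction(ways(2), total)
p_at_least_two = Fraction(sum(ways(k) for k in range(2, LADIES + 1)), total)
print(total, ways(2), p_two_ladies, p_at_least_two)
```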

II. Statistical Definition (Limiting form)

To overcome the limitation of a finite number of elements in the classical definition

Developed by British Mathematicians.

Here, $P(A) = \lim_{n \to \infty} \frac{F_A}{n}$

where A occurs $F_A$ times when the random experiment is repeated a very large number of times, say n, under an identical set of conditions.

Applicability

1. The limit should exist

2. The limit tends to a finite value

Question 7: The following data relate to the distribution of wages of a group of workers:

Wages in ₹       50 – 60   60 – 70   70 – 80   80 – 90   90 – 100   100 – 110   110 – 120
No. of workers     15        23        36        42        17         12           5

If a worker is selected at random from the entire group of workers, what is the probability that

a. his wage would be less than ₹50?

b. his wage would be less than ₹80?

c. his wage would be more than ₹100?

d. his wage would be between ₹70 and ₹100?

Answer: As there are altogether 150 workers, n = 150.

a. Since there is no worker with wage less than ₹50, the probability that the wage of a randomly selected worker would be less than ₹50 is $P(A) = \frac{0}{150} = 0$

b. Since there are (15 + 23 + 36) or 74 workers having wages less than ₹80 out of a group of 150 workers, the probability that the wage of a worker, selected at random from the group, would be less than ₹80 is $P(B) = \frac{74}{150} = 0.49$

c. There are (12 + 5) or 17 workers with wages more than ₹100. Thus the probability of finding a worker, selected at random, with wage more than ₹100 is $P(C) = \frac{17}{150} = 0.11$

d. There are (36 + 42 + 17) or 95 workers with wages in between ₹70 and ₹100. Thus $P(D) = \frac{95}{150} = 0.63$

Operations on Sets

1. A∪B = {𝑥: 𝑥𝜖𝐴 𝑜𝑟 𝑥𝜖𝐵}

2. A∩B = {𝑥: 𝑥𝜖𝐴 & 𝑥𝜖𝐵}

3. A – B = {𝑥: 𝑥𝜖𝐴 & 𝑥 ∉ 𝐵}/ 𝐵 − 𝐴 = {𝑥: 𝑥𝜖𝐵 & 𝑥 ∉ 𝐴}

4. A’ = {𝑥: 𝑥 ∉ 𝐴}


Question 8: Three events A, B and C are mutually exclusive, exhaustive and equally likely. What is the probability of the complementary event of A?

Answer: A, B and C are

Mutually exclusive: $P(A \cup B \cup C) = P(A) + P(B) + P(C)$

Exhaustive: $P(A \cup B \cup C) = 1$

Equally likely: $P(A) = P(B) = P(C) = k$ (say)

Thus, combining the above,

$1 = k + k + k$

$\Rightarrow k = \frac{1}{3}$

Thus $P(A) = P(B) = P(C) = \frac{1}{3}$

Hence $P(A') = 1 - \frac{1}{3} = \frac{2}{3}$

III. Axiomatic/modern

Let 𝐴 ⊆ 𝑆.

Then a real-valued function P is a probability function, and P(A) is the probability of A, if P satisfies the following axioms:

1. P(A) ≥ 0 for every A ⊆ S

2. P(S) = 1

3. For any sequence of mutually exclusive events 𝐴1, 𝐴2, 𝐴3, … ..

𝑃(𝐴1 ∪ 𝐴2 ∪ 𝐴3 ∪ … … ) = 𝑃(𝐴1) + 𝑃(𝐴2) + 𝑃(𝐴3 ) + … …

Addition Theorem / Theorem on Total probability

Theorem – 1:

Let A & B (k, no. of events=2) be ME, then

𝑃(𝐴 ∪ 𝐵) 𝑜𝑟 𝑃(𝐴 + 𝐵) 𝑜𝑟 𝑃(𝐴 𝑜𝑟 𝐵) = 𝑃(𝐴) + 𝑃(𝐵)

Question 9: A number is selected from the first 25 natural numbers. What is the probability that it

would be divisible by 4 or 7?

Answer: Let A be the event that the number selected would be divisible by 4 and B, the event that the

selected number would be divisible by 7. Then AUB denotes the event that the number would be

divisible by 4 or 7. Next we note that A = {4, 8, 12, 16, 20, 24} and B = {7, 14, 21} whereas S = {1, 2, 3,

……... 25}. Since A∩B =𝜙 the two events A and B are mutually exclusive and as such we have

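$P(A \cup B) = P(A) + P(B)$

$P(A) = \frac{n(A)}{n(S)} = \frac{6}{25}$ and $P(B) = \frac{n(B)}{n(S)} = \frac{3}{25}$

$\therefore P(A \cup B) = \frac{6}{25} + \frac{3}{25} = \frac{9}{25}$

Hence the probability that the selected number would be divisible by 4 or 7 is $\frac{9}{25}$ or 0.36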


Question 10: A coin is tossed thrice. What is the probability of getting 2 or more heads?

Answer: If a coin is tossed three times, then we have the following sample space.

S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT} 2 or more heads imply 2 or 3 heads.

If A and B denote the events of occurrence of 2 and 3 heads respectively, then we find that

A = {HHT, HTH, THH} and B = {HHH}

$P(A) = \frac{n(A)}{n(S)} = \frac{3}{8}$ and $P(B) = \frac{n(B)}{n(S)} = \frac{1}{8}$

As A and B are mutually exclusive, the probability of getting 2 or more heads is

$\therefore P(A \cup B) = P(A) + P(B) = \frac{3}{8} + \frac{1}{8} = 0.5$

Theorem – 2: (Extension of Theorem –1)

𝐿𝑒𝑡 𝐴1, 𝐴2, … . , 𝐴𝑘( 𝑘 ≥ 2)𝑏𝑒 𝑡ℎ𝑒 𝑀𝐸 𝑒𝑣𝑒𝑛𝑡𝑠, 𝑡ℎ𝑒𝑛

P(𝐴1 ∪ 𝐴2 ∪ … . .∪ 𝐴𝑘 ) = P(𝐴1) + P(𝐴2) +....+ P(𝐴𝑘)

Theorem 3:

P(either A occurs or B occurs) = P(A) + P(B) – P(Simultaneous occurrence of the events A&B)

𝑃(𝐴 ∪ 𝐵) = 𝑃(𝐴) + 𝑃(𝐵) − 𝑃(𝐴 ∩ 𝐵)

Points to Ponder: Theorem 3 is stronger than Theorem 1, as Theorem 1 can be derived from Theorem 3 by setting $P(A \cap B) = 0$ for mutually exclusive events.

Question 11: A number is selected at random from the first 1000 natural numbers. What is the

probability that it would be a multiple of 5 or 9?

Answer: Let A, B, A∪B and A∩B denote the events that the selected number would be a multiple of 5, 9, 5 or 9, and both 5 and 9 (i.e. a multiple of the LCM of 5 and 9, which is 45) respectively.

Since 1000 = 5 × 200
= 9 × 111 + 1
= 45 × 22 + 10

it is obvious that

$P(A) = \frac{200}{1000}$, $P(B) = \frac{111}{1000}$, $P(A \cap B) = \frac{22}{1000}$

Hence the probability that the selected number would be a multiple of 5 or 9 is given by

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{200}{1000} + \frac{111}{1000} - \frac{22}{1000} = \frac{289}{1000} = 0.289$
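The addition theorem for events that are not mutually exclusive can be checked by direct counting. A small sketch for multiples of 5 or 9 among the first 1000 natural numbers:

```python
from fractions import Fraction

numbers = range(1, 1001)

n_5 = sum(1 for k in numbers if k % 5 == 0)      # multiples of 5
n_9 = sum(1 for k in numbers if k % 9 == 0)      # multiples of 9
n_45 = sum(1 for k in numbers if k % 45 == 0)    # multiples of both

# Inclusion-exclusion versus a direct count of "5 or 9"
by_theorem = n_5 + n_9 - n_45
direct = sum(1 for k in numbers if k % 5 == 0 or k % 9 == 0)

p_union = Fraction(by_theorem, len(numbers))
print(n_5, n_9, n_45, by_theorem, direct, float(p_union))
```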


Question 12: The probability that an Accountant's job applicant has a B. Com. degree is 0.85, that he is a CA is 0.30 and that he is both B. Com. and CA is 0.25. Out of 500 applicants, how many would be B. Com. or CA?

Answer: Let the event that the applicant is a B. Com. be denoted by B and that he is a CA be denoted by C. Then as given,

$P(B) = 0.85$, $P(C) = 0.30$ and $P(B \cap C) = 0.25$

The probability that an applicant is B. Com. or CA is given by

$P(B \cup C) = P(B) + P(C) - P(B \cap C) = 0.85 + 0.30 - 0.25 = 0.90$

Hence the expected number of applicants who are B. Com. or CA = 500 × 0.90 = 450

Question 13: If $P(A - B) = \frac{1}{5}$, $P(A) = \frac{1}{3}$ and $P(B) = \frac{1}{2}$, what is the probability that out of the two events A and B, only B would occur?

Answer: A glance at Figure 13.3 suggests that

Only A: $P(A - B) = P(A \cap B') = P(A) - P(A \cap B)$

$P(A - B) = P(A) - P(A \cap B) = \frac{1}{5}$

$\frac{1}{3} - P(A \cap B) = \frac{1}{5}$

$P(A \cap B) = \frac{2}{15}$

Only B: $P(B - A) = P(B \cap A') = P(B) - P(A \cap B)$

$P(B - A) = P(B) - P(A \cap B) = \frac{1}{2} - \frac{2}{15} = \frac{11}{30}$

Theorem – 4:

Let A, B & C be 3 events; then the probability that at least one of the events occurs is given by

𝑃(𝐴 ∪ 𝐵 ∪ 𝐶) = 𝑃(𝐴) + 𝑃(𝐵) + 𝑃(𝐶) − 𝑃(𝐴 ∩ 𝐵) − 𝑃(𝐵 ∩ 𝐶) − 𝑃(𝐴 ∩ 𝐶) + 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶)

Question 14: There are three persons A, B and C having different ages. The probability that A survives

another 5 years is 0.80, B survives another 5 years is 0.60 and C survives another 5 years is 0.50. The

probabilities that A and B survive another 5 years is 0.46, B and C survive another 5 years is 0.32 and

A and C survive another 5 years 0.48. The probability that all these three persons survive another 5

years is 0.26. Find the probability that at least one of them survives another 5 years.

Answer:

As given P(A) = 0.80, P(B) = 0.60, P(C) = 0.50,

P(A∩B) = 0.46, P(B∩C) = 0.32, P(A∩C) = 0.48 and

P(A∩B∩C) = 0.26


The probability that at least one of them survives another 5 years is given by

𝑃(𝐴 ∪ 𝐵 ∪ 𝐶) = 𝑃(𝐴) + 𝑃(𝐵) + 𝑃(𝐶) − 𝑃(𝐴 ∩ 𝐵) − 𝑃(𝐵 ∩ 𝐶) − 𝑃(𝐴 ∩ 𝐶) + 𝑃(𝐴 ∩ 𝐵 ∩ 𝐶)

= 0.80 + 0.60 + 0.50 – 0.46 – 0.32 – 0.48 + 0.26 = 0.90

Conditional Probability / Compound Theorem / Multiplication Theorem

Compound / Joint Probability, $P(A \cap B)$ or $P(A_1 \cap A_2 \cap \dots \cap A_k)$: the probability of the simultaneous occurrence of two or more events.

Situations

1. Dependent Events, P(B/A): the occurrence of one event, B, is influenced by the occurrence of another event, A (not an impossible event).

$P(B/A) = \frac{P(B \cap A)}{P(A)}$, $P(A) > 0$

If A depends on B, then

$P(A/B) = \frac{P(A \cap B)}{P(B)}$, $P(B) > 0$

Points to Ponder:

1. $P(B/A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)}$ (since $P(A \cap B) = P(B \cap A)$: commutative property)

2. If B is not dependent on A, then P(B/A) = P(B)

3. If A is not dependent on B, then P(A/B) = P(A)

Thus, $P(A \cap B) = P(A) \times P(B)$

4. If A & B are independent, then

A & B' are independent, i.e. $P(A \cap B') = P(A) \times P(B') = P(A) \times [1 - P(B)]$

A' & B are independent, i.e. $P(A' \cap B) = P(A') \times P(B) = [1 - P(A)] \times P(B)$

A' & B' are independent, i.e. $P(A' \cap B') = P(A') \times P(B') = [1 - P(A)] \times [1 - P(B)]$

5. If A, B & C are independent, then
$P(A \cap B) = P(A) \times P(B)$
$P(A \cap C) = P(A) \times P(C)$
$P(B \cap C) = P(B) \times P(C)$
$P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$

6. If A, B & C are dependent, then $P(A \cap B \cap C) = P(A) \times P(B/A) \times P(C/A \cap B)$

Theorems of Compound Probability

Theorem -5

P(A & B occur simultaneously) = product of the unconditional probability of A and the conditional probability of B, given that A has already occurred

$P(A \cap B) = P(A) \times P(B/A)$

Theorem – 6:

Let A, B & C be any 3 events; the probability that they occur jointly is

$P(A \cap B \cap C) = P(A) \times P(B/A) \times P(C/A \cap B)$, provided $P(A \cap B) > 0$

If the events are independent, then

$P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$

Question 15: Rupesh is known to hit a target in 5 out of 9 shots whereas David is known to hit the

same target in 6 out of 11 shots. What is the probability that the target would be hit once they both try?

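Answer: Let A denote the event that Rupesh hits the target and B, the event that David hits the target. Then as given,

$P(A) = \frac{5}{9}$ and $P(B) = \frac{6}{11}$

Treating the two shots as independent,

$P(A \cap B) = P(A) \times P(B) = \frac{5}{9} \times \frac{6}{11} = \frac{30}{99}$

The target is hit when at least one of them hits it:

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{5}{9} + \frac{6}{11} - \frac{30}{99} = \frac{79}{99}$

Alternately

$P(A \cup B) = 1 - P(A \cup B)' = 1 - P(A' \cap B')$

$= 1 - [(1 - P(A)) \times (1 - P(B))]$

$= 1 - \left(1 - \frac{5}{9}\right) \times \left(1 - \frac{6}{11}\right) = 1 - \frac{20}{99} = \frac{79}{99}$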

Question 16: A pair of dice is thrown together and the sum of points of the two dice is noted to be 10.

What is the probability that one of the two dice has shown the point 4?

Answer: Let A denote the event of getting 4 points on one of the two dice and B denote the event of getting a total of 10 points on the two dice. Then we have

$P(B) = \frac{3}{36} = \frac{1}{12}$ and $P(A \cap B) = \frac{2}{36}$

[Since a total of 10 points may result in (4, 6) / (5, 5) / (6, 4) and two of these combinations contain 4]

Thus $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{2/36}{3/36} = \frac{2}{3}$

Alternately, the sample space for getting a total of 10 points when two dice are thrown simultaneously is given by S = {(4, 6), (5, 5), (6, 4)}

Out of these 3 cases, we get 4 in 2 cases. Thus by the definition of probability, we have $P(A/B) = \frac{2}{3}$

Question 17: In a group of 20 males and 15 females, 12 males and 8 females are service holders. What

is the probability that a person selected at random from the group is a service holder given that the

selected person is a male?

Answer: Let S and M stand for service holder and male respectively. We are to evaluate P (S / M).


We note that (S ∩ M) represents the event of being both a service holder and a male.

Thus $P(S/M) = \frac{P(S \cap M)}{P(M)} = \frac{12/35}{20/35} = 0.60$

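Question 18: In connection with a random experiment, it is found that

$P(A) = \frac{2}{3}$, $P(B) = \frac{3}{5}$ and $P(A \cup B) = \frac{5}{6}$

Evaluate the following probabilities:

1. P(A/B)

2. P(B/A)

3. P(A'/B)

4. P(A/B')

5. P(A'/B')

Answer:

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$

$\frac{5}{6} = \frac{2}{3} + \frac{3}{5} - P(A \cap B)$

$P(A \cap B) = \frac{13}{30}$

1. $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{13/30}{3/5} = \frac{13}{18}$

2. $P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{13/30}{2/3} = \frac{13}{20}$

3. $P(A'/B) = \frac{P(A' \cap B)}{P(B)} = \frac{P(B) - P(A \cap B)}{P(B)} = \frac{3/5 - 13/30}{3/5} = \frac{5}{18}$

4. $P(A/B') = \frac{P(A \cap B')}{P(B')} = \frac{P(A) - P(A \cap B)}{1 - P(B)} = \frac{2/3 - 13/30}{1 - 3/5} = \frac{7}{12}$

5. $P(A'/B') = \frac{P(A' \cap B')}{P(B')} = \frac{P(A \cup B)'}{P(B')}$ [by De Morgan's Law, $A' \cap B' = (A \cup B)'$]

$= \frac{1 - P(A \cup B)}{1 - P(B)} = \frac{1 - 5/6}{1 - 3/5} = \frac{5}{12}$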
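The five conditional probabilities can be worked out with exact fractions. A minimal sketch, assuming the given values P(A) = 2/3, P(B) = 3/5 and P(A ∪ B) = 5/6; the variable names are chosen only for this example.

```python
from fractions import Fraction as F

p_a, p_b, p_union = F(2, 3), F(3, 5), F(5, 6)
p_ab = p_a + p_b - p_union                  # P(A and B), from the addition theorem

p_a_given_b = p_ab / p_b                    # P(A/B)
p_b_given_a = p_ab / p_a                    # P(B/A)
p_not_a_given_b = (p_b - p_ab) / p_b        # P(A'/B)
p_a_given_not_b = (p_a - p_ab) / (1 - p_b)  # P(A/B')
p_not_a_given_not_b = (1 - p_union) / (1 - p_b)  # P(A'/B'), via De Morgan's law

print(p_ab, p_a_given_b, p_b_given_a,
      p_not_a_given_b, p_a_given_not_b, p_not_a_given_not_b)
```

As a sanity check, P(A/B) + P(A'/B) and P(A/B') + P(A'/B') each add up to 1.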

Question 19: The odds in favour of an event is 2 : 3 and the odds against another event is 3 : 7. Find

the probability that only one of the two events occurs.

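Answer: We denote the two events by A and B respectively.

$P(A) = \frac{2}{2 + 3} = \frac{2}{5}$ and $P(B) = \frac{7}{3 + 7} = \frac{7}{10}$

As A and B are independent, $P(A \cap B) = P(A) \times P(B) = \frac{2}{5} \times \frac{7}{10} = \frac{7}{25}$

Probability (either only A occurs or only B occurs) = $P(A - B) + P(B - A)$

$= [P(A) - P(A \cap B)] + [P(B) - P(A \cap B)]$

$= P(A) + P(B) - 2P(A \cap B)$

$= \frac{2}{5} + \frac{7}{10} - 2 \times \frac{7}{25} = \frac{27}{50}$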

Question.20: There are three boxes with following compositions;

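Box    Blue   Red   White   Total
I.      5      8     10      23
II.     4      9      8      21
III.    3      6      7      16

One ball is drawn from each box. What is the probability that they would be of the same colour?

Answer: Either the balls would all be Blue or Red or White. Denoting Blue, Red and White balls by B, R and W respectively and the box by the lower suffix, the required probability is

$= P(B_1 \cap B_2 \cap B_3) + P(R_1 \cap R_2 \cap R_3) + P(W_1 \cap W_2 \cap W_3)$

$= P(B_1) \times P(B_2) \times P(B_3) + P(R_1) \times P(R_2) \times P(R_3) + P(W_1) \times P(W_2) \times P(W_3)$

$= \frac{5}{23} \times \frac{4}{21} \times \frac{3}{16} + \frac{8}{23} \times \frac{9}{21} \times \frac{6}{16} + \frac{10}{23} \times \frac{8}{21} \times \frac{7}{16}$

$= \frac{60 + 432 + 560}{7728} = \frac{1052}{7728}$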

Question 21: Mr. Roy is being considered for three separate posts. For the first post, there are three candidates, for the second, there are five candidates and for the third, there are 10 candidates. What is the probability that Mr. Roy would be selected?

Answer: Denoting selection for the three posts by A, B and C respectively, we have

$P(A) = \frac{1}{3}$, $P(B) = \frac{1}{5}$ and $P(C) = \frac{1}{10}$

The probability that Mr. Roy would be selected (i.e. selected for at least one post)

$= P(A \cup B \cup C)$

$= 1 - P[(A \cup B \cup C)']$

$= 1 - P(A' \cap B' \cap C')$ (by De Morgan's Law)

$= 1 - P(A') \times P(B') \times P(C')$ (as A, B and C are independent, so are their complements)

$= 1 - \left(1 - \frac{1}{3}\right) \times \left(1 - \frac{1}{5}\right) \times \left(1 - \frac{1}{10}\right) = 1 - \frac{12}{25} = \frac{13}{25}$

Question 22: The independent probabilities that the three sections of a costing department will

encounter a computer error are 0.2, 0.3 and 0.1 per week respectively what is the probability that there

would be

1. at least one computer error per week?

2. one and only one computer error per week?

Answer: Denoting the three sections by A, B and C respectively, the probabilities of encountering a

computer error by these three sections are given by P(A) = 0.20, P(B) = 0.30 and P(C) = 0.10

1. Probability that there would be at least one computer error per week.


= 1 – Probability of having no computer error in any of the three sections.

= 1 – P(A’∩B’∩C’)

= 1 – P(A’)×P(B’) ×P(C’) [Since A, B and C are independent]

= 1 – (1 – 0.20) × (1 – 0.30) ×(1 – 0.10)

= 1 – 0.504 = 0.496 ≈ 0.50

2. Probability of having one and only one computer error per week

= P(A∩B’∩C’) + P(A’∩B∩C’) +P(A’∩B’∩C)

= P(A)×P(B’) ×P(C’) + P(A’) ×P(B) ×P(C’) + P(A’) ×P(B’) ×P(C)

= 0.20 ×0.70×0.90 + 0.80×0.30×0.90 + 0.80×0.70 ×0.10

= 0.126 + 0.216 + 0.056 = 0.398 ≈ 0.40
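Both parts of this question can be cross-checked by summing over all eight error/no-error patterns of the three independent sections. A sketch, with `prob_of` as a hypothetical helper:

```python
from itertools import product

p_error = [0.2, 0.3, 0.1]      # sections A, B, C (independent)

def prob_of(pattern):
    """Probability of one specific error/no-error pattern across the sections."""
    result = 1.0
    for p, has_error in zip(p_error, pattern):
        result *= p if has_error else 1 - p
    return result

patterns = list(product([False, True], repeat=3))
p_at_least_one = sum(prob_of(pt) for pt in patterns if any(pt))
p_exactly_one = sum(prob_of(pt) for pt in patterns if sum(pt) == 1)
print(round(p_at_least_one, 3), round(p_exactly_one, 3))
```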

Question 23: A lot of 10 electronic components is known to include 3 defective parts. If a sample of 4 components is selected at random from the lot, what is the probability that this sample does not contain more than one defective?

Answer: Denoting defective components and non-defective components by D and D' respectively, we have the following situation:

              D    D'   Total
Lot           3    7     10
Sample (1)    0    4      4
Sample (2)    1    3      4

Thus the required probability is given by

$= \frac{^3C_0 \times {}^7C_4 + {}^3C_1 \times {}^7C_3}{^{10}C_4}$

$= \frac{1 \times 35 + 3 \times 35}{210} = \frac{140}{210} = \frac{2}{3}$

Question 24: There are two urns containing 5 red and 6 white balls and 3 red and 7 white balls

respectively. If two balls are drawn from the first urn without replacement and transferred to the

second urn and then a draw of another two balls is made from it, what is the probability that both the

balls drawn are red?

Answer: Since two balls are transferred from the first urn containing 5 red and 6 white balls to the

second urn containing 3 red and 7 white balls, we are to consider the following cases :

Case A: Both the balls transferred are red. In this case, the second urn contains 5 red and 7 white balls.

Case B: The two balls transferred are of different colours. Then the second urn contains 4 red and 8 white balls.

Case C: Both the balls transferred are white. Now the second urn contains 3 red and 9 white balls.

The required probability is given by

$P(R \cap A) + P(R \cap B) + P(R \cap C)$

$= P(R/A) \times P(A) + P(R/B) \times P(B) + P(R/C) \times P(C)$

$= \frac{^5C_2}{^{12}C_2} \times \frac{^5C_2}{^{11}C_2} + \frac{^4C_2}{^{12}C_2} \times \frac{^5C_1 \times {}^6C_1}{^{11}C_2} + \frac{^3C_2}{^{12}C_2} \times \frac{^6C_2}{^{11}C_2}$

$= \frac{10 \times 10 + 6 \times 30 + 3 \times 15}{66 \times 55} = \frac{325}{66 \times 55} = \frac{65}{726}$
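The case-by-case calculation for the two urns can be sketched with `math.comb`, conditioning on how many red balls are transferred. The two helper names are assumptions introduced for this example.

```python
from math import comb
from fractions import Fraction

def transfer_prob(reds):
    """P(exactly `reds` red balls among the 2 moved from urn 1: 5 red, 6 white)."""
    return Fraction(comb(5, reds) * comb(6, 2 - reds), comb(11, 2))

def both_red_prob(reds):
    """P(both balls drawn from urn 2 are red), given `reds` red balls were added."""
    red_in_urn_2 = 3 + reds           # urn 2 started with 3 red and 7 white
    return Fraction(comb(red_in_urn_2, 2), comb(12, 2))

# Total probability over the three transfer cases (0, 1 or 2 red balls moved)
p_both_red = sum(transfer_prob(r) * both_red_prob(r) for r in range(3))
print(p_both_red, float(p_both_red))
```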

Question 25: If 8 balls are distributed at random among three boxes, what is the probability that the

first box would contain 3 balls?

Answer: The first ball can be distributed to the 1st box or 2nd box or 3rd box, i.e. it can be distributed in 3 ways. Similarly, the second ball also can be distributed in 3 ways. Thus the first two balls can be distributed in $3^2$ ways. Proceeding in this way, we find that 8 balls can be distributed to 3 boxes in $3^8$ ways, which is the total number of elementary events. Let A be the event that the first box contains 3 balls, which implies that the remaining 5 balls must go to the remaining 2 boxes which, as we have already discussed, can be done in $2^5$ ways. Since 3 balls out of 8 balls can be selected in $^8C_3$ ways, the event can occur in $^8C_3 \times 2^5$ ways. Thus we have

$P(A) = \frac{^8C_3 \times 2^5}{3^8} = \frac{1792}{6561}$

Question 26: There are 3 boxes with the following composition:

Box I : 7 Red + 5 White + 4 Blue balls

Box II : 5 Red + 6 White + 3 Blue balls

Box III : 4 Red + 3 White + 2 Blue balls

One of the boxes is selected at random and a ball is drawn from it. What is the probability that the

drawn ball is red?

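Answer: Let A denote the event that the drawn ball is red. Since any of the 3 boxes may be selected, we have $P(B_I) = P(B_{II}) = P(B_{III}) = \frac{1}{3}$

Also $P(R_1/B_I)$ = probability of drawing a red ball from the first box = $\frac{7}{16}$,

$P(R_2/B_{II}) = \frac{5}{14}$ and $P(R_3/B_{III}) = \frac{4}{9}$

Thus we have

$P(A) = P(R_1 \cap B_I) + P(R_2 \cap B_{II}) + P(R_3 \cap B_{III})$

$= P(R_1/B_I) \times P(B_I) + P(R_2/B_{II}) \times P(B_{II}) + P(R_3/B_{III}) \times P(B_{III})$

$= \frac{7}{16} \times \frac{1}{3} + \frac{5}{14} \times \frac{1}{3} + \frac{4}{9} \times \frac{1}{3} = \frac{1249}{3024}$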
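This is an application of the theorem of total probability: weight each box's chance of yielding a red ball by the chance of picking that box. A minimal sketch using the compositions given in the question:

```python
from fractions import Fraction

# (red, white, blue) counts in each box
boxes = {"I": (7, 5, 4), "II": (5, 6, 3), "III": (4, 3, 2)}

p_box = Fraction(1, len(boxes))             # each box is equally likely to be chosen

# P(red) = sum over boxes of P(red | box) * P(box)
p_red = sum(p_box * Fraction(red, red + white + blue)
            for red, white, blue in boxes.values())
print(p_red, round(float(p_red), 3))
```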

Random Variable – Probability Distribution

Random / Stochastic variable: a function defined on the sample space associated with a random experiment, assuming any value from R and assigning a real number to each and every sample point of the random experiment.

Example:

Let A be the event of getting a head on tossing a coin (S = {H, T}) and X the number of heads.

∴ X = 0, if T turns up and X = 1, if H turns up

X       0    1
P(X)    ½    ½

On tossing 2 coins, S = {HH, TT, HT, TH}

A       TT    HT, TH    HH
X       0     1         2
P(X)    ¼     2/4       ¼

On tossing n coins

A       T…T    …                              H…H
X       0      1      2      …      n
P(X)    $\frac{^nC_0}{2^n}$    $\frac{^nC_1}{2^n}$    $\frac{^nC_2}{2^n}$    …    $\frac{^nC_n}{2^n}$

Types and Examples

Discrete: the variable is defined on a discrete sample space.
Examples: number of car accidents; number of heads on tossing a coin.

Continuous: the variable is defined on a continuous sample space, assuming an uncountably infinite number of values.
Examples: height; weight.

Probability Distribution:

The statement that expresses the different values taken by a random variable and the corresponding probabilities.

Probability Distribution function: If a random variable x assumes n finite values $X_1, X_2, \dots, X_n$ with corresponding probabilities $P_1, P_2, P_3, \dots, P_n$ such that

I. $P_i \ge 0$, for every i

II. $\sum P_i = 1$ (over all i)

then the probability distribution of x is given by

X    $X_1$    $X_2$    ……    $X_n$
P    $P_1$    $P_2$    ……    $P_n$

Case, Function and Definition

Discrete: Probability Mass Function
$f(x) \ge 0$, for every x, and $\sum f(x) = 1$, where $f(x) = P(X = x)$

Continuous: Probability Density Function
x is a continuous random variable defined in an interval $(\alpha, \beta)$, $\beta > \alpha$, where x can assume an infinite number of values.
$f(x) \ge 0$, $x \in [\alpha, \beta]$
$\int_{\alpha}^{\beta} f(x)\,dx = 1$
If x lies between a and b, i.e. $\alpha \le a < b \le \beta$, then
$P(a \le x \le b) = \int_{a}^{b} f(x)\,dx$

Expected value of a Random Variable

Expected value / Mathematical expectation / Expectation, $E(x) = \mu$

The sum of the products of the different values taken by the random variable and the corresponding probabilities.

$\mu = E(x) = \sum P_i x_i$

Points to Ponder:

1. Expected value of $x^2$: $E(x^2) = \sum P_i x_i^2$

2. Expected value of a monotonic function g(x): $E[g(x)] = \sum P_i\, g(x_i)$

3. Variance of x: $\sigma^2 = V(x) = E(x - \mu)^2 = E(x^2) - \mu^2$

4. $\sigma$ is the positive square root of the variance.

5. If y = a + bx (x, y are random variables and a, b are constants), then $\mu_y = a + b\mu_x$ and $\sigma_y = |b| \times \sigma_x$

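Discrete case:

$\mu = \sum x f(x)$

$\sigma^2 = E(x^2) - \mu^2$, where $E(x^2) = \sum x^2 f(x)$

Continuous case ($x \in (-\infty, \infty)$):

$\mu = E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$

$\sigma^2 = E(x^2) - \mu^2$, where $E(x^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$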

Properties of Expected Values

1. Expectation of a constant k is k,
i.e. E(k) = k for any constant k.

2. Expectation of the sum of two random variables is the sum of their expectations,
i.e. E(x + y) = E(x) + E(y) for any two random variables x and y.

3. Expectation of the product of a constant and a random variable is the product of the constant and the expectation of the random variable,
i.e. E(kx) = kE(x) for any constant k.

4. Expectation of the product of two random variables is the product of the expectations of the two random variables, provided the two variables are independent,
i.e. E(xy) = E(x) × E(y) whenever x and y are independent.

Example

Property 1:

x       1     1     1
p(x)    ¼     2/4   ¼
E(x) = ∑px = (1 × 1/4) + (1 × 2/4) + (1 × 1/4) = 1

x       2     2     2
p(x)    ¼     2/4   ¼
E(x) = ∑px = (2 × 1/4) + (2 × 2/4) + (2 × 1/4) = 2

Hence, E(k) = k

Property 2:

x       2     –1
p(x)    ½     ½
E(x) = ∑px = (2 × 1/2) + (–1 × 1/2) = 1/2 = 0.5

y       2     –1.5
p(y)    ½     ½
E(y) = ∑py = (2 × 1/2) + (–1.5 × 1/2) = 0.25

Consider
(x + y)      4     –2.5
p(x + y)     ½     ½
E(x + y) = ∑p(x + y) = (4 × 1/2) + (–2.5 × 1/2) = 0.75

Hence, E(x + y) = E(x) + E(y) = 0.5 + 0.25 = 0.75

Property 3: Toss a coin. A – getting a head

x           0     1
P(X = x)    ½     ½
E(x) = ∑px = (0 × 1/2) + (1 × 1/2) = 1/2

Let k = 2
kx          0     2
p(kx)       ½     ½
E(kx) = ∑p(kx) = (0 × 1/2) + (2 × 1/2) = 1

Hence, E(kx) = kE(x) = 2 × 1/2 = 1

Property 4: Toss two coins independently. A – getting a head on the first coin, B – getting a tail on the second coin

x           0     1
P(X = x)    ½     ½
E(x) = ∑px = (0 × 1/2) + (1 × 1/2) = 1/2

y           0     1
P(Y = y)    ½     ½
E(y) = ∑py = (0 × 1/2) + (1 × 1/2) = 1/2

Consider
x × y       0     1
P(x × y)    ¾     ¼
E(xy) = ∑p(x × y) = (0 × 3/4) + (1 × 1/4) = ¼

E(x) × E(y) = ½ × ½ = ¼

Hence, E(xy) = E(x) × E(y)

Question 27: An unbiased coin is tossed three times. Find the expected value of the number of heads

and also its standard deviation.

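Answer: If x denotes the number of heads when an unbiased coin is tossed three times, then the probability distribution of x is given by

X:   0     1     2     3
P:   1/8   3/8   3/8   1/8

The expected value of x is given by

$\mu = E(x) = \sum P_i X_i = \frac{1}{8} \times 0 + \frac{3}{8} \times 1 + \frac{3}{8} \times 2 + \frac{1}{8} \times 3 = 1.50$

Also, $E(X^2) = \sum P_i x_i^2 = \frac{1}{8} \times 0^2 + \frac{3}{8} \times 1^2 + \frac{3}{8} \times 2^2 + \frac{1}{8} \times 3^2 = 3$

$\sigma^2 = E(X^2) - \mu^2 = 3 - (1.50)^2 = 0.75$

∴ SD, $\sigma = \sqrt{0.75} = 0.87$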
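The arithmetic of Question 27 can be checked with a short script. This is only an illustrative sketch: the helper names `heads_distribution` and `mean_and_variance` are made up for this example and it assumes a fair coin, so that P(X = r) = nCr / 2^n.

```python
from fractions import Fraction
from math import comb, sqrt

def heads_distribution(n):
    """P(X = r) = nCr / 2**n for r heads in n tosses of a fair coin."""
    return {r: Fraction(comb(n, r), 2 ** n) for r in range(n + 1)}

def mean_and_variance(dist):
    """Return (E(x), E(x^2) - E(x)^2) for a {value: probability} mapping."""
    mean = sum(p * x for x, p in dist.items())
    second_moment = sum(p * x * x for x, p in dist.items())
    return mean, second_moment - mean ** 2

dist = heads_distribution(3)
mu, var = mean_and_variance(dist)
sd = sqrt(var)
print(mu, var, round(sd, 2))
```

Using `Fraction` keeps the probabilities exact, so the mean and variance come out as 3/2 and 3/4 without rounding error.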

Question 28: A random variable has the following probability distribution:

X: 4 5 7 8 10

P: 0.15 0.20 0.40 0.15 0.10

Find $E[X - E(X)]^2$. Also obtain v(3x – 4).

Answer: The expected value of x is given by

E(x) = $\sum P_i X_i$ = 0.15 × 4 + 0.20 × 5 + 0.40 × 7 + 0.15 × 8 + 0.10 × 10 = 6.60

Also, $E[X - E(X)]^2 = \sum \mu_i^2 P_i$, where $\mu_i = X_i - E(X)$

Let y = 3x – 4 = (–4) + (3)x. Then variance of y = var(y) = $b^2 \times \sigma_x^2 = 9 \times \sigma_x^2$

Table 13.1

Computation of $E[X - E(X)]^2$

$X_i$    $P_i$    $\mu_i = X_i - E(X)$    $\mu_i^2$    $\mu_i^2 P_i$
4        0.15     –2.60                   6.76         1.014
5        0.20     –1.60                   2.56         0.512
7        0.40     0.40                    0.16         0.064
8        0.15     1.40                    1.96         0.294
10       0.10     3.40                    11.56        1.156
Total    1.00     –                       –            3.040

Thus $E[X - E(X)]^2 = 3.04$

As $\sigma_x^2 = 3.04$, v(y) = 9 × 3.04 = 27.36

Question 29: In a business venture, a man can make a profit of ₹50,000 or incur a loss of ₹20,000.

The probabilities of making profit or incurring loss, from the past experience, are known to be 0.75

and 0.25 respectively. What is his expected profit?

Answer: If the profit is denoted by x, then we have the following probability distribution of x:

X: ₹50,000 ₹-20,000

P: 0.75 0.25

Thus, his expected profit

E(X) = 𝑃1𝑋1 + 𝑃2𝑋2 = 0.75 ×₹50,000 + 0.25×₹-20,000 =₹32,500

Question 30 A box contains 12 electric lamps of which 5 are defectives. A man selects three lamps at

random. What is the expected number of defective lamps in his selection?

Answer: Let x denote the number of defective lamps. Then x can assume the values 0, 1, 2 and 3.

P(x = 0) = probability of having 0 defectives out of 5 defectives and 3 non-defectives out of 7 non-defectives

$= \frac{^5C_0 \times {}^7C_3}{^{12}C_3} = \frac{35}{220}$

Similarly, $P(x = 1) = \frac{^5C_1 \times {}^7C_2}{^{12}C_3} = \frac{105}{220}$

$P(x = 2) = \frac{^5C_2 \times {}^7C_1}{^{12}C_3} = \frac{70}{220}$

and $P(x = 3) = \frac{^5C_3 \times {}^7C_0}{^{12}C_3} = \frac{10}{220}$

Probability Distribution of No. of Defective Lamps

X:   0        1         2        3
P:   35/220   105/220   70/220   10/220

Thus the expected number of defectives is given by

$\frac{35}{220} \times 0 + \frac{105}{220} \times 1 + \frac{70}{220} \times 2 + \frac{10}{220} \times 3 = 1.25$
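The same distribution can be generated mechanically from the counting argument above. The sketch below is illustrative only; the function name `defective_distribution` and its parameters are assumptions introduced for this example.

```python
from fractions import Fraction
from math import comb

def defective_distribution(total, defective, drawn):
    """P(x = k) = C(defective, k) * C(good, drawn - k) / C(total, drawn)."""
    good = total - defective
    ways = comb(total, drawn)
    return {
        k: Fraction(comb(defective, k) * comb(good, drawn - k), ways)
        for k in range(min(defective, drawn) + 1)
    }

dist = defective_distribution(total=12, defective=5, drawn=3)
expected = sum(k * p for k, p in dist.items())
print(dist, expected)
```

The probabilities are reduced automatically (for example 35/220 is stored as 7/44), but they sum to 1 and give the same expectation of 1.25.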

Question 31: Moidul draws 2 balls from a bag containing 3 white and 5 Red balls. He gets ₹500 if he

draws a white ball and ₹200 if he draws a red ball. What is his expectation? If he is asked to pay ₹400

for participating in the game, would he consider it a fair game and participate?

Answer: We denote the amount by x. Then x assumes the value 2 × ₹500, i.e. ₹1000, if 2 white balls are drawn; the value ₹500 + ₹200, i.e. ₹700, if 1 white and 1 red ball are drawn; and the value 2 × ₹200, i.e. ₹400, if 2 red balls are drawn. The respective probabilities are given by

$P(WW) = \frac{^3C_2}{^8C_2} = \frac{3}{28}$

$P(WR) = \frac{^3C_1 \times {}^5C_1}{^8C_2} = \frac{15}{28}$

and $P(RR) = \frac{^5C_2}{^8C_2} = \frac{10}{28}$

Probability Distribution of x

X:   ₹1000   ₹700    ₹400
P:   3/28    15/28   10/28

Hence $E(X) = \frac{3}{28} \times 1000 + \frac{15}{28} \times 700 + \frac{10}{28} \times 400$ = ₹625 > ₹400.

Since his expectation exceeds the entry fee, the game is favourable to him and he would participate.

Question 32: A number is selected at random from a set containing the first 100 natural numbers and

another number is selected at random from another set containing the first 200 natural numbers. What

is the expected value of the product?

Answer: We denote the number selected from the first set by x and the number selected from the second set by y. Since the selections are independent of each other, the expected value of the product is given by E(xy) = E(x) × E(y).

Now x can assume any value from 1 to 100 with the same probability 1/100, and y can assume any value from 1 to 200 with the same probability 1/200. The probability distribution of x is given by

X:   1       2       3       …    100
P:   1/100   1/100   1/100   …    1/100

$E(x) = \frac{1}{100} \times 1 + \frac{1}{100} \times 2 + \frac{1}{100} \times 3 + \cdots + \frac{1}{100} \times 100$

$= \frac{1 + 2 + 3 + \cdots + 100}{100}$

$= \frac{100 \times 101}{2 \times 100} = \frac{101}{2}$   [since $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$]

Similarly, the probability distribution of y is given by

Y:   1       2       …    200
P:   1/200   1/200   …    1/200

$E(y) = \frac{201}{2}$

∴ $E(xy) = \frac{101}{2} \times \frac{201}{2} = 5075.25$

Question 33: A dice is thrown repeatedly till a 'six' appears. Write down the sample space. Also find

the expected number of throws.

Answer: Let p denote the probability of getting a six and q = 1 – p the probability of not getting a six. If the dice is unbiased, then p = 1/6 and q = 5/6.

If a six is obtained with the very first throw, the experiment ends, and the probability of getting a six, as we have already seen, is p. However, if the first throw does not produce a six, the dice is thrown again, and if a six appears with the second throw, the experiment ends. The probability of getting a six preceded by a non-six is qp. If the second throw does not yield a six, we go for a third throw, and if the third throw produces a six, the experiment ends; the probability of getting a six in the third attempt is $q^2p$. The experiment is carried on and we get the following countably infinite sample space.

$S = \{p, qp, q^2p, q^3p, \ldots\}$

If x denotes the number of throws necessary to produce a six, then x is a random variable with the following probability distribution:

X:   1    2     3        4        …
P:   p    qp    $q^2p$   $q^3p$   …

$E(x) = p \times 1 + qp \times 2 + q^2p \times 3 + q^3p \times 4 + \cdots$

$= p(1 + 2q + 3q^2 + 4q^3 + \cdots)$

$= p(1 - q)^{-2} = \frac{p}{p^2} = \frac{1}{p}$

In case of an unbiased dice, p = 1/6 and E(x) = 6.
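The closed form E(x) = 1/p can be sanity-checked numerically by adding up the first few hundred terms of the series p(1 + 2q + 3q² + …). This is only a sketch; the cut-off of 500 terms is an arbitrary choice made for this example, large enough that the neglected tail is negligible.

```python
p = 1 / 6          # probability of a six with an unbiased dice
q = 1 - p          # probability of a non-six

# Partial sum of E(x) = sum over k of k * q**(k - 1) * p
terms = 500        # assumed cut-off; the remaining tail is vanishingly small
partial_sum = sum(k * q ** (k - 1) * p for k in range(1, terms + 1))

print(partial_sum, 1 / p)
```

The partial sum agrees with 1/p = 6 to many decimal places, which is consistent with the expected number of throws derived above.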

Question 34: A random variable x has the following probability distribution:

X:      0    1     2     3    4     5       6        7
P(X):   0    2k    3k    k    2k    $k^2$   $7k^2$   $2k^2 + k$

Find:

I. The value of k

II. P(x < 3)

III. P(x ≥ 4)

IV. P(2 < x ≤ 5)

Answer:

$\sum P(x) = 1$

⟹ $0 + 2k + 3k + k + 2k + k^2 + 7k^2 + 2k^2 + k = 1$

⟹ $10k^2 + 9k - 1 = 0$

⟹ (k + 1)(10k – 1) = 0

⟹ k = 1/10 (k = –1 is rejected, as a probability cannot be negative)

I. Thus the value of k is 0.10

II. P(x < 3) = P(x = 0) + P(x = 1) + P(x = 2) = 0 + 2k + 3k = 5k = 0.50

III. P(x ≥ 4) = P(x = 4) + P(x = 5) + P(x = 6) + P(x = 7) = $2k + k^2 + 7k^2 + (2k^2 + k)$

$= 10k^2 + 3k$

$= 10 \times (0.10)^2 + 3 \times 0.10 = 0.40$

IV. P(2 < x ≤ 5) = P(x = 3) + P(x = 4) + P(x = 5) = $k + 2k + k^2 = k^2 + 3k = (0.10)^2 + 3 \times 0.10 = 0.31$

Extra problems on Multiplication Theorem

Question 1. A man wants to marry a girl having qualities: White complexion the probability of getting

such girl is 1 in 20. Handsome dowry - the probability of getting is 1 in 50. Westernised style - the

probability is 1 in 100.Find out the probability of his getting married to such a girl, who has all the

three qualities.

Answer:

The probability of a girl with white complexion = 1/20 = 0.05

The probability of a girl with handsome dowry = 1/50 = 0.02

The probability of a girl with westernised style = 1/100 = 0.01

Since the events are independent, the probability of simultaneous occurrence of all three qualities

$= \frac{1}{20} \times \frac{1}{50} \times \frac{1}{100} = 0.00001$

Question 2: Suppose it is 11 to 5 against a person A, who is now 38 years of age, living till he is 73, and 5 to 3 against B, who is now 43, living till he is 78. Find the chance that at least one of these persons will be alive 35 years hence.

Answer:

The probability that A will die within 35 years = 11/16

The probability that B will die within 35 years = 5/8

The probability that both of them will die within 35 years = $\frac{11}{16} \times \frac{5}{8} = \frac{55}{128}$

The probability that not both of them will die, i.e. at least one of them will be alive = $1 - \frac{55}{128} = \frac{73}{128}$


4. Correlation and Regression

Introduction

Necessity – to study / analyse more than one variable

Nature of Variables: uni-variate, bi-variate, tri-variate or more

Example:

1. Univariate – Distribution of height, weight, mark, profit, wage

2. Bivariate – to know what amount of investment (x) would yield a desired level of profit (y)

Bivariate Data – Data collected on two variables simultaneously.

Bivariate Frequency Distribution – The distribution constructed for the bivariate data.

Points to Ponder:

1. Also known as joint frequency distribution / two way classification table.

2. Horizontal classification – for ′𝑥′ and Vertical classification – for ′𝑦′

Bivariate frequency distribution of marks in Statistics (x) and marks in Mathematics (y)

Marks in            Marks in Mathematics (y)
Statistics (x)      0 – 4    4 – 8    8 – 12    12 – 16    16 – 20    Total
0 – 4               1        1        2         0          0          4
4 – 8               2        4        5         1          1          13
8 – 12              0        2        4         6          1          13
12 – 16             0        1        3         2          5          11
16 – 20             0        0        1         5          3          9
Total               3        8        15        14         10         50

Here, $f_{ij}$ is the cell frequency for the $i^{th}$ row and $j^{th}$ column. ($f_{12} = 1$ is the number of students who have secured marks between 0 – 4 in Statistics and marks between 4 – 8 in Maths.)

Marginal Distribution

The distribution of any one of the variables.

It is a univariate distribution.

The mean and SD are called the marginal mean and marginal SD respectively.

Conditional Distribution

The distribution of one variable w.r.t. a condition on the other variable.

It is a univariate distribution.

In general, (m + n) conditional distributions exist for a bivariate table with m rows and n columns.

Marginal Distributions

Marks in Statistics (x)    No. of students
0 – 4                      4
4 – 8                      13
8 – 12                     13
12 – 16                    11
16 – 20                    9
Total                      50

Marks in Mathematics (y)   No. of students
0 – 4                      3
4 – 8                      8
8 – 12                     15
12 – 16                    14
16 – 20                    10
Total                      50

Conditional Distributions

Marks in Statistics (x),
given y in 8 – 12          No. of students
0 – 4                      2
4 – 8                      5
8 – 12                     4
12 – 16                    3
16 – 20                    1
Total                      15

Marks in Mathematics (y),
given x in 12 – 16         No. of students
0 – 4                      0
4 – 8                      1
8 – 12                     3
12 – 16                    2
16 – 20                    5
Total                      11

Correlation analysis: To find an association, or the lack of it, between two variables x and y using different measures. It helps in planning and controlling.

Example: A car owner knows that there is a definite relationship between petrol consumed and distance travelled.

Points to Ponder:

Cause and Effect – A third variable (z) may influence the association (correlation) found between the other two variables (x and y), although no causal relationship exists between those two variables.

Correlation: Definition

A change in one variable is reciprocated by a corresponding change in the other variable, either directly or inversely; otherwise the variables are dissociated / uncorrelated / independent.

If two variables vary in such a way that movements in one are accompanied by movements in the

other, then these quantities are said to be correlated.

Types of Correlation: Definition and Examples

Positive correlation: Directly related; the variables move in the same direction (both increase or both decrease). Example: profit and investment.

Negative correlation: Inversely related; the variables move in opposite directions (one increases and the other decreases). Examples: price and demand; profits of an insurance company and the number of claims.

Simple correlation: Only two variables are under study. Example: height and weight.

Multiple correlation: Three or more variables are under study.

Partial correlation: A multiple correlation where only two variables influence each other and the others are kept constant.

Linear correlation: A constant ratio between the changes in the two variables is maintained.

Non-linear (curvilinear) correlation: No constant ratio is maintained.

Uncorrelated: The movement of one variable does not make any change in the movement of the other.

Measures of Correlation

(1) Scatter Diagram

A simple diagrammatic method.

The totality of all the plotted points forms a scatter diagram.

The pattern reveals the shape / nature of the correlation.

[Figure: Scatter Diagram]

Advantages: It can be applied to any type of correlation, both linear and curvilinear.

Disadvantages: It can distinguish between different types of correlation, but fails to measure the degree of correlation.

(2) Karl Pearson's Product Moment Correlation Coefficient

Involves the method of least squares.

The relationship should be linear only.

Definition: The ratio of the covariance between the two variables to the product of the SDs of the two variables.

$r = r_{xy} = \frac{cov(x, y)}{s_x s_y}$

Note: $cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$ or $\frac{\sum xy}{n} - \bar{x}\,\bar{y}$

$s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ or $\sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$, and similarly for $s_y$

Combined formula: $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \times \sqrt{n \sum y^2 - (\sum y)^2}}$

In case of a bivariate frequency distribution

$cov(x, y) = \frac{\sum_{i,j} x_i y_j f_{ij}}{N} - \bar{x}\,\bar{y}$, $s_x = \sqrt{\frac{\sum f_{io} x_i^2}{N} - \bar{x}^2}$ and $s_y = \sqrt{\frac{\sum f_{oj} y_j^2}{N} - \bar{y}^2}$

$x_i$: mid-value of the $i^{th}$ class interval of x
$y_j$: mid-value of the $j^{th}$ class interval of y
$f_{io}$: marginal frequency of x
$f_{oj}$: marginal frequency of y
$f_{ij}$: frequency of the $(i, j)^{th}$ cell
$N = \sum_{i,j} f_{ij} = \sum f_{io} = \sum f_{oj}$

Properties

a) A unit-free measure – with height in inches and weight in kg, the correlation is a pure number, not in inches or kg.

b) Unaffected by a change of origin and/or scale, except possibly in sign, i.e. if $u = \frac{x - a}{b}$ and $v = \frac{y - c}{d}$, then $r_{xy} = \frac{bd}{|b||d|}\, r_{uv}$

c) $-1 \le r \le 1$

(3) Spearman's Rank Correlation

Used to measure the association between qualitative characteristics (attributes) that can be ranked.

Also used to find the level of agreement / disagreement between the assessments of two judges.

$r_R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)}$, where d = difference between the ranks given to x and y

In case of Tied Ranks

$r_R = 1 - \frac{6 \left[ \sum_i d_i^2 + \sum_j \frac{t_j^3 - t_j}{12} \right]}{n(n^2 - 1)}$

$t_j$ is the number of times a rank is repeated (the length of the $j^{th}$ tie).

(4) Coefficient of Concurrent Deviations

A simple and casual method to find correlation.

Each value is compared with the previous value: the sign of deviation is '+' if the value is more than the previous value and '–' if the value is less than the previous value.

A deviation is concurrent if both variables have the same sign of deviation.

$r_C = \pm\sqrt{\pm\frac{(2c - m)}{m}}$

Here c is the number of concurrent deviations and m is the total number of deviations (m = n – 1).

Note: If 2c – m > 0, take '+' both inside and outside the radical sign. If 2c – m < 0, take '–' both inside and outside the radical sign.

Practical Problems

Question 1: Compute the correlation coefficient between x and y from the following data

n = 10, ∑ 𝑥𝑦 = 220, ∑ 𝑥2= 200, ∑ 𝑦2 = 262, ∑ 𝑥 = 40 and ∑ 𝑦= 50

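Answer:

$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \times \sqrt{n \sum y^2 - (\sum y)^2}}$

$= \frac{10 \times 220 - 40 \times 50}{\sqrt{10 \times 200 - (40)^2} \times \sqrt{10 \times 262 - (50)^2}} = 0.91$

Thus, there is a good amount of positive correlation between the two variables x and y.

Alternately

$\bar{x} = \frac{\sum x}{n} = \frac{40}{10} = 4$ and $\bar{y} = \frac{\sum y}{n} = \frac{50}{10} = 5$

$Cov(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{220}{10} - 4 \times 5 = 2$

$S_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{200}{10} - 4^2} = 2$ and $S_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} = \sqrt{\frac{262}{10} - 5^2} = 1.0954$

$r = \frac{cov(x, y)}{S_x S_y} = \frac{2}{2 \times 1.0954} = 0.91$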
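The combined formula works directly from the summary totals, which makes it easy to verify by machine. The helper below is a minimal sketch; the name `pearson_from_sums` and its argument names are introduced for this illustration only.

```python
from math import sqrt

def pearson_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Product moment correlation from n and the five summary totals."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

# Totals given in Question 1
r = pearson_from_sums(n=10, sum_x=40, sum_y=50, sum_xy=220, sum_x2=200, sum_y2=262)
print(round(r, 2))
```

With the totals of Question 1 the numerator is 200 and the denominator is 20 × √120, giving r ≈ 0.91 as in both worked methods.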

Question 2: Find product moment correlation coefficient from the following information:

x 2 3 5 5 6 8

y 9 8 8 6 5 3

Answer: In order to find the covariance and the two standard deviations, we prepare the following table:

                $x_i$    $y_i$    $x_i y_i$          $x_i^2$         $y_i^2$
Column No       (1)      (2)      (3)                (4)             (5)
Calculations                      (3) = (1) × (2)    (4) = (1)²      (5) = (2)²
                2        9        18                 4               81
                3        8        24                 9               64
                5        8        40                 25              64
                5        6        30                 25              36
                6        5        30                 36              25
                8        3        24                 64              9
∑               29       39       166                163             279

We have

$\bar{x} = \frac{\sum x}{n} = \frac{29}{6} = 4.8333$ and $\bar{y} = \frac{\sum y}{n} = \frac{39}{6} = 6.5$

$Cov(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{166}{6} - 4.8333 \times 6.5 = -3.7498$

$S_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{163}{6} - 4.8333^2} = 1.9509$ and $S_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} = \sqrt{\frac{279}{6} - 6.5^2} = 2.0616$

Thus the correlation coefficient between x and y is given by

$r = \frac{cov(x, y)}{S_x S_y} = \frac{-3.7498}{1.9509 \times 2.0616} = -0.93$

We find a high degree of negative correlation between x and y.

Question 3: The following data relate to the test scores obtained by eight salesmen in an aptitude test

and their daily sales in thousands of rupees:

Salesman   1    2    3    4    5    6    7    8
Scores     60   55   62   56   62   64   70   54
Sales      31   28   26   24   30   35   28   24

Answer: Let the scores and sales be denoted by x and y respectively. We take a, the origin of x, as the average of the two extreme values, i.e. 54 and 70. Hence a = 62. Similarly, the origin of y is taken as $b = \frac{24 + 35}{2} \approx 30$.

Computation of Correlation Coefficient Between Test Scores and Sales

Scores     Sales in ₹000    $u_i = x_i - 62$    $v_i = y_i - 30$    $u_i v_i$ = (3) × (4)    $u_i^2$ = (3)²    $v_i^2$ = (4)²
($x_i$)    ($y_i$)
(1)        (2)              (3)                 (4)                 (5)                      (6)               (7)
60         31               –2                  1                   –2                       4                 1
55         28               –7                  –2                  14                       49                4
62         26               0                   –4                  0                        0                 16
56         24               –6                  –6                  36                       36                36
62         30               0                   0                   0                        0                 0
64         35               2                   5                   10                       4                 25
70         28               8                   –2                  –16                      64                4
54         24               –8                  –6                  48                       64                36
Total      –                –13                 –14                 90                       221               122

Since the correlation coefficient remains unchanged due to a change of origin, we have

$r = r_{xy} = r_{uv} = \frac{n \sum u_i v_i - \sum u_i \times \sum v_i}{\sqrt{n \sum u_i^2 - (\sum u_i)^2} \times \sqrt{n \sum v_i^2 - (\sum v_i)^2}}$

$r = \frac{8 \times 90 - (-13) \times (-14)}{\sqrt{8 \times 221 - (-13)^2} \times \sqrt{8 \times 122 - (-14)^2}} = \frac{538}{\sqrt{1768 - 169} \times \sqrt{976 - 196}} = 0.48$

Note: A change of origin reduces the computational labour to a great extent.
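That a change of origin leaves r unchanged can be confirmed by computing r twice, once on the raw scores and sales and once on the shifted values u = x – 62 and v = y – 30. The function `pearson` below is a hand-written sketch of the combined formula, not a library routine.

```python
from math import sqrt

def pearson(xs, ys):
    """Product moment correlation coefficient via the combined formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

scores = [60, 55, 62, 56, 62, 64, 70, 54]
sales = [31, 28, 26, 24, 30, 35, 28, 24]

r_raw = pearson(scores, sales)
r_shifted = pearson([x - 62 for x in scores], [y - 30 for y in sales])
print(round(r_raw, 2), round(r_shifted, 2))
```

Both calls give the same value, about 0.48, so the shift only simplifies the hand calculation.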

Question 4: Examine whether there is any correlation between age and blindness on the basis of the

following data:

Age in years                     0 – 10   10 – 20   20 – 30   30 – 40   40 – 50   50 – 60   60 – 70   70 – 80
No. of persons (in thousands)    90       120       140       100       80        60        40        20
No. of blind persons             10       15        18        20        15        12        10        6

Answer: Let us denote the mid-value of age in years as x and the number of blind persons per lakh as y. Then, as before, we compute the correlation coefficient between x and y.

Computation of correlation between age and blindness

Age in     Mid-value   No. of persons   No. of   No. of blind per lakh       xy =          x² =     y² =
years      x           ('000)           blind    y = (4)/(3) × 1 lakh        (2) × (5)     (2)²     (5)²
(1)        (2)         (3)              (4)      (5)                         (6)           (7)      (8)
0 – 10     5           90               10       11                          55            25       121
10 – 20    15          120              15       12                          180           225      144
20 – 30    25          140              18       13                          325           625      169
30 – 40    35          100              20       20                          700           1225     400
40 – 50    45          80               15       19                          855           2025     361
50 – 60    55          60               12       20                          1100          3025     400
60 – 70    65          40               10       25                          1625          4225     625
70 – 80    75          20               6        30                          2250          5625     900
Total      320         –                –        150                         7090          17000    3120

The correlation coefficient between age and blindness is given by

$r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \times \sqrt{n \sum y^2 - (\sum y)^2}} = \frac{8 \times 7{,}090 - 320 \times 150}{\sqrt{8 \times 17{,}000 - (320)^2} \times \sqrt{8 \times 3120 - (150)^2}} = 0.96$

Which exhibits a very high degree of positive correlation between age and blindness.

Note: There may be some confusion about selecting the pair of variables for which correlation is wanted.

Question 5: Coefficient of correlation between x and y for 20 items is 0.4. The AM’s and SD’s of x and y

are known to be 12 and 15 and 3 and 4 respectively. Later on, it was found that the pair (20, 15) was

wrongly taken as (15, 20). Find the correct value of the correlation coefficient.

Answer: We are given that n = 20 and the original r = 0.4, $\bar{x} = 12$, $\bar{y} = 15$, $S_x = 3$ and $S_y = 4$.

$r = \frac{cov(x, y)}{S_x \times S_y}$

$0.4 = \frac{cov(x, y)}{3 \times 4}$, so cov(x, y) = 4.8

$\frac{\sum xy}{n} - \bar{x}\,\bar{y} = 4.8$, i.e. $\frac{\sum xy}{20} - 12 \times 15 = 4.8$ and $\sum xy = 3696$

Hence, corrected $\sum xy = 3696 - 15 \times 20 + 20 \times 15 = 3696$

Also, $S_x^2 = \frac{\sum x^2}{20} - 12^2 = 9$ and $\sum x^2 = 3060$

Similarly, $S_y^2 = \frac{\sum y^2}{20} - 15^2 = 16$ and $\sum y^2 = 4820$

Thus corrected $\sum x = n\bar{x}$ – wrong value + correct value

Corrected $\sum x = 20 \times 12 - 15 + 20 = 245$

Corrected $\sum y = 20 \times 15 - 20 + 15 = 295$

Corrected $\sum x^2 = 3060 - 15^2 + 20^2 = 3235$

Corrected $\sum y^2 = 4820 - 20^2 + 15^2 = 4645$

Thus the corrected value of the correlation coefficient, by applying the combined formula, is

$r = \frac{20 \times 3696 - 245 \times 295}{\sqrt{20 \times 3235 - (245)^2} \times \sqrt{20 \times 4645 - (295)^2}}$

$= \frac{73920 - 72275}{68.3740 \times 76.6480} = 0.31$
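The correction procedure (recover the totals from the summary measures, swap the wrong pair for the right one, then recompute r) can be sketched as follows. All variable names are illustrative choices for this example.

```python
from math import sqrt

n, r_old = 20, 0.4
mean_x, mean_y, sd_x, sd_y = 12, 15, 3, 4
wrong, right = (15, 20), (20, 15)   # pair as recorded, pair as it should be

# Step 1: recover the original totals from the means, SDs and r
sum_x = n * mean_x
sum_y = n * mean_y
sum_x2 = n * (sd_x ** 2 + mean_x ** 2)
sum_y2 = n * (sd_y ** 2 + mean_y ** 2)
sum_xy = n * (r_old * sd_x * sd_y + mean_x * mean_y)

# Step 2: remove the wrong pair and add the correct one
sum_x += right[0] - wrong[0]
sum_y += right[1] - wrong[1]
sum_x2 += right[0] ** 2 - wrong[0] ** 2
sum_y2 += right[1] ** 2 - wrong[1] ** 2
sum_xy += right[0] * right[1] - wrong[0] * wrong[1]

# Step 3: recompute r with the combined formula
r_new = (n * sum_xy - sum_x * sum_y) / (
    sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
)
print(sum_x, sum_y, sum_x2, sum_y2, round(r_new, 2))
```

Note that ∑xy does not change here because 15 × 20 = 20 × 15; only the other four totals move.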

Question 6: Compute the coefficient of correlation between marks in Statistics and Mathematics for the

bivariate frequency distribution shown in Table given below

Answer: For the sake of computational advantage, we effect a change of origin and scale for both the variables x and y.

Define $u_i = \frac{x_i - a}{b} = \frac{x_i - 10}{4}$ and $v_j = \frac{y_j - c}{d} = \frac{y_j - 10}{4}$

Computation of Correlation Coefficient Between Marks of Mathematics and Statistics

(Rows: marks in Statistics, x. Columns: marks in Mathematics, y. Each cell shows $f_{ij}$ followed by $f_{ij} u_i v_j$ in brackets.)

CI (y)                           0 – 4    4 – 8    8 – 12   12 – 16   16 – 20
m (y)                            2        6        10       14        18
CI (x)     m (x)   $u_i$ \ $v_j$   –2       –1       0        1         2         $f_{io}$   $f_{io} u_i$   $f_{io} u_i^2$   $\sum_j f_{ij} u_i v_j$
0 – 4      2       –2            1 [4]    1 [2]    2 [0]    0         0         4          –8             16               6
4 – 8      6       –1            2 [4]    4 [4]    5 [0]    1 [–1]    1 [–2]    13         –13            13               5
8 – 12     10      0             0        2 [0]    4 [0]    6 [0]     1 [0]     13         0              0                0
12 – 16    14      1             0        1 [–1]   3 [0]    2 [2]     5 [10]    11         11             11               11
16 – 20    18      2             0        0        1 [0]    5 [10]    3 [12]    9          18             36               22
$f_{oj}$                         3        8        15       14        10        50         8              76               44
$f_{oj} v_j$                     –6       –8       0        14        20        20
$f_{oj} v_j^2$                   12       8        0        14        40        74
$\sum_i f_{ij} u_i v_j$          8        5        0        11        20        44 (check)

A single formula for computing the correlation coefficient from a bivariate frequency distribution is given by

$r = \frac{N \sum_{i,j} f_{ij} u_i v_j - \sum f_{io} u_i \times \sum f_{oj} v_j}{\sqrt{N \sum f_{io} u_i^2 - (\sum f_{io} u_i)^2} \times \sqrt{N \sum f_{oj} v_j^2 - (\sum f_{oj} v_j)^2}}$

$= \frac{50 \times 44 - 8 \times 20}{\sqrt{50 \times 76 - 8^2} \times \sqrt{50 \times 74 - 20^2}} = \frac{2040}{61.12 \times 57.45}$

= 0.58

The value of r shows a good amount of positive correlation between the marks in Statistics and Mathematics on the basis of the given data.
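The totals used in this answer can be reproduced from the cell frequencies alone. The sketch below codes both sets of mid-values as u = (x – 10)/4 and v = (y – 10)/4, as in the answer; the variable names are illustrative.

```python
from math import sqrt

# Cell frequencies f_ij: rows = Statistics classes (x), columns = Mathematics classes (y)
freq = [
    [1, 1, 2, 0, 0],
    [2, 4, 5, 1, 1],
    [0, 2, 4, 6, 1],
    [0, 1, 3, 2, 5],
    [0, 0, 1, 5, 3],
]
u = [-2, -1, 0, 1, 2]   # coded mid-values (x - 10) / 4
v = [-2, -1, 0, 1, 2]   # coded mid-values (y - 10) / 4

N = sum(sum(row) for row in freq)
row_totals = [sum(row) for row in freq]          # f_io
col_totals = [sum(col) for col in zip(*freq)]    # f_oj

sum_fu = sum(f * ui for f, ui in zip(row_totals, u))
sum_fu2 = sum(f * ui ** 2 for f, ui in zip(row_totals, u))
sum_fv = sum(f * vj for f, vj in zip(col_totals, v))
sum_fv2 = sum(f * vj ** 2 for f, vj in zip(col_totals, v))
sum_fuv = sum(freq[i][j] * u[i] * v[j] for i in range(5) for j in range(5))

r = (N * sum_fuv - sum_fu * sum_fv) / (
    sqrt(N * sum_fu2 - sum_fu ** 2) * sqrt(N * sum_fv2 - sum_fv ** 2)
)
print(N, sum_fu, sum_fv, sum_fu2, sum_fv2, sum_fuv, round(r, 2))
```

Because r is unaffected by a change of origin and scale (both scale factors are positive here), the value computed on u and v is also the correlation between the original marks.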

Question 7: Given that the correlation coefficient between x and y is 0.8, write down the correlation

coefficient between u and v where

1. 2u + 3x + 4 = 0 and 4v + 16y + 11 = 0

2. 2u – 3x + 4 = 0 and 4v + 16y + 11 = 0

3. 2u – 3x + 4 = 0 and 4v – 16y + 11 = 0

4. 2u + 3x + 4 = 0 and 4v – 16y + 11 = 0

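Answer: A change of origin and scale has no impact on the magnitude of r but can affect its sign.

$r_{xy} = \frac{bd}{|b||d|}\, r_{uv}$

$r_{xy} = r_{uv}$ if b and d are of the same sign

$r_{xy} = -r_{uv}$ if b and d are of opposite signs

In (1), $u = -2 - \frac{3}{2}x$ and $v = -\frac{11}{4} - 4y$, hence $r_{uv} = 0.8$

In (2), $u = -2 + \frac{3}{2}x$ and $v = -\frac{11}{4} - 4y$, hence $r_{uv} = -0.8$

In (3), $u = -2 + \frac{3}{2}x$ and $v = -\frac{11}{4} + 4y$, hence $r_{uv} = 0.8$

In (4), $u = -2 - \frac{3}{2}x$ and $v = -\frac{11}{4} + 4y$, hence $r_{uv} = -0.8$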

Question 8: compute the coefficient of rank correlation between sales and advertisement expressed in

thousands of rupees from the following data:

Sales (𝑥𝑖) 90 85 68 75 82 80 95 70

Advertisement (𝒚𝒊) 7 6 2 3 4 5 8 1

Answer:

Computation of Rank Correlation between Sales and Advertisement

($x_i$)   ($y_i$)   Rank for ($x_i$)   Rank for ($y_i$)   $d_i$ = rank of $x_i$ – rank of $y_i$   $d_i^2$
90        7         2                  2                  0                                       0
85        6         3                  3                  0                                       0
68        2         8                  7                  1                                       1
75        3         6                  6                  0                                       0
82        4         4                  5                  –1                                      1
80        5         5                  4                  1                                       1
95        8         1                  1                  0                                       0
70        1         7                  8                  –1                                      1
Total     –         –                  –                  0                                       4

$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{8(8^2 - 1)} = 0.95$

The high positive value of the rank correlation coefficient indicates that there is a very good amount of agreement between sales and advertisement.

Tied Rank

Question 9: Compute the coefficient of rank correlation between Eco. Marks and stats. Marks as given

below:

Economics Marks (𝑥𝑖) 80 56 50 48 50 62 60

Stats Marks (𝒚𝒊) 90 75 75 65 65 50 65

Answer:

Computation of Rank Correlation Between Eco Marks and Stats Marks with Tied Marks

Eco Mark ($x_i$)   Stats Mark ($y_i$)   Rank for Eco          Rank for Stats           $d_i$ (difference in ranks)   $d_i^2$
80                 90                   1                     1                        0                             0
56                 75                   4                     2.50 = (2 + 3)/2         1.50                          2.25
50                 75                   5.50 = (5 + 6)/2      2.50 = (2 + 3)/2         3                             9
48                 65                   7                     5 = (4 + 5 + 6)/3        2                             4
50                 65                   5.50 = (5 + 6)/2      5 = (4 + 5 + 6)/3        0.50                          0.25
62                 50                   2                     7                        –5                            25
60                 65                   3                     5 = (4 + 5 + 6)/3        –2                            4
Total              –                    –                     –                        0                             44.50

For Economics marks there is one tie of length 2, and for Stats marks there are two ties of lengths 2 and 3 respectively.

$r_R = 1 - \frac{6 \left[ \sum_i d_i^2 + \sum_j \frac{t_j^3 - t_j}{12} \right]}{n(n^2 - 1)} = 1 - \frac{6 \times \left( 44.50 + \frac{2^3 - 2}{12} + \frac{2^3 - 2}{12} + \frac{3^3 - 3}{12} \right)}{7(7^2 - 1)} = 0.15$
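The tied-rank procedure (average the positions of tied values, then add (t³ – t)/12 for every tie before applying the formula) can be sketched as below. The helper names `average_ranks`, `tie_correction` and `spearman_with_ties` are assumptions made for this example, with rank 1 given to the highest mark as in the table.

```python
from collections import Counter

def average_ranks(values):
    """Rank 1 = largest value; tied values share the mean of their positions."""
    positions = {}
    for pos, value in enumerate(sorted(values, reverse=True), start=1):
        positions.setdefault(value, []).append(pos)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def tie_correction(values):
    """Sum of (t^3 - t) / 12 over groups of equal values (zero when t = 1)."""
    return sum((t ** 3 - t) / 12 for t in Counter(values).values())

def spearman_with_ties(xs, ys):
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    adjustment = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (sum_d2 + adjustment) / (n * (n * n - 1))

eco = [80, 56, 50, 48, 50, 62, 60]
stats = [90, 75, 75, 65, 65, 50, 65]
r_rank = spearman_with_ties(eco, stats)
print(average_ranks(eco), average_ranks(stats), round(r_rank, 2))
```

For this data ∑d² = 44.50 and the tie adjustment is 0.5 + 0.5 + 2 = 3, which reproduces the value 0.15.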

Question 10: For a number of towns, the coefficient of rank correlation between the people living below

the poverty line and increase of population is 0.50. If the sum of squares of the differences in ranks

awarded to these factors is 82.50, find the number of towns.

Answer:

As given, $r_R = 0.50$ and $\sum d_i^2 = 82.50$.

Thus $r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$

$0.50 = 1 - \frac{6 \times 82.50}{n(n^2 - 1)}$

$n(n^2 - 1) = 990$

∴ n = 10, as n must be a positive integer.

Question 11: While computing rank correlation coefficient between profits and investment for 10 years

of a firm, the difference in rank for a year was taken as 7 instead of 5 by mistake and the value of rank

correlation coefficient was computed as 0.80. What would be the correct value of rank correlation

coefficient after rectifying the mistake?

Answer: We are given that n = 10, $r_R = 0.80$, and the wrong $d_i = 7$ should be replaced by 5.

$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)}$

$0.80 = 1 - \frac{6 \sum d_i^2}{10(10^2 - 1)}$, so $\sum d_i^2 = 33$

Corrected $\sum d_i^2 = 33 - 7^2 + 5^2 = 9$

Hence the rectified value of the rank correlation coefficient

$= 1 - \frac{6 \times 9}{10 \times (10^2 - 1)} = 0.95$

Question 12: Find the coefficient of concurrent deviations from the following data.

Year 1990 1991 1992 1993 1994 1995 1996 1997

Price 25 28 30 23 35 38 39 42

Demand 35 34 35 30 29 28 26 23

Answer:

Computation of Coefficient of Concurrent Deviations

Year    Price   Sign of deviation (a)   Demand   Sign of deviation (b)   Product of deviations (ab)
1990    25                              35
1991    28      +                       34       –                       –
1992    30      +                       35       +                       +
1993    23      –                       30       –                       +
1994    35      +                       29       –                       –
1995    38      +                       28       –                       –
1996    39      +                       26       –                       –
1997    42      +                       23       –                       –

Number of pairs of deviations (m) = 7

Number of + signs in the product of deviations column, i.e. number of concurrent deviations (c) = 2

Thus $r_c = \pm\sqrt{\pm\frac{(2c - m)}{m}} = \pm\sqrt{\pm\frac{(4 - 7)}{7}} = \pm\sqrt{\pm\frac{(-3)}{7}} = -\sqrt{\frac{3}{7}} = -0.65$

(Since $\frac{2c - m}{m} = -\frac{3}{7}$ is negative, we take the negative sign both inside and outside the radical sign.)

Thus there is a negative correlation between price and demand.
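The sign-counting steps can be automated in a few lines. This is a rough sketch with a made-up function name, `concurrent_deviation`; it assumes no value repeats from one period to the next (a zero deviation in both series would be counted as concurrent here).

```python
from math import sqrt

def sign(d):
    """Return +1, -1 or 0 according to the sign of d."""
    return (d > 0) - (d < 0)

def concurrent_deviation(xs, ys):
    """Return (c, m, r_c) for two series of equal length."""
    dev_x = [sign(b - a) for a, b in zip(xs, xs[1:])]
    dev_y = [sign(b - a) for a, b in zip(ys, ys[1:])]
    m = len(dev_x)
    c = sum(1 for a, b in zip(dev_x, dev_y) if a == b)
    ratio = (2 * c - m) / m
    # Same sign inside and outside the radical
    r_c = sqrt(ratio) if ratio >= 0 else -sqrt(-ratio)
    return c, m, r_c

price = [25, 28, 30, 23, 35, 38, 39, 42]
demand = [35, 34, 35, 30, 29, 28, 26, 23]
c, m, r_c = concurrent_deviation(price, demand)
print(c, m, round(r_c, 2))
```

For the price and demand data this gives c = 2, m = 7 and a coefficient of about –0.65, matching the worked answer.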


Regression Analysis

Used to predict the value of the dependent variable corresponding to a known value of the independent variable.

It establishes a statistical / mathematical relationship between the variables that indicates the degree and direction of the association.

It does not imply an exact functional / algebraic relationship.

Applicable to both linear and curvilinear relationships.

Points to Ponder:

1. Assumption: There exists a mathematical / average relationship between the two variables.

2. Variable y (if influenced by x) is the dependent / regression / explained variable.

3. Variable x is the independent / predictor / explanatory variable.

Regression Line – the line of best fit (method of least squares)

$y = a + bx$

where a and b are constants (the regression parameters) and $b = b_{yx}$ = regression coefficient of y on x.

Regression: Normal Equations – Method of Least Squares: to minimise the sum of squares of the errors / residuals,

$e_i$ = Observed value – Estimated value

$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - bx_i)^2$

Here
$y_i$: actual / observed value
$\hat{y}_i$: the estimated value of $y_i$ for a given $x_i$
$e_i = (y_i - \hat{y}_i)$: error / residual, the difference between the observed and the estimated value

Comprehensive Question: Marks of 8 students in Mathematics and Statistics are given as:

Mathematics (x)   80   75   76   69   70   85   72   68
Statistics (y)    85   65   72   68   67   88   80   70

Find the following:

1. Karl Pearson's Product Moment Correlation Coefficient

2. Spearman's Rank Correlation Coefficient

3. Coefficient of Concurrent Deviations

4. The regression lines

5. When the marks in Mathematics are 90, what are the most likely marks in Statistics?

6. When the marks in Statistics are 92, what are the most likely marks in Mathematics?

Answer:

Working Note:

x     y     xy      x²      y²      x – x̄     y – ȳ     (x – x̄)(y – ȳ)   (x – x̄)²   (y – ȳ)²    (x – x̄)y    (y – ȳ)x
80    85    6800    6400    7225    5.625     10.625    59.7656          31.641     112.8906    478.125     850
75    65    4875    5625    4225    0.625     –9.375    –5.8594          0.391      87.8906     40.625      –703.125
76    72    5472    5776    5184    1.625     –2.375    –3.8594          2.641      5.6406      117         –180.5
69    68    4692    4761    4624    –5.375    –6.375    34.2656          28.891     40.6406     –365.5      –439.875
70    67    4690    4900    4489    –4.375    –7.375    32.2656          19.141     54.3906     –293.125    –516.25
85    88    7480    7225    7744    10.625    13.625    144.7656         112.891    185.6406    935         1158.125
72    80    5760    5184    6400    –2.375    5.625     –13.3594         5.641      31.6406     –190        405
68    70    4760    4624    4900    –6.375    –4.375    27.8906          40.641     19.1406     –446.25     –297.5
595   595   44529   44495   44791   0         0         275.875          241.875    537.875     275.875     275.875

Here u = (x – 74) and v = (y – 76); $r_x$ and $r_y$ are the ranks of x and y; a and b are the signs of deviation of x and y from the previous value.

x     y     u     v      u²     v²     uv     $r_x$   $r_y$   d²    a    b    ab
80    85    6     9      36     81     54     2       2       0
75    65    1     –11    1      121    –11    4       8       16    –    –    +
76    72    2     –4     4      16     –8     3       4       1     +    +    +
69    68    –5    –8     25     64     40     7       6       1     –    –    +
70    67    –4    –9     16     81     36     6       7       1     +    –    –
85    88    11    12     121    144    132    1       1       0     +    +    +
72    80    –2    4      4      16     –8     5       3       4     –    –    +
68    70    –6    –6     36     36     36     8       5       9     –    –    +
595   595   3     –13    243    559    271                    32

$\bar{x} = \frac{\sum x}{8} = \frac{595}{8} = 74.375$ and $\bar{y} = \frac{\sum y}{8} = \frac{595}{8} = 74.375$

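To find $S_x$:

Formula 1: $S_x = \sqrt{\frac{\sum x^2}{n} - (\bar{x})^2} = \sqrt{\frac{44{,}495}{8} - 74.375^2} = 5.4986$

Formula 2: $S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{241.875}{8}} = 5.4986$

To find $S_y$:

Formula 1: $S_y = \sqrt{\frac{\sum y^2}{n} - (\bar{y})^2} = \sqrt{\frac{44{,}791}{8} - 74.375^2} = 8.1997$

Formula 2: $S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{\frac{537.875}{8}} = 8.1997$

To find Cov(x, y):

Formula 1: $Cov(x, y) = \frac{\sum xy}{n} - \bar{x}\,\bar{y} = \frac{44{,}529}{8} - 74.375 \times 74.375 = 34.4844$

Formula 2: $Cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{275.875}{8} = 34.4844$

To find Karl Pearson's Product Moment Correlation Coefficient, r

Formula 1: $r = \frac{cov(x, y)}{S_x S_y} = \frac{34.4844}{5.4986 \times 8.1997} = 0.7648$

Formula 2: $r = \frac{n \sum xy - \sum x \sum y}{\sqrt{n \sum x^2 - (\sum x)^2} \times \sqrt{n \sum y^2 - (\sum y)^2}} = \frac{8 \times 44{,}529 - 595 \times 595}{\sqrt{8 \times 44{,}495 - 595^2} \times \sqrt{8 \times 44{,}791 - 595^2}} = 0.7648$

Formula 3: $r = r_{xy} = r_{uv} = \frac{n \sum u_i v_i - \sum u_i \times \sum v_i}{\sqrt{n \sum u_i^2 - (\sum u_i)^2} \times \sqrt{n \sum v_i^2 - (\sum v_i)^2}} = \frac{8 \times 271 - 3 \times (-13)}{\sqrt{8 \times 243 - 3^2} \times \sqrt{8 \times 559 - (-13)^2}} = 0.7648$

Formula 4: $r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{r\frac{s_y}{s_x} \times r\frac{s_x}{s_y}} = \sqrt{1.1406 \times 0.5129} = 0.7648$

To find Spearman's Rank Correlation Coefficient, $r_R$

$r_R = 1 - \frac{6 \sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32}{8(8^2 - 1)} = 0.62$

To find the Coefficient of Concurrent Deviations, $r_c$

$r_c = \pm\sqrt{\pm\frac{(2c - m)}{m}} = +\sqrt{\frac{(2 \times 6 - 7)}{7}} = \sqrt{\frac{5}{7}} \approx 0.85$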

Y on X X on Y

1 Normal Equation

𝑦 = 𝑎 + 𝑏𝑥 𝑥 = �̂� + �̂�𝑦

∑ 𝑦

= 𝑛𝑎 + 𝑏 ∑ 𝑥

595 = 8𝑎 + 595𝑏 → (1) ∑ 𝑥

= 𝑛�̂� + �̂� ∑ 𝑦

595 = 8𝑎 + 595𝑏 → (1)

∑ 𝑥𝑦

= 𝑎 ∑ 𝑥

+ 𝑏 ∑ 𝑥2

44,529 = 595𝑎 + 44,495𝑏

→ (2) ∑ 𝑥𝑦

= �̂� ∑ 𝑦

+ �̂� ∑ 𝑦2

44,529 = 595𝑎 + 44,791𝑏

→ (2)

SSA Statistics 4.17

3,54,025

= 4,760𝑎 + 3,54,025𝑏

→ (3) = ((1) × 595)

3,54,025

= 4,760𝑎 + 3,54,025𝑏

→ (3) = ((1) × 595)

3,56,232

= 4,760𝑎 + 3,55,960𝑏

→ (4) = ((2) × 8)

3,56,232

= 4,760𝑎 + 3,58,328𝑏

→ (4) = ((2) × 8)

𝑏 =

3,56,232 − 3,54,025

3,55,960 − 3,54,025

= 1.1406

− (3)

𝑏 =

3,56,232 − 3,54,025

3,58,328 − 3,54,025

= 0.5129

− (3)

𝑎 = −10.4571

→ 1.1406 𝑓𝑜𝑟 𝑏 𝑖𝑛 (1) 𝑎 = 36.2281

→ 0.5129 𝑓𝑜𝑟 𝑏 𝑖𝑛 (1)

𝑦 = −10.4571 + 1.1406𝑥 𝑥 = 36.2281 + 0.5129𝑦

= −10.4571 + 1.1406(90)

𝑥 = 36.2281 + 0.5129(92)

2 Simplified Formula using Normal Equation

𝑦 = 𝑎 + 𝑏𝑥 𝑥 = �̂� + �̂�𝑦

𝑏 = 𝑏𝑦𝑥̅

=𝑛 ∑ 𝑥𝑦 − ∑ 𝑥 ∑ 𝑦

𝑛 ∑ 𝑥2 − (∑ 𝑥)2

=8 × 44,529 − 595 × 595

8 × 44,495 − (595)2

= 1.1406

�̂� = 𝑏𝑥̅𝑦

=𝑛 ∑ 𝑥𝑦 − ∑ 𝑥 ∑ 𝑦

𝑛 ∑ 𝑦2 − (∑ 𝑦)2

=8 × 44,529 − 595 × 595

8 × 44,791 − (595)2

= 0.5129

𝑎 = �̅� − 𝑏�̅� 𝑎

= 74.375

− 74.375(1.1406)

= 10.4571

�̂� = �̅� − �̂��̅� 𝑎

= 74.375

− 74.375(0.5129)

= 36.2281

= −10.4571 + 1.1406(90)

𝑥 = 36.2281 + 0.5129(90)

3 Deviation Method (Deviation taken from mid value)

𝑦 = 𝑎 + 𝑏(𝑥 − �̅�) 𝑥 = �̂� + �̂�(𝑦 − �̅�)

𝑎 =

∑ 𝑦

𝑛 𝑎 =

8= 74.375 �̂� =

∑ 𝑥

𝑛 𝑎 =

8= 74.375

𝑏 = 𝑏𝑦𝑥̅

=∑(𝑥 − �̅�)𝑦

∑(𝑥 − �̅�)2

𝑏 =275.875

241.875= 1.1406

�̂� = 𝑏𝑥̅𝑦

=∑(𝑦 − �̅�)𝑥

∑(𝑦 − �̅�)2

𝑏 =275.875

537.875= 0.5129

= 74.375

+ 1.1406(90 − 74.375)

= 74.375

+ 0.5129(92 − 74.375)

4 Deviation Method (Deviation taken from assumed value)

𝑦 = 𝑎 + 𝑏𝑥 𝑥 = �̂� + �̂�𝑦

SSA Statistics 4.18

𝑏 = 𝑏𝑦𝑥̅ = 𝑏𝑣𝑢

=𝑛 ∑ 𝑢𝑣 − ∑ 𝑢 ∑ 𝑣

𝑛 ∑ 𝑢2 − (∑ 𝑢)2

𝑏 =8 × 271 − 3 × (−13)

8 × 243 − 32

= 1.1406

𝑏 = 𝑏𝑥̅𝑦 = 𝑏𝑣𝑢

=𝑛 ∑ 𝑢𝑣 − ∑ 𝑢 ∑ 𝑣

𝑛 ∑ 𝑣2 − (∑ 𝑣)2

𝑏 =8 × 271 − 3 × (−13)

8 × 559 − (−13)2

= 0.5129

𝑎 = �̅� − 𝑏�̅� 𝑎

= 74.375 − 1.1406 × 74.375

= −10.4571

�̂� = �̅� − �̂��̅� 𝑎 = 74.375 − 0.5129

× 74.375

= 36.2281

= −10.4571 + 1.1406(90)

𝑥 = 36.2281 + 0.5129(92)

5 Point Slope Form

Y on X: $y - \bar{y} = b_{yx}(x - \bar{x})$

$m = b_{yx} = r\dfrac{s_y}{s_x} = 0.7648 \times \dfrac{8.1997}{5.4986} = 1.1405$

$y - 74.375 = 1.1405(90 - 74.375)$, so $y \approx 92$

X on Y: $x - \bar{x} = b_{xy}(y - \bar{y})$

$m = b_{xy} = r\dfrac{s_x}{s_y} = 0.7648 \times \dfrac{5.4986}{8.1997} = 0.5129$

$x - 74.375 = 0.5129(92 - 74.375)$, so $x \approx 83$
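The five methods above all reduce to the same arithmetic on the summary totals. A minimal Python sketch, assuming only the totals quoted in the worked example ($n = 8$, $\sum x = \sum y = 595$, $\sum xy = 44{,}529$, $\sum x^2 = 44{,}495$, $\sum y^2 = 44{,}791$); the function and variable names are illustrative, not part of the original notes. Because the code does not round $b$ before computing $a$, the intercepts differ from the text's −10.4571 and 36.2281 in the third decimal place.

```python
# Summary totals taken from the worked example (n = 8 pairs of observations).
n = 8
sum_x, sum_y = 595, 595
sum_xy = 44_529
sum_x2, sum_y2 = 44_495, 44_791


def slope(n, s_prod, s_ind, s_dep, s_ind_sq):
    """Regression coefficient of the dependent on the independent variable:
    (n*sum(xy) - sum(x)*sum(y)) / (n*sum(ind^2) - sum(ind)^2)."""
    return (n * s_prod - s_ind * s_dep) / (n * s_ind_sq - s_ind ** 2)


b_yx = slope(n, sum_xy, sum_x, sum_y, sum_x2)  # y on x
b_xy = slope(n, sum_xy, sum_y, sum_x, sum_y2)  # x on y

x_bar, y_bar = sum_x / n, sum_y / n
a_yx = y_bar - b_yx * x_bar  # intercept of y on x
a_xy = x_bar - b_xy * y_bar  # intercept of x on y


def predict_y(x):
    """Estimate y from x using the regression line of y on x."""
    return a_yx + b_yx * x


def predict_x(y):
    """Estimate x from y using the regression line of x on y."""
    return a_xy + b_xy * y


print(round(b_yx, 4), round(b_xy, 4))
print(round(predict_y(90)), round(predict_x(92)))
```

Rounded to whole numbers, the two estimates agree with the values $y = 92$ and $x = 83$ obtained in the point slope form.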

Properties of Regression Lines

(1) The regression coefficients remain unchanged by a shift of origin but change with a shift of scale.

If $u = \dfrac{x - a}{p}$ and $v = \dfrac{y - c}{q}$, then $b_{yx} = \dfrac{q}{p} \times b_{vu}$ and $b_{xy} = \dfrac{p}{q} \times b_{uv}$

Problem 1: Find the regression coefficient of $y$ on $x$ for the following data.

x    12   17   22   27   32
y    24   44   54   64   84

Answer:

Direct formula:

$b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = \dfrac{5 \times 6{,}640 - 110 \times 270}{5 \times 2{,}670 - (110)^2} = 2.8$

Using deviations $u$ and $v$:

$b_{vu} = \dfrac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2} = \dfrac{5 \times 114 - 20 \times 25}{5 \times 90 - (20)^2} = 1.4$

If $u = \dfrac{x - a}{p}$ and $v = \dfrac{y - c}{q}$, then $b_{yx} = \dfrac{q}{p} \times b_{vu}$ and $b_{xy} = \dfrac{p}{q} \times b_{uv}$

With $u = \dfrac{x - 2}{5}$ and $v = \dfrac{y - 4}{10}$: $b_{yx} = \dfrac{10}{5} \times b_{vu} = 2 \times 1.4 = 2.8$

  x    y    u    v      x²      y²      xy   u²   v²   uv
 12   24    2    2     144     576     288    4    4    4
 17   44    3    4     289   1,936     748    9   16   12
 22   54    4    5     484   2,916   1,188   16   25   20
 27   64    5    6     729   4,096   1,728   25   36   30
 32   84    6    8   1,024   7,056   2,688   36   64   48
110  270   20   25   2,670  16,580   6,640   90  145  114   (Total)
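The change-of-scale property used in Problem 1 can be checked numerically. The sketch below is illustrative only: it assumes the data of Problem 1 with $u = (x - 2)/5$ and $v = (y - 4)/10$, uses exact fractions to avoid rounding, and the helper name `reg_coeff` is hypothetical.

```python
from fractions import Fraction

# Data of Problem 1.
x = [12, 17, 22, 27, 32]
y = [24, 44, 54, 64, 84]


def reg_coeff(dep, ind):
    """Regression coefficient of `dep` on `ind`:
    (n*sum(dep*ind) - sum(dep)*sum(ind)) / (n*sum(ind^2) - sum(ind)^2)."""
    n = len(dep)
    s_dep, s_ind = sum(dep), sum(ind)
    s_prod = sum(d * i for d, i in zip(dep, ind))
    s_ind_sq = sum(i * i for i in ind)
    return Fraction(n * s_prod - s_dep * s_ind, n * s_ind_sq - s_ind ** 2)


# Change of origin and scale: u = (x - 2)/p, v = (y - 4)/q.
p, q = 5, 10
u = [Fraction(xi - 2, p) for xi in x]
v = [Fraction(yi - 4, q) for yi in y]

b_yx, b_xy = reg_coeff(y, x), reg_coeff(x, y)
b_vu, b_uv = reg_coeff(v, u), reg_coeff(u, v)

print(b_yx, b_vu)  # regression coefficient before and after the change of scale
print(b_yx == Fraction(q, p) * b_vu, b_xy == Fraction(p, q) * b_uv)
```

The origin shifts (2 and 4) do not appear in the final relations; only the scale factors $p$ and $q$ do.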

Question 2: If the relationship between two variables 𝑥 and u is 𝑢 + 3𝑥 = 10 and between two other

variables 𝑦 and 𝑣 is 2𝑦 + 5𝑣 = 25, and the regression coefficient of 𝑦 𝑜𝑛 𝑥 is known as 0.80, what would

be the regression coefficient of 𝑣 on 𝑢?

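Answer:

$u + 3x = 10 \Rightarrow u = \dfrac{x - 10/3}{-1/3}$, so $p = -\dfrac{1}{3}$

$2y + 5v = 25 \Rightarrow v = \dfrac{y - 25/2}{-5/2}$, so $q = -\dfrac{5}{2}$

$b_{yx} = \dfrac{q}{p} \times b_{vu} \Rightarrow 0.8 = \dfrac{-5/2}{-1/3} \times b_{vu} = \dfrac{15}{2} \times b_{vu}$

$\therefore b_{vu} = \dfrac{2}{15} \times 0.8 = 0.1067$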

(2) The two lines of regression intersect at the point $(\bar{x}, \bar{y})$, where $x$ and $y$ are the variables under consideration.

(3) The correlation coefficient $r$ is the geometric mean of the regression coefficients:

$r = \pm\sqrt{b_{yx} \times b_{xy}}$

If $b_{yx}$ and $b_{xy}$ are negative, $r$ is negative.

If $b_{yx}$ and $b_{xy}$ are positive, $r$ is positive.

Question 3: For the variables 𝑥 and 𝑦, the regression equations are given as 7𝑥– 3𝑦– 18 = 0 and

4𝑥– 𝑦– 11 = 0

1. Find the arithmetic means of 𝑥 and 𝑦.

2. Identify the regression equation of 𝑦 on 𝑥.

3. Compute the correlation coefficient between 𝑥 and 𝑦.

4. Given the variance of 𝑥 is 9, find the SD of 𝑦.

Answer:

(1) Since the two lines of regression intersect at the point $(\bar{x}, \bar{y})$, replacing $x$ and $y$ by $\bar{x}$ and $\bar{y}$ gives

$7\bar{x} - 3\bar{y} - 18 = 0$ and $4\bar{x} - \bar{y} - 11 = 0$

Solving these two equations, we get $\bar{x} = 3$ and $\bar{y} = 1$.

Thus the arithmetic means of $x$ and $y$ are 3 and 1 respectively.

(2) Let us assume that $7x - 3y - 18 = 0$ represents the regression line of $y$ on $x$ and $4x - y - 11 = 0$ represents the regression line of $x$ on $y$.

Now $7x - 3y - 18 = 0 \Rightarrow y = -6 + \dfrac{7}{3}x$, $\therefore b_{yx} = \dfrac{7}{3}$

Again $4x - y - 11 = 0 \Rightarrow x = \dfrac{11}{4} + \dfrac{1}{4}y$, $\therefore b_{xy} = \dfrac{1}{4}$

Thus $r^2 = b_{yx} \times b_{xy} = \dfrac{7}{3} \times \dfrac{1}{4} = \dfrac{7}{12}$

Since $|r| \le 1 \Rightarrow r^2 \le 1$, our assumption is correct. Thus, $7x - 3y - 18 = 0$ truly represents the regression line of $y$ on $x$.

(3) Since $r^2 = \dfrac{7}{12}$, $r = \sqrt{\dfrac{7}{12}} = 0.7638$ (we take the sign of $r$ as positive since both the regression coefficients are positive).

(4) $b_{yx} = r \times \dfrac{S_y}{S_x} \Rightarrow \dfrac{7}{3} = 0.7638 \times \dfrac{S_y}{3}$ (since $S_x^2 = 9$ as given, $S_x = 3$)

$\Rightarrow S_y = \dfrac{7}{0.7638} = 9.1647$
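The reasoning in Question 3 (find the means from the intersection, then decide which line is $y$ on $x$ by checking that $r^2 \le 1$) can be sketched in Python. This is an illustrative sketch, not the notes' own method: each line is stored as a hypothetical tuple `(A, B, C)` meaning $Ax + By + C = 0$, and the helper names are assumptions. The code keeps $r$ unrounded, so $S_y$ comes out as about 9.165 rather than the text's 9.1647.

```python
from fractions import Fraction
from math import sqrt

# Each line is (A, B, C), meaning A*x + B*y + C = 0.
line_1 = (7, -3, -18)
line_2 = (4, -1, -11)


def intersection(l1, l2):
    """Solve the two linear equations by Cramer's rule."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return Fraction(b1 * c2 - b2 * c1, det), Fraction(a2 * c1 - a1 * c2, det)


def slope_y_on_x(line):
    """b_yx if the line is read as y = ... + b_yx * x."""
    a, b, _ = line
    return Fraction(-a, b)


def slope_x_on_y(line):
    """b_xy if the line is read as x = ... + b_xy * y."""
    a, b, _ = line
    return Fraction(-b, a)


def identify(l1, l2):
    """Try both assignments; keep the one with 0 <= b_yx * b_xy <= 1."""
    for y_on_x, x_on_y in ((l1, l2), (l2, l1)):
        b_yx, b_xy = slope_y_on_x(y_on_x), slope_x_on_y(x_on_y)
        if 0 <= b_yx * b_xy <= 1:
            return y_on_x, b_yx, b_xy
    raise ValueError("neither assignment gives a valid r^2")


x_bar, y_bar = intersection(line_1, line_2)
y_on_x_line, b_yx, b_xy = identify(line_1, line_2)

r = sqrt(b_yx * b_xy)
if b_yx < 0:  # r takes the common sign of the regression coefficients
    r = -r

s_x = sqrt(9)                # variance of x is given as 9
s_y = float(b_yx) * s_x / r  # from b_yx = r * s_y / s_x

print(x_bar, y_bar, y_on_x_line, round(r, 4), round(s_y, 4))
```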

Probable Error (PE) – A method to obtain the correlation coefficient of the population

$P.E. = 0.674 \times \dfrac{1 - r^2}{\sqrt{N}}$

Here $r$ is the correlation coefficient from $N$ pairs of sample observations.

$P.E. = \dfrac{2}{3} S.E.$ (where S.E. is the standard error of the correlation coefficient)

$\therefore S.E. = \dfrac{1 - r^2}{\sqrt{N}}$

Limits: $P = r \pm P.E.$, where $P$ is the population correlation coefficient

Assumption (as probable errors are significant)

1. If $r < P.E.$, there is no evidence of correlation.

2. If $r > 6 \times P.E.$, the presence of correlation is certain.

3. $P.E.$ is never negative (as $-1 \le r \le 1$).

Points to Ponder:

1. The sampling followed is Simple Random Sampling

2. The population is normal.

Question 4: Compute the Probable Error assuming the correlation coefficient of 0.8 from a sample of 25

pairs of items.

Answer: $r = 0.8$, $n = 25$

$P.E. = 0.674 \times \dfrac{1 - r^2}{\sqrt{N}} = 0.674 \times \dfrac{1 - 0.8^2}{\sqrt{25}} = 0.0485$

Question 5: If 𝑟 = 0.7; and 𝑛 = 64 find out the probable error of the coefficient of correlation and

determine the limits for the population correlation coefficient:

Answer: $r = 0.7$, $n = 64$

$P.E. = 0.674 \times \dfrac{1 - r^2}{\sqrt{N}} = 0.674 \times \dfrac{1 - 0.7^2}{\sqrt{64}} = 0.043$

Limits for the population correlation coefficient $= r \pm P.E. = 0.7 \pm 0.043$, that is, 0.657 and 0.743
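Questions 4 and 5 apply the same formula, so a small helper covers both. A minimal sketch, assuming the constant 0.674 used in these notes; the function name `probable_error` is illustrative.

```python
from math import sqrt


def probable_error(r, n, k=0.674):
    """Probable error of the correlation coefficient: k * (1 - r^2) / sqrt(n).
    The constant k = 0.674 follows the value used in these notes."""
    return k * (1 - r ** 2) / sqrt(n)


# Question 4: r = 0.8 from 25 pairs.
pe_q4 = probable_error(0.8, 25)

# Question 5: r = 0.7 from 64 pairs, with limits r - P.E. and r + P.E.
pe_q5 = probable_error(0.7, 64)
lower, upper = 0.7 - pe_q5, 0.7 + pe_q5

print(round(pe_q4, 4))
print(round(pe_q5, 3), round(lower, 3), round(upper, 3))
```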

Karl Pearson's Coefficient and Coefficient of Determination

Limitation: $r = 0$ need not imply that the variables are independent; it only shows that there is no linear relationship.

Example: For the set of values (−2, 4), (−1, 1), (0, 0), (1, 1) and (2, 4)

$\sum xy = (-2 \times 4) + (-1 \times 1) + (0 \times 0) + (1 \times 1) + (2 \times 4) = 0$

Since $\bar{x} = 0$, $Cov(x, y) = \dfrac{\sum xy}{n} - \bar{x}\,\bar{y} = 0$, $\therefore r = 0$

But $x$ and $y$ are connected by the non-linear relationship $y = x^2$, so $x$ and $y$ are not independent.
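The example above can be reproduced in a few lines. The sketch below is illustrative: it computes Karl Pearson's $r$ with the sum-based formula for the five points on $y = x^2$; the function name `pearson_r` is an assumption.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's r:
    (n*sum(xy) - sum(x)*sum(y)) / (sqrt(n*sum(x^2) - sum(x)^2) * sqrt(n*sum(y^2) - sum(y)^2))."""
    n = len(xs)
    s_x, s_y = sum(xs), sum(ys)
    s_xy = sum(a * b for a, b in zip(xs, ys))
    s_xx = sum(a * a for a in xs)
    s_yy = sum(b * b for b in ys)
    return (n * s_xy - s_x * s_y) / (
        sqrt(n * s_xx - s_x ** 2) * sqrt(n * s_yy - s_y ** 2)
    )


xs = [-2, -1, 0, 1, 2]
ys = [x ** 2 for x in xs]  # y is completely determined by x

r = pearson_r(xs, ys)
print(r)  # zero, although y depends on x exactly
```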

Correlation coefficient measuring a linear relationship between the two variables indicates the amount

of variation of one variable accounted for by the other variable. A better measure for this purpose is

provided by the square of the correlation coefficient, known as the ‘coefficient of determination’.

Coefficient of Determination (a better measure)

Calculation and answer shown for $r = 0.6$:

1 Coefficient of determination (by the factor):
$r^2 = \dfrac{\text{Explained variance}}{\text{Total variance}} = 0.6^2 = 0.36 = 36\%$

2 Coefficient of non-determination (by the other factor):
$1 - r^2 = \dfrac{\text{Unexplained variance}}{\text{Total variance}} = 1 - 0.6^2 = 0.64 = 64\%$

Correlation & Regression

r         Correlation                Regression
r = 1     Perfect +ve correlation    The two regression lines coincide
r = −1    Perfect –ve correlation    The two regression lines coincide
r = 0     Not correlated             Lines are perpendicular to each other

Views of different persons

Ya-lun: “There are two related but distinct aspects of the study of association between variables. Correlation analysis and regression analysis. Correlation analysis has the objective of determining the degree or strength of the relationship between variables. Regression analysis attempts to establish the nature of the relationship between variables – that is, to study the functional relationship between the variables and thereby provide a mechanism of prediction, or forecasting.”

Croxton and Cowden: “When relationship between two variables is of quantitative nature the appropriate statistical tool for measuring and expressing it in formula is known as correlation. Thus correlation is a statistical device which helps in analyzing the relationship and also the covariation of two or more variables.”

Simpson and Kafka: “Correlation analysis deals with the association between two or more variables.”

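Formula

1 $r = \dfrac{cov(x, y)}{S_x \cdot S_y}$

$r = \dfrac{\dfrac{\sum xy}{n} - \left(\dfrac{\sum x}{n}\right)\left(\dfrac{\sum y}{n}\right)}{\sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2} \cdot \sqrt{\dfrac{\sum y^2}{n} - \left(\dfrac{\sum y}{n}\right)^2}}$

By taking LCM:

$r = \dfrac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \times \sqrt{n\sum y^2 - (\sum y)^2}}$

$b_{yx} \times b_{xy} = r\dfrac{s_y}{s_x} \times r\dfrac{s_x}{s_y} = r^2$, or $r = \pm\sqrt{b_{yx} \times b_{xy}}$

$cov(x, y) = r\,s_x\,s_y$

$Cov(x, y) = \dfrac{\sum (x - \bar{x})(y - \bar{y})}{n}$ or $\dfrac{\sum xy}{n} - \bar{x}\,\bar{y}$ or $\dfrac{\sum xy}{n} - \left(\dfrac{\sum x}{n}\right)\left(\dfrac{\sum y}{n}\right)$

$S_x = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n}}$ or $\sqrt{\dfrac{\sum x^2}{n} - (\bar{x})^2}$ or $\sqrt{\dfrac{\sum x^2}{n} - \left(\dfrac{\sum x}{n}\right)^2}$

$S_y = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n}}$ or $\sqrt{\dfrac{\sum y^2}{n} - (\bar{y})^2}$ or $\sqrt{\dfrac{\sum y^2}{n} - \left(\dfrac{\sum y}{n}\right)^2}$

$r = r_{xy} = r_{uv} = \dfrac{n\sum u_i v_i - \sum u_i \times \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2} \times \sqrt{n\sum v_i^2 - (\sum v_i)^2}}$

$r = r_{xy} = r_{uv} = \dfrac{n\sum u_i v_i}{\sqrt{n\sum u_i^2} \times \sqrt{n\sum v_i^2}}$, where $u = (x - \bar{x})$ and $v = (y - \bar{y})$

2 $r_R = 1 - \dfrac{6\sum d_i^2}{n(n^2 - 1)}$

3 $r_c = \pm\sqrt{\pm\dfrac{(2c - m)}{m}}$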

Y on X and X on Y

1 Normal Equation

Y on X: $y = a + bx$

$\sum y = na + b\sum x$ → (1)

$\sum xy = a\sum x + b\sum x^2$ → (2)

X on Y: $x = \hat{a} + \hat{b}y$

$\sum x = n\hat{a} + \hat{b}\sum y$ → (1)

$\sum xy = \hat{a}\sum y + \hat{b}\sum y^2$ → (2)

Consider $y = a + bx$ → (A)

When $x = x_1$, $y_1 = a + bx_1$

When $x = x_2$, $y_2 = a + bx_2$

When $x = x_n$, $y_n = a + bx_n$

Summing up, $\sum y = na + b\sum x$ → (1)

Multiplying (A) by $x$:

For $x_1$: $x_1y_1 = ax_1 + bx_1^2$

For $x_2$: $x_2y_2 = ax_2 + bx_2^2$

For $x_n$: $x_ny_n = ax_n + bx_n^2$

Summing up, $\sum xy = a\sum x + b\sum x^2$ → (2)

2 Simplified Formula using Normal Equation

Y on X: $y = a + bx$

$b = b_{yx} = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

$a = \bar{y} - b\bar{x}$

X on Y: $x = \hat{a} + \hat{b}y$

$\hat{b} = b_{xy} = \dfrac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2}$

$\hat{a} = \bar{x} - \hat{b}\bar{y}$

Derivation of $b$:

$\sum y = na + b\sum x$ → (1)

$\sum xy = a\sum x + b\sum x^2$ → (2)

$\sum x \sum y = na\sum x + b\sum x \sum x$ → (3) = (1) × $\sum x$

$n\sum xy = na\sum x + nb\sum x^2$ → (4) = (2) × $n$

$\sum x \sum y - n\sum xy = 0 + b\sum x \sum x - nb\sum x^2$ → (5) = (3) − (4)

$n\sum xy - \sum x \sum y = b\left(n\sum x^2 - (\sum x)^2\right)$ → (6) = changing the sign of (5)

$b = \dfrac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}$

$b = b_{yx} = \dfrac{Cov(x, y)}{s_x^2}$

$r = \dfrac{cov(x, y)}{S_x \cdot S_y}$, so $cov(x, y) = r\,s_x\,s_y$

$b = b_{yx} = \dfrac{r\,s_x\,s_y}{s_x^2}$

$b = b_{yx} = r\dfrac{s_y}{s_x}$

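3 Deviation Method (Deviation taken from mid value)

Y on X: $y = a + b(x - \bar{x})$

$a = \dfrac{\sum y}{n}$

$b = b_{yx} = \dfrac{\sum (x - \bar{x})y}{\sum (x - \bar{x})^2}$

X on Y: $x = \hat{a} + \hat{b}(y - \bar{y})$

$\hat{a} = \dfrac{\sum x}{n}$

$\hat{b} = b_{xy} = \dfrac{\sum (y - \bar{y})x}{\sum (y - \bar{y})^2}$

4 Deviation Method (Deviation taken from assumed value)

Y on X: $y = a + bx$

$b = b_{yx} = b_{vu} = \dfrac{n\sum uv - \sum u \sum v}{n\sum u^2 - (\sum u)^2}$

$a = \bar{y} - b\bar{x}$

X on Y: $x = \hat{a} + \hat{b}y$

$\hat{b} = b_{xy} = b_{uv} = \dfrac{n\sum uv - \sum u \sum v}{n\sum v^2 - (\sum v)^2}$

$\hat{a} = \bar{x} - \hat{b}\bar{y}$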

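5 Point Slope Form

Y on X: $y - \bar{y} = b_{yx}(x - \bar{x})$, where $m = b_{yx} = r\dfrac{s_y}{s_x}$

$y - y_1 = m(x - x_1)$

Let $(x_1, y_1) = (\bar{x}, \bar{y})$ and $m = b_{yx} = r\dfrac{s_y}{s_x}$

$y - \bar{y} = b_{yx}(x - \bar{x})$

$y - \bar{y} = r\dfrac{s_y}{s_x}(x - \bar{x})$

$y$ on $x$: $\left(\dfrac{y - \bar{y}}{s_y}\right) = r\left(\dfrac{x - \bar{x}}{s_x}\right)$

X on Y: $x - \bar{x} = b_{xy}(y - \bar{y})$, where $m = b_{xy} = r\dfrac{s_x}{s_y}$

$x - x_1 = m(y - y_1)$

Let $(x_1, y_1) = (\bar{x}, \bar{y})$ and $m = b_{xy} = r\dfrac{s_x}{s_y}$

$x - \bar{x} = b_{xy}(y - \bar{y})$

$x - \bar{x} = r\dfrac{s_x}{s_y}(y - \bar{y})$

$x$ on $y$: $\left(\dfrac{x - \bar{x}}{s_x}\right) = r\left(\dfrac{y - \bar{y}}{s_y}\right)$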
