Posted on 23-Feb-2022
Sahasri Singar Academy
CA | CMA | CS
Statistics – Vol 1
CA | CMA Foundation
CMA CS Yamuna Sridhar
Price: Readers who have benefited from this material may pay to:
Account holder name: Singar Educational and Charitable Trust
Account number: 1262 1150 0000 9481
IFSC code: KVBL0001262
Bank name: Karur Vysya Bank
CALL OR VISIT FOR COPIES
Published by
SINGAR BOOKS AND PUBLICATIONS
Head Office: 32-B, Vivekananda Nagar, Ramalinga Nagar, Woriur, Trichy 620 003, TN
Branch Office: 76/1, New Street, Valluvar Kottam High Road, Nungambakkam, Chennai – 600 034
Ph: Trichy: 93451 22645 | Chennai: 93453 96855
www.singaracademy.in | singaracademy@gmail.com
Content
1 Statistical Description of Data 1.1
2 Measures of Central Tendency (Averages) and Dispersion 2.1
3 Probability 3.1
4 Correlation and Regression 4.1
SSA Statistics 1.1
1. STATISTICAL DESCRIPTION OF DATA
Introduction to Statistics
The word "statistics" is derived from:
1. Latin: Status
2. Italian: Statista
3. German: Statistik
4. French: Statistique
Applications of statistics: qualitative and quantitative information in Economics, Business Management, and Commerce and Industry.
Limitations of statistics: (1) statistics deals with aggregates; (2) statistics is concerned with quantitative data; (3) future projections are possible only under a specific set of conditions; and (4) statistical inferences are built upon random sampling.
Statistics is also described as the 'science of counting' or the 'science of averages'.
Definition of Statistics
Plural sense: data collected for statistical analysis.
Singular sense: the science of collecting, analysing and presenting data, and of drawing inferences from it.
Types of Data
1. Quantitative (variable)
   (a) Discrete (countable; takes only isolated values): number of petals in a flower, number of misprints in a book, annual income of a person, marks of a student
   (b) Continuous (can take any value within a range): height, weight, age
2. Qualitative (attribute): gender of a baby, nationality, colour of a flower, drinking habit
Collecting Data (singular sense)
1. Primary source: data collected fresh
2. Secondary source: data already collected
Methods of collecting primary data (collected fresh)
1. Interview
   (a) Personal (e.g. natural calamity)
   (b) Indirect (e.g. road accident)
   (c) Telephone: cheap and quickest, but inconsistent
2. Mailed questionnaire: widest coverage, but maximum non-response
3. Observation with instruments: e.g. heights of students collected using a scale
4. Questionnaire filled in by enumerators
Sources of secondary data (already collected)
1. International
2. National (government source): e.g. religion data from a census report
3. Quasi-government
Analysing and Presenting Data (singular sense)
Classification
1. Chronological (temporal): time series
2. Geographical (spatial)
3. Qualitative (ordinal)
4. Quantitative (cardinal)
Modes of presentation
1. Textual
2. Tabular (the best and most accurate method)
3. Diagrammatic (attractive, and trends are easily noticed): diagrams, charts, pictures
Drawing inference
COLLECTION OF DATA
Scrutiny of data: the accuracy as well as the internal consistency of data can be verified against a number of related series.
(a) Textual presentation
(b) Tabular presentation or tabulation, of two types: simple (univariate) and complex (bivariate)
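Table __
Students opting for CA and college
Status:        CA Students              College Students         Total
Year           Male   Female   Total    Male   Female   Total    Male   Female   Total
               No.    No.      No.      No.    No.      No.      No.    No.      No.
2016
2017
Source: (Footnote)
Parts of the table: table number (Table __); title; captions (the column headings, e.g. Status: CA Students, College Students and Total, each split into Male, Female and Total); box head (the whole block of captions); stub (the row headings, e.g. the years 2016 and 2017); body (the numerical entries); and source (footnote).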
Abscissa and Ordinate
The abscissa is the horizontal ("x") value in a pair of coordinates: it shows how far along the point is. It is always written first in an ordered pair such as (12, 5); in this example, 12 is the abscissa. The second value, 5, shows how far up or down the point is and is called the ordinate.
(c) Diagrammatic representation of data
Types: line diagram or historigram, bar diagram, pie chart.
Line diagram or historigram (graph): shows the relationship between two variables.
1. Basic line diagram: pairs of values $(t, y_t)$ are plotted in the $t$–$y_t$ plane, where $y_t$ is the time series.
Year | Profit
2002 | 50
2003 | 80
2004 | 130
2005 | 90
2006 | 150
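2. Logarithmic or ratio chart: used for a time series covering a wide range of values; $\log y_t$ is plotted instead of $y_t$.
Year | Profit ($y_t$) | As a power of 10
2002 | 10 | $10^1$
2003 | 100 | $10^2$
2004 | 1000 | $10^3$
2005 | 10000 | $10^4$
2006 | 100000 | $10^5$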
3. Multiple line chart: for two or more related time series whose variables are expressed in the same unit.
Year | Profit | Sales
2002 | 50 | 250
2003 | 80 | 375
2004 | 130 | 500
2005 | 90 | 425
2006 | 150 | 600
4. Multiple-axis line chart: for two or more related time series whose variables are expressed in different units.
Bar Diagram: bars are rectangles, usually of equal width, with varying lengths.
1. Horizontal bar diagram (qualitative data, or data varying over space)
Age | No.
16 | 50
17 | 80
18 | 130
19 | 90
20 | 150
2. Vertical bar diagram (quantitative data; time series data)
Year | Profit
2002 | 50
2003 | 80
2004 | 130
2005 | 90
2006 | 150
3. Multiple or grouped bar diagram (to compare two or more related series)
Year | Profit | Sales
2002 | 50 | 250
2003 | 80 | 375
2004 | 130 | 500
2005 | 90 | 425
2006 | 150 | 600
4. Component or sub-divided bar diagram (data with multiple components)
Year | Sales | Profit | Material | Labour | Expenses
2002 | 250 | 50 | 100 | 50 | 50
2003 | 375 | 80 | 150 | 100 | 45
2004 | 500 | 130 | 200 | 100 | 70
2005 | 425 | 90 | 150 | 150 | 35
2006 | 600 | 150 | 225 | 150 | 75
5. Divided or percentage bar diagram: used to compare the different components of a variable and the relation of each component to the whole (a pie diagram can be used instead).
Year | Sales | Profit | Material | Labour | Expenses
2002 | 250 | 50 | 100 | 50 | 50
2003 | 375 | 80 | 150 | 100 | 45
2004 | 500 | 130 | 200 | 100 | 70
2005 | 425 | 90 | 150 | 150 | 35
2006 | 600 | 150 | 225 | 150 | 75
6. Pie Diagram or Circle Diagram
Used to compare different components and their relation to the total. The angle of each sector is (component value ÷ total) × 360°.
Particulars | Rupees | Degree
Material | 100 | (100/250) × 360 = 144°
Labour | 50 | (50/250) × 360 = 72°
Expenses | 50 | (50/250) × 360 = 72°
Profit | 50 | (50/250) × 360 = 72°
Total | 250 | 360°
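The sector angles follow (component ÷ total) × 360°. Here is a minimal Python sketch of that arithmetic, using the 2002 cost figures from the table above. The function name `pie_angles` is illustrative only.

```python
def pie_angles(components):
    """Sector angle in degrees for each component: value / total * 360."""
    total = sum(components.values())
    return {name: value / total * 360 for name, value in components.items()}

# 2002 figures from the table above (rupees).
cost_2002 = {"Material": 100, "Labour": 50, "Expenses": 50, "Profit": 50}
angles = pie_angles(cost_2002)
for name, angle in angles.items():
    print(f"{name}: {angle:.0f} degrees")
print(f"Total: {sum(angles.values()):.0f} degrees")
```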
Frequency: the number of observations falling within a class.
FREQUENCY DISTRIBUTION
A tabular representation of observed statistical data on a measurable characteristic, usually arranged in ascending order (by individual values or by groups of values).
For a frequency distribution of a variable: $\text{Number of class intervals} \times \text{class length} \approx \text{Range}$
Class Limit (CL):
1. Lower class limit (LCL): the minimum value of a class interval (the corresponding actual boundary is the lower class boundary, LCB)
2. Upper class limit (UCL): the maximum value of a class interval (the corresponding actual boundary is the upper class boundary, UCB)
Class Boundary (CB): the actual class interval.
Data classification | Variable (usually) | Example | Mid value
1. Overlapping classification (mutually exclusive; includes the LCL and excludes the UCL) | Continuous | 10–20, 20–30, 30–40, … | $\frac{LCL + UCL}{2}$ or $\frac{LCB + UCB}{2}$
2. Non-overlapping classification (mutually inclusive) | Discrete (grouped) | 0–9, 10–19, 20–29, … | $\frac{LCL + UCL}{2}$ or $\frac{LCB + UCB}{2}$
Types of Frequency Distribution
1. Quantitative
   (a) Discrete variable: e.g. distribution of shares; the classification is mutually inclusive
   (b) Continuous variable (grouped frequency distribution): e.g. distribution of profits; the classification is mutually exclusive
2. Qualitative (attribute): e.g. nationality and drinking habit
For non-overlapping (mutually inclusive) classes:
1. Lower class boundary: $LCB = LCL - \frac{D}{2}$
2. Upper class boundary: $UCB = UCL + \frac{D}{2}$
where D is the gap between the upper class limit of one class and the lower class limit of the next (e.g. D = 10 − 9 = 1 for the classes 0–9, 10–19, …).
Width or size of a class interval (class length): $UCB - LCB$
Cumulative frequency: of the less-than type (the usual one) or the more-than type; it adds up to the total frequency.
Frequency density of a class interval $= \frac{\text{frequency of the class}}{\text{class length}}$
Relative frequency of a class interval $= \frac{\text{frequency of the class}}{\text{total frequency}}$ (multiply by 100 to express it as a percentage)
Relative frequency lies between 0 and 1.
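The boundary, width, mid-value, density, relative-frequency and cumulative-frequency formulas above can be tabulated with a short Python sketch. It assumes equal gaps D between consecutive inclusive classes. The data are the marks from Question 4 of Chapter 2 (classes 01–10, …, 91–100). `describe_classes` is a hypothetical helper name.

```python
# Mutually inclusive classes and frequencies (marks of 90 students).
limits = [(1, 10), (11, 20), (21, 30), (31, 40), (41, 50),
          (51, 60), (61, 70), (71, 80), (81, 90), (91, 100)]
freqs = [3, 7, 13, 17, 12, 10, 8, 8, 6, 6]

def describe_classes(limits, freqs):
    """One row per class: boundaries, width, mid value, frequency density,
    relative frequency, and less-than / more-than cumulative frequencies."""
    d = limits[1][0] - limits[0][1]          # gap D between consecutive classes
    total = sum(freqs)
    rows = []
    less_than = 0
    more_than = total
    for (lcl, ucl), f in zip(limits, freqs):
        lcb, ucb = lcl - d / 2, ucl + d / 2  # class boundaries
        width = ucb - lcb
        less_than += f
        rows.append({
            "boundaries": (lcb, ucb),
            "width": width,
            "mid": (lcb + ucb) / 2,
            "density": f / width,
            "relative": f / total,
            "cf_less": less_than,            # observations below the UCB
            "cf_more": more_than,            # observations at or above the LCB
        })
        more_than -= f
    return rows

table = describe_classes(limits, freqs)
for row in table:
    print(row)
```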
Graphical Representation of a Frequency Distribution
1. Histogram or area diagram:
   a. the mode can be determined from it
   b. most commonly drawn with class boundaries
   c. unequal class widths are acceptable
   d. the height of each bar represents the frequency density
   e. looks like a vertical bar chart
   f. comparison among class frequencies is possible
2. Frequency polygon:
   a. drawn for a single frequency distribution, or
   b. by joining the mid-points of a grouped frequency distribution, provided the classes have a common width
   c. gives an approximate idea of the shape of the frequency curve
3. Ogives or cumulative frequency graphs:
   a. a graphical representation of a cumulative frequency distribution
   b. a line diagram
   c. of two types: less than and more than
   d. used for finding the median and quartiles (the less-than ogive alone is sufficient)
Frequency Curve: the limiting form of a histogram or frequency polygon.
1. Bell-shaped curve (unimodal): e.g. height, weight, marks, profit
2. U-shaped curve (may be unimodal or bimodal)
3. J-shaped curve (unimodal): e.g. profit of a company
4. Mixed curve (bimodal)
Illustration: Consider the following distribution and its ogive tables.
Class interval | Frequency | UCL | Less-than cf | LCL | More-than cf
0 – 20 | 5 | 20 | 5 | 0 | 60
20 – 40 | 10 | 40 | 15 | 20 | 55
40 – 60 | 25 | 60 | 40 | 40 | 45
60 – 80 | 15 | 80 | 55 | 60 | 20
80 – 100 | 5 | 100 | 60 | 80 | 5
Note:
1. Tally marks determine class frequencies.
2. The class mark (a representative value of the class interval) is its mid-point or mid value.
3. A class with zero frequency is called an empty class.
4. A cumulative frequency distribution is used to find the number of observations less (or more) than any given value.
5. Cumulative frequency usually refers to the less-than type.
6. Class limits are the most extreme values that would ever be included in a class interval.
7. When one (or both) ends of a class are not specified, it is called an open-end class.
2. MEASURES OF CENTRAL TENDENCY (AVERAGES) & DISPERSION
DEFINITION OF CENTRAL TENDENCY / AVERAGES:
Central tendency is the tendency of observations to cluster around a central value. An average summarises this value and helps in assessing performance and in making comparisons. Typically, frequencies rise to a maximum near the centre and fall away on either side:
X | f |
00–19 | 1 | Minimum
20–39 | 3 | Gradually increasing
40–59 | 7 | Maximum
60–79 | 2 | Gradually decreasing
80–99 | 1 | Minimum
X: any variable (height, weight, marks, profits, wages and so on)
f: frequency (the number of times a value occurs)
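List of Formulae
Weighted average
Arithmetic mean: $\bar{X} = \frac{\sum wX}{\sum w}$
Geometric mean: $G = \left(X_1^{w_1} \times X_2^{w_2} \times \dots \times X_n^{w_n}\right)^{1/\sum w}$ or $G = \text{Antilog}\left(\frac{\sum w \log X}{\sum w}\right)$
Harmonic mean: $H = \frac{\sum w}{\sum \frac{w}{X}}$
Combined mean (two groups)
Arithmetic mean: $\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$
Geometric mean: $G = \text{Antilog}\left(\frac{n_1 \log G_1 + n_2 \log G_2}{n_1 + n_2}\right)$
Harmonic mean: $H = \frac{n_1 + n_2}{\frac{n_1}{H_1} + \frac{n_2}{H_2}}$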
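Here is a minimal Python sketch of the weighted and combined formulas above. It assumes all values are positive, as the GM and HM require, and uses base-10 logarithms to mirror the Antilog form. The function names are hypothetical.

```python
import math

def weighted_means(values, weights):
    """Weighted AM, GM (via base-10 logs) and HM."""
    sw = sum(weights)
    am = sum(w * x for x, w in zip(values, weights)) / sw
    gm = 10 ** (sum(w * math.log10(x) for x, w in zip(values, weights)) / sw)
    hm = sw / sum(w / x for x, w in zip(values, weights))
    return am, gm, hm

def combined_am(n1, m1, n2, m2):
    return (n1 * m1 + n2 * m2) / (n1 + n2)

def combined_gm(n1, g1, n2, g2):
    return 10 ** ((n1 * math.log10(g1) + n2 * math.log10(g2)) / (n1 + n2))

def combined_hm(n1, h1, n2, h2):
    return (n1 + n2) / (n1 / h1 + n2 / h2)

print(weighted_means([3, 6, 24, 48], [1, 1, 1, 1]))
print(combined_am(5, 9, 15, 5), combined_hm(15, 3, 10, 2))
```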
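Measures of Central Tendency (Averages)
Mean: arithmetic (usual cases; direct method), geometric (comparisons of ratios, proportions and percentages) and harmonic (two units together, e.g. speed = distance / time).
Partition values (arrange the items in ascending order first): median ($M_e$) and fractiles ($F_e$).
Mode ($M_o$).

Individual observations
AM: $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n} = \frac{\sum_{i=1}^{n} X_i}{n}$
GM: $GM = (X_1 \cdot X_2 \cdots X_n)^{1/n}$ or $GM = \text{Antilog}\left(\frac{\sum \log X}{n}\right)$
HM: $HM = \frac{n}{\sum \frac{1}{X}}$
Median: if n is odd, $M_e$ = size of the $\left(\frac{n+1}{2}\right)^{\text{th}}$ obs (the middle observation); if n is even, $M_e = \frac{\left(\frac{n}{2}\right)^{\text{th}}\ \text{obs} + \left(\frac{n}{2} + 1\right)^{\text{th}}\ \text{obs}}{2}$
Fractile: $F_e$ = size of the $\left(\frac{e(n+1)}{F}\right)^{\text{th}}$ obs, where F is the number of equal parts (4 for quartiles, 8 for octiles, 10 for deciles, 100 for percentiles) and e is the order of the fractile (e.g. e = 3, F = 4 for $Q_3$)
Mode: $M_o$ = the most usual value (location method)

Discrete series
AM: $\bar{X} = \frac{\sum fX}{\sum f} = \frac{\sum fX}{N}$
GM: $G = \left(X_1^{f_1} \cdot X_2^{f_2} \cdots X_n^{f_n}\right)^{1/N}$ or $G = \text{Antilog}\left(\frac{\sum f \log X}{N}\right)$
HM: $HM = \frac{N}{\sum \frac{f}{X}}$
Median: $M_e$ = size of the $\left(\frac{N+1}{2}\right)^{\text{th}}$ obs, i.e. the value of X whose cumulative frequency first equals or exceeds $\frac{N+1}{2}$
Fractile: $F_e$ = size of the $\left(\frac{e(N+1)}{F}\right)^{\text{th}}$ obs, located in the same way from the cumulative frequencies
Mode: regular frequencies, location method; irregular frequencies, grouping method

Continuous / grouped frequency distribution (interpolation method)
AM: $\bar{X} = \frac{\sum fm}{\sum f} = \frac{\sum fm}{N}$, where m is the mid value of a class
GM: $G = \left(m_1^{f_1} \cdot m_2^{f_2} \cdots m_n^{f_n}\right)^{1/N}$ or $G = \text{Antilog}\left(\frac{\sum f \log m}{N}\right)$
HM: $HM = \frac{N}{\sum \frac{f}{m}}$
Median: $M_e = l_1 + \left(\frac{\frac{N}{2} - N_l}{N_u - N_l}\right) \times C$ or $M_e = l + \frac{\frac{N}{2} - m}{f} \times c$
Fractile: $F_e = l_1 + \left(\frac{\frac{eN}{F} - N_l}{N_u - N_l}\right) \times C$ or $F_e = l + \frac{\frac{eN}{F} - m}{f} \times c$
Mode: $M_o = l_1 + \left(\frac{f_0 - f_{-1}}{2f_0 - f_{-1} - f_1}\right) \times C$
Here $l_1$ (or l) is the lower class boundary of the median (fractile or modal) class; $N_l$ and $N_u$ are the less-than cumulative frequencies at its lower and upper boundaries, so $N_u - N_l = f$; m in the second form is the cumulative frequency preceding the class; C (or c) is the class width; $f_0$ is the frequency of the modal class; and $f_{-1}$, $f_1$ are the frequencies of the classes just before and just after it.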
Note:
1. Indirect / short-cut / assumed mean (A) method. Deviation method, with $d = X - A$: $\bar{X} = A + \frac{\sum d}{n}$. Step-deviation method, with $d = \frac{X - A}{C}$: $\bar{X} = A + \frac{\sum d}{n} \times C$. For frequency distributions, replace $\sum d$ and n by $\sum fd$ and N.
2. Empirical relationship (rule of thumb), used when the mode is ill-defined (for a moderately skewed distribution): $\bar{X} - M_o = 3(\bar{X} - M_e)$, or $M_o = 3M_e - 2\bar{X}$.
3. Fractiles: quartiles (Q), octiles (O), deciles (D) and percentiles (P).
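Measures of Dispersion (absolute measure, with its corresponding relative measure)
(i) Range: $R = L - S$ (L = largest value, S = smallest value). Coefficient of range: $\frac{L - S}{L + S} \times 100$
(ii) Quartile deviation, also called the semi-interquartile range: $QD = \frac{Q_3 - Q_1}{2}$; interquartile range $= Q_3 - Q_1$. Coefficient of quartile deviation: $\frac{Q_3 - Q_1}{Q_3 + Q_1} \times 100$
(iii) Mean deviation about A ($A = \bar{X}$, $M_e$ or $M_o$). Coefficient of mean deviation: $\frac{MD_A}{A} \times 100$
Individual: $MD_A = \frac{1}{n}\sum \lvert X - A \rvert$
Discrete: $MD_A = \frac{1}{N}\sum f\lvert X - A \rvert$
Continuous: $MD_A = \frac{1}{N}\sum f\lvert m - A \rvert$
(iv) Standard deviation (s). Coefficient of variation: $CV = \frac{s}{\bar{X}} \times 100$
Individual: $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ or $\sqrt{\frac{\sum X^2}{n} - \bar{X}^2}$
Discrete: $s = \sqrt{\frac{\sum f(X - \bar{X})^2}{N}}$ or $\sqrt{\frac{\sum fX^2}{N} - \bar{X}^2}$
Continuous: $s = \sqrt{\frac{\sum f(m - \bar{X})^2}{N}}$ or $\sqrt{\frac{\sum fm^2}{N} - \bar{X}^2}$
Variance $= s^2$
Short-cut: $s = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2}$, with $d = X - A$ (individual and discrete). For continuous data with step deviations $d = \frac{m - A}{C}$, the result must be multiplied by C: $s = \sqrt{\frac{\sum fd^2}{N} - \left(\frac{\sum fd}{N}\right)^2} \times C$
Comparison
Absolute measure | Relative measure
1. Depends on the unit of measurement | Independent of the unit
2. Not suitable for comparison | Suitable for comparison
3. Comparatively easy to compute | Harder to compute and comprehend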
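The following sketch shows why the step-deviation short-cut needs the final × C. It compares the direct and short-cut results for two choices of A, using the grouped data of Question 4 (mid values and frequencies). The function names are hypothetical.

```python
import math

# Mid values m and frequencies f of the grouped marks data.
mids = [5.5, 15.5, 25.5, 35.5, 45.5, 55.5, 65.5, 75.5, 85.5, 95.5]
freqs = [3, 7, 13, 17, 12, 10, 8, 8, 6, 6]

def direct_mean_sd(mids, freqs):
    n = sum(freqs)
    mean = sum(f * m for m, f in zip(mids, freqs)) / n
    var = sum(f * (m - mean) ** 2 for m, f in zip(mids, freqs)) / n
    return mean, math.sqrt(var)

def step_deviation_mean_sd(mids, freqs, a, c):
    """Short-cut method with d = (m - A) / C; both results are rescaled by C."""
    n = sum(freqs)
    d = [(m - a) / c for m in mids]
    sum_fd = sum(f * di for di, f in zip(d, freqs))
    sum_fd2 = sum(f * di ** 2 for di, f in zip(d, freqs))
    mean = a + sum_fd / n * c
    sd = math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2) * c
    return mean, sd

print(direct_mean_sd(mids, freqs))
print(step_deviation_mean_sd(mids, freqs, a=45.5, c=10))
print(step_deviation_mean_sd(mids, freqs, a=55.5, c=10))
```

Omitting the final `* c` in the SD line would shrink the standard deviation by a factor of 10 here.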
INDIVIDUAL OBSERVATIONS
Question 1: For the individual observations 3, 6, 48 and 24, find the following.
Measures of averages: arithmetic mean; geometric mean; harmonic mean; median; fractiles ($Q_1$, $Q_3$, $O_6$, $D_7$ and $P_{75}$); mode.
Measures of dispersion (absolute measure; relative measure): range, coefficient of range; quartile deviation, coefficient of quartile deviation; mean deviation, coefficient of mean deviation; standard deviation / variance, coefficient of variation.
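Answer:
Measures of Averages
Mean | Formula | Calculation | Answer
AM | $\bar{X} = \frac{\sum X}{n}$ | (3 + 6 + 24 + 48)/4 = 81/4 | 20.25
GM | $GM = (X_1 \times X_2 \times \dots \times X_n)^{1/n}$ | $(3 \times 6 \times 24 \times 48)^{1/4} = (3^4 \times 4^4)^{1/4}$ | 12
HM | $HM = \frac{n}{\sum \frac{1}{X}}$ | $\frac{4}{\frac{1}{3} + \frac{1}{6} + \frac{1}{24} + \frac{1}{48}} = \frac{4 \times 48}{16 + 8 + 2 + 1} = \frac{192}{27}$ | 7.11
Note: GM by logarithms
X | 3 | 6 | 24 | 48
log X | 0.4771 | 0.7782 | 1.3802 | 1.6812
∑ log X = 4.3167
GM | $\text{Antilog}\left(\frac{\sum \log X}{n}\right)$ | Antilog(4.3167/4) = Antilog(1.0792) | 12.00 (agreeing with the direct method)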
Positional Average (observations in ascending order: 3, 6, 24, 48; n = 4)
Measure | Formula | Calculation | Answer
$M_e$ | size of $\left(\frac{n+1}{2}\right)^{\text{th}}$ obs | size of 2.5th obs = 2nd obs + 0.5(3rd obs − 2nd obs) = 6 + 0.5(24 − 6) | 15
$Q_1$ | size of $\left(\frac{n+1}{4}\right)^{\text{th}}$ obs | size of 1.25th obs = 1st obs + 0.25(2nd obs − 1st obs) = 3 + 0.25(6 − 3) | 3.75
$Q_3$ | size of $\left(\frac{3(n+1)}{4}\right)^{\text{th}}$ obs | size of 3.75th obs = 3rd obs + 0.75(4th obs − 3rd obs) = 24 + 0.75(48 − 24) | 42
$P_{75}$ | size of $\left(\frac{75(n+1)}{100}\right)^{\text{th}}$ obs | size of 3.75th obs (as for $Q_3$) | 42
$O_6$ | size of $\left(\frac{6(n+1)}{8}\right)^{\text{th}}$ obs | size of 3.75th obs (as for $Q_3$) | 42
Note: $Q_3 = O_6 = P_{75}$
$D_7$ | size of $\left(\frac{7(n+1)}{10}\right)^{\text{th}}$ obs | size of 3.5th obs = 3rd obs + 0.5(4th obs − 3rd obs) = 24 + 0.5(48 − 24) | 36
Mode
The mode is ill-defined, since every observation occurs the same number of times. Hence the empirical relation is used to find $M_o$.
$M_o$ | Mean − Mode = 3(Mean − Median) | 20.25 − Mode = 3(20.25 − 15) = 15.75 | 4.5
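Measures of Dispersion (Absolute and Relative)
Measure | Formula | Calculation | Answer
1 Range (R) | L − S | 48 − 3 | 45
Coefficient of range | $\frac{L - S}{L + S}$ | (48 − 3)/(48 + 3) = 45/51 | 0.8824
2 Quartile deviation (QD) | $\frac{Q_3 - Q_1}{2}$ | (42 − 3.75)/2 | 19.125
Coefficient of quartile deviation | $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ | (42 − 3.75)/(42 + 3.75) = 38.25/45.75 | 0.84
3 Mean deviation ($MD_{\bar{X}}$) | $\frac{1}{n}\sum \lvert X - \bar{X} \rvert$ | 63/4 | 15.75
Coefficient of MD | $\frac{MD_{\bar{X}}}{\text{Mean}}$ | 15.75/20.25 | 0.778
4 Standard deviation (s) | $\sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ | √(1284.75/4) | 17.921
or | $\sqrt{\frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2}$ | √(2925/4 − (81/4)²) = √(731.25 − 410.0625) | 17.921
Var(X) | $s^2$ | 1284.75/4 | 321.19
Coefficient of variation (CV) | $\frac{s}{\bar{X}} \times 100$ | (17.921/20.25) × 100 | 88.50%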
Working note
X | $\lvert X - \bar{X} \rvert$ | $X - \bar{X}$ | $(X - \bar{X})^2$ | $X^2$
3 | 17.25 | −17.25 | 297.5625 | 9
6 | 14.25 | −14.25 | 203.0625 | 36
24 | 3.75 | 3.75 | 14.0625 | 576
48 | 27.75 | 27.75 | 770.0625 | 2304
Total | 63 | 0 | 1284.75 | 2925
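The individual-observation answers can be cross-checked with a Python sketch. It assumes the (n + 1) positional rule with linear interpolation used in this chapter. Other textbooks and software use different quantile rules, so their results may differ. The helper names are hypothetical.

```python
import math

def fractile(data, e, parts):
    """Size of the e(n + 1)/parts-th observation, interpolating linearly
    between neighbouring observations (the rule used in this chapter)."""
    xs = sorted(data)
    pos = e * (len(xs) + 1) / parts        # 1-based position, e.g. 2.5
    k = int(pos)                           # whole part: the k-th observation
    frac = pos - k
    if k < 1:
        return xs[0]
    if k >= len(xs):
        return xs[-1]
    return xs[k - 1] + frac * (xs[k] - xs[k - 1])

def summary(data):
    n = len(data)
    mean = sum(data) / n
    median = fractile(data, 1, 2)
    q1, q3 = fractile(data, 1, 4), fractile(data, 3, 4)
    sd = math.sqrt(sum((x - mean) ** 2 for x in data) / n)
    return {
        "mean": mean,
        "gm": math.prod(data) ** (1 / n),
        "hm": n / sum(1 / x for x in data),
        "median": median, "q1": q1, "q3": q3,
        "d7": fractile(data, 7, 10),
        "mode_empirical": 3 * median - 2 * mean,
        "range": max(data) - min(data),
        "qd": (q3 - q1) / 2,
        "md": sum(abs(x - mean) for x in data) / n,
        "sd": sd,
        "cv": sd / mean * 100,
    }

for key, value in summary([3, 6, 48, 24]).items():
    print(f"{key}: {value:.4f}")
```

The same `fractile` function gives the Question 2 answers below when applied to 1, 3, 6, 24, 48.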
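Question 2: Find the median, $Q_1$, $Q_3$, $O_6$, $D_7$ and $P_{75}$ for the observations 1, 3, 6, 24, 48.
Answer:
Positional Average (n = 5)
Measure | Formula | Calculation | Answer
$M_e$ | size of $\left(\frac{n+1}{2}\right)^{\text{th}}$ obs | size of 3rd obs | 6
$Q_1$ | size of $\left(\frac{n+1}{4}\right)^{\text{th}}$ obs | size of 1.5th obs = 1st obs + 0.5(2nd obs − 1st obs) = 1 + 0.5(3 − 1) | 2
$Q_3$ | size of $\left(\frac{3(n+1)}{4}\right)^{\text{th}}$ obs | size of 4.5th obs = 4th obs + 0.5(5th obs − 4th obs) = 24 + 0.5(48 − 24) | 36
$P_{75}$ | size of $\left(\frac{75(n+1)}{100}\right)^{\text{th}}$ obs | size of 4.5th obs (as for $Q_3$) | 36
$O_6$ | size of $\left(\frac{6(n+1)}{8}\right)^{\text{th}}$ obs | size of 4.5th obs (as for $Q_3$) | 36
Note: $Q_3 = O_6 = P_{75}$
$D_7$ | size of $\left(\frac{7(n+1)}{10}\right)^{\text{th}}$ obs | size of 4.2th obs = 4th obs + 0.2(5th obs − 4th obs) = 24 + 0.2(48 − 24) | 28.8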
Question 3: Discrete Frequency Distribution
x | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19
f | 8 | 15 | 20 | 100 | 98 | 95 | 90 | 75 | 50 | 30
Answer:
Measures of Averages
Measure | Formula | Calculation | Answer
1 Arithmetic mean ($\bar{x}$) | $\frac{\sum fX}{N}$ | 8767/581 | 15.09
2 Geometric mean (GM) | $\text{Antilog}\left(\frac{\sum f \log X}{N}\right)$ | Antilog(682.4203/581) = Antilog(1.1746) | 14.95
3 Harmonic mean (HM) | $\frac{N}{\sum \frac{f}{X}}$ | 581/39.25 | 14.80
Working Note:
X | f | fX | log X | f log X | f/X
10 | 8 | 80 | 1.0000 | 8.0000 | 0.800
11 | 15 | 165 | 1.0414 | 15.6210 | 1.364
12 | 20 | 240 | 1.0792 | 21.5840 | 1.667
13 | 100 | 1300 | 1.1139 | 111.3900 | 7.692
14 | 98 | 1372 | 1.1461 | 112.3178 | 7.000
15 | 95 | 1425 | 1.1761 | 111.7295 | 6.333
16 | 90 | 1440 | 1.2041 | 108.3690 | 5.625
17 | 75 | 1275 | 1.2304 | 92.2800 | 4.412
18 | 50 | 900 | 1.2553 | 62.7650 | 2.778
19 | 30 | 570 | 1.2788 | 38.3640 | 1.579
Total | 581 | 8767 | | 682.4203 | 39.25
Positional Average
Measure | Formula | Calculation | Answer
$M_e$ | size of $\left(\frac{N+1}{2}\right)^{\text{th}}$ obs | size of 291st obs (first cf ≥ 291) | 15
$Q_1$ | size of $\left(\frac{N+1}{4}\right)^{\text{th}}$ obs | size of 145.5th obs (first cf ≥ 145.5) | 14
$Q_3$ | size of $\left(\frac{3(N+1)}{4}\right)^{\text{th}}$ obs | size of 436.5th obs (first cf ≥ 436.5) | 17
$P_{75}$ | size of $\left(\frac{75(N+1)}{100}\right)^{\text{th}}$ obs | size of 436.5th obs (as for $Q_3$) | 17
$O_6$ | size of $\left(\frac{6(N+1)}{8}\right)^{\text{th}}$ obs | size of 436.5th obs (as for $Q_3$) | 17
Note: $Q_3 = O_6 = P_{75}$
$D_7$ | size of $\left(\frac{7(N+1)}{10}\right)^{\text{th}}$ obs | size of 407.4th obs (first cf ≥ 407.4) | 16
Working note (cumulative frequencies):
X | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19
f | 8 | 15 | 20 | 100 | 98 | 95 | 90 | 75 | 50 | 30
cf | 8 | 23 | 43 | 143 | 241 | 336 | 426 | 501 | 551 | 581
Mode: Since there is a sudden jump in frequency (from 20 to 100), the mode is obtained by the grouping method.
Grouping Table: the highest total in each of the six columns is identified, and each value of X contributing to it receives a tally mark.
X | (1) f | (2) | (3) | (4) | (5) | (6) | Tally | Total
10 | 8 | 23 | | 43 | | | | 0
11 | 15 | | 35 | | 135 | | | 0
12 | 20 | 120 | | | | 218 | | 0
13 | 100 | | 198 | 293 | | | III | 3
14 | 98 | 193 | | | 283 | | IIII | 4
15 | 95 | | 185 | | | 260 | IIII | 4
16 | 90 | 165 | | 215 | | | II | 2
17 | 75 | | 125 | | 155 | | I | 1
18 | 50 | 80 | | | | | | 0
19 | 30 | | | | | | | 0
Explanation of the columns:
(1) original frequencies
(2) grouping in twos
(3) leaving the first and grouping the rest in twos
(4) grouping in threes
(5) leaving the first and grouping in threes
(6) leaving the first and second and grouping in threes
Highest totals: (1) 100 for X = 13; (2) 193 for X = 14, 15; (3) 198 for X = 13, 14; (4) 293 for X = 13, 14, 15; (5) 283 for X = 14, 15, 16; (6) 260 for X = 15, 16, 17.
Mode
The mode is ill-defined (bimodal), since 14 and 15 receive the same, highest, number of tally marks. Hence the empirical relation is used to find $M_o$.
$M_o$ | Mean − Mode = 3(Mean − Median) | 15.09 − Mode = 3(15.09 − 15) = 0.27 | 14.82
Points to Ponder:
Under the location method, Mode = 13 (as the highest frequency is 100).
Under the grouping method, the mode is ill-defined.
Under the empirical relationship, Mode = 14.82, which raises questions about accuracy.
Measures of Dispersion
Measure | Formula | Calculation | Answer
1 Range (R) | L − S | 19 − 10 | 9
Coefficient of range | $\frac{L - S}{L + S}$ | (19 − 10)/(19 + 10) = 9/29 | 0.31
2 Quartile deviation (QD) | $\frac{Q_3 - Q_1}{2}$ | (17 − 14)/2 | 1.5
Coefficient of quartile deviation | $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ | (17 − 14)/(17 + 14) = 3/31 | 0.0968
3 Mean deviation ($MD_{\bar{X}}$) | $\frac{1}{N}\sum f\lvert X - \bar{X} \rvert$ | 976.19/581 | 1.68
Coefficient of MD | $\frac{MD_{\bar{X}}}{\text{Mean}}$ | 1.68/15.09 | 0.111
4 Standard deviation (s) | $\sqrt{\frac{\sum f(X - \bar{X})^2}{N}}$ | √(2433.35/581) = √4.188 | 2.05
Var(X) | $s^2$ | 2433.35/581 | 4.19
Coefficient of variation (CV) | $\frac{s}{\bar{X}} \times 100$ | (2.0465/15.09) × 100 | 13.56%
Working Note (for MD and SD, with $\bar{X}$ = 15.09):
X | f | $\lvert X - \bar{X} \rvert$ | $f\lvert X - \bar{X} \rvert$ | $X - \bar{X}$ | $f(X - \bar{X})^2$
10 | 8 | 5.09 | 40.72 | −5.09 | 207.2648
11 | 15 | 4.09 | 61.35 | −4.09 | 250.9215
12 | 20 | 3.09 | 61.80 | −3.09 | 190.9620
13 | 100 | 2.09 | 209.00 | −2.09 | 436.8100
14 | 98 | 1.09 | 106.82 | −1.09 | 116.4338
15 | 95 | 0.09 | 8.55 | −0.09 | 0.7695
16 | 90 | 0.91 | 81.90 | 0.91 | 74.5290
17 | 75 | 1.91 | 143.25 | 1.91 | 273.6075
18 | 50 | 2.91 | 145.50 | 2.91 | 423.4050
19 | 30 | 3.91 | 117.30 | 3.91 | 458.6430
∑ | 581 | | 976.19 | | 2433.3461
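This Python sketch verifies the arithmetic of Question 3: ΣfX, the cumulative-frequency positions, the SD and the grouping table. The helper names are hypothetical. It assumes that when several groups in a column share the maximum total, every one of them is tallied.

```python
import math
from itertools import accumulate

xs = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
fs = [8, 15, 20, 100, 98, 95, 90, 75, 50, 30]

def discrete_position(xs, fs, position):
    """Value of X whose cumulative frequency first reaches `position`."""
    for x, cf in zip(xs, accumulate(fs)):
        if cf >= position:
            return x
    return xs[-1]

def grouping_tally(xs, fs):
    """Grouping method: tally the X values behind each column's highest total."""
    tally = {x: 0 for x in xs}
    # (group size, starting offset) for columns (1) to (6)
    for size, start in [(1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2)]:
        groups = [range(i, i + size) for i in range(start, len(fs) - size + 1, size)]
        totals = [sum(fs[j] for j in g) for g in groups]
        best = max(totals)
        for g, total in zip(groups, totals):
            if total == best:
                for j in g:
                    tally[xs[j]] += 1
    return tally

n = sum(fs)
mean = sum(f * x for x, f in zip(xs, fs)) / n
sd = math.sqrt(sum(f * (x - mean) ** 2 for x, f in zip(xs, fs)) / n)
median = discrete_position(xs, fs, (n + 1) / 2)
print(f"N = {n}, mean = {mean:.4f}, median = {median}, sd = {sd:.4f}")
print("Grouping tally:", grouping_tally(xs, fs))
```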
Question 4: Continuous Frequency Distribution:
Marks | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | 51–60 | 61–70 | 71–80 | 81–90 | 91–100
Number of students | 3 | 7 | 13 | 17 | 12 | 10 | 8 | 8 | 6 | 6
Also verify the empirical relation.
Answer:
Measures of Averages
Measure | Formula | Calculation | Answer
AM (direct method) | $\frac{\sum fm}{N}$ | 4375/90 | 48.61
AM (short-cut method, A = 45.5) | $A + \frac{\sum fd}{N} \times c$ | 45.5 + (28/90) × 10 | 48.61
AM (short-cut method, A = 55.5) | $A + \frac{\sum fd}{N} \times c$ | 55.5 + (−62/90) × 10 | 48.61
Geometric mean (GM) | $\text{Antilog}\left(\frac{\sum f \log m}{N}\right)$ | Antilog(145.5550/90) = Antilog(1.6173) | 41.43
Harmonic mean (HM) | $\frac{N}{\sum \frac{f}{m}}$ | 90/2.7905 | 32.25
Working Note:
Marks (class boundaries) | m | f | fm | $d = \frac{m - 45.5}{10}$ | fd | log m | f log m | f/m
0.5 – 10.5 | 5.5 | 3 | 16.5 | −4 | −12 | 0.7404 | 2.2212 | 0.5454
10.5 – 20.5 | 15.5 | 7 | 108.5 | −3 | −21 | 1.1903 | 8.3321 | 0.4516
20.5 – 30.5 | 25.5 | 13 | 331.5 | −2 | −26 | 1.4065 | 18.2845 | 0.5098
30.5 – 40.5 | 35.5 | 17 | 603.5 | −1 | −17 | 1.5502 | 26.3534 | 0.4789
40.5 – 50.5 | 45.5 | 12 | 546.0 | 0 | 0 | 1.6580 | 19.8960 | 0.2637
50.5 – 60.5 | 55.5 | 10 | 555.0 | 1 | 10 | 1.7443 | 17.4430 | 0.1801
60.5 – 70.5 | 65.5 | 8 | 524.0 | 2 | 16 | 1.8162 | 14.5296 | 0.1221
70.5 – 80.5 | 75.5 | 8 | 604.0 | 3 | 24 | 1.8779 | 15.0232 | 0.1060
80.5 – 90.5 | 85.5 | 6 | 513.0 | 4 | 24 | 1.9320 | 11.5920 | 0.0701
90.5 – 100.5 | 95.5 | 6 | 573.0 | 5 | 30 | 1.9800 | 11.8800 | 0.0628
Total | | 90 | 4375.0 | | 28 | | 145.5550 | 2.7905
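Positional Average and Mode
Measure | Formula | Calculation | Answer
$M_e$ | $l + \frac{\frac{N}{2} - m}{f} \times c$ | 40.5 + ((45 − 40)/12) × 10 | 44.67
$Q_1$ | $l + \frac{\frac{N}{4} - m}{f} \times c$ | 20.5 + ((22.5 − 10)/13) × 10 | 30.12
$Q_3$ | $l + \frac{\frac{3N}{4} - m}{f} \times c$ | 60.5 + ((67.5 − 62)/8) × 10 | 67.38
$O_6$ | $l + \frac{\frac{6N}{8} - m}{f} \times c$ | as for $Q_3$ | 67.38
$P_{75}$ | $l + \frac{\frac{75N}{100} - m}{f} \times c$ | as for $Q_3$ | 67.38
Note: $Q_3 = O_6 = P_{75} = 67.38$
$D_7$ | $l + \frac{\frac{7N}{10} - m}{f} \times c$ | 60.5 + ((63 − 62)/8) × 10 | 61.75
$M_o$ | $l_1 + \left(\frac{f_0 - f_{-1}}{2f_0 - f_{-1} - f_1}\right) \times C$ | 30.5 + ((17 − 13)/(2 × 17 − 13 − 12)) × 10 | 34.94
The modal class is 30.5 – 40.5, since 17 is the highest frequency.
Working note (cumulative frequencies):
Class | f | cf
0.5 – 10.5 | 3 | 3
10.5 – 20.5 | 7 | 10
20.5 – 30.5 | 13 | 23
30.5 – 40.5 | 17 | 40
40.5 – 50.5 | 12 | 52
50.5 – 60.5 | 10 | 62
60.5 – 70.5 | 8 | 70
70.5 – 80.5 | 8 | 78
80.5 – 90.5 | 6 | 84
90.5 – 100.5 | 6 | 90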
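The interpolation formulas used above (the second form, l + (eN/F − m)/f × c, together with the modal formula) can be sketched in Python as follows. The sketch assumes equal class widths and a single highest-frequency class; if there is a tie, the first maximum is used. The function names are hypothetical.

```python
# Lower class boundaries and frequencies of the grouped marks data.
lower = [0.5, 10.5, 20.5, 30.5, 40.5, 50.5, 60.5, 70.5, 80.5, 90.5]
freqs = [3, 7, 13, 17, 12, 10, 8, 8, 6, 6]
WIDTH = 10

def grouped_fractile(lower, freqs, width, e, parts):
    """l + (eN/F - m) / f * c, where m is the cf preceding the fractile class."""
    target = e * sum(freqs) / parts
    cf_before = 0
    for l, f in zip(lower, freqs):
        if cf_before + f >= target:
            return l + (target - cf_before) / f * width
        cf_before += f
    raise ValueError("target position beyond the total frequency")

def grouped_mode(lower, freqs, width):
    """l1 + (f0 - f_-1) / (2 f0 - f_-1 - f_1) * C for the highest-frequency class."""
    k = freqs.index(max(freqs))
    f0 = freqs[k]
    f_prev = freqs[k - 1] if k > 0 else 0
    f_next = freqs[k + 1] if k + 1 < len(freqs) else 0
    return lower[k] + (f0 - f_prev) / (2 * f0 - f_prev - f_next) * width

median = grouped_fractile(lower, freqs, WIDTH, 1, 2)
q1 = grouped_fractile(lower, freqs, WIDTH, 1, 4)
q3 = grouped_fractile(lower, freqs, WIDTH, 3, 4)
d7 = grouped_fractile(lower, freqs, WIDTH, 7, 10)
mode = grouped_mode(lower, freqs, WIDTH)
print(f"Me = {median:.2f}, Q1 = {q1:.2f}, Q3 = {q3:.2f}, D7 = {d7:.2f}, Mo = {mode:.2f}")
```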
Graphical Method: Ogive Curves for Positional Averages
Marks | Number of students | Upper boundary | Less-than cf | Lower boundary | More-than cf
0.5 – 10.5 | 3 | 10.5 | 3 | 0.5 | 90 (= ∑f)
10.5 – 20.5 | 7 | 20.5 | 10 | 10.5 | 87
20.5 – 30.5 | 13 | 30.5 | 23 | 20.5 | 80
30.5 – 40.5 | 17 | 40.5 | 40 | 30.5 | 67
40.5 – 50.5 | 12 | 50.5 | 52 | 40.5 | 50
50.5 – 60.5 | 10 | 60.5 | 62 | 50.5 | 38
60.5 – 70.5 | 8 | 70.5 | 70 | 60.5 | 28
70.5 – 80.5 | 8 | 80.5 | 78 | 70.5 | 20
80.5 – 90.5 | 6 | 90.5 | 84 | 80.5 | 12
90.5 – 100.5 | 6 | 100.5 | 90 (= ∑f) | 90.5 | 6
Verification of the Empirical Relation:
Mean − Mode = 3(Mean − Median)
i.e. 48.61 − 34.94 = 3(48.61 − 44.67)
13.67 = 3(3.94)
13.67 ≠ 11.82, so the empirical relation does not hold exactly for this distribution.
Graphical Method
$M_o$ = 35 (graphical method)
$Q_1$ = 30, $M_e$ = 45 and $Q_3$ = 67
Measures of Dispersion
Measure | Formula | Calculation | Answer
1 Range (R) | L − S | 100 − 1 | 99
Alternatively (using class boundaries) | L − S | 100.5 − 0.5 | 100
Coefficient of range | $\frac{L - S}{L + S}$ | (100 − 1)/(100 + 1) | 0.98
2 Quartile deviation (QD) | $\frac{Q_3 - Q_1}{2}$ | (67.38 − 30.12)/2 | 18.63
Coefficient of quartile deviation | $\frac{Q_3 - Q_1}{Q_3 + Q_1}$ | (67.38 − 30.12)/(67.38 + 30.12) | 0.38
3 Mean deviation ($MD_{\bar{X}}$) | $\frac{1}{N}\sum f\lvert m - \bar{X} \rvert$ | 1843.54/90 | 20.48
Coefficient of MD | $\frac{MD_{\bar{X}}}{\text{Mean}}$ | 20.48/48.61 | 0.42
4 Standard deviation (s) | $\sqrt{\frac{\sum f(m - \bar{X})^2}{N}}$ | √(53128.89/90) = √590.32 | 24.30
Var(X) | $s^2$ | 53128.89/90 | 590.32
Coefficient of variation (CV) | $\frac{s}{\bar{X}} \times 100$ | (24.30/48.61) × 100 | 49.99%
Working Notes
Marks (class boundaries) | m | f | $\lvert m - \bar{X} \rvert$ | $f\lvert m - \bar{X} \rvert$ | $(m - \bar{X})^2$ | $f(m - \bar{X})^2$
0.5 – 10.5 | 5.5 | 3 | 43.11 | 129.33 | 1858.4721 | 5575.4163
10.5 – 20.5 | 15.5 | 7 | 33.11 | 231.77 | 1096.2721 | 7673.9047
20.5 – 30.5 | 25.5 | 13 | 23.11 | 300.43 | 534.0721 | 6942.9373
30.5 – 40.5 | 35.5 | 17 | 13.11 | 222.87 | 171.8721 | 2921.8257
40.5 – 50.5 | 45.5 | 12 | 3.11 | 37.32 | 9.6721 | 116.0652
50.5 – 60.5 | 55.5 | 10 | 6.89 | 68.90 | 47.4721 | 474.7210
60.5 – 70.5 | 65.5 | 8 | 16.89 | 135.12 | 285.2721 | 2282.1768
70.5 – 80.5 | 75.5 | 8 | 26.89 | 215.12 | 723.0721 | 5784.5768
80.5 – 90.5 | 85.5 | 6 | 36.89 | 221.34 | 1360.8721 | 8165.2326
90.5 – 100.5 | 95.5 | 6 | 46.89 | 281.34 | 2198.6721 | 13192.0326
Total | | 90 | | 1843.54 | | 53128.8890
PROPERTIES: (A) MEASURES OF AVERAGES / CENTRAL TENDENCY
Arithmetic Mean
Property 1: If all the observations assumed by a variable are constant, say k, then the AM is also k.
Illustration: Consider 2, 2, 2.
Property | Calculation | Answer
$\bar{X} = \frac{k + k + \dots + k}{n} = k$ | (2 + 2 + 2)/3 | 2
Property 2: (a) The algebraic sum of the deviations of a set of observations from their AM is zero. (b) The sum of the squares of the deviations taken from the mean ($\bar{X}$) is always minimum compared with the sum of squares of deviations taken from any other assumed mean (A).
Illustration: Consider (X): 2, 3, 4.
Property | Formula | Calculation | Answer
 | $\bar{X} = \frac{\sum X}{n}$ | (2 + 3 + 4)/3 | 3
(a) $\sum (X - \bar{X}) = 0$; for a frequency distribution, $\sum f(X - \bar{X}) = 0$ | $\sum (X - \bar{X})$ | (2 − 3) + (3 − 3) + (4 − 3) | 0
(b) $\sum (X - \bar{X})^2 \le \sum (X - A)^2$ | $\sum (X - \bar{X})^2$ | (2 − 3)² + (3 − 3)² + (4 − 3)² | 2
 | $\sum (X - A)^2$, where A = 4 | (2 − 4)² + (3 − 4)² + (4 − 4)² | 5
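Here is a quick numerical check of Property 2 for the data 2, 3, 4. Part (b) is checked by scanning a grid of candidate points A. The grid search is only an illustration, not a proof.

```python
def sum_sq_dev(data, a):
    """Sum of squared deviations of the data from a point a."""
    return sum((x - a) ** 2 for x in data)

data = [2, 3, 4]
mean = sum(data) / len(data)
print(sum(x - mean for x in data))             # property (a): deviations cancel
# property (b): try A = 0.0, 0.1, ..., 6.0 and keep the A with the smallest sum
candidates = [a / 10 for a in range(0, 61)]
best = min(candidates, key=lambda a: sum_sq_dev(data, a))
print(mean, best, sum_sq_dev(data, mean), sum_sq_dev(data, 4))
```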
Property 3: The AM is affected by a change of origin (+/−) and/or scale (×/÷).
That is, if $y = a + bx$, then the AM of y is $\bar{y} = a + b\bar{x}$ (where a is the change of origin and b is the change of scale).
Illustration: Consider (X) = 2, 3, 4.
No. | Data | Direct: $\bar{y} = \frac{\sum Y}{n}$ | Change | Using $\bar{y} = a + b\bar{x}$ | Answer
1 | X = 2, 3, 4 | $\bar{x}$ = (2 + 3 + 4)/3 | | | 3
2 | Y = 4, 5, 6 | (4 + 5 + 6)/3 = 5 | change of origin (a = 2), Y = X + 2 | 2 + 1 × 3 | 5
3 | Y = 0, 1, 2 | (0 + 1 + 2)/3 = 1 | change of origin (a = −2), Y = X − 2 | −2 + 1 × 3 | 1
4 | Y = 4, 6, 8 | (4 + 6 + 8)/3 = 6 | change of scale (b = 2), Y = X × 2 | 0 + 2 × 3 | 6
5 | Y = 1, 1.5, 2 | (1 + 1.5 + 2)/3 = 1.5 | change of scale (b = 1/2), Y = X × 1/2 | 0 + (1/2) × 3 | 1.5
6 | Y = 7, 9, 11 | (7 + 9 + 11)/3 = 9 | change of origin and scale (a = 3, b = 2), Y = 3 + 2 × X | 3 + 2 × 3 | 9
Property 4: If two groups contain $n_1$ and $n_2$ observations with respective arithmetic means $\bar{x}_1$ and $\bar{x}_2$, the combined AM is $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$.
Illustration | Combined mean | Calculation | Answer
Group I: $n_1$ = 5, $\bar{x}_1$ = 9; Group II: $n_2$ = 15, $\bar{x}_2$ = 5 | $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ | ((5 × 9) + (15 × 5))/(5 + 15) = 120/20 | 6
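Points to Ponder:
1. For n groups, the combined mean is $\bar{x}_{1 \dots n} = \frac{\sum n_i\bar{x}_i}{\sum n_i}$.
2. If the groups are of the same size, the combined mean is the average of the group means.
Explanation: if $n_1 = n_2 = n$, then $\bar{x}_{1+2} = \frac{n\bar{x}_1 + n\bar{x}_2}{n + n} = \frac{n(\bar{x}_1 + \bar{x}_2)}{2n} = \frac{\bar{x}_1 + \bar{x}_2}{2}$.
Illustration
No. | Formula | Calculation | Answer
1 | $X_1$ = 2, 3, 4: $\bar{x}_1 = \frac{\sum X_1}{n}$ | (2 + 3 + 4)/3 | 3
2 | $X_2$ = 4, 5, 6: $\bar{x}_2 = \frac{\sum X_2}{n}$ | (4 + 5 + 6)/3 | 5
3 | $\bar{x}_{1+2} = \frac{n\bar{x}_1 + n\bar{x}_2}{n + n}$ | (3 × 3 + 3 × 5)/(3 + 3) | 4
 | $\bar{x}_{1+2} = \frac{n(\bar{x}_1 + \bar{x}_2)}{2n}$ | 3(3 + 5)/(2 × 3) | 4
 | $\bar{x}_{1+2} = \frac{\bar{x}_1 + \bar{x}_2}{2}$ | (3 + 5)/2 | 4
3. If the group means are the same, the combined mean equals that common mean.
Explanation: if $\bar{x}_1 = \bar{x}_2 = \bar{x}$, then $\bar{x}_{12} = \frac{n_1\bar{x} + n_2\bar{x}}{n_1 + n_2} = \frac{\bar{x}(n_1 + n_2)}{n_1 + n_2} = \bar{x}$.
Illustration
No. | Formula | Calculation | Answer
1 | $X_1$ = 2, 3, 4: $\bar{x}_1 = \frac{\sum X_1}{n_1}$ | (2 + 3 + 4)/3 | 3
2 | $X_2$ = 4, 2: $\bar{x}_2 = \frac{\sum X_2}{n_2}$ | (4 + 2)/2 | 3
3 | $\bar{x}_{12} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}$ | (3 × 3 + 2 × 3)/(3 + 2) | 3
 | $\frac{\bar{x}(n_1 + n_2)}{n_1 + n_2}$ | 3(3 + 2)/(3 + 2) | 3
 | $\bar{x}_{12} = \bar{x}_1 = \bar{x}_2$ | | 3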
Geometric Mean
Property 1: Transformation in terms of the log function: $GM = \text{Antilog}\left(\frac{1}{n}\sum \log x\right)$, or $\log GM = \frac{1}{n}\sum \log x$.
Property 2: If all the observations assumed by a variable are constant, say $k > 0$, then the GM of the observations is also k.
Property | Illustration | Calculation | Answer
$(k \times k \times \dots \times k)^{1/n} = k$ | Consider 2, 2, 2 | $(2 \times 2 \times 2)^{1/3}$ | 2
Property 3: The GM of the product of two variables is the product of their GMs.
Property 4: The GM of the ratio of two variables is the ratio of their GMs.
Illustration | Formula | Calculation | Answer
X = 3, 6, 12 | $GM = (X_1 \times X_2 \times \dots \times X_n)^{1/n}$ | $(3 \times 6 \times 12)^{1/3}$ | 6
Y = 1, 2, 4 | | $(1 \times 2 \times 4)^{1/3}$ | 2
Z = X × Y = 3, 12, 48 | | $(3 \times 12 \times 48)^{1/3}$ | 12
Property 3 (Z = X × Y) | $GM_Z = GM_X \times GM_Y$ | 6 × 2 | 12
Z = X/Y = 3/1, 6/2, 12/4 = 3, 3, 3 | | $(3 \times 3 \times 3)^{1/3}$ | 3
Property 4 (Z = X/Y) | $GM_Z = \frac{GM_X}{GM_Y}$ | 6/2 | 3
Harmonic Mean:
Property 1: If all the observations taken by a variable are constant, say k, then the HM of the observations is also k.
Property | Illustration | Calculation | Answer
$\frac{n}{\frac{1}{k} + \frac{1}{k} + \dots + \frac{1}{k}} = k$ | X = 2, 2, 2 | $\frac{3}{\frac{1}{2} + \frac{1}{2} + \frac{1}{2}}$ | 2
Property 2: If two groups contain $n_1$ and $n_2$ observations with respective harmonic means $H_1$ and $H_2$, the combined HM is $H_{12} = \frac{n_1 + n_2}{\frac{n_1}{H_1} + \frac{n_2}{H_2}}$.
Illustration | Combined HM | Calculation | Answer
Group I: $n_1$ = 15, $H_1$ = 3; Group II: $n_2$ = 10, $H_2$ = 2 | $H_{12} = \frac{n_1 + n_2}{\frac{n_1}{H_1} + \frac{n_2}{H_2}}$ | (15 + 10)/(15/3 + 10/2) = 25/10 | 2.5
Median:
Property 1: If x and y are two variables related by $Y = a + bX$ for any two constants a and b, then the median of y is $Y_{Me} = a + bX_{Me}$ (i.e. the median is affected by a change of origin (+/−) and/or scale (×/÷)).
Illustration: Consider (X) = 2, 3, 4.
No. | Data | Direct: size of $\left(\frac{n+1}{2}\right)^{\text{th}}$ obs | Change | Using $Y_{Me} = a + bX_{Me}$ | Answer
1 | X = 2, 3, 4 | $\left(\frac{3+1}{2}\right)^{\text{th}}$ = 2nd obs | | | 3
2 | Y = 4, 5, 6 | 2nd obs = 5 | change of origin (a = 2), Y = X + 2 | 2 + 1 × 3 | 5
3 | Y = 0, 1, 2 | 2nd obs = 1 | change of origin (a = −2), Y = X − 2 | −2 + 1 × 3 | 1
4 | Y = 4, 6, 8 | 2nd obs = 6 | change of scale (b = 2), Y = X × 2 | 0 + 2 × 3 | 6
5 | Y = 1, 1.5, 2 | 2nd obs = 1.5 | change of scale (b = 1/2), Y = X × 1/2 | 0 + (1/2) × 3 | 1.5
6 | Y = 7, 9, 11 | 2nd obs = 9 | change of origin and scale (a = 3, b = 2), Y = 3 + 2 × X | 3 + 2 × 3 | 9
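Property 2: For a set of observations, the sum of absolute deviations is minimum when the deviations are taken from the median.
Illustration: Consider (X): 0.5, 3, 4.
Measure | Calculation | Answer
$M_e$ = size of $\left(\frac{n+1}{2}\right)^{\text{th}}$ obs | $\left(\frac{3+1}{2}\right)^{\text{th}}$ = 2nd obs | 3
$\bar{X} = \frac{\sum X}{n}$ | (0.5 + 3 + 4)/3 | 2.5
(a) $\sum \lvert X - \bar{X} \rvert$ | $\lvert 0.5 - 2.5 \rvert + \lvert 3 - 2.5 \rvert + \lvert 4 - 2.5 \rvert$ | 4
(b) $\sum \lvert X - M_e \rvert$ | $\lvert 0.5 - 3 \rvert + \lvert 3 - 3 \rvert + \lvert 4 - 3 \rvert$ | 3.5
Property: (b) < (a)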
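Here is a numerical illustration of the property for the data 0.5, 3, 4. It compares the sum of absolute deviations about the mean and about the median, then scans a grid of candidate points (the grid search is illustrative only).

```python
def sum_abs_dev(data, a):
    """Sum of absolute deviations of the data from a point a."""
    return sum(abs(x - a) for x in data)

data = [0.5, 3, 4]
median = sorted(data)[len(data) // 2]      # odd n: the middle observation
mean = sum(data) / len(data)
print(sum_abs_dev(data, mean), sum_abs_dev(data, median))
# Try A = 0.00, 0.01, ..., 5.00 and keep the A with the smallest sum.
grid = [i / 100 for i in range(0, 501)]
best = min(grid, key=lambda a: sum_abs_dev(data, a))
print(best)
```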
Mode:
Property 1: If $Y = a + bX$, then $Y_{Mo} = a + bX_{Mo}$ (i.e. the mode is affected by a change of origin (+/−) and/or scale (×/÷)).
Illustration: Consider (X) = 2, 3, 3, 4.
No. | Data | Direct: most usual value | Change | Using $Y_{Mo} = a + bX_{Mo}$ | Answer
1 | X = 2, 3, 3, 4 | 3 | | | 3
2 | Y = 4, 5, 5, 6 | 5 | change of origin (a = 2), Y = X + 2 | 2 + 1 × 3 | 5
3 | Y = 0, 1, 1, 2 | 1 | change of origin (a = −2), Y = X − 2 | −2 + 1 × 3 | 1
4 | Y = 4, 6, 6, 8 | 6 | change of scale (b = 2), Y = X × 2 | 0 + 2 × 3 | 6
5 | Y = 1, 1.5, 1.5, 2 | 1.5 | change of scale (b = 1/2), Y = X × 1/2 | 0 + (1/2) × 3 | 1.5
6 | Y = 7, 9, 9, 11 | 9 | change of origin and scale (a = 3, b = 2), Y = 3 + 2 × X | 3 + 2 × 3 | 9
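(B) MEASURES OF DISPERSION: PROPERTIES
Property 1: If all the observations assumed by a variable are constant, every measure of dispersion is 0: range (R) = 0, mean deviation (MD) = 0, standard deviation (s) = 0.
Illustration: Consider (X): 2, 2, 2.
Measure | Formula | Calculation | Answer
Mean | $\bar{X} = \frac{\sum X}{n}$ | (2 + 2 + 2)/3 | 2
Range | L − S | 2 − 2 | 0
Mean deviation | $MD_{\bar{X}} = \frac{1}{n}\sum \lvert X - \bar{X} \rvert$ | $\frac{\lvert 2 - 2 \rvert + \lvert 2 - 2 \rvert + \lvert 2 - 2 \rvert}{3}$ | 0
Standard deviation | $s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$ | $\sqrt{\frac{\sum (X - 2)^2}{3}}$ | 0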
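Property 2: Measures of dispersion are affected by a change of scale but not by a change of origin. If $Y = a + bX$: $R_Y = \lvert b \rvert \times R_X$, $MD_Y = \lvert b \rvert \times MD_X$ and $s_Y = \lvert b \rvert \times s_X$.
Property 3: The mean deviation takes its minimum value when A = median: $MD_{M_e} = \frac{1}{n}\sum \lvert X - M_e \rvert$ is minimum.
Property 4: Combined SD: $s_{12} = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2 + n_1 d_1^2 + n_2 d_2^2}{n_1 + n_2}}$, where $d_1 = \bar{x}_1 - \bar{x}_{12}$ and $d_2 = \bar{x}_2 - \bar{x}_{12}$.
Note: if $\bar{x}_1 = \bar{x}_2$, then $\bar{x}_1 = \bar{x}_2 = \bar{x}_{12}$, so $d_1 = d_2 = 0$ and $s_{12} = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}}$.
Illustration | Calculation | Answer
Group I: $n_1$ = 5, $\bar{x}_1$ = 9, $s_1$ = 0.8; Group II: $n_2$ = 15, $\bar{x}_2$ = 5, $s_2$ = 0.5; $\bar{x}_{12}$ = 6 | $d_1$ = 9 − 6 = 3; $d_2$ = 5 − 6 = −1; $s_{12} = \sqrt{\frac{5 \times 0.8^2 + 15 \times 0.5^2 + 5 \times 3^2 + 15 \times (-1)^2}{5 + 15}} = \sqrt{\frac{66.95}{20}}$ | 1.83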
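Here is a Python sketch of the combined-SD formula. The helper names are hypothetical, and the sketch uses population SDs with divisor n, as this chapter does. It is cross-checked by pooling two small raw groups.

```python
import math

def combined_sd(n1, mean1, sd1, n2, mean2, sd2):
    """Combined standard deviation of two groups (population SDs, divisor n)."""
    mean12 = (n1 * mean1 + n2 * mean2) / (n1 + n2)
    d1, d2 = mean1 - mean12, mean2 - mean12
    return math.sqrt((n1 * sd1**2 + n2 * sd2**2 + n1 * d1**2 + n2 * d2**2) / (n1 + n2))

def pop_sd(xs):
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

print(round(combined_sd(5, 9, 0.8, 15, 5, 0.5), 2))

# Cross-check: the formula should equal the SD of the pooled raw data.
g1, g2 = [2, 3, 4], [4, 5, 6]
pooled = pop_sd(g1 + g2)
formula = combined_sd(3, 3, pop_sd(g1), 3, 5, pop_sd(g2))
print(pooled, formula)
```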
Problem for SD under Change of scale and origin
Formula Calculation Answer �̅� =∑ 𝒀
𝒏= 𝒂 + 𝒃𝒙
1 𝑿 = 𝟐, 𝟑, 𝟒, �̅� =∑ 𝑿
𝒏
𝟐 + 𝟑 + 𝟒
𝟑 3
𝑅𝑋 = 𝐿 − 𝑆 4 − 2 2
𝑀𝐷x̅
=∑|𝑋 − X̅|
𝑛
∑|𝑋 − 3|
3
2
3
𝑠𝑋
= √∑(𝑋 − X̅ )2
𝑛
√∑(𝑋 − 3)2
3 0.82
2 𝑌 = 4, 5, 6, �̅� =∑ 𝑌
𝑛
4 + 5 + 6
3 5
SSA Statistics 2.20
𝐵𝑒𝑖𝑛𝑔 𝑌
= 𝑋 + 2 y̅ = 𝑎 + 𝑏�̅� 2 + 1 ×3 5
Change of Origin
(𝑎 = 2)
𝑅𝑌 = 𝐿 − 𝑆 6−4 2
𝑀𝐷Y̅
=∑|𝑌 − Y̅|
𝑛
∑|𝑌 − 5|
3
2
3
𝑠𝑌
= √∑(𝑌 − Y̅ )2
𝑛
√∑(𝑌 − 3)2
3 0.82
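3. Y = 0, 1, 2 (change of origin, a = −2, being Y = X − 2)
$\bar{Y} = \frac{\sum Y}{n}$ | $\frac{0 + 1 + 2}{3}$ | 1
$\bar{Y} = a + b\bar{X}$ | −2 + 1 × 3 | 1
$R_Y = L - S$ | 2 − 0 | 2
$MD_{\bar{Y}} = \frac{\sum |Y - \bar{Y}|}{n}$ | $\frac{\sum |Y - 1|}{3}$ | 2/3
$s_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}$ | $\sqrt{\frac{\sum (Y - 1)^2}{3}}$ | 0.82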
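4. Y = 4, 6, 8 (change of scale, b = 2, being Y = X × 2)
$\bar{Y} = \frac{\sum Y}{n}$ | $\frac{4 + 6 + 8}{3}$ | 6
$\bar{Y} = a + b\bar{X}$ | 0 + 2 × 3 | 6
$R_Y = L - S$ | 8 − 4 | 4
$MD_{\bar{Y}} = \frac{\sum |Y - \bar{Y}|}{n}$ | $\frac{\sum |Y - 6|}{3}$ | 4/3
$s_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}$ | $\sqrt{\frac{\sum (Y - 6)^2}{3}}$ | 1.63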
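5. Y = 1, 1.5, 2 (change of scale, b = 1/2, being Y = X × 1/2)
$\bar{Y} = \frac{\sum Y}{n}$ | $\frac{1 + 1.5 + 2}{3}$ | 1.5
$\bar{Y} = a + b\bar{X}$ | $0 + \frac{1}{2} \times 3$ | 1.5
$R_Y = L - S$ | 2 − 1 | 1
$MD_{\bar{Y}} = \frac{\sum |Y - \bar{Y}|}{n}$ | $\frac{\sum |Y - 1.5|}{3}$ | 1/3
$s_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}$ | $\sqrt{\frac{\sum (Y - 1.5)^2}{3}}$ | 0.41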
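6. Y = 7, 9, 11 (change of origin and change of scale, a = 3 and b = 2, being Y = 3 + 2 × X)
$\bar{Y} = \frac{\sum Y}{n}$ | $\frac{7 + 9 + 11}{3}$ | 9
$\bar{Y} = a + b\bar{X}$ | 3 + 2 × 3 | 9
$R_Y = L - S$ | 11 − 7 | 4
$MD_{\bar{Y}} = \frac{\sum |Y - \bar{Y}|}{n}$ | $\frac{\sum |Y - 9|}{3}$ | 4/3
$s_Y = \sqrt{\frac{\sum (Y - \bar{Y})^2}{n}}$ | $\sqrt{\frac{\sum (Y - 9)^2}{3}}$ | 1.63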
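All six rows can be reproduced in a single loop. This is a minimal sketch, assuming the population SD (divisor n) used throughout this section and the mean deviation about the mean. `dispersion` is a hypothetical helper name. The output shows that R, MD and s scale by |b|, while the mean follows $a + b\bar{X}$.

```python
from statistics import mean, pstdev


def dispersion(values):
    """Return (mean, range, mean deviation about the mean, SD with divisor n)."""
    m = mean(values)
    rng = max(values) - min(values)
    md = sum(abs(v - m) for v in values) / len(values)
    return m, rng, md, pstdev(values)


X = [2, 3, 4]
rows = [(2, 1), (-2, 1), (0, 2), (0, 0.5), (3, 2)]  # (a, b) for rows 2 to 6
for a, b in rows:
    Y = [a + b * x for x in X]
    m, r, md, s = dispersion(Y)
    print(f"Y = {a} + {b}X: mean = {m}, R = {r}, MD = {md:.2f}, s = {s:.2f}")
```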
Coefficient of Variation (CV): $CV = \frac{s}{\bar{X}} \times 100$
Illustration:
Group I: $\bar{X}_1 = 9$, $s_1 = 0.8$; Group II: $\bar{X}_2 = 5$, $s_2 = 0.5$ ($\bar{X}_{12} = 6$)
Calculation:
$CV(I) = \frac{0.8}{9} \times 100 = 8.89\%$
$CV(II) = \frac{0.5}{5} \times 100 = 10\%$
Comparison: $CV(I) = 8.89\% < CV(II) = 10\%$, so
Group I is more stable, more consistent, less variable and less dispersed.
Group II is less stable, less consistent, more variable and more dispersed.
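The comparison above can be written as a short sketch. `coefficient_of_variation` is a hypothetical helper name, and the figures are the illustration's.

```python
def coefficient_of_variation(sd, mean):
    """CV = (s / mean) * 100, in percent."""
    return sd / mean * 100


cv_1 = coefficient_of_variation(0.8, 9)
cv_2 = coefficient_of_variation(0.5, 5)
print(f"CV(I) = {cv_1:.2f}%, CV(II) = {cv_2:.2f}%")
print("Group I is more consistent" if cv_1 < cv_2 else "Group II is more consistent")
```

A lower CV means less relative dispersion, so the group with the smaller CV is the more consistent one.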
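EXTRA PROBLEMS
Comparison between Arithmetic Mean and Geometric Mean
Question 1: Find the average rate of return.
Year | 1 | 2 | 3
Rate of Return (r %) | 10% | 60% | 20%
Answer: The average rate of return is given by the GM of the growth factors (1 + r).
Formula | Calculation | Answer
GM: $G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$ | $(1.10 \times 1.60 \times 1.20)^{1/3}$ | 1.283, i.e. an average rate of return of 28.3%
AM: $\bar{X} = \frac{\sum X}{n}$ | $\frac{1.10 + 1.60 + 1.20}{3}$ | 1.30, i.e. 30%, which is not correct: $1.30^3 = 2.197$ overstates the actual growth $1.10 \times 1.60 \times 1.20 = 2.112$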
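Here is a minimal check of why the GM, and not the AM, is the right average for growth factors. Compounding at the GM reproduces the actual three-year growth, while compounding at the AM overstates it.

```python
from math import prod

growth = [1.10, 1.60, 1.20]  # 1 + r for each year
gm = prod(growth) ** (1 / len(growth))
am = sum(growth) / len(growth)
print(f"GM = {gm:.3f} -> average return {100 * (gm - 1):.1f}%")
print(f"check: GM^3 = {gm ** 3:.3f}, actual growth = {prod(growth):.3f}, AM^3 = {am ** 3:.3f}")
```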
Comparison between Arithmetic Mean and Harmonic Mean
Question 2: An aeroplane covered a distance of 800 miles with four different speeds of 100, 200, 300 and 400 m.p.h. for the first, second, third and fourth quarters of the distance. Find the average speed in miles per hour.
Answer: Since equal distances are covered at each speed, the average speed is given by the H.M. of the given set of data.
Formula | Calculation | Answer
HM: $HM = \frac{n}{\sum \frac{1}{X}}$ | $\frac{4}{\frac{1}{100} + \frac{1}{200} + \frac{1}{300} + \frac{1}{400}}$ | 192 m.p.h.
AM: $\bar{X} = \frac{\sum X}{n}$ | $\frac{100 + 200 + 300 + 400}{4}$ | 250 m.p.h., which is not the true average speed
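The HM answer can be cross-checked as total distance divided by total time. Each quarter of the trip is 200 miles.

```python
from statistics import harmonic_mean

speeds = [100, 200, 300, 400]  # m.p.h. over each quarter (200 miles) of the trip
hm = harmonic_mean(speeds)
am = sum(speeds) / len(speeds)
total_time = sum(200 / v for v in speeds)  # hours
print(hm, am, 800 / total_time)
```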
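Combined Mean
Question 3: Two groups of students reported mean weights of 160 kg and 150 kg respectively. Under what condition would the mean weight of both the groups together be 155 kg?
Answer:
Given data: Group I has $N_1$ students with mean $\bar{X}_1 = 160$ kg; Group II has $N_2$ students with mean $\bar{X}_2 = 150$ kg; combined mean $\bar{X}_{12} = 155$ kg.
Formula: $\bar{X}_{12} = \frac{N_1\bar{X}_1 + N_2\bar{X}_2}{N_1 + N_2}$
Calculation: $155 = \frac{160N_1 + 150N_2}{N_1 + N_2}$
$\Rightarrow 155N_1 + 155N_2 = 160N_1 + 150N_2 \Rightarrow 5N_2 = 5N_1$
Answer: $N_1 = N_2$, i.e. the two groups must have the same number of students.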
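The same algebra gives the required size ratio for any target mean. Rearranging $T(N_1 + N_2) = \bar{X}_1 N_1 + \bar{X}_2 N_2$ gives $N_1/N_2 = (T - \bar{X}_2)/(\bar{X}_1 - T)$. Both helper names below are hypothetical.

```python
from fractions import Fraction


def combined_mean(n1, mean1, n2, mean2):
    return Fraction(n1 * mean1 + n2 * mean2, n1 + n2)


def required_ratio(mean1, mean2, target):
    """N1/N2 for which the combined mean equals target."""
    return Fraction(target - mean2, mean1 - target)


print(required_ratio(160, 150, 155))        # 1, i.e. N1 = N2
print(combined_mean(25, 160, 25, 150))      # 155
```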
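Question 4: Show that for any two numbers a and b, the standard deviation is given by $\frac{|a - b|}{2}$.
Answer: For two numbers a and b, the AM is given by $\bar{X} = \frac{a + b}{2}$.
The variance is
$\frac{\sum (X_i - \bar{X})^2}{2} = \frac{\left(a - \frac{a + b}{2}\right)^2 + \left(b - \frac{a + b}{2}\right)^2}{2} = \frac{\frac{(a - b)^2}{4} + \frac{(a - b)^2}{4}}{2} = \frac{(a - b)^2}{4} \implies s = \frac{|a - b|}{2}$
(The absolute sign is taken, as SD cannot be negative.)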
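Question 5: Prove that for the first n natural numbers, the SD is $\sqrt{\frac{n^2 - 1}{12}}$.
Answer: For the first n natural numbers, the AM is given by
$\bar{X} = \frac{1 + 2 + \cdots + n}{n} = \frac{n(n + 1)}{2n} = \frac{n + 1}{2}$
$\therefore SD = \sqrt{\frac{\sum X_i^2}{n} - \bar{X}^2} = \sqrt{\frac{1^2 + 2^2 + \cdots + n^2}{n} - \left(\frac{n + 1}{2}\right)^2}$
$= \sqrt{\frac{n(n + 1)(2n + 1)}{6n} - \left(\frac{n + 1}{2}\right)^2} = \sqrt{\frac{(n + 1)(2n + 1)}{6} - \frac{(n + 1)^2}{4}}$
$= \sqrt{(n + 1)\left(\frac{2n + 1}{6} - \frac{n + 1}{4}\right)} = \sqrt{\frac{(n + 1)(4n + 2 - 3n - 3)}{12}} = \sqrt{\frac{(n + 1)(n - 1)}{12}} = \sqrt{\frac{n^2 - 1}{12}}$
Thus, the SD of the first n natural numbers is $\sqrt{\frac{n^2 - 1}{12}}$.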
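Here is a quick numerical check of the result, assuming the population SD (divisor n) as in the derivation above.

```python
from math import sqrt
from statistics import pstdev

for n in range(1, 11):
    direct = pstdev(list(range(1, n + 1)))  # SD of 1, 2, ..., n
    formula = sqrt((n * n - 1) / 12)
    print(n, round(direct, 4), round(formula, 4))
```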
COMPARISON BETWEEN MEASURES OF CENTRAL TENDENCY
No | Property | Arithmetic Mean | Geometric Mean | Harmonic Mean | Median | Mode | Range | Quartile Deviation | Mean Deviation | Standard Deviation
1 | Well defined | Yes | Yes | Yes | Yes | No (when the number of observations is small, use the empirical relationship) | Yes | Yes | A may be X̄, Me, Mo | Yes
2 | Easy to calculate and simple to understand | Yes | No | No | Yes | Location method, but not grouping method | Yes | Yes | Yes | No
3 | Based on all the items | Yes | Yes (but can be found only for positive values) | Yes (only positive values, and no "0") | No | No | No | No | Yes | Yes
4 | Capable of further mathematical treatment | Yes | Yes (useful for the calculation of index numbers) | Yes | Yes (but only in mean deviation; no combined median) | No | No (but used in quality control and stock market fluctuations) | No | No (useful for economists and businessmen and in public reports) | Yes
5 | Good basis for comparison | Yes, Not much, Yes
6 | Necessary to arrange the data | No | No | No | Yes | No | Not discussed | Not discussed | Not discussed | Not discussed
7 | Affected by extreme values | Yes | Yes (not much, compared to AM) | Yes | No | No | Yes | No | Less than SD | Yes
8 | Not precise, giving misleading impressions (e.g. an average number of persons of 1.5, which is not possible) | No | No | No | Yes (except when the median lies between two values) | Yes (except in a continuous series) | Not discussed | Not discussed | Not discussed | Not discussed
9 | Location (inspection) method | No | No | No | Yes (on arrangement) | Yes | Not discussed | Not discussed | Not discussed | Not discussed
10 | Graphical method | Yes (using ogive curves); not discussed for the measures of dispersion
11 | Can be calculated for open-end class intervals | No | No | No | Yes | Yes | No | Yes | Based on "A" | No
12 | Affected by sampling fluctuations | No (least) | No | No | Yes | Yes | Yes | Yes | Yes | Less affected
13 | Affected by change of origin | Yes | Yes | Yes | Yes | Yes | No | No | No | No
 | Affected by change of scale | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes | Yes
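Explanations to Formulae:
1. Geometric Mean
Logarithmic formulae of Geometric Mean
Individual observations:
$GM = \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n}$
$\log GM = \log \sqrt[n]{x_1 \times x_2 \times \cdots \times x_n} = \frac{1}{n}\log(x_1 \times x_2 \times \cdots \times x_n) = \frac{1}{n}(\log x_1 + \log x_2 + \cdots + \log x_n) = \frac{1}{n}\sum \log x$
$GM = \text{Antilog}\left(\frac{1}{n}\sum \log x\right)$
Discrete series:
$GM = \sqrt[N]{x_1^{f_1} \times x_2^{f_2} \times \cdots \times x_n^{f_n}}$
$\log GM = \frac{1}{N}\log\left(x_1^{f_1} \times x_2^{f_2} \times \cdots \times x_n^{f_n}\right) = \frac{1}{N}\left[\log x_1^{f_1} + \log x_2^{f_2} + \cdots + \log x_n^{f_n}\right] = \frac{1}{N}\left[f_1 \log x_1 + f_2 \log x_2 + \cdots + f_n \log x_n\right] = \frac{1}{N}\sum f \log x$
$GM = \text{Antilog}\left(\frac{1}{N}\sum f \log x\right)$
Continuous series (m = mid-point of each class):
$GM = \sqrt[N]{m_1^{f_1} \times m_2^{f_2} \times \cdots \times m_n^{f_n}}$
$\log GM = \frac{1}{N}\log\left(m_1^{f_1} \times m_2^{f_2} \times \cdots \times m_n^{f_n}\right) = \frac{1}{N}\left[\log m_1^{f_1} + \log m_2^{f_2} + \cdots + \log m_n^{f_n}\right] = \frac{1}{N}\left[f_1 \log m_1 + f_2 \log m_2 + \cdots + f_n \log m_n\right] = \frac{1}{N}\sum f \log m$
$GM = \text{Antilog}\left(\frac{1}{N}\sum f \log m\right)$
where $N = \sum f$.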
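The three logarithmic formulae share one computation. The sketch below uses base-10 logs, so "Antilog" is $10^{(\cdot)}$. `gm_by_logs` is a hypothetical name, and the continuous-series classes (0–10, 10–20, 20–30 with mid-points 5, 15, 25) are made-up illustration data.

```python
from math import log10, prod


def gm_by_logs(values, freqs=None):
    """GM = Antilog((1/N) * sum(f * log x)), using base-10 logarithms."""
    if freqs is None:  # individual observations: every f = 1
        freqs = [1] * len(values)
    N = sum(freqs)
    mean_log = sum(f * log10(x) for x, f in zip(values, freqs)) / N
    return 10 ** mean_log  # antilog


print(gm_by_logs([2, 8]), prod([2, 8]) ** 0.5)  # individual observations
print(gm_by_logs([2, 4], [1, 2]))               # discrete series
print(gm_by_logs([5, 15, 25], [2, 3, 1]))       # continuous series, via mid-points m
```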
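Standard Deviation:
$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n}}$
$\sum (X - \bar{X})^2 = \sum [X^2 - 2X\bar{X} + \bar{X}^2]$
$= \sum X^2 - \sum (2X\bar{X}) + \sum \bar{X}^2$
$= \sum X^2 - 2\bar{X}\sum X + n\bar{X}^2$
$= \sum X^2 - 2\frac{\sum X}{n}\sum X + n \cdot \frac{\sum X}{n} \cdot \frac{\sum X}{n}$
$= \sum X^2 - \frac{2(\sum X)^2}{n} + \frac{(\sum X)^2}{n}$
$= \sum X^2 - \frac{(\sum X)^2}{n}(2 - 1) = \sum X^2 - \frac{(\sum X)^2}{n}$
Dividing by n:
$\frac{\sum (X - \bar{X})^2}{n} = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n} = \frac{n\sum X^2 - (\sum X)^2}{n^2} = \frac{\sum X^2}{n} - \left(\frac{\sum X}{n}\right)^2 = \frac{\sum X^2}{n} - \bar{X}^2$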
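Both forms of the SD can be compared directly. The data below are hypothetical illustration values, and both functions use the divisor n.

```python
from math import sqrt


def sd_definition(xs):
    n = len(xs)
    m = sum(xs) / n
    return sqrt(sum((x - m) ** 2 for x in xs) / n)


def sd_shortcut(xs):
    """sqrt(sum(X^2)/n - mean^2), the result of the derivation above."""
    n = len(xs)
    return sqrt(sum(x * x for x in xs) / n - (sum(xs) / n) ** 2)


data = [2, 3, 4, 7, 9]  # hypothetical data
print(sd_definition(data), sd_shortcut(data))
```

The shortcut needs only the running totals of X and X², which is why it is convenient for hand calculation.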
Graphical Method
Weighted Average:
1. Calculate goodwill using weighted average method:
Profit | 20,000 | 10,000 | (7,000)
Weight | 3 | 2 | 1
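A sketch of the weighted-average step only. The bracketed (7,000) is treated as a loss and entered as negative. Goodwill would then be this weighted average profit multiplied by a number of years' purchase, which the problem does not state, so the sketch stops at the average.

```python
profits = [20_000, 10_000, -7_000]  # the loss of 7,000 is entered as negative
weights = [3, 2, 1]
weighted_avg = sum(p * w for p, w in zip(profits, weights)) / sum(weights)
print(round(weighted_avg, 2))
```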
Missing Frequency:
1. Given N = 581 and Mean = 15. Find the missing frequencies.
x | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19
f | 8 | 15 | x | 100 | 98 | 95 | y | 75 | 50 | 30
2. Given Mean = 47, Median = 45, Mode = 35 and N= 90. Find the missing frequencies.
Marks | 01–10 | 11–20 | 21–30 | 31–40 | 41–50 | 51–60 | 61–70 | 71–80 | 81–90 | 91–100
Number of Students | 3 | 7 | x | 17 | 12 | y | 8 | 8 | 6 | 6
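Problem 1 reduces to two linear equations, $\sum f = N$ and $\sum fx = N \times \text{Mean}$, in the two unknown frequencies. Here is a minimal sketch that solves them (`two_missing_frequencies` is a hypothetical name). Problem 2 also needs the median and mode formulae, so it is not covered here.

```python
from fractions import Fraction


def two_missing_frequencies(values, freqs, N, mean):
    """Solve for the two frequencies marked None, given N and the mean."""
    i, j = [k for k, f in enumerate(freqs) if f is None]
    known_f = sum(f for f in freqs if f is not None)
    known_fx = sum(f * x for x, f in zip(values, freqs) if f is not None)
    s1 = N - known_f          # f_i + f_j
    s2 = N * mean - known_fx  # x_i * f_i + x_j * f_j
    xi, xj = values[i], values[j]
    fi = Fraction(s2 - xj * s1, xi - xj)
    return fi, s1 - fi


x_values = list(range(10, 20))
freqs = [8, 15, None, 100, 98, 95, None, 75, 50, 30]
print(two_missing_frequencies(x_values, freqs, N=581, mean=15))
```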
3. Probability
Introduction:
Related terms: ‘probably’, ‘in all likelihood’, ‘chance’, ‘odds in favour’, ‘odds against’.
A branch of Mathematics
An integral part of statistics
Applied in testing of hypotheses and in estimation
First application: by a group of mathematicians in Europe about 300 years ago, to enhance their chances of winning in different games of gambling.
Development by mathematicians and statisticians:
Abraham de Moivre and Pierre-Simon de Laplace of France, Reverend Thomas Bayes and R. A. Fisher of England, and Chebyshev, Markov, Khinchin and Kolmogorov of Russia.
Divisions
Subjective:
Dependent on personal judgement and experience; influenced by personal belief, attitude and bias.
Helpful in the field of uncertainty and in managerial decision making.
Objective: The measure is based on recorded observations rather than a subjective estimate.
Definition/Terms
Experiment: An experiment may be described as a performance that produces certain results.
Random Experiment: An experiment is defined to be random if its results depend on chance only.
Trial: A single performance of a random experiment; its result is known only after it is done.
Events: The results or outcomes of a random experiment are known as events. Sometimes events may be combinations of outcomes.
Sample Space (S): The set of all possible outcomes (elementary events) of a random experiment (hence the applicability of set theory).
Example | Term
Tossing a coin | Experiment
Tossing "any" coin | Random experiment
"Tossing" | Trial
Head (H) and Tail (T) | Events
S = {H, T} | Sample space
Types of Events
I. Simple/Elementary event: cannot be decomposed. Example: toss a coin, S = {H, T}.
Composite/Compound event: can be decomposed into two or more events. Example: toss two coins, S = {HH, TT, TH, HT}.
II. Mutually Exclusive / Incompatible Events: no two of the events can occur simultaneously. The happening of one excludes the happening of the others, i.e. the occurrence of one event implies the non-occurrence of the other events. Example: on tossing a coin, if H occurs, T does not occur.
Exhaustive Events: one of the events in the sample space must necessarily occur. Example: on tossing a coin, either H or T occurs.
Equally Likely / Mutually Symmetric / Equi-probable Events: no event is expected to occur more frequently than the other events. Example: H and T have equal chances of occurrence.
III. Finite Events: n (number of events) is finite. Example: tossing 1 coin, n = 2.
Infinite Events: n is infinite. Example: tossing a coin continuously, n = ∞.
IV. Unbiased Events: one of the events is obtained on performing the experiment. Example: on tossing a coin, either H or T will definitely turn up.
Biased Events: none of the events may be obtained on performing the experiment. Example: on tossing a coin on a sandy floor, the coin may stay upright, with one side showing H and the other side showing T.
V. Sure Event: P(A) = 1. Impossible Event: P(B) = 0. Example: on tossing a coin, let A be getting H or T, so P(A) = 1; let B be getting neither H nor T, so P(B) = 0.
VI. Dependent Events: the event depends on the previous trials.
Independent Events: the event does not depend on the previous trials.
Example: A box contains 5 balls. If the first ball drawn is not replaced, then drawing the second ball is a dependent event. If the first ball drawn is replaced, then drawing the second ball is an independent event.
Different Definitions of Probability
I. Classical Definition / A Priori Definition
Let n be the number of finite, equally likely elementary events, of which $n_A (\le n)$ are favourable to A.
Then $P(A) = \frac{n_A}{n} = \frac{\text{No. of equally likely events favourable to } A}{\text{Total no. of equally likely events}}$
Also, if m (≤ n) is the number of mutually exclusive, exhaustive and equally likely composite events, of which $m_A (\le m)$ are favourable to A,
then $P(A) = \frac{m_A}{m} = \frac{\text{No. of mutually exclusive, exhaustive and equally likely events favourable to } A}{\text{Total no. of mutually exclusive, exhaustive and equally likely events}}$
Points to Ponder:
1. Credited to Bernoulli and Laplace.
2. Based on prior knowledge.
Demerits / Limitations
1. n must be finite.
2. Assumption: the events must be equally likely / equi-probable.
3. Limited applications, only where the possible outcomes are known with certainty, e.g. coin tossing and dice throwing.
4. Inapplicable in the field of uncertainty, or where there is no prior knowledge.
Gist:
1. $0 \le P(A) \le 1$; P(A) = 0 for an impossible event and P(A) = 1 for a sure event.
2. Complementary Event: A′ / $A^c$ / $\bar{A}$ denotes the non-occurrence of event A.
Points to Ponder:
• A and A′ are mutually exclusive.
• P(A) + P(A′) = 1
• $P(A') = 1 - \frac{m_A}{m} = \frac{m - m_A}{m}$
3. Odds in favour of A = $m_A : (m - m_A)$
Odds against A = $(m - m_A) : m_A$
Question 1: A coin is tossed three times. What is the probability of getting (a) exactly 2 heads, (b) at least 2 heads?
Answer: All the elementary events, when a coin is tossed three times, are
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}, so n = 8.
(a) $n_A$ = number of elementary events with exactly two heads = 3 (HHT, HTH, THH)
$P(A) = \frac{n_A}{n} = \frac{\text{No. of equally likely events favourable to } A}{\text{Total no. of equally likely events}} = \frac{3}{8}$
(b) $n_A$ = number of elementary events with at least two heads = 4 (HHH, HHT, HTH, THH)
$P(A) = \frac{n_A}{n} = \frac{4}{8} = \frac{1}{2}$
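The classical definition amounts to counting favourable outcomes in an enumerated sample space. Here is a minimal sketch using exact fractions:

```python
from fractions import Fraction
from itertools import product

S = ["".join(t) for t in product("HT", repeat=3)]  # 8 equally likely outcomes
exactly_two = [s for s in S if s.count("H") == 2]
at_least_two = [s for s in S if s.count("H") >= 2]
p_exactly_two = Fraction(len(exactly_two), len(S))
p_at_least_two = Fraction(len(at_least_two), len(S))
print(p_exactly_two, p_at_least_two)  # 3/8 1/2
```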
Question 2: A die is rolled twice. What is the probability of getting a difference of 2 points?
Answer: If an experiment results in p outcomes and the experiment is repeated q times, then the total number of outcomes is $p^q$. In the present case, since a die results in 6 outcomes and it is rolled twice, the total number of outcomes or elementary events is $6^2$ or 36. We assume that the die is unbiased, which ensures that all these 36 elementary events are equally likely.
Now a difference of 2 points in the uppermost faces of the die thrown twice can occur in the following cases:
1st Throw | 2nd Throw | Difference
6 | 4 | 2
5 | 3 | 2
4 | 2 | 2
3 | 1 | 2
1 | 3 | 2
2 | 4 | 2
3 | 5 | 2
4 | 6 | 2
Thus, denoting the event of getting a difference of 2 points by A, we find from the above table that the number of outcomes favourable to A is 8. By the classical definition of probability, we get
$P(A) = \frac{8}{36} = \frac{2}{9}$
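The table of favourable cases can be generated instead of listed by hand:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # 6^2 = 36 ordered pairs
favourable = [(a, b) for a, b in outcomes if abs(a - b) == 2]
p = Fraction(len(favourable), len(outcomes))
print(len(favourable), p)  # 8 2/9
```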
Question 3: Two dice are thrown simultaneously. Find the probability that the sum of points on the two dice would be 7 or more.
Answer: If two dice are thrown then, as explained in the last problem, the total number of elementary events is $6^2$ or 36. Now a total of 7 or more, i.e. 7, 8, 9, 10, 11 or 12, can occur only in the following combinations:
Sum = 7: (1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)
Sum = 8: (2, 6), (3, 5), (4, 4), (5, 3), (6, 2)
Sum = 9: (3, 6), (4, 5), (5, 4), (6, 3)
Sum = 10: (4, 6), (5, 5), (6, 4)
Sum = 11: (5, 6), (6, 5)
Sum = 12: (6, 6)
Thus the number of favourable outcomes is 6 + 5 + 4 + 3 + 2 + 1 = 21. Letting A stand for getting a total of 7 points or more, we have $P(A) = \frac{21}{36} = \frac{7}{12}$
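The same enumeration approach confirms the count of 21 favourable pairs:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
favourable = [o for o in outcomes if sum(o) >= 7]
p = Fraction(len(favourable), len(outcomes))
print(len(favourable), p)  # 21 7/12
```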
Question 4: What is the chance of picking a spade or an ace not of spade from a pack of 52 cards?
Answer: A pack of 52 cards contains 13 Spades, 13 Hearts, 13 Clubs and 13 Diamonds. Each of these groups of 13 cards has an ace. Hence the total number of elementary events is 52, out of which 13 + 3 or 16 are favourable to the event A representing picking a Spade or an ace not of Spade. Thus we have
$P(A) = \frac{16}{52} = \frac{4}{13}$
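A minimal sketch that builds the 52-card pack and counts the cards that are spades or aces. Every ace counts, so the ace of spades is not double-counted.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["Spades", "Hearts", "Clubs", "Diamonds"]
deck = list(product(ranks, suits))
favourable = [(r, s) for r, s in deck if s == "Spades" or r == "A"]
p = Fraction(len(favourable), len(deck))
print(len(favourable), p)  # 16 4/13
```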
Question 5: Find the probability that a 4-digit number comprising the digits 2, 5, 6 and 7 would be divisible by 4.
Answer: Since there are four digits, all distinct, the total number of four-digit numbers that can be formed without any restriction is 4! or 4 × 3 × 2 × 1 or 24. Now a four-digit number is divisible by 4 if the number formed by its last two digits is divisible by 4. This happens when the four-digit number ends with 52, 56, 72 or 76. If we fix the last two digits as 52, then the first two places of the four-digit number can be filled using the remaining 2 digits in 2! or 2 ways. Thus there are 2 four-digit numbers that end with 52. Proceeding in this manner, we find that the number of four-digit numbers that are divisible by 4 is 4 × 2 or 8. If A denotes the event that a four-digit number formed using the given digits is divisible by 4, then we have
$P(A) = \frac{8}{24} = \frac{1}{3}$
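Listing all 24 arrangements confirms that exactly 8 are divisible by 4, so the probability is 1/3:

```python
from fractions import Fraction
from itertools import permutations

numbers = [int("".join(p)) for p in permutations("2567")]
divisible = sorted(n for n in numbers if n % 4 == 0)
p = Fraction(len(divisible), len(numbers))
print(divisible, p)
```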
Question 6: A committee of 7 members is to be formed from a group comprising 8 gentlemen and 5 ladies. What is the probability that the committee would comprise:
a. 2 ladies,
b. at least 2 ladies?
Answer: Since there are altogether 8 + 5 or 13 persons, a committee comprising 7 members can be formed in
$^{13}C_7 = \frac{13!}{7!\,6!} = \frac{13 \times 12 \times 11 \times 10 \times 9 \times 8 \times 7!}{7! \times 6 \times 5 \times 4 \times 3 \times 2 \times 1} = 11 \times 12 \times 13$ ways.
a. When the committee is formed taking 2 ladies out of 5 ladies, the remaining (7 − 2) or 5 committee members are to be selected from 8 gentlemen. Now 2 out of 5 ladies can be selected in $^5C_2$ ways and 5 out of 8 gentlemen can be selected in $^8C_5$ ways. Thus, if A denotes the event of having the committee with 2 ladies, then A can occur in
$^5C_2 \times {}^8C_5 = \frac{5 \times 4}{2 \times 1} \times \frac{8 \times 7 \times 6}{3 \times 2 \times 1} = 10 \times 56$ ways.
Thus $P(A) = \frac{10 \times 56}{11 \times 12 \times 13} = \frac{140}{429}$
b. Since the minimum number of ladies is 2, we can have the following combinations:
Population: 5L, 8G
Sample: 2L + 5G, or 3L + 4G, or 4L + 3G, or 5L + 2G
Thus, if B denotes the event of having at least two ladies in the committee, then B can occur in
$^5C_2 \times {}^8C_5 + {}^5C_3 \times {}^8C_4 + {}^5C_4 \times {}^8C_3 + {}^5C_5 \times {}^8C_2 = 560 + 700 + 280 + 28 = 1568$ ways.
Hence $P(B) = \frac{1568}{11 \times 12 \times 13} = \frac{392}{429}$
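Both parts follow from `math.comb` (available from Python 3.8) by summing over the permitted numbers of ladies:

```python
from fractions import Fraction
from math import comb

total = comb(13, 7)                           # 1716 = 11 x 12 x 13
p_a = Fraction(comb(5, 2) * comb(8, 5), total)
ways_b = sum(comb(5, ladies) * comb(8, 7 - ladies) for ladies in range(2, 6))
p_b = Fraction(ways_b, total)
print(total, p_a, ways_b, p_b)
```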
II. Statistical Definition (Limiting Form)
Introduced to overcome the limitation of a finite number of elements in the classical definition.
Developed by British mathematicians.
If a random experiment is repeated a very large number of times, say n, under an identical set of conditions, and the event A occurs $F_A$ times, then
$P(A) = \lim_{n \to \infty} \frac{F_A}{n}$
Applicability:
1. The limit should exist.
2. The limit should be a finite value.
Question 7: The following data relate to the distribution of wages of a group of workers:
Wages in ₹ | 50–60 | 60–70 | 70–80 | 80–90 | 90–100 | 100–110 | 110–120
No. of workers | 15 | 23 | 36 | 42 | 17 | 12 | 5
If a worker is selected at random from the entire group of workers, what is the probability that
a. his wage would be less than ₹50?
b. his wage would be less than ₹80?
c. his wage would be more than ₹100?
d. his wage would be between ₹70 and ₹100?
Answer: As there are altogether 150 workers, n = 150.
a. Since there is no worker with a wage less than ₹50, the probability that the wage of a randomly selected worker would be less than ₹50 is $P(A) = \frac{0}{150} = 0$
b. Since there are (15 + 23 + 36) or 74 workers having wages less than ₹80 out of a group of 150 workers, the probability that the wage of a worker selected at random from the group would be less than ₹80 is $P(B) = \frac{74}{150} = \frac{37}{75}$
c. There are (12 + 5) or 17 workers with wages more than ₹100. Thus the probability of finding a worker, selected at random, with a wage more than ₹100 is $P(C) = \frac{17}{150}$
d. There are (36 + 42 + 17) or 95 workers with wages between ₹70 and ₹100. Thus $P(D) = \frac{95}{150} = \frac{19}{30}$
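The statistical (relative-frequency) probabilities are class counts divided by n. This sketch assumes, as in the answer above, that each question's limits fall on class boundaries, so only whole classes are counted. `p_between` is a hypothetical helper name.

```python
from fractions import Fraction

classes = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100), (100, 110), (110, 120)]
workers = [15, 23, 36, 42, 17, 12, 5]
n = sum(workers)  # 150


def p_between(lo, hi):
    """Probability that the wage lies between lo and hi (whole classes only)."""
    count = sum(f for (l, u), f in zip(classes, workers) if l >= lo and u <= hi)
    return Fraction(count, n)


print(p_between(0, 50), p_between(0, 80), p_between(100, float("inf")), p_between(70, 100))
```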
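Operations on Sets
1. $A \cup B = \{x : x \in A \text{ or } x \in B\}$
2. $A \cap B = \{x : x \in A \text{ and } x \in B\}$
3. $A - B = \{x : x \in A \text{ and } x \notin B\}$; $B - A = \{x : x \in B \text{ and } x \notin A\}$
4. $A' = \{x : x \notin A\}$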
Question 8: Three events A, B and C are mutually exclusive, exhaustive and equally likely. What is the probability of the complementary event of A?
Answer: A, B and C are
Mutually exclusive: $P(A \cup B \cup C) = P(A) + P(B) + P(C)$
Exhaustive: $P(A \cup B \cup C) = 1$
Equally likely: $P(A) = P(B) = P(C) = k$, say
Thus, combining the above,
$1 = k + k + k \Rightarrow k = \frac{1}{3}$
Thus $P(A) = P(B) = P(C) = \frac{1}{3}$
Hence $P(A') = 1 - \frac{1}{3} = \frac{2}{3}$
III. Axiomatic / Modern Definition
Let S be the sample space and $A \subseteq S$ an event.
Then a real-valued function P, with P(A) called the probability of A, is a probability function if P satisfies the following axioms:
1. $P(A) \ge 0$ for every $A \subseteq S$
2. $P(S) = 1$
3. For any sequence of mutually exclusive events $A_1, A_2, A_3, \ldots$
$P(A_1 \cup A_2 \cup A_3 \cup \cdots) = P(A_1) + P(A_2) + P(A_3) + \cdots$
Addition Theorem / Theorem on Total Probability
Theorem 1:
Let A and B be two (k = 2) mutually exclusive (ME) events. Then
$P(A \cup B)$ or $P(A + B)$ or $P(A \text{ or } B) = P(A) + P(B)$
Question 9: A number is selected from the first 25 natural numbers. What is the probability that it would be divisible by 4 or 7?
Answer: Let A be the event that the number selected would be divisible by 4 and B, the event that the selected number would be divisible by 7. Then A ∪ B denotes the event that the number would be divisible by 4 or 7. Next we note that A = {4, 8, 12, 16, 20, 24} and B = {7, 14, 21}, whereas S = {1, 2, 3, …, 25}. Since A ∩ B = ∅, the two events A and B are mutually exclusive, and as such we have
$P(A \cup B) = P(A) + P(B)$
Since $P(A) = \frac{n(A)}{n(S)} = \frac{6}{25}$ and $P(B) = \frac{n(B)}{n(S)} = \frac{3}{25}$,
$\therefore P(A \cup B) = \frac{6}{25} + \frac{3}{25} = \frac{9}{25}$
Hence the probability that the selected number would be divisible by 4 or 7 is $\frac{9}{25}$ or 0.36.
Question 10: A coin is tossed thrice. What is the probability of getting 2 or more heads?
Answer: If a coin is tossed three times, then we have the following sample space:
S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}. 2 or more heads imply 2 or 3 heads.
If A and B denote the events of occurrence of 2 and 3 heads respectively, then we find that
A = {HHT, HTH, THH} and B = {HHH}
$P(A) = \frac{n(A)}{n(S)} = \frac{3}{8}$ and $P(B) = \frac{n(B)}{n(S)} = \frac{1}{8}$
As A and B are mutually exclusive, the probability of getting 2 or more heads is
$P(A \cup B) = \frac{3}{8} + \frac{1}{8} = 0.5$
Theorem 2 (Extension of Theorem 1):
Let $A_1, A_2, \ldots, A_k$ (k ≥ 2) be mutually exclusive events. Then
$P(A_1 \cup A_2 \cup \cdots \cup A_k) = P(A_1) + P(A_2) + \cdots + P(A_k)$
Theorem 3:
P(either A occurs or B occurs) = P(A) + P(B) − P(simultaneous occurrence of the events A and B)
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Points to Ponder: Theorem 3 is stronger than Theorem 1, as Theorem 1 can be derived from Theorem 3: for mutually exclusive events, $P(A \cap B) = 0$.
Question 11: A number is selected at random from the first 1000 natural numbers. What is the probability that it would be a multiple of 5 or 9?
Answer: Let A, B, A ∪ B and A ∩ B denote the events that the selected number would be a multiple of 5, of 9, of 5 or 9, and of both 5 and 9 (i.e. of the LCM of 5 and 9, which is 45) respectively.
Since 1000 = 5 × 200
= 9 × 111 + 1
= 45 × 22 + 10,
it is obvious that
$P(A) = \frac{200}{1000}$, $P(B) = \frac{111}{1000}$, $P(A \cap B) = \frac{22}{1000}$
Hence the probability that the selected number would be a multiple of 5 or 9 is given by
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{200}{1000} + \frac{111}{1000} - \frac{22}{1000} = \frac{289}{1000} = 0.289$
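Counting the union directly with Python sets agrees with the addition theorem:

```python
from fractions import Fraction

N = range(1, 1001)
a = {k for k in N if k % 5 == 0}  # multiples of 5
b = {k for k in N if k % 9 == 0}  # multiples of 9
p_union = Fraction(len(a | b), 1000)
p_theorem = Fraction(len(a) + len(b) - len(a & b), 1000)
print(len(a), len(b), len(a & b), p_union, p_theorem)
```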
Question 12: The probability that an Accountant's job applicant has a B. Com. degree is 0.85, that he is a CA is 0.30, and that he is both B. Com. and CA is 0.25. Out of 500 applicants, how many would be B. Com. or CA?
Answer: Let the event that the applicant is a B. Com. be denoted by B and that he is a CA be denoted by C. Then, as given,
$P(B) = 0.85$, $P(C) = 0.30$ and $P(B \cap C) = 0.25$
The probability that an applicant is B. Com. or CA is given by
$P(B \cup C) = P(B) + P(C) - P(B \cap C) = 0.85 + 0.30 - 0.25 = 0.90$
Hence the expected number of applicants who are B. Com. or CA is 500 × 0.90 = 450.
Question 13: If $P(A - B) = \frac{1}{5}$, $P(A) = \frac{1}{3}$ and $P(B) = \frac{1}{2}$, what is the probability that, out of the two events A and B, only B would occur?
Answer: A glance at Figure 13.3 suggests that
Only A: $P(A - B) = P(A \cap B') = P(A) - P(A \cap B) = \frac{1}{5}$
$\Rightarrow \frac{1}{3} - P(A \cap B) = \frac{1}{5} \Rightarrow P(A \cap B) = \frac{2}{15}$
Only B: $P(B - A) = P(B \cap A') = P(B) - P(A \cap B) = \frac{1}{2} - \frac{2}{15} = \frac{11}{30}$
Theorem 4:
Let A, B and C be 3 events. Then the probability that at least one of the events occurs is given by
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$
Question 14: There are three persons A, B and C having different ages. The probability that A survives another 5 years is 0.80, that B survives another 5 years is 0.60 and that C survives another 5 years is 0.50. The probability that A and B both survive another 5 years is 0.46, that B and C both survive another 5 years is 0.32, and that A and C both survive another 5 years is 0.48. The probability that all three persons survive another 5 years is 0.26. Find the probability that at least one of them survives another 5 years.
Answer:
As given, P(A) = 0.80, P(B) = 0.60, P(C) = 0.50,
P(A ∩ B) = 0.46, P(B ∩ C) = 0.32, P(A ∩ C) = 0.48 and
P(A ∩ B ∩ C) = 0.26
The probability that at least one of them survives another 5 years is given by
$P(A \cup B \cup C) = P(A) + P(B) + P(C) - P(A \cap B) - P(B \cap C) - P(A \cap C) + P(A \cap B \cap C)$
$= 0.80 + 0.60 + 0.50 - 0.46 - 0.32 - 0.48 + 0.26 = 0.90$
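Theorem 4 can be sketched as a small function. It uses exact fractions so that the decimal inputs do not pick up floating-point error. `at_least_one` is a hypothetical name.

```python
from fractions import Fraction as F


def at_least_one(pa, pb, pc, pab, pbc, pac, pabc):
    """Inclusion-exclusion for three events (Theorem 4)."""
    return pa + pb + pc - pab - pbc - pac + pabc


result = at_least_one(F("0.80"), F("0.60"), F("0.50"),
                      F("0.46"), F("0.32"), F("0.48"), F("0.26"))
print(result, float(result))  # 9/10 0.9
```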
Conditional Probability / Compound Theorem / Multiplication Theorem
Compound / Joint Probability, $P(A \cap B)$ or $P(A_1 \cap A_2 \cap \cdots \cap A_k)$: the probability of occurrence of two or more events A, B, … simultaneously.
Situations
1. Dependent Events, P(B/A): the occurrence of an event B is influenced by the occurrence of another event A (A not being an impossible event).
$P(B/A) = \frac{P(B \cap A)}{P(A)}$, $P(A) > 0$
Note: If A depends on B, then
$P(A/B) = \frac{P(A \cap B)}{P(B)}$, $P(B) > 0$
Points to Ponder:
1. $P(B/A) = \frac{P(B \cap A)}{P(A)} = \frac{P(A \cap B)}{P(A)}$ (since $P(A \cap B) = P(B \cap A)$, by the commutative property)
2. If B does not depend on A, then P(B/A) = P(B).
3. If A does not depend on B, then P(A/B) = P(A).
Thus, for independent events, $P(A \cap B) = P(A) \times P(B)$.
4. If A and B are independent, then
A and B′ are independent, i.e. $P(A \cap B') = P(A) \times P(B') = P(A) \times [1 - P(B)]$
A′ and B are independent, i.e. $P(A' \cap B) = P(A') \times P(B) = [1 - P(A)] \times P(B)$
A′ and B′ are independent, i.e. $P(A' \cap B') = P(A') \times P(B') = [1 - P(A)] \times [1 - P(B)]$
If A, B and C are independent, then
$P(A \cap B) = P(A) \times P(B)$, $P(A \cap C) = P(A) \times P(C)$, $P(B \cap C) = P(B) \times P(C)$ and $P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$
If A, B and C are dependent, then $P(A \cap B \cap C) = P(A) \times P(B/A) \times P(C/(A \cap B))$
Theorems of Compound Probability
Theorem 5:
P(A and B occur simultaneously) = the product of the unconditional probability of A and the conditional probability of B, given that A has already occurred:
$P(A \cap B) = P(A) \times P(B/A)$
Theorem 6:
Let A, B and C be any 3 events. The probability that they occur jointly is
$P(A \cap B \cap C) = P(A) \times P(B/A) \times P(C/(A \cap B))$, provided $P(A \cap B) > 0$
If they are independent, then
$P(A \cap B \cap C) = P(A) \times P(B) \times P(C)$
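Question 15: Rupesh is known to hit a target in 5 out of 9 shots, whereas David is known to hit the same target in 6 out of 11 shots. What is the probability that the target would be hit once they both try?
Answer: Let A denote the event that Rupesh hits the target and B, the event that David hits the target. Then, as given,
$P(A) = \frac{5}{9}$ and $P(B) = \frac{6}{11}$
The target is hit if at least one of them hits it. As A and B are independent, $P(A \cap B) = P(A) \times P(B) = \frac{5}{9} \times \frac{6}{11} = \frac{30}{99}$, so
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{55}{99} + \frac{54}{99} - \frac{30}{99} = \frac{79}{99}$
Alternately,
$P(A \cup B) = 1 - P[(A \cup B)'] = 1 - P(A' \cap B')$
$= 1 - [(1 - P(A)) \times (1 - P(B))]$
$= 1 - \left(1 - \frac{5}{9}\right) \times \left(1 - \frac{6}{11}\right) = 1 - \frac{4}{9} \times \frac{5}{11} = \frac{79}{99}$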
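Both routes to the answer, the addition theorem and the complement, give the same value under independence:

```python
from fractions import Fraction as F

p_a, p_b = F(5, 9), F(6, 11)
p_union = p_a + p_b - p_a * p_b           # addition theorem with independence
p_via_complement = 1 - (1 - p_a) * (1 - p_b)
print(p_union, p_via_complement)  # 79/99 79/99
```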
Question 16: A pair of dice is thrown together and the sum of points of the two dice is noted to be 10. What is the probability that one of the two dice has shown the point 4?
Answer: Let A denote the event of getting 4 points on one of the two dice and B denote the event of getting a total of 10 points on the two dice. We need P(A/B). We have
$P(B) = \frac{3}{36} = \frac{1}{12}$ and $P(A \cap B) = \frac{2}{36}$
[since a total of 10 points may result from (4, 6), (5, 5) or (6, 4), and two of these combinations contain 4]
Thus $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{2/36}{1/12} = \frac{2}{3}$
Alternately, the sample space for getting a total of 10 points when two dice are thrown simultaneously is given by S = {(4, 6), (5, 5), (6, 4)}.
Out of these 3 cases, we get 4 in 2 cases. Thus, by the definition of probability, we have $P(A/B) = \frac{2}{3}$
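Conditioning on B amounts to restricting the sample space to the outcomes in B and counting within it. Here is a minimal sketch:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))
B = [o for o in outcomes if sum(o) == 10]  # reduced sample space
A_and_B = [o for o in B if 4 in o]
p_a_given_b = Fraction(len(A_and_B), len(B))
print(B, p_a_given_b)  # [(4, 6), (5, 5), (6, 4)] 2/3
```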
Question 17: In a group of 20 males and 15 females, 12 males and 8 females are service holders. What is the probability that a person selected at random from the group is a service holder, given that the selected person is a male?
Answer: Let S and M stand for service holder and male respectively. We are to evaluate P(S/M).
We note that (S ∩ M) represents the event of being both a service holder and a male.
Thus $P(S/M) = \frac{P(S \cap M)}{P(M)} = \frac{12/35}{20/35} = 0.60$
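Question 18: In connection with a random experiment, it is found that
$P(A) = \frac{2}{3}$, $P(B) = \frac{3}{5}$ and $P(A \cup B) = \frac{5}{6}$
Evaluate the following probabilities:
1. P(A/B)
2. P(B/A)
3. P(A′/B)
4. P(A/B′)
5. P(A′/B′)
Answer:
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$\frac{5}{6} = \frac{2}{3} + \frac{3}{5} - P(A \cap B)$
$P(A \cap B) = \frac{13}{30}$
Hence
1. $P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{13/30}{3/5} = \frac{13}{18}$
2. $P(B/A) = \frac{P(A \cap B)}{P(A)} = \frac{13/30}{2/3} = \frac{13}{20}$
3. $P(A'/B) = \frac{P(A' \cap B)}{P(B)} = \frac{P(B) - P(A \cap B)}{P(B)} = \frac{\frac{3}{5} - \frac{13}{30}}{\frac{3}{5}} = \frac{5}{18}$
4. $P(A/B') = \frac{P(A \cap B')}{P(B')} = \frac{P(A) - P(A \cap B)}{1 - P(B)} = \frac{\frac{2}{3} - \frac{13}{30}}{1 - \frac{3}{5}} = \frac{7}{12}$
5. $P(A'/B') = \frac{P(A' \cap B')}{P(B')} = \frac{P[(A \cup B)']}{P(B')}$ [by De Morgan's law, $A' \cap B' = (A \cup B)'$]
$= \frac{1 - P(A \cup B)}{1 - P(B)} = \frac{1 - 5/6}{1 - 3/5} = \frac{5}{12}$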
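All five conditional probabilities follow from P(A), P(B) and P(A ∪ B). Here is a minimal sketch with exact fractions:

```python
from fractions import Fraction as F

p_a, p_b, p_a_or_b = F(2, 3), F(3, 5), F(5, 6)
p_ab = p_a + p_b - p_a_or_b  # P(A and B) from the addition theorem
results = {
    "P(A/B)": p_ab / p_b,
    "P(B/A)": p_ab / p_a,
    "P(A'/B)": (p_b - p_ab) / p_b,
    "P(A/B')": (p_a - p_ab) / (1 - p_b),
    "P(A'/B')": (1 - p_a_or_b) / (1 - p_b),  # De Morgan: A' and B' = (A or B)'
}
for name, value in results.items():
    print(name, value)
```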
Question 19: The odds in favour of an event are 2 : 3 and the odds against another event are 3 : 7. Find the probability that only one of the two events occurs.
Answer: We denote the two events by A and B respectively. Then
$P(A) = \frac{2}{2 + 3} = \frac{2}{5}$ and $P(B) = \frac{7}{7 + 3} = \frac{7}{10}$
Taking A and B to be independent, $P(A \cap B) = P(A) \times P(B) = \frac{2}{5} \times \frac{7}{10} = \frac{7}{25}$
Probability (either only A occurs or only B occurs) = P(A − B) + P(B − A)
= [P(A) − P(A ∩ B)] + [P(B) − P(A ∩ B)]
= P(A) + P(B) − 2P(A ∩ B)
$= \frac{2}{5} + \frac{7}{10} - 2 \times \frac{7}{25} = \frac{27}{50}$
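Here is a sketch that converts odds to probabilities and evaluates "exactly one" as $P(A)P(B') + P(A')P(B)$. Like the answer above, it assumes A and B are independent.

```python
from fractions import Fraction as F

p_a = F(2, 2 + 3)  # odds in favour 2 : 3
p_b = F(7, 3 + 7)  # odds against 3 : 7, so 7 favourable cases out of 10
p_only_one = p_a * (1 - p_b) + (1 - p_a) * p_b  # assumes independence
print(p_a, p_b, p_only_one)  # 2/5 7/10 27/50
```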
Question 20: There are three boxes with the following compositions:
Box | Blue | Red | White | Total
I | 5 | 8 | 10 | 23
II | 4 | 9 | 8 | 21
III | 3 | 6 | 7 | 16
One ball is drawn from each box. What is the probability that they would be of the same colour?
Answer: The balls would be either all Blue, all Red or all White. Denoting Blue, Red and White balls by B, R and W respectively, and the box by a lower suffix, the required probability is
$= P(B_1 \cap B_2 \cap B_3) + P(R_1 \cap R_2 \cap R_3) + P(W_1 \cap W_2 \cap W_3)$
$= P(B_1) \times P(B_2) \times P(B_3) + P(R_1) \times P(R_2) \times P(R_3) + P(W_1) \times P(W_2) \times P(W_3)$ (draws from different boxes are independent)
$= \frac{5}{23} \times \frac{4}{21} \times \frac{3}{16} + \frac{8}{23} \times \frac{9}{21} \times \frac{6}{16} + \frac{10}{23} \times \frac{8}{21} \times \frac{7}{16} = \frac{1052}{7728} = \frac{263}{1932}$
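Here is a minimal sketch of the same-colour calculation, assuming one ball is drawn from each box, independently:

```python
from fractions import Fraction as F
from math import prod

boxes = [
    {"Blue": 5, "Red": 8, "White": 10},
    {"Blue": 4, "Red": 9, "White": 8},
    {"Blue": 3, "Red": 6, "White": 7},
]
p_same = sum(
    prod(F(box[colour], sum(box.values())) for box in boxes)
    for colour in ("Blue", "Red", "White")
)
print(p_same)  # 263/1932
```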
Question 21: Mr. Roy is a candidate for three separate posts. For the first post, there are three candidates; for the second, there are five candidates; and for the third, there are 10 candidates. What is the probability that Mr. Roy would be selected?
Answer: Denoting Mr. Roy's selection for the three posts by A, B and C respectively, we have
$P(A) = \frac{1}{3}$, $P(B) = \frac{1}{5}$ and $P(C) = \frac{1}{10}$
The probability that Mr. Roy would be selected (i.e. selected for at least one post)
$= P(A \cup B \cup C)$
$= 1 - P[(A \cup B \cup C)']$
$= 1 - P(A' \cap B' \cap C')$ (by De Morgan's law)
$= 1 - P(A') \times P(B') \times P(C')$ (as A, B and C are independent, so are their complements)
$= 1 - \left(1 - \frac{1}{3}\right) \times \left(1 - \frac{1}{5}\right) \times \left(1 - \frac{1}{10}\right) = 1 - \frac{2}{3} \times \frac{4}{5} \times \frac{9}{10} = \frac{13}{25}$
Question 22: The independent probabilities that the three sections of a costing department will encounter a computer error are 0.2, 0.3 and 0.1 per week respectively. What is the probability that there would be
1. at least one computer error per week?
2. one and only one computer error per week?
Answer: Denoting the three sections by A, B and C respectively, the probabilities of encountering a computer error by these three sections are given by P(A) = 0.20, P(B) = 0.30 and P(C) = 0.10.
1. Probability that there would be at least one computer error per week
= 1 − Probability of having no computer error in any of the three sections
= 1 − P(A′ ∩ B′ ∩ C′)
= 1 − P(A′) × P(B′) × P(C′) [since A, B and C are independent]
= 1 − (1 − 0.20) × (1 − 0.30) × (1 − 0.10)
= 1 − 0.504 = 0.496 ≈ 0.50
2. Probability of having one and only one computer error per week
= P(A ∩ B′ ∩ C′) + P(A′ ∩ B ∩ C′) + P(A′ ∩ B′ ∩ C)
= P(A) × P(B′) × P(C′) + P(A′) × P(B) × P(C′) + P(A′) × P(B′) × P(C)
= 0.20 × 0.70 × 0.90 + 0.80 × 0.30 × 0.90 + 0.80 × 0.70 × 0.10
= 0.126 + 0.216 + 0.056 = 0.398 ≈ 0.40
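Both answers come from the full distribution of the number of errors. This sketch enumerates all 2³ error/no-error patterns, assuming independence as stated in the question, and keeps exact values, which shows the unrounded 0.496 and 0.398.

```python
from fractions import Fraction as F
from itertools import product

p = [F(2, 10), F(3, 10), F(1, 10)]  # sections A, B, C
dist = {k: 0 for k in range(4)}     # number of errors -> probability
for pattern in product([0, 1], repeat=3):
    prob = 1
    for error, p_i in zip(pattern, p):
        prob *= p_i if error else 1 - p_i
    dist[sum(pattern)] += prob

p_at_least_one = 1 - dist[0]
p_exactly_one = dist[1]
print(float(p_at_least_one), float(p_exactly_one))  # 0.496 0.398
```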
Question 23: A lot of 10 electronic components is known to include 3 defective parts. If a sample of 4 components is selected at random from the lot, what is the probability that this sample does not contain more than one defective?
Answer: Denoting defective and non-defective components by D and D′ respectively, we have the following situation:
 | D | D′ | Total
Lot | 3 | 7 | 10
Sample (1) | 0 | 4 | 4
Sample (2) | 1 | 3 | 4
Thus the required probability is given by
$= \frac{^3C_0 \times {}^7C_4 + {}^3C_1 \times {}^7C_3}{^{10}C_4} = \frac{1 \times 35 + 3 \times 35}{210} = \frac{2}{3}$
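The count above is a hypergeometric calculation. `hypergeom_pmf` is a hypothetical helper built on `math.comb`.

```python
from fractions import Fraction as F
from math import comb


def hypergeom_pmf(k, defective, good, sample):
    """P(exactly k defectives in a sample drawn without replacement)."""
    return F(comb(defective, k) * comb(good, sample - k), comb(defective + good, sample))


p_at_most_one = sum(hypergeom_pmf(k, 3, 7, 4) for k in (0, 1))
print(p_at_most_one)  # 2/3
```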
Question 24: There are two urns containing 5 red and 6 white balls, and 3 red and 7 white balls respectively. If two balls are drawn from the first urn without replacement and transferred to the second urn, and then a draw of another two balls is made from it, what is the probability that both the balls drawn are red?
Answer: Since two balls are transferred from the first urn containing 5 red and 6 white balls to the second urn containing 3 red and 7 white balls, we are to consider the following cases:
Case A: Both the balls transferred are red. In this case, the second urn contains 5 red and 7 white balls.
Case B: The two balls transferred are of different colours. Then the second urn contains 4 red and 8 white balls.
Case C: Both the balls transferred are white. Now the second urn contains 3 red and 9 white balls.
Letting R denote that both balls drawn from the second urn are red, the required probability is given by
$P(R \cap A) + P(R \cap B) + P(R \cap C)$
$= P(R/A) \times P(A) + P(R/B) \times P(B) + P(R/C) \times P(C)$
$= \frac{^5C_2}{^{12}C_2} \times \frac{^5C_2}{^{11}C_2} + \frac{^4C_2}{^{12}C_2} \times \frac{^5C_1 \times {}^6C_1}{^{11}C_2} + \frac{^3C_2}{^{12}C_2} \times \frac{^6C_2}{^{11}C_2}$
$= \frac{10}{66} \times \frac{10}{55} + \frac{6}{66} \times \frac{30}{55} + \frac{3}{66} \times \frac{15}{55} = \frac{325}{66 \times 55} = \frac{65}{726}$
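The three cases can be handled in one loop over the number of red balls transferred. This is a minimal sketch; `p_two_red` is a hypothetical helper name.

```python
from fractions import Fraction as F
from math import comb


def p_two_red(red, white):
    """Probability that two balls drawn without replacement are both red."""
    return F(comb(red, 2), comb(red + white, 2))


total = F(0)
for red_moved in range(3):  # 0, 1 or 2 red balls among the 2 transferred
    white_moved = 2 - red_moved
    p_transfer = F(comb(5, red_moved) * comb(6, white_moved), comb(11, 2))
    total += p_transfer * p_two_red(3 + red_moved, 7 + white_moved)
print(total)  # 65/726
```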
Question 25: If 8 balls are distributed at random among three boxes, what is the probability that the first box would contain 3 balls?
Answer: The first ball can be distributed to the 1st box or the 2nd box or the 3rd box, i.e. it can be distributed in 3 ways. Similarly, the second ball can also be distributed in 3 ways. Thus the first two balls can be distributed in $3^2$ ways. Proceeding in this way, we find that 8 balls can be distributed to 3 boxes in $3^8$ ways, which is the total number of elementary events. Let A be the event that the first box contains 3 balls. This implies that the remaining 5 balls must go to the remaining 2 boxes, which, as discussed above, can be done in $2^5$ ways. Since 3 balls out of 8 can be selected in $^8C_3$ ways, the event A can occur in $^8C_3 \times 2^5$ ways. Thus we have
$P(A) = \frac{^8C_3 \times 2^5}{3^8} = \frac{1792}{6561}$
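The counting argument can be confirmed by brute force over all $3^8 = 6561$ assignments of balls to boxes:

```python
from fractions import Fraction as F
from itertools import product
from math import comb

p_formula = F(comb(8, 3) * 2**5, 3**8)
# each of the 8 balls independently goes to box 0, 1 or 2
count = sum(1 for assignment in product(range(3), repeat=8) if assignment.count(0) == 3)
p_brute = F(count, 3**8)
print(p_formula, p_brute)  # 1792/6561 1792/6561
```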
Question 26: There are 3 boxes with the following composition:
Box I : 7 Red + 5 White + 4 Blue balls
Box II : 5 Red + 6 White + 3 Blue balls
Box III : 4 Red + 3 White + 2 Blue balls
One of the boxes is selected at random and a ball is drawn from it. What is the probability that the
drawn ball is red?
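Answer: Let A denote the event that the drawn ball is red. Since any of the 3 boxes may be selected with equal chance, we have $P(B_I) = P(B_{II}) = P(B_{III}) = \frac{1}{3}$.
Also $P(R_1/B_I)$ = probability of drawing a red ball from the first box = $\frac{7}{16}$, $P(R_2/B_{II}) = \frac{5}{14}$ and $P(R_3/B_{III}) = \frac{4}{9}$.
Thus we have
$$P(A) = P(R_1 \cap B_I) + P(R_2 \cap B_{II}) + P(R_3 \cap B_{III}) = P(R_1/B_I) \times P(B_I) + P(R_2/B_{II}) \times P(B_{II}) + P(R_3/B_{III}) \times P(B_{III})$$
$$= \frac{7}{16} \times \frac{1}{3} + \frac{5}{14} \times \frac{1}{3} + \frac{4}{9} \times \frac{1}{3} = \frac{1249}{3024}$$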
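The same total-probability sum can be evaluated exactly in a few lines. A minimal Python sketch (not part of the original text), with each box stored as its (red, white, blue) counts from Question 26:

```python
from fractions import Fraction

boxes = [(7, 5, 4), (5, 6, 3), (4, 3, 2)]   # Box I, II, III: (red, white, blue)
p_box = Fraction(1, len(boxes))             # each box equally likely to be selected

# P(red) = sum over boxes of P(red / box) x P(box)
p_red = sum(p_box * Fraction(red, red + white + blue) for red, white, blue in boxes)
print(p_red)  # 1249/3024
```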
Random Variable – Probability Distribution
Random/Stochastic variable – A function defined on the sample space of a random experiment that assigns a real number (a value from R) to each and every sample point of the experiment.
Example:
Let A be the event of getting a head on tossing a coin (S = {H, T}) and let X be the number of heads.
∴ X = 0 if T turns up and X = 1 if H turns up.
Outcome   T     H
X         0     1
P(X)      1/2   1/2
On tossing 2 coins, S = {HH, HT, TH, TT}:
Outcome   TT    HT, TH   HH
X         0     1        2
P(X)      1/4   2/4      1/4
On tossing n coins:
X         0                1                2                …   n
P(X)      nC0/2^n          nC1/2^n          nC2/2^n          …   nCn/2^n
i.e. $P(X = x) = \frac{{}^{n}C_{x}}{2^{n}}$, x = 0, 1, 2, …, n.
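The n-coin table follows the rule $P(X = x) = {}^{n}C_{x}/2^{n}$. A minimal Python sketch (not part of the original text; `heads_distribution` is a hypothetical helper name) that builds the table for any n:

```python
from fractions import Fraction
from math import comb

def heads_distribution(n: int) -> dict:
    """P(X = x) for X = number of heads when n fair coins are tossed."""
    return {x: Fraction(comb(n, x), 2**n) for x in range(n + 1)}

print(heads_distribution(2))  # {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
```

For n = 2 this reproduces the two-coin table above; the probabilities always add up to 1 because the ${}^{n}C_{x}$ sum to $2^{n}$.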
Types of random variable
Type         Definition                                                        Examples
Discrete     A variable defined on a discrete sample space.                   Number of car accidents; number of heads on tossing a coin
Continuous   A variable defined on a continuous sample space, assuming        Height; weight
             an uncountably infinite number of values.
Probability Distribution:
The statement that expresses the different values taken by a random variable and the corresponding probabilities.
Probability Distribution function: If a random variable x assumes n finite values $X_1, X_2, \ldots, X_n$ with corresponding probabilities $P_1, P_2, \ldots, P_n$ such that
I. $P_i \ge 0$ for every i, and
II. $\sum P_i = 1$ (over all i),
then the probability distribution of x is given by
X   X1   X2   ……   Xn
P   P1   P2   ……   Pn
Case         Function                               Definition
Discrete     Probability mass function (pmf)       $f(x) \ge 0$ for every x, and $\sum f(x) = 1$, where $f(x) = P(X = x)$
Continuous   Probability density function (pdf)    x is a continuous random variable defined in an interval $(\alpha, \beta)$, $\beta \ge \alpha$, so that x can assume an infinite number of values;
                                                    $f(x) \ge 0$ for $x \in [\alpha, \beta]$ and $\int_{\alpha}^{\beta} f(x)\,dx = 1$.
                                                    If x lies between a and b, i.e. $\alpha \le a < b \le \beta$, then $P(a < x < b) = \int_{a}^{b} f(x)\,dx$.
Expected value of a Random Variable
Expected value / Mathematical expectation / Expectation, E(x) = μ:
The sum of the products of the different values taken by the random variable and the corresponding probabilities.
$$\mu = E(x) = \sum_i P_i x_i$$
Points to Ponder:
1. Expected value of $x^2$: $E(x^2) = \sum_i P_i x_i^2$
2. Expected value of a function g(x): $E[g(x)] = \sum_i P_i\, g(x_i)$
3. Variance of x: $\sigma^2 = V(x) = E(x - \mu)^2 = E(x^2) - \mu^2$
4. σ (the standard deviation) is the positive square root of the variance.
5. If y = a + bx (x, y – random variables; a, b – constants), then $\mu_y = a + b\mu_x$ and $\sigma_y = |b| \times \sigma_x$.
Discrete case:
$\mu = \sum x f(x)$, $\sigma^2 = E(x^2) - \mu^2$, where $E(x^2) = \sum x^2 f(x)$
Continuous case ($x \in (-\infty, \infty)$):
$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx$, $\sigma^2 = E(x^2) - \mu^2$, where $E(x^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx$
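The discrete-case formulas and point 5 can be checked directly. A minimal Python sketch (not part of the original text): `mean_var` is a hypothetical helper, and the fair die and the constants a = 4, b = −3 are illustrative choices, not data from the text.

```python
from fractions import Fraction

def mean_var(dist: dict) -> tuple:
    """Return (E(x), V(x)) for a discrete distribution {value: probability}."""
    assert sum(dist.values()) == 1, "probabilities must add up to 1"
    mu = sum(p * x for x, p in dist.items())
    e_x2 = sum(p * x * x for x, p in dist.items())
    return mu, e_x2 - mu * mu              # V(x) = E(x^2) - mu^2

# Illustration only: a fair die
die = {x: Fraction(1, 6) for x in range(1, 7)}
mu_x, var_x = mean_var(die)

# y = a + b x with hypothetical constants a = 4, b = -3
a, b = 4, -3
mu_y, var_y = mean_var({a + b * x: p for x, p in die.items()})

print(mu_x, var_x)                                       # 7/2 35/12
print(mu_y == a + b * mu_x, var_y == b * b * var_x)      # True True
```

Checking $V(y) = b^2 V(x)$ is the same as checking $\sigma_y = |b|\,\sigma_x$, without taking square roots of fractions.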
Properties of Expected Values
1. Expectation of a constant k is k,
i.e. E(k) = k for any constant k.
2. Expectation of the sum of two random variables is the sum of their expectations,
i.e. E(x + y) = E(x) + E(y) for any two random variables x and y.
3. Expectation of the product of a constant and a random variable is the product of the constant and the expectation of the random variable,
i.e. E(kx) = kE(x) for any constant k.
4. Expectation of the product of two random variables is the product of the expectations of the two random variables, provided the two variables are independent,
i.e. E(xy) = E(x) × E(y) whenever x and y are independent.
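Example
Property 1:
X      1     1     1
p(x)   1/4   2/4   1/4
$$E(x) = \sum px = \left(1 \times \tfrac{1}{4}\right) + \left(1 \times \tfrac{2}{4}\right) + \left(1 \times \tfrac{1}{4}\right) = 1$$
X      2     2     2
p(x)   1/4   2/4   1/4
$$E(x) = \sum px = \left(2 \times \tfrac{1}{4}\right) + \left(2 \times \tfrac{2}{4}\right) + \left(2 \times \tfrac{1}{4}\right) = 2$$
Hence, E(k) = k.
Property 2: Two ventures A and B end in a profit (P) together or in a loss (L) together, each with probability 1/2.
A      P     L
x      2     −1
p(x)   1/2   1/2
B      P     L
y      2     −1.5
p(y)   1/2   1/2
$$E(x) = \sum px = \left(2 \times \tfrac{1}{2}\right) + \left(-1 \times \tfrac{1}{2}\right) = 0.5$$
$$E(y) = \sum py = \left(2 \times \tfrac{1}{2}\right) + \left(-1.5 \times \tfrac{1}{2}\right) = 0.25$$
Consider
x + y      4     −2.5
p(x + y)   1/2   1/2
$$E(x + y) = \sum (x + y)\,p(x + y) = \left(4 \times \tfrac{1}{2}\right) + \left(-2.5 \times \tfrac{1}{2}\right) = 0.75$$
Hence, E(x + y) = E(x) + E(y) = 0.5 + 0.25.
Property 3: Toss a coin; A – getting a head.
X          0     1
P(X = x)   1/2   1/2
$$E(x) = \sum px = \left(0 \times \tfrac{1}{2}\right) + \left(1 \times \tfrac{1}{2}\right) = \tfrac{1}{2}$$
Let k = 2:
kx         0     2
p(kx)      1/2   1/2
$$E(kx) = \sum kx\,p(kx) = \left(0 \times \tfrac{1}{2}\right) + \left(2 \times \tfrac{1}{2}\right) = 1$$
Hence, E(kx) = 1 = 2 × 1/2 = kE(x).
Property 4: Toss two coins; A – getting a head on the first coin, B – getting a tail on the second coin (independent events).
X          0     1
P(X = x)   1/2   1/2
$$E(x) = \sum px = \left(0 \times \tfrac{1}{2}\right) + \left(1 \times \tfrac{1}{2}\right) = \tfrac{1}{2}$$
Y          0     1
P(Y = y)   1/2   1/2
$$E(y) = \sum py = \left(0 \times \tfrac{1}{2}\right) + \left(1 \times \tfrac{1}{2}\right) = \tfrac{1}{2}$$
Consider
x × y      0     1
P(x × y)   3/4   1/4
$$E(xy) = \sum xy\,p(xy) = \left(0 \times \tfrac{3}{4}\right) + \left(1 \times \tfrac{1}{4}\right) = \tfrac{1}{4}$$
E(x) × E(y) = 1/2 × 1/2 = 1/4 = E(xy).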
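The worked illustrations of Properties 2 and 4 can be replayed with exact fractions. A minimal Python sketch (not part of the original text); as in the example above, it assumes the two ventures in Property 2 gain or lose together, and that the two coins in Property 4 are tossed independently.

```python
from fractions import Fraction
from itertools import product

half = Fraction(1, 2)

def expect(pairs):
    """E of a random variable given as [(value, probability), ...]."""
    return sum(v * p for v, p in pairs)

# Property 2: outcome -> (x, y); P and L each occur with probability 1/2
outcomes = {"P": (2, 2), "L": (-1, Fraction(-3, 2))}
e_x = expect([(x, half) for x, _ in outcomes.values()])
e_y = expect([(y, half) for _, y in outcomes.values()])
e_sum = expect([(x + y, half) for x, y in outcomes.values()])
print(e_x, e_y, e_sum)      # 1/2 1/4 3/4

# Property 4: x = 1 for a head on coin 1, y = 1 for a tail on coin 2; 4 equally likely outcomes
joint = [(x, y, Fraction(1, 4)) for x, y in product((0, 1), repeat=2)]
e_xy = sum(x * y * p for x, y, p in joint)
e_x4 = sum(x * p for x, _, p in joint)
e_y4 = sum(y * p for _, y, p in joint)
print(e_xy, e_x4 * e_y4)    # 1/4 1/4
```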
Question 27: An unbiased coin is tossed three times. Find the expected value of the number of heads and also its standard deviation.
Answer: If x denotes the number of heads when an unbiased coin is tossed three times, then the probability distribution of x is given by
X:   0     1     2     3
P:   1/8   3/8   3/8   1/8
The expected value of x is given by
$$\mu = E(x) = \sum P_i X_i = \frac{1}{8} \times 0 + \frac{3}{8} \times 1 + \frac{3}{8} \times 2 + \frac{1}{8} \times 3 = 1.50$$
Also
$$E(X^2) = \sum P_i X_i^2 = \frac{1}{8} \times 0^2 + \frac{3}{8} \times 1^2 + \frac{3}{8} \times 2^2 + \frac{1}{8} \times 3^2 = 3$$
$$\sigma^2 = E(X^2) - \mu^2 = 3 - (1.50)^2 = 0.75$$
∴ SD, $\sigma = \sqrt{0.75} = 0.87$
Question 28: A random variable has the following probability distribution:
X:   4      5      7      8      10
P:   0.15   0.20   0.40   0.15   0.10
Find $E[X - E(X)]^2$. Also obtain V(3X – 4).
Answer: The expected value of x is given by
E(x) = ΣP_iX_i = 0.15 × 4 + 0.20 × 5 + 0.40 × 7 + 0.15 × 8 + 0.10 × 10 = 6.60
Also, $E[X - E(X)]^2 = \sum \mu_i^2 P_i$, where $\mu_i = X_i - E(X)$.
Let y = 3X – 4 = (−4) + (3)X. Then $V(y) = b^2 \times \sigma_x^2 = 9 \times \sigma_x^2$.
Table 13.1
Computation of $E[X - E(X)]^2$
X_i     P_i    μ_i = X_i − E(X)   μ_i²    μ_i²P_i
4       0.15   −2.60              6.76    1.014
5       0.20   −1.60              2.56    0.512
7       0.40   0.40               0.16    0.064
8       0.15   1.40               1.96    0.294
10      0.10   3.40               11.56   1.156
Total   1.00   –                  –       3.040
Thus $E[X - E(X)]^2 = 3.04$
As $\sigma_x^2 = 3.04$, V(y) = 9 × 3.04 = 27.36
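Table 13.1 and the rule $V(a + bX) = b^2 V(X)$ can be reproduced in a few lines. A minimal Python sketch (not part of the original text), using the distribution of Question 28:

```python
xs = [4, 5, 7, 8, 10]
ps = [0.15, 0.20, 0.40, 0.15, 0.10]

mean = sum(p * x for x, p in zip(xs, ps))
var = sum(p * (x - mean) ** 2 for x, p in zip(xs, ps))   # E[X - E(X)]^2, as in Table 13.1
a, b = -4, 3                                             # y = 3X - 4 = a + bX
var_y = b ** 2 * var                                     # V(a + bX) = b^2 V(X)

# Cross-check: transform every value first, then compute the variance directly
ys = [a + b * x for x in xs]
mean_y = sum(p * y for y, p in zip(ys, ps))
var_y_direct = sum(p * (y - mean_y) ** 2 for y, p in zip(ys, ps))

print(round(mean, 2), round(var, 2), round(var_y, 2))    # 6.6 3.04 27.36
```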
Question 29: In a business venture, a man can make a profit of ₹50,000 or incur a loss of ₹20,000. The probabilities of making a profit or incurring a loss, from past experience, are known to be 0.75 and 0.25 respectively. What is his expected profit?
Answer: If the profit is denoted by x, then we have the following probability distribution of x:
X:   ₹50,000   −₹20,000
P:   0.75      0.25
Thus his expected profit is
E(X) = P1X1 + P2X2 = 0.75 × ₹50,000 + 0.25 × (−₹20,000) = ₹32,500
Question 30: A box contains 12 electric lamps of which 5 are defective. A man selects three lamps at random. What is the expected number of defective lamps in his selection?
Answer: Let x denote the number of defective lamps; x can assume the values 0, 1, 2 and 3. P(x = 0) = probability of having 0 defectives out of the 5 defectives and 3 non-defectives out of the 7 non-defectives:
$$P(x = 0) = \frac{{}^{5}C_{0} \times {}^{7}C_{3}}{{}^{12}C_{3}} = \frac{35}{220}$$
Similarly
$$P(x = 1) = \frac{{}^{5}C_{1} \times {}^{7}C_{2}}{{}^{12}C_{3}} = \frac{105}{220}, \quad P(x = 2) = \frac{{}^{5}C_{2} \times {}^{7}C_{1}}{{}^{12}C_{3}} = \frac{70}{220}, \quad P(x = 3) = \frac{{}^{5}C_{3} \times {}^{7}C_{0}}{{}^{12}C_{3}} = \frac{10}{220}$$
Probability Distribution of No. of Defective Lamps
X:   0        1         2        3
P:   35/220   105/220   70/220   10/220
Thus the expected number of defectives is given by
$$E(x) = \frac{35}{220} \times 0 + \frac{105}{220} \times 1 + \frac{70}{220} \times 2 + \frac{10}{220} \times 3 = \frac{275}{220} = 1.25$$
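The probability table above can be generated rather than computed by hand. A minimal Python sketch (not part of the original text); the closing comparison with $nD/N$ uses the standard mean of this kind of draw-without-replacement (hypergeometric) distribution, a shortcut the text does not use.

```python
from fractions import Fraction
from math import comb

N, D, n = 12, 5, 3     # lamps in the box, defective lamps, lamps selected

pmf = {x: Fraction(comb(D, x) * comb(N - D, n - x), comb(N, n)) for x in range(n + 1)}
expected = sum(x * p for x, p in pmf.items())

print(expected)                          # 5/4
print(expected == Fraction(n * D, N))    # True
```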
Question 31: Moidul draws 2 balls from a bag containing 3 white and 5 red balls. He gets ₹500 if he draws a white ball and ₹200 if he draws a red ball. What is his expectation? If he is asked to pay ₹400 for participating in the game, would he consider it a fair game and participate?
Answer: We denote the amount by x. Then x assumes the value 2 × ₹500, i.e. ₹1000, if 2 white balls are drawn; the value ₹500 + ₹200, i.e. ₹700, if 1 white and 1 red ball are drawn; and the value 2 × ₹200, i.e. ₹400, if 2 red balls are drawn. The respective probabilities are given by
$$P(WW) = \frac{{}^{3}C_{2}}{{}^{8}C_{2}} = \frac{3}{28}, \quad P(WR) = \frac{{}^{3}C_{1} \times {}^{5}C_{1}}{{}^{8}C_{2}} = \frac{15}{28}, \quad P(RR) = \frac{{}^{5}C_{2}}{{}^{8}C_{2}} = \frac{10}{28}$$
Probability Distribution of x
X:   ₹1000   ₹700    ₹400
P:   3/28    15/28   10/28
Hence
$$E(X) = \frac{3}{28} \times 1000 + \frac{15}{28} \times 700 + \frac{10}{28} \times 400 = 625$$
i.e. his expectation is ₹625 > ₹400.
Therefore, the game is more than fair (favourable) to him, and he would participate.
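Since all ${}^{8}C_{2} = 28$ pairs of balls are equally likely, the expectation is simply the average prize over those pairs. A minimal Python sketch (not part of the original text):

```python
from fractions import Fraction
from itertools import combinations

bag = ["W"] * 3 + ["R"] * 5
prize = {"W": 500, "R": 200}

draws = list(combinations(range(len(bag)), 2))        # 28 equally likely pairs
expectation = Fraction(sum(prize[bag[i]] + prize[bag[j]] for i, j in draws), len(draws))
fee = 400
print(expectation, expectation > fee)   # 625 True
```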
Question 32: A number is selected at random from a set containing the first 100 natural numbers and another number is selected at random from another set containing the first 200 natural numbers. What is the expected value of the product?
Answer: We denote the number selected from the first set by x and the number selected from the second set by y. Since the selections are independent of each other, the expected value of the product is given by E(xy) = E(x) × E(y).
Now x can assume any value from 1 to 100 with the same probability 1/100, so the probability distribution of x is given by
X:   1       2       …   100
P:   1/100   1/100   …   1/100
$$E(x) = \frac{1}{100} \times 1 + \frac{1}{100} \times 2 + \frac{1}{100} \times 3 + \cdots + \frac{1}{100} \times 100 = \frac{1 + 2 + 3 + \cdots + 100}{100} = \frac{100 \times 101}{2 \times 100} = \frac{101}{2}$$
[Since $1 + 2 + \cdots + n = \frac{n(n+1)}{2}$]
Similarly, y can assume any value from 1 to 200 with the same probability 1/200:
Y:   1       2       …   200
P:   1/200   1/200   …   1/200
$$E(y) = \frac{201}{2}$$
$$\therefore E(xy) = \frac{101}{2} \times \frac{201}{2} = 5075.25$$
Question 33: A dice is thrown repeatedly till a 'six' appears. Write down the sample space. Also find the expected number of throws.
Answer: Let p denote the probability of getting a six and q = 1 – p the probability of not getting a six. If the dice is unbiased, then p = 1/6 and q = 5/6.
If a six is obtained with the very first throw, the experiment ends, and the probability of getting a six, as we have already seen, is p. However, if the first throw does not produce a six, the dice is thrown again, and if a six appears with the second throw, the experiment ends. The probability of getting a six preceded by a non-six is qp. If the second throw does not yield a six, we go for a third throw, and if the third throw produces a six, the experiment ends; the probability of getting a six in the third attempt is q²p. The experiment is carried on in this way and, writing N for a throw that is not a six, we get the following countably infinite sample space:
S = {6, N6, NN6, NNN6, …}, with respective probabilities p, qp, q²p, q³p, …
If x denotes the number of throws necessary to produce a six, then x is a random variable with the following probability distribution:
X   1   2    3     4     …
P   p   qp   q²p   q³p   …
$$E(x) = p \times 1 + qp \times 2 + q^2p \times 3 + q^3p \times 4 + \cdots = p(1 + 2q + 3q^2 + 4q^3 + \cdots) = p(1 - q)^{-2} = \frac{p}{p^2} = \frac{1}{p}$$
In case of an unbiased dice, p = 1/6, so the expected number of throws is 1/p = 6.
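The step $1 + 2q + 3q^2 + \cdots = (1 - q)^{-2}$ can be checked numerically: partial sums of $\sum_{k \ge 1} k\,q^{k-1}p$ should approach $1/p = 6$. A minimal Python sketch (not part of the original text; `partial_expectation` is a hypothetical helper):

```python
from fractions import Fraction

p = Fraction(1, 6)     # probability of a six with an unbiased dice
q = 1 - p

def partial_expectation(terms: int) -> float:
    """Sum of k * q^(k-1) * p for k = 1 .. terms (first `terms` terms of E(x))."""
    return float(sum(k * q ** (k - 1) * p for k in range(1, terms + 1)))

for terms in (10, 50, 200):
    print(terms, round(partial_expectation(terms), 4))
```

The printed partial sums increase towards 6; by 200 terms the remaining tail is far below any displayed precision.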
Question 34: A random variable x has the following probability distribution:
X      0   1    2    3   4    5    6     7
P(X)   0   2k   3k   k   2k   k²   7k²   2k² + k
Find
I. The value of k
II. P(x < 3)
III. P(x ≥ 4)
IV. P(2 < x ≤ 5)
Answer:
$$\sum P(x) = 1 \Rightarrow 0 + 2k + 3k + k + 2k + k^2 + 7k^2 + 2k^2 + k = 1$$
$$\Rightarrow 10k^2 + 9k - 1 = 0 \Rightarrow (k + 1)(10k - 1) = 0 \Rightarrow k = \frac{1}{10}$$
(k = −1 is rejected, since a probability cannot be negative.)
I. Thus the value of k is 0.10
II. P(x < 3) = P(x = 0) + P(x = 1) + P(x = 2) = 0 + 2k + 3k = 5k = 0.50
III. P(x ≥ 4) = P(x = 4) + P(x = 5) + P(x = 6) + P(x = 7) = 2k + k² + 7k² + (2k² + k) = 10k² + 3k = 10 × (0.10)² + 3 × 0.10 = 0.40
IV. P(2 < x ≤ 5) = P(x = 3) + P(x = 4) + P(x = 5) = k + 2k + k² = k² + 3k = (0.10)² + 3 × 0.10 = 0.31
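The same steps (solve the quadratic, keep the non-negative root, then add the relevant probabilities) can be sketched in Python. This is a minimal illustration, not part of the original text:

```python
import math

# Sum of probabilities: 10k^2 + 9k = 1, i.e. 10k^2 + 9k - 1 = 0
A, B, C = 10, 9, -1
disc = B * B - 4 * A * C
roots = [(-B + s * math.sqrt(disc)) / (2 * A) for s in (1, -1)]
k = max(roots)          # the root k = -1 is rejected: probabilities cannot be negative

P = [0, 2 * k, 3 * k, k, 2 * k, k ** 2, 7 * k ** 2, 2 * k ** 2 + k]   # P(X = 0), ..., P(X = 7)
print(round(k, 2))            # 0.1
print(round(sum(P[:3]), 2))   # P(x < 3)      -> 0.5
print(round(sum(P[4:]), 2))   # P(x >= 4)     -> 0.4
print(round(sum(P[3:6]), 2))  # P(2 < x <= 5) -> 0.31
```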
Extra problems on Multiplication Theorem
Question 1: A man wants to marry a girl having the following qualities: white complexion (the probability of getting such a girl is 1 in 20), handsome dowry (the probability is 1 in 50) and westernised style (the probability is 1 in 100). Find the probability of his getting married to a girl who has all the three qualities.
Answer:
The probability of a girl with white complexion = 1/20 = 0.05
The probability of a girl with handsome dowry = 1/50 = 0.02
The probability of a girl with westernised style = 1/100 = 0.01
Since the events are independent, the probability of simultaneous occurrence of all three qualities
$$= \frac{1}{20} \times \frac{1}{50} \times \frac{1}{100} = \frac{1}{100000} = 0.00001$$
Question 2: Suppose it is 11 to 5 against a person A, who is now 38 years of age, living till he is 73, and 5 to 3 against B, who is 43, living till he is 78. Find the chance that at least one of these persons will be alive 35 years hence.
Answer:
The probability that A will die within 35 years = 11/16
The probability that B will die within 35 years = 5/8
The probability that both of them will die within 35 years (the two events being independent) = $\frac{11}{16} \times \frac{5}{8} = \frac{55}{128}$
The probability that not both of them will die, i.e. at least one of them will be alive $= 1 - \frac{55}{128} = \frac{73}{128} \approx 57\%$
4. Correlation and Regression
Introduction
Necessity – to study / analyse more than one variable
Nature of Variables: uni-variate, bi-variate, tri-variate or more
Example:
1. Univariate – Distribution of height, weight, mark, profit, wage
2. Bivariate – to know what amount of investment (x) would yield a desired level of profit (y)
Bivariate Data – Data collected on two variables simultaneously.
Bivariate Frequency Distribution – The distribution constructed for the bivariate data.
Points to Ponder:
1. Also known as joint frequency distribution / two way classification table.
2. Horizontal classification – for ′𝑥′ and Vertical classification – for ′𝑦′
Marks in Statistics (x, rows) and Marks in Mathematics (y, columns)
x \ y     0 – 4   4 – 8   8 – 12   12 – 16   16 – 20   Total
0 – 4     1       1       2        0         0         4
4 – 8     2       4       5        1         1         13
8 – 12    0       2       4        6         1         13
12 – 16   0       1       3        2         5         11
16 – 20   0       0       1        5         3         9
Total     3       8       15       14        10        50
Here, $f_{ij}$ is the cell frequency for the i-th row and j-th column. ($f_{12} = 1$ is the number of students who have secured marks between 0 – 4 in Statistics and marks between 4 – 8 in Maths.)
Marginal Distribution
The distribution of any one of the variables.
It is a univariate distribution.
The mean and S.D. are called the marginal mean and marginal S.D. respectively.
Conditional Distribution
The distribution of one variable subject to a condition on the other variable.
It is a univariate distribution.
In general, (m + n) conditional distributions exist when x has m classes and y has n classes.
Marginal Distribution of Marks in Statistics (x)
Marks     No. of Students
0 – 4     4
4 – 8     13
8 – 12    13
12 – 16   11
16 – 20   9
Total     50
Marginal Distribution of Marks in Maths (y)
Marks     No. of Students
0 – 4     3
4 – 8     8
8 – 12    15
12 – 16   14
16 – 20   10
Total     50
Conditional Distribution of Marks (x), given y in 8 – 12
Marks     No. of Students
0 – 4     2
4 – 8     5
8 – 12    4
12 – 16   3
16 – 20   1
Total     15
Conditional Distribution of Marks (y), given x in 12 – 16
Marks     No. of Students
0 – 4     0
4 – 8     1
8 – 12    3
12 – 16   2
16 – 20   5
Total     11
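Marginal distributions are row and column totals, and a conditional distribution is a single column or row of the bivariate table. A minimal Python sketch (not part of the original text) using the Statistics/Mathematics table above:

```python
# Rows: marks in Statistics (x); columns: marks in Mathematics (y)
classes = ["0-4", "4-8", "8-12", "12-16", "16-20"]
f = [
    [1, 1, 2, 0, 0],
    [2, 4, 5, 1, 1],
    [0, 2, 4, 6, 1],
    [0, 1, 3, 2, 5],
    [0, 0, 1, 5, 3],
]

row_marginal = [sum(row) for row in f]          # f_i0: marginal distribution of x
col_marginal = [sum(col) for col in zip(*f)]    # f_0j: marginal distribution of y
N = sum(row_marginal)

# Conditional distributions: x given y in 8-12 (a column), y given x in 12-16 (a row)
x_given_y_8_12 = [row[classes.index("8-12")] for row in f]
y_given_x_12_16 = f[classes.index("12-16")]

print(row_marginal, col_marginal, N)       # [4, 13, 13, 11, 9] [3, 8, 15, 14, 10] 50
print(x_given_y_8_12, y_given_x_12_16)     # [2, 5, 4, 3, 1] [0, 1, 3, 2, 5]
```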
Correlation analysis: To find an association, or the lack of it, between the two variables x and y (above) using different measures. It helps in planning and controlling.
Example: A car owner knows that there is a definite relationship between petrol consumed and distance travelled.
Points to Ponder:
Cause and Effect – The influence of a third variable (z) may produce an association (correlation) between the other two variables (x and y), although no causal relationship exists between those two variables.
Correlation: Definition
The change in one variable is reciprocated by a corresponding change in the other variable, either directly or inversely; otherwise the variables are said to be dissociated / uncorrelated / independent.
If two variables vary in such a way that movements in one are accompanied by movements in the
other, then these quantities are said to be correlated.
Types of Correlation – Definition – Examples
Positive correlation: Directly related; the variables move in the same direction (both increase or both decrease). Example: profit and investment.
Negative correlation: Inversely related; the variables move in opposite directions (one increases and the other decreases). Examples: price and demand; profits of an insurance company and the number of claims.
Simple correlation: Only two variables are under study. Example: height and weight.
Multiple correlation: Three or more variables are under study.
Partial correlation: A multiple-correlation setting in which only two variables are taken to influence each other and the others are kept constant.
Linear correlation: A constant ratio between the changes in the two variables is maintained.
Non-linear (curvilinear) correlation: No constant ratio is maintained.
Uncorrelated: The movement of one variable does not make any change in the movement of the other.
Measures of Correlation
(1) Scatter Diagram
A Simple diagrammatic method
The totality of all the plotted points forms a scatter diagram
The pattern reveals the shape / nature of correlation.
[Figure: Scatter Diagram]
Advantages: It can be applied to any type of correlation, both linear and curvilinear.
Disadvantages: It can distinguish between the different types of correlation, but fails to measure the degree of correlation.
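(2) Karl Pearson’s Product Moment Correlation Coefficient
Involves the method of least squares.
The relationship should be linear only.
Definition: The ratio of the covariance between the two variables to the product of the standard deviations of the two variables.
$$r = r_{xy} = \frac{cov(x, y)}{s_x s_y}$$
Note: $cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$ or $\frac{\sum xy}{n} - \bar{x}\bar{y}$, and $s_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ or $\sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$ (similarly for $s_y$).
Combined formula:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \times \sqrt{n\sum y^2 - (\sum y)^2}}$$
In case of a bivariate frequency distribution:
$$cov(x, y) = \frac{\sum_{i,j} f_{ij} x_i y_j}{N} - \bar{x}\bar{y}, \quad s_x = \sqrt{\frac{\sum_i f_{io} x_i^2}{N} - \bar{x}^2}, \quad s_y = \sqrt{\frac{\sum_j f_{oj} y_j^2}{N} - \bar{y}^2}$$
Where
$x_i$ – mid-value of the i-th class interval of x
$y_j$ – mid-value of the j-th class interval of y
$f_{io}$ – marginal frequency of x
$f_{oj}$ – marginal frequency of y
$f_{ij}$ – frequency of the (i, j)-th cell
$N = \sum_{i,j} f_{ij} = \sum_i f_{io} = \sum_j f_{oj}$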
Properties
a) A unit-free measure – the correlation between height in inches and weight in kg is a pure number, not expressed in inches or kg.
b) Unaffected by a change of origin and/or scale, except possibly for its sign, i.e. if $u = \frac{x - a}{b}$ and $v = \frac{y - c}{d}$, then
$$r_{xy} = \frac{bd}{|b||d|}\, r_{uv}$$
c) $-1 \le r \le 1$
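The combined formula and property (b) can be checked numerically. A minimal Python sketch (not part of the original text): `r_from_sums` and `pearson_r` are hypothetical helper names, the sums and data come from Questions 1 and 2 below, and the origins 5, 6 and scales 2, −3 in the property check are arbitrary illustrative choices.

```python
import math

def r_from_sums(n, sum_x, sum_y, sum_xy, sum_xx, sum_yy):
    """Combined formula for Karl Pearson's r from the raw sums."""
    num = n * sum_xy - sum_x * sum_y
    den = math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2)
    return num / den

def pearson_r(xs, ys):
    """Karl Pearson's r computed from paired observations."""
    return r_from_sums(len(xs), sum(xs), sum(ys),
                       sum(a * b for a, b in zip(xs, ys)),
                       sum(a * a for a in xs),
                       sum(b * b for b in ys))

# Question 1 below gives only the sums
print(round(r_from_sums(10, 40, 50, 220, 200, 262), 2))   # 0.91

# Question 2 below gives the raw data
x = [2, 3, 5, 5, 6, 8]
y = [9, 8, 8, 6, 5, 3]
print(round(pearson_r(x, y), 2))                          # -0.93

# Property (b): b = 2 > 0 and d = -3 < 0, so bd/(|b||d|) = -1 and r_uv = -r_xy
u = [(xi - 5) / 2 for xi in x]
v = [(yi - 6) / -3 for yi in y]
print(math.isclose(pearson_r(u, v), -pearson_r(x, y)))    # True
```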
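(3) Spearman’s Rank Correlation
Used to measure the correlation between qualitative characteristics, e.g. to find the level of agreement / disagreement between the assessments of two judges.
$$r_R = 1 - \frac{6\sum d^2}{n(n^2 - 1)}, \quad \text{where } d = x - y \text{ (difference between the ranks)}$$
In case of tied ranks:
$$r_R = 1 - \left(\frac{6\sum d^2}{n(n^2 - 1)} + \frac{6\sum_j \frac{t_j^3 - t_j}{12}}{n(n^2 - 1)}\right) = 1 - \frac{6\left[\sum_i d_i^2 + \sum_j \frac{t_j^3 - t_j}{12}\right]}{n(n^2 - 1)}$$
$t_j$ is the number of times a rank is repeated (the length of the j-th tie).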
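The rank formula with the tie correction translates directly into code. A minimal Python sketch (not part of the original text): it assumes rank 1 is given to the largest value and that tied values share the average of the positions they occupy, as in Questions 8 and 9 below.

```python
from collections import Counter

def ranks_descending(values):
    """Rank 1 = largest value; tied values get the average of their positions."""
    return [sum(1 for w in values if w > v) + (values.count(v) + 1) / 2 for v in values]

def tie_correction(values):
    """Sum of (t^3 - t)/12 over every group of t tied values."""
    return sum((t ** 3 - t) / 12 for t in Counter(values).values() if t > 1)

def spearman(xs, ys):
    n = len(xs)
    d2 = sum((rx - ry) ** 2 for rx, ry in zip(ranks_descending(xs), ranks_descending(ys)))
    correction = tie_correction(xs) + tie_correction(ys)
    return 1 - 6 * (d2 + correction) / (n * (n ** 2 - 1))

# Question 8 (no ties) and Question 9 (tied marks) below
sales = [90, 85, 68, 75, 82, 80, 95, 70]
advert = [7, 6, 2, 3, 4, 5, 8, 1]
eco = [80, 56, 50, 48, 50, 62, 60]
stats = [90, 75, 75, 65, 65, 50, 65]
print(round(spearman(sales, advert), 2))   # 0.95
print(round(spearman(eco, stats), 2))      # 0.15
```

Without ties the correction term is zero, so the same function covers both forms of the formula.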
(4) Coefficient of Concurrent Deviations
A simple and casual method of finding correlation.
Each value is compared with the previous value: the deviation is given a ‘+’ sign if the value is more than the previous value and a ‘−’ sign if the value is less than the previous value. A pair of deviations (of x and y) is concurrent if both have the same sign.
$$r_C = \pm\sqrt{\pm\frac{2c - m}{m}}$$
Here c – number of concurrent deviations, m – total number of deviations (m = n – 1).
Note: 2c − m > 0 ⟹ take ‘+’ both inside and outside the square root.
2c − m < 0 ⟹ take ‘−’ both inside and outside the square root.
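The sign rule above can be sketched in Python (not part of the original text). The text does not say how an unchanged value should be treated; this sketch assumes it gets sign 0 and is never counted as concurrent.

```python
import math

def signs(series):
    """+1 if a value is more than the previous value, -1 if less, 0 if unchanged."""
    return [(cur > prev) - (cur < prev) for prev, cur in zip(series, series[1:])]

def concurrent_deviation(xs, ys):
    sx, sy = signs(xs), signs(ys)
    m = len(sx)                                       # m = n - 1 deviations
    c = sum(1 for a, b in zip(sx, sy) if a * b > 0)   # concurrent: same sign
    k = (2 * c - m) / m
    return math.copysign(math.sqrt(abs(k)), k)        # same sign inside and outside the root

# Question 12 below (price and demand) and the comprehensive question (marks)
price = [25, 28, 30, 23, 35, 38, 39, 42]
demand = [35, 34, 35, 30, 29, 28, 26, 23]
maths = [80, 75, 76, 69, 70, 85, 72, 68]
stats_marks = [85, 65, 72, 68, 67, 88, 80, 70]
print(round(concurrent_deviation(price, demand), 2))       # -0.65
print(round(concurrent_deviation(maths, stats_marks), 2))  # 0.85
```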
Practical Problems
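Question 1: Compute the correlation coefficient between x and y from the following data:
n = 10, Σxy = 220, Σx² = 200, Σy² = 262, Σx = 40 and Σy = 50
Answer:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \times \sqrt{n\sum y^2 - (\sum y)^2}} = \frac{10 \times 220 - 40 \times 50}{\sqrt{10 \times 200 - (40)^2} \times \sqrt{10 \times 262 - (50)^2}} = 0.91$$
Thus, there is a good amount of positive correlation between the two variables x and y.
Alternately,
$$\bar{x} = \frac{\sum x}{n} = \frac{40}{10} = 4 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{50}{10} = 5$$
$$cov(x, y) = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{220}{10} - 4 \times 5 = 2$$
$$S_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{200}{10} - 4^2} = 2 \quad \text{and} \quad S_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{262}{10} - 5^2} = 1.0954$$
$$r = \frac{cov(x, y)}{S_x S_y} = \frac{2}{2 \times 1.0954} = 0.91$$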
Question 2: Find the product moment correlation coefficient from the following information:
x   2   3   5   5   6   8
y   9   8   8   6   5   3
Answer: In order to find the covariance and the two standard deviations, we prepare the following table:
Column No.     (1)   (2)   (3) = (1) × (2)   (4) = (1)²   (5) = (2)²
               x_i   y_i   x_i y_i           x_i²         y_i²
               2     9     18                4            81
               3     8     24                9            64
               5     8     40                25           64
               5     6     30                25           36
               6     5     30                36           25
               8     3     24                64           9
Σ              29    39    166               163          279
We have
$$\bar{x} = \frac{\sum x}{n} = \frac{29}{6} = 4.8333 \quad \text{and} \quad \bar{y} = \frac{\sum y}{n} = \frac{39}{6} = 6.5$$
$$cov(x, y) = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{166}{6} - 4.8333 \times 6.5 = -3.7498$$
$$S_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{163}{6} - 4.8333^2} = 1.9509 \quad \text{and} \quad S_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{279}{6} - 6.5^2} = 2.0616$$
Thus the correlation coefficient between x and y is given by
$$r = \frac{cov(x, y)}{S_x S_y} = \frac{-3.7498}{1.9509 \times 2.0616} = -0.93$$
We find a high degree of negative correlation between x and y.
Question 3: The following data relate to the test scores obtained by eight salesmen in an aptitude test and their daily sales in thousands of rupees:
Salesman   1    2    3    4    5    6    7    8
Scores     60   55   62   56   62   64   70   54
Sales      31   28   26   24   30   35   28   24
Answer: Let the scores and sales be denoted by x and y respectively. We take a, the origin of x, as the average of the two extreme values 54 and 70, i.e. a = 62. Similarly, the origin of y is taken as (24 + 35)/2 = 29.5 ≅ 30.
Computation of Correlation Coefficient Between Test Scores and Sales
Scores (x_i)   Sales ₹'000 (y_i)   u_i = x_i − 62   v_i = y_i − 30   u_i v_i = (3) × (4)   u_i² = (3)²   v_i² = (4)²
(1)            (2)                 (3)              (4)              (5)                   (6)           (7)
60             31                  −2               1                −2                    4             1
55             28                  −7               −2               14                    49            4
62             26                  0                −4               0                     0             16
56             24                  −6               −6               36                    36            36
62             30                  0                0                0                     0             0
64             35                  2                5                10                    4             25
70             28                  8                −2               −16                   64            4
54             24                  −8               −6               48                    64            36
Total          –                   −13              −14              90                    221           122
Since the correlation coefficient remains unchanged due to a change of origin, we have
$$r = r_{xy} = r_{uv} = \frac{n\sum u_i v_i - \sum u_i \times \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2} \times \sqrt{n\sum v_i^2 - (\sum v_i)^2}}$$
$$r = \frac{8 \times 90 - (-13) \times (-14)}{\sqrt{8 \times 221 - (-13)^2} \times \sqrt{8 \times 122 - (-14)^2}} = \frac{538}{\sqrt{1768 - 169} \times \sqrt{976 - 196}} = 0.48$$
Note: A change of origin reduces the computational labour to a great extent.
Question 4: Examine whether there is any correlation between age and blindness on the basis of the following data:
Age in years                   0–10   10–20   20–30   30–40   40–50   50–60   60–70   70–80
No. of persons (in thousands)  90     120     140     100     80      60      40      20
No. of blind persons           10     15      18      20      15      12      10      06
Answer: Let us denote the mid-value of age in years as x and the number of blind persons per lakh as y. Then, as before, we compute the correlation coefficient between x and y.
Computation of correlation between age and blindness
Columns: (1) Age in years; (2) Mid-value x; (3) No. of persons ('000); (4) No. of blind; (5) No. of blind per lakh, y = (4) × 100/(3) [since (3) is in thousands]; (6) xy = (2) × (5); (7) x² = (2)²; (8) y² = (5)²
(1)       (2)   (3)   (4)   (5)   (6)    (7)     (8)
0 – 10    5     90    10    11    55     25      121
10 – 20   15    120   15    12    180    225     144
20 – 30   25    140   18    13    325    625     169
30 – 40   35    100   20    20    700    1225    400
40 – 50   45    80    15    19    855    2025    361
50 – 60   55    60    12    20    1100   3025    400
60 – 70   65    40    10    25    1625   4225    625
70 – 80   75    20    6     30    2250   5625    900
Total     320   –     –     150   7090   17000   3120
The correlation coefficient between age and blindness is given by
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \times \sqrt{n\sum y^2 - (\sum y)^2}} = \frac{8 \times 7{,}090 - 320 \times 150}{\sqrt{8 \times 17{,}000 - (320)^2} \times \sqrt{8 \times 3120 - (150)^2}} = 0.96$$
which exhibits a very high degree of positive correlation between age and blindness.
Note: There may be some confusion about selecting the pair of variables for which correlation is wanted. Here the correlation is computed between age and the number of blind persons per lakh, not the absolute number of blind persons, because the age groups contain different numbers of persons.
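The rate conversion and the correlation can be sketched in Python (not part of the original text). The table does not state how the rates were rounded; Python's `round()` rounds halves to the even neighbour, which happens to turn 12.5 into 12 as in the table, so treat the rounding step as an assumption.

```python
import math

mid_age = [5, 15, 25, 35, 45, 55, 65, 75]
persons_thousands = [90, 120, 140, 100, 80, 60, 40, 20]
blind = [10, 15, 18, 20, 15, 12, 10, 6]

# Blind per lakh = blind / (persons_thousands * 1000) * 100000 = blind * 100 / persons_thousands
per_lakh = [round(b * 100 / p) for b, p in zip(blind, persons_thousands)]

n = len(mid_age)
sx, sy = sum(mid_age), sum(per_lakh)
sxy = sum(x * y for x, y in zip(mid_age, per_lakh))
sxx = sum(x * x for x in mid_age)
syy = sum(y * y for y in per_lakh)
r = (n * sxy - sx * sy) / (math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2))

print(per_lakh)      # [11, 12, 13, 20, 19, 20, 25, 30]
print(round(r, 2))   # 0.96
```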
Question 5: The coefficient of correlation between x and y for 20 items is 0.4. The AMs and SDs of x and y are known to be 12 and 15, and 3 and 4, respectively. Later on, it was found that the pair (20, 15) was wrongly taken as (15, 20). Find the correct value of the correlation coefficient.
Answer: We are given that n = 20 and the original r = 0.4, x̄ = 12, ȳ = 15, S_x = 3 and S_y = 4.
$$r = \frac{cov(x, y)}{S_x \times S_y} \Rightarrow 0.4 = \frac{cov(x, y)}{3 \times 4} \Rightarrow cov(x, y) = 4.8$$
$$\frac{\sum xy}{n} - \bar{x}\bar{y} = 4.8 \Rightarrow \frac{\sum xy}{20} - 12 \times 15 = 4.8 \Rightarrow \sum xy = 3696$$
Hence, corrected Σxy = 3696 − 15 × 20 + 20 × 15 = 3696
Also, $S_x^2 = \frac{\sum x^2}{20} - 12^2 = 9 \Rightarrow \sum x^2 = 3060$
Similarly, $S_y^2 = \frac{\sum y^2}{20} - 15^2 = 16 \Rightarrow \sum y^2 = 4820$
Thus corrected Σx = n x̄ − wrong value + correct value
Corrected Σx = 20 × 12 − 15 + 20 = 245
Corrected Σy = 20 × 15 − 20 + 15 = 295
Corrected Σx² = 3060 − 15² + 20² = 3235
Corrected Σy² = 4820 − 20² + 15² = 4645
Thus the corrected value of the correlation coefficient, by applying the formula, is
$$r = \frac{20 \times 3696 - 245 \times 295}{\sqrt{20 \times 3235 - (245)^2} \times \sqrt{20 \times 4645 - (295)^2}} = \frac{73920 - 72275}{68.3740 \times 76.6485} = 0.31$$
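The correction procedure (rebuild the raw sums from the summary figures, swap the wrong pair for the right one, recompute r) can be sketched in Python. A minimal illustration, not part of the original text; like the text, it treats the SDs as computed with divisor n.

```python
import math

n, r, mean_x, mean_y, sd_x, sd_y = 20, 0.4, 12, 15, 3, 4
wrong, right = (15, 20), (20, 15)      # (x, y) pair as recorded, and as it should be

# Recover the raw sums from the summary statistics
sum_x, sum_y = n * mean_x, n * mean_y
sum_xy = n * (r * sd_x * sd_y + mean_x * mean_y)    # from cov = sum_xy/n - mean_x*mean_y
sum_xx = n * (sd_x ** 2 + mean_x ** 2)
sum_yy = n * (sd_y ** 2 + mean_y ** 2)

# Remove the wrong pair and add the correct one
sum_x += right[0] - wrong[0]
sum_y += right[1] - wrong[1]
sum_xy += right[0] * right[1] - wrong[0] * wrong[1]
sum_xx += right[0] ** 2 - wrong[0] ** 2
sum_yy += right[1] ** 2 - wrong[1] ** 2

r_new = (n * sum_xy - sum_x * sum_y) / (
    math.sqrt(n * sum_xx - sum_x ** 2) * math.sqrt(n * sum_yy - sum_y ** 2))
print(sum_x, sum_y, round(sum_xy), sum_xx, sum_yy)   # 245 295 3696 3235 4645
print(round(r_new, 2))                               # 0.31
```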
Question 6: Compute the coefficient of correlation between marks in Statistics and Mathematics for the bivariate frequency distribution shown in the table given below.
Answer: For the sake of computational advantage, we effect a change of origin and scale for both the variables x and y.
$$\text{Define } u_i = \frac{x_i - a}{b} = \frac{x_i - 10}{4} \quad \text{and} \quad v_j = \frac{y_j - c}{d} = \frac{y_j - 10}{4}$$
Computation of Correlation Coefficient Between Marks of Mathematics and Statistics
(Each cell shows f_ij, with f_ij u_i v_j in brackets; rows are Statistics (x), columns are Mathematics (y).)
y class (CI)                     0 – 4    4 – 8    8 – 12   12 – 16   16 – 20
mid-value m                      2        6        10       14        18
x CI      m     u_i \ v_j        −2       −1       0        1         2         f_io   f_io u_i   f_io u_i²   Σ_j f_ij u_i v_j
0 – 4     2     −2               1 [4]    1 [2]    2 [0]    0         0         4      −8         16          6
4 – 8     6     −1               2 [4]    4 [4]    5 [0]    1 [−1]    1 [−2]    13     −13        13          5
8 – 12    10    0                0        2 [0]    4 [0]    6 [0]     1 [0]     13     0          0           0
12 – 16   14    1                0        1 [−1]   3 [0]    2 [2]     5 [10]    11     11         11          11
16 – 20   18    2                0        0        1 [0]    5 [10]    3 [12]    9      18         36          22
f_oj                             3        8        15       14        10        50     8          76          44
f_oj v_j                         −6       −8       0        14        20        20
f_oj v_j²                        12       8        0        14        40        74
Σ_i f_ij u_i v_j                 8        5        0        11        20        44 (check)
A single formula for computing the correlation coefficient from a bivariate frequency distribution is given by
$$r = \frac{N\sum_{i,j} f_{ij} u_i v_j - \sum_i f_{io} u_i \times \sum_j f_{oj} v_j}{\sqrt{N\sum_i f_{io} u_i^2 - (\sum_i f_{io} u_i)^2} \times \sqrt{N\sum_j f_{oj} v_j^2 - (\sum_j f_{oj} v_j)^2}} = \frac{50 \times 44 - 8 \times 20}{\sqrt{50 \times 76 - 8^2} \times \sqrt{50 \times 74 - 20^2}} = \frac{2040}{61.12 \times 57.45} = 0.58$$
The value of r shows a good amount of positive correlation between the marks in Statistics and Mathematics on the basis of the given data.
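The grouped-data formula can be sketched in Python (not part of the original text). The coded deviations u and v use the same origin 10 and scale 4 as the answer above; the variable names mirror the symbols $f_{io}$ and $f_{oj}$.

```python
import math

f = [  # rows: Statistics (x) classes 0-4 ... 16-20; columns: Mathematics (y) classes
    [1, 1, 2, 0, 0],
    [2, 4, 5, 1, 1],
    [0, 2, 4, 6, 1],
    [0, 1, 3, 2, 5],
    [0, 0, 1, 5, 3],
]
mid = [2, 6, 10, 14, 18]
u = [(m - 10) / 4 for m in mid]     # step deviations for x (rows)
v = [(m - 10) / 4 for m in mid]     # step deviations for y (columns)

N = sum(map(sum, f))
f_i0 = [sum(row) for row in f]
f_0j = [sum(col) for col in zip(*f)]
s_u = sum(fi * ui for fi, ui in zip(f_i0, u))
s_uu = sum(fi * ui * ui for fi, ui in zip(f_i0, u))
s_v = sum(fj * vj for fj, vj in zip(f_0j, v))
s_vv = sum(fj * vj * vj for fj, vj in zip(f_0j, v))
s_uv = sum(f[i][j] * u[i] * v[j] for i in range(5) for j in range(5))

r = (N * s_uv - s_u * s_v) / (math.sqrt(N * s_uu - s_u ** 2) * math.sqrt(N * s_vv - s_v ** 2))
print(N, s_u, s_uu, s_v, s_vv, s_uv)   # 50 8.0 76.0 20.0 74.0 44.0
print(round(r, 2))                     # 0.58
```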
Question 7: Given that the correlation coefficient between x and y is 0.8, write down the correlation coefficient between u and v where
1. 2u + 3x + 4 = 0 and 4v + 16y + 11 = 0
2. 2u – 3x + 4 = 0 and 4v + 16y + 11 = 0
3. 2u – 3x + 4 = 0 and 4v – 16y + 11 = 0
4. 2u + 3x + 4 = 0 and 4v – 16y + 11 = 0
Answer: A change of origin and scale has no impact on the value of r but may affect its sign:
$$r_{xy} = \frac{bd}{|b||d|}\, r_{uv}$$
i.e. $r_{xy} = r_{uv}$ if b and d have the same sign, and $r_{xy} = -r_{uv}$ if b and d have opposite signs.
In (1), $u = -2 - \frac{3}{2}x$ and $v = -\frac{11}{4} - 4y$; hence $r_{uv} = 0.8$
In (2), $u = -2 + \frac{3}{2}x$ and $v = -\frac{11}{4} - 4y$; hence $r_{uv} = -0.8$
In (3), $u = -2 + \frac{3}{2}x$ and $v = -\frac{11}{4} + 4y$; hence $r_{uv} = 0.8$
In (4), $u = -2 - \frac{3}{2}x$ and $v = -\frac{11}{4} + 4y$; hence $r_{uv} = -0.8$
Question 8: Compute the coefficient of rank correlation between sales and advertisement expressed in thousands of rupees from the following data:
Sales (x_i)           90   85   68   75   82   80   95   70
Advertisement (y_i)   7    6    2    3    4    5    8    1
Answer:
Computation of Rank Correlation between Sales and Advertisement
(x_i)   (y_i)   Rank for (x_i)   Rank for (y_i)   d_i = x_i − y_i   d_i²
90      7       2                2                0                 0
85      6       3                3                0                 0
68      2       8                7                1                 1
75      3       6                6                0                 0
82      4       4                5                −1                1
80      5       5                4                1                 1
95      8       1                1                0                 0
70      1       7                8                −1                1
Total   –       –                –                0                 4
$$r_R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 4}{8(8^2 - 1)} = 0.95$$
The high positive value of the rank correlation coefficient indicates that there is a very good amount of agreement between sales and advertisement.
Tied Rank
Question 9: Compute the coefficient of rank correlation between Eco. marks and Stats. marks as given below:
Economics Marks (x_i)   80   56   50   48   50   62   60
Stats Marks (y_i)       90   75   75   65   65   50   65
Answer:
Computation of Rank Correlation Between Eco Marks and Stats Marks with Tied Marks
Eco Mark (x_i)   Stats Mark (y_i)   Rank for Eco (x_i)   Rank for Stats (y_i)    d_i = x_i − y_i   d_i²
80               90                 1                    1                       0                 0
56               75                 4                    2.50 = (2 + 3)/2        1.50              2.25
50               75                 5.50 = (5 + 6)/2     2.50 = (2 + 3)/2        3                 9
48               65                 7                    5 = (4 + 5 + 6)/3       2                 4
50               65                 5.50 = (5 + 6)/2     5 = (4 + 5 + 6)/3       0.50              0.25
62               50                 2                    7                       −5                25
60               65                 3                    5 = (4 + 5 + 6)/3       −2                4
Total            –                  –                    –                       0                 44.50
For Economics marks there is one tie of length 2, and for Stats marks there are two ties, of lengths 2 and 3 respectively.
$$r_R = 1 - \frac{6\left[\sum_i d_i^2 + \sum_j \frac{t_j^3 - t_j}{12}\right]}{n(n^2 - 1)} = 1 - \frac{6 \times \left(44.50 + \frac{2^3 - 2}{12} + \frac{2^3 - 2}{12} + \frac{3^3 - 3}{12}\right)}{7(7^2 - 1)} = 0.15$$
Question 10: For a number of towns, the coefficient of rank correlation between the people living below the poverty line and increase of population is 0.50. If the sum of squares of the differences in ranks awarded to these factors is 82.50, find the number of towns.
Answer:
As given, $r_R = 0.50$, $\sum d_i^2 = 82.50$.
$$\text{Thus } r_R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \Rightarrow 0.50 = 1 - \frac{6 \times 82.50}{n(n^2 - 1)}$$
$$n(n^2 - 1) = 990 \quad \therefore n = 10, \text{ as } n \text{ must be a positive integer}$$
Question 11: While computing the rank correlation coefficient between profits and investment for 10 years of a firm, the difference in rank for a year was taken as 7 instead of 5 by mistake, and the value of the rank correlation coefficient was computed as 0.80. What would be the correct value of the rank correlation coefficient after rectifying the mistake?
Answer: We are given that n = 10, $r_R = 0.80$, and the wrong $d_i = 7$ should be replaced by 5.
$$r_R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} \Rightarrow 0.80 = 1 - \frac{6\sum d_i^2}{10(10^2 - 1)} \Rightarrow \sum d_i^2 = 33$$
Corrected $\sum d_i^2 = 33 - 7^2 + 5^2 = 9$
Hence the rectified value of the rank correlation coefficient is
$$1 - \frac{6 \times 9}{10 \times (10^2 - 1)} = 0.95$$
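Questions 10 and 11 both run the rank-correlation formula backwards. A minimal Python sketch (not part of the original text; `towns_from_rank_correlation` is a hypothetical helper that simply searches for an integer n):

```python
def towns_from_rank_correlation(r, sum_d2, max_n=1000):
    """Smallest integer n with n(n^2 - 1) = 6 * sum_d2 / (1 - r), or None if none is found."""
    target = 6 * sum_d2 / (1 - r)
    for n in range(2, max_n):
        if abs(n * (n * n - 1) - target) < 1e-9:
            return n
    return None

print(towns_from_rank_correlation(0.50, 82.50))   # 10   (Question 10)

# Question 11: recover the sum of d^2, swap the wrong difference 7 for 5, recompute r_R
n, r_wrong = 10, 0.80
sum_d2 = (1 - r_wrong) * n * (n * n - 1) / 6
sum_d2_corrected = sum_d2 - 7 ** 2 + 5 ** 2
r_corrected = 1 - 6 * sum_d2_corrected / (n * (n * n - 1))
print(round(sum_d2), round(sum_d2_corrected), round(r_corrected, 2))   # 33 9 0.95
```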
Question 12: Find the coefficient of concurrent deviations from the following data.
Year     1990   1991   1992   1993   1994   1995   1996   1997
Price    25     28     30     23     35     38     39     42
Demand   35     34     35     30     29     28     26     23
Answer:
Computation of Coefficient of Concurrent Deviations
Year   Price   Sign of deviation (a)   Demand   Sign of deviation (b)   Product of deviations (ab)
1990   25                              35
1991   28      +                       34       −                       −
1992   30      +                       35       +                       +
1993   23      −                       30       −                       +
1994   35      +                       29       −                       −
1995   38      +                       28       −                       −
1996   39      +                       26       −                       −
1997   42      +                       23       −                       −
Number of pairs of deviations (m) = 7
Number of + signs in the product of deviations column, i.e. number of concurrent deviations (c) = 2
$$\text{Thus } r_c = \pm\sqrt{\pm\frac{2c - m}{m}} = \pm\sqrt{\pm\frac{4 - 7}{7}} = \pm\sqrt{\pm\frac{-3}{7}} = -\sqrt{\frac{3}{7}} = -0.65$$
(Since $\frac{2c - m}{m} = \frac{-3}{7}$, we take the negative sign both inside and outside the radical sign.)
Thus there is a negative correlation between price and demand.
Regression Analysis
Used to predict the value of the dependent variable corresponding to a known value of the independent variable.
A statistical / mathematical relationship between the variables that indicates the degree and direction of the association.
It does not establish a functional / algebraic relationship.
Applicable to both linear and curvilinear relationships.
Points to Ponder:
1. Assumption: There exists a mathematical / average relationship between the two variables.
2. Variable ‘y’ (if influenced by x) is the dependent / regressed / explained variable, and
3. variable ‘x’ is the independent / predictor / explanatory variable.
Regression Lines – The lines of best fit (method of least squares)
y = a + bx
where a and b are constants (the regression parameters) and
b = b_yx = regression coefficient of y on x
Regression: Normal Equations – Method of Least Squares: a and b are chosen to minimise the sum of squares of the errors / residues,
e_i = observed value – estimated value
$$\sum e_i^2 = \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - bx_i)^2$$
Here $y_i$ is the actual / observed value, $\hat{y}_i$ is the estimated value of $y_i$ for a given $x_i$, and $e_i = y_i - \hat{y}_i$ is the error / residue, the difference between the observed and estimated values.
Minimising this sum with respect to a and b gives the normal equations $\sum y = na + b\sum x$ and $\sum xy = a\sum x + b\sum x^2$.
Comprehensive Question: Marks of 8 students in Mathematics and Statistics are given as:
Mathematics (x)   80   75   76   69   70   85   72   68
Statistics (y)    85   65   72   68   67   88   80   70
Find the following:
1. Karl Pearson’s Product Moment Correlation Coefficient
2. Spearman’s Rank Correlation Coefficient
3. Coefficient of Concurrent Deviations
4. The regression lines
5. When the marks in Mathematics are 90, what are the most likely marks in Statistics?
6. When the marks in Statistics are 92, what are the most likely marks in Mathematics?
Answer:
Working Note:
x   y   xy   x²   y²   x − x̄   y − ȳ   (x − x̄)(y − ȳ)   (x − x̄)²   (y − ȳ)²   (x − x̄)y   (y − ȳ)x
80 85 6800 6400 7225 5.625 10.625 59.7656 31.641 112.8906 478.125 850
75 65 4875 5625 4225 0.625 -9.375 -5.8594 0.391 87.89063 40.625 -703.125
76 72 5472 5776 5184 1.625 -2.375 -3.8594 2.641 5.640625 117 -180.5
69 68 4692 4761 4624 -5.375 -6.375 34.2656 28.891 40.64063 -365.5 -439.875
70 67 4690 4900 4489 -4.375 -7.375 32.2656 19.141 54.39063 -293.125 -516.25
85 88 7480 7225 7744 10.625 13.625 144.7656 112.891 185.6406 935 1158.125
72 80 5760 5184 6400 -2.375 5.625 -13.3594 5.641 31.64063 -190 405
68 70 4760 4624 4900 -6.375 -4.375 27.8906 40.641 19.14063 -446.25 -297.5
Total 595 595 44529 44495 44791 0 0 275.875 241.875 537.875 275.875 275.875
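x   y   u   v   u²   v²   uv   R_x   R_y   d²   a   b   ab
(R_x, R_y: ranks of x and y; d = R_x − R_y; a, b: signs of the deviations of x and y from their previous values)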
80 85 6 9 36 81 54 2 2 0
75 65 1 -11 1 121 -11 4 8 16 - - +
76 72 2 -4 4 16 -8 3 4 1 + + +
69 68 -5 -8 25 64 40 7 6 1 - - +
70 67 -4 -9 16 81 36 6 7 1 + - -
85 88 11 12 121 144 132 1 1 0 + + +
72 80 -2 4 4 16 -8 5 3 4 - - +
68 70 -6 -6 36 36 36 8 5 9 - - +
Total 595 595 3 −13 243 559 271 – – 32
$$\bar{x} = \frac{\sum x}{n} = \frac{595}{8} = 74.375, \quad \bar{y} = \frac{\sum y}{n} = \frac{595}{8} = 74.375$$
Here u = x − 74 and v = y − 76.
To find S_x:
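Formula 1: $S_x = \sqrt{\frac{\sum x^2}{n} - \bar{x}^2} = \sqrt{\frac{44{,}495}{8} - 74.375^2} = 5.4986$
Formula 2: $S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}} = \sqrt{\frac{241.875}{8}} = 5.4986$
To find S_y:
Formula 1: $S_y = \sqrt{\frac{\sum y^2}{n} - \bar{y}^2} = \sqrt{\frac{44{,}791}{8} - 74.375^2} = 8.1997$
Formula 2: $S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}} = \sqrt{\frac{537.875}{8}} = 8.1997$
To find Cov(x, y):
Formula 1: $cov(x, y) = \frac{\sum xy}{n} - \bar{x}\bar{y} = \frac{44{,}529}{8} - 74.375 \times 74.375 = 34.4844$
Formula 2: $cov(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n} = \frac{275.875}{8} = 34.4844$
To find Karl Pearson’s Product Moment Correlation Coefficient – r
Formula 1: $r = \frac{cov(x, y)}{S_x S_y} = \frac{34.4844}{5.4986 \times 8.1997} = 0.7648$
Formula 2: $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \times \sqrt{n\sum y^2 - (\sum y)^2}} = \frac{8 \times 44{,}529 - 595 \times 595}{\sqrt{8 \times 44{,}495 - 595^2} \times \sqrt{8 \times 44{,}791 - 595^2}} = 0.7648$
Formula 3: $r = r_{xy} = r_{uv} = \frac{n\sum u_i v_i - \sum u_i \times \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2} \times \sqrt{n\sum v_i^2 - (\sum v_i)^2}} = \frac{8 \times 271 - 3 \times (-13)}{\sqrt{8 \times 243 - 3^2} \times \sqrt{8 \times 559 - (-13)^2}} = 0.7648$
Formula 4: $r = \sqrt{b_{yx} \times b_{xy}} = \sqrt{r\frac{s_y}{s_x} \times r\frac{s_x}{s_y}} = \sqrt{1.1406 \times 0.5129} = 0.7648$
To find Spearman’s Rank Correlation Coefficient – $r_R$
$$r_R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)} = 1 - \frac{6 \times 32}{8(8^2 - 1)} = 0.62$$
To find the Coefficient of Concurrent Deviations – $r_c$ (c = 6, m = 7; since 2c − m > 0, the ‘+’ sign is taken)
$$r_c = \pm\sqrt{\pm\frac{2c - m}{m}} = +\sqrt{+\frac{2 \times 6 - 7}{7}} = \sqrt{\frac{5}{7}} = 0.85$$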
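1 Normal Equations
Y on X: $y = a + bx$
$\sum y = na + b\sum x \Rightarrow 595 = 8a + 595b \quad (1)$
$\sum xy = a\sum x + b\sum x^2 \Rightarrow 44{,}529 = 595a + 44{,}495b \quad (2)$
$(1) \times 595: \quad 3{,}54{,}025 = 4{,}760a + 3{,}54{,}025b \quad (3)$
$(2) \times 8: \quad 3{,}56{,}232 = 4{,}760a + 3{,}55{,}960b \quad (4)$
$(4) - (3): \quad b = \frac{3{,}56{,}232 - 3{,}54{,}025}{3{,}55{,}960 - 3{,}54{,}025} = 1.1406$
Substituting $b = 1.1406$ in (1): $a = -10.4571$
$y = -10.4571 + 1.1406x$; when $x = 90$, $y = -10.4571 + 1.1406(90) = 92$
X on Y: $x = \hat{a} + \hat{b}y$
$\sum x = n\hat{a} + \hat{b}\sum y \Rightarrow 595 = 8\hat{a} + 595\hat{b} \quad (1)$
$\sum xy = \hat{a}\sum y + \hat{b}\sum y^2 \Rightarrow 44{,}529 = 595\hat{a} + 44{,}791\hat{b} \quad (2)$
$(1) \times 595: \quad 3{,}54{,}025 = 4{,}760\hat{a} + 3{,}54{,}025\hat{b} \quad (3)$
$(2) \times 8: \quad 3{,}56{,}232 = 4{,}760\hat{a} + 3{,}58{,}328\hat{b} \quad (4)$
$(4) - (3): \quad \hat{b} = \frac{3{,}56{,}232 - 3{,}54{,}025}{3{,}58{,}328 - 3{,}54{,}025} = 0.5129$
Substituting $\hat{b} = 0.5129$ in (1): $\hat{a} = 36.2281$
$x = 36.2281 + 0.5129y$; when $y = 92$, $x = 36.2281 + 0.5129(92) = 83$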
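The elimination (3) = (1) × Σx, (4) = (2) × n, (4) − (3) can be replayed with exact fractions. This sketch assumes the column totals from the table; `fit_by_normal_equations` is a hypothetical helper that fits s = A + B·t for either regression line. With exact arithmetic the y-on-x intercept comes out near −10.4548, slightly different from −10.4571 in the text because the text substitutes the rounded b = 1.1406 into (1); both predictions still round to 92 and 83.

```python
from fractions import Fraction


def fit_by_normal_equations(n, sum_t, sum_s, sum_ts, sum_t2):
    """Fit s = A + B*t from the normal equations
        (1) sum_s  = n*A     + B*sum_t
        (2) sum_ts = A*sum_t + B*sum_t2
    by the text's elimination: (3) = (1) x sum_t, (4) = (2) x n, then (4) - (3).
    """
    # Each equation stored as (coefficient of A, coefficient of B, right-hand side)
    eq1 = (Fraction(n), Fraction(sum_t), Fraction(sum_s))
    eq2 = (Fraction(sum_t), Fraction(sum_t2), Fraction(sum_ts))
    eq3 = tuple(c * sum_t for c in eq1)
    eq4 = tuple(c * n for c in eq2)
    # The A-coefficients of (3) and (4) are both n*sum_t, so A cancels in (4) - (3)
    B = (eq4[2] - eq3[2]) / (eq4[1] - eq3[1])
    A = (eq1[2] - B * eq1[1]) / eq1[0]   # back-substitute B into (1)
    return A, B


# Column totals from the worked example
n, sum_x, sum_y, sum_xy, sum_x2, sum_y2 = 8, 595, 595, 44529, 44495, 44791

a, b = fit_by_normal_equations(n, sum_x, sum_y, sum_xy, sum_x2)          # y on x
a_hat, b_hat = fit_by_normal_equations(n, sum_y, sum_x, sum_xy, sum_y2)  # x on y

print(f"y on x: y = {float(a):.4f} + {float(b):.4f}x; at x = 90, y = {float(a + b * 90):.2f}")
print(f"x on y: x = {float(a_hat):.4f} + {float(b_hat):.4f}y; at y = 92, x = {float(a_hat + b_hat * 92):.2f}")
```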
2 Simplified Formula using Normal Equations
Y on X: $y = a + bx$
$b = b_{yx} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = \frac{8 \times 44{,}529 - 595 \times 595}{8 \times 44{,}495 - (595)^2} = 1.1406$
$a = \bar{y} - b\bar{x} = 74.375 - 74.375(1.1406) = -10.4571$
$y = -10.4571 + 1.1406(90) = 92$
X on Y: $x = \hat{a} + \hat{b}y$
$\hat{b} = b_{xy} = \frac{n\sum xy - \sum x\sum y}{n\sum y^2 - (\sum y)^2} = \frac{8 \times 44{,}529 - 595 \times 595}{8 \times 44{,}791 - (595)^2} = 0.5129$
$\hat{a} = \bar{x} - \hat{b}\bar{y} = 74.375 - 74.375(0.5129) = 36.2281$
$x = 36.2281 + 0.5129(92) = 83$
3 Deviation Method (deviations taken from the mean value)
Y on X: $y = a + b(x - \bar{x})$
$a = \frac{\sum y}{n} = \frac{595}{8} = 74.375$
$b = b_{yx} = \frac{\sum (x - \bar{x})y}{\sum (x - \bar{x})^2} = \frac{275.875}{241.875} = 1.1406$
$y = 74.375 + 1.1406(90 - 74.375) = 92$
X on Y: $x = \hat{a} + \hat{b}(y - \bar{y})$
$\hat{a} = \frac{\sum x}{n} = \frac{595}{8} = 74.375$
$\hat{b} = b_{xy} = \frac{\sum (y - \bar{y})x}{\sum (y - \bar{y})^2} = \frac{275.875}{537.875} = 0.5129$
$x = 74.375 + 0.5129(92 - 74.375) = 83$
4 Deviation Method (deviations taken from assumed values, u = x − 74 and v = y − 76)
Y on X: $y = a + bx$
$b = b_{yx} = b_{vu} = \frac{n\sum uv - \sum u\sum v}{n\sum u^2 - (\sum u)^2} = \frac{8 \times 271 - 3 \times (-13)}{8 \times 243 - 3^2} = 1.1406$
$a = \bar{y} - b\bar{x} = 74.375 - 1.1406 \times 74.375 = -10.4571$
$y = -10.4571 + 1.1406(90) = 92$
X on Y: $x = \hat{a} + \hat{b}y$
$\hat{b} = b_{xy} = b_{uv} = \frac{n\sum uv - \sum u\sum v}{n\sum v^2 - (\sum v)^2} = \frac{8 \times 271 - 3 \times (-13)}{8 \times 559 - (-13)^2} = 0.5129$
$\hat{a} = \bar{x} - \hat{b}\bar{y} = 74.375 - 0.5129 \times 74.375 = 36.2281$
$x = 36.2281 + 0.5129(92) = 83$
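5 Point Slope Form
Y on X: $y - \bar{y} = b_{yx}(x - \bar{x})$, where $m = b_{yx} = r\frac{S_y}{S_x} = 0.7648 \times \frac{8.1997}{5.4986} = 1.1405$
$y - 74.375 = 1.1405(90 - 74.375) \Rightarrow y = 92$
X on Y: $x - \bar{x} = b_{xy}(y - \bar{y})$, where $m = b_{xy} = r\frac{S_x}{S_y} = 0.7648 \times \frac{5.4986}{8.1997} = 0.5129$
$x - 74.375 = 0.5129(92 - 74.375) \Rightarrow x = 83$
(The slope 1.1405 differs from 1.1406 only because r has been rounded to 0.7648.)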
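The point-slope form needs only the means, the two standard deviations and r. The sketch below assumes the same eight pairs and divide-by-n moments, and keeps r unrounded, so its y-on-x slope is about 1.1406 rather than the 1.1405 obtained from the rounded r.

```python
import math

x = [80, 75, 76, 69, 70, 85, 72, 68]
y = [85, 65, 72, 68, 67, 88, 80, 70]
n = len(x)

x_bar, y_bar = sum(x) / n, sum(y) / n
s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / n)
s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / n)
r = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / (n * s_x * s_y)

b_yx = r * s_y / s_x   # slope m of the y-on-x line
b_xy = r * s_x / s_y   # slope m of the x-on-y line

y_at_90 = y_bar + b_yx * (90 - x_bar)   # y - y_bar = b_yx (x - x_bar)
x_at_92 = x_bar + b_xy * (92 - y_bar)   # x - x_bar = b_xy (y - y_bar)
print(f"b_yx = {b_yx:.4f}, b_xy = {b_xy:.4f}")
print(f"y at x = 90: {y_at_90:.2f}; x at y = 92: {x_at_92:.2f}")
```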
Properties of Regression Lines
(1) The regression coefficients remain unchanged under a shift of origin but change under a shift of scale.
If $u = \frac{x - a}{p}$ and $v = \frac{y - c}{q}$, then $b_{yx} = \frac{q}{p} \times b_{vu}$ and $b_{xy} = \frac{p}{q} \times b_{uv}$.
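Problem 1: Find the regression coefficient of y on x for the following data, and verify it using a change of origin and scale.
x   12   17   22   27   32
y   24   44   54   64   84
Answer:
Formula: $b = b_{yx} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}$   Calculation: $b_{yx} = \frac{5 \times 6{,}640 - 110 \times 270}{5 \times 2{,}670 - (110)^2}$   Answer: 2.8
Formula: $b_{vu} = \frac{n\sum uv - \sum u\sum v}{n\sum u^2 - (\sum u)^2}$   Calculation: $b_{vu} = \frac{5 \times 114 - 20 \times 25}{5 \times 90 - (20)^2}$   Answer: 1.4
If $u = \frac{x - a}{p}$ and $v = \frac{y - c}{q}$, then $b_{yx} = \frac{q}{p} \times b_{vu}$ and $b_{xy} = \frac{p}{q} \times b_{uv}$.
Here $u = \frac{x - 2}{5}$ and $v = \frac{y - 4}{10}$ (p = 5, q = 10), so $b_{yx} = \frac{10}{5} \times b_{vu} = 2 \times 1.4 = 2.8$, which agrees with the direct calculation.
x    y    u    v    x²      y²       xy      u²   v²    uv
12   24   2    2    144     576      288     4    4     4
17   44   3    4    289     1,936    748     9    16    12
22   54   4    5    484     2,916    1,188   16   25    20
27   64   5    6    729     4,096    1,728   25   36    30
32   84   6    8    1,024   7,056    2,688   36   64    48
110  270  20   25   2,670   16,580   6,640   90   145   114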
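The shift-of-scale property can be confirmed on the Problem 1 data. This sketch assumes u = (x − 2)/5 and v = (y − 4)/10 as in the answer; `regression_coefficient` is a hypothetical helper implementing the raw-sum formula for the coefficient of s on t.

```python
def regression_coefficient(t, s):
    """Regression coefficient of s on t: (n*sum(ts) - sum(t)*sum(s)) / (n*sum(t^2) - sum(t)^2)."""
    n = len(t)
    num = n * sum(a * b for a, b in zip(t, s)) - sum(t) * sum(s)
    den = n * sum(a * a for a in t) - sum(t) ** 2
    return num / den


x = [12, 17, 22, 27, 32]
y = [24, 44, 54, 64, 84]
p, q = 5, 10                       # scale factors used in the answer
u = [(a - 2) / p for a in x]       # u = (x - 2)/5
v = [(b - 4) / q for b in y]       # v = (y - 4)/10

b_yx = regression_coefficient(x, y)
b_vu = regression_coefficient(u, v)
print(f"b_yx = {b_yx}, b_vu = {b_vu}, (q/p) * b_vu = {q / p * b_vu}")
```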
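Question 2: If the relationship between two variables x and u is u + 3x = 10, and between two other variables y and v is 2y + 5v = 25, and the regression coefficient of y on x is 0.80, what is the regression coefficient of v on u?
Answer:
$u + 3x = 10 \Rightarrow u = \frac{x - \frac{10}{3}}{-\frac{1}{3}}$, so $p = -\frac{1}{3}$
$2y + 5v = 25 \Rightarrow v = \frac{y - \frac{25}{2}}{-\frac{5}{2}}$, so $q = -\frac{5}{2}$
$b_{yx} = \frac{q}{p} \times b_{vu} \Rightarrow 0.8 = \frac{-5/2}{-1/3} \times b_{vu} = \frac{15}{2}\,b_{vu}$, hence $b_{vu} = \frac{2}{15} \times 0.8 = 0.1067$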
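The same answer follows with exact fractions, and the identity $b_{vu} = \frac{p}{q} b_{yx}$ can be sanity-checked by transforming real data. The sketch borrows the Problem 1 data purely for that check (any paired data would do); `regression_coefficient` is a hypothetical helper.

```python
from fractions import Fraction


def regression_coefficient(t, s):
    """Regression coefficient of s on t, computed exactly with fractions."""
    n = len(t)
    num = n * sum(a * b for a, b in zip(t, s)) - sum(t) * sum(s)
    den = n * sum(a * a for a in t) - sum(t) ** 2
    return Fraction(num) / den


# Question 2: u + 3x = 10 gives p = -1/3; 2y + 5v = 25 gives q = -5/2
p = Fraction(-1, 3)
q = Fraction(-5, 2)
b_yx = Fraction(4, 5)            # given: 0.80
b_vu = (p / q) * b_yx            # from b_yx = (q/p) * b_vu
print(f"b_vu = {b_vu} = {float(b_vu):.4f}")

# Check of the identity on the Problem 1 data
x = [12, 17, 22, 27, 32]
y = [24, 44, 54, 64, 84]
u = [10 - 3 * a for a in x]                # u + 3x = 10
v = [Fraction(25 - 2 * b, 5) for b in y]   # 2y + 5v = 25
lhs = regression_coefficient(u, v)
rhs = (p / q) * regression_coefficient(x, y)
print(f"b_vu from transformed data = {lhs}, (p/q) * b_yx = {rhs}")
```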
(2) The two lines of regression intersect at the point $(\bar{x}, \bar{y})$, where x and y are the variables under consideration.
(3) The correlation coefficient r is the geometric mean of the regression coefficients:
$r = \pm\sqrt{b_{yx} \times b_{xy}}$
If $b_{yx}$ and $b_{xy}$ are both negative, r is negative; if both are positive, r is positive.
Question 3: For the variables x and y, the regression equations are given as 7x − 3y − 18 = 0 and 4x − y − 11 = 0.
1. Find the arithmetic means of x and y.
2. Identify the regression equation of y on x.
3. Compute the correlation coefficient between x and y.
4. Given that the variance of x is 9, find the SD of y.
Answer:
(1) Since the two lines of regression intersect at the point $(\bar{x}, \bar{y})$, replace x and y by $\bar{x}$ and $\bar{y}$:
$7\bar{x} - 3\bar{y} - 18 = 0$ and $4\bar{x} - \bar{y} - 11 = 0$
Solving these two equations, we get $\bar{x} = 3$ and $\bar{y} = 1$.
Thus the arithmetic means of x and y are 3 and 1 respectively.
(2) Let us assume that 7x − 3y − 18 = 0 represents the regression line of y on x and 4x − y − 11 = 0 represents the regression line of x on y.
Now $7x - 3y - 18 = 0 \Rightarrow y = -6 + \frac{7}{3}x$, $\therefore b_{yx} = \frac{7}{3}$
Again $4x - y - 11 = 0 \Rightarrow x = \frac{11}{4} + \frac{1}{4}y$, $\therefore b_{xy} = \frac{1}{4}$
Thus $r^2 = b_{yx} \times b_{xy} = \frac{7}{3} \times \frac{1}{4} = \frac{7}{12} < 1$
Since $|r| \le 1 \Rightarrow r^2 \le 1$, our assumption is consistent. (The opposite assignment would give $b_{yx} = 4$ and $b_{xy} = \frac{3}{7}$, so $r^2 = \frac{12}{7} > 1$, which is impossible.) Thus 7x − 3y − 18 = 0 represents the regression line of y on x.
(3) Since $r^2 = \frac{7}{12}$, $r = \sqrt{\frac{7}{12}} = 0.7638$ (we take r as positive since both regression coefficients are positive).
(4) $b_{yx} = r \times \frac{S_y}{S_x} \Rightarrow \frac{7}{3} = 0.7638 \times \frac{S_y}{3}$ (since $S_x^2 = 9$, i.e. $S_x = 3$, as given)
$\Rightarrow S_y = \frac{7}{0.7638} = 9.1647$
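The whole of Question 3 can be scripted: the intersection gives the means, and trying both assignments shows that only one keeps $r^2 \le 1$. This is a sketch in which each line is stored as hypothetical coefficients (A, B, C) of Ax + By + C = 0. Keeping r unrounded gives $S_y = \sqrt{84} \approx 9.1652$; the text's 9.1647 comes from rounding r to 0.7638.

```python
import math
from fractions import Fraction

# Each regression line stored as (A, B, C) for A*x + B*y + C = 0
line_1 = (7, -3, -18)
line_2 = (4, -1, -11)


def intersection(l1, l2):
    """Solve both lines simultaneously (Cramer's rule); the point is (x_bar, y_bar)."""
    (a1, b1, c1), (a2, b2, c2) = l1, l2
    det = a1 * b2 - a2 * b1
    return Fraction(b1 * c2 - b2 * c1, det), Fraction(a2 * c1 - a1 * c2, det)


def slope_if_y_on_x(line):
    A, B, _ = line
    return Fraction(-A, B)   # y = -(A/B) x - C/B


def slope_if_x_on_y(line):
    A, B, _ = line
    return Fraction(-B, A)   # x = -(B/A) y - C/A


x_bar, y_bar = intersection(line_1, line_2)
print(f"means: x_bar = {x_bar}, y_bar = {y_bar}")

# Try both assignments; a valid one must give r^2 <= 1
for y_line, x_line in [(line_1, line_2), (line_2, line_1)]:
    trial_yx, trial_xy = slope_if_y_on_x(y_line), slope_if_x_on_y(x_line)
    r_sq = trial_yx * trial_xy
    print(f"y on x = {y_line}: r^2 = {r_sq}, valid = {r_sq <= 1}")

b_yx, b_xy = slope_if_y_on_x(line_1), slope_if_x_on_y(line_2)
r = math.copysign(math.sqrt(float(b_yx * b_xy)), float(b_yx))
s_x = 3                          # variance of x is 9
s_y = float(b_yx) * s_x / r      # from b_yx = r * s_y / s_x
print(f"r = {r:.4f}, S_y = {s_y:.4f}")
```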
Probable Error (P.E.): a measure used to judge the reliability of r and to set limits for the population correlation coefficient
$P.E. = 0.674 \times \frac{1 - r^2}{\sqrt{N}}$
Here r is the correlation coefficient computed from N pairs of sample observations.
$P.E. \approx \frac{2}{3}\,S.E.$, where S.E. is the standard error of the correlation coefficient:
$S.E. = \frac{1 - r^2}{\sqrt{N}}$
Limits: $\rho = r \pm P.E.$, where ρ is the population correlation coefficient.
Interpretation of r using the probable error:
1. If r < P.E., there is no evidence of correlation.
2. If r > 6 × P.E., the presence of correlation is certain (r is significant).
3. P.E. is never negative (as −1 ≤ r ≤ 1, so 1 − r² ≥ 0).
Points to Ponder:
1. The sampling followed is simple random sampling.
2. The population is normal.
Question 4: Compute the probable error, assuming a correlation coefficient of 0.8 from a sample of 25 pairs of items.
Answer: r = 0.8, n = 25
Formula: $P.E. = 0.674 \times \frac{1 - r^2}{\sqrt{N}}$   Calculation: $P.E. = 0.674 \times \frac{1 - 0.8^2}{\sqrt{25}}$   Answer: 0.0485
Question 5: If r = 0.7 and n = 64, find the probable error of the coefficient of correlation and determine the limits for the population correlation coefficient.
Answer: r = 0.7, n = 64
Formula: $P.E. = 0.674 \times \frac{1 - r^2}{\sqrt{N}}$   Calculation: $P.E. = 0.674 \times \frac{1 - 0.7^2}{\sqrt{64}}$   Answer: 0.043
Limits for the population correlation coefficient: $r \pm P.E. = 0.7 \pm 0.043$, i.e. from 0.657 to 0.743
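Questions 4 and 5 can be reproduced with two small functions. This sketch keeps the constant 0.674 used in the text (many references write 0.6745); `probable_error` and `population_limits` are hypothetical helper names.

```python
import math


def probable_error(r, n):
    """P.E. = 0.674 * (1 - r^2) / sqrt(n), with the constant used in the text."""
    return 0.674 * (1 - r ** 2) / math.sqrt(n)


def population_limits(r, n):
    """Limits r - P.E. and r + P.E. for the population correlation coefficient."""
    pe = probable_error(r, n)
    return r - pe, r + pe


pe_q4 = probable_error(0.8, 25)
pe_q5 = probable_error(0.7, 64)
low, high = population_limits(0.7, 64)
print(f"Question 4: P.E. = {pe_q4:.4f}")
print(f"Question 5: P.E. = {pe_q5:.3f}, limits {low:.3f} to {high:.3f}")
```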
Karl Pearson's Coefficient and Coefficient of Determination
Limitation: r = 0 need not imply that x and y are independent; it shows only that there is no linear relationship between them.
Example: For the set of values (−2, 4), (−1, 1), (0, 0), (1, 1) and (2, 4):
$\operatorname{Cov}(x, y) = \frac{(-2 \times 4) + (-1 \times 1) + (0 \times 0) + (1 \times 1) + (2 \times 4)}{5} - \bar{x}\,\bar{y} = 0$ (as $\bar{x} = 0$)
$\therefore r = 0$
But x and y are connected by the non-linear relationship $y = x^2$, so x and y are not independent.
The correlation coefficient measures the linear relationship between two variables. To measure the amount of variation in one variable that is accounted for by the other, a better measure is the square of the correlation coefficient, known as the 'coefficient of determination'.
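The example can be checked numerically: y is completely determined by x, yet the covariance, and hence r, is exactly zero. The sketch uses the five points from the text and the divide-by-n convention.

```python
import math

x = [-2, -1, 0, 1, 2]
y = [v * v for v in x]          # y = x^2: y is completely determined by x
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

cov = sum(a * b for a, b in zip(x, y)) / n - x_bar * y_bar
s_x = math.sqrt(sum((a - x_bar) ** 2 for a in x) / n)
s_y = math.sqrt(sum((b - y_bar) ** 2 for b in y) / n)
r = cov / (s_x * s_y)
print(f"y = {y}, Cov(x, y) = {cov}, r = {r}")
```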
Coefficient of Determination (a better measure)
Illustration with r = 0.6:
1 Coefficient of determination (variation explained by the factor): $r^2 = \frac{\text{Explained variance}}{\text{Total variance}}$   Calculation: $0.6^2 = 0.36$   Answer: 36%
2 Coefficient of non-determination (variation due to other factors): $1 - r^2 = \frac{\text{Unexplained variance}}{\text{Total variance}}$   Calculation: $1 - 0.6^2 = 0.64$   Answer: 64%
Correlation & Regression
r        Correlation                    Regression
r = 1    Perfect positive correlation   The two regression lines coincide
r = −1   Perfect negative correlation   The two regression lines coincide
r = 0    Uncorrelated                   The regression lines are perpendicular to each other
Views of different authors
Ya-lun Chou: "There are two related but distinct aspects of the study of association between variables. Correlation analysis and regression analysis. Correlation analysis has the objective of determining the degree or strength of the relationship between variables. Regression analysis attempts to establish the nature of the relationship between variables – that is, to study the functional relationship between the variables and thereby provide a mechanism of prediction, or forecasting."
Croxton and Cowden: "When relationship between two variables is of quantitative nature the appropriate statistical tool for measuring and expressing it in formula is known as correlation." Thus correlation is a statistical device which helps in analysing the relationship and also the covariation of two or more variables.
Simpson and Kafka: "Correlation analysis deals with the association between two or more variables."
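Formula
1 $r = \frac{\operatorname{Cov}(x, y)}{S_x \cdot S_y}$
$r = \frac{\frac{\sum xy}{n} - \left(\frac{\sum x}{n}\right)\left(\frac{\sum y}{n}\right)}{\sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} \cdot \sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2}}$
By taking the LCM: $r = \frac{n\sum xy - \sum x\sum y}{\sqrt{n\sum x^2 - (\sum x)^2} \times \sqrt{n\sum y^2 - (\sum y)^2}}$
$b_{yx} \times b_{xy} = r\frac{S_y}{S_x} \times r\frac{S_x}{S_y} = r^2$, or $r = \sqrt{b_{yx} \times b_{xy}}$
$\operatorname{Cov}(x, y) = r S_x S_y$
$\operatorname{Cov}(x, y) = \frac{\sum (x - \bar{x})(y - \bar{y})}{n}$ or $\frac{\sum xy}{n} - \bar{x}\,\bar{y}$ or $\frac{\sum xy}{n} - \left(\frac{\sum x}{n}\right)\left(\frac{\sum y}{n}\right)$
$S_x = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$ or $\sqrt{\frac{\sum x^2}{n} - \bar{x}^2}$ or $\sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2}$
$S_y = \sqrt{\frac{\sum (y - \bar{y})^2}{n}}$ or $\sqrt{\frac{\sum y^2}{n} - \bar{y}^2}$ or $\sqrt{\frac{\sum y^2}{n} - \left(\frac{\sum y}{n}\right)^2}$
$r = r_{xy} = r_{uv} = \frac{n\sum u_i v_i - \sum u_i \times \sum v_i}{\sqrt{n\sum u_i^2 - (\sum u_i)^2} \times \sqrt{n\sum v_i^2 - (\sum v_i)^2}}$ (u and v are deviations from any assumed values)
$r = r_{xy} = r_{uv} = \frac{n\sum u_i v_i}{\sqrt{n\sum u_i^2} \times \sqrt{n\sum v_i^2}}$, where $u = x - \bar{x}$ and $v = y - \bar{y}$ (so that $\sum u_i = \sum v_i = 0$)
2 $r_R = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$
3 $r_c = \pm\sqrt{\pm\frac{2c - m}{m}}$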
1 Normal Equations
Y on X: $y = a + bx$
$\sum y = na + b\sum x \quad (1)$
$\sum xy = a\sum x + b\sum x^2 \quad (2)$
X on Y: $x = \hat{a} + \hat{b}y$
$\sum x = n\hat{a} + \hat{b}\sum y \quad (1)$
$\sum xy = \hat{a}\sum y + \hat{b}\sum y^2 \quad (2)$
Derivation: consider $y = a + bx \quad (A)$
When $x = x_1$, $y_1 = a + bx_1$
When $x = x_2$, $y_2 = a + bx_2$
⋮
When $x = x_n$, $y_n = a + bx_n$
Summing up, $\sum y = na + b\sum x \quad (1)$
Multiplying each equation by its x value: $x_1 y_1 = ax_1 + bx_1^2$, $x_2 y_2 = ax_2 + bx_2^2$, ..., $x_n y_n = ax_n + bx_n^2$
Summing up, $\sum xy = a\sum x + b\sum x^2 \quad (2)$
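2 Simplified Formula using Normal Equations
Y on X: $y = a + bx$, with $b = b_{yx} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}$ and $a = \bar{y} - b\bar{x}$
X on Y: $x = \hat{a} + \hat{b}y$, with $\hat{b} = b_{xy} = \frac{n\sum xy - \sum x\sum y}{n\sum y^2 - (\sum y)^2}$ and $\hat{a} = \bar{x} - \hat{b}\bar{y}$
Derivation of b:
$\sum y = na + b\sum x \quad (1)$
$\sum xy = a\sum x + b\sum x^2 \quad (2)$
$\sum x\sum y = an\sum x + b\sum x\sum x \quad (3) = (1) \times \sum x$
$n\sum xy = an\sum x + nb\sum x^2 \quad (4) = (2) \times n$
$\sum x\sum y - n\sum xy = b\sum x\sum x - nb\sum x^2 \quad (5) = (3) - (4)$
$n\sum xy - \sum x\sum y = b\left(n\sum x^2 - (\sum x)^2\right) \quad (6)$, changing the sign
$b = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2}$, dividing (6) by $n\sum x^2 - (\sum x)^2$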
$b = b_{yx} = \frac{\operatorname{Cov}(x, y)}{S_x^2}$
Since $r = \frac{\operatorname{Cov}(x, y)}{S_x S_y}$, $\operatorname{Cov}(x, y) = r S_x S_y$
$b = b_{yx} = \frac{r S_x S_y}{S_x^2}$
$b = b_{yx} = r\frac{S_y}{S_x}$
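3 Deviation Method (deviations taken from the mean value)
Y on X: $y = a + b(x - \bar{x})$, with $a = \frac{\sum y}{n}$ and $b = b_{yx} = \frac{\sum (x - \bar{x})y}{\sum (x - \bar{x})^2}$
X on Y: $x = \hat{a} + \hat{b}(y - \bar{y})$, with $\hat{a} = \frac{\sum x}{n}$ and $\hat{b} = b_{xy} = \frac{\sum (y - \bar{y})x}{\sum (y - \bar{y})^2}$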
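4 Deviation Method (deviations taken from assumed values A and B, u = x − A and v = y − B)
Y on X: $y = a + bx$, with $b = b_{yx} = b_{vu} = \frac{n\sum uv - \sum u\sum v}{n\sum u^2 - (\sum u)^2}$ and $a = \bar{y} - b\bar{x}$
X on Y: $x = \hat{a} + \hat{b}y$, with $\hat{b} = b_{xy} = b_{uv} = \frac{n\sum uv - \sum u\sum v}{n\sum v^2 - (\sum v)^2}$ and $\hat{a} = \bar{x} - \hat{b}\bar{y}$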
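5 Point Slope Form
Y on X: $y - \bar{y} = b_{yx}(x - \bar{x})$, where $m = b_{yx} = r\frac{S_y}{S_x}$
X on Y: $x - \bar{x} = b_{xy}(y - \bar{y})$, where $m = b_{xy} = r\frac{S_x}{S_y}$
Derivation (y on x): start from $y - y_1 = m(x - x_1)$.
Let $(x_1, y_1) = (\bar{x}, \bar{y})$ and $m = b_{yx} = r\frac{S_y}{S_x}$:
$y - \bar{y} = b_{yx}(x - \bar{x})$
$y - \bar{y} = r\frac{S_y}{S_x}(x - \bar{x})$
y on x: $\frac{y - \bar{y}}{S_y} = r\,\frac{x - \bar{x}}{S_x}$
Derivation (x on y): start from $x - x_1 = m(y - y_1)$.
Let $(x_1, y_1) = (\bar{x}, \bar{y})$ and $m = b_{xy} = r\frac{S_x}{S_y}$:
$x - \bar{x} = b_{xy}(y - \bar{y})$
$x - \bar{x} = r\frac{S_x}{S_y}(y - \bar{y})$
x on y: $\frac{x - \bar{x}}{S_x} = r\,\frac{y - \bar{y}}{S_y}$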