elementary statistics - m. ghamsary, ph.d


Post on 09-Feb-2022


Elementary Statistics Dr. Ghamsary Chapter 2 Page 1


Elementary Statistics

M. Ghamsary, Ph.D.

Chapter 02


Descriptive Statistics

Grouped vs Ungrouped Data

• Ungrouped data: data that have not been summarized in any way; also called raw data.

• Grouped data: data that have been organized into a frequency distribution.

Raw Data: When data are collected in original form, they are called raw data.

The following are the scores on the first test of the statistics class in fall of 2004.

76 78 71 86 80 62 55 89 66 72 68 96 78 81 82 69 89 88 85 86 79 73 58 85 99 90 66 76 70 63 79 88 59 55 75 86 92 92 62 83 52 94 93 80 78 97 50 88 60 61

Table 2.1: Data from Test #1 of fall 2007

Stem-and-Leaf: One method of displaying a set of data is with a stem-and-leaf plot.

Stem | Leaf
5    | 0 2 5 5 8 9
6    | 0 1 2 2 3 6 6 8 9
7    | 0 1 2 3 5 6 6 8 8 8 9 9
8    | 0 0 1 2 3 5 5 6 6 6 8 8 8 9 9
9    | 0 2 2 3 4 6 7 9
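The stem-and-leaf display above can be reproduced with a few lines of Python. This is only a sketch: the function name `stem_and_leaf` is a hypothetical helper, and it assumes two-digit scores so that the tens digit is the stem and the units digit is the leaf.

```python
from collections import defaultdict

# Test #1 scores from Table 2.1
scores = [76, 78, 71, 86, 80, 62, 55, 89, 66, 72, 68, 96, 78, 81, 82, 69, 89,
          88, 85, 86, 79, 73, 58, 85, 99, 90, 66, 76, 70, 63, 79, 88, 59, 55,
          75, 86, 92, 92, 62, 83, 52, 94, 93, 80, 78, 97, 50, 88, 60, 61]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    plot = defaultdict(list)
    for v in sorted(values):          # sorting first keeps stems and leaves in order
        plot[v // 10].append(v % 10)
    return dict(plot)

for stem, leaves in stem_and_leaf(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```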

Grouped Data: When raw data are organized into a frequency distribution, they are called grouped data.

Frequency Distribution: the organization of raw data in table form, using classes and frequencies.

Class   Frequency
50-59   6
60-69   9
70-79   12
80-89   15
90-99   8

• Class: The number of classes in the above table is 5.

• Class Limits: represent the smallest and largest data values that can be included in each class.

• Lower Class Limit: the lowest number in each class. In the above table, 50 is the lower class limit of the first class, 60 is the lower class limit of the 2nd class, etc.

• Upper Class Limit: the highest number in each class. In the above table, 59 is the upper class limit of the first class, 69 is the upper class limit of the 2nd class, etc.

• Class Width: found by subtracting the lower (or upper) class limit of one class from the lower (or upper) class limit of the next class. In the above table the class width is 10 (for example, 60 - 50 = 10).

Class Boundaries are used to separate the classes so that there are no gaps in the frequency distribution.

Class   Class Boundaries   Frequency
50-59   49.5-59.5          6
60-69   59.5-69.5          9
70-79   69.5-79.5          12
80-89   79.5-89.5          15
90-99   89.5-99.5          8
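As a rough illustration of how the frequency distribution is obtained from the raw scores, the sketch below counts the scores falling between each pair of class limits. The function name and its parameters (`start`, `width`, `n_classes`) are assumptions made for this example, chosen to match the five classes of width 10 used here.

```python
# Test #1 scores from Table 2.1
scores = [76, 78, 71, 86, 80, 62, 55, 89, 66, 72, 68, 96, 78, 81, 82, 69, 89,
          88, 85, 86, 79, 73, 58, 85, 99, 90, 66, 76, 70, 63, 79, 88, 59, 55,
          75, 86, 92, 92, 62, 83, 52, 94, 93, 80, 78, 97, 50, 88, 60, 61]

def frequency_distribution(values, start=50, width=10, n_classes=5):
    """Return a list of (lower limit, upper limit, frequency) tuples."""
    table = []
    for k in range(n_classes):
        lower = start + k * width
        upper = lower + width - 1      # class limits are inclusive
        freq = sum(lower <= v <= upper for v in values)
        table.append((lower, upper, freq))
    return table

for lower, upper, freq in frequency_distribution(scores):
    # Class boundaries lie half a unit outside the limits, which closes the gaps.
    print(f"{lower}-{upper}  {lower - 0.5}-{upper + 0.5}  {freq}")
```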

Cumulative Frequency: the sum of the frequencies of a class and of all classes before it.

Relative Frequency: the frequency of a class divided by the total number of observations, n.

Class   Frequency   Cumulative Frequency   Relative Frequency
50-59   6           6                      6/50=0.12
60-69   9           9+6=15                 9/50=0.18
70-79   12          12+15=27               12/50=0.24
80-89   15          15+27=42               15/50=0.30
90-99   8           8+42=50                8/50=0.16

n=50
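The two extra columns can be computed directly from the class frequencies. The following is a minimal sketch using only the Python standard library; the variable names are illustrative.

```python
from itertools import accumulate

classes = ["50-59", "60-69", "70-79", "80-89", "90-99"]
frequencies = [6, 9, 12, 15, 8]

n = sum(frequencies)                           # total number of scores
cumulative = list(accumulate(frequencies))     # running total of the frequencies
relative = [f / n for f in frequencies]        # share of all scores in each class

for row in zip(classes, frequencies, cumulative, relative):
    print("{}  {:>2}  {:>2}  {:.2f}".format(*row))
```

The last cumulative frequency always equals n, and the relative frequencies always add up to 1, which makes a quick check on the arithmetic.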

Most Popular Graphs in Statistics

The most commonly used graphs in statistics are:

1. The Histogram

2. The Frequency Polygon

3. The Cumulative Frequency Graph

4. The Bar Chart

5. Pie Chart

6. Pareto Charts

7. Dot Plot

8. Stem-Leaf

9. Time Series Graph

1. The Histogram

A histogram is a bar graph of the frequency distribution of numerical data. It is useful for:

o Making decisions about a process, product, or procedure that could be improved after examining the variation (example: Should the school invest in a computer-based tutoring program for low achieving students in Algebra I after examining the grade distribution? Are more of the out-of-specification shafts too big rather than too small?)

o Displaying easily the variation in the process (example: Which units are causing the most difficulty for students? Is the variation in a process due to parts that are too long or parts that are too short?)

[Figure: Histogram of Test1 with fitted normal curve. x-axis: Test1; y-axis: Frequency. Mean 76.8, StDev 12.98, N 50.]

[Figure: Normal probability plot of Test1 with 95% CI. x-axis: Test1; y-axis: Percent. Mean 76.8, StDev 12.98, N 50, AD 0.537, P-Value 0.161.]


2. The frequency polygon

o Making decisions about a process, product or procedure that could be improved

(example: a frequency polygon for 642 psychology test scores, shown below to the right.)

Midpoint (x)   Frequency (f)
54.5           6
64.5           9
74.5           12
84.5           15
94.5           8

[Figure: Scatterplot of f vs x (the frequency polygon). x-axis: Midpoints x; y-axis: f.]

3. The Cumulative Frequency Graph (Ogive)

Cumulative frequency is used to determine the number of observations that lie above

(or below) a particular value.

Upper Class Boundaries   Cumulative Frequency
59.5                     6
69.5                     15
79.5                     27
89.5                     42
99.5                     50

[Figure: Scatterplot of Cumulative f vs x (the ogive). x-axis: Upper Class Boundaries; y-axis: Cumulative f.]


4. The bar chart

Bar charts are useful for comparing classes or groups of data. A class or group can have a

single category of data or they can be broken down further into multiple categories for

greater depth of analysis.

Class   Grade   Frequency
50-59   F       6
60-69   D       9
70-79   C       12
80-89   B       15
90-99   A       8

[Figure: Bar chart of Frequency by Grade. x-axis: Grade (F, D, C, B, A); y-axis: Frequency.]

5. Pie Chart

o A pie chart is a way of summarizing a set of categorical data or displaying the different values of a given variable (example: percentage distribution).

o Pie charts usually show the component parts of a whole. Often you will see a segment of the drawing separated from the rest of the pie in order to emphasize an important piece of information.

[Figure: Pie chart of grades. A: 8 (16.0%), B: 15 (30.0%), C: 12 (24.0%), D: 9 (18.0%), F: 6 (12.0%).]

6. Pareto Charts

A Pareto chart is used to graphically summarize and display the relative importance of the differences between groups of data.

[Figure: Pareto chart of grade frequencies, with bars in descending order of frequency: B, C, D, A, F. y-axis: Frequency.]

7. Dot Plot

A dot plot displays each data value as a dot above its position on a number line, so that repeated values stack on top of each other.

[Figure: Dotplot of Test1. x-axis: Test1.]

8. Stem-and-Leaf

o The Stem-and-Leaf Plot summarizes the shape of a set of data (the distribution) and

provides extra detail regarding individual values.

o They are usually used when there are large amounts of numbers to analyze. Series of

scores on sports teams, series of temperatures or rainfall over a period of time, series of

classroom test scores are examples of when Stem and Leaf Plots could be used.

Stem | Leaf
5    | 0 2 5 5 8 9
6    | 0 1 2 2 3 6 6 8 9
7    | 0 1 2 3 5 6 6 8 8 8 9 9
8    | 0 0 1 2 3 5 5 6 6 6 8 8 8 9 9
9    | 0 2 2 3 4 6 7 9

9. Time series Graph

[Figure: Time Series Plot of AOL, MSFT. x-axis: Month; legend (Variable): AOL, MSFT.]

Month   Price of AOL   Price of MSFT
Jan     65             110
Feb     60             115
Mar     58             120
Apr     62             100
May     55             95
Jun     50             90
Jul     48             85
Aug     55             75
Sep     57             80
Oct     50             60
Nov     48             50
Dec     40             40

Types of Distributions: There are several different kinds of distributions, but the following are the most commonly used in statistics.

• Symmetric, normal, or bell-shaped

• Positively skewed, right-tailed, or skewed to the right side.

• Negatively skewed, left-tailed, or skewed to the left side.

• Uniform

• Symmetric, Bell Shape, or Normal Distribution

[Figure: Histogram of a symmetric, bell-shaped distribution.]

• Positively skewed

[Figure: Histogram of a positively skewed distribution.]

• Negatively skewed

[Figure: Histogram of a negatively skewed distribution.]

• Uniform

[Figure: Histogram of a uniform distribution.]

Coding for Sex: 1=Female, 0=Male

[Figure: Bar chart of Count by Sex (Male, Female); legend: Grade (F, D, C, B, A).]

[Figure: Bar chart of Count by Grade (F, D, C, B, A); legend: Sex (Male, Female).]

[Figure: Bar chart of Percent by Grade (F, D, C, B, A); legend: Sex (Male, Female).]

Test1  Sex  Grade      Test1  Sex  Grade
76     1    C          76     1    C
62     1    D          59     1    F
68     1    D          92     1    A
69     1    D          93     1    A
79     0    C          88     0    B
90     0    A          86     0    B
79     1    C          66     0    D
86     1    B          81     1    B
52     0    F          85     0    B
97     1    A          85     0    B
78     1    C          70     1    C
55     1    F          55     1    F
96     1    A          62     1    D
89     1    B          80     1    B
73     0    C          60     1    D
66     0    D          80     1    B
88     1    B          72     1    C
92     0    A          82     0    B
94     1    A          86     1    B
50     1    F          99     1    A
71     0    C          63     1    D
89     0    B          75     1    C
78     1    C          83     1    B
88     0    B          78     0    C
58     1    F          61     1    D

[Figure: Boxplot of Test1 vs Sex. x-axis: Sex (Male, Female); y-axis: Test1.]

Numerical measurements:

• Statistic: any value or measure obtained from a sample.

• Parameter: any value or measure obtained from a population.

Measures of central tendency: the Mean, Median, and Mode.

Mean: is defined to be the sum of the scores in the data set divided by the total number of scores.

o Sample Mean: is denoted by $\bar{x}$, and it is defined by:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, or simply $\bar{x} = \frac{\sum x}{n}$.

o Population Mean: is denoted by $\mu$, and it is defined by:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N}$, or simply $\mu = \frac{\sum x}{N}$.

Note: The sample mean, $\bar{x}$, is an unbiased estimate of the population mean, $\mu$.

Example1: Find the mean of 10, 7, 3, 12, 18.

$\bar{x} = \frac{10 + 7 + 3 + 12 + 18}{5} = 10$.

Example2: Find the mean of 10, 7, 3, 12, 18, 13, 17, 15, 25, 30.

$\bar{x} = \frac{10 + 7 + 3 + 12 + 18 + 13 + 17 + 15 + 25 + 30}{10} = \frac{150}{10} = 15$.

Example3: Find the mean of the scores on Test #1, 2004, in the data set in this chapter.

$\bar{x} = \frac{76 + 62 + \cdots + 78 + 61}{50} = 76.8$.
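The three means above can be checked with a short Python sketch. The helper name `mean` is only illustrative, and the last value of Example 2 is taken to be 30, which is what the worked sum of 150 requires.

```python
def mean(values):
    """Sum of the scores divided by the number of scores."""
    return sum(values) / len(values)

# Test #1 scores from Table 2.1
scores = [76, 78, 71, 86, 80, 62, 55, 89, 66, 72, 68, 96, 78, 81, 82, 69, 89,
          88, 85, 86, 79, 73, 58, 85, 99, 90, 66, 76, 70, 63, 79, 88, 59, 55,
          75, 86, 92, 92, 62, 83, 52, 94, 93, 80, 78, 97, 50, 88, 60, 61]

example1 = mean([10, 7, 3, 12, 18])
example2 = mean([10, 7, 3, 12, 18, 13, 17, 15, 25, 30])
example3 = mean(scores)

print(example1, example2, round(example3, 1))
```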

Median: is defined to be the midpoint of the data set that is arranged from smallest to largest.

Example4: Find the median of 10, 7, 3, 12, 15.

Solution: First we must sort the data set as follows: 3, 7, 10, 12, 15.

The median is 10.

Example5: Find the median of 10, 7, 3, 12, 15, 20.

Solution: After we sort we get: 3, 7, 10, 12, 15, 20.

As we observe, there are 2 middle observations. So to find the median we average these 2 values,

namely: Median=(10+12)/2 =11.

Example6: The median of the scores on Test #1, 2004, is 78.50. There are 50 scores, so the median is the average of the 25th and 26th sorted scores, 78 and 79.

Median = 78.50

Mode: is defined to be the value in the data set that occurs most frequently.

Example7A: Find the mode of 10, 7, 3, 12, 15, 3.

Mode is 3.

Example7B: Find the mode of 10, 7, 3, 10, 15, 3.

Modes are 3 and 10.

Example7C: Find the mode of 10, 7, 3, 10, 10, 3.

Mode is 10.

Example7D: Find the mode of 10, 7, 3, 10, 7, 3.

There is no mode, since all values occur with the same frequency.

Example7E: Find the mode of 10, 7, 3, 12, 15, 18.

There is no mode, since no values occur more than once.
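The median and mode rules used in Examples 4 to 7 can be sketched in Python as follows. The function names are hypothetical, and `modes` follows the convention of these notes that a data set has no mode when every value occurs equally often.

```python
from collections import Counter

def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def modes(values):
    """Most frequent value(s); an empty list means 'no mode'."""
    counts = Counter(values)
    top = max(counts.values())
    # If several distinct values all share the same count, no value stands out.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(median([10, 7, 3, 12, 15]), median([10, 7, 3, 12, 15, 20]))
print(modes([10, 7, 3, 10, 15, 3]), modes([10, 7, 3, 10, 7, 3]))
```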


Example 8: Find the mean, the median, and the mode of data set:

10, 17, 13, 12, 15, 18, 10, 17, 14, 16, 35, 28, 22, 17, 23, 12, 15, 28, 10, 20

Solution: First we must sort the data set

10, 10, 10, 12, 12, 13, 14, 15, 15, 16, 17, 17, 17, 18, 20, 22, 23, 28, 28, 35

o Mean: $\bar{x} = \frac{10 + 10 + 10 + 12 + \cdots + 28 + 28 + 35}{20} = \frac{352}{20} = 17.6$

o Median: $\frac{16 + 17}{2} = 16.5$, since there are 2 middle observations

o Mode: 10 and 17

Example 9: Find the mean, the median, and the mode of data set:

25, 42, 18, 37, 25, 18, 40, 57, 64, 66, 85, 86, 92

85, 88, 92, 67, 33, 75, 85, 48, 60, 80, 60, 50

Example10: Find the mean, the median, and the mode of data set:

12.37, 13.33, 32.67, 12.37, 26.45

Example11A: Find the mean for the following grouped data

Class   Frequency
50-59   6
60-69   9
70-79   12
80-89   15
90-99   8

Solution: First we need to find the class marks (midpoints) and then we use the following formula:

$\bar{x} = \frac{\sum [x \cdot f]}{n}$,

where x is the midpoint or class mark, f is the frequency, and n is the number of data points.

Class   Frequency f   Class mark x   x.f
50-59   6             54.5           327
60-69   9             64.5           580.5
70-79   12            74.5           894
80-89   15            84.5           1267.5
90-99   8             94.5           756

$n = \sum f = 50$, $\sum [x \cdot f] = 3825$

So the mean is $\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$.
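A small sketch of the grouped-mean computation, assuming each class is stored as a hypothetical (lower limit, upper limit, frequency) tuple and the class mark is the average of the two limits:

```python
classes = [(50, 59, 6), (60, 69, 9), (70, 79, 12), (80, 89, 15), (90, 99, 8)]

def grouped_mean(table):
    """Approximate the mean from class marks weighted by class frequencies."""
    n = sum(f for _, _, f in table)
    total = sum((lower + upper) / 2 * f for lower, upper, f in table)
    return total / n

print(grouped_mean(classes))
```

The grouped mean (76.5) differs slightly from the mean of the raw scores (76.8) because every score in a class is treated as if it were equal to the class mark.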

Example11B: Find the mean for the following grouped data

Class   Frequency
00-04   4
05-09   10
10-14   12
15-19   20
20-24   8
25-29   6

Weighted Average (Mean): The formula above for the mean of grouped data is also called the weighted average or weighted mean. It can also be written as follows:

$\bar{x} = \frac{\sum [w \cdot x]}{\sum w}$

where w is the weight and x is the score.

Example12: Find the GPA of John who has the following courses with the corresponding units and grades.

English: 5 units with the grade of A
Math: 3 units with the grade of F
Spanish: 2 units with the grade of D

Solution: In this problem, x will be the value of the grades (A = 4, F = 0, D = 1) and w is the number of units,

$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{5(4) + 3(0) + 2(1)}{5 + 3 + 2} = \frac{20 + 0 + 2}{10} = \frac{22}{10} = 2.2$.

Example13: A teacher is teaching 3 classes. There are 30 students in the first class with an average of 70 on the final exam. The second class has 40 students with an average of 60 on the final exam. The 3rd class has 20 students with an average of 80 on the final exam. Find the weighted average of the three classes combined.

Solution: Let x be the average of each class and w be the number of students in it.

$\bar{x} = \frac{\sum [w \cdot x]}{\sum w} = \frac{30(70) + 40(60) + 20(80)}{30 + 40 + 20} = \frac{2100 + 2400 + 1600}{90} = \frac{6100}{90} \approx 67.8$
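Both weighted-average examples follow the same pattern, which a short Python sketch makes explicit. The function name `weighted_mean` is illustrative, and the grade points A = 4, F = 0, D = 1 are the usual 4-point values, assumed here for Example 12.

```python
def weighted_mean(values, weights):
    """Sum of weight * value divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Example 12: grade points weighted by course units
gpa = weighted_mean([4, 0, 1], [5, 3, 2])

# Example 13: class averages weighted by class sizes
combined = weighted_mean([70, 60, 80], [30, 40, 20])

print(round(gpa, 1), round(combined, 1))
```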

Measures of Variation

• Range

• Variance

• Standard Deviation

The Range: is defined to be the highest value minus the lowest value in the data set.

The Variance: is defined by the following:

Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, or $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$ (short cut formula of the sample variance).

Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$, or $\sigma^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{N}}{N}$ (short cut formula of the population variance).

Standard deviation: is the positive square root of the variance.

Standard deviation = $\sqrt{\text{Variance}}$

Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$, and

Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Example14A: Find the range, variance, and the standard deviation of the following data set.

3, 0, 7, 5, 15.

Solution:

o Range: Largest - Smallest = 15 - 0 = 15

o Variance: If we use $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$, first we need to find the sample mean $\bar{x}$.

So $\bar{x} = \frac{3 + 0 + 7 + 5 + 15}{5} = \frac{30}{5} = 6$, then we substitute in the above formula and we get

$s^2 = \frac{(3 - 6)^2 + (0 - 6)^2 + (7 - 6)^2 + (5 - 6)^2 + (15 - 6)^2}{5 - 1}$,

$s^2 = \frac{(-3)^2 + (-6)^2 + (1)^2 + (-1)^2 + (9)^2}{5 - 1}$,

$s^2 = \frac{9 + 36 + 1 + 1 + 81}{5 - 1}$,

$s^2 = \frac{128}{4} = 32$. So the variance is $s^2 = 32$.

The same computation in table form:

$x$    $x - \bar{x}$   $(x - \bar{x})^2$
3      3-6=-3          9
0      0-6=-6          36
7      7-6=1           1
5      5-6=-1          1
15     15-6=9          81

$\sum (x - \bar{x}) = 0$, $\sum (x - \bar{x})^2 = 128$

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{128}{5 - 1} = \frac{128}{4} = 32$

o Standard deviation: As we know, the standard deviation is the positive square root of the variance. Standard deviation = $\sqrt{\text{Variance}} = \sqrt{32} \approx 5.66$.

But if we use the short cut formula $s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$, first we need to find the sum of the values, $\sum x$, and the sum of their squares, $\sum x^2$.

$\sum x = 3 + 0 + 7 + 5 + 15 = 30$

$\sum x^2 = 3^2 + 0^2 + 7^2 + 5^2 + 15^2 = 9 + 0 + 49 + 25 + 225 = 308$, then we have

$s^2 = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1} = \frac{308 - \frac{(30)^2}{5}}{5 - 1} = \frac{308 - \frac{900}{5}}{4} = \frac{308 - 180}{4} = \frac{128}{4} = 32$, which is exactly the same as above.
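The agreement between the definition and the short cut formula in Example 14A can be confirmed numerically. The sketch below uses two hypothetical helper functions, one per formula.

```python
import math

def sample_variance(values):
    """Definition: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    xbar = sum(values) / n
    return sum((x - xbar) ** 2 for x in values) / (n - 1)

def sample_variance_shortcut(values):
    """Short cut: (sum of x^2 - (sum of x)^2 / n) divided by n - 1."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)

data = [3, 0, 7, 5, 15]
data_range = max(data) - min(data)
variance = sample_variance(data)
std_dev = math.sqrt(variance)

print(data_range, variance, sample_variance_shortcut(data), round(std_dev, 2))
```

The short cut form needs only the two running totals, which is convenient by hand; with floating-point data of large magnitude the definition form is usually the numerically safer choice.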

Example14B: Find the range, variance, and the standard deviation of the following data set.

10, 17, 13, 12, 15, 18, 10, 17, 14, 16

28, 22, 17, 23, 12, 15, 28, 10, 20, 35

Solution:

Example15A: Find the standard deviation for the following grouped data

Class   Frequency
50-59   6
60-69   9
70-79   12
80-89   15
90-99   8

Solution: We first modify the variance formula for grouped data. We need to find the class marks (midpoints), and then we use the following formula

$s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1}$, or $s^2 = \frac{\sum [x^2 \cdot f] - \frac{(\sum [x \cdot f])^2}{n}}{n - 1}$

where

x : is the midpoint or class mark

f : is the frequency

n : is the number of data points

We already know the mean $\bar{x} = \frac{\sum [x \cdot f]}{n} = \frac{3825}{50} = 76.5$

Class   f    x      x.f      $(x_i - \bar{x})^2$    $(x_i - \bar{x})^2 \cdot f$
50-59   6    54.5   327      (54.5-76.5)^2=484      2904
60-69   9    64.5   580.5    (64.5-76.5)^2=144      1296
70-79   12   74.5   894      (74.5-76.5)^2=4        48
80-89   15   84.5   1267.5   (84.5-76.5)^2=64       960
90-99   8    94.5   756      (94.5-76.5)^2=324      2592

$n = \sum f = 50$, $\sum [x \cdot f] = 3825$, $\sum (x_i - \bar{x})^2 \cdot f = 7800$

After substitution in $s^2 = \frac{\sum (x_i - \bar{x})^2 \cdot f}{n - 1}$ we get $s^2 = \frac{7800}{50 - 1} = 159.18$, and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.6$.

If we use the short cut formula $s^2 = \frac{\sum [x^2 \cdot f] - \frac{(\sum [x \cdot f])^2}{n}}{n - 1}$, we need the following table.

Class   f    x      x.f      x^2.f
50-59   6    54.5   327      (54.5)^2.6 = 17821.5
60-69   9    64.5   580.5    (64.5)^2.9 = 37442.25
70-79   12   74.5   894      (74.5)^2.12 = 66603
80-89   15   84.5   1267.5   (84.5)^2.15 = 107103.8
90-99   8    94.5   756      (94.5)^2.8 = 71442

$n = \sum f = 50$, $\sum [x \cdot f] = 3825$, $\sum [x^2 \cdot f] = 300412.5$

$s^2 = \frac{300412.5 - \frac{(3825)^2}{50}}{50 - 1} = \frac{300412.5 - \frac{14630625}{50}}{49} = \frac{300412.5 - 292612.5}{49} = \frac{7800}{49} = 159.18$, and hence the standard deviation will be $s = \sqrt{159.18} \approx 12.6$, which is the same as the above result.
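The grouped-data variance can be checked the same way. In this sketch each class is reduced to a hypothetical (class mark, frequency) pair, and both the definition and the short cut formula are evaluated.

```python
import math

# (class mark, frequency) for the classes 50-59, ..., 90-99
table = [(54.5, 6), (64.5, 9), (74.5, 12), (84.5, 15), (94.5, 8)]

def grouped_variance(rows):
    """Sum of (x - mean)^2 * f, divided by n - 1."""
    n = sum(f for _, f in rows)
    mean = sum(x * f for x, f in rows) / n
    return sum((x - mean) ** 2 * f for x, f in rows) / (n - 1)

def grouped_variance_shortcut(rows):
    """(Sum of x^2 * f minus (sum of x * f)^2 / n), divided by n - 1."""
    n = sum(f for _, f in rows)
    sum_xf = sum(x * f for x, f in rows)
    sum_x2f = sum(x * x * f for x, f in rows)
    return (sum_x2f - sum_xf ** 2 / n) / (n - 1)

variance = grouped_variance(table)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(grouped_variance_shortcut(table), 2), round(std_dev, 1))
```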

Example15B: Find the standard deviation for the following grouped data

Class   Frequency
00-04   4
05-09   10
10-14   12
15-19   20
20-24   8
25-29   6

Question 1. What will happen to the mean, median, mode, range, and standard deviation if we add a fixed number, c, to all values in the data set?

Answer. The mean, median, and mode will increase by c units, but the range and standard deviation will not change.

Question 2. What will happen to the mean, median, mode, range, and standard deviation if we subtract a fixed number, c, from all values in the data set?

Answer. The mean, median, and mode will decrease by c units, but the range and standard deviation will not change.

Question 3. What will happen to the mean, median, mode, range, and standard deviation if we multiply all values in the data set by a fixed number, c?

Answer. The mean, median, and mode will be multiplied by c, and the range and standard deviation will be multiplied by |c|.

Example 16:

         X      X+7        X-7       X*7
         15     15+7=22    15-7=8    15*7=105
         13     13+7=20    13-7=6    13*7=91
         15     15+7=22    15-7=8    15*7=105
         15     15+7=22    15-7=8    15*7=105
         22     22+7=29    22-7=15   22*7=154

Mean     16     16+7=23    16-7=9    16*7=112
Median   15     15+7=22    15-7=8    15*7=105
Mode     15     15+7=22    15-7=8    15*7=105
Range    9      9          9         9*7=63
Sd       3.46   3.46       3.46      3.46*7=24.22

In general if $Y = aX + b$, then we have

• Mean of Y = a [Mean of X] + b, or $\bar{y} = a\bar{x} + b$

• Standard deviation of Y = |a| [standard deviation of X], or $S_Y = |a| S_X$
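The effect of adding or multiplying by a constant can be verified with Python's standard `statistics` module. This sketch takes the X column of Example 16 to be 15, 13, 15, 15, 22, the values that are consistent with the summary row (mean 16, range 9, Sd 3.46); the `summary` helper is hypothetical.

```python
import statistics

x = [15, 13, 15, 15, 22]

def summary(values):
    """Mean, median, mode, range and sample standard deviation of a data set."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
    }

base = summary(x)
shifted = summary([v + 7 for v in x])   # Y = X + 7: location moves, spread does not
scaled = summary([v * 7 for v in x])    # Y = 7X: location and spread both scale by 7

for name, s in (("X", base), ("X+7", shifted), ("X*7", scaled)):
    print(name, s["mean"], s["median"], s["mode"], s["range"], round(s["sd"], 2))
```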

Empirical Rule

If the distribution of a data set is bell-shaped (normal), then

• Approximately 68% of scores are within one standard deviation of the mean. They fall in the interval $(\bar{x} - s, \bar{x} + s)$.

• Approximately 95% of scores are within two standard deviations of the mean. They fall in the interval $(\bar{x} - 2s, \bar{x} + 2s)$.

• Approximately 99.7% of scores are within three standard deviations of the mean. They fall in the interval $(\bar{x} - 3s, \bar{x} + 3s)$.

Example17. Suppose IQ scores are normally distributed with a mean of $\mu = 100$ and a standard deviation of $\sigma = 15$. Then by the empirical rule

• Approximately 68% of scores are in the interval 100-15 to 100+15, or 85 to 115.

• Approximately 95% of scores are in the interval 100-2(15) to 100+2(15), or 70 to 130.

• Approximately 99.7% of scores are in the interval 100-3(15) to 100+3(15), or 55 to 145.
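The three intervals of the empirical rule are simply the mean plus or minus 1, 2, and 3 standard deviations. A minimal sketch, with a hypothetical helper name:

```python
def empirical_intervals(mean, sd):
    """Intervals holding roughly 68%, 95% and 99.7% of bell-shaped data."""
    coverage = {1: "68%", 2: "95%", 3: "99.7%"}
    return {pct: (mean - k * sd, mean + k * sd) for k, pct in coverage.items()}

iq = empirical_intervals(100, 15)
for pct, (low, high) in iq.items():
    print(f"about {pct} of scores lie between {low} and {high}")
```

The percentages are approximations that apply only when the data are roughly bell-shaped; the function computes the intervals and does not check that assumption.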

Coefficient of Variation

The coefficient of variation is defined to be the standard deviation divided by the mean.

Coefficient of variation (CV) = $\frac{s}{\bar{x}}$. If $\bar{x}$ is 0 or close to 0, then this measure should not be used.

Normally this measure is used when we have 2 or more groups of data with different units.

Example18.

Class A: Mean = 129, and standard deviation = 11, CV = 11/129 = .085 or 8.5%

Class B: Mean = 150, and standard deviation = 25, CV = 25/150 = .167 or 16.7%

Class C: Mean = 60, and standard deviation = 15, CV = 15/60 = .25 or 25.0%

Class C has the greatest relative variation.
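The comparison in Example 18 can be reproduced with a short sketch. The function name is illustrative, and it rejects only an exactly zero mean; a mean that is merely close to zero would still need the judgment described above.

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean (a unit-free measure of spread)."""
    if mean == 0:
        raise ValueError("CV is undefined when the mean is 0")
    return sd / mean

classes = {"A": (129, 11), "B": (150, 25), "C": (60, 15)}   # name: (mean, sd)
cv = {name: coefficient_of_variation(sd, mean) for name, (mean, sd) in classes.items()}
most_variable = max(cv, key=cv.get)

for name, value in cv.items():
    print(f"Class {name}: CV = {value:.3f} ({value:.1%})")
print("Greatest relative variation: Class", most_variable)
```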

Measures of Position

• Standard Scores

$z = \frac{x - \bar{x}}{s}$ or $z = \frac{x - \mu}{\sigma}$,

where $\bar{x}$ or $\mu$ is the mean and $s$ or $\sigma$ is the standard deviation.

This value, z, measures the deviation from the mean in numbers of standard deviations, and it has no unit.

Example19. Suppose John is taking 3 classes with the following scores. In which class does he have the better score?

Class A: English test score = 145, Mean = 129, and standard deviation = 11

Z = (145-129)/11 = 1.45

Class B: Physics test score = 190, Mean = 150, and standard deviation = 25

Z = (190-150)/25 = 1.60

Class C: Statistics test score = 88, Mean = 60, and standard deviation = 15

Z = (88-60)/15 = 1.87

So his score in class C is relatively the highest.
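A sketch of the standard-score comparison in Example 19; the dictionary layout (score, mean, standard deviation) is just one convenient way to hold the three classes.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# subject: (John's score, class mean, class standard deviation)
tests = {"English": (145, 129, 11), "Physics": (190, 150, 25), "Statistics": (88, 60, 15)}
z = {subject: z_score(*values) for subject, values in tests.items()}
best = max(z, key=z.get)

for subject, value in z.items():
    print(f"{subject}: z = {value:.2f}")
print("Relatively best score:", best)
```

Although 190 is the largest raw score, the Statistics score lies furthest above its own class mean once the spread of each class is taken into account.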

Percentiles

The percentile corresponding to a given score (x) is denoted by P and it is given by the following formula

$P = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100$

Example20. John has a score of 88 in a class of 20 students. Find the percentile rank of his score.

81, 65, 75, 76, 78, 62, 63, 65, 70, 90, 61, 75, 76, 79, 58, 88, 82, 95, 90, 67.

Solution: In any problem of finding a percentile, we must sort the data set from smallest to largest.

58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95.

$P = \frac{\text{number of scores less than } x}{\text{total number of scores}} \cdot 100 = \frac{16}{20} \cdot 100 = 80$

So John's score is at the 80th percentile, which means 80% of all scores are below 88.

Finding the Score Corresponding to a Given Percentile

Example21. In the data set of example 20, find the score corresponding to the 12th percentile.

Solution:

Step1: Make sure the data are sorted

58, 61, 62, 63, 65, 65, 67, 70, 75, 75

76, 76, 78, 79, 81, 82, 88, 90, 90, 95

Step2: Compute L = p% of n, where L is the location of the score.

In this example L = 12% of 20 = 0.12(20) = 2.4, which is rounded up to 3.

Step3: Go to the data set and pick the score at the 3rd position, which is 62.

It is usually written as P12 = 62.

Note:

• If L is not a whole number, round up to the next whole number.

• If L is a whole number, use the average of the scores at the Lth and (L+1)th locations.

Example22. In the data set of example 20, find the score corresponding to the 40th percentile.

Step1: As before, the sorted data are 58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95

Step2: L = 40% of 20 = 0.40(20) = 8, which is a whole number, so we pick the average of the 8th and 9th scores.

Step3: The 8th score is 70 and the 9th score is 75, and their average is (70+75)/2 = 72.5. So P40 = 72.5.
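The percentile rules of Examples 20 to 22 can be sketched in Python as below. The function names are hypothetical, and `score_at_percentile` implements the location rule of these notes (round up when L is not whole, average two scores when it is), which can differ from the interpolation methods used by statistical software. It is intended for percentiles strictly between 0 and 100.

```python
import math

scores = [81, 65, 75, 76, 78, 62, 63, 65, 70, 90,
          61, 75, 76, 79, 58, 88, 82, 95, 90, 67]

def percentile_rank(values, x):
    """Percentage of scores that are strictly less than x."""
    below = sum(v < x for v in values)
    return below * 100 / len(values)

def score_at_percentile(values, p):
    """Score at the p-th percentile using the location L = p% of n."""
    ordered = sorted(values)
    location = p * len(ordered) / 100
    if location != int(location):
        # L is not whole: round up and take the score at that position (1-based).
        return ordered[math.ceil(location) - 1]
    k = int(location)
    # L is whole: average the scores at positions L and L + 1 (1-based).
    return (ordered[k - 1] + ordered[k]) / 2

print(percentile_rank(scores, 88))
print(score_at_percentile(scores, 12), score_at_percentile(scores, 40))
```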

Deciles: divide the data set into 10 groups.

D1 = 10th percentile, which is the same as P10

D2 = 20th percentile, which is the same as P20

…….

D9 = 90th percentile, which is the same as P90

Quartiles: divide the data set into 4 groups.

Q1 = first quartile or 25th percentile, which is the same as P25

Q2 = second quartile or 50th percentile, which is the same as P50. This is also the median.

Q3 = third quartile or 75th percentile, which is the same as P75

Inter-Quartile Range (IQR): is the difference between the 3rd and 1st quartiles. It is denoted by IQR and it is defined by IQR = Q3 - Q1.


Example23. In data set of example 20, find the score corresponding to

• D2

• Q1

• Q3

• IQR

Outlier: An outlier is an extremely high or an extremely low data value. To check for an outlier we compute Q1-1.5(IQR) and Q3+1.5(IQR). Then if

• The suspected score is below Q1-1.5(IQR) or

• The suspected score is above Q3+1.5(IQR)

the score is said to be an outlier.

Example24. Is there any outlier in the following data set?

55 46 46 41
55 49 51 41
36 41 86 53
52 47 44 51
51 61 51 48

Sorted Data

36 41 41 41 44 46 46 47 48 49
51 51 51 51 52 53 55 55 61 86
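One way to work Example 24 is sketched below. It is not the author's printed solution: it assumes the quartiles are found with the same location rule used for percentiles earlier in these notes (L = p% of n, averaging two scores when L is whole), and the helper name `quartile` is hypothetical.

```python
import math

data = [55, 46, 46, 41, 55, 49, 51, 41, 36, 41,
        86, 53, 52, 47, 44, 51, 51, 61, 51, 48]

def quartile(ordered, p):
    """Score at the p-th percentile of sorted data, using L = p% of n."""
    location = p * len(ordered) / 100
    if location != int(location):
        return ordered[math.ceil(location) - 1]     # round L up
    k = int(location)
    return (ordered[k - 1] + ordered[k]) / 2        # average Lth and (L+1)th scores

ordered = sorted(data)
q1, q3 = quartile(ordered, 25), quartile(ordered, 75)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in ordered if v < low_fence or v > high_fence]

print(q1, q3, iqr, low_fence, high_fence, outliers)
```

Under these assumptions Q1 = 45 and Q3 = 52.5, so the fences are 33.75 and 63.75, and the score 86 is flagged as an outlier.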

Five commonly used Statistics: The five numbers that are used frequently to summarize any data set are

Minimum, Q1, Q2, Q3, Maximum

Box plot or box-and-whisker plot: is another graphical representation of any data set. We use the five commonly used statistics to graph the box plot. The box plot can provide answers to the following questions

o Is a factor significant?

o Does the location differ between subgroups?

o Does the variation differ between subgroups?

o Are there any outliers?

Example25. In the data set of example 20, find the 5 common statistics.

58, 61, 62, 63, 65, 65, 67, 70, 75, 75, 76, 76, 78, 79, 81, 82, 88, 90, 90, 95

1. Minimum: is 58.

2. Q1: L = 25% of 20 = 0.25(20) = 5. Since this is a whole number, we use the average of the 5th and 6th observations. In the above ordered data set the 5th score is 65 and the 6th score is 65, so their average is also 65. So Q1 = 65.

3. Q2: L = 50% of 20 = 0.50(20) = 10. Again, since this is a whole number, we use the average of the 10th and 11th observations. In the above ordered data set the 10th score is 75 and the 11th score is 76, so their average is (75+76)/2 = 75.5. So Q2 = 75.5.

4. Q3: L = 75% of 20 = 0.75(20) = 15. This is a whole number, so we use the average of the 15th and 16th observations. In the above ordered data set the 15th score is 81 and the 16th score is 82, so their average is (81+82)/2 = 81.5. So Q3 = 81.5.

5. Maximum: is 95.

So the five statistics are 58, 65, 75.5, 81.5, and 95.

[Figure: Boxplot of C1.]

Example26 In data set of example 24, find the 5 common statistics.

Example27. For the data set below, use a computer to find the descriptive statistics and plot all appropriate charts discussed so far for all variables.

Test1  Sex  Grade      Test1  Sex  Grade
76     1    C          76     1    C
62     1    D          59     1    F
68     1    D          92     1    A
69     1    D          93     1    A
79     0    C          88     0    B
90     0    A          86     0    B
79     1    C          66     0    D
86     1    B          81     1    B
52     0    F          85     0    B
97     1    A          85     0    B
78     1    C          70     1    C
55     1    F          55     1    F
96     1    A          62     1    D
89     1    B          80     1    B
73     0    C          60     1    D
66     0    D          80     1    B
88     1    B          72     1    C
92     0    A          82     0    B
94     1    A          86     1    B
50     1    F          99     1    A
71     0    C          63     1    D
89     0    B          75     1    C
78     1    C          83     1    B
88     0    B          78     0    C
58     1    F          61     1    D

Descriptive Statistics: Test1

Variable  Sex     N   N*  Mean   SE Mean  StDev  Minimum  Q1     Median  Q3     Maximum
Test1     Female  34  0   75.59  2.36     13.76  50.00    62.00  77.00   86.50  99.00
          Male    16  0   79.38  2.77     11.10  52.00    71.50  83.50   88.00  92.00

[Figure: Boxplot of Test1 vs Sex. x-axis: Sex (Male, Female); y-axis: Test1.]