1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Upload: theodore-daniel

Post on 16-Jan-2016


Page 1: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Descriptive Statistics

Chapter 3
MSIS 111 Prof. Nick Dedeke

Page 2: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Objectives

- Define measures of central tendency, variability, shape and association
- Define statistical measures
- Compute statistical measures for ungrouped and grouped data
- Interpret statistical results

Page 3: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Introduction

In most competitive sports, one looks for the position of the athletes, e.g. who came in first, second, and so on. In statistics, one is interested in the following measures:
- most frequent value in data set
- summary of all values in data set
- midpoint position of data set
- positions of data in data set
- distances to midpoint of data set

Page 4: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Exercise: Statistical Measure 1

We want to find out which of the following students is the better one using the available data. The data shows the positions of the two competitors in several rounds of testing.

Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st

Page 5: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Response: Commonsense Approach

We want to find out which of the following students is the better one using the available data.

Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st

- 3 times Kuli was 1st and Marti was behind
- 3 times Marti was 1st and Kuli was behind
- Marti had more 2nd places
- Marti had more 3rd places

Imagine that you had a data set with 500 values!!

Page 6: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Mode

- The most frequently occurring value in a data set
- Applicable to all levels of data measurement (nominal, ordinal, interval, and ratio)
- Bimodal -- Data sets that have two modes
- Multimodal -- Data sets that contain more than two modes
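The definition of the mode can be sketched in a few lines of Python. This is only an illustration, not part of the course material: the function name `modes` is hypothetical, and the two lists are the Kuli and Marti positions from the opening exercise, written as integers.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)                # value -> number of occurrences
    top = max(counts.values())              # highest frequency in the data set
    return [value for value, count in counts.items() if count == top]

kuli = [1, 2, 1, 2, 1, 4, 3, 3, 2, 5, 1]
marti = [3, 2, 3, 2, 2, 1, 1, 1, 3, 2, 1]

print(modes(kuli))    # a single mode
print(modes(marti))   # two values tie, so this data set is bimodal
```

Because the function only counts occurrences, it also works for nominal data such as a list of names. It assumes a non-empty data set.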

Page 7: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Median

- Middle value in an ordered array of numbers
- Applicable for ordinal, interval, and ratio data
- Not applicable for nominal data
- Unaffected by extremely large and extremely small values

Page 8: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Median: Computational Procedure

First Procedure
- Arrange the observations in an ordered array.
- If there is an odd number of terms, the median is the middle term of the ordered array.
- If there is an even number of terms, the median is the average of the middle two terms.

Second Procedure
- The median’s position in an ordered array is given by (n+1)/2.
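The two procedures can be combined into one short Python sketch. The function name `median_by_position` is an assumption made for this illustration; the logic follows the (n+1)/2 position rule stated above.

```python
def median_by_position(values):
    """Median via the (n + 1) / 2 position rule on an ordered array."""
    ordered = sorted(values)                 # step 1: arrange in order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    position = (n + 1) / 2                   # 1-based position of the median
    lower = ordered[int(position) - 1]       # term at the whole-number part
    if n % 2 == 1:
        return lower                         # odd n: the middle term
    upper = ordered[int(position)]           # even n: the next term up
    return (lower + upper) / 2               # average of the middle two terms

odd_array = [3, 4, 5, 7, 8, 9, 11, 14, 15, 16, 16, 17, 19, 19, 20, 21, 22]
even_array = odd_array[:-1]                  # same array without the 22

print(median_by_position(odd_array))
print(median_by_position(even_array))
```

Replacing the largest or smallest value with an extreme one leaves the result unchanged, which is the robustness property listed for the median.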

Page 9: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Median: Odd Number Example (Long method)

Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21 22

- There are 17 terms in the ordered array.
- Position of median = (n+1)/2 = (17+1)/2 = 9
- The median is the 9th term, which is 15.
- If the 22 is replaced by 100, the median is 15.
- If the 3 is replaced by -103, the median is 15.

Page 10: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Median: Even Number Example (Long Method)

Ordered Array
3 4 5 7 8 9 11 14 15 16 16 17 19 19 20 21

- There are 16 terms in the ordered array.
- Position of median = (n+1)/2 = (16+1)/2 = 8.5
- The median is between the 8th and 9th terms, 14.5.

NOTE
- If the 21 is replaced by 100, the median is 14.5.
- If the 3 is replaced by -88, the median is 14.5.

Page 11: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Arithmetic Mean

- Commonly called ‘the mean’
- Is the average of a group of numbers
- Applicable for interval and ratio data
- Not applicable for nominal or ordinal data
- Affected by each value in the data set, including extreme values
- Computed by summing all values in the data set and dividing the sum by the number of values in the data set

Page 12: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Population Mean (Long method)

Data for total population: 57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43

$$\mu = \frac{\sum X}{N} = \frac{X_1 + X_2 + X_3 + \dots + X_N}{N} = \frac{57 + 57 + 86 + 86 + 42 + 42 + 43 + 56 + 57 + 42 + 42 + 43}{12} = \frac{653}{12} = 54.4167$$

Page 13: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Computing Sample Mean (Long method)

$$\bar{X} = \frac{\sum X}{n} = \frac{X_1 + X_2 + X_3 + \dots + X_n}{n} = \frac{57 + 86 + 42}{3} = \frac{185}{3} = 61.667$$

The population mean is not the same thing as the sample mean! Our numbers (57, 86, 42) are a sample that is drawn from the population, and hence a small segment of it.
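As a quick check of the long method, the two means can be computed with Python's standard `statistics` module. The variable names are chosen for this illustration only; the data are the population and sample from the slides.

```python
from statistics import mean

# Whole population (N = 12) and the sample of n = 3 drawn from it
population = [57, 57, 86, 86, 42, 42, 43, 56, 57, 42, 42, 43]
sample = [57, 86, 42]

mu = mean(population)      # population mean: sum of all X divided by N
x_bar = mean(sample)       # sample mean: sum of sampled X divided by n

print(round(mu, 4))
print(round(x_bar, 3))
```

The two results differ because the sample contains only a small segment of the population.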

Page 14: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Computing Central Tend. Measures using Frequency Tables (Compact method)

Data for total population: 55, 55, 60, 100, 100, 100, 125, 125, 125, 125, 125, 140, 140, 140, 140

Xi      Fi    Fi * Xi
55       2      110
60       1       60
100      3      300
125      5      625
140      4      560
Total   15     1655

Mean = (Σ Fi*Xi)/(Σ Fi) = 1655/15 = 110.33

Mode = 125

Median position = (15+1)/2 = 8th

Median value = 125

THIS IS THE TYPE OF APPROACH YOU NEED TO MASTER FOR YOUR EXAM.
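The compact method can also be sketched in Python. This is a minimal illustration under stated assumptions: the function name `frequency_table_stats` and the representation of the table as (value, frequency) pairs are not from the slides.

```python
def frequency_table_stats(table):
    """Mean, mode(s) and median from a list of (value, frequency) pairs."""
    rows = sorted(table)                          # order the values first
    n = sum(f for _, f in rows)                   # sum of Fi
    mean = sum(x * f for x, f in rows) / n        # sum of Fi*Xi over sum of Fi
    top = max(f for _, f in rows)
    modes = [x for x, f in rows if f == top]      # value(s) with the largest Fi

    def value_at(position):
        """Value found at a 1-based position, using cumulative frequencies."""
        cumulative = 0
        for x, f in rows:
            cumulative += f
            if position <= cumulative:
                return x
        raise ValueError("position is outside the data set")

    if n % 2 == 1:
        median = value_at((n + 1) // 2)           # whole-number position
    else:                                         # position falls between two terms
        median = (value_at(n // 2) + value_at(n // 2 + 1)) / 2
    return mean, modes, median

table = [(55, 2), (60, 1), (100, 3), (125, 5), (140, 4)]
mean, modes, median = frequency_table_stats(table)
print(round(mean, 2), modes, median)
```

Sorting inside the function means the rows may be entered in any order, which matters because the median requires an ordered set.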

Page 15: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Exercise: Computing Central Tend. Measures using Frequency Tables

Xi      Fi    Fi * Xi
1        2
10       3
4        4
6        3
12       2
Total  n=14

Mean = (Σ Fi*Xi)/(Σ Fi) =

Mode =

Median position =

Median value =

Page 16: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Response: Computing Central Tend. Measures using Frequency Tables

Xi      Fi    Fi * Xi
1        2        2
4        4       16
6        3       18
10       3       30
12       2       24
Total  n=14      90

Mean = (Σ Fi*Xi)/(Σ Fi) = 90/14 = 6.43

Mode = 4 (it has the highest frequency, 4)

Median position = (14+1)/2 = 7.5 (between 7th and 8th)

Median value = (6+6)/2 = 6

Page 17: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Opening Exercise: Using Statistical Measures

Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st

Mode: Most frequently occurring value of a variable

Mode for Kuli: 1st
Mode for Marti: 1st and 2nd (each occurs four times)

Mean: Average of the values of a variable

Sample mean = (Σ Xi)/n

Mean or average score for Kuli: 25/11 = 2.27
Mean or average score for Marti: 21/11 = 1.9

Page 18: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Using Statistical Measures

Kuli:  1st 2nd 1st 2nd 1st 4th 3rd 3rd 2nd 5th 1st
Marti: 3rd 2nd 3rd 2nd 2nd 1st 1st 1st 3rd 2nd 1st

Median: The value in the middle of an ordered data set of n values.

Median point = (n + 1)/2 = (11 + 1)/2 = 6th position

Ordered data:
Kuli:  1st 1st 1st 1st 2nd 2nd 2nd 3rd 3rd 4th 5th
Marti: 1st 1st 1st 1st 2nd 2nd 2nd 2nd 3rd 3rd 3rd

Median score for Kuli is 2nd
Median score for Marti is 2nd

Notice that the median requires an ordered set.
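The three measures for both students can be cross-checked with Python's standard `statistics` module (`multimode` needs Python 3.8 or later). This is a sketch, not the course's method: the helper name `summarize` is hypothetical, and the positions are encoded as integers, which treats the ranks as numbers in the same way the slides do when they average them.

```python
from statistics import mean, median, multimode

# Positions per round, encoded as integers (1 = 1st, 2 = 2nd, ...)
kuli = [1, 2, 1, 2, 1, 4, 3, 3, 2, 5, 1]
marti = [3, 2, 3, 2, 2, 1, 1, 1, 3, 2, 1]

def summarize(positions):
    """Return (mean, median, sorted modes) for one student."""
    return mean(positions), median(positions), sorted(multimode(positions))

for name, positions in (("Kuli", kuli), ("Marti", marti)):
    avg, mid, most_common = summarize(positions)
    print(name, round(avg, 2), mid, most_common)
```

Note that `median` sorts the data internally, so the unordered lists can be passed in directly.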

Page 19: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Using Frequency Distribution Tables

Analysis of Kuli’s performance

Xi      Frequency (Fi)    Fi * Xi    Cum. (C Fi)
1st           4              4           4
2nd           3              6           7
3rd           2              6           9
4th           1              4          10
5th           1              5          11
Total        11             25

Mean = (Σ Fi*Xi)/(Σ Fi) = 25/11 = 2.27

Mode = 1st

Median point = (11 + 1)/2 = 6th
Median value = 2nd
Using cumul. freq. column = 2nd

Page 20: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Using Frequency Distribution Tables

Analysis of Marti’s performance

Xi      Frequency (Fi)    Fi * Xi    Cum. (C Fi)
1st           4              4           4
2nd           4              8           8
3rd           3              9          11
4th           0              0          11
5th           0              0          11
Total        11             21

Mean = (Σ Fi*Xi)/(Σ Fi) = 21/11 = 1.9

Mode = 1st & 2nd

Median point = (11 + 1)/2 = 6th
Median value = 2nd
Using cumul. freq. column = 2nd

Page 21: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Using Frequency Distribution Tables

Who is the better student?

Measure         Marti        Kuli
Mean            1.9          2.27
Median value    2nd          2nd
Mode            1st & 2nd    1st

Page 22: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

New Case: Median measure

Analysis of Katie’s performance

Xi      Frequency (Fi)    Fi * Xi    Cum. (C Fi)
1st           4              4           4
2nd           2              4           6
3rd           5             15          11
4th           1              4          12
Total        12             27

Mean = (Σ Fi*Xi)/(Σ Fi) = 27/12 = 2.25

Mode = 3rd

Median point = (12 + 1)/2 = 6.5th
> The median value is between the 6th and 7th positions.

Median value = (2nd + 3rd)/2 = 2.5th
> Average of the values at the 6th and 7th positions.

Page 23: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Examples

Page 24: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Percentiles

Sometimes we are not analyzing several values from one person, but one value for several persons or objects. For example, we have data on the performance of several fund managers for the year 2006. We want to present the data in the form: manager XX is in the 10th percentile, or manager XX is in the 25th percentile.

The method used consists of three steps:
- organize the data in ascending order
- calculate the location of the percentile you want
- identify the object at the percentile location in the data set

Page 25: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Interpretation: Percentiles

If manager YY is in the 10th percentile of a group, this means that at least 10% of everyone in the data set scored below manager YY and at most 90% of everyone in the data set scored better than manager YY.

If manager Pico is in the 95th percentile of a group, this means that at least 95% of everyone in the data set scored below manager Pico and at most 5% of everyone in the data set scored better than manager Pico.

Page 26: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Exercise: Percentiles for Known Values

First name    Fund performance
Bill          106%
Jane          109%
Sven          114%
Larry         116%
Dub           121%
Anna          122%
Cole          125%
Salome        129%

In which percentile is Sven?

Page 27: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Deriving Percentiles with Cumulative Relative Frequency Approach for Observed Values

First name    Fund performance    Fi    Rel. fi    Cum. rel. fi     Percentile
Bill          106%                1     1/8        1/8 = 0.125      12.5th Percentile
Jane          109%                1     1/8        2/8 = 0.25       25th Percentile
Sven          114%                1     1/8        3/8 = 0.375      37.5th Percentile
Larry         116%                1     1/8        4/8 = 0.50       50th Percentile
Dub           121%                1     1/8        5/8 = 0.625      62.5th Percentile
Anna          122%                1     1/8        6/8 = 0.75       75th Percentile
Cole          125%                1     1/8        7/8 = 0.875      87.5th Percentile
Salome        129%                1     1/8        8/8 = 1          100th Percentile
N=8

In which percentile is Sven?
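The cumulative relative frequency approach can be sketched in Python as follows. The function name `percentile_of_each` is hypothetical, and the sketch assumes every value occurs exactly once (Fi = 1), as in the fund manager table.

```python
def percentile_of_each(scores):
    """Map each name to its percentile via cumulative relative frequency.

    Assumes every value occurs once, so each row adds 1/N to the running total.
    """
    ordered = sorted(scores.items(), key=lambda item: item[1])  # ascending order
    n = len(ordered)
    result = {}
    for position, (name, _) in enumerate(ordered, start=1):
        result[name] = 100 * position / n     # cumulative relative frequency, in %
    return result

fund_performance = {
    "Bill": 106, "Jane": 109, "Sven": 114, "Larry": 116,
    "Dub": 121, "Anna": 122, "Cole": 125, "Salome": 129,
}

ranks = percentile_of_each(fund_performance)
print(ranks["Sven"])
```

The position of a manager in the ordered list, divided by N, is the cumulative relative frequency in that manager's row of the table.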

Page 28: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Deriving Percentiles with Cumulative Relative Frequency Approach for Unobserved Values

First name    Fund performance    Fi    Rel. fi    Cum. rel. fi    Percentile
Bill          106%                1     1/8        1/8             12.5th Percentile
Jane          109%                1     1/8        2/8             25th Percentile
Sven          114%                1     1/8        3/8             37.5th Percentile
Larry         116%                1     1/8        4/8             50th Percentile
Dub           121%                1     1/8        5/8             62.5th Percentile
Anna          122%                1     1/8        6/8             75th Percentile
Cole          125%                1     1/8        7/8             87.5th Percentile
Salome        129%                1     1/8        1               100th Percentile
N=8

What is the value of the 90th percentile?

Page 29: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Computing Data Values When Given Percentile locations (Approximate method)

90th percentile location i = (P/100) * N = 0.9 * 8 = 7.2th position
The result is not a whole number, so the percentile position is the whole-number part of i plus 1: 7 + 1 = 8th position.
90th percentile value from tables = 129%

This is an approximate method because the formula gives the same result for multiple percentiles:

The approximate method gives the same result of 129% for the 91st, 92nd, 93rd, up to the 100th percentiles.

50th percentile location i = (P/100) * N = 0.5 * 8 = 4th position
The result is a whole number, so the percentile is the average of the values at positions i and i + 1:
50th percentile = (4th value + 5th value)/2 = (116+121)/2 = 118.5% (But from tables we see that 116% is also the 50th percentile)

RECOMMENDATION: USE THIS APPROXIMATE APPROACH FORMULA WHEN YOU ARE DEALING WITH UNOBSERVED VALUES. IF YOU USE THE APPROACH IN THE EXAM, YOU WILL NOT BE MARKED WRONG.
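A minimal Python sketch of the approximate method follows. The function name `percentile_approx` is hypothetical, percentiles are assumed to be whole numbers from 1 to 100, and returning the largest value when the location equals N (the 100th percentile) is an assumption, since there is no (N+1)th value to average with.

```python
def percentile_approx(values, p):
    """Approximate percentile value using the location i = (P/100) * N."""
    if not 1 <= p <= 100:
        raise ValueError("p must be a whole number between 1 and 100")
    ordered = sorted(values)                    # ascending order
    n = len(ordered)
    whole, remainder = divmod(p * n, 100)       # i = whole + remainder/100
    if remainder != 0:
        position = whole + 1                    # i not whole: whole part plus 1
        return ordered[position - 1]
    if whole >= n:
        return ordered[-1]                      # assumption: no (N+1)th value exists
    # i is whole: average the values at positions i and i + 1
    return (ordered[whole - 1] + ordered[whole]) / 2

fund_performance = [106, 109, 114, 116, 121, 122, 125, 129]

print(percentile_approx(fund_performance, 90))
print(percentile_approx(fund_performance, 50))
```

Working with `divmod(p * n, 100)` keeps the whole-number test in integer arithmetic, which avoids floating-point surprises when deciding whether i is a whole number.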

Page 30: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Computing Percentile locations with arithmetic formula (More precise method)

90th percentile location i = (P/100) * N = 0.9 * 8 = 7.2th position
The 90th percentile is 0.2, or 20%, of the way between the 7th and 8th positions.
The value for the 90th percentile is computed as:
7th position’s value + (8th position’s value - 7th position’s value) * fractional part of i
= 125% + (129% - 125%) * 0.2 = 125.8% (~ 126%)

50th percentile location i = (P/100) * N = 0.5 * 8 = 4th position
50th percentile = 116%
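The more precise method is a linear interpolation between two neighbouring positions, which can be sketched in Python as below. The function name `percentile_interpolated` is hypothetical, and the handling of locations below the 1st position or at the Nth position is an assumption, because the slides do not cover those cases.

```python
def percentile_interpolated(values, p):
    """Percentile value by interpolating at location i = (P/100) * N."""
    ordered = sorted(values)                  # ascending order
    n = len(ordered)
    i = p * n / 100                           # percentile location
    k = int(i)                                # whole-number part of i
    fraction = i - k                          # fractional part of i
    if k < 1:
        return ordered[0]                     # assumption: clamp to the first value
    if k >= n:
        return ordered[-1]                    # assumption: clamp to the last value
    # k-th value plus the fraction of the gap to the (k+1)-th value
    return ordered[k - 1] + (ordered[k] - ordered[k - 1]) * fraction

fund_performance = [106, 109, 114, 116, 121, 122, 125, 129]

print(round(percentile_interpolated(fund_performance, 90), 1))
print(percentile_interpolated(fund_performance, 50))
```

When i is a whole number the fractional part is zero, so the function returns the value at position i itself, matching the 50th percentile result in the table.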

Page 31: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Overview Measures and Summary of Conditions for Using Descriptive Measures

The use of statistical measures is conditioned on the level of measurement of the data. For specific levels, e.g. the nominal level, many statistical measures cannot be used.

Page 32: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Descriptive Measures for Grouped Data

Mean, median and mode can all be computed for quantitative data sets that were measured at the right level.

Page 33: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Exercise: Central Tendency Measures for Grouped Data

Class interval      Frequency (Fi)    Midpoint (Mi)
[1 – 3) inches           16                2
[3 – 5) inches            2                4
[5 – 7) inches            4                6
[7 – 9) inches            3                8
[9 – 11) inches           9               10
[11 – 13) inches          6               12
Total                    40

Modal class:
Median position:
Median class:

Page 34: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Response: Central Tendency Measures for Grouped Data

Class interval      Frequency (Fi)    Midpoint (Mi)
[1 – 3) inches           16                2
[3 – 5) inches            2                4
[5 – 7) inches            4                6
[7 – 9) inches            3                8
[9 – 11) inches           9               10
[11 – 13) inches          6               12
Total                    40

Modal class: [1 – 3) inches
Median position: (n+1)/2 = 41/2 = 20.5, between the 20th and 21st positions
Median class: [5 – 7) inches (this would be hard to derive if it were between the 18th and 19th positions, i.e. if it crossed two classes)
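Finding the modal class and the median class can be sketched in Python using cumulative frequencies. The function name `modal_and_median_class` and the (lower, upper) tuple representation of a class interval are assumptions for this illustration. Returning `None` when the median position falls across two classes is also an assumption; the slides only remark that this case is hard to derive.

```python
def modal_and_median_class(classes):
    """classes: list of ((lower, upper), frequency) pairs in ascending order."""
    n = sum(f for _, f in classes)
    modal = max(classes, key=lambda c: c[1])[0]     # class with the largest Fi

    def class_at(position):
        """Class containing the given 1-based position."""
        cumulative = 0
        for bounds, f in classes:
            cumulative += f
            if position <= cumulative:
                return bounds
        raise ValueError("position is outside the data set")

    # (n + 1) / 2 lies on one position for odd n, between two for even n
    first, second = (n + 1) // 2, (n + 2) // 2
    lower_class, upper_class = class_at(first), class_at(second)
    median = lower_class if lower_class == upper_class else None
    return modal, median

heights = [((1, 3), 16), ((3, 5), 2), ((5, 7), 4),
           ((7, 9), 3), ((9, 11), 9), ((11, 13), 6)]

modal_class, median_class = modal_and_median_class(heights)
print(modal_class, median_class)
```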

Page 35: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Example: Central Tendency Measures for Grouped Data

Class interval      Frequency (Fi)    Midpoint (Mi)    (Fi)*(Mi)
[1 – 3) inches           16                2               32
[3 – 5) inches            2                4                8
[5 – 7) inches            4                6               24
[7 – 9) inches            3                8               24
[9 – 11) inches           9               10               90
[11 – 13) inches          6               12               72
Total                    40                               250

Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n = 250/40 = 6.25 inches
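The grouped mean formula (Σ Fi*Mi)/n can be sketched in Python as follows. The function name `grouped_mean` and the (lower, upper) tuple representation of a class interval are assumptions for this illustration; the midpoint of each class is taken as the average of its two limits.

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of Fi * Mi divided by n."""
    n = sum(f for _, f in classes)                       # total frequency
    weighted_total = sum(
        f * (lower + upper) / 2                          # Fi times midpoint Mi
        for (lower, upper), f in classes
    )
    return weighted_total / n

heights = [((1, 3), 16), ((3, 5), 2), ((5, 7), 4),
           ((7, 9), 3), ((9, 11), 9), ((11, 13), 6)]

print(grouped_mean(heights))
```

Because every observation in a class is represented by the class midpoint, the result is an estimate of the mean of the underlying ungrouped data.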

Page 36: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Exercise: Central Tendency Measures for Grouped Data

Class interval      Frequency (Fi)    Midpoint (Mi)    (Fi)*(Mi)
[1 – 2) inches            2
[2 – 3) inches            2
[3 – 4) inches            4
[4 – 5) inches            2
[5 – 6) inches            1

Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n =          inches

Page 37: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Response: Central Tendency Measures for Grouped Data

Class interval      Frequency (Fi)    Midpoint (Mi)    (Fi)*(Mi)
[1 – 2) inches            2               1.5              3
[2 – 3) inches            2               2.5              5
[3 – 4) inches            4               3.5             14
[4 – 5) inches            2               4.5              9
[5 – 6) inches            1               5.5              5.5
Total                    11                               36.5

Find the mean for the distribution:
Mean = (Σ Fi*Mi)/n = 36.5/11 = 3.318 inches

Page 38: 1 Descriptive Statistics Chapter 3 MSIS 111 Prof. Nick Dedeke

Excel Examples