Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Upload: eugenia-hensley

Post on 05-Jan-2016


Page 1: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Business Statistics

Spring 2005

Summarizing and Describing Numerical Data

Page 2: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Topics

•Measures of Central Tendency: Mean, Median, Mode, Midrange, Midhinge

•Quartile

•Measures of Variation: The Range, Interquartile Range, Variance and Standard Deviation, Coefficient of Variation

•Shape: Symmetric, Skewed, using Box-and-Whisker Plots

Page 3: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Numerical Data Properties

Central Tendency (Location)

Variation (Dispersion)

Shape

Page 4: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Measures of Central Tendency for Ungrouped Data

Raw Data

Page 5: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Summary Measures

Central Tendency

Mean

Median

Mode

Midrange

Quartile

Midhinge

Summary Measures

Variation

Variance

Standard Deviation

Coefficient of Variation

Range

Page 6: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Measures of Central Tendency

Central Tendency

Mean Median Mode

Midrange

Midhinge

Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Page 7: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Population Mean

For ungrouped data, the population mean is the sum of all the population values divided by the total number of population values:

$\mu = \frac{\sum X}{N}$

where $\mu$ stands for the population mean.

N is the total number of observations in the population.

X stands for a particular value.

$\sum$ indicates the operation of adding.

Page 8: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Population Mean Example

Parameter: a measurable characteristic of a population.

The Kane family owns four cars. The following is the mileage attained by each car: 56,000, 23,000, 42,000, and 73,000. Find the average miles covered by each car.

The mean is (56,000 + 23,000 + 42,000 + 73,000)/4 = 48,500
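The arithmetic in the Kane family example can be checked with a few lines of Python. This is only an illustrative sketch; the function name `population_mean` is an assumption and does not come from the slides.

```python
def population_mean(values):
    """Return the sum of all population values divided by N."""
    return sum(values) / len(values)

# Mileage of the four Kane family cars (treated as the whole population).
kane_mileage = [56_000, 23_000, 42_000, 73_000]
mu = population_mean(kane_mileage)
print(mu)  # 48500.0
```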


Page 9: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Sample Mean

For ungrouped data, the sample mean is the sum of all the sample values divided by the number of sample values:

$\bar{X} = \frac{\sum X}{n}$

where $\bar{X}$ stands for the sample mean

n is the total number of values in the sample

Page 10: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Return on Stock

Year     Stock X   Stock Y
1998     10%       17%
1997      8%       -2%
1996     12%       16%
1995      2%        1%
1994      8%        8%
Total    40%       40%

Average Return on Stock = 40 / 5 = 8%
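A short sketch confirming that both stocks have the same average return, using the five yearly returns listed on this slide. The variable and function names are illustrative assumptions.

```python
def sample_mean(values):
    """Sum of the sample values divided by the number of values."""
    return sum(values) / len(values)

# Annual returns in percent, 1998 down to 1994, as listed on the slide.
stock_x = [10, 8, 12, 2, 8]
stock_y = [17, -2, 16, 1, 8]

mean_x = sample_mean(stock_x)
mean_y = sample_mean(stock_y)
print(mean_x, mean_y)  # 8.0 8.0
```

The two stocks share the same mean even though their yearly returns are spread very differently, which is why a measure of variation is needed alongside the mean.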

Page 11: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

The Mean (Arithmetic Average)

•It is the Arithmetic Average of data values:

Sample Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

•The Most Common Measure of Central Tendency

•Affected by Extreme Values (Outliers)

[Figure: two number lines, one spanning 0 to 10 with Mean = 5 and one spanning 0 to 14 with Mean = 6]

Page 12: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Properties of the Arithmetic Mean

Every set of interval-level and ratio-level data has a mean.

All the values are included in computing the mean.

A set of data has a unique mean.

The mean is affected by unusually large or small data values.

The mean is relatively reliable.

The arithmetic mean is the only measure of central tendency where the sum of the deviations of each value from the mean is zero.


Page 13: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

EXAMPLE

Consider the set of values: 3, 8, and 4.

The mean is 5.

Illustrating the last property (the deviations from the mean sum to zero), (3-5) + (8-5) + (4-5) = -2 + 3 - 1 = 0. In other words,

$\sum (X - \bar{X}) = 0$
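The zero-sum property of the deviations can be verified numerically for the three values in this example. A minimal sketch; the variable names are assumptions.

```python
values = [3, 8, 4]
mean = sum(values) / len(values)           # 5.0

# Deviation of each value from the mean.
deviations = [x - mean for x in values]    # [-2.0, 3.0, -1.0]
print(sum(deviations))                     # 0.0
```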

Page 14: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

The Median

Median: The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest. There are as many values above the median as below it in the data array.

Note: For an even number of values, the median is the arithmetic average of the two middle values.

Page 15: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Position of Median in Sequence

Median Positioning Point $= \frac{n + 1}{2}$

Page 16: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

The Median

[Figure: two number lines, one spanning 0 to 10 with Median = 5 and one spanning 0 to 14 with Median = 5]

•Important Measure of Central Tendency

•In an ordered array, the median is the “middle” number.

•If n is odd, the median is the middle number.

•If n is even, the median is the average of the 2 middle numbers.

•Not Affected by Extreme Values
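The odd and even cases can be sketched in Python as follows. The data sets are hypothetical values chosen for illustration, not the data behind the slide's figure, and the function name `median` is an assumption.

```python
def median(values):
    """Middle value of the ordered data; average of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # 0-based index of position (n + 1) / 2 when n is odd
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([1, 3, 5, 7, 9]))    # 5  (n odd)
print(median([1, 3, 5, 7, 14]))   # 5  (extreme value does not move the median)
print(median([3, 8, 4, 10]))      # 6.0  (n even: average of 4 and 8)
```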

Page 17: Business Statistics Spring 2005 Summarizing and Describing Numerical Data


Properties of the Median

• There is a unique median for each data set.

• It is not affected by extremely large or small values and is therefore a valuable measure of central tendency when such values occur.

• It can be computed for ratio-level, interval-level, and ordinal-level data.

• It can be computed for an open-ended frequency distribution if the median does not lie in an open-ended class.

• No arithmetic properties

Page 18: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

The Mode

•A Measure of Central Tendency

•Value that Occurs Most Often

•Not Affected by Extreme Values

•There May Not be a Mode

•There May be Several Modes

•Used for Either Numerical or Categorical Data

[Figure: number line spanning 0 to 14, Mode = 9]

[Figure: number line spanning 0 to 6, No Mode]
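The three situations listed above (one mode, no mode, several modes) can be illustrated with a small helper. This is a sketch under an assumed convention: a data set in which every value occurs exactly once is reported as having no mode. The function name `modes` and the sample data are hypothetical.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s), or [] when no value repeats."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    if highest == 1:
        return []  # assumed convention: every value occurs once, so no mode
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 9, 9, 3, 9, 5]))        # [9]
print(modes([0, 1, 2, 3, 4, 5, 6]))     # []
print(modes([1, 1, 2, 2, 3]))           # [1, 2]
print(modes(["red", "blue", "red"]))    # ['red']  (categorical data)
```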

Page 19: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Midrange

•A Measure of Central Tendency

•Average of Smallest and Largest Observation:

Midrange $= \frac{x_{largest} + x_{smallest}}{2}$

•Affected by Extreme Values

[Figure: two number lines, each spanning 0 to 10, Midrange = 5 in both]
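A minimal sketch of the midrange formula. The two data sets are hypothetical and are chosen only to show how a single extreme value shifts the result.

```python
def midrange(values):
    """Average of the smallest and largest observations."""
    return (max(values) + min(values)) / 2

print(midrange([1, 3, 5, 7, 9]))    # 5.0
print(midrange([1, 3, 5, 7, 14]))   # 7.5  (pulled upward by the extreme value 14)
```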

Page 20: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Quartiles

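• Not a Measure of Central Tendency

• Split Ordered Data into 4 Quarters

[Figure: ordered data split into four 25% segments by Q1, Q2, Q3]

• Position of i-th Quartile: position of $Q_i = \frac{i(n+1)}{4}$

Data in Ordered Array: 11 12 13 16 16 17 18 21 22

Position of Q1 $= \frac{1 \cdot (9 + 1)}{4} = 2.50$, so Q1 lies midway between the 2nd and 3rd values: Q1 = (12 + 13) / 2 = 12.5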
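The position rule can be turned into a small function. This sketch assumes linear interpolation between neighbouring values when the position is fractional; that matches the slide whenever the position ends in .5, but the textbook's own rounding rules may differ for other fractions. The function name `quartile` is an assumption.

```python
import math

def quartile(values, i):
    """Value at 1-based position i(n+1)/4 of the ordered data (interpolated)."""
    ordered = sorted(values)
    n = len(ordered)
    position = i * (n + 1) / 4
    lower = math.floor(position)
    fraction = position - lower
    if lower < 1:            # position falls before the first value
        return ordered[0]
    if lower >= n:           # position falls at or after the last value
        return ordered[-1]
    below, above = ordered[lower - 1], ordered[lower]
    return below + fraction * (above - below)

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(quartile(data, 1))  # 12.5
print(quartile(data, 2))  # 16.0
print(quartile(data, 3))  # 19.5
```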

Page 21: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Quartiles

•See text page 107 for “rounding rules” for position of the i-th quartile

• Position (not value) of i-th Quartile: position of $Q_i = \frac{i(n+1)}{4}$

[Figure: ordered data split into four 25% segments by Q1, Q2, Q3]

Page 22: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Midhinge

• A Measure of Central Tendency

• The Midpoint of the 1st and 3rd Quartiles

• Used to Overcome Extreme Values

Midhinge $= \frac{Q_1 + Q_3}{2}$

Data in Ordered Array: 11 12 13 16 16 17 18 21 22

Midhinge $= \frac{Q_1 + Q_3}{2} = \frac{12.5 + 19.5}{2} = 16$
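The sketch below compares the midhinge with the midrange on the slide's data and on a hypothetical variant in which the largest value, 22, is replaced by 220. It relies on Python's `statistics.quantiles` with `method="exclusive"`, which places quartile i at position i(n+1)/4 and interpolates between neighbours; the helper names are assumptions.

```python
import statistics

def midhinge(values):
    """Average of the first and third quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return (q1 + q3) / 2

def midrange(values):
    """Average of the smallest and largest observations."""
    return (min(values) + max(values)) / 2

slide_data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
with_outlier = [11, 12, 13, 16, 16, 17, 18, 21, 220]  # hypothetical: 22 replaced by 220

print(midhinge(slide_data), midhinge(with_outlier))   # 16.0 16.0
print(midrange(slide_data), midrange(with_outlier))   # 16.5 115.5
```

The midhinge is unchanged by the extreme value, while the midrange moves from 16.5 to 115.5.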

Page 23: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Summary Measures

Central Tendency

Mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Median

Mode

Midrange

Quartile

Midhinge

Summary Measures

Variation

Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Standard Deviation

Coefficient of Variation

Range

Page 24: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Measures of Variation

Variation

Range

Interquartile Range

Variance: Population Variance, Sample Variance

Standard Deviation: Population Standard Deviation, Sample Standard Deviation

Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

Page 25: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

The Range

• Measure of Variation

• Difference Between Largest & Smallest Observations:

Range $= x_{largest} - x_{smallest}$

• Ignores How Data Are Distributed:

[Figure: two differently distributed data sets on a number line from 7 to 12, each with Range = 12 - 7 = 5]

Page 26: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Return on Stock

Year     Stock X   Stock Y
1998     10%       17%
1997      8%       -2%
1996     12%       16%
1995      2%        1%
1994      8%        8%

Range on Stock X = 12 - 2 = 10%

Range on Stock Y = 17 - (-2) = 19%
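The two ranges can be reproduced directly from the listed returns. A small sketch; the function name `value_range` is an assumption (chosen to avoid shadowing Python's built-in `range`).

```python
def value_range(values):
    """Largest observation minus smallest observation."""
    return max(values) - min(values)

# Annual returns in percent, 1998 down to 1994, as listed on the slide.
stock_x = [10, 8, 12, 2, 8]
stock_y = [17, -2, 16, 1, 8]

print(value_range(stock_x))  # 10
print(value_range(stock_y))  # 19
```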

Page 27: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Interquartile Range

• Measure of Variation

• Also Known as Midspread: Spread in the Middle 50%

• Difference Between Third & First Quartiles: Interquartile Range $= Q_3 - Q_1$

Data in Ordered Array: 11 12 13 16 16 17 17 18 21

$Q_3 - Q_1$ = 17.5 - 12.5 = 5

Page 28: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Interquartile Range

• IQR = 75th percentile - 25th percentile

•The IQR is useful for checking for outliers

•Not Affected by Extreme Values

Data in Ordered Array: 11 12 13 16 16 17 17 18 21

$Q_3 - Q_1$ = 17.5 - 12.5 = 5
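The following sketch computes the IQR for the slide's data and then uses it to screen for outliers. The 1.5 × IQR fences are a widely used convention that is assumed here; the slides do not state which rule they have in mind. `statistics.quantiles` with `method="exclusive"` uses the i(n+1)/4 position rule with interpolation.

```python
import statistics

data = [11, 12, 13, 16, 16, 17, 17, 18, 21]
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
iqr = q3 - q1

# Assumed convention (not from the slides): flag values beyond 1.5 * IQR from the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(q1, q3, iqr)                          # 12.5 17.5 5.0
print(lower_fence, upper_fence, outliers)   # 5.0 25.0 []
```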

Page 29: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Variance & Standard Deviation

Measures of Dispersion

Most Common Measures

Consider How Data Are Distributed

Show Variation About Mean ($\bar{X}$ or $\mu$)

[Figure: data on a number line from 4 to 12, $\bar{X}$ = 8.3]

Page 30: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Variance

•Important Measure of Variation

•Shows Variation About the Mean:

•For the Population: $\sigma^2 = \frac{\sum (X_i - \mu)^2}{N}$

•For the Sample: $s^2 = \frac{\sum (X_i - \bar{X})^2}{n - 1}$

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

Page 31: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Population Variance

The population variance for ungrouped data is the arithmetic mean of the squared deviations from the population mean.

$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$

Page 32: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Population Variance EXAMPLE

The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population variance?

$\mu = \sum X / N = 96 / 4 = 24$

$\sigma^2 = \sum (X - \mu)^2 / N = 944 / 4 = 236$

X      $\mu$    $(X - \mu)$   $(X - \mu)^2$
2      24       -22           484
18     24       -6            36
34     24       10            100
42     24       18            324
Total                         944
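The Dunn family table can be rebuilt step by step in Python. This is a sketch of the definitional formula only; the variable names are assumptions.

```python
ages = [2, 18, 34, 42]
N = len(ages)

mu = sum(ages) / N                                    # 24.0
squared_deviations = [(x - mu) ** 2 for x in ages]    # [484.0, 36.0, 100.0, 324.0]
sigma_squared = sum(squared_deviations) / N           # 944 / 4

print(mu, sum(squared_deviations), sigma_squared)     # 24.0 944.0 236.0
```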

Page 33: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Population Standard Deviation

$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

Page 34: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Population Standard Deviation EXAMPLE

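The ages of the Dunn family are 2, 18, 34, and 42 years. What is the population standard deviation?

$\mu = \sum X / N = 96 / 4 = 24$

$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{944}{4} = 236$

$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{236} \approx 15.36$

X      $\mu$    $(X - \mu)$   $(X - \mu)^2$
2      24       -22           484
18     24       -6            36
34     24       10            100
42     24       18            324
Total                         944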
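Taking the square root of the population variance gives the population standard deviation. The sketch below does this by hand and cross-checks the result against `statistics.pstdev` from the Python standard library; the variable names are assumptions.

```python
import math
import statistics

ages = [2, 18, 34, 42]
mu = sum(ages) / len(ages)
sigma = math.sqrt(sum((x - mu) ** 2 for x in ages) / len(ages))

print(round(sigma, 2))                                  # 15.36
print(math.isclose(sigma, statistics.pstdev(ages)))     # True
```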

Page 35: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Standard Deviation

•Most Important Measure of Variation

•Shows Variation About the Mean:

•For the Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}}$

•For the Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$

For the Population: use N in the denominator.

For the Sample: use n - 1 in the denominator.

Page 36: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Sample Variance and Standard Deviation

The sample variance estimates the population variance.

NOTE: important computational formula:

$S^2 = \frac{\sum (X - \bar{X})^2}{n - 1} = \frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}$

The sample standard deviation is $s = \sqrt{s^2}$.

Page 37: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Example of Standard Deviation

Amount (X)   $\bar{X}$   Deviation from Mean $(X - \bar{X})$   $(X - \bar{X})^2$
600          435         600 - 435 = 165                       27,225
350          435         350 - 435 = -85                       7,225
275          435         275 - 435 = -160                      25,600
430          435         430 - 435 = -5                        25
520          435         520 - 435 = 85                        7,225
Total                    0                                     67,300

$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{67{,}300}{4}} = \sqrt{16{,}825} = 129.71$
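The definitional calculation for the five amounts can be sketched as below, with `statistics.stdev` (which also divides by n - 1) used as a cross-check. The variable names are assumptions.

```python
import math
import statistics

amounts = [600, 350, 275, 430, 520]
n = len(amounts)
mean = sum(amounts) / n                               # 435.0

# Sum of squared deviations from the sample mean.
sum_sq_dev = sum((x - mean) ** 2 for x in amounts)    # 67300.0
s = math.sqrt(sum_sq_dev / (n - 1))                   # sqrt(16825)

print(mean, sum_sq_dev, round(s, 2))                  # 435.0 67300.0 129.71
print(math.isclose(s, statistics.stdev(amounts)))     # True
```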

Page 38: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Example of Standard Deviation (Computational Version)

Amount (X)   $\bar{X}$   $(X - \bar{X})$   $(X - \bar{X})^2$   $X^2$
600          435         165               27,225              360,000
350          435         -85               7,225               122,500
275          435         -160              25,600              75,625
430          435         -5                25                  184,900
520          435         85                7,225               270,400
Total 2175                                 67,300              1,013,425

$s = \sqrt{\frac{\sum X^2 - \frac{(\sum X)^2}{n}}{n - 1}} = \sqrt{\frac{1{,}013{,}425 - \frac{2175^2}{5}}{5 - 1}} = 129.71$
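The computational version needs only the two column totals, the sum of X and the sum of X squared. A sketch showing that it reproduces the sample variance of 16,825; the variable names are assumptions.

```python
import math

amounts = [600, 350, 275, 430, 520]
n = len(amounts)

sum_x = sum(amounts)                      # 2175
sum_x_sq = sum(x ** 2 for x in amounts)   # 1013425

# Computational formula: no deviations from the mean are needed.
s_squared = (sum_x_sq - sum_x ** 2 / n) / (n - 1)
s = math.sqrt(s_squared)

print(sum_x, sum_x_sq, s_squared, round(s, 2))  # 2175 1013425 16825.0 129.71
```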

Page 39: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Sample Standard Deviation

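$s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}}$

NOTE: For the Sample: use n - 1 in the denominator.

Data $X_i$: 10 12 14 15 17 18 18 24

n = 8 Mean = 16

$s = \sqrt{\frac{(10-16)^2 + (12-16)^2 + \cdots + (18-16)^2 + (24-16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$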

Page 40: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Interpretation and Uses of the Standard Deviation

Chebyshev’s theorem: For any set of observations, the proportion of the values that lie within k standard deviations of the mean is at least $1 - 1/k^2$, where k is any constant greater than 1.

Multiply by 100% to get the minimum percentage of values within k standard deviations of the mean.
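Chebyshev's bound is easy to tabulate for a few values of k. A minimal sketch; the function name `chebyshev_lower_bound` is an assumption.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

for k in (1.5, 2, 3):
    print(k, f"{chebyshev_lower_bound(k) * 100:.1f}%")
# 1.5 55.6%
# 2 75.0%
# 3 88.9%
```

The bound holds for any distribution, which is why it is weaker than the percentages quoted for bell-shaped data.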

Page 41: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Interpretation and Uses of the Standard Deviation

Empirical Rule: For any symmetrical, bell-shaped distribution, approximately 68% of the observations will lie within one standard deviation of the mean ($\mu \pm 1\sigma$); approximately 95% of the observations will lie within two standard deviations of the mean ($\mu \pm 2\sigma$); approximately 99.7% will lie within three standard deviations of the mean ($\mu \pm 3\sigma$).
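The three percentages can be checked against the normal distribution, the standard model of a symmetrical, bell-shaped curve. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); treating the normal distribution as the reference shape is an assumption made for illustration.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def proportion_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, f"{proportion_within(k) * 100:.1f}%")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```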

Page 42: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Comparing Standard Deviations

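Data $X_i$: 10 12 14 15 17 18 18 24

N = 8 Mean = 16

Sample: $s = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{130}{7}} = 4.3095$

Population: $\sigma = \sqrt{\frac{\sum (X_i - \mu)^2}{N}} = \sqrt{\frac{130}{8}} = 4.0311$

Value for the Standard Deviation is larger for data considered as a Sample.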
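Both versions can be computed directly from the eight listed values with the standard library. The squared deviations from the mean of 16 sum to 130, so a direct computation gives s ≈ 4.3095 (dividing by n - 1 = 7) and σ ≈ 4.0311 (dividing by N = 8). The variable names below are assumptions.

```python
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
mean = statistics.mean(data)                      # 16
sum_sq_dev = sum((x - mean) ** 2 for x in data)   # 130

s = statistics.stdev(data)        # sample: divides by n - 1
sigma = statistics.pstdev(data)   # population: divides by N

print(mean, sum_sq_dev)                # 16 130
print(round(s, 4), round(sigma, 4))    # 4.3095 4.0311
```

Because n - 1 is smaller than N, the sample value is always the larger of the two for the same data.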

Page 43: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Comparing Standard Deviations

[Figure: three data sets plotted on a common number line from 11 to 21]

Data A: Mean = 15.5, s = 3.338

Data B: Mean = 15.5, s = 0.9258

Data C: Mean = 15.5, s = 4.57

Page 44: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Coefficient of Variation

•Measure of Relative Variation

•Always a %

•Shows Variation Relative to Mean

•Used to Compare 2 or More Groups

•Formula (for Sample): $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

Page 45: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Comparing Coefficient of Variation

Stock A: Average Price last year = $50

Standard Deviation = $5

Stock B: Average Price last year = $100

Standard Deviation = $5

Coefficient of Variation: $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$

Stock A: CV = (5 / 50) · 100% = 10%

Stock B: CV = (5 / 100) · 100% = 5%
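The comparison of the two stocks can be reproduced with a one-line function. A minimal sketch; the function name `coefficient_of_variation` is an assumption.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percentage of the mean."""
    return 100 * std_dev / mean

cv_a = coefficient_of_variation(std_dev=5, mean=50)    # Stock A
cv_b = coefficient_of_variation(std_dev=5, mean=100)   # Stock B
print(cv_a, cv_b)  # 10.0 5.0
```

Both stocks have the same standard deviation in dollars, but relative to its average price Stock A varies twice as much as Stock B.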

Page 46: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Shape

• Describes How Data Are Distributed

• Measures of Shape: Symmetric or skewed

Left-Skewed: Mean < Median < Mode

Symmetric: Mean = Median = Mode

Right-Skewed: Mode < Median < Mean

Page 47: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Box-and-Whisker Plot

Graphical Display of Data Using 5-Number Summary: $X_{smallest}$, $Q_1$, Median, $Q_3$, $X_{largest}$

[Figure: box-and-whisker plot on a scale from 4 to 12]
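The five numbers behind a box-and-whisker plot can be computed as sketched below. The example reuses the ordered array from the quartile slides; `statistics.quantiles` with `method="exclusive"` applies the i(n+1)/4 position rule with interpolation, and the function name `five_number_summary` is an assumption.

```python
import statistics

def five_number_summary(values):
    """Return (smallest, Q1, median, Q3, largest)."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="exclusive")
    return (min(values), q1, median, q3, max(values))

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
print(five_number_summary(data))  # (11, 12.5, 16.0, 19.5, 22)
```

The box spans Q1 to Q3 with a line at the median, and the whiskers extend to the smallest and largest values.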

Page 48: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Distribution Shape & Box-and-Whisker Plots

[Figure: three box-and-whisker plots, each marking Q1, Median, and Q3, for Left-Skewed, Symmetric, and Right-Skewed distributions]

Page 49: Business Statistics Spring 2005 Summarizing and Describing Numerical Data

Summary

• Discussed Measures of Central Tendency: Mean, Median, Mode, Midrange, Midhinge

• Quartiles

• Addressed Measures of Variation: The Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation

• Determined Shape of Distributions: Symmetric, Skewed, Box-and-Whisker Plot

[Figure: distribution shapes. Left-Skewed: Mean < Median < Mode; Symmetric: Mean = Median = Mode; Right-Skewed: Mode < Median < Mean]