chapter 1 describing data: graphical and numerical probability (6mtcoae205)


Upload: cassandra-cole

Post on 26-Dec-2015


Page 1: Chapter 1 Describing Data: Graphical and Numerical PROBABILITY (6MTCOAE205)

Chapter 1

Describing Data: Graphical and Numerical

PROBABILITY (6MTCOAE205)

Page 2

Dealing with Uncertainty

Everyday decisions are based on incomplete information

Consider:
• Will the job market be strong when I graduate?
• Will the price of Yahoo stock be higher in six months than it is now?
• Will interest rates remain low for the rest of the year if the federal budget deficit is as high as predicted?

Assist. Prof. Dr. İmran Göker Ch. 1-2

Page 3

Dealing with Uncertainty (continued)

Numbers and data are used to assist decision making

Statistics is a tool to help process, summarize, analyze, and interpret data

Page 4

Key Definitions

A population is the collection of all items of interest or under investigation
• N represents the population size

A sample is an observed subset of the population
• n represents the sample size

A parameter is a specific characteristic of a population

A statistic is a specific characteristic of a sample

Page 5

Population vs. Sample

[Figure: a population of items labeled a through z, and a sample drawn from it containing items b, c, g, i, n, o, r, u, y]

Values calculated using population data are called parameters

Values computed from sample data are called statistics

Page 6

Examples of Populations

• Names of all registered voters in the Turkish Republic
• Incomes of all families living in Ankara
• Osteoporosis incidence in Turkish women older than 45
• Grade point averages of all the students in our university

Page 7

Random Sampling

Simple random sampling is a procedure in which
• each member of the population is chosen strictly by chance,
• each member of the population is equally likely to be chosen, and
• every possible sample of n objects is equally likely to be chosen

The resulting sample is called a random sample
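Here is a minimal sketch of simple random sampling using Python's standard `random` module. The helper name `simple_random_sample` and the letter population are illustrative, not part of the slides. `random.Random.sample` draws n distinct members without replacement, so every subset of size n is equally likely. The fixed seed only makes the example reproducible.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every subset of size n is equally likely."""
    rng = random.Random(seed)  # independent generator; a seed makes runs reproducible
    return rng.sample(population, n)  # sampling without replacement


# Illustrative population: the items a..z from the Population vs. Sample figure
population = list("abcdefghijklmnopqrstuvwxyz")
sample = simple_random_sample(population, 9, seed=1)
print(sample)
```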

Page 8

Descriptive and Inferential Statistics

Two branches of statistics:
• Descriptive statistics: graphical and numerical procedures to summarize and process data
• Inferential statistics: using data to make predictions, forecasts, and estimates to assist decision making

Page 9

Descriptive Statistics

• Collect data, e.g., survey
• Present data, e.g., tables and graphs
• Summarize data, e.g., sample mean $\bar{X} = \frac{\sum X_i}{n}$

Page 10

Inferential Statistics

• Estimation, e.g., estimate the population mean weight using the sample mean weight
• Hypothesis testing, e.g., test the claim that the population mean weight is 140 pounds

Inference is the process of drawing conclusions or making decisions about a population based on sample results

Page 11

Types of Data

• Categorical (defined categories or groups). Examples: marital status; are you registered to vote?; eye color
• Numerical, discrete (counted items). Examples: number of children; defects per hour
• Numerical, continuous (measured characteristics). Examples: weight; voltage

Page 12

Measurement Levels

Qualitative data:
• Nominal data: categories (no ordering or direction)
• Ordinal data: ordered categories (rankings, order, or scaling)

Quantitative data:
• Interval data: differences between measurements but no true zero
• Ratio data: differences between measurements, true zero exists

Page 13

Graphical Presentation of Data

Data in raw form are usually not easy to use for decision making

Some type of organization is needed: a table or a graph

The type of graph to use depends on the variable being summarized

Page 14

Graphical Presentation of Data (continued)

Techniques reviewed in this chapter:

Categorical variables:
• Frequency distribution
• Bar chart
• Pie chart
• Pareto diagram

Numerical variables:
• Line chart
• Frequency distribution
• Histogram and ogive
• Stem-and-leaf display
• Scatter plot

Page 15

Tables and Graphs for Categorical Variables

Categorical data:
• Tabulating data: frequency distribution table
• Graphing data: bar chart, pie chart, Pareto diagram

Page 16

The Frequency Distribution Table

Summarize data by category

Example: Hospital Patients by Unit (variables are categorical)

Hospital Unit     Number of Patients
Cardiac Care      1,052
Emergency         2,245
Intensive Care    340
Maternity         552
Surgery           4,630

Page 17

Bar and Pie Charts

Bar charts and pie charts are often used for qualitative (category) data

The height of a bar or the size of a pie slice shows the frequency or percentage for each category

Page 18

Bar Chart Example

[Figure: Hospital Patients by Unit. Bar chart of the number of patients per year (vertical axis, 0 to 5,000) for Cardiac Care, Emergency, Intensive Care, Maternity, and Surgery]

Hospital Unit     Number of Patients
Cardiac Care      1,052
Emergency         2,245
Intensive Care    340
Maternity         552
Surgery           4,630

Page 19

Pie Chart Example

[Figure: Hospital Patients by Unit. Pie chart: Surgery 53%, Emergency 25%, Cardiac Care 12%, Maternity 6%, Intensive Care 4%]

(Percentages are rounded to the nearest percent)

Hospital Unit     Number of Patients     % of Total
Cardiac Care      1,052                  11.93
Emergency         2,245                  25.46
Intensive Care    340                    3.86
Maternity         552                    6.26
Surgery           4,630                  52.50
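The "% of Total" column can be checked with a few lines of Python. This sketch takes the five unit counts from the table above. Each percentage is the count divided by the grand total (8,819), rounded to two decimals as in the table. The dictionary name is illustrative.

```python
# Patients per year by hospital unit (counts from the table above)
patients = {
    "Cardiac Care": 1052,
    "Emergency": 2245,
    "Intensive Care": 340,
    "Maternity": 552,
    "Surgery": 4630,
}

total = sum(patients.values())
# Percent of total, rounded to two decimals as in the table
percent = {unit: round(100 * count / total, 2) for unit, count in patients.items()}

for unit, pct in percent.items():
    print(f"{unit:<15}{patients[unit]:>7,}{pct:>8.2f}")
```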

Page 20

Pareto Diagram

• Used to portray categorical data
• A bar chart, where categories are shown in descending order of frequency
• A cumulative polygon is often shown in the same graph
• Used to separate the “vital few” from the “trivial many”

Page 21

Pareto Diagram Example

Example: 400 defective items are examined for cause of defect:

Source of Manufacturing Error     Number of Defects
Bad Weld                          34
Poor Alignment                    223
Missing Part                      25
Paint Flaw                        78
Electrical Short                  19
Cracked Case                      21
Total                             400

Page 22

Pareto Diagram Example (continued)

Step 1: Sort by defect cause, in descending order
Step 2: Determine the % in each category

Source of Manufacturing Error     Number of Defects     % of Total Defects
Poor Alignment                    223                   55.75
Paint Flaw                        78                    19.50
Bad Weld                          34                    8.50
Missing Part                      25                    6.25
Cracked Case                      21                    5.25
Electrical Short                  19                    4.75
Total                             400                   100%
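Steps 1 and 2 can be sketched in Python. The code also computes the cumulative percentage that the Pareto line graph plots. The function name `pareto_table` is hypothetical. Percentages are computed as count × 100 / total from the 400 defects above.

```python
defects = {
    "Bad Weld": 34,
    "Poor Alignment": 223,
    "Missing Part": 25,
    "Paint Flaw": 78,
    "Electrical Short": 19,
    "Cracked Case": 21,
}


def pareto_table(counts):
    """Return (category, count, % of total, cumulative %) rows, largest count first."""
    total = sum(counts.values())
    rows, running = [], 0
    # Step 1: sort categories in descending order of frequency
    for cause, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        running += n
        # Step 2: percent in each category, plus the cumulative percent for the line graph
        rows.append((cause, n, 100 * n / total, 100 * running / total))
    return rows


rows = pareto_table(defects)
for cause, n, pct, cum in rows:
    print(f"{cause:<17}{n:>4}{pct:>7.2f}{cum:>8.2f}")
```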

Page 23

Pareto Diagram Example (continued)

Step 3: Show results graphically

[Figure: Pareto Diagram: Cause of Manufacturing Defect. Bar graph of the % of defects in each category (left axis, 0% to 60%) in the order Poor Alignment, Paint Flaw, Bad Weld, Missing Part, Cracked Case, Electrical Short, with a line graph of the cumulative % (right axis, 0% to 100%)]

Page 24

Graphs for Time-Series Data

A line chart (time-series plot) is used to show the values of a variable over time
• Time is measured on the horizontal axis
• The variable of interest is measured on the vertical axis

Page 25

Line Chart Example

[Figure: Magazine Subscriptions by Year. Line chart of thousands of subscribers (vertical axis, 0 to 350) for the years 1990 through 2006]

Page 26

Graphs to Describe Numerical Variables

Numerical data:
• Frequency distributions and cumulative distributions
• Histogram, ogive
• Stem-and-leaf display

Page 27

Frequency Distributions

What is a frequency distribution?

A frequency distribution is a list or a table containing class groupings (categories or ranges within which the data fall) and the corresponding frequencies with which data fall within each class or category

Page 28

Why Use Frequency Distributions?

A frequency distribution is a way to summarize data

The distribution condenses the raw data into a more useful form and allows for a quick visual interpretation of the data

Page 29

Class Intervals and Class Boundaries

• Each class grouping has the same width
• Determine the width of each interval by
  $w = \text{interval width} = \frac{\text{largest number} - \text{smallest number}}{\text{number of desired intervals}}$
• Use at least 5 but no more than 15 to 20 intervals
• Intervals never overlap
• Round up the interval width to get desirable interval endpoints

Page 30

Frequency Distribution Example

Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature:

24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27

Page 31

Frequency Distribution Example (continued)

• Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
• Compute interval width: 10 (46/5 = 9.2, then round up)
• Determine interval boundaries: 10 but less than 20, 20 but less than 30, . . . , 50 but less than 60
• Count observations and assign to classes

Page 32

Frequency Distribution Example (continued)

Interval               Frequency   Relative Frequency   Percentage
10 but less than 20    3           .15                  15
20 but less than 30    6           .30                  30
30 but less than 40    5           .25                  25
40 but less than 50    4           .20                  20
50 but less than 60    2           .10                  10
Total                  20          1.00                 100

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
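The steps of this example can be sketched in Python. The sketch makes three assumptions not stated in the slides. The width is rounded up to the next whole number with `math.ceil`, classes are half-open ("lo but less than hi"), and the starting boundary (10 here) is chosen by hand at or below the smallest value. The function name `frequency_distribution` is hypothetical.

```python
import math

temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]


def frequency_distribution(data, num_classes, start):
    """Equal-width classes 'lo but less than lo + width', starting at `start`."""
    data_range = max(data) - min(data)              # 58 - 12 = 46
    width = math.ceil(data_range / num_classes)     # 46 / 5 = 9.2, rounded up to 10
    if start > min(data) or start + num_classes * width <= max(data):
        raise ValueError("classes do not cover the data; choose another start")
    n = len(data)
    table = []
    for k in range(num_classes):
        lo = start + k * width
        hi = lo + width
        freq = sum(lo <= x < hi for x in data)      # lower endpoint in, upper endpoint out
        table.append((lo, hi, freq, freq / n, 100 * freq / n))
    return width, table


width, table = frequency_distribution(temps, num_classes=5, start=10)
for lo, hi, freq, rel, pct in table:
    print(f"{lo} but less than {hi}: {freq}  {rel:.2f}  {pct:.0f}")
```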

Page 33

Histogram

A graph of the data in a frequency distribution is called a histogram
• The interval endpoints are shown on the horizontal axis
• The vertical axis is either frequency, relative frequency, or percentage
• Bars of the appropriate heights are used to represent the number of observations within each class

Page 34

Histogram Example

[Figure: Histogram: Daily High Temperature. Frequency (vertical axis, 0 to 7) against temperature in degrees (horizontal axis, 0 to 70), with no gaps between bars]

Interval               Frequency
10 but less than 20    3
20 but less than 30    6
30 but less than 40    5
40 but less than 50    4
50 but less than 60    2

Page 35

Histograms in Excel

1. Select the Data tab
2. Click on Data Analysis

Page 36

Histograms in Excel (continued)

3. Choose Histogram
4. Input the data range and bin range (the bin range is a cell range containing the upper interval endpoints for each class grouping)
   Select Chart Output and click “OK”

Page 37

Questions for Grouping Data into Intervals

1. How wide should each interval be? (How many classes should be used?)
2. How should the endpoints of the intervals be determined?

• Often answered by trial and error, subject to user judgment
• The goal is to create a distribution that is neither too "jagged" nor too "blocky"
• The goal is to appropriately show the pattern of variation in the data

Page 38

How Many Class Intervals?

• Many (narrow class intervals): may yield a very jagged distribution with gaps from empty classes; can give a poor indication of how frequency varies across classes
• Few (wide class intervals): may compress variation too much and yield a blocky distribution; can obscure important patterns of variation

[Figure: two histograms of the temperature data. One uses wide classes with upper endpoints 0, 30, 60, and More (frequency axis 0 to 12); the other uses narrow classes with upper endpoints 4, 8, 12, ..., 60, and More (frequency axis 0 to 4)]

(X axis labels are upper class endpoints)

Page 39

The Cumulative Frequency Distribution

Class                  Frequency   Percentage   Cumulative Frequency   Cumulative Percentage
10 but less than 20    3           15           3                      15
20 but less than 30    6           30           9                      45
30 but less than 40    5           25           14                     70
40 but less than 50    4           20           18                     90
50 but less than 60    2           10           20                     100
Total                  20          100

Data in ordered array:

12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58

Page 40

The Ogive: Graphing Cumulative Frequencies

[Figure: Ogive: Daily High Temperature. Cumulative percentage (vertical axis, 0 to 100) plotted against upper interval endpoints (horizontal axis, 10 to 60)]

Interval               Upper Interval Endpoint   Cumulative Percentage
Less than 10           10                        0
10 but less than 20    20                        15
20 but less than 30    30                        45
30 but less than 40    40                        70
40 but less than 50    50                        90
50 but less than 60    60                        100
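The cumulative columns and the ogive points are running totals of the class frequencies. A short Python sketch, starting from the frequencies in the table (3, 6, 5, 4, 2), shows this. The ogive starts at 0% at the first lower boundary (10) and reaches 100% at the last upper endpoint (60). Variable names are illustrative.

```python
from itertools import accumulate

# Classes "10 but less than 20", ..., "50 but less than 60"
upper_endpoints = [20, 30, 40, 50, 60]
frequencies = [3, 6, 5, 4, 2]

n = sum(frequencies)                                # 20 observations
cumulative_freq = list(accumulate(frequencies))     # running totals of the frequencies
cumulative_pct = [100 * c / n for c in cumulative_freq]

# Ogive points: (upper interval endpoint, cumulative %), starting from 0% at 10
ogive = [(10, 0.0)] + list(zip(upper_endpoints, cumulative_pct))
for endpoint, pct in ogive:
    print(f"less than {endpoint}: {pct:.0f}%")
```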

Page 41

Stem-and-Leaf Diagram

A simple way to see distribution details in a data set

METHOD: Separate the sorted data series into leading digits (the stem) and the trailing digits (the leaves)

Page 42

Example

Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Here, use the 10’s digit for the stem unit:
• 21 is shown as stem 2, leaf 1
• 38 is shown as stem 3, leaf 8

Page 43

Example (continued)

Data in ordered array: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41

Completed stem-and-leaf diagram:

Stem   Leaves
2      1 4 4 6 7 7
3      0 2 8
4      1

Page 44

Using Other Stem Units

Using the 100’s digit as the stem:
• Round off the 10’s digit to form the leaves

Stem | Leaf:
• 613 would become 6 | 1
• 776 would become 7 | 8
• . . .
• 1224 becomes 12 | 2

Page 45

Using Other Stem Units (continued)

Using the 100’s digit as the stem:

Data: 613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891, 894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224

The completed stem-and-leaf display:

Stem   Leaves
6      1 3 6
7      2 2 5 8
8      3 4 6 6 9 9
9      1 3 3 6 8
10     3 5 6
11     4 7
12     2
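Both stem-and-leaf displays above can be produced by one small Python sketch. It assumes non-negative integer data and rounds halves up when the 10’s digit is rounded off (so 955 becomes 9 | 6). The function names `stem_and_leaf` and `show` are hypothetical. `leaf_unit=1` gives the 10’s-digit stems of the first example, and `leaf_unit=10` gives the 100’s-digit stems.

```python
from collections import defaultdict


def stem_and_leaf(data, leaf_unit=1):
    """Map each stem to its sorted leaves.

    leaf_unit=1:  stem = 10's digit, leaf = 1's digit           (21 -> 2 | 1)
    leaf_unit=10: stem = 100's digit, leaf = rounded 10's digit  (613 -> 6 | 1)
    Assumes non-negative integer data.
    """
    display = defaultdict(list)
    for x in sorted(data):
        # Round to the leaf unit, halves rounded up: 658 -> 66 tens, 955 -> 96 tens
        scaled = (x + leaf_unit // 2) // leaf_unit
        stem, leaf = divmod(scaled, 10)
        display[stem].append(leaf)
    return dict(display)


def show(display):
    """Print every stem from smallest to largest, including stems with no leaves."""
    for stem in range(min(display), max(display) + 1):
        leaves = " ".join(str(leaf) for leaf in display.get(stem, []))
        print(f"{stem:>3} | {leaves}")


small = [21, 24, 24, 26, 27, 27, 30, 32, 38, 41]
large = [613, 632, 658, 717, 722, 750, 776, 827, 841, 859, 863, 891,
         894, 906, 928, 933, 955, 982, 1034, 1047, 1056, 1140, 1169, 1224]
show(stem_and_leaf(small))
show(stem_and_leaf(large, leaf_unit=10))
```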

Page 46

Relationships Between Variables

Graphs illustrated so far have involved only a single variable

When two variables exist, other techniques are used:
• Categorical (qualitative) variables: cross tables
• Numerical (quantitative) variables: scatter plots

Page 47

Scatter Diagrams

Scatter diagrams are used for paired observations taken from two numerical variables

The scatter diagram: one variable is measured on the vertical axis and the other variable is measured on the horizontal axis

Page 48

Scatter Diagram Example

[Figure: Cost per Day vs. Production Volume. Scatter plot of cost per day (vertical axis, 0 to 250) against volume per day (horizontal axis, 0 to 70)]

Volume per Day   Cost per Day
23               125
26               140
29               146
33               160
38               167
42               170
50               188
55               195
60               200

Page 49

Scatter Diagrams in Excel

1. Select the Insert tab
2. Select the Scatter type from the Charts section
3. When prompted, enter the data range, desired legend, and desired destination to complete the scatter diagram

Page 50

Cross Tables

Cross tables (or contingency tables) list the number of observations for every combination of values for two categorical or ordinal variables

If there are r categories for the first variable (rows) and c categories for the second variable (columns), the table is called an r x c cross table

Page 51

Cross Table Example

4 x 3 Cross Table for Investment Choices by Investor (values in $1000’s)

Investment Category   Investor A   Investor B   Investor C   Total
Stocks                46.5         55           27.5         129
Bonds                 32.0         44           19.0         95
CD                    15.5         20           13.5         49
Savings               16.0         28           7.0          51
Total                 110.0        147          67.0         324
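The row and column totals of a cross table are just sums across rows and down columns. In this Python sketch, the 4 x 3 body of the table above is entered as one list per investment category, with columns in the order Investor A, B, C. The variable names are illustrative.

```python
# Rows = investment category, columns = Investor A, B, C (values in $1000's)
investors = ["Investor A", "Investor B", "Investor C"]
cross_table = {
    "Stocks":  [46.5, 55.0, 27.5],
    "Bonds":   [32.0, 44.0, 19.0],
    "CD":      [15.5, 20.0, 13.5],
    "Savings": [16.0, 28.0, 7.0],
}

row_totals = {cat: sum(vals) for cat, vals in cross_table.items()}   # right-hand Total column
col_totals = [sum(col) for col in zip(*cross_table.values())]        # bottom Total row
grand_total = sum(row_totals.values())

print(row_totals)
print(dict(zip(investors, col_totals)), grand_total)
```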

Page 52

Graphing Multivariate Categorical Data (continued)

Side-by-side bar charts

[Figure: Comparing Investors. Side-by-side horizontal bars for Stocks, Bonds, CD, and Savings, one bar each for Investor A, Investor B, and Investor C (axis 0 to 60)]

Page 53

Side-by-Side Chart Example

Sales by quarter for three sales territories:

[Figure: side-by-side bar chart of sales (vertical axis, 0 to 60) for East, West, and North in each quarter]

          1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East      20.4      27.4      59        20.4
West      30.6      38.6      34.6      31.6
North     45.9      46.9      45        43.9

Page 54

Data Presentation Errors

Goals for effective data presentation:
• Present data to display essential information
• Communicate complex ideas clearly and accurately
• Avoid distortion that might convey the wrong message

Page 55

Data Presentation Errors (continued)

• Unequal histogram interval widths
• Compressing or distorting the vertical axis
• Providing no zero point on the vertical axis
• Failing to provide a relative basis in comparing data between groups

Page 56

Describing Data Numerically

Central tendency:
• Arithmetic mean
• Median
• Mode

Variation:
• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation

Page 57

Measures of Central Tendency

Overview of central tendency:
• Mean: arithmetic average, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
• Median: midpoint of ranked values
• Mode: most frequently observed value

Page 58

Arithmetic Mean

The arithmetic mean (mean) is the most common measure of central tendency

For a population of N values:

$\mu = \frac{\sum_{i=1}^{N} x_i}{N} = \frac{x_1 + x_2 + \cdots + x_N}{N}$

where N is the population size and $x_1, \dots, x_N$ are the population values

For a sample of size n:

$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$

where n is the sample size and $x_1, \dots, x_n$ are the observed values

Page 59

Arithmetic Mean (continued)

• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)

Example (values plotted on number lines from 0 to 10):
• 1, 2, 3, 4, 5: Mean = (1 + 2 + 3 + 4 + 5)/5 = 15/5 = 3
• 1, 2, 3, 4, 10: Mean = (1 + 2 + 3 + 4 + 10)/5 = 20/5 = 4

Page 60

Median

In an ordered list, the median is the “middle” number (50% above, 50% below)

Not affected by extreme values

[Figure: two data sets on number lines from 0 to 10; both have Median = 3]

Page 61

Finding the Median

The location of the median:

$\text{Median position} = \frac{n+1}{2}$ position in the ordered data

• If the number of values is odd, the median is the middle number
• If the number of values is even, the median is the average of the two middle numbers

Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data

Page 62

Mode

• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical data
• There may be no mode
• There may be several modes

[Figure: one data set on a number line from 0 to 14 with Mode = 9; another on a number line from 0 to 6 with No Mode]

Page 63

Review Example

Five houses on a hill by the beach

House Prices:
$2,000,000
500,000
300,000
100,000
100,000

Page 64

Review Example: Summary Statistics

House Prices: $2,000,000, 500,000, 300,000, 100,000, 100,000 (Sum 3,000,000)

• Mean: ($3,000,000/5) = $600,000
• Median: middle value of ranked data = $300,000
• Mode: most frequent value = $100,000
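Python's standard `statistics` module reproduces these three summary values directly. This is a small sketch using the five house prices above. `statistics.mean`, `statistics.median`, and `statistics.mode` are existing standard-library functions, and the variable names are illustrative.

```python
import statistics

house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean = statistics.mean(house_prices)      # sum / n = 3,000,000 / 5
median = statistics.median(house_prices)  # middle value of the sorted data
mode = statistics.mode(house_prices)      # most frequent value

print(f"mean = {mean:,}, median = {median:,}, mode = {mode:,}")
```

The single $2,000,000 house pulls the mean ($600,000) well above the median ($300,000). Under the shape rule that Median < Mean indicates right skew, these prices are right-skewed.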

Page 65

Which measure of location is the “best”?

Mean is generally used, unless extreme values (outliers) exist

Then the median is often used, since the median is not sensitive to extreme values. Example: median home prices may be reported for a region because the median is less sensitive to outliers

Page 66

Shape of a Distribution

Describes how data are distributed

Measures of shape: symmetric or skewed
• Left-skewed: Mean < Median
• Symmetric: Mean = Median
• Right-skewed: Median < Mean

Page 67

Geometric Mean

• Geometric mean: used to measure the rate of change of a variable over time

$\bar{x}_g = \sqrt[n]{x_1 x_2 \cdots x_n} = (x_1 x_2 \cdots x_n)^{1/n}$

• Geometric mean rate of return: measures the status of an investment over time

$r_g = (x_1 x_2 \cdots x_n)^{1/n} - 1$

where $x_i$ is the growth factor for time period i, that is, 1 plus the rate of return in period i (a 50\% return gives $x_i = 1.50$)

Page 68

Example

An investment of $100,000 rose to $150,000 at the end of year one and increased to $180,000 at the end of year two:

$X_1 = \$100{,}000, \quad X_2 = \$150{,}000, \quad X_3 = \$180{,}000$

(a 50% increase, then a 20% increase)

What is the mean percentage return over time?

Page 69

Example (continued)

Use the 1-year returns to compute the arithmetic mean and the geometric mean:

Arithmetic mean rate of return (misleading result):

$\bar{X} = \frac{50\% + 20\%}{2} = 35\%$

Geometric mean rate of return (more accurate result), using the growth factors 1.50 and 1.20:

$r_g = \left[(1.50)(1.20)\right]^{1/2} - 1 = (1.80)^{1/2} - 1 = 1.3416 - 1 = 34.16\%$

Check: $\$100{,}000 \times (1.3416)^2 \approx \$180{,}000$, the actual ending value, whereas a 35% return each year would give $\$100{,}000 \times (1.35)^2 = \$182{,}250$
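This Python sketch compares the two averages for the $100,000 to $150,000 to $180,000 example. It assumes the geometric mean rate is formed from the growth factors x_i = 1 + return (1.50 and 1.20). Compounding that rate reproduces the actual ending value of $180,000. `math.prod` requires Python 3.8 or later, and the variable names are illustrative.

```python
import math

values = [100_000, 150_000, 180_000]  # X1, X2, X3

# Period growth factors x_i = X_{i+1} / X_i (1.50 and 1.20)
factors = [b / a for a, b in zip(values, values[1:])]
returns = [f - 1 for f in factors]  # 0.50 and 0.20

arithmetic_mean = sum(returns) / len(returns)                    # simple average of the returns
geometric_mean = math.prod(factors) ** (1 / len(factors)) - 1   # r_g = (x1 x2 ... xn)^(1/n) - 1

print(f"arithmetic: {arithmetic_mean:.2%}, geometric: {geometric_mean:.2%}")
# Compounding at the geometric rate recovers the ending value
print(round(values[0] * (1 + geometric_mean) ** len(factors)))
```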

Page 70

Measures of Variability

Variation:
• Range
• Interquartile range
• Variance
• Standard deviation
• Coefficient of variation

Measures of variation give information on the spread or variability of the data values.

[Figure: two distributions with the same center but different variation]

Page 71

Range

Simplest measure of variation

Difference between the largest and the smallest observations:

$\text{Range} = X_{\text{largest}} - X_{\text{smallest}}$

Example: for data plotted on a number line from 0 to 14, Range = 14 - 1 = 13

Page 72

Disadvantages of the Range

• Ignores the way in which data are distributed

[Figure: two differently distributed data sets on number lines from 7 to 12; both have Range = 12 - 7 = 5]

• Sensitive to outliers

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5: Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120: Range = 120 - 1 = 119

Page 73

Interquartile Range

Can eliminate some outlier problems by using the interquartile range

Eliminate high- and low-valued observations and calculate the range of the middle 50% of the data

Interquartile range = 3rd quartile – 1st quartile

IQR = Q3 – Q1

Page 74

Interquartile Range (continued)

Example:

[Figure: Box-and-Whisker Plot with X_minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X_maximum = 70; each of the four segments holds 25% of the data]

Interquartile range = 57 – 30 = 27

Page 75

Quartiles

Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% in each)
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than the third quartile, Q3

Page 76

Quartile Formulas

Find a quartile by determining the value in the appropriate position in the ranked data, where
• First quartile position: Q1 = 0.25(n+1)
• Second quartile position: Q2 = 0.50(n+1) (the median position)
• Third quartile position: Q3 = 0.75(n+1)

where n is the number of observed values

Page 77

Quartiles (continued)

Example: Find the first quartile

Sample ranked data (n = 9): 11 12 13 16 16 17 18 21 22

Q1 is in the 0.25(9+1) = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = (12 + 13)/2 = 12.5
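Here is a minimal Python sketch of the (n+1) position rule, with linear interpolation between neighbouring ranked values. The function name `quartile` and the clamping at the two ends are assumptions. Python's `statistics.quantiles` with its default `method='exclusive'` follows the same (n+1) rule and serves as a cross-check. The second part applies the rule to the two 25-value data sets from the range example. The outlier changes the range from 4 to 119 but leaves the IQR unchanged.

```python
import math
import statistics


def quartile(ranked, p):
    """Value at position p*(n+1) of ranked data, interpolating between neighbours."""
    n = len(ranked)
    pos = p * (n + 1)
    lower = math.floor(pos)
    frac = pos - lower
    if lower < 1:        # position falls before the first value
        return ranked[0]
    if lower >= n:       # position falls after the last value
        return ranked[-1]
    # Positions in the formula are 1-based; list indices are 0-based
    return ranked[lower - 1] + frac * (ranked[lower] - ranked[lower - 1])


data = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(data, p) for p in (0.25, 0.50, 0.75))
print(q1, q2, q3, q3 - q1)                 # Q1, median, Q3, IQR
print(statistics.quantiles(data, n=4))     # cross-check with the standard library

# Range vs. IQR for the outlier example
base = [1] * 11 + [2] * 8 + [3] * 4 + [4, 5]
with_outlier = base[:-1] + [120]
for name, xs in (("base", base), ("with outlier", with_outlier)):
    iqr = quartile(xs, 0.75) - quartile(xs, 0.25)
    print(name, "range =", max(xs) - min(xs), "IQR =", iqr)
```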

Page 78

Population Variance

Average of squared deviations of values from the mean

Population variance:

$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$

where μ = population mean, N = population size, $x_i$ = ith value of the variable x

Page 79

Sample Variance

Average (approximately) of squared deviations of values from the mean

Sample variance:

$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}$

where $\bar{x}$ = arithmetic mean, n = sample size, $x_i$ = ith value of the variable x

Page 80

Population Standard Deviation

• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data

Population standard deviation:

$\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$

Page 81

Sample Standard Deviation

• Most commonly used measure of variation
• Shows variation about the mean
• Has the same units as the original data

Sample standard deviation:

$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1}}$

Page 82

Calculation Example: Sample Standard Deviation

Sample data ($x_i$): 10 12 14 15 17 18 18 24

n = 8, Mean = $\bar{x}$ = 16

$s = \sqrt{\frac{(10-\bar{x})^2 + (12-\bar{x})^2 + (14-\bar{x})^2 + \cdots + (24-\bar{x})^2}{n-1}}$

$= \sqrt{\frac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\frac{36 + 16 + 4 + 1 + 1 + 4 + 4 + 64}{7}} = \sqrt{\frac{130}{7}} = 4.3095$

A measure of the “average” scatter around the mean
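The calculation can be sketched in Python with the standard library. The squared deviations from the mean of 16 are 36, 16, 4, 1, 1, 4, 4, and 64. They sum to 130, so s = √(130/7) ≈ 4.3095. The final print cross-checks the result against `statistics.stdev` (divisor n - 1) and `statistics.pvariance` (divisor N, the population formula). Variable names are illustrative.

```python
import math
import statistics

sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)

x_bar = sum(sample) / n                          # 128 / 8 = 16.0
ss = sum((x - x_bar) ** 2 for x in sample)       # sum of squared deviations
s2 = ss / (n - 1)                                # sample variance, divisor n - 1
s = math.sqrt(s2)                                # sample standard deviation

print(x_bar, ss, round(s2, 4), round(s, 4))
# Standard-library equivalents: stdev uses n - 1, pvariance uses N
print(statistics.stdev(sample), statistics.pvariance(sample))
```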

Page 83

Measuring Variation

[Figure: a distribution with a small standard deviation and a distribution with a large standard deviation]

Page 84

Comparing Standard Deviations

[Figure: three data sets, A, B, and C, plotted on number lines from 11 to 21]
• Data A: Mean = 15.5, s = 3.338
• Data B: Mean = 15.5, s = 0.926
• Data C: Mean = 15.5, s = 4.570

Page 85: Chapter 1 Describing Data: Graphical and Numerical PROBABILITY (6MTCOAE205)

Advantages of Variance and Standard Deviation

Each value in the data set is used in the calculation

Values far from the mean are given extra weight (because deviations from the mean are squared)


Coefficient of Variation

Measures relative variation

Always in percentage (%)

Shows variation relative to the mean

Can be used to compare two or more sets of data measured in different units

$$CV = \left(\frac{s}{\bar{x}}\right) \cdot 100\%$$


Comparing Coefficient of Variation

Stock A: Average price last year = $50, Standard deviation = $5

Stock B: Average price last year = $100, Standard deviation = $5

$$CV_A = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$$

$$CV_B = \left(\frac{s}{\bar{x}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$$

Both stocks have the same standard deviation, but stock B is less variable relative to its price
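The stock comparison reduces to one division per stock. The function name `coefficient_of_variation` below is our own. This is a minimal sketch that returns the CV as a percentage.

```python
def coefficient_of_variation(std_dev, mean):
    """CV = (s / x_bar) * 100%, returned as a percentage."""
    return std_dev * 100 / mean


cv_a = coefficient_of_variation(5, 50)    # Stock A: s = $5, mean price = $50
cv_b = coefficient_of_variation(5, 100)   # Stock B: s = $5, mean price = $100
print(f"CV_A = {cv_a:.0f}%, CV_B = {cv_b:.0f}%")   # CV_A = 10%, CV_B = 5%
```

Because the units (dollars here) cancel in s / x̄, the same function can compare variation across data sets measured in different units.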


Using Microsoft Excel

Descriptive Statistics can be obtained from Microsoft® Excel

Select:

data / data analysis / descriptive statistics

Enter details in dialog box


Using Excel


Select data / data analysis / descriptive statistics


Using Excel

Enter input range details

Check box for summary statistics

Click OK


Excel output

Microsoft Excel descriptive statistics output, using the house price data:

House Prices: $2,000,000; $500,000; $300,000; $100,000; $100,000
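Without Excel, Python's `statistics` module can reproduce several rows of the descriptive-statistics summary for the same five house prices. This is a sketch. The row labels are chosen to resemble Excel's output, but the layout is our own, and rows such as standard error, kurtosis and skewness are omitted.

```python
import statistics

# House price data (in dollars).
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

summary = {
    "Mean": statistics.mean(prices),
    "Median": statistics.median(prices),
    "Mode": statistics.mode(prices),
    "Standard Deviation": statistics.stdev(prices),   # sample version, divides by n - 1
    "Sample Variance": statistics.variance(prices),
    "Range": max(prices) - min(prices),
    "Minimum": min(prices),
    "Maximum": max(prices),
    "Sum": sum(prices),
    "Count": len(prices),
}

for label, value in summary.items():
    print(f"{label:<20}{value:>20,.2f}")
```

The single $2,000,000 house pulls the mean (600,000) well above the median (300,000).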


Chebychev’s Theorem

For any population with mean μ and standard deviation σ, and k > 1, the percentage of observations that fall within the interval [μ ± kσ] is at least

$$100\left[1 - \frac{1}{k^2}\right]\%$$


Regardless of how the data are distributed, at least $(1 - 1/k^2)$ of the values will fall within k standard deviations of the mean (for k > 1)

Examples:

k = 1.5 (μ ± 1.5σ): at least $1 - 1/1.5^2 \approx 55.6\%$ of the values lie within

k = 2 (μ ± 2σ): at least $1 - 1/2^2 = 75\%$ of the values lie within

k = 3 (μ ± 3σ): at least $1 - 1/3^2 \approx 88.9\%$ of the values lie within
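The bound is simple to compute, and it can be checked against a real data set. This sketch uses two functions of our own: `chebychev_lower_bound` and `share_within`. It treats the eight values 10, 12, 14, 15, 17, 18, 18, 24 as a population, so it uses `statistics.pstdev`. That choice is an assumption made for illustration.

```python
import statistics


def chebychev_lower_bound(k):
    """Smallest possible fraction of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is stated for k > 1")
    return 1 - 1 / k ** 2


def share_within(values, k):
    """Observed fraction of values within mu ± k*sigma (data treated as a population)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return sum(abs(x - mu) <= k * sigma for x in values) / len(values)


data = [10, 12, 14, 15, 17, 18, 18, 24]
for k in (1.5, 2, 3):
    print(f"k = {k}: bound {chebychev_lower_bound(k):.1%}, observed {share_within(data, k):.1%}")
```

The observed shares are never below the bound, as the theorem guarantees for any distribution.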


The Empirical Rule

If the data distribution is bell-shaped, then the interval:

$\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample

$\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample

$\mu \pm 3\sigma$ contains almost all (about 99.7%) of the values in the population or the sample
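If "bell-shaped" is taken to mean a normal distribution (an assumption), the three percentages can be checked with `statistics.NormalDist` from the Python standard library. They are rounded areas under the normal curve.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

shares = {}
for k in (1, 2, 3):
    # P(mu - k*sigma <= X <= mu + k*sigma) for a normally distributed X
    shares[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"mu ± {k} sigma: {shares[k]:.1%}")
```

The exact areas are about 68.3%, 95.4% and 99.7%, which the rule rounds to 68%, 95% and 99.7%.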


Weighted Mean

The weighted mean of a set of data is

$$\bar{x} = \frac{\sum_{i=1}^{n} w_i x_i}{n} = \frac{w_1 x_1 + w_2 x_2 + \cdots + w_n x_n}{n}$$

Where $w_i$ is the weight of the ith observation

and $n = \sum_{i} w_i$

Use when data is already grouped into n classes, with $w_i$ values in the ith class
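A minimal sketch of the weighted mean follows. The function name `weighted_mean` and the small grouped data set (value $x_i$ occurring $w_i$ times) are hypothetical.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w_i * x_i divided by n, where n is the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    n = sum(weights)
    return sum(w * x for w, x in zip(weights, values)) / n


# Hypothetical grouped data: the value 2 occurs 3 times, 4 occurs 5 times, 6 occurs twice.
values = [2, 4, 6]
weights = [3, 5, 2]
print(weighted_mean(values, weights))   # (6 + 20 + 12) / 10 = 3.8
```

The result equals the ordinary mean of the expanded list [2, 2, 2, 4, 4, 4, 4, 4, 6, 6], which is why the weighted mean suits data that are already grouped.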


Approximations for Grouped Data

Suppose data are grouped into K classes, with frequencies $f_1, f_2, \ldots, f_K$, and the midpoints of the classes are $m_1, m_2, \ldots, m_K$

For a sample of n observations, the mean is

$$\bar{x} = \frac{\sum_{i=1}^{K} f_i m_i}{n}$$

where $n = \sum_{i=1}^{K} f_i$


Approximations for Grouped Data

Suppose data are grouped into K classes, with frequencies $f_1, f_2, \ldots, f_K$, and the midpoints of the classes are $m_1, m_2, \ldots, m_K$

For a sample of n observations, the variance is

$$s^2 = \frac{\sum_{i=1}^{K} f_i (m_i - \bar{x})^2}{n - 1}$$
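Both grouped-data approximations can be computed together from a frequency table. This is a sketch. The function name `grouped_mean_variance` and the four-class table are hypothetical.

```python
def grouped_mean_variance(midpoints, frequencies):
    """Approximate sample mean and variance from class midpoints m_i and frequencies f_i."""
    n = sum(frequencies)                                            # n = sum of f_i
    x_bar = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    s2 = sum(f * (m - x_bar) ** 2 for f, m in zip(frequencies, midpoints)) / (n - 1)
    return x_bar, s2


# Hypothetical frequency table with K = 4 classes: 0-10, 10-20, 20-30, 30-40.
midpoints = [5, 15, 25, 35]
frequencies = [2, 5, 8, 5]
x_bar, s2 = grouped_mean_variance(midpoints, frequencies)
print(x_bar, round(s2, 2))   # 23.0 90.53
```

These are approximations because every observation in a class is treated as if it sat exactly at the class midpoint.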


The Sample Covariance

The covariance measures the strength of the linear relationship between two variables

The population covariance:

$$Cov(x, y) = \sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

The sample covariance:

$$Cov(x, y) = s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Only concerned with the strength of the relationship

No causal effect is implied
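The two covariance formulas differ only in their divisor and in whether population or sample means are used. This is a minimal sketch with hypothetical paired data. The function names are our own.

```python
def population_covariance(xs, ys):
    """sigma_xy: sum of (x_i - mu_x)(y_i - mu_y), divided by N."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same length")
    N = len(xs)
    mu_x, mu_y = sum(xs) / N, sum(ys) / N
    return sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / N


def sample_covariance(xs, ys):
    """s_xy: sum of (x_i - x_bar)(y_i - y_bar), divided by n - 1."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same length")
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


# Hypothetical paired observations; the cross-products sum to 6.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(population_covariance(x, y))   # 6 / 5 = 1.2
print(sample_covariance(x, y))       # 6 / 4 = 1.5
```

The positive value reflects that y tends to rise with x. Python 3.10+ also offers `statistics.covariance`, which uses the n - 1 divisor.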


Interpreting Covariance

Covariance between two variables:

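Cov(x,y) > 0: x and y tend to move in the same direction

Cov(x,y) < 0: x and y tend to move in opposite directions

Cov(x,y) = 0: x and y have no linear relationship (independent variables have zero covariance, but zero covariance alone does not imply independence)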


Coefficient of Correlation

Measures the relative strength of the linear relationship between two variables

Population correlation coefficient:

$$\rho = \frac{Cov(x, y)}{\sigma_X \sigma_Y}$$

Sample correlation coefficient:

$$r = \frac{Cov(x, y)}{s_X s_Y}$$


Features of Correlation Coefficient, r

Unit free

Ranges between –1 and 1

The closer to –1, the stronger the negative linear relationship

The closer to 1, the stronger the positive linear relationship

The closer to 0, the weaker any linear relationship


Scatter Plots of Data with Various Correlation Coefficients

[Figure: six scatter plots of Y against X, with r = –1, r = –.6, r = 0, r = +.3, r = +1 and r = 0]


Using Excel to Find the Correlation Coefficient

Select Data / Data Analysis

Choose Correlation from the selection menu

Click OK . . .


Using Excel to Find the Correlation Coefficient

Input data range and select appropriate options

Click OK to get output


Interpreting the Result

r = .733

There is a relatively strong positive linear relationship between test score #1 and test score #2

Students who scored high on the first test tended to score high on the second test

[Figure: Scatter Plot of Test Scores, Test #2 Score (vertical axis, 70 to 100) against Test #1 Score (horizontal axis, 70 to 100)]