Statistics for Decision Making STA 253
Dr. Ginner W. Hudson, Covenant College

Upload: raoul

Post on 24-Feb-2016




1.1 Examining Distributions - Intro

A statistical analysis starts with a set of … data.

We construct a set of data by first deciding which cases or individuals we want to study.

For each case/individual we record information about characteristics that we call variables.


Constructing Our Data Set

Looking at data …

Individuals, cases, records: the WHO

Observation takes place

Variable (a characteristic of a case): the WHAT


Important terms

Individuals (cases, records): Objects described by the data. Ex: customers, cities, patients, cars

Variable: A characteristic of a case. Ex: profit, duration of a service call, number of customers, gender. Different cases can have different values for a variable. Some variables may simply be labels that distinguish one case from another.

Distribution of a variable: the values the variable takes and how often it takes them.
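To make these terms concrete, here is a minimal Python sketch that stores a small data set as a list of records: each dictionary is one case and each key is a variable. The customers, the variable names (`customer_id`, `gender`, `service_minutes`) and the helpers `values_of` and `distribution` are hypothetical, made up for illustration; `customer_id` plays the role of a label variable that only distinguishes cases.

```python
from collections import Counter

# Hypothetical data set (illustration only): each dict is one case (a customer),
# each key is a variable recorded for that case.
cases = [
    {"customer_id": "C01", "gender": "F", "service_minutes": 12.5},
    {"customer_id": "C02", "gender": "M", "service_minutes": 8.0},
    {"customer_id": "C03", "gender": "F", "service_minutes": 15.2},
    {"customer_id": "C04", "gender": "F", "service_minutes": 9.7},
    {"customer_id": "C05", "gender": "M", "service_minutes": 11.1},
]


def values_of(data, variable):
    """Hypothetical helper: the value one variable takes for every case, in case order."""
    return [case[variable] for case in data]


def distribution(data, variable):
    """Hypothetical helper: each value the variable takes and how often it takes it."""
    return dict(Counter(values_of(data, variable)))


print(len(cases))                     # 5
print(values_of(cases, "gender"))     # ['F', 'M', 'F', 'F', 'M']
print(distribution(cases, "gender"))  # {'F': 3, 'M': 2}
```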


To better understand a data set, ask:

Who? What cases (individuals) do the data describe? How many cases (individuals)?

Think of an assembly line with the WHO passing by on the conveyor belt and the variables of interest being observed.


What? How many variables? What is the exact definition of each variable? What is the unit of measurement for each variable?

Why? What is the purpose of the data? What questions are being asked? Are the variables suitable?


Types of variables

Quantitative variable: Takes numerical values for which we can do arithmetic. Ex: credit card balance, number of employees, time until customer is served, age. Discrete or continuous?

Categorical variable: Places a case into one of several groups or categories. Ex: gender, brand of credit card, own a home (yes/no)


Example: An iTunes playlist


Example: Grade book data for statistics course


Example: The FAA

The Federal Aviation Administration (FAA) monitors airlines for safety and customer service. For each flight the carrier must report the type of aircraft, number of passengers, whether or not the flights departed and arrived on schedule, and any mechanical problems.

Identify the WHO.
• The FAA
• The airline carriers
• The passengers
• The flights
• None of the above


Example: The common cold

Scientists at a major pharmaceutical firm conducted an experiment to study the effectiveness of an herbal compound to treat the common cold. They exposed volunteers to a cold virus, then gave them either the herbal compound or a useless sugar solution. Several days later they assessed each patient’s condition using a cold severity scale ranging from 0 to 5.

Identify the WHO.
• Scientists
• Volunteers
• The pharmaceutical firm
• The herbal compound
• None of the above


Displaying distributions with graphs

Ways to chart categorical data:
• Bar/column graphs (called Pareto charts when the bars are sorted from largest to smallest)
• Pie charts

Ways to chart quantitative data:
• Histograms
• Stemplots
• Time plots


Law firm example

A law firm studies the gender of its clients. It finds that 55% are male and 45% are female.

Cases:
Variable:
Distribution:
Values: Male, Female
How often: 55% and 45%, respectively

Are the data (the variable) categorical or quantitative?


Credit card example

A credit card company studies the spending behavior of its 21- to 25-year-old customers with a $1000 credit limit. It randomly selects 100 of them and records the following variables for each person. For each item, identify the type of variable.

• Average balance on their card over the last year
• Whether the customer has ever made late payments
• Which day of the week their card is used the most
• Customer’s age (in years)


Credit card example

For each item, give its possible values.

Average balance on their card over the last year: Quantitative, $0.00 through $1000.00

Whether the customer has ever made late payments: Categorical, Yes or No

Which day of the week their card is used the most: Categorical, Sunday, Monday, Tuesday, …, Saturday

Customer’s age (in years): Quantitative, 21, 22, 23, 24, 25 years


Displaying categorical data

Purpose: Summarize the data so the reader can grasp the distribution quickly.

Process: List the categories, then give either the count or the percent of cases that fall into each category.

Methods: Tables, pie charts, bar/column graphs, Pareto charts
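The process above (list the categories, then give a count or a percent for each) is easy to automate. Below is a minimal Python sketch; the 200 clients are a hypothetical sample chosen so that the split matches the 55% / 45% in the law firm example, and `frequency_table` is an illustrative helper, not a library function.

```python
from collections import Counter


def frequency_table(values):
    """Hypothetical helper: (category, count, percent of all cases) per category."""
    counts = Counter(values)
    total = sum(counts.values())
    return [(category, count, 100.0 * count / total)
            for category, count in counts.items()]


# Hypothetical sample (illustration only): 200 clients, 110 male and 90 female.
clients = ["Male"] * 110 + ["Female"] * 90

print(f"{'Category':<10}{'Count':>6}{'Percent':>9}")
for category, count, percent in frequency_table(clients):
    print(f"{category:<10}{count:>6}{percent:>8.1f}%")
```

Because the percents add up to 100% (up to rounding), the same table is what a pie chart draws as the parts of one whole.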


Ways to chart categorical data

Because the variable is categorical, the categories in the graph can be ordered any way we want (alphabetically, by increasing value, by year, by personal preference, etc.).

Bar graphs: Each category is represented by a bar.

Pie charts: The slices must represent the parts of one whole.


Automobile Accidents per day of the week

Bar graph sorted by rank (Pareto chart): easy to analyze

Sorted chronologically: much less useful
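The only difference between the two charts is the order of the bars, which is a single sort. A minimal Python sketch; the accident counts are hypothetical numbers for illustration, not the data behind the chart, and the helper names are made up.

```python
# Hypothetical accident counts per day of the week (illustration only).
accidents = {
    "Sunday": 52, "Monday": 41, "Tuesday": 38, "Wednesday": 44,
    "Thursday": 47, "Friday": 68, "Saturday": 73,
}

DAY_ORDER = ["Sunday", "Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday"]


def chronological(counts):
    """Hypothetical helper: bars in calendar order."""
    return [(day, counts[day]) for day in DAY_ORDER]


def pareto(counts):
    """Hypothetical helper: bars sorted from the largest count to the smallest."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def text_bars(rows, per_mark=5):
    """Crude text bar graph: one '#' for every per_mark accidents."""
    return [f"{day:<10}{'#' * (n // per_mark)} {n}" for day, n in rows]


print("\n".join(text_bars(chronological(accidents))))
print()
print("\n".join(text_bars(pareto(accidents))))
```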


Ways to chart quantitative data

Histograms and stemplots: These are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.

Line graphs (time plots): Use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.


Histograms

The range of values that a variable can take is divided into equal-size intervals.

The histogram shows the number of individual data points that fall in each interval.

Example: Histogram of the December 2004 unemployment rates in the 50 states and Puerto Rico.
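The counting behind a histogram fits in a few lines of Python. This is a minimal sketch under one stated assumption: the intervals run from the smallest to the largest observed value, each interval includes its left endpoint, and the maximum is counted in the last interval. Statistical software often picks rounder endpoints, so its bins can differ. The function name `histogram_counts` and the data are hypothetical.

```python
import math


def histogram_counts(values, num_bins):
    """Hypothetical helper: split [min, max] into num_bins equal-width
    intervals and count how many values fall in each one."""
    low, high = min(values), max(values)
    width = (high - low) / num_bins
    counts = [0] * num_bins
    for v in values:
        # Interval index; if all values are equal, everything goes in bin 0.
        index = math.floor((v - low) / width) if width > 0 else 0
        counts[min(index, num_bins - 1)] += 1  # the maximum lands in the last bin
    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


# Hypothetical data (illustration only).
edges, counts = histogram_counts([1, 2, 2, 3, 5, 6, 8, 9, 10], num_bins=3)
for i, count in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f}: {'*' * count}")
```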


How to create a histogram

It is an iterative process: try and try again. What bin size should you use?
• Not too many bins with counts of only 0 or 1
• Not so summarized that you lose all the information
• Not so detailed that it is no longer a summary

Rule of thumb: start with 5 to 10 bins.

Look at the distribution and refine your bins.

(There isn’t a unique or “perfect” solution.)
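Trying several bin sizes is quick in code. The sketch below uses bins of a fixed width anchored at multiples of that width (for example 10 to 15, 15 to 20) and, for each candidate width, reports how many bins there are and how many hold only 0 or 1 values. The data, the candidate widths, and the helper names are hypothetical; the numbers only support the judgment described above and do not pick the "right" answer.

```python
import math
from collections import Counter


def counts_by_width(values, width):
    """Hypothetical helper: counts for bins [k*width, (k+1)*width), keeping the
    empty bins between the lowest and highest occupied bin."""
    occupied = Counter(math.floor(v / width) for v in values)
    first, last = min(occupied), max(occupied)
    return {k * width: occupied.get(k, 0) for k in range(first, last + 1)}


def sparse_bins(bin_counts):
    """Hypothetical helper: how many bins hold 0 or 1 values."""
    return sum(1 for count in bin_counts.values() if count <= 1)


# Hypothetical measurements for 15 cases (illustration only).
data = [12, 13, 13, 14, 15, 15, 15, 16, 17, 18, 18, 19, 21, 24, 31]

for width in (1, 2, 5, 10):
    bins = counts_by_width(data, width)
    print(f"width {width:>2}: {len(bins):>2} bins, "
          f"{sparse_bins(bins):>2} with 0 or 1 counts")
```

With these made-up numbers, width 1 gives 20 bins, 17 of them with 0 or 1 counts, while width 10 gives only 3 bins; widths 2 and 5 fall inside the 5-to-10-bin rule of thumb.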


Interpreting histograms

When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a histogram by its shape, center, and spread.

Histogram with a line connecting each column: too detailed

Histogram with a smoothed curve highlighting the overall pattern of the distribution


Common distribution patterns (shapes)

Symmetric: Left and right sides are mirror images of each other (or close)


Skewed left: Left side extends farther out than the right side


Skewed right: Right side extends farther out than the left side


Many shapes are bimodal or complex:
• Two peaks
• First part symmetric; flat in the middle; increasing at the end


Outliers

An important kind of deviation is an outlier.

Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.


The overall pattern is fairly symmetrical except for two states that clearly do not belong to the main pattern. Alaska and Florida have unusual proportions of elderly people in their populations.

A large gap in the distribution is typically a sign of an outlier.
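Looking for a large gap can be done mechanically: sort the values and measure the distance between neighbors. A minimal sketch; the threshold `min_gap` is a judgment call you must choose (it is not a standard rule), the values are hypothetical, and a gap only flags candidates that still have to be explained, as with Alaska and Florida.

```python
def find_gaps(values, min_gap):
    """Hypothetical helper: (lower, upper) pairs of neighboring sorted values
    more than min_gap apart; values beyond such a gap are outlier candidates."""
    ordered = sorted(values)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > min_gap]


# Hypothetical data (illustration only) with one straggler at each end.
values = [8.1, 9.6, 2.0, 8.4, 10.0, 8.9, 15.7, 9.2, 10.3]
print(find_gaps(values, min_gap=3.0))  # [(2.0, 8.1), (10.3, 15.7)]
```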


IMPORTANT NOTE: Your data are the way they are. Do not try to force them into a particular shape.

Example: US Female Population 1997


It is a common misconception that if you have a large enough data set, the data will eventually turn out nice and symmetrical.

Example: Dry Days per Month 1995

Histogram of dry days in 1995


Example: Customer Service Center Call Lengths


Why were there so many calls lasting 10 seconds or less?


The inappropriate actions by customer service reps were hidden in this histogram, where the software chose the classes (bin intervals).


Example: Constructing a Histogram

Class Exercise: GDP by Country

2005 Gross Domestic Product (GDP)

Growth Rates for 30 Industrialized Countries

Country           Growth Rate (%)
Turkey            7.4
Czech Republic    6.1
Slovakia          6.1
Hungary           4.1
South Korea       4.0
Luxembourg        4.0
Greece            3.7
Poland            3.4
Spain             3.4
Denmark           3.2
United States     3.2
Mexico            3.0
Canada            2.9
Finland           2.9
Sweden            2.7
Japan             2.6
Australia         2.5
New Zealand       2.3
Norway            2.3
Austria           2.0
Switzerland       1.9
United Kingdom    1.9
Belgium           1.5
Netherlands       1.5
France            1.2
Germany           0.9
Portugal          0.4
Italy             0.0
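For the class exercise, here is one possible histogram sketched in Python, using the growth rates from the table with classes 1 percentage point wide (0.0 to 0.9, 1.0 to 1.9, and so on). The class width is an assumed starting choice, not the answer key, and `class_counts` is a hypothetical helper. Only 28 countries appear in the table as transcribed, so the counts cover those 28. The code assumes every rate is zero or positive, which holds here.

```python
import math

# 2005 GDP growth rates (%) copied from the table above.
growth = {
    "Turkey": 7.4, "Czech Republic": 6.1, "Slovakia": 6.1, "Hungary": 4.1,
    "South Korea": 4.0, "Luxembourg": 4.0, "Greece": 3.7, "Poland": 3.4,
    "Spain": 3.4, "Denmark": 3.2, "United States": 3.2, "Mexico": 3.0,
    "Canada": 2.9, "Finland": 2.9, "Sweden": 2.7, "Japan": 2.6,
    "Australia": 2.5, "New Zealand": 2.3, "Norway": 2.3, "Austria": 2.0,
    "Switzerland": 1.9, "United Kingdom": 1.9, "Belgium": 1.5,
    "Netherlands": 1.5, "France": 1.2, "Germany": 0.9, "Portugal": 0.4,
    "Italy": 0.0,
}

CLASS_WIDTH = 1.0  # assumed starting choice; try other widths too


def class_counts(rates, width):
    """Hypothetical helper: counts for classes [0, width), [width, 2*width), ...
    up to the class holding the maximum. Assumes all rates are >= 0."""
    counts = [0] * (math.floor(max(rates) / width) + 1)
    for rate in rates:
        counts[math.floor(rate / width)] += 1
    return counts


counts = class_counts(list(growth.values()), CLASS_WIDTH)
for i, count in enumerate(counts):
    low = i * CLASS_WIDTH
    print(f"{low:3.1f} to <{low + CLASS_WIDTH:3.1f} | {'*' * count}")
```

With these classes, the 5.0 to 6.0 class is empty, leaving a gap below the three fastest-growing countries (Turkey, the Czech Republic, and Slovakia).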



StatTutor

StatsPortal


Example: T-bill interest rates


What is this type of plot called?


What is a Time Series?

Time series: observations collected over time

Time plot: plot of the data over time


Identifying Trends in the Data

Trend: gradual increases or decreases over time

Figure: Annual Sales – XYZ Company (x axis: Year, 1990 through 2004; y axis: In millions)


Other Common Components Of Time Series

Figures: Seasonality (x axis: Quarter, 1st through 4th, repeated over several years) and Cycles (x axis: Year, '80 through '99)


Line Graphs: Time Plots

Retail Price of Fresh Oranges over Time

This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges. There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.

Time goes on the horizontal (x) axis. The variable of interest, here “retail price of fresh oranges”, goes on the vertical (y) axis.