Statistics for Decision Making, STA 253
Dr. Ginner W. Hudson, Covenant College
1.1 Examining Distributions - Intro
A statistical analysis starts with a set of data. We construct a set of data by first deciding what cases or individuals we want to study.
For each case/individual we record information about characteristics that we call variables.
Constructing Our Data Set
Looking at data:
Individuals, cases, records: the WHO
Variable, a characteristic of a case: the WHAT
Observation takes place when each individual (the WHO) is observed on each variable (the WHAT).
Important terms
Individuals (cases, records): Objects described by the data. Ex: customers, cities, patients, cars
Variable: A characteristic of a case. Ex: profit, duration of a service call, number of customers, gender. Different cases can have different values for the variables. Some variables may be a label to distinguish the different cases.
Distribution of a variable: the values the variable takes and how often it takes them.
To better understand a data set, ask:
Who? What cases (individuals) do the data describe? How many cases (individuals)?
Think of an assembly line with the WHO passing by on the conveyor belt and the variables of interest being observed.
What? How many variables? What is the exact definition of each variable? What is the unit of measurement for each variable?
Why? What is the purpose of the data? What questions are being asked? Are the variables suitable?
Types of variables
Quantitative variable: takes numerical values for which we can do arithmetic.
Ex: credit card balance, number of employees, time until customer is served, age
Discrete or continuous?
Categorical variable: places a case into one of several groups or categories.
Ex: gender, brand of credit card, own a home (yes/no)
Example: An iTunes playlist
Example: Grade book data for statistics course
Example: The FAA
The Federal Aviation Administration (FAA) monitors airlines for safety and customer service. For each flight the carrier must report the type of aircraft, number of passengers, whether or not the flights departed and arrived on schedule, and any mechanical problems.
Identify the WHO.
• The FAA
• The airline carriers
• The passengers
• The flights
• None of the above
Example: The common cold
Scientists at a major pharmaceutical firm conducted an experiment to study the effectiveness of an herbal compound to treat the common cold. They exposed volunteers to a cold virus, then gave them either the herbal compound or a useless sugar solution. Several days later they assessed each patient’s condition using a cold severity scale ranging from 0-5.
Identify the WHO.
• Scientists
• Volunteers
• The pharmaceutical firm
• The herbal compound
• None of the above
Displaying distributions with graphs
Ways to chart categorical data: bar/column graphs (called Pareto charts when sorted by rank), pie charts
Ways to chart quantitative data: histograms, stemplots, time plots
Law firm example
A law firm studies the gender of their clients. They find 55% are males and 45% are females.
Cases: the firm's clients
Variable: gender
Distribution: values Male, Female; how often: 55% and 45%, respectively
Are the data (the variable) categorical or quantitative?
Credit card example
A credit card company studies the spending behavior of their 21- to 25-year-old customers with a $1000 credit limit. They randomly select 100 of them and record the following variables for each person. For each item, identify the type of variable.
• Average balance on their card over the last year
• Whether customer has ever made late payments
• Which day of the week their card is used the most
• Customer’s age (in years)
For each item, give its possible values.
Average balance on their card over the last year Quantitative: $0.00 through $1000.00
Whether customer has ever made late payments Categorical: Yes, No
Which day of the week their card is used the most Categorical: Sunday, Monday, Tuesday, …, Saturday
Customer’s age (in years) Quantitative: 21, 22, 23, 24, 25 years
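The answers above amount to a blueprint for the data set: one row per case (customer) and one field per variable, each with its own type and possible values. The Python sketch below encodes that blueprint; the `Customer` class and its field names are hypothetical, chosen only for illustration, and the range checks simply restate the possible values listed above.

```python
from dataclasses import dataclass
from typing import Literal, get_args

# Categorical variable: the seven possible values for "day the card is used most".
Day = Literal["Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday"]


@dataclass
class Customer:
    """One case (customer); each field is one variable. Hypothetical names."""
    avg_balance: float  # quantitative: $0.00 through $1000.00
    ever_late: bool     # categorical: Yes (True) / No (False)
    busiest_day: Day    # categorical: Sunday through Saturday
    age: int            # quantitative: 21 through 25 years

    def __post_init__(self) -> None:
        # Reject values that fall outside each variable's possible values.
        if not 0.0 <= self.avg_balance <= 1000.0:
            raise ValueError("average balance must be $0.00 through $1000.00")
        if self.busiest_day not in get_args(Day):
            raise ValueError(f"unknown day: {self.busiest_day!r}")
        if not 21 <= self.age <= 25:
            raise ValueError("age must be 21 through 25 years")


# Usage: a made-up customer record (illustrative values only).
example = Customer(avg_balance=412.50, ever_late=False, busiest_day="Friday", age=23)
```

Note that the quantitative fields are numbers we can do arithmetic on, while the categorical fields only place each customer into a group.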
Displaying categorical data
Purpose: Summarize the data so the reader can grasp the distribution quickly.
Process: List the categories. Give either the count or the percent of cases that fall into each category.
Methods: Tables, pie charts, bar/column graphs, Pareto charts
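The process above (list the categories, then give a count or percent for each) can be sketched in Python with `collections.Counter`. The client list is hypothetical: 20 made-up records chosen to reproduce the law firm's 55% / 45% split.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent) rows for a categorical variable,
    in the order each category first appears."""
    counts = Counter(values)
    n = len(values)
    return [(category, count, 100 * count / n) for category, count in counts.items()]


# Hypothetical sample of 20 clients (11 male, 9 female).
clients = ["Male"] * 11 + ["Female"] * 9
for category, count, percent in frequency_table(clients):
    print(f"{category:<8}{count:>4}{percent:>7.1f}%")
```

This prints one row per category: Male with 11 clients (55.0%) and Female with 9 clients (45.0%).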
Ways to chart categorical data
Because the variable is categorical, the data in the graph can be ordered any way we want (alphabetical, by increasing value, by year, by personal preference, etc.).
Bar graphs: each category is represented by a bar.
Pie charts: the slices must represent the parts of one whole.
Automobile accidents per day of the week
Bar graph sorted by rank (Pareto chart): easy to analyze
Sorted chronologically: much less useful
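Sorting the categories by their counts, largest first, is what turns a bar graph into a Pareto chart. A minimal Python sketch, drawing each bar as a row of `#` characters; the accident counts are hypothetical placeholders, not the data behind the slide.

```python
def pareto_order(counts):
    """Sort (category, count) pairs from largest count to smallest.
    Ties keep their original order because sorted() is stable."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)


def text_bars(pairs):
    """Render each (category, count) pair as one horizontal text bar."""
    return [f"{category:<4}{'#' * count}" for category, count in pairs]


# Hypothetical accident counts per day, for illustration only.
accidents = {"Sun": 4, "Mon": 2, "Tue": 1, "Wed": 3, "Thu": 2, "Fri": 6, "Sat": 7}

print("Chronological order:")
print("\n".join(text_bars(accidents.items())))
print("Pareto order:")
print("\n".join(text_bars(pareto_order(accidents))))
```

In the Pareto version the eye reads the categories from most to least frequent without having to compare bars scattered across the week.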
Ways to chart quantitative data
Histograms and stemplots: these are summary graphs for a single variable. They are very useful for understanding the pattern of variability in the data.
Line graphs (time plots): use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time.
Histograms
The range of values that a variable can take is divided into equal-size intervals.
The histogram shows the number of individual data points that fall in each interval.
Example: histogram of the December 2004 unemployment rates in the 50 states and Puerto Rico.
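The two steps above (cut the range into equal-size intervals, then count the observations in each) can be sketched in a few lines of Python. The function name and the convention that each interval includes its left endpoint, with the maximum placed in the last interval, are assumptions of this sketch; with non-integer widths, floating-point rounding can occasionally shift a value that sits exactly on a boundary.

```python
def equal_width_histogram(data, n_bins):
    """Split [min, max] into n_bins equal-width intervals and count the data in each.

    Each interval includes its left endpoint; the last one also includes the maximum.
    Returns (edges, counts), where len(edges) == n_bins + 1.
    """
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("all values are equal; choose the range by hand")
    width = (hi - lo) / n_bins
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for x in data:
        i = min(int((x - lo) / width), n_bins - 1)  # clamp the maximum into the last bin
        counts[i] += 1
    return edges, counts


# Usage with a small made-up data set.
edges, counts = equal_width_histogram([1, 2, 2, 3, 3, 3, 4, 4, 5, 9], 4)
for left, right, count in zip(edges, edges[1:], counts):
    print(f"{left:5.1f} to {right:5.1f}  {'*' * count}")
# counts -> [3, 5, 1, 1]
```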
How to create a histogram
It is an iterative process: try and try again. What bin size should you use?
• Not too many bins with either 0 or 1 counts
• Not so overly summarized that you lose all the information
• Not so detailed that it is no longer a summary
Rule of thumb: start with 5 to 10 bins.
Look at the distribution and refine your bins.
(There isn’t a unique or “perfect” solution.)
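Because choosing the bin count is trial and error, it can help to try every count in the 5-to-10 rule of thumb and see how many bins end up nearly empty (0 or 1 observations), the first warning sign listed above. A Python sketch; `bin_counts` and `sparse_bin_report` are hypothetical helpers, and the data are the GDP growth rates from the class exercise later in this section.

```python
def bin_counts(data, n_bins):
    """Counts in n_bins equal-width bins spanning min(data) to max(data)."""
    lo, hi = min(data), max(data)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in data:
        counts[min(int((x - lo) / width), n_bins - 1)] += 1  # max goes in last bin
    return counts


def sparse_bin_report(data, bin_range=range(5, 11)):
    """For each candidate bin count, report how many bins hold 0 or 1 observations."""
    return {k: sum(1 for c in bin_counts(data, k) if c <= 1) for k in bin_range}


gdp_rates = [7.4, 6.1, 6.1, 4.1, 4.0, 4.0, 3.7, 3.4, 3.4, 3.2, 3.2, 3.0, 2.9, 2.9,
             2.7, 2.6, 2.5, 2.3, 2.3, 2.0, 1.9, 1.9, 1.5, 1.5, 1.2, 0.9, 0.4, 0.0]
for k, sparse in sparse_bin_report(gdp_rates).items():
    print(f"{k:>2} bins: {sparse} with 0 or 1 observations")
```

The report is only a starting point: a bin count with few sparse bins still has to be checked by eye for whether it hides or exaggerates the shape.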
Interpreting histograms
When describing the distribution of a quantitative variable, we look for the overall pattern and for striking deviations from that pattern. We can describe the overall pattern of a histogram by its shape, center, and spread.
Histogram with a line connecting each column: too detailed
Histogram with a smoothed curve highlighting the overall pattern of the distribution
Common distribution patterns (shapes)
Symmetric: the left and right sides are mirror images of each other (or close)
Skewed left: the left side extends farther out than the right side
Skewed right: the right side extends farther out than the left side
Many shapes are bimodal or complex. Examples: two peaks; first part symmetric, flat in the middle, increasing at the end
Outliers
An important kind of deviation is an outlier.
Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them.
The overall pattern is fairly symmetrical except for two states that clearly do not belong to the main pattern. Alaska and Florida have an unusual share of elderly people in their populations.
A large gap in the distribution is typically a sign of an outlier.
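The large-gap heuristic can be sketched by sorting the data, splitting it wherever two neighboring values are farther apart than a chosen threshold, and flagging everything outside the largest group. This is a rough screening aid under an assumed threshold (`min_gap`, a hypothetical parameter picked by looking at the data), not a formal outlier test; flagged values still need an explanation, as with Alaska and Florida.

```python
def gap_outliers(data, min_gap):
    """Return values separated from the largest cluster by a gap wider than min_gap."""
    values = sorted(data)
    if not values:
        return []
    clusters = [[values[0]]]
    for prev, x in zip(values, values[1:]):
        if x - prev > min_gap:
            clusters.append([x])      # gap too wide: start a new cluster
        else:
            clusters[-1].append(x)    # close enough: same cluster
    main = max(clusters, key=len)     # treat the largest cluster as the main body
    return [x for cluster in clusters if cluster is not main for x in cluster]


# Hypothetical measurements with one value far from the rest.
sample = [12, 13, 13, 14, 15, 15, 16, 17, 31]
print(gap_outliers(sample, min_gap=5))  # -> [31]
```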
IMPORTANT NOTE: Your data are the way they are. Do not try to force them into a particular shape.
Example: US Female Population 1997
It is a common misconception that if you have a large enough data set, the data will eventually turn out nice and symmetrical.
Example: Dry Days per Month 1995
Histogram of dry days in 1995
Example: Customer Service Center Call Lengths
Why were there so many calls lasting 10 seconds or less?
The inappropriate actions by customer service reps were hidden in this histogram, in which the software chose the classes (bin intervals).
Example: Constructing a Histogram
Class Exercise: GDP by Country
2005 Gross Domestic Product (GDP) Growth Rates for 30 Industrialized Countries
Country, Growth Rate (%)
Turkey 7.4
Czech Republic 6.1
Slovakia 6.1
Hungary 4.1
South Korea 4.0
Luxembourg 4.0
Greece 3.7
Poland 3.4
Spain 3.4
Denmark 3.2
United States 3.2
Mexico 3.0
Canada 2.9
Finland 2.9
Sweden 2.7
Japan 2.6
Australia 2.5
New Zealand 2.3
Norway 2.3
Austria 2.0
Switzerland 1.9
United Kingdom 1.9
Belgium 1.5
Netherlands 1.5
France 1.2
Germany 0.9
Portugal 0.4
Italy 0.0
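One way to work the class exercise is to use classes one percentage point wide (0 to under 1%, 1 to under 2%, and so on) and count the countries in each. The class width is a choice made for this sketch; only 28 countries appear in the list above, so the counts total 28.

```python
import math

# 2005 GDP growth rates (%) exactly as listed in the class exercise.
gdp_growth = {
    "Turkey": 7.4, "Czech Republic": 6.1, "Slovakia": 6.1, "Hungary": 4.1,
    "South Korea": 4.0, "Luxembourg": 4.0, "Greece": 3.7, "Poland": 3.4,
    "Spain": 3.4, "Denmark": 3.2, "United States": 3.2, "Mexico": 3.0,
    "Canada": 2.9, "Finland": 2.9, "Sweden": 2.7, "Japan": 2.6,
    "Australia": 2.5, "New Zealand": 2.3, "Norway": 2.3, "Austria": 2.0,
    "Switzerland": 1.9, "United Kingdom": 1.9, "Belgium": 1.5, "Netherlands": 1.5,
    "France": 1.2, "Germany": 0.9, "Portugal": 0.4, "Italy": 0.0,
}


def class_counts(rates, width=1.0):
    """Count rates in classes [k*width, (k+1)*width), keeping empty classes.
    Assumes every rate is >= 0."""
    top = math.floor(max(rates) / width)
    counts = [0] * (top + 1)
    for r in rates:
        counts[math.floor(r / width)] += 1
    return counts


counts = class_counts(list(gdp_growth.values()))
for k, c in enumerate(counts):
    print(f"{k} to under {k + 1}%: {'*' * c}")
```

The counts come out as 3, 5, 8, 6, 3, 0, 2, 1: most countries grew between 1% and 4%, the 5-to-6% class is empty, and the Czech Republic, Slovakia, and Turkey sit in a separated right tail.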
Example: T-bill interest rates
What is this type of plot called?
What is a Time Series?
Time series -- observations collected over time
Time plot -- plot of the data over time
Identifying Trends in the Data
Trend: gradual increases or decreases over time
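One common way to put a number on a trend (not covered on these slides) is the slope of a least-squares straight line through the time plot: positive for a gradual increase, negative for a gradual decrease. A Python sketch; the sales figures in the usage lines are hypothetical.

```python
def trend_slope(times, values):
    """Least-squares slope of values against times (value units per time unit)."""
    n = len(times)
    if n != len(values) or n < 2:
        raise ValueError("need at least two (time, value) pairs of equal length")
    mean_t = sum(times) / n
    mean_y = sum(values) / n
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, values))
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all times are equal; slope is undefined")
    return sxy / sxx


# Hypothetical annual sales (in millions), for illustration only.
years = [2000, 2001, 2002, 2003, 2004]
sales = [41, 44, 43, 48, 51]
print(f"Trend: {trend_slope(years, sales):+.2f} million per year")
```

The slope summarizes only the gradual, straight-line part of the movement; seasonality and cycles still have to be read from the time plot itself.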
[Figure: Annual Sales – XYZ Company, in millions, by year, 1990 to 2004]
Other Common Components of Time Series
[Figures: Seasonality (values by quarter) and Cycles (values by year, '80 to '99)]
Line Graphs: Time Plots
Retail Price of Fresh Oranges over Time
This time plot shows a regular pattern of yearly variations. These are seasonal variations in fresh orange pricing most likely due to similar seasonal variations in the production of fresh oranges. There is also an overall upward trend in pricing over time. It could simply be reflecting inflation trends or a more fundamental change in this industry.
Time goes on the horizontal (x) axis. The variable of interest, here “retail price of fresh oranges”, goes on the vertical (y) axis.
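Averaging over one full season at a time is a common way (not described on the slide) to separate the two patterns just described: the seasonal ups and downs cancel inside each window, leaving the trend. For monthly prices such as the orange data the window would be 12 months; the sketch below uses hypothetical quarterly numbers and a 4-quarter window.

```python
def moving_average(values, window):
    """Trailing moving average: the mean of each run of `window` consecutive values."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


# Hypothetical quarterly series: a repeating seasonal swing plus a slow rise.
quarterly = [10, 14, 12, 8, 11, 15, 13, 9]
print(moving_average(quarterly, window=4))  # -> [11.0, 11.25, 11.5, 11.75, 12.0]
```

The raw values swing between 8 and 15 every year, but the 4-quarter averages rise steadily, which is the trend hidden underneath the seasonality.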