Discrete or Continuous
Types of data
[Diagram: Types of data divide into Categorical and Quantitative (numerical); quantitative data divide into Discrete and Continuous]
Two Types of Variables
A Numerical Variable describes quantities of the objects of interest. Data values are numbers.
• Weight of an infant
• Number of sexual partners
• Time to run the mile
A Categorical Variable describes qualities of the objects of interest. Data values are usually words.
• Skin color
• Birth city
• Last name
Example: Numerical or Categorical?
• Numerical
– Age
– Units
– GPA
• Categorical
– Gender
– Major
– Housing
Age Gender Major Units Housing GPA
18 Male Psychology 16 Dorm 3.6
21 Male Nursing 15 Parents 3.1
20 Female Business 16 Apartment 2.8
Numerical or Categorical?
Why are you in college? Answer:
1. Personal Growth
2. Career Opportunities
3. Parental Pressure
4. Personal Networking
Results from 12 participants: 1, 4, 3, 2, 2, 1, 2, 3, 3, 1, 4, 2
• Coding Categorical Data with Numbers: Although the data values above are numbers, the variable is still categorical.
• Reason for Coding: Numeric codes are easier to enter into a computer.
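A short Python sketch makes the coding point concrete. It tallies the 12 coded answers above by category; the names `LABELS`, `responses`, and `tally_codes` are illustrative, not part of the original activity.

```python
from collections import Counter

# Code-to-label mapping from the "Why are you in college?" question.
LABELS = {
    1: "Personal Growth",
    2: "Career Opportunities",
    3: "Parental Pressure",
    4: "Personal Networking",
}

# Coded answers from the 12 participants.
responses = [1, 4, 3, 2, 2, 1, 2, 3, 3, 1, 4, 2]


def tally_codes(codes, labels):
    """Count how often each code occurs and report the count under its label.

    The codes only name categories, so counting them is meaningful;
    arithmetic such as averaging them is not.
    """
    counts = Counter(codes)
    return {labels[code]: counts[code] for code in sorted(labels)}


if __name__ == "__main__":
    for label, count in tally_codes(responses, LABELS).items():
        print(f"{label}: {count}")
```

The tally gives 3, 4, 3, and 2 responses for codes 1 to 4. By contrast, the mean of the codes is 28/12, about 2.33, which does not correspond to any reason for being in college.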
Scales of Measurement
Nominal
• Characteristics: label and categorize; no quantitative distinctions
• Examples: Gender, Diagnosis, Experimental or Control
Ordinal
• Characteristics: categorizes observations; categories organized by size or magnitude
• Examples: Rank in class, Clothing sizes (S, M, L, XL), Olympic medals
Interval
• Characteristics: ordered categories; intervals between categories of equal size; arbitrary or absent zero point
• Examples: Temperature (°F or °C)
Ratio
• Characteristics: ordered categories; equal intervals between categories; absolute zero point
• Examples: Number of correct answers, Time to complete task, Gain in height since last year
What kinds of data are typically collected?
• Nominal Data
– no ordering, e.g., it makes no sense to state that F > M
– arbitrary labels, e.g., m/f, 0/1, etc.
• Ordinal Data
– ordered, but the differences between values are not necessarily equal or meaningful
– e.g., Likert scales, such as rating your degree of satisfaction on a scale of 1 to 5
• Interval Data
– ordered, constant scale, but no natural zero
– differences make sense, but ratios do not (e.g., 30° - 20° = 20° - 10°, but 20° is not twice as hot as 10°!)
• Ratio Data
– ordered, constant scale, natural zero
– e.g., height, weight, age, length
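The interval-versus-ratio distinction can be checked numerically. This sketch assumes the temperatures above are in degrees Celsius (the original does not give a unit) and converts them to kelvin, a ratio scale whose zero is absolute zero; the helper `celsius_to_kelvin` is illustrative.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to kelvin (ratio scale, true zero at 0 K)."""
    return c + 273.15


temps_c = (30.0, 20.0, 10.0)
temps_k = tuple(celsius_to_kelvin(t) for t in temps_c)

# Differences survive the conversion, so they are meaningful on both scales.
diff_c = (temps_c[0] - temps_c[1], temps_c[1] - temps_c[2])
diff_k = (temps_k[0] - temps_k[1], temps_k[1] - temps_k[2])

# Ratios do not survive it: 20/10 in Celsius is not the same ratio in kelvin.
ratio_c = temps_c[1] / temps_c[2]
ratio_k = temps_k[1] / temps_k[2]

if __name__ == "__main__":
    print("differences (C):", diff_c)
    print("differences (K):", diff_k)
    print(f"20/10 ratio in C: {ratio_c:.3f}, in K: {ratio_k:.3f}")
```

Both scales agree that the two differences are equal 10-degree steps, but the ratio 20/10 = 2 in Celsius becomes about 1.035 in kelvin, so "twice as hot" has no meaning on the interval scale.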
[Diagram relating the Nominal, Ordinal, Interval, and Ratio scales to Categorical and Continuous data]
Example 2.3 Frequency, Proportion and Percent
X f p = f/N percent = p(100)
5 1 1/10 = .10 10%
4 2 2/10 = .20 20%
3 3 3/10 = .30 30%
2 3 3/10 = .30 30%
1 1 1/10 = .10 10%
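The columns of Example 2.3 can be generated from raw scores. This is a minimal Python sketch; the score list is rebuilt from the frequencies in the table (one 5, two 4s, three 3s, three 2s, one 1), and `frequency_table` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction


def frequency_table(scores):
    """Return rows (X, f, p, percent) with p = f/N and percent = p(100), highest X first."""
    n = len(scores)
    counts = Counter(scores)
    rows = []
    for x in sorted(counts, reverse=True):
        f = counts[x]
        p = Fraction(f, n)  # exact proportion, e.g. 3/10
        rows.append((x, f, p, float(p * 100)))
    return rows


# Scores rebuilt from the frequencies in Example 2.3 (N = 10).
scores = [5, 4, 4, 3, 3, 3, 2, 2, 2, 1]

if __name__ == "__main__":
    print("X  f  p     percent")
    for x, f, p, pct in frequency_table(scores):
        print(f"{x}  {f}  {float(p):.2f}  {pct:.0f}%")
```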
Displaying Distributions: Qualitative Variables
• Pie Charts
• Bar Graphs
[Figure: Pie chart for the taste test, with slices for Coca-Cola, Pepsi, Dr Pepper, Seven Up, and Others]
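In a pie chart, each category's slice is its proportion of the whole circle, so its angle is (f/N) × 360°. As an illustration, this Python sketch applies that rule to the 12 coded responses to "Why are you in college?" given earlier (3 Personal Growth, 4 Career Opportunities, 3 Parental Pressure, 2 Personal Networking); the function name `pie_slice_angles` is hypothetical.

```python
def pie_slice_angles(counts):
    """Map each category to its slice angle in degrees: (f / N) * 360."""
    n = sum(counts.values())
    return {category: f * 360 / n for category, f in counts.items()}


# Category counts for the "Why are you in college?" survey (12 participants).
survey_counts = {
    "Personal Growth": 3,
    "Career Opportunities": 4,
    "Parental Pressure": 3,
    "Personal Networking": 2,
}

if __name__ == "__main__":
    for category, angle in pie_slice_angles(survey_counts).items():
        print(f"{category}: {angle:.0f} degrees")
```

The slices come to 90°, 120°, 90°, and 60°, which add up to the full 360°.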
Graphs for Nominal or Ordinal Data
• For non-numerical scores (nominal and ordinal data), use a bar graph, leaving a space between adjacent bars because the categories
– have no particular order (nominal), or
– do not have a measurable width (ordinal)
Bar graph
[Figure: Bar chart for the AIDS data, with bars for 1 Atlanta, 2 Austin, 3 Dallas, 4 Houston, 5 New York, NY, 6 San Francisco, 7 Washington, D.C., 8 West Palm Beach]
Figure 2.7 Bar Graph of Relative Frequencies
A Misleading Bar Graph
Problem
The bar graph that follows presents the total sales figures for three realtors. When the bars are replaced with pictures, often related to the topic of the graph, the graph is called a pictogram.
[Figure: Bar graph of total sales for Realtor #1, Realtor #2, and Realtor #3; the bars are labeled $2.05 million, $1.41 million, and $0.9 million]
(a) How does the height of the home for Realtor 1 compare to that for Realtor 3?
(b) How does the area of the home for Realtor 1 compare to that for Realtor 3?
[Figure: Pictogram of the same sales data, with a house picture for each of Realtors 1, 2, and 3]
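Question (b) turns on the fact that a picture grows in two dimensions. The Python sketch below compares the largest and smallest sales figures shown ($2.05 million and $0.9 million), assuming each house picture keeps its shape, so its width is scaled by the same factor as its height; `pictogram_ratios` is a hypothetical helper.

```python
def pictogram_ratios(value_a, value_b):
    """Compare two pictogram icons whose heights are proportional to the values.

    Assumes each icon keeps its shape, so its width grows by the same factor
    as its height.
    """
    height_ratio = value_a / value_b
    area_ratio = height_ratio ** 2  # both dimensions scale, so area scales by the square
    return height_ratio, area_ratio


if __name__ == "__main__":
    h, a = pictogram_ratios(2.05, 0.9)  # total sales in millions of dollars
    print(f"height ratio: {h:.2f}, area ratio: {a:.2f}")
```

With these figures, the taller house is about 2.28 times as high but covers about 5.19 times the area, so the picture visually exaggerates the difference in sales.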
Displaying Distributions: Quantitative Variables
• Histograms
• Polygons
• Frequency plots
• Stem and Leaf Plots
• Time plots
• Scatterplots
Histogram
Histogram of Age
CLASS TALLY #OBSERVATIONS PROPORTION PERCENTAGE
[30,35) / 1 1/20 = 0.05 5%
[35,40) // 2 2/20 = 0.10 10%
[40,45) //////// 8 8/20 = 0.40 40%
[45,50) /////// 7 7/20 = 0.35 35%
[50,55) // 2 2/20 = 0.10 10%
31, 36, 36, 40, 41, 41, 41, 44, 44, 44, 44, 45, 45, 45, 46, 47, 48, 49, 51, 51
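[Figure: Histogram of Age in two versions, one with Count (0 to 8) and one with Percent (10% to 40%) on the vertical axis; the horizontal axis is marked 30, 35, 40, 45, 50, 55]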
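The class counts in the age table can be reproduced from the 20 raw ages with a few lines of Python. This is a sketch; the helper `class_frequencies` is hypothetical, and it uses half-open classes [low, high) to match the table's notation, so an age of 45 falls in [45,50).

```python
ages = [31, 36, 36, 40, 41, 41, 41, 44, 44, 44, 44, 45, 45, 45, 46, 47, 48, 49, 51, 51]


def class_frequencies(data, start, width, n_classes):
    """Count observations in half-open classes [start + k*width, start + (k+1)*width)."""
    n = len(data)
    rows = []
    for k in range(n_classes):
        low = start + k * width
        high = low + width
        count = sum(1 for x in data if low <= x < high)
        rows.append((low, high, count, count / n))
    return rows


if __name__ == "__main__":
    for low, high, count, prop in class_frequencies(ages, 30, 5, 5):
        print(f"[{low},{high})  {'/' * count:<8}  {count}  {count}/{len(ages)} = {prop:.2f}  {prop:.0%}")
```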
Figure 2.3 Frequency Distribution Block Histogram
Histogram versus Bar Graph
[Figure: Graphs I to IV, two plotting count against length (1 to 6) and two plotting count against color (green, blue, red, white, yellow)]
Misleading Histograms
Figure 2.4 Frequency Distribution Polygon
Figure 2.5 Grouped Data Frequency Distribution Polygon
Describe The Distribution
Describe what your eyes see:
1) with words, and
2) with numbers
Describing with WORDS
Three Aspects of a Distribution
• Shape
– Symmetry
– How many bumps or modes?
– Other distinguishing features
• Center
– What is a typical value?
– Where is the bulk of the data?
• Spread
– Are the data all close together or spread out?
Copyright © 2013 Pearson Education, Inc. All rights reserved.
Distribution Shapes
SHAPE ~ Symmetric Distributions
• A distribution is symmetric if the left-hand side is roughly the mirror image of the right-hand side.
Symmetric Distributions
2. Is the histogram symmetric?
– If you can fold the histogram along a vertical line through the middle and have the edges match pretty closely, the histogram is symmetric.
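The fold test can be approximated in code by comparing each bar with its mirror image. This Python sketch is one possible operationalization, not a standard procedure: the 20% tolerance standing in for "match pretty closely" is an assumption, and the function names `fold_mismatch` and `looks_symmetric` are hypothetical.

```python
def fold_mismatch(counts):
    """Fold a histogram's bar heights about the middle and compare mirrored bars.

    Returns the largest difference between a bar and its mirror image, as a
    fraction of the tallest bar (0.0 means a perfect mirror image).
    """
    tallest = max(counts)
    if tallest == 0:
        return 0.0
    n = len(counts)
    # Pair bar i with bar n-1-i; the middle bar of an odd-length list sits on the fold.
    diffs = [abs(counts[i] - counts[n - 1 - i]) for i in range(n // 2)]
    return max(diffs, default=0) / tallest


def looks_symmetric(counts, tolerance=0.2):
    """Assumed cutoff: mismatches up to 20% of the tallest bar count as 'pretty close'."""
    return fold_mismatch(counts) <= tolerance


if __name__ == "__main__":
    print(looks_symmetric([1, 3, 6, 3, 1]))  # mirror-image bars
    print(looks_symmetric([1, 2, 8, 7, 2]))  # class counts from the age histogram
```

For mirror-image counts such as [1, 3, 6, 3, 1] the check returns True; for the age-histogram class counts [1, 2, 8, 7, 2], the second bar and its mirror differ by 5 against a tallest bar of 8, so it returns False.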
SHAPE ~ Normal Distributions
• A Normal distribution has the following properties:
– Symmetric
– Unimodal
– Mound or Bell Shaped
SHAPE ~ Skewness
• A distribution is Skewed Right if most of the data values are small and there is a “tail” of larger values to the right.
• A distribution is Skewed Left if most of the data values are large and there is a “tail” of smaller values to the left.
Skewed
– The (usually) thinner ends of a distribution are called the tails. If one tail stretches out farther than the other, the histogram is said to be skewed to the side of the longer tail.
– In the figure below, the histogram on the left is said to be skewed left, while the histogram on the right is said to be skewed right.
SHAPE ~ How Many Mounds
• A Unimodal distribution has one mound.
• A Bimodal distribution has two mounds.
• A Multimodal distribution has more than two mounds.
Peaks: Modes
1. Does the histogram have a single, central peak or several separated peaks?
– Peaks in a histogram are called modes.
– A histogram with one main peak is called unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
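Counting peaks can likewise be sketched in Python. This version (function names `count_peaks` and `modality` are hypothetical) treats any bar taller than its neighbors as a peak, after merging equal adjacent bars so that a flat top counts once.

```python
def count_peaks(counts):
    """Count separated peaks (modes) in a list of histogram bar heights."""
    # Merge runs of equal neighboring heights so a flat-topped peak counts once.
    runs = [c for i, c in enumerate(counts) if i == 0 or c != counts[i - 1]]
    # Treat the space beyond each end of the histogram as a bar of height 0.
    padded = [0] + runs + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i] > padded[i - 1] and padded[i] > padded[i + 1]
    )


def modality(counts):
    """Name the shape by its number of peaks (one, two, or three or more)."""
    peaks = count_peaks(counts)
    if peaks == 1:
        return "unimodal"
    if peaks == 2:
        return "bimodal"
    if peaks >= 3:
        return "multimodal"
    return "no peak"


if __name__ == "__main__":
    print(modality([1, 2, 8, 7, 2]))  # class counts from the age histogram
```

Applied to the age-histogram class counts [1, 2, 8, 7, 2], it finds one peak (unimodal). Because every small bump counts, noisy histograms still call for judgment about which peaks are main peaks.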
Center
• For now, we look at the most common value in each distribution. We will develop more precise ways to describe the center of a distribution in the next section.
• What is the center of this distribution?
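Finding the most common value is a one-liner with Python's standard `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count. The sketch below applies it to the 20 ages listed with the age histogram earlier in this section.

```python
from statistics import multimode

ages = [31, 36, 36, 40, 41, 41, 41, 44, 44, 44, 44, 45, 45, 45, 46, 47, 48, 49, 51, 51]

# multimode returns all values tied for the highest count, so a tie yields more than one value.
most_common = multimode(ages)

if __name__ == "__main__":
    print("most common value(s):", most_common)
```

Here 44 occurs four times, more than any other age. If two values tied, both would be returned, which is one sign that a single "most common value" may not describe the center well.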
Center: What Is a Typical Value?
• What is a typical value?
• The center is not necessarily a typical value when a distribution is bimodal or skewed.
SPREAD ~ Range
• The range of the data is the difference between the maximum and minimum values.
Spread: Range
• Always report a measure of spread along with a measure of center when describing a distribution numerically.
• The range of the data is the difference between the maximum and minimum values:
Range = max - min
• A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.
• For example, if my test scores were 10, 87, 94, 88, 85, 82, 85, 92, my range would be 94 - 10 = 84. This is a large spread, but most of my scores are in the 80s. We will soon discuss different measures of spread.
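The effect of a single extreme value on the range is easy to check with the test scores above. A minimal Python sketch (the helper name `data_range` is illustrative):

```python
def data_range(values):
    """Range = max - min."""
    return max(values) - min(values)


scores = [10, 87, 94, 88, 85, 82, 85, 92]

if __name__ == "__main__":
    print("range with the 10:", data_range(scores))
    print("range without the 10:", data_range([s for s in scores if s != 10]))
```

Dropping the single score of 10 shrinks the range from 84 to 12.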
Quiz Scores
• Please see replacement activity
• Which class (A or B) has more variability?
2014 Summer Training Institute College of the Canyons
Hypothetical Quiz Scores
• Please see replacement activity
• Which class has the least variability? Which has the most?
Outliers
• An Outlier is a data value that is either much smaller or much larger than the rest of the data.
• Some reasons for outliers
– Error in data collection
– No error. For example, the owner’s salary could be an outlier if the rest of the employees are all low-wage workers.
Anything Unusual? (cont.)
• The following histogram has possible outliers to the left.
Describing a Distribution with Words
Using Stats Language in Context
• What is the shape?
– Is it Symmetric, Skewed, or Neither?
– Unimodal, Bimodal, or Multimodal?
– Normal?
– Are there outliers?
• Where is the center? Is the center a typical value?
• Is there low or high variability?
Describe The Distributions
• It is always more interesting to compare groups. Below are daily wind speeds at a National Park.
Describe The Distribution
• The dotplots below show drive times for 3 different routes.
• Describe these dotplots.
• What route would you take and why?
Shape, Center, and Spread Activity