Chapter 1 & 3: The Role of Statistics & Graphical Methods for Describing Data
Statistics: the science of collecting, analyzing, and drawing conclusions from data
Suppose we wanted to know the average GPA of high school graduates in the nation this year.
We could collect data from all high schools in the nation.
What term would be used to describe “all high school graduates”?
Population: the entire collection of individuals or objects about which information is desired
What do you call it when you collect data about the entire population?
Census: performed to gather information about the entire population
Why might we not want to use a census here?
If we didn’t perform a census, what would we do?
Sample: a subset of the population, selected for study in some prescribed manner
What would a sample of all high school graduates across the nation look like?
A list of GPAs from high school graduates randomly selected from each state.
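To make the idea of a sample concrete, here is a minimal sketch in Python. It assumes a made-up population of simulated GPAs (not real data) and draws one simple random sample from the whole list, which is a simpler scheme than sampling within each state.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical population: 50,000 simulated GPAs between 0.0 and 4.0.
population = [round(random.uniform(0.0, 4.0), 2) for _ in range(50_000)]

# Simple random sample of 500 graduates, drawn without replacement.
sample = random.sample(population, k=500)

population_mean = sum(population) / len(population)
sample_mean = sum(sample) / len(sample)

print(f"population size: {len(population)}, sample size: {len(sample)}")
print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

The sample mean should land close to the population mean even though only 1% of the values were used.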
We could collect data from a sample of high schools in the nation.
Once we have collected the data, what would we do with it?
Descriptive statistics: the methods of organizing & summarizing data
If the sample of high school GPAs contained 10,000 numbers, how could the data be described or summarized?
• Create a graph
• State the range of GPAs
• Calculate the average GPA
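As a small illustration of these summaries, the sketch below computes the range and the average for a short, hypothetical list of GPAs (the values are invented for the example).

```python
import statistics

# Hypothetical sample of GPAs (illustrative values only).
gpas = [2.1, 3.4, 3.0, 3.8, 2.7, 3.2, 2.9, 3.6, 3.1, 2.2]

gpa_range = max(gpas) - min(gpas)   # range = largest value - smallest value
gpa_mean = statistics.mean(gpas)    # average GPA

print(f"range: {gpa_range:.1f}")
print(f"mean:  {gpa_mean:.2f}")
```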
We could collect data from a sample of high schools in the nation.
Could we use the data from this sample to answer our question?
Inferential statistics: involves making generalizations from a sample to a population
Based on the sample, if the average GPA for high school graduates was 3.0, what generalization could be made?
The average national GPA for this year’s high school graduates is approximately 3.0.
Could someone claim that the average GPA for PISD graduates is 3.0?
No. Generalizations based on the results of a sample can only be made back to the population from which the sample came.
Be sure to sample from the population of interest!!
Variable: any characteristic whose value may change from one individual to another
Is this a variable . . . the number of wrecks per week at the intersection outside?
Data: observations on a single variable or simultaneously on two or more variables
For this variable . . . the number of wrecks per week at the intersection outside . . . what could observations be?
Types of variables
Categorical variables (or qualitative)
• identify basic differentiating characteristics of the population
Numerical variables (or quantitative)
• observations or measurements that take on numerical values
• it makes sense to average these values
• two types: discrete & continuous
Discrete (numerical)
• listable set of values
• usually counts of items
Continuous (numerical)
• data can take on any values in the domain of the variable
• usually measurements of something
Classifying variables by the number of variables in a data set
Suppose that the PE coach records the height of each student in his class.
Univariate - data that describes a single characteristic of the population
This is an example of univariate data.
Suppose that the PE coach records the height and weight of each student in his class.
Bivariate - data that describes two characteristics of the population
This is an example of bivariate data.
Suppose that the PE coach records the height, weight, number of sit-ups, and number of push-ups for each student in his class.
Multivariate - data that describes more than two characteristics (beyond the scope of this course)
This is an example of multivariate data.
Identify the following variables:
1. the appraised value of homes in Niceville: discrete numerical
2. the color of cars in the teacher’s lot: categorical
3. the number of calculators owned by students at your school: discrete numerical
4. the zip code of an individual: categorical
5. the amount of time it takes students to drive to school: continuous numerical
Is money a measurement or a count?
Graphs for categorical data
Bar Graph
• Used for categorical data
• Bars do not touch
• Categorical variable is typically on the horizontal axis
• Frequency or relative frequency is on the vertical axis
• To describe: comment on which occurred the most often or least often
• May make a double bar graph or segmented bar graph for bivariate categorical data sets
Relative frequency = frequency / total
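The relative frequency formula can be sketched in a few lines of Python. The survey responses below are hypothetical, standing in for whatever the class survey actually produces.

```python
from collections import Counter

# Hypothetical class survey of favorite ice cream flavors.
responses = ["vanilla", "chocolate", "chocolate", "strawberry",
             "vanilla", "chocolate", "mint", "chocolate"]

frequency = Counter(responses)      # count of each category
total = sum(frequency.values())

# Relative frequency = frequency / total
relative_frequency = {flavor: count / total for flavor, count in frequency.items()}

for flavor, count in frequency.most_common():
    print(f"{flavor:<10} {count}  {relative_frequency[flavor]:.3f}")
```

Either column (frequency or relative frequency) could serve as the bar heights on the vertical axis; the relative frequencies always add up to 1.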
Pie (Circle) graph
• Used for categorical data
• To make:
– Proportion × 360°
– Using a protractor, mark off each part
• To describe: comment on which occurred the most often or least often
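The slice angles for a pie graph come from multiplying each category's proportion by 360°. A minimal sketch, assuming hypothetical frequencies for four categories:

```python
# Hypothetical frequencies for a categorical variable.
frequency = {"chocolate": 4, "vanilla": 2, "strawberry": 1, "mint": 1}
total = sum(frequency.values())

# Central angle of each slice = proportion x 360 degrees.
angles = {category: count / total * 360 for category, count in frequency.items()}

for category, angle in angles.items():
    print(f"{category:<10} {angle:.0f} degrees")
```

The angles should always sum to 360°, which is a quick check before reaching for the protractor.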
Using class survey data:
graph favorite ice cream
graph birth month
Graphs for numerical data
Dotplot
Used with numerical data (either discrete or continuous)
Made by putting dots (or X’s) on a number line
Can make comparative dotplots by using the same axis for multiple groups
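A dotplot can be sketched as plain text by stacking X's over a number line. The data below are hypothetical, and the layout assumes single-digit whole-number values so that the columns line up.

```python
from collections import Counter

# Hypothetical data: number of siblings reported by 12 students.
values = [0, 1, 1, 2, 2, 2, 3, 1, 0, 4, 2, 1]

counts = Counter(values)
tallest = max(counts.values())
axis = range(min(values), max(values) + 1)

# Build the plot from the top row down; each column is one value on the number line.
rows = []
for level in range(tallest, 0, -1):
    rows.append(" ".join("X" if counts[v] >= level else " " for v in axis))
rows.append(" ".join(str(v) for v in axis))  # the number line itself

print("\n".join(rows))
```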
Distribution Activity . . .
Types (shapes) of Distributions
Symmetrical
• refers to data in which both sides are (more or less) the same when the graph is folded vertically down the middle
• bell-shaped is a special type: it has a center mound with two sloping tails
Uniform
• refers to data in which every class has equal or approximately equal frequency
Skewed (left or right)
• refers to data in which one side (tail) is longer than the other side
• the direction of skewness is on the side of the longer tail
Bimodal (multi-modal)
• refers to data in which two (or more) classes have the largest frequency & are separated by at least one other class
How to describe a numerical, univariate graph
What strikes you as the most distinctive difference among the distributions of exam scores in classes A, B, & C?
1. Center
• discuss where the middle of the data falls
• three measures of central tendency: mean, median, & mode
What strikes you as the most distinctive difference among the distributions of scores in classes D, E, & F?
2. Spread
• discuss how spread out the data is
• refers to the variability of the data
• range, standard deviation, IQR
What strikes you as the most distinctive difference among the distributions of exam scores in classes G, H, & I?
3. Shape
• refers to the overall shape of the distribution
• symmetrical, uniform, skewed, or bimodal
What strikes you as the most distinctive feature of the distribution of exam scores in class K?
4. Unusual occurrences
• outliers: values that lie away from the rest of the data
• gaps
• clusters
• anything else unusual
5. In context
• You must write your answer in reference to the specifics in the problem, using correct statistical vocabulary and complete sentences!
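The center and spread measures named above can be computed with Python's standard `statistics` module. This is only a sketch: the exam scores are hypothetical, and the standard deviation shown is the sample version.

```python
import statistics

# Hypothetical exam scores for one class.
scores = [55, 68, 72, 72, 75, 78, 80, 84, 91, 95]

center = {
    "mean": statistics.mean(scores),
    "median": statistics.median(scores),
    "mode": statistics.mode(scores),
}
spread = {
    "range": max(scores) - min(scores),
    "sample standard deviation": statistics.stdev(scores),
}

# Report the results in context, in a complete sentence.
print(f"The exam scores are centered near {center['median']} points (median), "
      f"with a mean of {center['mean']} and a range of {spread['range']} points.")
print(f"The sample standard deviation is {spread['sample standard deviation']:.1f} points.")
```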
More graphs for numerical data
Stemplots (stem & leaf plots)
• Used with univariate, numerical data
• Must have a key so that we know how to read the numbers
• Can split stems when you have a long list of leaves
• Can have a comparative stemplot with two groups
Would a stemplot be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
Would a stemplot be a good graph for the number of pairs of shoes owned by AP Stat students? Why or why not?
Example:
The following data are price per ounce for various brands of dandruff shampoo at a local grocery store.
0.32 0.21 0.29 0.54 0.17 0.28 0.36 0.23
Can you make a stemplot with this data?
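One possible stemplot for the shampoo prices is sketched below. It assumes the tenths digit is the stem and the hundredths digit is the leaf; other choices of stem are possible.

```python
from collections import defaultdict

prices = [0.32, 0.21, 0.29, 0.54, 0.17, 0.28, 0.36, 0.23]

# Work in whole cents to avoid floating-point surprises.
cents = sorted(round(p * 100) for p in prices)

leaves = defaultdict(list)
for c in cents:
    stem, leaf = divmod(c, 10)   # stem = tenths digit, leaf = hundredths digit
    leaves[stem].append(leaf)

lines = []
for stem in range(min(leaves), max(leaves) + 1):   # keep empty stems so gaps show
    lines.append(f"{stem} | {''.join(str(leaf) for leaf in leaves[stem])}")
lines.append("Key: 1 | 7 = $0.17 per ounce")

print("\n".join(lines))
```

The empty stem for 4 is kept on purpose: it makes the gap between $0.36 and $0.54 visible.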
Example: Tobacco use in G-rated Movies
Total tobacco exposure time (in seconds) for Disney movies:
223 176 548 37 158 51 299 37 11 165 74 9 2 6 23 206 9
Total tobacco exposure time (in seconds) for other studios’ movies:
205 162 6 1 117 5 91 155 24 55 17
Make a comparative stemplot.
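A comparative (back-to-back) stemplot for these two data sets might be sketched as follows. The choice of hundreds digit as the stem and tens digit as the leaf, with the units digit dropped, is an assumption made for this example.

```python
from collections import defaultdict

disney = [223, 176, 548, 37, 158, 51, 299, 37, 11, 165, 74, 9, 2, 6, 23, 206, 9]
others = [205, 162, 6, 1, 117, 5, 91, 155, 24, 55, 17]

def stem_leaves(data):
    """Map hundreds digit (stem) to sorted tens digits (leaves); units are dropped."""
    grouped = defaultdict(list)
    for value in sorted(data):
        grouped[value // 100].append(value % 100 // 10)
    return grouped

left, right = stem_leaves(disney), stem_leaves(others)
top_stem = max(max(left), max(right))
width = max(len(v) for v in left.values())

lines = []
for stem in range(top_stem + 1):
    # Left-hand leaves are reversed so they increase outward from the stem.
    left_leaves = "".join(str(d) for d in reversed(left[stem]))
    right_leaves = "".join(str(d) for d in right[stem])
    lines.append(f"{left_leaves:>{width}} | {stem} | {right_leaves}".rstrip())

print(f"{'Disney':>{width}} |   | Other studios")
print("\n".join(lines))
print("Key: 2 | 0 represents 200 to 209 seconds")
```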
Histograms
• Used with numerical data
• Bars touch on histograms
• Two types
– Discrete: bars are centered over discrete values
– Continuous: bars cover a class (interval) of values
• For comparative histograms: use two separate graphs with the same scale on the horizontal axis
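Behind a histogram is a frequency count for each class. The sketch below tallies the Disney tobacco exposure times from the earlier example into classes; the class width of 100 seconds is an assumption chosen for illustration.

```python
disney = [223, 176, 548, 37, 158, 51, 299, 37, 11, 165, 74, 9, 2, 6, 23, 206, 9]

class_width = 100  # assumed class width, chosen for illustration
n_classes = max(disney) // class_width + 1

# counts[i] is the frequency of the class from i*100 up to (but not including) (i+1)*100.
counts = [0] * n_classes
for value in disney:
    counts[value // class_width] += 1

for i, count in enumerate(counts):
    low = i * class_width
    print(f"{low:>3}-{low + class_width - 1:<3} | {'#' * count} ({count})")
```

Classes with a count of zero are still listed, so gaps in the distribution remain visible.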
Would a histogram be a good graph for the fastest speed driven by AP Stat students? Why or why not?
Would a histogram be a good graph for the number of pieces of gum chewed per day by AP Stat students? Why or why not?
The two histograms below display the distribution of heights of gymnasts and the distribution of heights of female basketball players. Which is which? Why?
Heights – Figure A
Heights – Figure B
Suppose you found a pair of size 6 shoes left outside the locker room. Which team would you go to first to find the owner of the shoes? Why?
Suppose a tall woman (5 ft 11 in) tells you she is looking for her sister who is practicing in the gym. To which team would you send her? Why?
Cumulative Relative Frequency Plot (Ogive)
• . . . is used to answer questions about percentiles.
• A percentile gives the percent of individuals that are at or below a certain value.
• Quartiles are located every 25% of the data. The first quartile (Q1) is the 25th percentile, while the third quartile (Q3) is the 75th percentile. What is the special name for Q2?
• Interquartile Range (IQR) is the range of the middle half (50%) of the data.
IQR = Q3 - Q1
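As a rough sketch, the quartiles and IQR can be found by splitting the sorted data at its middle value and taking the middle of each half. This is one common classroom convention and is assumed here; software packages often interpolate and may give slightly different quartiles. The data are the other studios' tobacco exposure times from the earlier example.

```python
import statistics

others = [205, 162, 6, 1, 117, 5, 91, 155, 24, 55, 17]
data = sorted(others)
n = len(data)
half = n // 2

q2 = statistics.median(data)
lower = data[:half]              # values below the middle position
upper = data[half + n % 2:]      # for odd n, skip the middle value itself
q1 = statistics.median(lower)
q3 = statistics.median(upper)
iqr = q3 - q1                    # IQR = Q3 - Q1

def cumulative_relative_frequency(value):
    """Proportion of observations at or below `value`."""
    return sum(x <= value for x in data) / n

print(f"Q1 = {q1}, Q2 = {q2}, Q3 = {q3}, IQR = {iqr}")
print(f"Proportion at or below 55 seconds: {cumulative_relative_frequency(55):.3f}")
```

Plotting `cumulative_relative_frequency` against the data values would give the points of the ogive.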