Chapter 14: Descriptive Statistics - Augusta County …
TRANSCRIPT
• Data set
A collection of data values. The number of data values in the set is denoted by N.
• Data points
Individual data values in a data set.
Professor Blackbeard has posted the Stat 101 midterm
results in the hallway outside his office. The data set
consists of N = 75 data points (the number of
students who took the test). Each data point is a
raw score on the midterm between 0 and 25.
Each student has one question on their mind:
How did I do?
It’s the next question that is statistically more
interesting:
How did the class as a whole do?
The first step in summarizing the information
is to organize the scores in a frequency table.
In this table, the number below each score
gives the frequency of the score, that is, the
number of students getting that particular
score.
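As a rough illustration of how such a frequency table might be built, here is a minimal Python sketch. The ten scores are made up for the example (they are not the actual Stat 101 data), and the name `frequency_table` is only an illustrative choice.

```python
from collections import Counter

def frequency_table(scores):
    """Return a list of (score, frequency) pairs, sorted by score."""
    counts = Counter(scores)
    return sorted(counts.items())

# Hypothetical scores, for illustration only (not the real midterm data).
sample_scores = [9, 11, 11, 12, 9, 10, 11, 24, 1, 12]

for score, freq in frequency_table(sample_scores):
    print(f"score {score:2d}: frequency {freq}")
```

Each pair answers the question "how many students got this score?", which is all a frequency table records.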
The same information can be shown in a more visual way
with a bar graph. With a bar graph, it is easy to
detect outliers: extreme data points that do not fit into
the overall pattern of the data (here, the scores of 1 and 24).
Sometimes it is more convenient to express the bar graph in terms
of relative frequencies, that is, the frequencies given as
percentages of the total population.
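A small Python sketch of the conversion from frequencies to relative frequencies, under the assumption that the data is available as a plain list. The scores are hypothetical and `relative_frequencies` is an illustrative name.

```python
from collections import Counter

def relative_frequencies(scores):
    """Map each score to its share of the data set, as a percentage."""
    n = len(scores)
    counts = Counter(scores)
    return {score: 100 * freq / n for score, freq in sorted(counts.items())}

# Hypothetical scores, for illustration only (not the real midterm data).
sample_scores = [9, 11, 11, 12, 9, 10, 11, 24, 1, 12]

for score, pct in relative_frequencies(sample_scores).items():
    print(f"score {score:2d}: {pct:.0f}%")
```

The percentages always add up to 100, which makes data sets of different sizes easier to compare.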
• Variable
Any characteristic that varies with the members of a population.
• Numerical (Quantitative) Variable
A variable that represents a measurable quantity.
The process of converting test scores (a numerical variable) into grades (a categorical variable) requires setting up class intervals for the various letter grades.
The grade distribution in the Stat 101 midterm can now be seen by means of a bar graph.
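One possible way to turn class intervals into letter grades is sketched below in Python. The cutoffs are assumptions chosen only for illustration; the intervals actually used for the Stat 101 midterm are not given here.

```python
from bisect import bisect_right

# Hypothetical lower bounds of the class intervals on the 0-25 scale.
# These cutoffs are assumptions, not the ones used in Stat 101:
# F: 0-7, D: 8-11, C: 12-15, B: 16-19, A: 20-25.
CUTOFFS = [8, 12, 16, 20]
GRADES = ["F", "D", "C", "B", "A"]

def letter_grade(score):
    """Return the letter grade whose class interval contains the score."""
    return GRADES[bisect_right(CUTOFFS, score)]

for score in [7, 8, 11, 19, 25]:
    print(score, letter_grade(score))
```

Counting how many scores land in each interval then gives the frequencies for the grade bar graph.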
Measures of Location
The mean (or average), the median, and the
quartiles are numbers that provide information
about the values of the data.
Measures of Spread
The range, the interquartile range, and the
standard deviation are numbers that provide
information about the spread within the data
set.
Percentile
The pth percentile of a data set is a value such
that p percent of the numbers fall at or below
this value and the rest fall at or above it.
Locator
The locator is p percent of N and is denoted by L:
L = (p/100) · N
Finding the pth Percentile of a Data Set
Step 0. Sort the data set. Let {d_1, d_2, d_3, …, d_N}
represent the sorted data set.
Step 1. Find the locator: L = (p/100) · N
Step 2. Find the pth percentile: If L is a whole
number, the pth percentile is given by d_L.5, the
average of d_L and d_(L+1). If L is not a whole
number, the pth percentile is given by d_L+
(L+ is L rounded up).
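The steps above can be sketched in Python roughly as follows. This is a minimal illustration, assuming p is a whole number with 0 < p < 100; the function name `percentile` is an illustrative choice, and the whole-number case is read as the average of the Lth and (L+1)st sorted values.

```python
def percentile(data, p):
    """pth percentile by the locator method (assumes integer p, 0 < p < 100)."""
    d = sorted(data)                       # Step 0: sort the data set
    n = len(d)
    # Step 1: locator L = (p/100) * N, kept as quotient and remainder
    # so that the whole-number test is exact.
    whole, remainder = divmod(p * n, 100)
    # Step 2: pick the value (Python lists are 0-based, so d_k is d[k - 1]).
    if remainder == 0:
        return (d[whole - 1] + d[whole]) / 2   # average of d_L and d_(L+1)
    return d[whole]                            # d_(L+): L rounded up

print(percentile([17, 19, 21, 21], 50))   # L = 2 is whole: average of 19 and 21
print(percentile([1, 2, 3, 4, 5], 25))    # L = 1.25 is not whole: use d_2
```

Other textbooks and software packages use slightly different percentile conventions, so results from a library routine may not match this method exactly.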
After the median (the 50th percentile), the most
commonly used percentiles are the first and third
quartiles.
The first quartile (denoted by Q1) is the
25th percentile, and the third quartile
(denoted by Q3) is the 75th percentile.
We will now find the median and quartile scores for the Stat 101 midterm.
Since N = 75 is odd, the median is the middle score, d_38 (because
(75 + 1)/2 = 38). Tallying the frequencies, we find that the 38th test
score is 11. Thus, M = 11.
The locator for the first quartile is L = 0.25 × 75 = 18.75, which
rounds up to 19. We tally from left to right. Thus, Q1 = d_19 = 9.
Since the first and third quartiles are at equal distances from the two
ends of the sorted data, a quick way to locate the third quartile is to
find the 19th score counting from right to left. Thus, Q3 = 12.
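The locator arithmetic for N = 75 can be double-checked with a few lines of Python. This sketch only computes positions in the sorted list, not the scores themselves, since the full list of 75 scores is not reproduced here. The helper `position` is an illustrative name and assumes the locator is not a whole number.

```python
import math

N = 75  # number of test scores

def position(p, n):
    """Return (locator, locator rounded up) for the pth percentile.

    Assumes the locator is not a whole number, as is the case for N = 75.
    """
    locator = p * n / 100
    return locator, math.ceil(locator)

for name, p in [("Q1", 25), ("median", 50), ("Q3", 75)]:
    locator, pos = position(p, N)
    print(f"{name}: L = {locator}, position {pos}")

# Counting 19 scores in from the right end lands on the same position as Q3.
print(N - 19 + 1)
```

The last line shows why counting from right to left works: the 19th score from the right is the 57th score from the left, which is where the locator for Q3 points.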
Range
The difference between the highest and lowest
values of the data, denoted by R. Thus,
R = Max - Min.
Interquartile Range
The difference between the third quartile and the
first quartile (IQR = Q3 – Q1). It tells us how
spread out the middle 50% of the data values
are.
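Both measures are one-line computations, as this short Python sketch suggests. The function names are illustrative; the range uses the four values 17, 19, 21, 21 from the chart example later in this chapter, and the IQR uses the Stat 101 quartiles Q1 = 9 and Q3 = 12 found earlier.

```python
def data_range(data):
    """Range R = Max - Min."""
    return max(data) - min(data)

def interquartile_range(q1, q3):
    """IQR = Q3 - Q1."""
    return q3 - q1

example_range = data_range([17, 19, 21, 21])   # 21 - 17
midterm_iqr = interquartile_range(9, 12)       # Stat 101 quartiles

print(example_range, midterm_iqr)
```

Note that the range depends only on the two most extreme values, so a single outlier can change it a lot, while the IQR ignores the top and bottom quarters of the data.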
The Standard Deviation of a Data Set
Let A denote the mean of the data set. For
each number x in the data set, compute its
deviation from the mean (x – A), and square
each of these numbers. These are called the
squared deviations.
Find the average of the squared deviations.
This number is called the variance V.
The standard deviation is the square root of
the variance (√V).
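The definition translates almost word for word into Python. The sketch below is one possible rendering, with illustrative function names; it divides by the number of data values, as the definition above does. The sample values 17, 19, 21, 21 are the ones used in the chart example later in this chapter.

```python
import math

def variance(data):
    """Average of the squared deviations from the mean."""
    mean = sum(data) / len(data)                      # the mean A
    squared_deviations = [(x - mean) ** 2 for x in data]
    return sum(squared_deviations) / len(data)        # the variance V

def standard_deviation(data):
    """Square root of the variance."""
    return math.sqrt(variance(data))

values = [17, 19, 21, 21]
print(variance(values))
print(round(standard_deviation(values), 2))
```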
The standard deviation of a normal
distribution is the horizontal
distance between the line of
symmetry of the curve and one of
the two points of inflection (P or P').
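This geometric description can be checked numerically. The Python sketch below estimates the second derivative of a normal curve just inside and just outside one standard deviation from the center; a sign change there indicates a point of inflection. The mean and standard deviation are hypothetical values chosen for the illustration, and the function names are illustrative.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def second_derivative(f, x, h=1e-4):
    """Central-difference estimate of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

mu, sigma = 11.0, 2.0   # hypothetical values, for illustration only

def curve(x):
    return normal_pdf(x, mu, sigma)

inside = second_derivative(curve, mu + 0.9 * sigma)    # curve bends downward
outside = second_derivative(curve, mu + 1.1 * sigma)   # curve bends upward
print(inside < 0, outside > 0)
```

The bend changes direction between 0.9 and 1.1 standard deviations from the line of symmetry, consistent with the inflection point lying exactly one standard deviation away.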
Calculating Standard Deviation (using a chart)
(ex) The given data values are 17, 19, 21, 21, so n = 4 (the number of data values).

OBSERVATION   DEVIATION (obs. - x̄)    SQUARED DEVIATION
17            17 - 19.5 = -2.5        6.25
19            19 - 19.5 = -0.5        0.25
21            21 - 19.5 =  1.5        2.25
21            21 - 19.5 =  1.5        2.25
sum = 78                              sum = 11
mean (x̄) = 78/4 = 19.5

VARIANCE: the average of the squared deviations
s² = 11/4 = 2.75 units²
STANDARD DEVIATION: the square root of the variance
s = √2.75 ≈ 1.66 units
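The chart result can be cross-checked with Python's standard `statistics` module. One caveat, stated as a hedge rather than a correction: the chart divides by n, which corresponds to the "population" functions `pvariance` and `pstdev`; the "sample" function `stdev` divides by n - 1 and gives a different number.

```python
import statistics

values = [17, 19, 21, 21]

population_variance = statistics.pvariance(values)   # divides by n, as in the chart
population_sd = statistics.pstdev(values)            # square root of the above
sample_sd = statistics.stdev(values)                 # divides by n - 1 instead

print(population_variance)
print(round(population_sd, 2))
print(round(sample_sd, 2))
```

When comparing hand calculations with software output, it helps to check which of the two conventions the software uses.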