chapter 14 descriptive statistics - augusta county … · chapter 14 descriptive statistics ... 14...

Chapter 14: Descriptive Statistics

Graphing and Summarizing Data


• Data set

A collection of data values. The number of values in the data set is denoted by N.

• Data points

Individual data values in a data set.


Professor Blackbeard has posted the results of the Stat 101 midterm in

the hallway outside his office. The data set

consists of N = 75 data points (the number of

students that took the test). Each data point is a

raw score on the midterm between 0 and 25.

Each student has one question on their mind:

How did I do?

It’s the next question that is statistically more

interesting:

How did the class as a whole do?


The first step in summarizing the information is to organize the scores in a frequency table. In this table, the number below each score gives the frequency of the score, that is, the number of students getting that particular score.
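A frequency table like the one described above can be built in a few lines. This is a minimal sketch: the `scores` list is hypothetical and is not the Stat 101 data, which the slides show only as a figure.

```python
from collections import Counter

# Hypothetical raw scores (NOT the actual Stat 101 midterm data).
scores = [10, 12, 9, 10, 11, 12, 10, 8, 11, 10]

# Counter maps each distinct score to the number of times it occurs.
frequency = Counter(scores)

# Print the table in increasing order of score.
for score in sorted(frequency):
    print(f"score {score:2d}: frequency {frequency[score]}")

# The frequencies always add up to N, the size of the data set.
N = sum(frequency.values())
print("N =", N)
```

Summing the frequencies is a quick check that no data point was dropped while tallying.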


The figure below shows the same information in a more visual form called a bar graph. With a bar graph, it is easy to detect outliers: extreme data points that do not fit into the overall pattern of the data (here, the scores of 1 and 24).


Sometimes it is more convenient to express the bar graph in terms of relative frequencies, that is, the frequencies given as percentages of the total population.
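Converting frequencies to relative frequencies means dividing each frequency by N and multiplying by 100. The sketch below again uses a hypothetical `scores` list, not the Stat 101 data.

```python
from collections import Counter

# Hypothetical raw scores (NOT the actual Stat 101 midterm data).
scores = [10, 12, 9, 10, 11, 12, 10, 8, 11, 10]
N = len(scores)

frequency = Counter(scores)

# Relative frequency = frequency as a percentage of the total population N.
relative = {score: 100 * count / N for score, count in sorted(frequency.items())}

for score, pct in relative.items():
    print(f"score {score:2d}: {pct:.1f}%")
```

The relative frequencies should add up to 100%, apart from floating-point rounding.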

• Variable

Any characteristic that varies with the members of a population.

• Numerical (Quantitative) Variable

A variable that represents a measurable quantity.


The process of converting test scores (a numerical variable) into grades (a categorical variable) requires setting up class intervals for the various letter grades.

The grade distribution in the Stat 101 midterm can now be seen by means of a bar graph.
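The conversion from scores to letter grades can be sketched as a lookup against class intervals. The cutoffs in `CUTOFFS`, the `letter_grade` helper, and the `scores` list are all assumptions made for illustration; the slides do not state the intervals used in Stat 101.

```python
from collections import Counter

# Hypothetical class intervals: (lowest score earning the grade, grade),
# listed from the highest interval down. These cutoffs are an assumption.
CUTOFFS = [(18, "A"), (14, "B"), (11, "C"), (9, "D"), (0, "F")]

def letter_grade(score):
    """Return the grade of the first interval whose lower bound the score reaches."""
    for lowest, grade in CUTOFFS:
        if score >= lowest:
            return grade
    raise ValueError(f"score {score} is below every class interval")

# Hypothetical raw scores between 0 and 25.
scores = [10, 12, 9, 19, 11, 15, 10, 8, 24, 1]

# Grade distribution: how many students fall into each class interval.
grade_counts = Counter(letter_grade(s) for s in scores)
for grade in "ABCDF":
    print(f"{grade}: {grade_counts[grade]}")
```

The resulting counts are exactly what a bar graph of the grade distribution would display, one bar per letter grade.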


Measures of Location

The mean (or average), the median, and the

quartiles are numbers that provide information

about the values of the data.

Measures of Spread

The range, the interquartile range, and the

standard deviation are numbers that provide

information about the spread within the data

set.


Percentile

The pth percentile of a data set is a value such

that p percent of the numbers fall at or below

this value and the rest fall at or above it.

Locator

The locator is p percent of N and is denoted by L: L = (p/100) × N.


Finding the pth Percentile of a Data Set

Step 0. Sort the data set. Let {d_1, d_2, d_3, …, d_N} represent the sorted data set.

Step 1. Find the locator: L = (p/100) × N.

Step 2. Find the pth percentile: If L is a whole number, the pth percentile is d_{L.5}, the average of d_L and d_{L+1}. If L is not a whole number, the pth percentile is d_{L+}, where L+ is L rounded up.
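The steps above can be sketched in Python as follows. The function name `percentile` and the sample data are assumptions for illustration. For a whole-number locator, the sketch averages the Lth and (L+1)th sorted values, which is how the half-position value is read here. Note that library routines such as NumPy's default percentile use a different interpolation rule and may give slightly different answers.

```python
import math

def percentile(data, p):
    """pth percentile (0 < p < 100) using the locator method."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    d = sorted(data)              # Step 0: sort the data set
    N = len(d)
    L = p * N / 100               # Step 1: the locator
    if L.is_integer():            # Step 2, whole-number locator:
        k = int(L)
        # average the Lth and (L+1)th values (1-based), i.e. d[k-1] and d[k]
        return (d[k - 1] + d[k]) / 2
    # Step 2, otherwise: round L up and take that value (1-based position)
    return d[math.ceil(L) - 1]

# Hypothetical data set with N = 8; sorted it is [1, 1, 2, 3, 4, 5, 6, 9].
data = [3, 1, 4, 1, 5, 9, 2, 6]
print(percentile(data, 25))   # L = 2, whole: average of d_2 and d_3
print(percentile(data, 30))   # L = 2.4, not whole: d_3
```

Python lists are 0-based while the slides number the data points from 1, which is why every position is shifted by one inside the function.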


After the median (the 50th percentile), the most commonly used percentiles are the first and third quartiles.

The first quartile (denoted by Q1) is the

25th percentile, and the third quartile

(denoted by Q3) is the 75th percentile.


We will now find the median and quartiles of the Stat 101 scores.

Since N = 75 is odd, the median is d_{(75+1)/2} = d_38. Tallying frequencies from left to right in the frequency table, we find that the 38th test score is 11. Thus, M = 11.

The locator for the first quartile is L = 0.25 × 75 = 18.75. Since L is not a whole number, we round up and tally from left to right to the 19th score. Thus, Q1 = d_19 = 9.

Since the first and third quartiles are the same distance from the two ends of the sorted data set, a quick way to locate the third quartile is to count from right to left to the 19th score. Thus, Q3 = 12.
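The positions used in this example can be checked with a little arithmetic. The sketch below only computes positions in the sorted data set. It does not reproduce the scores themselves, since the frequency table is not included here. The variable names are illustrative.

```python
import math

N = 75  # number of students who took the Stat 101 midterm

# Median: N is odd, so the median sits at position (N + 1) / 2.
median_position = (N + 1) // 2

# First quartile: locator is 25% of N, rounded up because it is not whole.
L_q1 = 25 * N / 100
q1_position = math.ceil(L_q1)

# Third quartile: locator is 75% of N, rounded up because it is not whole.
L_q3 = 75 * N / 100
q3_position = math.ceil(L_q3)

# The same position counted from the right end of the sorted data.
q3_from_right = N - q3_position + 1

print(median_position, L_q1, q1_position, L_q3, q3_position, q3_from_right)
```

The third quartile is the 57th score from the left, which is the 19th score from the right. This is why counting 19 places from right to left is a valid shortcut.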


Range

The difference between the highest and lowest values of the data, denoted by R. Thus, R = Max - Min.

Interquartile Range

The difference between the third quartile and the first quartile: IQR = Q3 - Q1. It tells us how spread out the middle 50% of the data values are.
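As a small worked illustration, the two measures can be computed for the Stat 101 example. The quartiles come from the example above. Treating the two outliers mentioned earlier (1 and 24) as the lowest and highest scores is an assumption, since the slides do not state Min and Max explicitly.

```python
# Quartiles found in the Stat 101 example.
Q1, Q3 = 9, 12

# Assumption: the outlier scores 1 and 24 are the lowest and highest scores.
minimum, maximum = 1, 24

R = maximum - minimum   # range: spread of the whole data set
IQR = Q3 - Q1           # interquartile range: spread of the middle 50%

print("R =", R)
print("IQR =", IQR)
```

Under that assumption the range is much larger than the interquartile range, which shows how strongly the range is affected by outliers.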


The Standard Deviation of a Data Set

Let A denote the mean of the data set. For

each number x in the data set, compute its

deviation from the mean (x – A), and square

each of these numbers. These are called the

squared deviations.

Find the average of the squared deviations.

This number is called the variance V.

The standard deviation is the square root of

the variance (√V).
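Written as formulas, the verbal recipe above reads as follows. The subscripted x_i for the individual data values and the symbol σ for the standard deviation are notational assumptions. A, V, and N are the symbols used in this chapter.

```latex
V = \frac{1}{N} \sum_{i=1}^{N} (x_i - A)^2,
\qquad
\sigma = \sqrt{V}
```

Dividing by N reflects the instruction to take the average of the squared deviations.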


The standard deviation of a normal

distribution is the horizontal

distance between the line of

symmetry of the curve and one of

the two points of inflection (P or P').

Calculating Standard Deviation (using a chart)

Example: the given data values are 17, 19, 21, 21, so n = 4 (the number of data values).

OBSERVATION    DEVIATION (obs. - x)    SQUARED DEVIATION
17             17 - 19.5 = -2.5        6.25
19             19 - 19.5 = -0.5        0.25
21             21 - 19.5 =  1.5        2.25
21             21 - 19.5 =  1.5        2.25
sum = 78                               sum = 11

mean (x) = 78/4 = 19.5

VARIANCE: the average of the squared deviations
s² = 11/4 = 2.75 units²

STANDARD DEVIATION: the square root of the variance
s = √2.75 ≈ 1.66
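The chart calculation can be reproduced in Python for the same four data values. This is a minimal sketch with illustrative variable names. It divides by n, as the chart does, and cross-checks the result against the standard library's `statistics.pstdev`, which also divides by n.

```python
import math
import statistics

data = [17, 19, 21, 21]   # the data values from the chart
n = len(data)

mean = sum(data) / n                                   # 78 / 4

# Squared deviation of each observation from the mean.
squared_deviations = [(x - mean) ** 2 for x in data]

variance = sum(squared_deviations) / n                 # average squared deviation
std_dev = math.sqrt(variance)                          # square root of the variance

print("mean =", mean)
print("squared deviations =", squared_deviations)
print("variance =", variance)
print("standard deviation =", round(std_dev, 2))

# Cross-check against the standard library's population standard deviation.
print("statistics.pstdev =", round(statistics.pstdev(data), 2))
```

Note that `statistics.stdev` (without the `p`) divides by n - 1 instead of n and would give a larger value for the same data.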
