box 4 box plots 2 stem chapter plotsroneducate.weebly.com/uploads/6/2/3/8/6238184/chap... ·...

Post on 24-Nov-2020


Chapter 6: Descriptive Statistics

6-1 Numerical Summaries of Data
6-2 Stem-and-Leaf Diagrams
6-3 Frequency Distributions and Histograms
6-4 Box Plots
6-5 Time Sequence Plots
6-6 Probability Plots

Chapter Learning Objectives

After careful study of this chapter you should be able to:
1. Compute and interpret the sample mean, sample variance, sample standard deviation, sample median, and sample range
2. Explain the concepts of sample mean, sample variance, population mean, and population variance
3. Construct and interpret visual data displays, including the stem-and-leaf display, the histogram, and the box plot
4. Explain the concept of random sampling
5. Construct and interpret normal probability plots
6. Explain how to use box plots, and other data displays, to visually compare two or more samples of data
7. Know how to use simple time series plots to visually display the important features of time-oriented data

Populations & Samples

Figure 6-3 A population is described, in part, by its parameters, i.e., mean (μ) and standard deviation (σ). A random sample of size n is drawn from a population and is described, in part, by its statistics, i.e., mean (x-bar) and standard deviation (s). The statistics are used to estimate the parameters.

Numerical Summaries of Data

• Data are the numeric observations of a phenomenon of interest
– The totality of all observations is a population
– A portion used for analysis is a random sample
• We gain an understanding of the population, possibly massive, by describing it numerically and/or graphically, usually with the sample data
• We describe the data in terms of shape, outliers, centre, and spread (SOCS)
– The centre is measured by the mean
– The spread is measured by the variance

Summarizing Data Numerically: Sample Mean & Sample Variance
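As a minimal sketch of the two summaries named in this slide title, the snippet below computes the sample mean (the sum of the observations divided by n) and the sample variance (the sum of squared deviations from the sample mean divided by n - 1) by hand, then checks both against Python's standard `statistics` module. The data values and the helper names `sample_mean` and `sample_variance` are made up for illustration; they are not taken from Example 6-1 or Example 6-2.

```python
import math
import statistics

# Illustrative data only (hypothetical; not from the chapter's examples).
sample = [2.0, 4.0, 4.0, 5.0, 7.0, 8.0]

def sample_mean(xs):
    """Arithmetic average of the observations."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Sum of squared deviations from the mean, divided by n - 1."""
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (len(xs) - 1)

xbar = sample_mean(sample)
s2 = sample_variance(sample)
s = math.sqrt(s2)  # sample standard deviation

# The standard library's `variance` also uses the n - 1 divisor.
assert math.isclose(xbar, statistics.mean(sample))
assert math.isclose(s2, statistics.variance(sample))
print(xbar, s2, s)
```

The sample standard deviation is simply the square root of the sample variance, which puts the measure of spread back in the units of the data.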

A Simple Example of the Sample Mean (Example 6-1)

A Physics Analogy For The Sample Mean

Figure 6-1 The sample mean as a balance point for a system of weights.

A Simple Example of the Sample Variance (Example 6-2)


Sample Mean as an Estimate of Population Mean

When the population is finite and consists of N equally-likely values, we may define the population mean as:

$$\mu = \frac{1}{N} \sum_{i=1}^{N} x_i$$

The sample mean is a reasonable estimate of the population mean

Sample Variance as an Estimate of Population Variance

When the population is finite and consists of N equally-likely values, we may define the population variance as:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

The sample variance is a reasonable estimate of the population variance
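The relationship between parameters and statistics can be sketched numerically. The example below builds a hypothetical finite population of N = 1000 values, computes its mean and variance with divisor N, and then estimates both from a random sample of n = 50 using divisor n - 1 for the variance. The population, the seed, and the sample size are assumptions chosen only for illustration.

```python
import random
import statistics

# Hypothetical finite population of N = 1000 equally likely values.
rng = random.Random(0)
population = [rng.gauss(50.0, 10.0) for _ in range(1000)]

# Population parameters: the variance uses divisor N.
mu = statistics.fmean(population)
sigma2 = statistics.pvariance(population, mu)

# A random sample of size n = 50, drawn without replacement.
sample = rng.sample(population, 50)

# Sample statistics: the variance uses divisor n - 1.
xbar = statistics.fmean(sample)
s2 = statistics.variance(sample, xbar)

print(f"mu={mu:.2f}  xbar={xbar:.2f}")
print(f"sigma^2={sigma2:.2f}  s^2={s2:.2f}")
```

Rerunning with a different seed gives different values of x-bar and s², while μ and σ² stay fixed for a given population; that variability is why the sample values are described as estimates.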

Summarizing Data Numerically: Sample Median and Range

• The median is another measure of the “centre” of a set of data
– Roughly, there are as many values “below” the median as “above” it
– More strictly, it is the smallest data value below which lie at least half of the sample values
• The median of a set of n data values is obtained by listing the data in increasing order of magnitude and:
– If n is odd the median is the middle value
– If n is even the median is the average of the two middle values
• The range is another measure of the “spread” of a set of data: $r = \max(x_i) - \min(x_i)$
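The odd/even rule for the median and the definition of the range translate directly into code. This is a small sketch with made-up data; the function names `sample_median` and `sample_range` are illustrative only.

```python
import statistics

def sample_median(xs):
    """Middle value (odd n) or average of the two middle values (even n)."""
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def sample_range(xs):
    """Largest observation minus smallest observation."""
    return max(xs) - min(xs)

odd = [7, 1, 5, 3, 9]        # hypothetical data, n = 5
even = [7, 1, 5, 3, 9, 11]   # hypothetical data, n = 6

print(sample_median(odd), sample_median(even), sample_range(odd))
```

The standard library's `statistics.median` follows the same odd/even rule, so it can serve as a cross-check.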

Summarizing Data Numerically: Quartiles

• The quartiles divide a set of data into four equal-sized portions:
– The first (or lower) quartile is the smallest value below which lie at least ¼ (or 25%) of the values
– The second quartile is the median
– The third (or upper) quartile is the smallest value below which lie at least ¾ (or 75%) of the values
• The quartiles of a set of n data values are obtained by listing the data in increasing order of magnitude and:
– If n-1 is an integer multiple of four the 1st and 3rd quartiles are the (1+(n-1)/4)th and (1+3(n-1)/4)th values (respectively)
– If not, linear interpolation is used between two adjacent values at integer positions either side of (1+(n-1)/4) and (1+3(n-1)/4)
• The interquartile range is another measure of the variability of a data set (= 3rd quartile – 1st quartile)
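The positional rule above, with positions 1+(n-1)/4 and 1+3(n-1)/4 and linear interpolation between neighbours, matches the `method="inclusive"` option of Python's `statistics.quantiles` (Python 3.8 or later). The sketch below uses two made-up data sets: one where n-1 is a multiple of four and one where interpolation is needed.

```python
import statistics

# Hypothetical data with n = 9, so n - 1 = 8 is a multiple of four:
# the quartiles are the 3rd, 5th and 7th ordered values.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1  # interquartile range

# Hypothetical data with n = 4: positions 1.75 and 3.25 fall between
# observations, so the quartiles are linearly interpolated.
small = [2, 4, 6, 8]
s1, s2, s3 = statistics.quantiles(small, n=4, method="inclusive")

print(q1, q2, q3, iqr)
print(s1, s2, s3)
```

Other software may use a different quartile convention (for example positions based on n+1), so results from different packages can disagree slightly on small samples.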

Summarizing Data Numerically: Percentiles

• Percentiles identify the value below which lies some specified proportion of the values in a data set:
– The 100kth percentile is the smallest value below which lie at least 100k% of the values
• A particular percentile of interest for a set of n data values is obtained by listing the data in increasing order of magnitude and:
– If k(n-1) is an integer the 100kth percentile is the 1+k(n-1)th value
– If not, linear interpolation is used between two adjacent values at (integer) positions either side of 1+k(n-1)
• The 10th and 90th percentiles are also known as the lower decile and upper decile, respectively
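The percentile rule can be written out by hand. The sketch below is one possible implementation of the 1+k(n-1) position rule with linear interpolation; the function name `percentile` and the data are hypothetical.

```python
import math

def percentile(xs, k):
    """Return the 100k-th percentile (0 <= k <= 1) at position 1 + k(n-1)."""
    if not xs:
        raise ValueError("need at least one observation")
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    ordered = sorted(xs)
    pos = k * (len(ordered) - 1)   # zero-based version of 1 + k(n-1)
    lower = math.floor(pos)
    frac = pos - lower
    if frac == 0:
        return ordered[lower]      # position is an integer: take that value
    # Otherwise interpolate between the two neighbouring ordered values.
    return ordered[lower] + frac * (ordered[lower + 1] - ordered[lower])

data = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110]  # hypothetical, n = 11
lower_decile = percentile(data, 0.10)
median = percentile(data, 0.50)
upper_decile = percentile(data, 0.90)
print(lower_decile, median, upper_decile)
```

With k = 0.25, 0.5 and 0.75 the same function reproduces the quartile rule, since the quartiles are just the 25th, 50th and 75th percentiles.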

Frequency Distributions and Histograms

• A frequency distribution is a compact summary of data, expressed in tabular or graphical form
• To construct a frequency distribution, we must divide the range of the data into intervals, which are usually called cells, bins, or class intervals
• To construct a histogram:

A Frequency Distribution Example

Note: clarity (and care) with the comparison operators used to define cell membership is required when defining a frequency distribution (i.e. use of “<” versus “<=”).
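The note about comparison operators is easiest to see in code. The sketch below tabulates made-up observations into equal-width cells using the convention lower < x <= upper; the function name, the data, and the choice of convention are assumptions for illustration, not the slide's own table.

```python
def frequency_distribution(xs, start, width, n_cells):
    """Tabulate xs into n_cells cells of equal width.

    Cell j covers (lower, upper], i.e. lower < x <= upper, so a value that
    sits exactly on a boundary is counted in the cell it closes.
    """
    counts = [0] * n_cells
    for x in xs:
        for j in range(n_cells):
            lower = start + j * width
            upper = lower + width
            if lower < x <= upper:  # the operators define cell membership
                counts[j] += 1
                break
    n = len(xs)
    # One row per cell: (upper limit, frequency, relative frequency).
    return [(start + (j + 1) * width, counts[j], counts[j] / n)
            for j in range(n_cells)]

# Hypothetical observations; 90 and 110 sit exactly on cell boundaries.
data = [72, 85, 90, 90, 101, 110, 118, 129]
table = frequency_distribution(data, start=70, width=20, n_cells=3)
for upper, freq, rel in table:
    print(f"<= {upper}: {freq} ({rel:.1%})")
```

Switching the test to `lower <= x < upper` would move the boundary values 90 and 110 into the next cell up and change the counts, which is why the convention has to be stated with the table.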

A Frequency Distribution Example

Figure 6-7 Histogram of compressive strength for 80 aluminum-lithium alloy specimens (bin width = 20).

A Histogram of the Cumulative Relative Frequency (from Excel)

[Figure: cumulative relative frequency (0.0% to 100.0%) plotted against cell upper limit (70 to 270 in steps of 20).]
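A cumulative relative frequency is a running total of the cell frequencies divided by the number of observations. The short sketch below shows the arithmetic with made-up cell limits and counts; these are not the values plotted in the slide's Excel chart.

```python
from itertools import accumulate

# Hypothetical cell upper limits and frequencies (not the slide's data).
upper_limits = [90, 110, 130, 150]
frequencies = [2, 3, 4, 1]

total = sum(frequencies)
cumulative = list(accumulate(frequencies))             # running totals
cumulative_relative = [c / total for c in cumulative]  # last entry is 1.0

for limit, share in zip(upper_limits, cumulative_relative):
    print(f"<= {limit}: {share:.1%}")
```

The final cell always reaches 100%, which is a quick check that no observation fell outside the cells.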

The Impact of Changing the Number of Cells

[Figure: two relative-frequency histograms plotted against cell upper limit. One uses cells of width 10 (upper limits 70 to 250; vertical axis 0.0% to 16.0%); the other uses cells of width 40 (upper limits 130, 170, 210, 250; vertical axis 0.0% to 50.0%).]

Different Histogram Shapes

Figure 6-11 Histograms for symmetric and skewed distributions.

Histograms of Categorical Data

[Figure: two relative-frequency bar charts (vertical axis 0.0% to 35.0%) of the same categories in two different orders: ENCH, ENCI, ENGO, ENME, ENOG, ENSF, Other; and ENME, ENCI, ENCH, ENGO, Other, ENSF, ENOG.]

Box Plots

• The box plot is a graphical display that simultaneously describes several important features of a data set, such as centre, spread, departure from symmetry, and identification of observations that lie unusually far from the bulk of the data
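The numbers behind a box plot can be sketched as follows. The slide does not state a rule for whiskers or outliers, so this example assumes the common convention of flagging points more than 1.5 interquartile ranges beyond the quartiles. The function name `box_plot_summary`, the `whisker` parameter, and the data are hypothetical.

```python
import statistics

def box_plot_summary(xs, whisker=1.5):
    """Quartiles, whisker ends, and outliers under the 1.5 x IQR convention."""
    ordered = sorted(xs)
    q1, q2, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr    # assumed outlier threshold (lower)
    high_fence = q3 + whisker * iqr   # assumed outlier threshold (upper)
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    outliers = [x for x in ordered if x < low_fence or x > high_fence]
    return {
        "q1": q1, "median": q2, "q3": q3, "iqr": iqr,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": outliers,
    }

data = [1, 2, 3, 4, 5, 6, 7, 8, 30]  # hypothetical; 30 lies far from the bulk
summary = box_plot_summary(data)
print(summary)
```

The box spans the first to third quartile (spread), the line inside it marks the median (centre), unequal whisker lengths hint at asymmetry, and the flagged points are the observations that lie unusually far from the bulk of the data.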

A Box Plot Example

Figure 6-14 Box plot for compressive strength data in Table 6-2 (from Minitab)

Using Box Plots to Compare Different Data Sets

Figure 6-15 Comparative box plots of a quality index at three plants.

Probability Plots

• Probability plotting is a graphical method for determining whether sample data conform to a hypothesized distribution based on a subjective visual examination of the data
• Probability plotting typically uses special graph paper, known as probability paper, that has been designed for the hypothesized distribution
– Probability paper is available for the normal, lognormal, Weibull, and various chi-square and gamma distributions
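Without probability paper, the same plot can be produced by pairing each ordered observation with a standard normal score. The sketch below assumes the plotting position (j - 0.5)/n for the j-th ordered value, which is one common convention and is not stated on the slide. The function name `normal_plot_points` and the data are hypothetical.

```python
from statistics import NormalDist

def normal_plot_points(xs):
    """Pair each ordered observation x_(j) with a standard normal score z_j.

    The plotting position (j - 0.5) / n is an assumed convention.
    """
    ordered = sorted(xs)
    n = len(ordered)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [(x, std_normal.inv_cdf((j - 0.5) / n))
            for j, x in enumerate(ordered, start=1)]

data = [12.1, 9.8, 10.4, 11.3, 10.9]  # hypothetical observations
points = normal_plot_points(data)
for x, z in points:
    print(f"{x:6.1f}  {z:+.3f}")
```

Plotting x against z and judging by eye whether the points fall near a straight line is the subjective visual examination described above; systematic curvature suggests the normal distribution is a poor fit.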

A Normal Probability Plotting Example (Example 6-7)

Figure 6-19 Normal probability plot for battery life.

Examples of Normal Probability Plots Indicating a Poor Fit

Figure 6-21 Normal probability plots indicating a nonnormal distribution. (a) Light-tailed distribution. (b) Heavy-tailed distribution. (c) A distribution with positive (or right) skew.