exploratory data analysis cross tabulations and scatter diagrams

40
x y Exploratory data analysis Cross tabulations and scatter diagrams

Upload: nelle-garcia

Post on 30-Dec-2015

48 views

Category:

Documents


1 download

DESCRIPTION

y. x. Exploratory data analysis Cross tabulations and scatter diagrams. Exploratory data analysis consists of simple arithmetic and easy-to-draw graphs that can be used to summarize data quickly. The Stem and Leaf Display. A stem-and-leaf display shows both the rank order - PowerPoint PPT Presentation

TRANSCRIPT

x

y

• Exploratory data analysis

• Cross tabulations and scatter diagrams

Exploratory data analysis

consists of simple

arithmetic and easy-to-draw

graphs that can be used to

summarize data quickly

The Stem and Leaf Display

• A stem-and-leaf display shows both the rank order

and shape of the distribution of the data.

• It is similar to a histogram on its side, but it has the

advantage of showing the actual data values.

• The first digits of each data item are arranged to the

left of a vertical line.

• To the right of the vertical line we record the last

digit for each item in rank order.

Example: Hudson Auto Repair

The manager of Hudson Autowould like to have a betterunderstanding of the costof parts used in the enginetune-ups performed in theshop. She examines 50customer invoices for tune-ups. The costs of

parts,rounded to the nearest dollar, are listed on the

nextslide.

Stretched Stem and Leaf

• If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each leading digit(s).

• Whenever a stem value is stated twice, the first value corresponds to leaf values of 0 - 4, and the second value corresponds to leaf values of 5 - 9.

91 78 93 57 75 52 99 80 97 6271 69 72 89 66 75 79 75 72 76104 74 62 68 97 105 77 65 80 10985 97 88 68 83 68 71 69 67 7462 82 98 101 79 105 79 69 62 73

Sample parts cost for 50 tune-ups

5

6

7

8

9

10

2 7

2 2 2 2 5 6 7 8 8 8 9 9

1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 90 0 2 3 5 8 9

A Stem and Leaf Display for the Auto Parts Cost data

1 3 7 7 7 8 9

1 4 5 5 9Stem

Leaf

55667788991010

Stretched Stem and Leaf for Hudson Auto parts data

72

5 6 7 8 8 8 9 9

2 2 2 2

1 1 2 2 3 4 45 5 5 6 7 8 9 9 90 0 2 3

5 8 9

7 7 7 8 9

1 3

5 5 9

1 4

Leaf Units

A single digit is used to define each leaf. In the preceding example, the leaf unit was 1. But it does not have to be 1.

The leaf unit can be 0.1, 10, or 100.

Example: Leaf unit = .1Suppose we have the following data:

8.6 11.7 9.4 10.2 11.0 8.8

The leaf unit is .1. Thus:

8

9

10

11

6 8

4

2

0 7

Example: Leaf Unit = 10

If we have data with values such as

16 17 18 19

Leaf Unit = 108

1 90 3

1 7

1806 1717 1974 1791 1682 1910 1838

a stem-and-leaf display of these data will be

The 82 in 1682is rounded down

to 80 and isrepresented as an

8.

Crosstabulations and Scatter Diagrams

So far we have considered only ONE variable (parts cost,

audit time). But often we are interested in tabular and graphical data that uncover the relationship between TWO variables.

Crosstabulations

A tabular method for summarizing the data for two variables simultaneously

Crosstabulations can be used when

• one variable is qualitative and the other is

quantitative,

• both variables are qualitative, or

• both variables are quantitative.

PriceRange Colonial Log Split A-FrameTotal

< $99,000> $99,000

18 6 19 12 55

45

30 20 35 15Total 100

12 14 16 3

Home Style

Example: Finger Lakes Homes

CrosstabulationThe number of Finger Lakes homes sold

for each style and price for the past two years is shown below. quantitativ

e variable

qualitative variable

Crosstabulation: Row or Column Percentages

• Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.

PriceRange Colonial Log Split A-Frame Total

< $99,000> $99,000

32.73 10.91 34.55 21.82100

100

Note: row totals are actually 100.01 due to rounding.

26.67 31.11 35.56 6.67

Home Style

(Colonial and > $99K)/(All >$99K) x 100 = (12/45) x 100

Crosstabulation: Row Percentages

PriceRange Colonial Log Split A-Frame

< $99,000> $99,000

60.00 30.00 54.29 80.0040.00 70.00 45.71 20.00

Home Style

100 100 100 100Total

(Colonial and > $99K)/(All Colonial) x 100 = (12/30) x 100

Crosstabulation: Column Percentages

Using Excel’s PivotTable Reportto Construct a Crosstabulation

Step 3 When the Create Pivot Table dialog box appears:Choose Select a table or rangeEnter A1:C301 in the Table/Range boxSelect New WorksheetClick OK

Step 2: In the Tables group, click the icon above PivotTable

Step 1: Click on the Insert tab on the ribbon

Chapter 2 file Restaurant.xlsx

Using the Pivot table Field List

• Step 1: In the PivotTable Field List, go to Choose Fields to add to report: – Drag the Quality Rating Field to the Row Labels area.– Drag the ($)Meal Price field to the Column Labels

area.– Drag the Restaurant field to the Values area.

• Step 2: Click Sum of Restaurant in the Values area– Select Value Field Settings.

• Step 3: When the Value Field Settings dialog box appears:– Under Summarize value field by, choose Count– Click OK

Finalizing the PivtotTable Report

• Step 1: Right-click in cell B4 (or any other cell containing meal prices)– Select Group

• Step 2: When the Grouping dialog box appears:– Enter 10 in the Starting at box– Enter 49 in the Ending at box– Enter 10 in the By box

• Step 3: Right-click on Excellent in cell A5 – Choose Move– Select Move “Excellent” to End

• Step 4: Close the PivotTable Field List box

Crosstabulation for the LA Restaurant Example

Meal Price ($)        

Quality Rating 10-19 20-29 30-39 40-49Grand Total

Good 42 40 2   84

Very Good 34 64 46 6 150

Excellent 2 14 28 22 66

Grand Total 78 118 76 28 300

Chapter 2 file Restaurant.xlsx

Crosstabulation: Simpson’s Paradox

Simpson’ Paradox: In some cases the conclusions based upon an aggregated crosstabulation can be completely reversed if we look at the unaggregated data.

We must be careful in drawing conclusions about the relationship between the two variables in the aggregated crosstabulation.

Data in two or more crosstabulations are often aggregated to produce a summary crosstabulation.

Judge

Verdict Kendall Luckett Total

UpheldReversed

129 (86%)21 (14%)

110 (88%)15 (12%)

23936

Total (%) 150 (100%) 125 (100%) 275

You might think Luckett is the better Judge. However, a larger share of Kendall’s cases were in municipal court—where the likelihood of being overturned on appeal is higher.

The general pattern of the plotted points suggests the overall relationship between the variables.

One variable is shown on the horizontal axis and the other variable is shown on the vertical axis.

A scatter diagram is a graphical presentation of the relationship between two quantitative variables.

Scatter Diagram and Trendline

A trendline is an approximation of the relationship.

A Positive Relationship

Y

X0

A Negative Relationship

Y

X0

No Apparent Relationship

Y

X0

Example: Panthers Football Team

• Scatter Diagram The Panthers football team is interestedin investigating the relationship, if any,between interceptions made and points scored.

13213

1424181730

x = Number ofInterceptions

y = Number of Points Scored

Scatter Diagram

y

x

Number of Interceptions

Num

ber

of

Poin

ts S

core

d

510

15

2025

30

0

35

1 2 30 4

Insights Gained from the Preceding Scatter Diagram

• The relationship is not perfect; all plotted points in the scatter diagram are not on a straight line.

• Higher points scored are associated with a higher number of interceptions.

• The scatter diagram indicates a positive relationship between the number of interceptions and the number of points scored.

Example: Panthers Football Team

Using Excel’s Chart Wizard to Constructa Scatter Diagram and Trendline

Formula Worksheet (showing data entered)

A B C

1Number of

InterceptionsNumber of

Points Scored 2 1 14 3 3 24 4 2 18 5 1 17 6 3 30 7

Step 1 Select cells A1:B6

Step 2 Click the Chart Wizard button on standard toolbar

Step 3 When the Chart Wizard - Step 1 of 4 - Chart Type dialog box appears: Choose XY (Scatter) in the Chart Type list Choose Scatter from the Chart subtype display Click Next >

Using Excel’s Chart Wizardto Construct a Scatter Diagram

. . . continue

Using Excel’s Chart Wizardto Construct a Scatter Diagram

Step 4 When the Chart Wizard - Step 2 of 4 - Chart Source Data dialog box appears: Click Next >

. . . continue

Using Excel’s Chart Wizardto Construct a Scatter Diagram

Step 5 When the Chart Wizard - Step 3 of 4 – Chart Options dialog box appears:

Select the Titles tab and then Type Scatter Diagram for the Panthers

in the Chart title: box Type Number of Interceptions in the

Value (X) axis: box Type Number of Points Scored in the

Value (Y) axis: box

. . . continue

Step 5 (continued) Select the Legend tab and then

Remove the check in the Show Legend box Click Next >

Using Excel’s Chart Wizardto Construct a Scatter

Diagram

Step 6 When the Chart Wizard – Step 4 of 4 - Chart Location dialog box appears:

Specify a location for the new chart Click Finish

Using Excel’s Chart Wizardto Construct a Scatter Diagram

A B C 8 9

10 11 12 13 14 15 16 17 18 19 20

Scatter Diagram for the Panthers

0

5

10

15

20

25

30

35

0 1 2 3 4Number of Interceptions

Nu

mb

er

of

Po

ints

Sco

red

.

Adding a Trendline

Using Excel’s Chart Wizard to Constructa Scatter Diagram and Trendline

Step 3 When the Add Trendline dialog box appears: Select the Type tab and then

Choose Linear from the Trend/ Regression type display

Click OK

Step 2 Choose the Add Trendline option

Step 1 Position the mouse pointer over any data point in the scatter diagram and right click

A B C 8 9

10 11 12 13 14 15 16 17 18 19 20

Scatter Diagram for the Panthers

0

5

10

15

20

25

30

35

0 1 2 3 4Number of Interceptions

Nu

mb

er

of

Po

ints

Sco

red

.

Using Excel’s Chart Wizard to Construct

a Scatter Diagram and Trendline

Scatter Diagram for the Stereo and Sound Equipment Store Example

Scatter Diagram for Stereo and Sound Equipment Store

010203040506070

0 1 2 3 4 5 6

Commercials

Sal

es V

ole

um

Scatter Diagram for Stereo and Sound Equipment Store

010203040506070

0 1 2 3 4 5 6

Commercials

Sal

es V

ole

um

Scatter Diagram for the Stereo and Sound Equipment Store Example—with a Trendline