Post on 23-Mar-2018
Chapter 2: Organizing and Visualizing Variables
Dr. Joerg Wild
Henan University of Technology
Statistics Chapter 2: Organizing and Visualizing Variables
Table of contents
1 Organizing Categorical Variables
2 Organizing Numerical Variables
3 Visualizing Categorical Variables
4 Visualizing Numerical Variables
5 Visualizing Two Numerical Variables
6 The Challenge in Organizing and Visualizing Variables
Organizing Categorical Variables
Categorical Data Are Organized By Utilizing Tables
Organizing Categorical Data: Summary Table
A summary table tallies the frequencies or percentages of items in a set of categories so that you can see differences between categories.
Figure 1: Main Reason Young Adults Shop Online
Summary Table
The sample of 316 retirement funds for the “Choice Is Yours” scenario includes the variable risk, which has the defined categories Low, Average, and High.
Figure 2: Summary Table of Levels of Risk of Retirement Funds
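As a rough illustration of how a summary table is tallied, the Python sketch below counts category labels and converts the counts to percentages. The risk labels come from the slide; the ten sample values and the function name `summary_table` are made up for the example and are not the 316-fund data.

```python
from collections import Counter


def summary_table(responses):
    """Return (category, frequency, percentage) rows, most frequent first."""
    counts = Counter(responses)
    total = sum(counts.values())
    return [(category, n, 100 * n / total) for category, n in counts.most_common()]


# Hypothetical sample of risk levels (not the real fund data).
risk_levels = ["Low"] * 6 + ["Average"] * 3 + ["High"] * 1

table = summary_table(risk_levels)
for category, frequency, percentage in table:
    print(f"{category:<8} {frequency:>3} {percentage:6.1f}%")
```

Listing both the frequency and the percentage in each row makes it easy to compare categories at a glance, which is the purpose of the table.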
Contingency Table
Adding a new dimension, the table below presents the completed contingency table after all 316 funds have been tallied. This table shows that there are 143 retirement funds that have the fund type Growth and risk level Low. In summarizing all six joint responses, the table reveals that Growth and Low is the most frequent joint response in the sample of 316 retirement funds.
Figure 3: Contingency Table Displaying Fund Type and Risk Level
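The tallying behind a contingency table can be sketched in a few lines of Python. The fund types and risk levels are the category names used in the slides; the twelve observations are hypothetical and only serve to show the mechanics, and `contingency_table` is an illustrative name.

```python
from collections import Counter

FUND_TYPES = ["Growth", "Value"]
RISK_LEVELS = ["Low", "Average", "High"]


def contingency_table(observations):
    """Count each (fund type, risk level) joint response as a row-by-column grid."""
    cells = Counter(observations)
    return [[cells[(fund, risk)] for risk in RISK_LEVELS] for fund in FUND_TYPES]


# Hypothetical observations, one (fund type, risk level) pair per fund.
observations = (
    [("Growth", "Low")] * 4
    + [("Growth", "Average")] * 2
    + [("Growth", "High")] * 1
    + [("Value", "Low")] * 2
    + [("Value", "Average")] * 2
    + [("Value", "High")] * 1
)

table = contingency_table(observations)
most_frequent = Counter(observations).most_common(1)[0]

print(f"{'':<8}" + "".join(f"{risk:>9}" for risk in RISK_LEVELS))
for fund, row in zip(FUND_TYPES, table):
    print(f"{fund:<8}" + "".join(f"{count:>9}" for count in row))
print("Most frequent joint response:", most_frequent)
```

Each cell holds the count of one joint response, so a two-by-three grid covers all six combinations of fund type and risk level.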
Contingency Table
Figure 4: Contingency Table Displaying Fund Type and Risk Level, Based on Percentage of Overall Total
Figure 5: Contingency Table Displaying Fund Type and Risk Level, Based on Percentage of Row Total
Contingency Table
Figure 6: Contingency Table Displaying Fund Type and Risk Level, Based on Percentage of Column Total
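The three percentage versions of a contingency table differ only in the denominator: the overall total, the row total, or the column total. The sketch below makes that explicit. The counts are hypothetical (chosen so the percentages come out round), and `percentage_table` and its `basis` parameter are names invented for this example.

```python
# Hypothetical counts: rows are fund types, columns are risk levels.
counts = {
    "Growth": {"Low": 8, "Average": 4, "High": 4},
    "Value": {"Low": 2, "Average": 1, "High": 1},
}


def percentage_table(counts, basis):
    """Convert counts to percentages of the overall, row, or column total."""
    row_totals = {row: sum(cells.values()) for row, cells in counts.items()}
    column_totals = {}
    for cells in counts.values():
        for column, n in cells.items():
            column_totals[column] = column_totals.get(column, 0) + n
    overall_total = sum(row_totals.values())

    def denominator(row, column):
        if basis == "overall":
            return overall_total
        if basis == "row":
            return row_totals[row]
        if basis == "column":
            return column_totals[column]
        raise ValueError(f"unknown basis: {basis!r}")

    return {
        row: {column: 100 * n / denominator(row, column) for column, n in cells.items()}
        for row, cells in counts.items()
    }


for basis in ("overall", "row", "column"):
    print(basis, percentage_table(counts, basis)["Growth"]["Low"])
```

The same cell can look very different depending on the basis, so the basis should always be stated alongside the table.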
Organizing Numerical Variables
Tables Used For Organizing Numerical Data
Array - Unordered vs. Ordered
Organizing Numerical Data: Frequency Distribution 1/2
The frequency distribution is a summary table in which the data are arranged into numerically ordered classes.
You must give attention to selecting the appropriate number of class groupings for the table, determining a suitable width of a class grouping, and establishing the boundaries of each class grouping to avoid overlapping.
Organizing Numerical Data: Frequency Distribution 2/2
The number of classes depends on the number of values in the data. With a larger number of values, typically there are more classes. In general, a frequency distribution should have at least 5 but no more than 15 classes.
To determine the width of a class interval, you divide the range (highest value - lowest value) of the data by the number of class groupings desired.
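The width calculation and the tallying into non-overlapping classes can be sketched as follows. The twelve meal costs are hypothetical, and rounding the computed width of 9.4 up to 10 and starting the first class at 20 are convenience choices assumed for this example; the slides only specify the division. Each class is treated as "lower boundary but less than upper boundary" so that no value can fall into two classes.

```python
def class_width(values, num_classes):
    """Range of the data divided by the desired number of classes."""
    return (max(values) - min(values)) / num_classes


def frequency_distribution(values, lower, width, num_classes):
    """Tally values into classes [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * num_classes
    for value in values:
        index = int((value - lower) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{value} falls outside the class boundaries")
        counts[index] += 1
    return [
        (lower + i * width, lower + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


# Hypothetical meal costs in dollars.
meal_costs = [22, 25, 31, 34, 38, 41, 43, 47, 52, 58, 63, 69]

raw_width = class_width(meal_costs, 5)  # (69 - 22) / 5
distribution = frequency_distribution(meal_costs, lower=20, width=10, num_classes=5)

print("computed width:", raw_width)
for low, high, count in distribution:
    print(f"{low} but less than {high}: {count}")
```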
Frequency Distribution, Example Meal Cost
Figure 7: Frequency Distributions of the Meal Costs for 50 City Restaurants and 50 Suburban Restaurants
Frequency Distribution, Example Returns
Figure 8: Frequency Distributions of the One-Year Return Percentage for Growth and Value Funds
The Relative Frequency Distribution and the Percentage Distribution, Example Meal Cost
Figure 9: Relative Frequency Distributions and Percentage Distributions of the Meal Costs at City and Suburban Restaurants
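A relative frequency is a class frequency divided by the total number of values, and the percentage is that proportion multiplied by 100. The short sketch below shows both conversions; the class frequencies are hypothetical and are not the values from Figure 9.

```python
def relative_frequencies(frequencies):
    """Proportion of all values that falls in each class."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


def percentages(frequencies):
    """Relative frequencies expressed as percentages."""
    total = sum(frequencies)
    return [100 * f / total for f in frequencies]


# Hypothetical class frequencies for 50 restaurants.
frequencies = [5, 10, 20, 10, 5]

proportions = relative_frequencies(frequencies)
percents = percentages(frequencies)
for f, p, pct in zip(frequencies, proportions, percents):
    print(f"{f:>3} {p:6.2f} {pct:6.1f}%")
```

Working with proportions or percentages rather than raw counts is what allows two groups of different sizes to be compared directly.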
The Relative Frequency Distribution and the Percentage Distribution, Example Returns
Figure 10: Relative Frequency Distributions and Percentage Distributions of the One-Year Return Percentage for Growth and Value Funds
The Cumulative Distribution, Example Meal Cost
Figure 11: Developing the Cumulative Percentage Distribution for City Restaurant Meal Costs
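A cumulative percentage distribution can be developed by keeping a running total of the class frequencies and dividing each running total by the overall count. The sketch below assumes hypothetical class boundaries and frequencies, not the values shown in Figure 11; each result is the percentage of values that are less than the upper boundary of the corresponding class.

```python
from itertools import accumulate


def cumulative_percentages(frequencies):
    """Percentage of values at or below each class, in class order."""
    total = sum(frequencies)
    return [100 * running / total for running in accumulate(frequencies)]


# Hypothetical upper class boundaries (in dollars) and class frequencies.
upper_boundaries = [30, 40, 50, 60, 70]
frequencies = [5, 10, 20, 10, 5]

cumulative = cumulative_percentages(frequencies)
for boundary, pct in zip(upper_boundaries, cumulative):
    print(f"less than {boundary}: {pct:5.1f}%")
```

The last entry is always 100%, which is a quick check that no class was left out.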
The Cumulative Distribution, Example Meal Cost
Figure 12: Cumulative Percentage Distributions of the Meal Costs for City and Suburban Restaurants
The Cumulative Distribution, Example Returns
Figure 13: Cumulative Percentage Distributions of the One-Year Return Percentages for Growth and Value Funds
Visualizing Categorical Variables
Visualizing Categorical Data Through Graphical Displays
Bar Chart and Pie Chart
Figure 14: Excel bar chart (left) and pie chart (right) for reasons for shopping online
Bar Chart
Figure 15: The bar chart shows that low risk is the largest category, followed by average risk. Very few of the funds have high risk.
Pie Chart
Figure 16: The pie chart shows that more than two-thirds of the funds are low risk, about 30% are average risk, and only about 4% are high risk.
Visualizing Numerical Variables
The Stem-and-Leaf Display
Definition
A stem-and-leaf display visualizes data by presenting the data as one or more row-wise stems that represent a range of values. In turn, each stem has one or more leaves that branch out to the right of their stem and represent the values found in that stem. For stems with more than one leaf, the leaves are arranged in ascending order.
The Stem-and-Leaf Display - Example
Suppose you collect the following meal costs (in $) for 15 classmates who had lunch at a fast-food restaurant:
7.42 6.29 5.83 6.50 8.34 9.51 7.10 6.80 5.90 4.89 6.50 5.52 7.90 8.30 9.60
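One way to build the display for these 15 meal costs is sketched below in Python. It assumes each cost is rounded to the nearest ten cents, with the dollar amount as the stem and the tenths digit as the leaf; that rounding convention is an assumption of this sketch and is not stated on the slide. The function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

# Meal costs from the slide, kept as strings so Decimal parses them exactly.
MEAL_COSTS = [
    "7.42", "6.29", "5.83", "6.50", "8.34", "9.51", "7.10", "6.80",
    "5.90", "4.89", "6.50", "5.52", "7.90", "8.30", "9.60",
]


def stem_and_leaf(values):
    """Map each stem (dollars) to its sorted leaves (tenths of a dollar)."""
    stems = defaultdict(list)
    for text in values:
        rounded = Decimal(text).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)
        tenths = int(rounded * 10)        # for example 7.4 becomes 74
        stem, leaf = divmod(tenths, 10)   # 74 splits into stem 7 and leaf 4
        stems[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(stems.items())}


display = stem_and_leaf(MEAL_COSTS)
for stem, leaves in display.items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

Sorting the leaves within each stem gives the ascending order that the definition calls for.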
The Histogram
Definition
A histogram visualizes data as a vertical bar chart in which each bar represents a class interval from a frequency or percentage distribution. In a histogram, you display the numerical variable along the horizontal (X) axis and use the vertical (Y) axis to represent either the frequency or the percentage of values per class interval.
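To make the axis roles concrete, the sketch below draws a plain-text frequency histogram: class intervals run along the horizontal axis and the bar height on the vertical axis is the class frequency. A charting tool would normally be used instead; the class labels and frequencies are hypothetical, and `text_histogram` is a name invented for this example.

```python
def text_histogram(labels, frequencies):
    """Return the lines of a vertical text histogram, top row first."""
    width = max(len(label) for label in labels)
    lines = []
    # One text row per frequency level, from the tallest bar down to 1.
    for level in range(max(frequencies), 0, -1):
        cells = [("#" if f >= level else " ").center(width) for f in frequencies]
        lines.append(f"{level:>3} | " + " ".join(cells))
    lines.append("    +" + "-" * (len(labels) * (width + 1) + 1))
    lines.append("      " + " ".join(label.center(width) for label in labels))
    return lines


# Hypothetical classes; "20-30" means 20 but less than 30, and so on.
labels = ["20-30", "30-40", "40-50", "50-60", "60-70"]
frequencies = [2, 3, 5, 3, 1]

lines = text_histogram(labels, frequencies)
print("\n".join(lines))
```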
The Histogram - Example
Figure 17: Frequency histograms for meal costs at city and suburban restaurants
The Histogram - Example
Figure 18: Excel frequency histograms for the one-year return percentages for the growth and value funds
Visualizing Two Numerical Variables
The Scatter Plot
Definition
A scatter plot explores the possible relationship between two numerical variables by plotting the values of one numerical variable on the horizontal, or X, axis and the values of a second numerical variable on the vertical, or Y, axis.
The Scatter Plot - Example
Figure 19: Revenues and Values for NBA Teams
The Scatter Plot - Example
Figure 20: Scatter plot of revenue and value for NBA teams
The Time-Series Plot
Definition
A time-series plot plots the values of a numerical variable on the Y axis and plots the time period associated with each numerical value on the X axis. A time-series plot can help you visualize trends in data that occur over time.
The Time-Series Plot - Example
Figure 21: Movie Revenues (in $billions) from 1995 to 2013
The Time-Series Plot - Example
Figure 22: Time-series plot of movie revenue per year from 1995 to 2013
The Challenge in Organizing and Visualizing Variables
Obscuring Data
Figure 23: Information overload, presenting too many details, can obscure data and hamper decision making
Creating False Impressions, 1/2
Figure 24: Left: One-Year Percentage Change in Year-to-Year Sales for the Month of April; Right: Percentage Change for Three Consecutive Years
Creating False Impressions, 2/2
Figure 25: Market shares of companies in “two” industries
Chartjunk, 1/3
Figure 26: Two visualizations of market share of soft drinks
Chartjunk, 2/3
Figure 27: Two visualizations of Australian wine exports to the United States, in millions of gallons
Chartjunk, 3/3
Figure 28: Visualization of the amount of land planted with grapes for the wine industry
Graphical Errors, 1/3
Figure 29: No Relative Basis
Graphical Errors, 2/3
Figure 30: Compressing the Vertical Axis
Graphical Errors, 3/3
Figure 31: No Zero Point on the Vertical Axis
Best Practices for Constructing Visualizations
Use the simplest possible visualization
Include a title
Label all axes
Include a scale for each axis if the chart contains axes
Begin the scale for a vertical axis at zero
Use a constant scale
Avoid 3D effects
Avoid chartjunk
Summary
Methods to organize variables.
Methods to visualize variables.
Methods to organize or visualize more than one variable at the same time.
Principles of proper visualizations.