Visualisation of quantitative information

Upload: james-neill

Post on 16-Apr-2017

TRANSCRIPT

Survey Design II

James Neill, 2011

Visualisation of quantitative information

Introduction to research
ComEdu Hons/Masters
Semester 1, 2011, University of Canberra, ACT, Australia
James T. Neill
http://en.wikiversity.org/wikiVisualisation_of_quantitative_information

Image source: http://commons.wikimedia.org/wiki/File:3D_Bar_Graph_Meeting.jpg
Image author: lumaxart, http://www.flickr.com/photos/lumaxart/2136954043/
Image license: Creative Commons Attribution Share Alike 2.0 unported, http://creativecommons.org/licenses/by-sa/2.0/deed.en

Description: Overviews levels of measurement and graphical approaches to analysis of univariate data.

Overview

Visualisation

Approaching data

Levels of measurement

Principles of graphing

Univariate graphs

Graphical integrity

Visualisation

Visualization is any technique
for creating images, diagrams, or animations to communicate a message. - Wikipedia

Image source: http://en.wikipedia.org/wiki/File:FAE_visualization.jpg
License: Public domain

Is Pivot a turning point for web exploration?
(Gary Flake)


(TED talk - 6 min.)

For more examples of bias questions, see Nardi (2006). pp. 66-67

Image source: http://commons.wikimedia.org/wiki/File:Parodyfilm.png
Image author: FRacco, http://commons.wikimedia.org/wiki/User:FRacco
Image license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en

Approaching
data


Entering & screening

Exploring, describing, & graphing

Hypothesis testing

You are adding tools to your toolkit
Image: Clipart

Describing & graphing data

THE CHALLENGE:
to find a meaningful, accurate way to depict the true story of the data

Get your fingers dirty with data

Image source: http://www.flickr.com/photos/analytik/1356366068/
By analytic http://www.flickr.com/photos/analytik/
License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en

Get intimate with your data

Image source: http://www.flickr.com/photos/elmoalves/2932572231/
By Elmo Alves http://www.flickr.com/photos/elmoalves/
License: CC-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en

Clearly report the data's main features

Image source: http://www.flickr.com/photos/lloydm/2429991235/
By fakelvis http://www.flickr.com/photos/lloydm/
License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en

Levels of Measurement
=
Type of Data

Stevens (1946)

Image source: Unknown.

Levels of measurement

Nominal / Categorical

Ordinal

Interval

Ratio

Nominal/Category - measures identify categories e.g., sex, ethnicity.

Ordinal - relative ordering of responses e.g., rankings in an exam

Interval - scores stand in a quantitative relationship to one another; adjacent scores are separated by an equal interval

Ratio - like interval but with a true zero value e.g., height, speed

Discrete vs. continuous

Discrete- - - - - - - - - -

Continuous___________

Discrete data: finite options (e.g., labels)
Continuous data: infinite options (e.g., cms)
Discrete data is generally only whole numbers, whilst continuous data can have many decimals
Discrete: nominal, ordinal, interval
Continuous: ratio

Each level has the properties of the preceding
levels, plus something more!

Image source: Unknown.

Categorical / nominal

Conveys a category label

(Arbitrary) assignment of #s to categories

e.g. Gender

No useful information, except as labels

Ordinal / ranked scale

Conveys order, but not distance

e.g. in a race, 1st, 2nd, 3rd, etc. or ranking of favourites or preferences

Image: Cropped version of http://www.flickr.com/photos/beatkueng/1350250361/?addedcomment=1#comment72157605326099631
CC-by-A by Beat - http://www.flickr.com/photos/beatkueng/

Ordinal / ranked example:
Ranked importance

Rank the following aspects of the university according to what is most important to you (1 = most important through to 5 = least important)
__ Quality of the teaching and education
__ Quality of the social life
__ Quality of the campus
__ Quality of the administration
__ Quality of the university's reputation

Image source: L.N Fowler & Co. c. 1870.

Interval scale

Conveys order & distance

0 is arbitrary

e.g., temperature (degrees C)

Usually treat as continuous for > 5 intervals

Interval example:
8 point Likert scale

Image source: L.N Fowler & Co. c. 1870.

Ratio scale

Conveys order & distance

Continuous, with a meaningful 0 point

e.g. height, age, weight, time, number of times an event has occurred

Ratio statements can be made

e.g. X is twice as old (or high or heavy) as Y
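A small worked example may help show why ratio statements need a meaningful zero. The sketch below is illustrative only: the two temperatures are made-up values, and `celsius_to_kelvin` is a helper name introduced here. It compares the "ratio" of two Celsius readings (interval data, arbitrary 0) with the ratio of the same readings in kelvin (ratio data, true 0).

```python
# Degrees Celsius is interval data: 0 is arbitrary.
# Kelvin is ratio data: 0 K is a true zero.
def celsius_to_kelvin(c):
    return c + 273.15

a_c, b_c = 10.0, 20.0   # hypothetical temperatures in degrees C

naive_ratio = b_c / a_c                                        # treats 0 C as "nothing"
true_ratio = celsius_to_kelvin(b_c) / celsius_to_kelvin(a_c)   # ratio on a true-zero scale

print(f"Celsius 'ratio': {naive_ratio:.2f}")
print(f"Kelvin ratio:    {true_ratio:.3f}")
```

On the Celsius scale the second reading looks "twice as hot", but on the kelvin scale it is only about 3.5% higher, which is why "twice as" claims are reserved for ratio data.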

Ratio scale:
Time

Image source: L.N Fowler & Co. c. 1870.

Why do levels of measurement matter?

Different analytical procedures
are used for different
levels of data.


More powerful statistics can be applied to higher levels
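To make this concrete, here is a minimal sketch of how the level of measurement might drive the choice of a measure of central tendency. The mapping (mode for nominal, median for ordinal, mean for interval and ratio) is a common textbook convention assumed here rather than something stated on the slide, and the function name and sample data are hypothetical.

```python
from statistics import mean, median, mode

def central_tendency(values, level):
    """Return (statistic name, value) conventionally suited to the level of measurement."""
    if level == "nominal":
        return "mode", mode(values)        # labels only: report the most frequent category
    if level == "ordinal":
        return "median", median(values)    # order is meaningful, distance is not
    if level in ("interval", "ratio"):
        return "mean", mean(values)        # equal intervals make averaging meaningful
    raise ValueError(f"unknown level of measurement: {level}")

print(central_tendency(["F", "M", "F", "F"], "nominal"))    # ('mode', 'F')
print(central_tendency([1, 2, 2, 3, 5], "ordinal"))         # ('median', 2)
print(central_tendency([36.5, 37.0, 38.0], "interval"))
```

Each higher level can also use the statistics of the levels below it (e.g., a median for ratio data), which is one sense in which more statistics become available at higher levels.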

Image source: Unknown.

Principles of graphing

Image source: http://www.flickr.com/photos/pagedooley/2121472112/
By Kevin Dooley http://www.flickr.com/photos/pagedooley/
License: CC-by-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en

Graphs
(Edward Tufte)

Visualise data

Reveal data

Describe

Explore

Tabulate

Decorate

Communicate complex ideas with clarity, precision, and efficiency

Tufte's graphing guidelines

Show the data

Avoid distortion

Focus on substance rather than method

Present many numbers in a small space

Make large data sets coherent

Tufte's graphing guidelines

Maximise the information-to-ink ratio

Encourage the eye to make comparisons

Reveal data at several levels/layers

Closely integrate with statistical and verbal descriptions

Graphing steps

Identify the purpose of the graph

Select which type of graph to use

Draw a graph

Modify the graph to be clear, non-distorting, and well-labelled.

Disseminate the graph (e.g., include it in a report)

Software for
data visualisation (graphing)

Statistical packages

e.g., SPSS

Spreadsheet packages

e.g., MS Excel

Word-processors

e.g., MS Word: Insert > Object > Microsoft Graph Chart

Univariate graphs


Bar graph

Pie chart

Data plot

Error bar

Stem & leaf plot

Box plot (Box & whisker)

Histogram

Bar chart (Bar graph)

Examine comparative heights of bars

X-axis: Collapse if too many categories

Y-axis: Count or % or mean?

Consider whether to use data labels
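The choices listed above (what goes on the Y-axis, whether to show data labels) can be tried out with a plain-text bar chart. This is only a sketch using the standard library; the function name `text_bar_chart`, its `as_percent` flag, and the survey responses are all made up for illustration.

```python
from collections import Counter

def text_bar_chart(categories, as_percent=False):
    """Return one line per category: label, bar, and a data label (count or %)."""
    counts = Counter(categories)
    total = sum(counts.values())
    lines = []
    for label, n in counts.most_common():      # tallest bar first
        bar = "#" * n                           # bar length always tracks the raw count
        data_label = f"{100 * n / total:.0f}%" if as_percent else str(n)
        lines.append(f"{label:<10} {bar} {data_label}")
    return lines

responses = ["agree", "agree", "neutral", "disagree", "agree", "neutral"]  # hypothetical

for line in text_bar_chart(responses):
    print(line)
for line in text_bar_chart(responses, as_percent=True):
    print(line)
```

The bars are identical in both versions; only the data label changes, which shows that the count-versus-percentage decision is about what the reader is told, not about the shape of the graph.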

Pie chart

Use a bar chart instead

Hard to read

Does not show small differences

Rotation / position influences perception

Image source: Unknown

Data plot & error bar

Data plot

Error bar

Image source: Unknown.
This is a univariate precursor to a scatterplot (a plot of a ratio by ratio variable).
It works if there is a small amount of data; otherwise use a histogram to indicate the frequency within equal interval ranges.
From: http://www.physics.csbsju.edu/stats/display.distribution.html

Image source: Unknown.
Karl Pearson, in his 1893 letter to Nature, suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution.
Moments around the mean: http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Histogram: At what age do you think you will die?
There is an outlier near zero which is minimising the positive skew; the data is also quite strongly leptokurtic.

Stem & leaf plot

Alternative to histogram

Use for ordinal, interval and ratio data

May look confusing to unfamiliar reader

Image source: Unknown.
A bit of a plug and plea for stem & leaf plots: they are underused. They are powerful because they are:

Efficient: e.g., they contain all the data succinctly, so others could use the data in a stem & leaf plot to do further analysis.

Visual and mathematical: As well as containing all the data, the stem & leaf plot presents a powerful, recognizable visual of the data, akin to a bar graph. Turning a stem & leaf plot 90 degrees counter-clockwise is recommended; this makes the visual display more conventional and easier to recognise, and the numbers are less obvious, hence emphasizing the visual histogram shape.

Contains actual data

Collapses tails

Stem & leaf plot

Frequency    Stem &  Leaf
     7.00       1 .  &
   192.00       1 .  22223333333
   541.00       1 .  444444444444444455555555555555
   610.00       1 .  6666666666666677777777777777777777
   849.00       1 .  88888888888888888888888888899999999999999999999
   614.00       2 .  0000000000000000111111111111111111
   602.00       2 .  222222222222222233333333333333333
   447.00       2 .  4444444444444455555555555
   291.00       2 .  66666666677777777
   240.00       2 .  88888889999999
   167.00       3 .  000001111
   146.00       3 .  22223333
   153.00       3 .  44445555
   118.00       3 .  666777
    99.00       3 .  888999
   106.00       4 .  000111
    54.00       4 .  222
   339.00 Extremes    (>=43)
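For readers who want to build a stem & leaf plot by hand, the following is a rough sketch rather than a reproduction of the statistical package output above. It assumes non-negative whole numbers, uses the tens digit as the stem and the units digit as the leaf, and prints one row per stem (the output above splits each stem across several rows and lets one leaf stand for many cases). The function name and the ages are hypothetical.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return stem & leaf rows for non-negative integers (stem = tens, leaf = units)."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        rows[stem].append(str(leaf))
    low, high = min(rows), max(rows)
    # Empty stems are kept so that gaps in the distribution remain visible.
    return [f"{stem:>3} | {''.join(rows.get(stem, []))}" for stem in range(low, high + 1)]

ages = [18, 19, 19, 21, 22, 22, 25, 27, 30, 31, 43]   # hypothetical data

for row in stem_and_leaf(ages):
    print(row)
```

Every original value can be read back from the rows (stem 2 with leaves 12257 is 21, 22, 22, 25, 27), while the row lengths give the histogram-like shape.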

Box plot
(Box & whisker)

Useful for interval and ratio data

Represents min., max, median, quartiles, & outliers

Alternative to histogram

Useful for screening

Useful for comparing variables

Can get messy - too much info

Confusing to unfamiliar reader
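The quantities a box plot displays can be computed directly. The sketch below makes two assumptions not stated on the slide: quartiles use the "inclusive" method of Python's `statistics.quantiles` (packages differ in how they interpolate quartiles), and outliers are flagged with the common 1.5 x IQR rule. The function name and data are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(values):
    """Return the numbers a box & whisker plot would draw."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr   # assumed 1.5 x IQR rule
    inside = [x for x in data if low_fence <= x <= high_fence]
    return {
        "min": data[0], "q1": q1, "median": q2, "q3": q3, "max": data[-1],
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in data if x < low_fence or x > high_fence],
    }

scores = [2, 4, 4, 5, 6, 7, 8, 9, 30]   # hypothetical data with one extreme value
for name, value in box_plot_summary(scores).items():
    print(f"{name:>12}: {value}")
```

Here the value 30 falls beyond the upper fence, so the upper whisker stops at 9 and 30 is reported separately, which is what makes the box plot useful for screening.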

Box plot

Histogram

For continuous data

X-axis needs a happy medium for # of categories

Y-axis matters (can exaggerate)
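The "happy medium" for the number of X-axis categories can be explored by binning the same data at several widths. This is a minimal sketch assuming whole-number data and equal-width bins; `histogram_counts` and the heights are invented for illustration, and empty bins are simply omitted from the result.

```python
def histogram_counts(values, bin_width, start):
    """Count values falling in equal-width bins [left, left + bin_width)."""
    counts = {}
    for v in values:
        k = (v - start) // bin_width          # index of the bin containing v
        left = start + k * bin_width
        counts[left] = counts.get(left, 0) + 1
    return dict(sorted(counts.items()))

heights_cm = [150, 152, 158, 161, 163, 164, 170, 171, 179, 185]   # hypothetical data

for width in (5, 10, 20):
    print(f"bin width = {width} cm")
    for left, n in histogram_counts(heights_cm, width, start=150).items():
        print(f"  {left}-{left + width - 1}: {'#' * n}")
```

With very narrow bins the shape breaks up into isolated bars, and with very wide bins most of the detail disappears; an intermediate width usually tells the story best.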

Histogram of male & female heights

Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
DV = height (ratio)
IV = Gender (categorical)

Non-normal distributions

Image source: Unknown.
The significance tests for skewness / kurtosis are subject to sample size, so with a small size sample they are less likely to be significant than with a large sample size.
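The sample-size point can be illustrated numerically. The sketch below computes skewness and excess kurtosis from moments about the mean (the simple population form, dividing by n, which differs slightly from the bias-corrected values some packages report) and then divides a given skewness by the large-sample approximation sqrt(6 / n) for its standard error. The function names and numbers are hypothetical, and the approximation is an assumption of this example rather than the test used in the lecture.

```python
from math import sqrt
from statistics import fmean

def central_moment(values, k):
    """k-th moment about the mean (population form, dividing by n)."""
    m = fmean(values)
    return fmean([(x - m) ** k for x in values])

def skewness(values):
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5

def excess_kurtosis(values):
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3

def skew_z(skew, n):
    """Approximate z statistic, using the large-sample SE of sqrt(6 / n)."""
    return skew / sqrt(6 / n)

sample = [1, 1, 1, 2, 2, 3, 10]            # hypothetical, positively skewed
print(f"skewness = {skewness(sample):.2f}, excess kurtosis = {excess_kurtosis(sample):.2f}")

# The same amount of skew tested at two sample sizes:
for n in (20, 2000):
    print(f"n = {n}: z = {skew_z(0.5, n):.2f}")
```

A skewness of 0.5 gives a z below 1.96 when n = 20 but far above it when n = 2000, so the same shape can be "non-significant" in a small sample and "significant" in a large one.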


Histogram of weight

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Roughly normal, with positive skew

Histogram of daily calorie intake

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Bimodal

Histogram of fertility

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
Bimodal, with positive skew

Example normal distribution 1

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
At what age do you think you will die?
There is an outlier near zero which is minimising the positive skew; the data is also leptokurtic.

Example normal distribution 2

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
This distribution is bi-modal. It should not be treated as normal.
In fact, if one looks more closely, it would make sense to break down the distribution by gender.
From the Quick Fun Survey data in Tutorial 1.

Example normal distribution 2

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.
This is the distribution for males; it has a ceiling effect, with very feminine not being selected at all (and not shown on the graph, though it should be). It is negatively skewed and leptokurtic. Note though that because Very feminine has no cases and is not shown, the population data would probably be even more skewed than this sample indicates. It is probably leptokurtic.

Effects of skew on measures of central tendency

Image source: Unknown.
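A quick numerical check of how skew pulls the three measures apart, using made-up data (the values and variable names below are hypothetical). In a positively skewed distribution the long right tail drags the mean furthest, the median less, and leaves the mode where the bulk of the cases sit; mirroring the data reverses the order.

```python
from statistics import mean, median, mode

positive_skew = [20, 22, 22, 22, 25, 27, 30, 35, 60, 120]   # hypothetical, long right tail
negative_skew = [140 - x for x in positive_skew]            # mirror image, long left tail

for label, data in (("positive skew", positive_skew), ("negative skew", negative_skew)):
    print(f"{label}: mode = {mode(data)}, median = {median(data)}, mean = {mean(data):.1f}")
```

For the positively skewed data the order is mode < median < mean, and for the negatively skewed data it is mean < median < mode.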

Line graph

Alternative to histogram

Implies continuity e.g., time

Can show multiple lines

Summary: Graphs & levels of measurement

(N = nominal, O = ordinal, I = interval, R = ratio)

Bar chart & pie chart: N, O, I
Histogram: I, R
Stem & leaf: I, R
Data plot & box plot: I, R
Error-bar: I, R
Line graph: I, R

Graphical integrity

(part of academic integrity)

Image source: Unknown.

Graphs can be like a bikini. What they reveal is suggestive, but what they conceal is vital.
(after Aaron Levenstein)

Image source: http://www.flickr.com/photos/alosojos/350530627/
By FranUlloa, http://www.flickr.com/people/alosojos/
License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en

Graphical integrity

"Like good writing, good graphical displays of data communicate ideas with clarity, precision, and efficiency. Like poor writing, bad graphical displays distort or obscure the data, make it harder to understand or compare, or otherwise thwart the communicative effect which the graph should convey."

Michael Friendly, Gallery of Data Visualisation

Tufte, Edward R., The Visual Display of Quantitative Information, 1983

Cleveland's hierarchy

Image source: http://www.processtrends.com/TOC_data_visualization.htm
License: Unknown

Cleveland (1984) conducted experiments to measure people's accuracy in interpreting graphs, with findings as follows (Robbins):

Cleveland's hierarchy:
Best to worst

Position along a common scale

Position along identical, non aligned scales

Length

Angle-slope

Area

Volume

Color hue - color saturation - density

Image source: Cleveland, William S., Elements of Graphing Data, 1985

Tufte's graphical integrity

Some lapses intentional, some not

Lie Factor = (size of effect in graph) / (size of effect in data)

Misleading uses of area

Misleading uses of perspective

Leaving out important context

Lack of taste and aesthetics

Tufte, Edward R., The Visual Display of Quantitative Information, 1983
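The Lie Factor can be computed for any graph where the plotted sizes and the underlying numbers are both known. The sketch below assumes that "size of effect" means the relative change between two values, |second - first| / |first|; the function names and the example figures are hypothetical.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(graph_first, graph_second, data_first, data_second):
    """Size of effect shown in the graph divided by size of effect in the data."""
    return effect_size(graph_first, graph_second) / effect_size(data_first, data_second)

# Hypothetical example: the data rise from 10 to 15 (a 50% increase),
# but the bars are drawn 2 cm and 6 cm tall (a 200% increase).
print(lie_factor(graph_first=2, graph_second=6, data_first=10, data_second=15))   # 4.0
```

A value close to 1 means the graph represents the data faithfully; here the graph exaggerates the change fourfold.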

Review questions

If a survey question produces a floor effect, where will the mean, median and mode lie in relation to one another?

Over the last century, the performance of the best baseball hitters has declined. Does this imply that the overall performance of baseball batters has decreased?

OVERHEAD p.84 Bryman & Duncan (1997)

Can you complete this table?

Level | Properties | Examples | Descriptive Statistics | Graphs
Nominal / Categorical | | | |
Ordinal / Rank | | | |
Interval | | | |
Ratio | | | |

Answers: http://wilderdom.com/research/Summary_Levels_Measurement.html

Links

Presenting Data Statistics Glossary v1.1 - http://www.cas.lancs.ac.uk/glossary_v1.1/presdata.html

A Periodic Table of Visualisation Methods - http://www.visual-literacy.org/periodic_table/periodic_table.html

Gallery of Data Visualization - http://www.math.yorku.ca/SCS/Gallery/

Univariate Data Analysis The Best & Worst of Statistical Graphs - http://www.csulb.edu/~msaintg/ppa696/696uni.htm

Pitfalls of Data Analysis - http://www.vims.edu/~david/pitfalls/pitfalls.htm

Statistics for the Life Sciences - http://www.math.sfu.ca/~cschwarz/Stat-301/Handouts/Handouts.html

Visualizing Quantitative Data, Tufte E. R., Graphics Press, 2001

Graphical Methods for Data Analysis, Chambers J., Cleveland, B. Kleiner, and P. Tukey, Duxbury Press, Boston, 1983

Exploratory Data Analysis, Tukey J., Addison-Wesley Pub Co., 1977

References

Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth.

Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta.

Tufte, E. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.

Open Office Impress

This presentation was made using Open Office Impress.

Free and open source software.

http://www.openoffice.org/product/impress.html
