descriptives & graphing
TRANSCRIPT
Descriptives & graphing
Lecture 3
Survey Research & Design in Psychology
James Neill, 2017
Creative Commons Attribution 4.0
Descriptives & Graphing
Image source: http://commons.wikimedia.org/wiki/File:3D_Bar_Graph_Meeting.jpg
7126/6667 Survey Research & Design in PsychologySemester 1, 2017, University of Canberra, ACT, AustraliaJames T. Neillhttp://www.slideshare.net/jtneill/descriptives-graphinghttp://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Lectures/Descriptives_%26_graphing
Image source: http://commons.wikimedia.org/wiki/File:3D_Bar_Graph_Meeting.jpgImage author: lumaxart, http://www.flickr.com/photos/lumaxart/2136954043/Image license: Creative Commons Attribution Share Alike 2.0 unported, http://creativecommons.org/licenses/by-sa/2.0/deed.en
Description: Overviews descriptive statistics and graphical approaches to analysis of univariate data.
Overview:
Descriptives & Graphing
Getting to know a data set
LOM & types of statistics
Descriptive statistics
Normal distribution
Non-normal distributions
Effect of skew on central tendency
Principles of graphing
Univariate graphical techniques
Getting to know
a data-set
(how to approach data)
Play with the data get to know it.
Image source: http://www.flickr.com/photos/analytik/1356366068/
Image source: http://www.flickr.com/photos/analytik/1356366068/By analytic http://www.flickr.com/photos/analytik/License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
Don't be afraid - you can't break data!
Image source: http://www.flickr.com/photos/rnddave/5094020069
Image source: http://www.flickr.com/photos/rnddave/5094020069By David (rnddave), http://www.flickr.com/photos/rnddave/License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
Check & screen the data keep signal, reduce noise
Image source: https://commons.wikimedia.org/wiki/File:Nasir-al_molk_-1.jpg
Image source: https://commons.wikimedia.org/wiki/File:Nasir-al_molk_-1.jpgImage author: Ayyoubsabawiki, https://commons.wikimedia.org/w/index.php?title=User:AyyoubsabawikiLicense: CC-SA 4.0, Creative Commons Attribution-Share Alike 4.0 International, https://creativecommons.org/licenses/by-sa/4.0/deed.en
Data checking: One person reads the survey responses aloud to another person who checks the electronic data file.
For large studies, check a proportion of the surveys and declare the error-rate in the research report.
Image source: http://maxpixel.freegreatpicture.com/Business-Team-Two-People-Meeting-Computers-Office-1209640
Image source: http://maxpixel.freegreatpicture.com/Business-Team-Two-People-Meeting-Computers-Office-1209640Image author: freegreatpicture.com, http://maxpixel.freegreatpicture.comLicense: CC0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
Data screening: Carefully 'screening' a data file helps to remove errors and maximise validity.
For example, screen for:Out of range values
Mis-entered data
Missing cases
Duplicate cases
Missing data
Image source: https://commons.wikimedia.org/wiki/File:Archaeology_dirt_screening.jpg
Image source: https://commons.wikimedia.org/wiki/File:Archaeology_dirt_screening.jpgImage author: U.S. Air Force photo/Airman 1st Class Devante WilliamsLicense: CC0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/
Explore the data
Image source: https://commons.wikimedia.org/wiki/File:Kazimierz_Nowak_in_jungle_2.jpgI
Image source: https://commons.wikimedia.org/wiki/File:Kazimierz_Nowak_in_jungle_2.jpgImage author: http://www.poznajswiat.com.pl/art/1039License: Public domain, https://commons.wikimedia.org/wiki/Commons:Licensing#Material_in_the_public_domain
Get intimate
with the data
Image source: http://www.flickr.com/photos/elmoalves/2932572231/
Image source: http://www.flickr.com/photos/elmoalves/2932572231/By analytic Elmo Alves http://www.flickr.com/photos/elmoalves/License: CC-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en
Describe the data's main features
find a meaningful, accurate
way to depict thetrue story of the data
Image source: http://www.flickr.com/photos/lloydm/2429991235/
Image source: http://www.flickr.com/photos/lloydm/2429991235/By analytic fakelvis http://www.flickr.com/photos/lloydm/License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
Test hypotheses
to answer research questions
Image source: https://pixabay.com/en/light-bulb-current-light-glow-1042480/
Image source: https://pixabay.com/en/light-bulb-current-light-glow-1042480/Image author: ComFreak, https://pixabay.com/en/users/Comfreak-51581/License: Public domain
Level of measurement & types of statistics
Image source: http://www.flickr.com/photos/peanutlen/2228077524/
Image source: http://www.flickr.com/photos/peanutlen/2228077524/ by Smile My DayImage author: Terence Chang, http://www.flickr.com/photos/peanutlen/Image license: CC-by-A 2.0
Golden rule of data analysis
A variable's level of measurement determines the type of statistics that can be used, including types of:descriptive statistics
graphs
inferential statistics
Levels of measurement and
non-parametric vs. parametric
Categorical & ordinal data DV
non-parametric
(Does not assume a normal distribution)
Interval & ratio data DV
parametric
(Assumes a normal distribution)
non-parametric
(If distribution is non-normal)
DVs = dependent variables
Parametric statistics
Statistics which estimate parameters of a population, based on
the normal distribution Univariate:
mean, standard deviation, skewness, kurtosis, t-tests, ANOVAs
Bivariate:
correlation, linear regression
Multivariate:
multiple linear regression
More powerful
(more sensitive)
More assumptions
(population is normally distributed)
Vulnerable to violations of assumptions
(less robust)
Parametric statistics
Non-parametric statistics
Statistics which do not assume sampling from a population which is normally distributed There are non-parametric alternatives for many parametric statistics
e.g., sign test, chi-squared, Mann-Whitney U test, Wilcoxon matched-pairs signed-ranks test.
Non-parametric statistics
Less powerful
(less sensitive)
Fewer assumptions
(do not assume a normal distribution)
Less vulnerable to assumption violation
(more robust)
Univariate descriptive statistics
Number of variables
Univariate
= one variableBivariate
= two variablesMultivariate
= more than two variables
mean, median, mode, histogram, bar chart
correlation, t-test, scatterplot, clustered bar chart
reliability analysis, factor analysis, multiple linear regression
This lecture focuses on univariate descritive statistics and graphs
What do we want to describe?
The distributional properties of variables, based on:Central tendency(ies): e.g., frequencies, mode, median, mean
Shape: e.g., skewness, kurtosis
Spread (dispersion): min., max., range, IQR, percentiles, variance, standard deviation
By using univariate statistics and possibly also graphs, we want to give a meaningful snapshop summary which captures the main features of each variables distribution.
Measures of central tendency
Statistics which represent the centre of a frequency distribution:Mode (most frequent)
Median (50th percentile)
Mean (average)
Which ones to use depends on:Type of data (level of measurement)
Shape of distribution (esp. skewness)
Reporting more than one may be appropriate.
Measures of central tendency
If meaningfulRatioInterval
Ordinal
NominalMeanMedianMode / Freq. /%s
If meaningfulx
x
x
Nominal Mode e.g., whats the favourite colour?Ordinal Median e.g., See also http://www.quickmba.com/stats/centralten/OVERHEAD p.84 Bryman & Duncan (1997)
Measures of distribution
Measures of shape, spread, dispersion, and deviation from the central tendency
Non-parametric:Min. and max.
Range
Percentiles
Parametric:SD
Skewness
Kurtosis
Ratio
Interval
Ordinal
NominalVar / SDPercentileMin / Max, Range
Measures of spread / dispersion / deviation
If meaningful
x
x
x
x
Descriptives for nominal data
Nominal LOM = Labelled categories
Descriptive statistics:Most frequent? (Mode e.g., females)
Least frequent? (e.g., Males)
Frequencies (e.g., 20 females, 10 males)
Percentages (e.g. 67% females, 33% males)
Cumulative percentages
Ratios (e.g., twice as many females as males)
Nominal data consists of labels - e.g., 1=no, 2=yesNote that if you want to test whether one frequency is significantly higher than another, then use binomial test or a contingency test (chi-square).Also note that nominal variables can be used as IVs in tests of mean differences, but not in parametric tests of association such as correlations. To use in parametric tests of association, nominal data can be dummy coded (i.e., converted into a series of dichotomous variables).
Descriptives for ordinal data
Ordinal LOM = Conveys order but not distance (e.g., ranks)
Descriptives approach is as for nominal (frequencies, mode etc.)
Plus percentiles (including median) may be useful
Note: Can use ordinal data as IVs in tests of mean differences, but not in parametric tests of association.
Descriptives for interval data
Interval LOM = order and distance, but no true 0 (0 is arbitrary).
Central tendency (mode, median, mean)
Shape/Spread (min., max., range, SD, skewness, kurtosis)
Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)
Descriptives for ratio data
Ratio = Numbers convey order and distance, meaningful 0 point
As for interval, use median, mean, SD, skewness etc.
Can also use ratios (e.g., Category A is twice as large as Category B)
Mode (Mo)
Most common score - highest point in a frequency distribution a real score the most common response
Suitable for all levels of data, but may not be appropriate for ratio (continuous)
Not affected by outliers
Check frequencies and bar graph to see whether it is an accurate and useful statistic
e.g., for a distribution with 32%, 33%, and 34%, the mode would be misleading to report; instead it would be appropriate to report the similar % for each of the three categories.
Frequencies (f) and
percentages (%)
# of responses in each category
% of responses in each category
Frequency table
Visualise using a bar or pie chart
Note: Can use ordinal data as IVs in tests of mean differences, but not in parametric tests of association.
Crosstabs (contingency table) is the bivariate equivalent of frequencies
Median (Mdn)
Mid-point of distribution
(Quartile 2, 50th percentile)
Not badly affected by outliers
May not represent the central tendency in skewed data
If the Median is useful, then consider what other percentiles may also be worth reporting
Summary: Descriptive statistics
Level of measurement and normality determines whether data can be treated as parametric
Describe the central tendencyFrequencies, Percentages
Mode, Median, Mean
Describe the variability:Min., Max., Range, Quartiles
Standard Deviation, Variance
Properties of the
normal distribution
Image source: http://www.flickr.com/photos/trevorblake/3200899889/
Image source: Bell Curve http://www.flickr.com/photos/trevorblake/3200899889/By Trevor Blake http://www.flickr.com/photos/trevorblake/License: CC-by-SA 2.0 http://creativecommons.org/licenses/by-sa/2.0/deed.en
Four moments of a
normal distribution
MeanSD-ve Skew+ve SkewKurtosisImage source: UnknownKarl Pearson in his 1893 letter to Nature suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution Moments around the mean:http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
Four moments of a
normal distribution
Four mathematical qualities (parameters) can describe a continuous distribution which at least roughly follows a bell curve shape:1st = mean (central tendency)
2nd = SD (dispersion)
3rd = skewness (lean / tail)
4th = kurtosis (peakedness / flattness)
Mean
(1st moment )
Average score
Mean = X / NFor normally distributed ratio or interval (if treating it as continuous) data.
Influenced by extreme scores (outliers)
Beware inappropriate averaging...
With your head in an oven
and your feet in ice
you would feel,
on average,
just fineThe majority of people have more than the average number
of legs
(M = 1.9999).
Image sources: Clipart
Standard deviation
(2nd moment)
SD = square root of the variance
= (X - X)2 N 1For normally distributed interval or ratio data
Affected by outliers
Can also derive the Standard Error (SE) = SD / square root of N
Standard deviation is related to the scale of measurement, e.g.If SD = 1 one for cms, it would be 10 for ms and 1000 for kmSo, dont assume SD = 50 is big or SD = .1 is small it all depends on what scale is used.Be aware that the lower the N, the lower the SD > large samples reduce the SD
N-1 is the formula when generalising from a sample to a population; otherwise use N if its the SD for the sample.
Skewness
(3rd moment )
Lean of distribution +ve = tail to right
-ve = tail to left
Can be caused by an outlier, or ceiling or floor effects
Can be accurate (e.g., cars owned per person would have a skewed distribution)
Skewness (3rd moment)
(with ceiling and floor effects)
Negative skew
Ceiling effect
Positive skew
Floor effect
Image source http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
Image source http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
Kurtosis
(4th moment )
Flatness or peakedness of distribution
+ve = peaked -ve = flattenedBy altering the X &/or Y axis, any distribution can be made to look more peaked or flat add a normal curve to help judge kurtosis visually.
Kurtosis
(4th moment )
Image source: https://classconnection.s3.amazonaws.com/65/flashcards/2185065/jpg/kurtosis-142C1127AF2178FB244.jpg
Image source: https://classconnection.s3.amazonaws.com/65/flashcards/2185065/jpg/kurtosis-142C1127AF2178FB244.jpg
The kurtosis reflects the extent to which the density of the empirical distribution differs from the probability densities of the normal curve.Mesokurtic = 0http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
Judging severity of
skewness & kurtosis
View histogram with normal curve
Deal with outliers
Rule of thumb:
Skewness and kurtosis > -1 or < 1 is generally considered to
sufficiently normal for meeting the assumptions of parametric
inferential statistics
Significance tests of skewness: Tend to be overly
sensitive
(therefore avoid using)
The significance tests for skewness / kurtosis are not often used, at least in part because they are subject to sample size, so with a small size sample they are less likely to be significant than with a large sample size.
Areas under the normal curve
If distribution is normal
(bell-shaped - or close):~68% of scores within +/- 1 SD of M ~95%
of scores within +/- 2 SD of M~99.7% of scores within +/- 3 SD of
M
Areas under the normal curve
Image source: https://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG
Image source: https://commons.wikimedia.org/wiki/File:Empirical_Rule.PNGImage author: Dan Kernler, https://commons.wikimedia.org/w/index.php?title=User:MathprofdkImage license: Creative Commons Attribution-Share Alike 4.0 International license, https://creativecommons.org/licenses/by-sa/4.0/deed.en
Non-normal distributions
Types of non-normal distribution
ModalityUni-modal (one peak)
Bi-modal (two peaks)
Multi-modal (more than two peaks)
SkewnessPositive (tail to right)
Negative (tail to left)
KurtosisPlatykurtic (Flat)
Leptokurtic (Peaked)
Non-normal distributions
Image source: Unknown.The significance tests for skewness / kurtosis are subject to sample size, so with a small size sample they are less likely to be significant than with a large sample size.
Histogram of people's weight
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.Roughly normal, with positive skew
Histogram of daily calorie intake
N = 75
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.Bimodal
Histogram of fertility
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.Bimodal, with positive skew
Example normal distribution 1
At what age do you think you will die?
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.At what age do you think you will die?There is an outlier near zero which is minimising the positive skew; the data is also leptokurtic.
Example normal distribution 2
This bimodal graph actually consists of two different, underlying normal distributions.
N = 185
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.This distribution is bi-modal. It should not be treated as normal.In fact, if one looks more closely, it would sense to break down the distribution by gender.From the Quick Fun Survey data in Tutorial 1.
Example normal distribution 2
Distribution for females
Distribution for males
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.This is the distribution for males; it has a ceiling effect, with very feminine not being selected at all (and not shown on the graph it should be). It is negatively skewed and leptokurtic. Note though that because Very feminine has no cases and is not shown, the population data would probably be even more skewed than this sample indicates. It is probably leptokurtic.
Non-normal distribution:
Use non-parametric descriptive statistics
Min. & Max.
Range = Max. - Min.
Percentiles
QuartilesQ1
Mdn (Q2)
Q3
IQR (Q3-Q1)
Add slide showing boxplot from p.84 Bryman & Duncan (1997)
Effects of skew on measures of central tendency
+vely skewed distributions
mode < median < meansymmetrical (normal) distributions
mean = median = mode-vely skewed distributions
mean < median < mode
The more skewed a distribution is, the more important it is to use the median tends as a measure of central tendency
Effects of skew on measures of central tendency
Image source: Unknown
Transformations
Converts data using various formulae to achieve normality and allow more powerful tests
Loses original metric
Complicates interpretation
If a survey question produces a floor effect, where will the mean, median and mode lie in relation to one another?
Review questions
Would the mean # of cars owned in Australia to exceed the median?
Review questions
Would the mean score on an easy test exceed the median performance?
Review questions
Graphical
techniques
Image source: http://www.flickr.com/photos/pagedooley/2121472112/
Image source: http://www.flickr.com/photos/pagedooley/2121472112/By Kevin Dooley http://www.flickr.com/photos/pagedooley/License: CC-by-A 2.0 http://creativecommons.org/licenses/by/2.0/deed.en
Visualisation
Visualization is any technique
for creating images, diagrams, or animations to communicate a
message. - Wikipedia
Image source: http://en.wikipedia.org/wiki/File:FAE_visualization.jpg
Image source: http://en.wikipedia.org/wiki/File:FAE_visualization.jpgLicense: Public domain
Science is beautiful
(Nature Video)
(Youtube 5:30 mins)
Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.png
http://www.youtube.com/watch?v=dvM4JPGsmVw
Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.pngImage author: FRacco, http://commons.wikimedia.org/wiki/User:FRaccoImage license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en
Is Pivot a turning point for web exploration?
(Gary Flake)
(TED talk - 6 min.)
Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.png
Image source:http://commons.wikimedia.org/wiki/File:Parodyfilm.pngImage author: FRacco, http://commons.wikimedia.org/wiki/User:FRaccoImage license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en
Principles of graphing
Clear purpose
Maximise clarity
Minimise clutter
Allow visual comparison
Graphs
(Edward Tufte)
Visualise data
Reveal data Describe
Explore
Tabulate
Decorate
Communicate complex ideas with clarity, precision, and efficiency
Graphing steps
Identify purpose of the graph (make large amounts of data coherent; present many #s in small space; encourage the eye to make comparisons)
Select type of graph to use
Draw and modify graph to be clear, non-distorting, and well-labelled (maximise clarity, minimise clarity; show the data; avoid distortion; reveal data at several levels/layers)
Software for
data visualisation (graphing)
Statistical packages
e.g., SPSS Graphs or via Analyses
Spreadsheet packages
e.g., MS Excel
Word-processors
e.g., MS Word Insert Object Micrograph Graph Chart
Clevelands hierarchy
Image source:http://www.processtrends.com/TOC_data_visualization.htm
Image source:http://www.processtrends.com/TOC_data_visualization.htm Cleveland, William S., Elements of Graphing Data, 1985License: Unknown
Cleveland (1984) conducted experiments to measure people's accuracy in interpreting graphs, with findings as follows (Robbins): Position along a common scale Position along non aligned scales Length Angle-slope Area Volume Color hue - saturation - density
Clevelands hierarchy
Image source:https://priceonomics.com/how-william-cleveland-turned-data-visualization/
Image source:https://priceonomics.com/how-william-cleveland-turned-data-visualization/Cleveland, William S., Elements of Graphing Data, 1985License: Unknown
Cleveland (1984) conducted experiments to measure people's accuracy in interpreting graphs, with findings as follows (Robbins): Position along a common scale Position along non aligned scales Length Angle-slope Area Volume Color hue - saturation - density
Univariate graphs
Bar graph
Pie chart
Histogram
Stem & leaf plot
Data plot /
Error bar
Box plot
Non-parametric
i.e., nominal or ordinal
}
}
Parametric
i.e., normally distributed interval or ratio
Non-normal parametric data can be recoded and treated as nominal or ordinal data.
Bar chart (Bar graph)
Allows comparison of heights of bars
X-axis: Collapse if too many categories
Y-axis: Count/Frequency or % - truncation exaggerates differences
Can add data labels (data values for each bar)
Note
truncated
Y-axis
Use a bar chart instead
Hard to readDifficult to showSmall values
Small differences
Rotation of chart and
position of slices
influences perception
Pie chart
Image source: Unknown
Pie chart Use bar chart instead
Image source: https://priceonomics.com/how-william-cleveland-turned-data-visualization/
Image source: https://priceonomics.com/how-william-cleveland-turned-data-visualization/
Histogram
For continuous data (Likert?, Ratio)
X-axis needs a happy medium for # of categories
Y-axis matters (can exaggerate)
Image source: Author
Histogram of male & female heights
Wild & Seber (2000)
Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.DV = height (ratio)IV = Gender (categorical)
Stem & leaf plot
Use for ordinal, interval and ratio data (if rounded)
May look confusing to unfamiliar reader
Image source: Unknown.A bit of a plug and plea for stem & leaf plots they are underused. They are powerful because they are:Efficient e.g., they contain all the data succinctly others could use the data in a stem & leaf plot to do further analysis
Visual and mathematical: As well as containing all the data, the stem & leaf plot presents a powerful, recognizable visual of the data, akin to a bar graph. Turning a stem & leaf plot 90 degrees counter-clockiwse is recommend this makes the visual display more conventional and is easy to recognise, and the numbers are are less obvious, hence emphasizing the visual histogram shape.
Contains actual data
Collapses tails
Underused alternative to histogram
Stem & leaf plot
Frequency Stem & Leaf 7.00 1 . & 192.00 1 . 22223333333 541.00 1 . 444444444444444455555555555555 610.00 1 . 6666666666666677777777777777777777 849.00 1 . 88888888888888888888888888899999999999999999999 614.00 2 . 0000000000000000111111111111111111 602.00 2 . 222222222222222233333333333333333 447.00 2 . 4444444444444455555555555 291.00 2 . 66666666677777777 240.00 2 . 88888889999999 167.00 3 . 000001111 146.00 3 . 22223333 153.00 3 . 44445555 118.00 3 . 666777 99.00 3 . 888999 106.00 4 . 000111 54.00 4 . 222 339.00 Extremes (>=43)
Image source: Unknown
Box plot
(Box & whisker)
Useful for interval and ratio data
Represents min., max, median, quartiles, & outliers
Image source: Unknown
Alternative to histogram
Useful for screening
Useful for comparing variables
Can get messy - too much info
Confusing to unfamiliar reader
Box plot (Box & whisker)
Image source: Author
Data plot & error bar
Data plot
Error bar
Image source: Unknown.This is a univariate precursor to a scatterplot (a plot of a ratio by ratio variable).It works if there is a small amount of data; otherwise use a histogram to indicate the frequency within equal interval ranges.From: http://www.physics.csbsju.edu/stats/display.distribution.html
Image source: Unknown.Karl Pearson in his 1893 letter to Nature suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution Moments around the mean:http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm
Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.Histogram: At what age do you think you will die?There is an outlier near zero which is minimising the positive skew; the data is also quite strongly leptokurtic.
Alternative to histogram
Implies continuity e.g., time
Can show multiple lines
Line graph
Image source: Author
Graphical integrity
(part of academic integrity)
Image source: Unknown.
Graphical integrity
"Like good writing, good graphical displays of data communicate
ideas with clarity, precision, and efficiency.Like poor writing,
bad graphical displays distort or obscure the data, make it harder
to understand or compare, or otherwise thwart the communicative
effect which the graph should convey."
Michael Friendly Gallery of Data Visualisation
Tufte, Edward R., The Visual Display of Quantitative Information, 1983
Tuftes graphical integrity
Some lapses intentional, some not
Lie Factor = size of effect in graph size of effect in data
Misleading uses of area
Misleading uses of perspective
Leaving out important context
Lack of taste and aesthetics
Tufte, Edward R., The Visual Display of Quantitative Information, 1983
Review exercise:
Fill in the cells in this table
LevelPropertiesExamplesDescriptive StatisticsGraphs
Nominal
/CategoricalOrdinal / RankIntervalRatio
Answers: http://goo.gl/Ln9e1
References
Chambers, J., Cleveland, B., Kleiner, B., & Tukey, P. (1983). Graphical methods for data analysis. Boston, MA: Duxbury Press.
Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth.
Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta.
Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.
Tufte. E. R. (2001). Visualizing quantitative data. Cheshire, CT: Graphics Press.
Tukey J. (1977). Exploratory data analysis. Addison-Wesley.
Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.
Open Office Impress
This presentation was made using Open Office Impress.
Free and open source software.
http://www.openoffice.org/product/impress.html
Column 1Column 2Column 3
Row 19.13.24.54
Row 22.48.89.65
Row 33.11.53.7
Row 44.39.026.2
Click to edit the title text format
Click to edit the outline text formatSecond Outline LevelThird Outline LevelFourth Outline LevelFifth Outline LevelSixth Outline LevelSeventh Outline LevelEighth Outline LevelNinth Outline Level
Click to edit the title text format
Click to edit the outline text formatSecond Outline LevelThird Outline LevelFourth Outline LevelFifth Outline LevelSixth Outline LevelSeventh Outline LevelEighth Outline LevelNinth Outline Level
Click to edit the title text format
Click to edit the outline text formatSecond Outline LevelThird Outline LevelFourth Outline LevelFifth Outline LevelSixth Outline LevelSeventh Outline LevelEighth Outline LevelNinth Outline Level