descriptives & graphing


Upload: james-neill

Post on 16-Apr-2017


Descriptives & graphing

Lecture 3
Survey Research & Design in Psychology
James Neill, 2017
Creative Commons Attribution 4.0

Descriptives & Graphing

7126/6667 Survey Research & Design in Psychology
Semester 1, 2017, University of Canberra, ACT, Australia
James T. Neill
http://www.slideshare.net/jtneill/descriptives-graphing
http://en.wikiversity.org/wiki/Survey_research_and_design_in_psychology/Lectures/Descriptives_%26_graphing

Image source: http://commons.wikimedia.org/wiki/File:3D_Bar_Graph_Meeting.jpg
Image author: lumaxart, http://www.flickr.com/photos/lumaxart/2136954043/
Image license: Creative Commons Attribution Share Alike 2.0 unported, http://creativecommons.org/licenses/by-sa/2.0/deed.en

Description: Overviews descriptive statistics and graphical approaches to analysis of univariate data.

Overview:
Descriptives & Graphing

Getting to know a data set

LOM & types of statistics

Descriptive statistics

Normal distribution

Non-normal distributions

Effect of skew on central tendency

Principles of graphing

Univariate graphical techniques

Getting to know
a data-set
(how to approach data)

Play with the data: get to know it.

Image source: http://www.flickr.com/photos/analytik/1356366068/
Image author: analytic, http://www.flickr.com/photos/analytik/
Image license: CC-by-SA 2.0, http://creativecommons.org/licenses/by-sa/2.0/deed.en

Don't be afraid - you can't break data!

Image source: http://www.flickr.com/photos/rnddave/5094020069
Image author: David (rnddave), http://www.flickr.com/photos/rnddave/
Image license: CC-by-SA 2.0, http://creativecommons.org/licenses/by-sa/2.0/deed.en

Check & screen the data: keep signal, reduce noise

Image source: https://commons.wikimedia.org/wiki/File:Nasir-al_molk_-1.jpg
Image author: Ayyoubsabawiki, https://commons.wikimedia.org/w/index.php?title=User:Ayyoubsabawiki
Image license: CC-SA 4.0, Creative Commons Attribution-Share Alike 4.0 International, https://creativecommons.org/licenses/by-sa/4.0/deed.en

Data checking: One person reads the survey responses aloud to another person who checks the electronic data file.

For large studies, check a proportion of the surveys and declare the error-rate in the research report.

Image source: http://maxpixel.freegreatpicture.com/Business-Team-Two-People-Meeting-Computers-Office-1209640
Image author: freegreatpicture.com, http://maxpixel.freegreatpicture.com
Image license: CC0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/

Data screening: Carefully 'screening' a data file helps to remove errors and maximise validity.

For example, screen for:

Out of range values

Mis-entered data

Missing cases

Duplicate cases

Missing data
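The screening checks listed above can be sketched in a few lines of Python. This is only an illustration: the records, the variable names (`age`, `satisfaction`) and the valid ranges are hypothetical, not taken from the lecture data.

```python
# Hypothetical survey records: each row is one respondent.
# Assumed valid ranges: age 18-99, satisfaction 1-5.
records = [
    {"id": 1, "age": 23, "satisfaction": 4},
    {"id": 2, "age": 230, "satisfaction": 5},    # out-of-range age (likely mis-entered)
    {"id": 3, "age": 31, "satisfaction": None},  # missing data
    {"id": 3, "age": 31, "satisfaction": None},  # duplicate case
    {"id": 5, "age": 45, "satisfaction": 7},     # out-of-range rating
]

VALID_RANGES = {"age": (18, 99), "satisfaction": (1, 5)}


def screen(rows, valid_ranges):
    """Return out-of-range values, missing values, and duplicated ids."""
    out_of_range, missing, duplicates = [], [], []
    seen_ids = set()
    for row in rows:
        if row["id"] in seen_ids:
            duplicates.append(row["id"])
        seen_ids.add(row["id"])
        for var, (low, high) in valid_ranges.items():
            value = row[var]
            if value is None:
                missing.append((row["id"], var))
            elif not low <= value <= high:
                out_of_range.append((row["id"], var, value))
    return out_of_range, missing, duplicates


out_of_range, missing, duplicates = screen(records, VALID_RANGES)
print("Out of range:", out_of_range)
print("Missing:", missing)
print("Duplicate ids:", duplicates)
```

Flagged values still need a human decision (correct from the paper survey, recode as missing, or delete the case); the sketch only locates them.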

Image source: https://commons.wikimedia.org/wiki/File:Archaeology_dirt_screening.jpg
Image author: U.S. Air Force photo/Airman 1st Class Devante Williams
Image license: CC0 Public Domain, https://creativecommons.org/publicdomain/zero/1.0/

Explore the data

Image source: https://commons.wikimedia.org/wiki/File:Kazimierz_Nowak_in_jungle_2.jpg
Image author: http://www.poznajswiat.com.pl/art/1039
Image license: Public domain, https://commons.wikimedia.org/wiki/Commons:Licensing#Material_in_the_public_domain

Get intimate
with the data

Image source: http://www.flickr.com/photos/elmoalves/2932572231/
Image author: Elmo Alves, http://www.flickr.com/photos/elmoalves/
Image license: CC-A 2.0, http://creativecommons.org/licenses/by/2.0/deed.en

Describe the data's main features

Find a meaningful, accurate
way to depict the true story of the data

Image source: http://www.flickr.com/photos/lloydm/2429991235/
Image author: fakelvis, http://www.flickr.com/photos/lloydm/
Image license: CC-by-SA 2.0, http://creativecommons.org/licenses/by-sa/2.0/deed.en

Test hypotheses
to answer research questions

Image source: https://pixabay.com/en/light-bulb-current-light-glow-1042480/
Image author: ComFreak, https://pixabay.com/en/users/Comfreak-51581/
Image license: Public domain

Level of measurement & types of statistics

Image source: http://www.flickr.com/photos/peanutlen/2228077524/ (Smile My Day)
Image author: Terence Chang, http://www.flickr.com/photos/peanutlen/
Image license: CC-by-A 2.0

Golden rule of data analysis

A variable's level of measurement determines the type of statistics that can be used, including types of:

descriptive statistics

graphs

inferential statistics

Levels of measurement and
non-parametric vs. parametric

Categorical & ordinal DVs: non-parametric (does not assume a normal distribution)

Interval & ratio DVs: parametric (assumes a normal distribution), or non-parametric (if the distribution is non-normal)

DVs = dependent variables

Parametric statistics

Statistics which estimate parameters of a population, based on the normal distribution

Univariate:
mean, standard deviation, skewness, kurtosis, t-tests, ANOVAs

Bivariate:
correlation, linear regression

Multivariate:
multiple linear regression

More powerful
(more sensitive)

More assumptions
(population is normally distributed)

Vulnerable to violations of assumptions
(less robust)

Parametric statistics

Non-parametric statistics

Statistics which do not assume sampling from a population which is normally distributed

There are non-parametric alternatives for many parametric statistics

e.g., sign test, chi-squared, Mann-Whitney U test, Wilcoxon matched-pairs signed-ranks test.

Non-parametric statistics

Less powerful
(less sensitive)

Fewer assumptions
(do not assume a normal distribution)

Less vulnerable to assumption violation
(more robust)

Univariate descriptive statistics

Number of variables

Univariate = one variable
(e.g., mean, median, mode, histogram, bar chart)

Bivariate = two variables
(e.g., correlation, t-test, scatterplot, clustered bar chart)

Multivariate = more than two variables
(e.g., reliability analysis, factor analysis, multiple linear regression)

This lecture focuses on univariate descriptive statistics and graphs

What do we want to describe?

The distributional properties of variables, based on:

Central tendency(ies): e.g., frequencies, mode, median, mean

Shape: e.g., skewness, kurtosis

Spread (dispersion): min., max., range, IQR, percentiles, variance, standard deviation

By using univariate statistics and possibly also graphs, we want to give a meaningful snapshot summary which captures the main features of each variable's distribution.

Measures of central tendency

Statistics which represent the centre of a frequency distribution:

Mode (most frequent)

Median (50th percentile)

Mean (average)

Which ones to use depends on:

Type of data (level of measurement)

Shape of distribution (esp. skewness)

Reporting more than one may be appropriate.
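As a minimal sketch, the three measures can be computed with Python's standard `statistics` module. The scores below are hypothetical and include one outlier to show why reporting more than one measure can be informative.

```python
import statistics

# Hypothetical scores; 20 is an outlier.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 20]

mode = statistics.mode(scores)      # most frequent score
median = statistics.median(scores)  # 50th percentile
mean = statistics.mean(scores)      # arithmetic average

print(f"Mode = {mode}, Median = {median}, Mean = {mean:.2f}")
```

Here the mode and median agree, while the single outlier pulls the mean upwards.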

Measures of central tendency

Nominal: Mode / Freq. / %s

Ordinal: Mode / Freq. / %s; Median (if meaningful)

Interval: Mode / Freq. / %s; Median; Mean

Ratio: Mode / Freq. / %s (if meaningful); Median; Mean

Nominal: Mode, e.g., what's the favourite colour?

Ordinal: Median, e.g.,

See also http://www.quickmba.com/stats/centralten/

Overhead: p. 84, Bryman & Duncan (1997)

Measures of distribution

Measures of shape, spread, dispersion, and deviation from the central tendency

Non-parametric:

Min. and max.

Range

Percentiles

Parametric:

SD

Skewness

Kurtosis

Measures of spread / dispersion / deviation

Table: measures (Min / Max, Range; Percentile; Var / SD) by level of measurement (Nominal, Ordinal, Interval, Ratio), each marked as applicable (x) or applicable if meaningful.

Descriptives for nominal data

Nominal LOM = Labelled categories

Descriptive statistics:

Most frequent? (Mode, e.g., females)

Least frequent? (e.g., Males)

Frequencies (e.g., 20 females, 10 males)

Percentages (e.g. 67% females, 33% males)

Cumulative percentages

Ratios (e.g., twice as many females as males)

Nominal data consists of labels, e.g., 1 = no, 2 = yes.

Note that if you want to test whether one frequency is significantly higher than another, then use a binomial test or a contingency test (chi-square).

Also note that nominal variables can be used as IVs in tests of mean differences, but not in parametric tests of association such as correlations. To use them in parametric tests of association, nominal data can be dummy coded (i.e., converted into a series of dichotomous variables).
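The nominal descriptives above can be reproduced with a short Python sketch. It uses the slide's own example counts (20 females, 10 males); the variable names are illustrative only.

```python
from collections import Counter

# The slide's example: 20 females and 10 males.
responses = ["female"] * 20 + ["male"] * 10

counts = Counter(responses)
n = sum(counts.values())

# Frequency table: category, frequency, %, cumulative %.
table = []
cumulative = 0.0
for category, f in counts.most_common():
    pct = 100 * f / n
    cumulative += pct
    table.append((category, f, round(pct), round(cumulative)))

mode = counts.most_common(1)[0][0]          # most frequent category
ratio = counts["female"] / counts["male"]   # females per male

for row in table:
    print(row)
print("Mode:", mode, "| Ratio:", ratio)
```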

Descriptives for ordinal data

Ordinal LOM = Conveys order but not distance (e.g., ranks)

Descriptives approach is as for nominal (frequencies, mode etc.)

Plus percentiles (including median) may be useful

Note: Can use ordinal data as IVs in tests of mean differences, but not in parametric tests of association.

Descriptives for interval data

Interval LOM = order and distance, but no true 0 (0 is arbitrary).

Central tendency (mode, median, mean)

Shape/Spread (min., max., range, SD, skewness, kurtosis)

Interval data is discrete, but is often treated as ratio/continuous (especially for > 5 intervals)

Descriptives for ratio data

Ratio = Numbers convey order and distance, meaningful 0 point

As for interval, use median, mean, SD, skewness etc.

Can also use ratios (e.g., Category A is twice as large as Category B)

Mode (Mo)

Most common score - highest point in a frequency distribution - a real score - the most common response

Suitable for all levels of data, but may not be appropriate for ratio (continuous)

Not affected by outliers

Check frequencies and bar graph to see whether it is an accurate and useful statistic

e.g., for a distribution with 32%, 33%, and 34%, the mode would be misleading to report; instead it would be appropriate to report the similar % for each of the three categories.

Frequencies (f) and
percentages (%)

# of responses in each category

% of responses in each category

Frequency table

Visualise using a bar or pie chart


Crosstabs (contingency table) is the bivariate equivalent of frequencies

Median (Mdn)

Mid-point of distribution
(Quartile 2, 50th percentile)

Not badly affected by outliers

May not represent the central tendency in skewed data

If the Median is useful, then consider what other percentiles may also be worth reporting

Summary: Descriptive statistics

Level of measurement and normality determine whether data can be treated as parametric

Describe the central tendency:

Frequencies, Percentages

Mode, Median, Mean

Describe the variability:

Min., Max., Range, Quartiles

Standard Deviation, Variance

Properties of the
normal distribution

Image source: Bell Curve, http://www.flickr.com/photos/trevorblake/3200899889/
Image author: Trevor Blake, http://www.flickr.com/photos/trevorblake/
Image license: CC-by-SA 2.0, http://creativecommons.org/licenses/by-sa/2.0/deed.en

Four moments of a
normal distribution

Figure labels: Mean, SD, -ve Skew, +ve Skew, Kurtosis

Image source: Unknown

Karl Pearson, in his 1893 letter to Nature, suggested that the moments about the mean could be used to measure the deviations of empirical distributions from the normal distribution.

Moments around the mean: http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm

Four moments of a
normal distribution

Four mathematical qualities (parameters) can describe a continuous distribution which at least roughly follows a bell curve shape:

1st = mean (central tendency)

2nd = SD (dispersion)

3rd = skewness (lean / tail)

4th = kurtosis (peakedness / flatness)

Mean
(1st moment )

Average score

Mean: $\bar{X} = \frac{\sum X}{N}$

For normally distributed ratio or interval (if treating it as continuous) data.

Influenced by extreme scores (outliers)

Beware inappropriate averaging...

With your head in an oven
and your feet in ice
you would feel,
on average,
just fine.

The majority of people have more than the average number of legs
(M = 1.9999).

Image sources: Clipart

Standard deviation
(2nd moment)

SD = square root of the variance

$SD = \sqrt{\frac{\sum (X - \bar{X})^2}{N - 1}}$

For normally distributed interval or ratio data

Affected by outliers

Can also derive the Standard Error (SE) = SD / square root of N

Standard deviation is related to the scale of measurement, e.g., if SD = 1 when measured in cm, it would be 10 in mm and 0.01 in m. So, don't assume SD = 50 is big or SD = .1 is small; it all depends on what scale is used.

Be aware that it is the standard error (SD / square root of N), rather than the SD itself, which shrinks as N increases: larger samples give a more stable estimate of the SD, not a systematically smaller one.

N - 1 is used in the formula when generalising from a sample to a population; otherwise use N if it's the SD for the sample itself.
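A small sketch of the SD and SE calculations, using Python's standard `statistics` module. The heights are hypothetical; `stdev` divides by N - 1 and `pstdev` divides by N.

```python
import math
import statistics

# Hypothetical heights in cm.
heights_cm = [160, 165, 170, 175, 180]

sd_sample = statistics.stdev(heights_cm)      # N - 1 in the denominator
sd_population = statistics.pstdev(heights_cm) # N in the denominator
se = sd_sample / math.sqrt(len(heights_cm))   # standard error of the mean

# The same data expressed in mm: the SD scales with the unit of measurement.
heights_mm = [h * 10 for h in heights_cm]
sd_sample_mm = statistics.stdev(heights_mm)

print(f"SD (N-1) = {sd_sample:.2f} cm, SD (N) = {sd_population:.2f} cm")
print(f"SE = {se:.2f} cm, SD in mm = {sd_sample_mm:.2f}")
```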

Skewness
(3rd moment )

Lean of distribution

+ve = tail to right

-ve = tail to left

Can be caused by an outlier, or ceiling or floor effects

Can be accurate (e.g., cars owned per person would have a skewed distribution)

Skewness (3rd moment)
(with ceiling and floor effects)

Negative skew

Ceiling effect

Positive skew

Floor effect

Image source: http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm

Kurtosis
(4th moment )

Flatness or peakedness of distribution

+ve = peaked

-ve = flattened

By altering the X and/or Y axis, any distribution can be made to look more peaked or flat; add a normal curve to help judge kurtosis visually.

Kurtosis
(4th moment )

Image source: https://classconnection.s3.amazonaws.com/65/flashcards/2185065/jpg/kurtosis-142C1127AF2178FB244.jpg

The kurtosis reflects the extent to which the density of the empirical distribution differs from the probability densities of the normal curve.

Mesokurtic = 0

http://www.visualstatistics.net/Visual%20Statistics%20Multimedia/normalization.htm

Judging severity of
skewness & kurtosis

View histogram with normal curve

Deal with outliers

Rule of thumb:
Skewness and kurtosis between -1 and +1 are generally considered sufficiently normal for meeting the assumptions of parametric inferential statistics

Significance tests of skewness: Tend to be overly sensitive
(therefore avoid using)

The significance tests for skewness / kurtosis are not often used, at least in part because they depend on sample size: with a small sample they are less likely to be significant than with a large sample.
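The rule of thumb can be illustrated with a sketch that computes skewness and excess kurtosis directly from the moments about the mean. Assumptions: this uses the simple (population) moment formulas, so values from packages such as SPSS, which apply small-sample adjustments, will differ slightly; both data sets are hypothetical.

```python
def shape(scores):
    """Return (skewness, excess kurtosis) from moments about the mean."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n  # 2nd moment (variance)
    m3 = sum((x - mean) ** 3 for x in scores) / n  # 3rd moment
    m4 = sum((x - mean) ** 4 for x in scores) / n  # 4th moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal (mesokurtic) distribution
    return skewness, kurtosis


def roughly_normal(scores):
    """Apply the -1 to +1 rule of thumb to both statistics."""
    return all(-1 < value < 1 for value in shape(scores))


symmetric = [1, 2, 2, 3, 3, 3, 4, 4, 5]   # hypothetical, symmetrical
skewed = [1, 1, 1, 2, 2, 3, 4, 8, 15]     # hypothetical, tail to the right

print(shape(symmetric), roughly_normal(symmetric))
print(shape(skewed), roughly_normal(skewed))
```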

Areas under the normal curve

If distribution is normal
(bell-shaped - or close):

~68% of scores within +/- 1 SD of M

~95% of scores within +/- 2 SD of M

~99.7% of scores within +/- 3 SD of M
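These areas can be checked against the standard normal distribution (M = 0, SD = 1) with `statistics.NormalDist` from Python's standard library; a brief sketch:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: M = 0, SD = 1

# Proportion of scores within +/- k SD of the mean.
areas = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, area in areas.items():
    print(f"Within +/- {k} SD: {100 * area:.1f}%")
```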

Areas under the normal curve

Image source: https://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG
Image author: Dan Kernler, https://commons.wikimedia.org/w/index.php?title=User:Mathprofdk
Image license: Creative Commons Attribution-Share Alike 4.0 International license, https://creativecommons.org/licenses/by-sa/4.0/deed.en

Non-normal distributions

Types of non-normal distribution

Modality:

Uni-modal (one peak)

Bi-modal (two peaks)

Multi-modal (more than two peaks)

Skewness:

Positive (tail to right)

Negative (tail to left)

Kurtosis:

Platykurtic (Flat)

Leptokurtic (Peaked)

Non-normal distributions

Image source: Unknown.

The significance tests for skewness / kurtosis are subject to sample size, so with a small sample they are less likely to be significant than with a large sample.

Histogram of people's weight

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.

Roughly normal, with positive skew

Histogram of daily calorie intake

N = 75

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.

Bimodal

Histogram of fertility

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.

Bimodal, with positive skew

Example normal distribution 1

At what age do you think you will die?

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.

There is an outlier near zero which is minimising the positive skew; the data is also leptokurtic.

Example normal distribution 2

This bimodal graph actually consists of two different, underlying normal distributions.

N = 185

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.

This distribution is bi-modal. It should not be treated as normal. In fact, if one looks more closely, it would make sense to break down the distribution by gender.

From the Quick Fun Survey data in Tutorial 1.

Example normal distribution 2

Distribution for females

Distribution for males

Image source: James Neill, 2007, Creative Commons Attribution 2.5 Australia.

This is the distribution for males; it has a ceiling effect, with "Very feminine" not being selected at all (it is not shown on the graph, but should be). It is negatively skewed and probably leptokurtic. Note though that because "Very feminine" has no cases and is not shown, the population data would probably be even more skewed than this sample indicates.

Non-normal distribution:
Use non-parametric descriptive statistics

Min. & Max.

Range = Max. - Min.

Percentiles

Quartiles:

Q1

Mdn (Q2)

Q3

IQR (Q3-Q1)
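A minimal sketch of these non-parametric descriptives, using hypothetical scores. Note that `statistics.quantiles` uses one particular interpolation method by default (its "exclusive" method); other software may return slightly different quartiles for the same data.

```python
import statistics

# Hypothetical scores.
scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

minimum, maximum = min(scores), max(scores)
score_range = maximum - minimum

# n=4 splits the data into quartiles and returns the three cut points.
q1, mdn, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(f"Min = {minimum}, Max = {maximum}, Range = {score_range}")
print(f"Q1 = {q1}, Mdn (Q2) = {mdn}, Q3 = {q3}, IQR = {iqr}")
```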

Add slide showing boxplot from p.84 Bryman & Duncan (1997)

Effects of skew on measures of central tendency

+vely skewed distributions:
mode < median < mean

symmetrical (normal) distributions:
mean = median = mode

-vely skewed distributions:
mean < median < mode

The more skewed a distribution is, the more important it is to use the median as a measure of central tendency.
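The ordering of mode, median and mean under skew can be seen in a short sketch. Both data sets are hypothetical: one with a floor effect (tail to the right) and one with a ceiling effect (tail to the left).

```python
import statistics


def central_tendency(scores):
    """Return (mode, median, mean) for a list of scores."""
    return (
        statistics.mode(scores),
        statistics.median(scores),
        statistics.mean(scores),
    )


# Hypothetical +vely skewed data (floor effect): most values low, a long right tail.
positively_skewed = [0, 1, 1, 1, 1, 2, 2, 2, 3, 7]

# Hypothetical -vely skewed data (ceiling effect): most values high, a long left tail.
negatively_skewed = [3, 7, 8, 8, 8, 9, 9, 9, 9, 10]

print(central_tendency(positively_skewed))  # mode < median < mean
print(central_tendency(negatively_skewed))  # mean < median < mode
```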

Effects of skew on measures of central tendency

Image source: Unknown

Transformations

Converts data using various formulae to achieve normality and allow more powerful tests

Loses original metric

Complicates interpretation
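As an illustration of the trade-off, the sketch below applies a natural log transformation (one common choice; the slides do not name a specific formula) to hypothetical, positively skewed scores. The skewness helper uses the simple moment-based formula.

```python
import math


def skewness(scores):
    """Moment-based skewness (3rd moment / 2nd moment ** 1.5)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5


# Hypothetical positively skewed scores (each value doubles the previous one).
raw = [1, 2, 4, 8, 16, 32, 64]
logged = [math.log(x) for x in raw]  # log transformation

print(f"Skewness before: {skewness(raw):.2f}")
print(f"Skewness after:  {skewness(logged):.2f}")

# The transformed mean is in log units, not the original metric;
# back-transforming it gives the geometric mean, not the arithmetic mean.
mean_log = sum(logged) / len(logged)
back_transformed = math.exp(mean_log)
print(f"Mean of logs = {mean_log:.2f}, back-transformed = {back_transformed:.2f}")
```

The transformed scores are symmetrical, but their mean is no longer in the original units, which is what complicates interpretation.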

If a survey question produces a floor effect, where will the mean, median and mode lie in relation to one another?

Review questions

Would the mean # of cars owned in Australia exceed the median?

Review questions

Would the mean score on an easy test exceed the median performance?

Review questions

Graphical
techniques

Image source: http://www.flickr.com/photos/pagedooley/2121472112/
Image author: Kevin Dooley, http://www.flickr.com/photos/pagedooley/
Image license: CC-by-A 2.0, http://creativecommons.org/licenses/by/2.0/deed.en

Visualisation

Visualization is any technique
for creating images, diagrams, or animations to communicate a message. - Wikipedia

Image source: http://en.wikipedia.org/wiki/File:FAE_visualization.jpg
Image license: Public domain

Science is beautiful
(Nature Video)


(Youtube 5:30 mins)

http://www.youtube.com/watch?v=dvM4JPGsmVw

Image source: http://commons.wikimedia.org/wiki/File:Parodyfilm.png
Image author: FRacco, http://commons.wikimedia.org/wiki/User:FRacco
Image license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en

Is Pivot a turning point for web exploration?
(Gary Flake)


(TED talk - 6 min.)

Image source: http://commons.wikimedia.org/wiki/File:Parodyfilm.png
Image author: FRacco, http://commons.wikimedia.org/wiki/User:FRacco
Image license: Creative Commons Attribution 3.0 unported, http://creativecommons.org/licenses/by-sa/3.0/deed.en

Principles of graphing

Clear purpose

Maximise clarity

Minimise clutter

Allow visual comparison

Graphs
(Edward Tufte)

Visualise data

Reveal data:

Describe

Explore

Tabulate

Decorate

Communicate complex ideas with clarity, precision, and efficiency

Graphing steps

Identify purpose of the graph (make large amounts of data coherent; present many #s in small space; encourage the eye to make comparisons)

Select type of graph to use

Draw and modify graph to be clear, non-distorting, and well-labelled (maximise clarity, minimise clutter; show the data; avoid distortion; reveal data at several levels/layers)

Software for
data visualisation (graphing)

Statistical packages

e.g., SPSS Graphs or via Analyses

Spreadsheet packages

e.g., MS Excel

Word-processors

e.g., MS Word: Insert > Object > Microsoft Graph Chart

Cleveland's hierarchy

Image source: http://www.processtrends.com/TOC_data_visualization.htm
Cleveland, William S., Elements of Graphing Data, 1985
Image license: Unknown

Cleveland (1984) conducted experiments to measure people's accuracy in interpreting graphs, with findings as follows, from most to least accurately judged (Robbins):

Position along a common scale

Position along non-aligned scales

Length

Angle - slope

Area

Volume

Color hue - saturation - density

Cleveland's hierarchy

Image source: https://priceonomics.com/how-william-cleveland-turned-data-visualization/
Cleveland, William S., Elements of Graphing Data, 1985
Image license: Unknown

Univariate graphs

Bar graph

Pie chart

Histogram

Stem & leaf plot

Data plot /
Error bar

Box plot

Non-parametric
i.e., nominal or ordinal


Parametric
i.e., normally distributed interval or ratio

Non-normal parametric data can be recoded and treated as nominal or ordinal data.

Bar chart (Bar graph)

Allows comparison of heights of bars

X-axis: Collapse if too many categories

Y-axis: Count/Frequency or % - truncation exaggerates differences

Can add data labels (data values for each bar)

Note
truncated
Y-axis

Pie chart

Use a bar chart instead

Hard to read

Difficult to show:

Small values

Small differences

Rotation of chart and
position of slices
influences perception

Image source: Unknown

Pie chart: Use bar chart instead

Image source: https://priceonomics.com/how-william-cleveland-turned-data-visualization/

Histogram

For continuous data (Likert?, Ratio)

X-axis needs a happy medium for # of categories

Y-axis matters (can exaggerate)

Image source: Author

Histogram of male & female heights

Wild & Seber (2000)

Image source: Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.

DV = height (ratio)

IV = Gender (categorical)

Stem & leaf plot

Use for ordinal, interval and ratio data (if rounded)

May look confusing to unfamiliar reader

Image source: Unknown.

A bit of a plug and plea for stem & leaf plots: they are underused. They are powerful because they are:

Efficient: e.g., they contain all the data succinctly, so others could use the data in a stem & leaf plot to do further analysis.

Visual and mathematical: As well as containing all the data, the stem & leaf plot presents a powerful, recognizable visual of the data, akin to a bar graph. Turning a stem & leaf plot 90 degrees counter-clockwise is recommended: this makes the visual display more conventional and easy to recognise, and the numbers are less obvious, hence emphasizing the visual histogram shape.

Contains actual data

Collapses tails

Underused alternative to histogram

Stem & leaf plot

 Frequency    Stem &  Leaf

      7.00        1 .  &
    192.00        1 .  22223333333
    541.00        1 .  444444444444444455555555555555
    610.00        1 .  6666666666666677777777777777777777
    849.00        1 .  88888888888888888888888888899999999999999999999
    614.00        2 .  0000000000000000111111111111111111
    602.00        2 .  222222222222222233333333333333333
    447.00        2 .  4444444444444455555555555
    291.00        2 .  66666666677777777
    240.00        2 .  88888889999999
    167.00        3 .  000001111
    146.00        3 .  22223333
    153.00        3 .  44445555
    118.00        3 .  666777
     99.00        3 .  888999
    106.00        4 .  000111
     54.00        4 .  222
    339.00 Extremes    (>=43)

Image source: Unknown
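A stem & leaf plot is simple to build by hand or in code. The sketch below is a simplified version (one row per stem of ten, every case shown as a leaf), unlike the statistical-package output above where each leaf stands for many cases; the ages are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem & leaf rows: tens digit as stem, units digit as leaf."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        stems[stem].append(str(leaf))
    return [f"{stem} | {''.join(leaves)}" for stem, leaves in sorted(stems.items())]


# Hypothetical ages.
ages = [18, 19, 19, 21, 22, 22, 23, 25, 30, 31, 34, 42]

rows = stem_and_leaf(ages)
for row in rows:
    print(row)
```

Because every leaf is a real score, the original data can be read back from the plot.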

Box plot
(Box & whisker)

Useful for interval and ratio data

Represents min., max, median, quartiles, & outliers

Image source: Unknown

Alternative to histogram

Useful for screening

Useful for comparing variables

Can get messy - too much info

Confusing to unfamiliar reader
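The values a box plot displays can be computed as follows. Assumptions: outliers are defined with the common 1.5 x IQR convention (the slides do not state which rule is used), quartiles come from the default method of `statistics.quantiles`, and the scores are hypothetical.

```python
import statistics


def box_plot_summary(values):
    """Return the values a box plot displays, flagging outliers beyond 1.5 x IQR."""
    data = sorted(values)
    q1, median, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return {
        "whisker_low": inside[0],    # smallest non-outlying score
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_high": inside[-1],  # largest non-outlying score
        "outliers": outliers,
    }


# Hypothetical scores with one extreme value.
scores = [2, 4, 4, 5, 5, 6, 6, 7, 7, 8, 30]

summary = box_plot_summary(scores)
print(summary)
```

This is why box plots are useful for screening: the extreme score is flagged immediately.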

Box plot (Box & whisker)

Image source: Author

Data plot & error bar

Data plot

Error bar

Image source: Unknown.

This is a univariate precursor to a scatterplot (a plot of a ratio by ratio variable). It works if there is a small amount of data; otherwise use a histogram to indicate the frequency within equal interval ranges.

From: http://www.physics.csbsju.edu/stats/display.distribution.html

Alternative to histogram

Implies continuity e.g., time

Can show multiple lines

Line graph

Image source: Author

Graphical integrity

(part of academic integrity)

Image source: Unknown.

Graphical integrity

"Like good writing, good graphical displays of data communicate ideas with clarity, precision, and efficiency. Like poor writing, bad graphical displays distort or obscure the data, make it harder to understand or compare, or otherwise thwart the communicative effect which the graph should convey."

Michael Friendly Gallery of Data Visualisation

Tufte, Edward R., The Visual Display of Quantitative Information, 1983

Tufte's graphical integrity

Some lapses intentional, some not

$\text{Lie Factor} = \frac{\text{size of effect in graph}}{\text{size of effect in data}}$

Misleading uses of area

Misleading uses of perspective

Leaving out important context

Lack of taste and aesthetics

Tufte, Edward R., The Visual Display of Quantitative Information, 1983
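A worked sketch of the Lie Factor, assuming "size of effect" is measured as the relative change between two values. The bar heights and data values are hypothetical.

```python
def lie_factor(graph_first, graph_second, data_first, data_second):
    """Lie Factor = size of effect shown in graph / size of effect in data.

    Each effect is taken as the relative change from the first to the second value.
    """
    effect_in_graph = (graph_second - graph_first) / graph_first
    effect_in_data = (data_second - data_first) / data_first
    return effect_in_graph / effect_in_data


# Hypothetical example: the data rise from 100 to 150 (a 50% increase),
# but the bars are drawn 2 cm and 6 cm tall (a 200% increase).
factor = lie_factor(graph_first=2, graph_second=6, data_first=100, data_second=150)
print(f"Lie Factor = {factor}")
```

A Lie Factor of 1 means the graph represents the data faithfully; values well above or below 1 indicate exaggeration or understatement, which is what a truncated Y-axis can produce.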

Review exercise:
Fill in the cells in this table

Level                   Properties   Examples   Descriptive Statistics   Graphs
Nominal / Categorical
Ordinal / Rank
Interval
Ratio

Answers: http://goo.gl/Ln9e1

References

Chambers, J., Cleveland, B., Kleiner, B., & Tukey, P. (1983). Graphical methods for data analysis. Boston, MA: Duxbury Press.

Cleveland, W. S. (1985). The elements of graphing data. Monterey, CA: Wadsworth.

Jones, G. E. (2006). How to lie with charts. Santa Monica, CA: LaPuerta.

Tufte, E. R. (1983). The visual display of quantitative information. Cheshire, CT: Graphics Press.

Tufte, E. R. (2001). Visualizing quantitative data. Cheshire, CT: Graphics Press.

Tukey, J. (1977). Exploratory data analysis. Addison-Wesley.

Wild, C. J., & Seber, G. A. F. (2000). Chance encounters: A first course in data analysis and inference. New York: Wiley.

Open Office Impress

This presentation was made using Open Office Impress.

Free and open source software.

http://www.openoffice.org/product/impress.html
