Basics of analytical graph theory


Post on 12-Dec-2021


Analytical Graphing

Let's start with the best graph ever made

“Probably the best statistical graphic ever drawn, this map by Charles Joseph Minard portrays the losses suffered by Napoleon's army in the Russian campaign of 1812. Beginning at the Polish-Russian border, the thick band shows the size of the army at each position. The path of Napoleon's retreat from Moscow in the bitterly cold winter is depicted by the dark lower band, which is tied to temperature and time scales.”

“The graph illustrates an amazing point — how an army of 400,000 can dwindle to 10,000 without

losing a single major battle.”


When is a graph appropriate?

• Always for data exploration

• Often for data analysis and to develop predictions (models) and experimental designs

• Sometimes for presentations

• Less often for publications

Data Exploration

• Is not snooping in the pejorative sense. Exploration is a necessary and desired operation for:

– Checking data for unusual values

– Making sure the data meet the assumptions of the chosen form of analysis

• E.g. normality, homogeneity of variances, linearity (in regression approaches)

– Deciding (sometimes) what sort of analysis to do. Ideally this will have been done before starting the study

– Looking for patterns that may not be expected or apparent. This is indeed snooping, but it is an essential part of hypothesis formation


Data Exploration

• Checking data for unusual values

• Making sure the data meet the assumptions of the chosen form of analysis

See ourworld (pop_86)

[Figure: two histograms of Population of countries (1986), one on a linear x-axis (0 to 150) and one on a log x-axis (1 to 100); y-axes: Count and Proportion per Bar]

Determining distributions and outliers

Will a transformation help?
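One way to answer that question numerically is to compare the skewness and a rough histogram before and after a log transformation. The sketch below is illustrative only: the data are simulated right-skewed values standing in for pop_86 (the real ourworld values are not reproduced here), and `skewness` and `text_histogram` are hypothetical helper names.

```python
import math
import random
import statistics

def skewness(values):
    """Sample skewness: the mean cubed z-score (near 0 for a symmetric sample)."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    return statistics.fmean([((v - mean) / sd) ** 3 for v in values])

def text_histogram(values, n_bins=8, width=40):
    """Return one text line per equal-width bin, with a bar scaled to the count."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        # The maximum value would fall one bin past the end, so clamp it.
        index = min(int((v - lo) / step), n_bins - 1)
        counts[index] += 1
    tallest = max(counts)
    return [
        f"{lo + i * step:10.2f} | {'#' * round(width * c / tallest)} {c}"
        for i, c in enumerate(counts)
    ]

# ASSUMPTION: simulated stand-in for pop_86, 150 right-skewed "populations".
rng = random.Random(1986)
population = [rng.lognormvariate(2.0, 1.5) for _ in range(150)]
log_population = [math.log10(p) for p in population]

for label, data in (("raw", population), ("log10", log_population)):
    print(f"{label}: skewness = {skewness(data):.2f}")
    print("\n".join(text_histogram(data)))
```

If the raw histogram piles up in the first bin with a long right tail and the log10 version looks roughly symmetric, a transformation is probably worth trying.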


Data Exploration

• Is not snooping in the pejorative sense. Exploration is a necessary and desired operation for:

– Checking data for unusual values

– Making sure the data meet the assumptions of the chosen form of analysis

• E.g. normality, homogeneity of variances, linearity (in regression approaches)

[Figure: scatterplot of DEATH_82 (y, 0 to 30) against BIRTH_82 (x, 0 to 50)]

The relationship between birth and death rates (ourworld)

Is it linear, or is there perhaps a more appropriate model?


[Figure: the same scatterplot of DEATH_82 against BIRTH_82, with a LOESS smoother added]

Clearly not linear. The curve is fitted using the LOESS procedure (locally weighted scatterplot smoothing): a non-parametric regression method that combines multiple regression models in a k-nearest-neighbor-based meta-model.
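To make the LOESS idea concrete, here is a minimal sketch of one common formulation: at each x, fit a straight line to the k nearest points, weighted by the tricube function, and keep only the fitted value at that x. The data are simulated (not the ourworld values), `loess_at` is a hypothetical helper name, and statistical packages may use different defaults (span, polynomial degree, robustness iterations).

```python
import random

def loess_at(x0, xs, ys, k):
    """Fit a tricube-weighted straight line to the k points nearest x0
    and return the fitted value at x0."""
    nearest = sorted(zip(xs, ys), key=lambda point: abs(point[0] - x0))[:k]
    max_dist = abs(nearest[-1][0] - x0) or 1.0
    # Tricube weights: close to 1 near x0, falling to 0 at the k-th neighbour.
    weights = [(1 - (abs(x - x0) / max_dist) ** 3) ** 3 for x, _ in nearest]
    total = sum(weights)
    mean_x = sum(w * x for w, (x, _) in zip(weights, nearest)) / total
    mean_y = sum(w * y for w, (_, y) in zip(weights, nearest)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, (x, _) in zip(weights, nearest))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, (x, y) in zip(weights, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return mean_y + slope * (x0 - mean_x)

# ASSUMPTION: simulated curved data, a U-shaped trend plus noise.
rng = random.Random(82)
birth = [10 + 40 * i / 59 for i in range(60)]
death = [8 + 0.02 * (b - 25) ** 2 + rng.gauss(0, 1) for b in birth]

smooth = [loess_at(b, birth, death, k=20) for b in birth]
for b, s in list(zip(birth, smooth))[::10]:
    print(f"birth {b:5.1f}  smoothed death {s:5.1f}")
```

Because each fit is local, the smoothed curve can bend where a single straight line cannot, which is what makes the non-linearity visible.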

When is a graph appropriate?

• Often for data analysis, e.g.:

– To understand the nature of interaction terms (more later)

– To understand the power of a test. Say we wanted to determine the sample size for an experiment where we thought the response would be around 10 (alternative hypothesis = 0), the standard deviation about 8, and we were willing to relax alpha (from 0.05 to 0.20)


Pop. Mean = 10, Alternative = 0, SD = 8, Alpha = 0.05 and 0.20

[Figure: two power curves plotting Power (y) against Sample Size (per cell) (x): "Power Curve (Alpha = 0.050)", x from 0 to 35, and "Power Curve (Alpha = 0.200)", x from 0 to 20]

For example, the effect of relaxing alpha on power
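The same comparison can be roughed out by hand. The sketch below assumes a two-sided one-sample z-test with a known SD, which is a simplification: the curves on the slide likely come from a t-based, per-cell calculation, so the numbers will not match exactly. `z_test_power` and `smallest_n` are hypothetical helper names.

```python
from statistics import NormalDist

def z_test_power(effect, sd, n, alpha):
    """Power of a two-sided one-sample z-test when the true mean differs
    from the tested value by `effect` (normal approximation, known SD)."""
    z = NormalDist()
    critical = z.inv_cdf(1 - alpha / 2)
    shift = effect * n ** 0.5 / sd
    return z.cdf(shift - critical) + z.cdf(-shift - critical)

def smallest_n(effect, sd, alpha, target_power=0.8):
    """Smallest sample size whose approximate power reaches the target."""
    n = 2
    while z_test_power(effect, sd, n, alpha) < target_power:
        n += 1
    return n

# Values from the slide: response about 10 versus 0, SD about 8.
for alpha in (0.05, 0.20):
    print(f"alpha = {alpha:.2f}")
    for n in (2, 4, 6, 8, 10):
        print(f"  n = {n:2d}  power = {z_test_power(10, 8, n, alpha):.3f}")
    print(f"  smallest n for 80% power: {smallest_n(10, 8, alpha)}")
```

Relaxing alpha lowers the critical value, so the same sample size gives more power, or the same power is reached with fewer samples.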

When is a graph appropriate?

• Sometimes for presentations

– Idea is to communicate information quickly

– Be sure you know why you are presenting the graph: is it to convey stats or some other information? (We will talk about this more later.)

– Graphs should be simple and not contain too much information; never have a graph that is not interpretable

• So many factors involved that no one could figure it out, or worse


“I know you can’t really see this but …”

[Figure: scatterplot matrix of all ourworld variables against each other: POP_1983, POP_1986, POP_1990, POP_2020, BIRTH_82, BIRTH_RT, DEATH_82, DEATH_RT, BABYMT82, BABYMORT, LIFE_EXP, GNP_82, GNP_86, GDP_CAP, LOG_GDP, EDUC_84, EDUC, HEALTH84, HEALTH]

These are usually presented to demonstrate how much work the researcher has done; what they really convey is that he or she has not adequately prepared the presentation.

When is a graph appropriate?

• Less often for publications

– Idea is to communicate information that is too complex to leave in tables or text

– They typically depict rather than present information (you have to read across to axes to get numbers). Hence, if precise bits of information are important to the argument being made, use tables.

– If a graph is presented it must be important to the argument being made in the text (no fluff graphs)

– Information cannot be presented twice (e.g. table and figure, text and figure)

– If a graph is presented it must be interpretable

• You should be able to understand the purpose and content of the figure directly from the legend.


Basics of analytical graph theory

• Graph types imply a basis of logic and are not always interchangeable

• Even interchangeable graph types are not always equivalent (some are just non-informative)

• Be very clear about what you are trying to convey: models, stats or data structure

• Graph construction (axes, scales etc) may obscure or make clear the points you are trying to make

• Graph trickery is usually just that – and typically subtracts from the depiction

Graph types imply a basis of logic and are not always interchangeable

• Summary Charts

• Density Charts

• Scatterplots, quantile plots and probability plots


Summary Charts

There is a series of general graphical displays useful for characterizing the relationship between independent variables (usually categorical) and summary statistics of dependent variables (usually continuous).

An example would be a bar graph of the relationship between education and income (see survey2 data). Several types of summary chart are compared in the figure that follows the table.

Examples of continuous and categorical variables

Categorical                            Continuous
Gender (male, female)                  Hormone level
Nationality (French, Italian)          Location (Latitude, Longitude)
Species (Human, Chimp)                 Phylogenetic distance
Color (red, green, blue)               Color (wavelength)
Age Group (Young, Old)                 Age (years, days)
Height Group (Short, Medium, Tall)     Height (cm, inches)
Weight Group (Thin, Obese)             Weight (grams, pounds)
Speed (Fast, Slow)                     Speed (cm/sec)
Temperature (Cold, Warm)               Temperature (degrees C)


[Figure: six summary charts of INCOME (0 to 70) by EDUC (no grad hs, hs grad, some college, college grad): Bar, Dot, Line, Profile, Pyramid and Pie]

[Figure: further charts of INCOME by EDUC with SEX (Male, Female) as a second grouping variable]

Which conveys the information most clearly? How about the comparisons of interest?
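Whatever chart type is chosen, a summary chart starts from the same step: reduce the continuous variable to one summary statistic per category. A minimal sketch follows, using made-up (education, income) records in place of the survey2 data; `group_means` is a hypothetical helper name.

```python
from statistics import fmean

# ASSUMPTION: invented records standing in for the survey2 data.
records = [
    ("no grad hs", 12), ("no grad hs", 18), ("no grad hs", 15),
    ("hs grad", 22), ("hs grad", 28), ("hs grad", 25),
    ("some college", 30), ("some college", 34), ("some college", 38),
    ("college grad", 45), ("college grad", 55), ("college grad", 50),
]

def group_means(rows):
    """Mean of the continuous variable within each category."""
    groups = {}
    for category, value in rows:
        groups.setdefault(category, []).append(value)
    return {category: fmean(values) for category, values in groups.items()}

means = group_means(records)

# A crude text bar chart: one bar per category, length equal to the mean.
for category, mean in means.items():
    print(f"{category:>13} | {'#' * round(mean)} {mean:.1f}")
```

The bar, dot, line, profile, pyramid and pie versions all draw these same four numbers; they differ only in which comparisons they make easy.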


Density Charts

• The density of a sample is the relative concentration of data points in intervals across the range of the distribution. A histogram is one way to display the density of a quantitative variable; box plots, dot or symmetric dot density, frequency polygons, fuzzygrams, jitter plots, density stripes, and histograms with data-driven bar widths are others.

[Figure: histogram of Length (mm)]


Features of a BOXPLOT

[Figure: annotated box plot of Y, labelling the median, the two hinges, the four segments that each hold 25% of the data, the outliers, the statistical range, the mean, the confidence interval and the smallest 50%]

Rather than comparing sample values to the normal distribution (mean, standard deviation, and so on), box plots show robust (what does this mean?) statistics (median, quartiles, and so on).
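A short sketch of what "robust" means here: the quantities a box plot draws barely move when one extreme value is added, whereas the mean does. It assumes Tukey's conventions (hinges as medians of each half, outliers beyond 1.5 hinge-spreads), which may differ slightly from a given package's defaults; the lengths are invented and `box_plot_summary` is a hypothetical helper name.

```python
from statistics import fmean, median

def box_plot_summary(values):
    """Median, hinges, whisker ends and outliers, using Tukey's conventions."""
    data = sorted(values)
    n = len(data)
    half = (n + 1) // 2  # each half includes the middle value when n is odd
    lower_hinge = median(data[:half])
    upper_hinge = median(data[n - half:])
    spread = upper_hinge - lower_hinge
    low_fence = lower_hinge - 1.5 * spread
    high_fence = upper_hinge + 1.5 * spread
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median(data),
        "hinges": (lower_hinge, upper_hinge),
        "whiskers": (inside[0], inside[-1]),
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }

# ASSUMPTION: invented lengths (mm); the last value is a deliberate outlier.
lengths = [12, 14, 15, 15, 16, 17, 18, 19, 21, 48]
summary = box_plot_summary(lengths)
print(summary)

# Robustness check: drop the outlier and compare how far each statistic moves.
trimmed = lengths[:-1]
print("mean:  ", fmean(lengths), "->", round(fmean(trimmed), 2))
print("median:", median(lengths), "->", median(trimmed))
```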

Raw Data Plots, e.g. Scatterplots

• Scatterplots are probably the most common form of graphical display. The key feature of scatterplots is that raw data are plotted (in contrast to summary data as in summary charts). Regression lines with confidence bands or smoothers (e.g. linear, non-linear) can be added to help explain relationships among variables. An example is the relationship between mussel height and length, and between mussel height and mussel mass.

• How to estimate length and mass of mussels?


[Figure: scatterplot of mussel Height against Length with non-linear and linear smoothers; each point is a mussel]


Scatterplots, quantile plots and probability plots

• Quantile plots and probability plots are useful for studying the distribution of a variable.

• Quantile plots are also called Q plots. Unlike probability plots, which compare a sample to a theoretical probability distribution, a quantile plot compares a sample to its own quantiles (a one-sample plot) or to another sample (a two-sample, or Q-Q, plot). The quantile of a sample is the data point corresponding to a given fraction of the data.

See ourworld (pop_1986)

Features of a Quantile Plot

[Figure: quantile plot of POP_1986 (x, 0 to 150) against Fraction of Data (y, 0.0 to 1.0). Annotations: distribution of data; distribution of quantiles (should be uniform, but is subject to sample size); 86% of countries had populations less than or equal to 50 million people]
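The points on a one-sample quantile plot are easy to compute: sort the data and pair each value with the fraction of the data it represents. The sketch below assumes the plotting position (i - 0.5)/n, one of several conventions in use; the populations are invented and the function names are hypothetical.

```python
def quantile_plot_points(values):
    """Pair each sorted value with its fraction of the data, (i - 0.5) / n
    for the i-th smallest value (i counted from 1)."""
    data = sorted(values)
    n = len(data)
    return [(value, (i + 0.5) / n) for i, value in enumerate(data)]

def fraction_at_or_below(values, threshold):
    """Fraction of the sample less than or equal to the threshold."""
    return sum(1 for v in values if v <= threshold) / len(values)

# ASSUMPTION: ten invented populations (millions), not the ourworld data.
populations = [1, 2, 3, 5, 8, 12, 20, 35, 60, 150]

for value, fraction in quantile_plot_points(populations):
    print(f"{value:5d}  {fraction:.2f}")

share = fraction_at_or_below(populations, 50)
print(f"{share:.0%} of these populations are at or below 50 million")
```

Reading a statement such as the 86% annotation off the plot is the same operation as `fraction_at_or_below`: find 50 on the x-axis and read the fraction on the y-axis.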


Scatterplots, quantile plots and probability plots

• A probability plot plots the values of a variable against the corresponding percentage points of a theoretical distribution: normal, chi-square, t, F, uniform, binomial, logistic, exponential, gamma, Weibull, or Studentized range. Graphs like this are called probability plots, or P plots. You can also plot the expected values of one variable against those of another (P-P plot). These graphs are very important for determining if data are in need of transformation.

See ourworld (pop_1986)

Features of a Probability Plot

[Figure: two probability plots of pop_1986, one with no transformation and one with a log (base 10) transformation]
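A normal probability plot is straight when the data are close to normal, so the correlation between the sorted data and the expected normal quantiles gives a rough one-number summary of straightness. The sketch below uses simulated right-skewed data in place of pop_1986 and the plotting position (i - 0.5)/n; `normal_scores`, `pearson` and `straightness` are hypothetical helper names, and the correlation is a convenience, not a formal test.

```python
import math
import random
from statistics import NormalDist, fmean

def normal_scores(n):
    """Expected standard-normal quantiles for n sorted observations."""
    z = NormalDist()
    return [z.inv_cdf((i + 0.5) / n) for i in range(n)]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def straightness(values):
    """Correlation between the sorted data and their normal scores;
    values near 1 mean the probability plot is close to a straight line."""
    data = sorted(values)
    return pearson(normal_scores(len(data)), data)

# ASSUMPTION: simulated right-skewed sample, not the ourworld values.
rng = random.Random(1986)
population = [rng.lognormvariate(2.0, 1.5) for _ in range(150)]
log_population = [math.log10(p) for p in population]

print(f"no transformation:    r = {straightness(population):.3f}")
print(f"log10 transformation: r = {straightness(log_population):.3f}")
```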


Extra slides


The underlying basis of the graph

• There are two general bases for any data graph that will be presented or published.

– To display data (hopefully in the most efficient way)

– To convey information about statistics associated with the data

– Both of the above

Although these may not appear to conflict, they often do. Here is an example.

Error Bars

• Be very careful: error bars convey meaning, of at least two sorts

– Estimate of variability for subjects in that category, irrespective of strata or statistical assumptions

• Of use for showing spread in sampled data

• Of no use for conveying inferential statistics

– Estimate of variability for subjects in that category, with respect to strata and statistical assumptions

• Of no use for showing spread in sampled data

• Of use for conveying inferential statistics
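One plausible reading of the two sorts of error bar is sketched below: a per-group standard deviation (spread of the sampled data, no model involved) versus a standard error taken from a fitted model, here a one-way ANOVA with a pooled error variance. That reading is an assumption, as are the invented typing speeds and the function names.

```python
from statistics import fmean, stdev

# ASSUMPTION: invented typing speeds by equipment type, not the typing data.
speeds = {
    "electric": [52, 55, 58, 61],
    "plain old": [40, 44, 49, 51],
    "word process": [68, 70, 73, 77],
}

def descriptive_bars(groups):
    """Mean and sample SD per group: shows spread, ignores any model."""
    return {name: (fmean(v), stdev(v)) for name, v in groups.items()}

def model_based_bars(groups):
    """Mean and pooled standard error per group from a one-way ANOVA,
    which assumes the groups share a common variance."""
    n_total = sum(len(v) for v in groups.values())
    ss_within = sum(sum((x - fmean(v)) ** 2 for x in v) for v in groups.values())
    mse = ss_within / (n_total - len(groups))  # pooled error variance
    return {name: (fmean(v), (mse / len(v)) ** 0.5) for name, v in groups.items()}

spread = descriptive_bars(speeds)
inference = model_based_bars(speeds)
for name in speeds:
    mean, sd = spread[name]
    _, se = inference[name]
    print(f"{name:>12}: mean {mean:5.1f}  SD bar ±{sd:.2f}  model SE bar ±{se:.2f}")
```

With equal group sizes the model-based bars are identical across groups, while the SD bars differ from group to group, so the two graphs of the same means look different.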

See typing


How and why are these two graphs different?

[Figure: two plots of SPEED against EQUIPMNT (electric, plain old, word process) with error bars, one titled "Least Squares Means". Panel captions: "with respect to strata and statistical assumptions" and "without respect to strata and statistical assumptions"]


[Figure: chart of INCOME (0 to 50) by EDUC (no grad hs, hs grad, some college, college grad) and SEX (Male, Female)]

Which is best?