seeing is believing: telling stories with statistics – in pictures

Post on 23-Feb-2016

SEEING IS BELIEVING: Telling stories with statistics – in pictures

We’re failing

Do you see the same thing here?

            Gender
Military    Male    Female
No           943     1,222
Yes          227        72

This is your brain on statistics

            Gender
Military    Male    Female
No           943     1,222
Yes          227        72

The total sample is (roughly) evenly divided by gender, so about half of the 299 who served, roughly 150, should be women. Subtracting 72 from the 150 one would expect gives a value of about 80, which squared is 6,400. It is already obvious this is significant.

Just for closure ..

e      o      (e-o)^2   ((e-o)^2)/e
157    72     7225      46.01910828
142    227    7225      50.88028169
1028   943    7225      7.028210117
1137   1222   7225      6.354441513
Total                   110.2820416
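
For anyone who would rather let SAS do the arithmetic, here is a minimal sketch of the same test. The dataset name `military` and its variable names are made up for illustration; the four cell counts come from the table on the earlier slide. The Pearson chi-square it reports should be close to the hand-computed 110.28, not identical, because the hand calculation rounds the expected values.

```sas
/* Hypothetical dataset holding the four cell counts from the slide */
data military ;
   input gender $ military $ count ;
   datalines ;
Male   No   943
Male   Yes  227
Female No   1222
Female Yes  72
;
run ;

/* WEIGHT tells PROC FREQ each row stands for COUNT people;  */
/* CHISQ requests the chi-square tests, EXPECTED prints the  */
/* expected cell frequencies alongside the observed ones.    */
proc freq data = military ;
   weight count ;
   tables military * gender / chisq expected norow nocol nopercent ;
run ;
```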

Seeing is a learned skill

Statisticians may see things in a picture others don’t

My points

(surprisingly, I do have some)

Data Visualization

Graphics do not necessarily stand alone

Data visualization is all around us.

Visual representation in one context is often misapplied to another.

Atomic numbers on your socks?

Data visualization needs to ADD information

Basic Assumptions

• Our audience needs to be taught to read visual data just as we read numeric data, and we need to learn to have some discussion beyond the choices of line graphs vs. pie charts

YOU NEED TO LEARN TO WRITE PICTURES

You learned to read numbers

Or, to be more specific, you need to explain to others what you see in pictures

?

Question + Data > Picture = Story

Bad visualization for one question can be good for another

• Who will win the election?

• Which regions support the Democrats?

Poll dataset did not include Hawaii or Alaska

DATA VISUALIZATION BY EXAMPLE

AN EXAMPLE OF PROGRAM EVALUATION

The government is smarter than you think

(No, I’m serious)

Was the program implemented as planned?

(This was done in JMP)

Did the program work?

GOPTIONS HBY = 2 ;
PROC GPLOT DATA=wussexample UNIFORM ;
  PLOT z_total_post * z_total_pre / VREF=0 ;
  BY group ;

EQUATIONS IN THE SAS LOG FOR THE STATISTICIAN IN YOU

NOTE: Regression equation : z_total_post = 0.13379 + 0.776552*z_total_pre.
NOTE: The above message was for the following BY group: group=CONTROL
NOTE: Regression equation : z_total_post = 1.233616 + 0.578418*z_total_pre.
NOTE: The above message was for the following BY group: group=EXPERIMENTAL
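
The slides do not show how the regression equations got into the log. One common way, and it is only an assumption that it was used here, is a SYMBOL statement with INTERPOL=RL, which draws a fitted regression line on the GPLOT and writes its equation to the log. A sketch along those lines, which also sorts by `group` first because BY-group processing expects sorted data:

```sas
/* BY-group processing needs the data sorted by the BY variable */
proc sort data = wussexample ;
   by group ;
run ;

/* Assumption: INTERPOL=RL (linear regression line) is what      */
/* produced the "Regression equation" notes shown in the log.    */
symbol1 interpol = rl value = dot ;

goptions hby = 2 ;

proc gplot data = wussexample uniform ;
   plot z_total_post * z_total_pre / vref = 0 ;
   by group ;
run ;
quit ;
```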

Is the intervention successful under all conditions?

TRAINING WAS ADMINISTERED TO FOUR COHORTS

Admittedly, we did not train people while flying on a trapeze

Creating the interaction graph

First, in the RESULTS window, type

sgedit on

ods listing sge = on ;
ods graphics on ;
proc glm data = plots ;
  class TestType cohort ;
  model z_total = TestType cohort TestType*cohort ;
  where group = "EXPERIMENTAL" ;

Click on the sge plot to edit it

ODDLY, THE MOST TIME-CONSUMING PART OF THIS IS MAKING THE LINES THICKER

Of course, that is kind of like being the smaller midget

Using SGEDIT to, well, edit

1. Double-click on the .sge file in the RESULTS window

2. Right-click in the plot area & select PLOT PROPERTIES

3. Select desired line thickness

THANKS FOR ASKING!

Yes, the TestType*Cohort*Group interaction (F=5.84, p < .0001) AND the TestType*Group interaction (F=22.92, p < .0001) in the other repeated measures ANOVA were significant.

LOOKING AT THE LITTLE PICTURE

(Especially true for small samples)

Graphs sometimes provide better information than numbers

or…

How SAS ODS GRAPHICS can improve your life

Are these tests related?

R=.22

Look!

Another example

• Years of Education as predictor of gain score

• R-square = .46

• Correlation = .68

• p < .01

Now looky here …

Is it a real relationship?

What should we do?

Throw the score out?

Keep the score in?

Something else?

Ignoring my partner …

Compare your answers with the people next to you

Sometimes outliers are the most interesting part of your study

ODS GRAPHICS ON;

<some procedure>

ODS GRAPHICS OFF;

PROC CORR
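
A minimal sketch of what that wrapper might look like around PROC CORR. The dataset name `gains` and the variables `education` and `gain_score` are hypothetical stand-ins for the years-of-education example; the slides do not give the real names.

```sas
ods graphics on ;

/* PLOTS=SCATTER asks PROC CORR for a scatter plot of each pair  */
/* of VAR variables, which is where an outlier shows up.         */
proc corr data = gains plots = scatter ;   /* gains: hypothetical */
   var education gain_score ;              /* hypothetical names  */
run ;

ods graphics off ;
```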

One last example on knowing your data

Not just telling a story, having a conversation

PROC FREQ

Custom Map-making

How to plot the largest category in a frequency distribution

1, 2, 3

1. PROC TABULATE -> output dataset

2. PROC FORMAT

3. PROC GMAP

DATA VISUALIZATION BY EXAMPLE

WHERE IS DEMOCRATIC SUPPORT BASED? DATA VISUALIZATION IN POLITICAL SURVEYS

PROC TABULATE DATA= in.VOTE2008 OUT=SummaryVOTE2008 ;
  CLASS question3 state ;
  TABLE state, question3* RowPctN ;

WARNING: Some observations were discarded when charting PctN_01. Only first matching observation was used. Use STATISTIC= option for summary statistics.

proc format ;

value vote 50.01 - 100 = "Obama" 0 - 50 = "McCain" ;

PROC GMAP DATA = SummaryVOTE2008 map = maps.us ;
  ID state ;
  CHORO PctN_01 / discrete LEGEND=LEGEND1 ;

ID statement uses the _map_geometry_ variable that was merged in from the maps.us dataset to identify the location on the map.

PROC GMAP DATA = SummaryVOTE2008 map = maps.us ;
  ID state ;
  CHORO PctN_01 / discrete LEGEND=LEGEND1 ;
  Pattern1 c = red ;
  Pattern2 c = blue ;
  format PctN_01 vote. ;

PROC GMAP

CHORO PctN_01 / discrete LEGEND=LEGEND1 ;
FORMAT PctN_01 vote. ;

CHORO statement uses the first observation and ignores the others.

Does Race Matter?

PROC GMAP

Vote2008 coded 0 = McCain, 1 = Obama

Pctmin = Percentage of residents in voter’s district from minority groups

PROC GMAP DATA = wuss map=maps.us ;
  ID state ;
  area vote2008 / discrete statistic = mean ;
  block pctmin / discrete statistic = mean ;
  format pctmin rangep. vote2008 voten. ;

The BLOCK statement charts the pctmin variable. The height of the block will be based on the value of the variable, but the color will be determined using the format specified.

mean minority percentage in districts where Obama voters live is 21% versus 13% for McCain voters

(t= 5.73, p < .0001)
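
The slides do not show the step behind that t value. A sketch of how such a comparison could be run, assuming `wuss` has one row per voter with `vote2008` coded 0/1 and `pctmin` on each row:

```sas
/* Assumption: wuss is voter-level, one observation per respondent */
proc ttest data = wuss ;
   class vote2008 ;   /* 0 = McCain, 1 = Obama */
   var pctmin ;       /* percent minority in the voter's district */
run ;
```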

The usefulness of visual data

With one statement, I can change the percentage of minority & re-run the chart

value rangep 0 - 15 = "0 -15%" 15.01 - 100 = "> 15%" ;
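
As a sketch of that one-statement change: redefine `rangep` with a different cutoff and re-run the map. The 25% cutoff is an arbitrary illustration, not a value from the talk. The GMAP step is repeated as shown on the earlier slide, and the `voten.` format is assumed to be defined already, since its definition does not appear in the slides.

```sas
/* Hypothetical new cutoff: 25% instead of 15% */
proc format ;
   value rangep 0 - 25    = "0 - 25%"
                25 <- 100 = "> 25%" ;   /* <- excludes 25 itself */
run ;

/* Same map step as before; only the format definition changed.  */
/* voten. is assumed to exist already (not shown in the slides). */
proc gmap data = wuss map = maps.us ;
   id state ;
   area vote2008 / discrete statistic = mean ;
   block pctmin / discrete statistic = mean ;
   format pctmin rangep. vote2008 voten. ;
run ;
quit ;
```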

DATA VISUALIZATION BY EXAMPLE

Decision Trees, ROC & Lift Curves to Predict Military Service

Speaking of easy, interactive, graphics

JMP

libname readin "E:\crimes\readout" ;
libname writeout xport "e:\wuss2010\crimes.xpt" ;
proc copy in = readin out = writeout ;

How to get a SAS .xpt file into JMP, Step 1

File > Open

DECISION TREE

• ANALYZE > MODELING > PARTITION

• SELECT Y

• SELECT X VARIABLES

• Click on the SPLIT button

Receiver Operating Characteristic

Click on the red arrow at the top left of the partition window for pull-down options, which include ROC and Lift curves.

ROC

• Sensitivity is the true positive rate: for example, the percentage of people who actually died whom you predicted would die.

• Specificity is the true negative rate: for example, the percentage of people who survived whom you predicted would NOT die.
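
In terms of counts, sensitivity is true positives divided by everyone who actually had the outcome, and specificity is true negatives divided by everyone who did not. A small sketch with made-up counts (they are not from any dataset in this talk):

```sas
/* Made-up confusion-table counts, for illustration only */
data _null_ ;
   tp = 40 ;   /* predicted to die, died            */
   fn = 10 ;   /* predicted to survive, died        */
   tn = 85 ;   /* predicted to survive, survived    */
   fp = 15 ;   /* predicted to die, survived        */

   sensitivity = tp / (tp + fn) ;   /* 40 / 50  = 0.80 */
   specificity = tn / (tn + fp) ;   /* 85 / 100 = 0.85 */

   put sensitivity= specificity= ;
run ;
```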

Comparing models

In JMP, use of training and testing datasets is REALLY easy

EXCLUDE 25% or 50% of the data and then re-run your analyses with the excluded sample
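
The talk does this by pointing and clicking in JMP. For comparison, here is a rough SAS equivalent using PROC SURVEYSELECT, which is a substitute technique and not what the slides show. The dataset name `readin.crimes` is an assumption based on the export file name, and the seed is arbitrary.

```sas
/* Flag a random 25% of rows; OUTALL keeps every row and adds */
/* a SELECTED variable (1 = sampled, 0 = not sampled).        */
proc surveyselect data = readin.crimes   /* hypothetical dataset name */
                  out = crimes_split
                  method = srs
                  samprate = 0.25
                  seed = 2010
                  outall ;
run ;

/* Split into a held-out test set and a training set */
data train test ;
   set crimes_split ;
   if selected = 1 then output test ;
   else output train ;
run ;
```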

A statistician is a person who was good at math but didn’t have enough personality to be an accountant?

It is important that people believe you

And that’s my story

AnnMaria De Mars

The Julia Group
2111 7th St #8
Santa Monica, CA 90405
ANNMARIA@THEJULIAGROUP.COM
(310) 717-9089
