lies, damn lies and sas to the rescue! - statistical analysis … · 2015-09-27 · lies, damn lies...

Lies, damn lies and ... SAS to the rescue! Peter L. Flom Peter Flom Consulting SESUG September, 2015


Lies, damn lies and ... SAS to the rescue!

Peter L. Flom

Peter Flom Consulting

SESUG, September 2015


Broad outline

1 Introduction
2 Descriptive statistics
3 Descriptive graphics
4 Inferential statistics
5 The regression family
6 Multivariate statistics


Part I

Introduction


Schedule

8:00 Descriptive statistics
8:40 Break
8:50 Descriptive graphics
9:30 Break
9:40 Inferential statistics
10:10 Break
10:20 The regression family
11:00 Break
11:10 Multivariate statistics
12:00 End


Introductions of participants and self


What I plan to do in this course

Give you a fundamental understanding of some basic statistical methods
Give you a very brief survey of many more advanced methods
Help you learn to work with statistics and statisticians
Give some SAS code you can give to others
Note that the most important stuff is at the beginning, so ask questions!


What I don’t plan to do in this course

Teach you to be a statistician
Teach you SAS


What I want from you

Attention
Questions - after all, it’s not even graded
Feedback - even anonymous is OK, after the course


Part II

Descriptive statistics


Introduction

Outline

1 Introduction

2 Measures of central tendency

3 Measures of spread

4 Other measures


Descriptive vs. inferential stats

Descriptive statistics describe a variable or a sample.
Inferential statistics let you infer from a sample to a population (more later).
Descriptive statistics are necessary even when your goal is inference.


Types of descriptive statistics

For continuous variables, descriptive statistics include:
    Measures of central tendency
    Measures of dispersion or spread
    Measures of skewness
    Measures of kurtosis
    Other measures

For categorical variables, mostly we are limited to frequencies.


Measures of central tendency


The mean

What it is

Definition

The mean is the ordinary average. Add up the numbers and divide by the number of numbers.

Or, if you want a formula:

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$

where x is the variable and there are n values of the variable.


The mean

What can go wrong

Outliers
Skewness
The clock problem
The rate problem
Different scales
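To make the outlier problem concrete, here is a minimal sketch. The six salaries are invented for illustration (they are not from sashelp.baseball), and the data set name outlier_demo is hypothetical. By hand, the mean is 1250 / 6 = 208.33, while the median is 52.5, which is much closer to what a typical value looks like.

```sas
/* Hypothetical salaries in $ thousands; the last value is an outlier */
data outlier_demo;
    input salary @@;
    datalines;
40 45 50 55 60 1000
;

/* The mean is pulled up by the single outlier; the median is not */
proc means data = outlier_demo n mean median maxdec = 2;
    var salary;
run;
```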


The mean

Mean salary

proc means data = sashelp.baseball maxdec = 2;
    var salary;
run;

Analysis Variable : Salary 1987 Salary in $ Thousands
N      Mean      Std Dev    Minimum    Maximum
263    535.93    451.12     67.50      2460.00


The mean

Alternatives

1 The median
2 The trimmed mean and Winsorized mean
3 The geometric mean
4 The harmonic mean

which are the topics of the next few slides


The median

Median salary

The median is simply the value that divides the distribution in half - half are lower, half are higher.

ods select BasicMeasures;
proc univariate data = sashelp.baseball;
    var salary;
run;

Basic Statistical Measures
Location               Variability
Mean      535.9259     Std Deviation          451.11868
Median    425.0000     Variance               203508
Mode      750.0000     Range                  2393
                       Interquartile Range    560.00000


The median

What can go wrong

Sometimes we want the outliers
When there are many ties, the median may not be completely determined.


The trimmed mean and Winsorized mean

What it is

A compromise between the mean and the median.
To calculate the trimmed mean, you remove a certain percentage of the highest and lowest points and then find the mean of what remains.
The Winsorized mean is similar but, rather than deleting the points, you set them equal to the lowest or highest values that are not extreme.
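A tiny worked case may help. The five values below are made up, and the data set name trim_demo is hypothetical. Trimming one point per tail leaves 2, 3, 4 (mean 3); Winsorizing one point per tail turns the data into 2, 2, 3, 4, 4 (mean 3). The ordinary mean is 22, driven by the single value of 100. The sketch assumes that an integer value for trimmed= and winsorized= is read as a count per tail rather than a proportion.

```sas
/* Made-up values; 100 is the extreme point */
data trim_demo;
    input x @@;
    datalines;
1 2 3 4 100
;

/* An integer here means "this many points per tail" */
ods select TrimmedMeans WinsorizedMeans;
proc univariate data = trim_demo trimmed = 1 winsorized = 1;
    var x;
run;
```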


The trimmed mean and Winsorized mean

What can go wrong

If the distribution is skewed, the trimmed mean is not an unbiased estimator for either the mean or median.


The trimmed mean and Winsorized mean

Trimmed and winsorized mean salary

ods select TrimmedMeans WinsorizedMeans;
proc univariate data = sashelp.baseball
        trimmed = .1 winsorized = .1;
    var salary;
run;

Trimmed per tail
%        N     Trimmed mean    SE       Winsorized mean    SE
10.27    27    463.89          25.08    486.04             25.09


The geometric mean

What it is

Definition

It’s like the mean, except instead of adding the numbers and then dividing by the count, you multiply the numbers and take the nth root of the product

or, if you want a formula:

$$\left(\prod_{i=1}^{n} x_i\right)^{1/n}$$


The geometric mean

What can go wrong

Doesn’t work when any value is 0 or negative


The geometric mean

When to use it

Useful for combining measures on different scales, e.g. candidates for college - combine SAT (0 to 1600) and HS GPA (0 to 4)
Proportional growth over a series of times
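For the growth case, a short sketch under invented numbers: three yearly growth factors of 1.10, 1.20 and 0.90 (the data set and variable names are hypothetical). The geometric mean is the single constant factor that, applied three times, gives the same total growth; the arithmetic mean of the factors is a bit too high.

```sas
data growth;
    /* Hypothetical yearly growth factors: +10%, +20%, -10% */
    g1 = 1.10; g2 = 1.20; g3 = 0.90;
    gmean = geomean(g1, g2, g3);  /* constant factor with the same total growth */
    amean = mean(g1, g2, g3);     /* arithmetic mean of the factors */
    total = g1 * g2 * g3;         /* actual compound growth */
    check = gmean ** 3;           /* should equal total, up to rounding */
run;

proc print data = growth;
run;
```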


The geometric mean

Geometric mean of college applicants

data college;
    input name $ GPA SAT @@;
    datalines;
Jill 3.0 1550 Joe 4.0 1500
;
data college;
    set college;
    gmean = geomean(GPA, SAT);
    amean = mean(GPA, SAT);
run;
proc print data = college;
run;

Obs    name    GPA    SAT     gmean      amean
1      Jill    3      1550    68.1909    776.5
2      Joe     4      1500    77.4597    752.0


The harmonic mean

Harmonic mean of round trip travel

Definition

It is the reciprocal of the arithmetic mean of the reciprocals of a set of numbers.

$$H = \frac{n}{\frac{1}{x_1} + \frac{1}{x_2} + \cdots + \frac{1}{x_n}}$$


The harmonic mean

When to use it

Averaging rates, such as speeds or batting averages
Averaging ratios, such as price-earnings ratios


The harmonic mean

What can go wrong

Like the geometric mean, it doesn’t work with negative numbers or 0’s.


The harmonic mean

SAS code

data speed;
    input To From @@;
    datalines;
50 80 40 70
;
data speed;
    set speed;
    hmean = harmean(to, from);
    amean = mean(to, from);
    /* Round trip: 100 distance units each way, 200 in total */
    time = 100 / to + 100 / from;
    actualspeed = 200 / time;
run;
proc print data = speed;
run;

Obs    To    From    hmean    amean    time    actualspeed
1      50    80      61.54    65       3.25    61.54
2      40    70      50.91    55       3.93    50.91


Exercises

Exercises

1 Name 3 variables for which the mean would not be appropriate.

2 For each of those, decide which measure of central tendency would be appropriate, and why.


Measures of spread


Standard deviation

What it is

Definition

The standard deviation is the square root of the average squared difference between the mean and the individual values (for a sample, the sum is divided by n - 1 rather than n).

Or

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
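To see the formula at work, the sketch below computes the standard deviation term by term and compares it with the built-in STD function. The eight values are made up and the names (sd_demo, s_manual, s_builtin) are hypothetical. By hand: the mean is 5, the sum of squared deviations is 32, and the result is the square root of 32 / 7, about 2.14.

```sas
data sd_demo;
    /* Made-up values, loaded through array initial values */
    array x[8] x1-x8 (2 4 4 4 5 5 7 9);
    count = dim(x);
    xbar = mean(of x[*]);
    ss = 0;
    do i = 1 to count;
        ss = ss + (x[i] - xbar) ** 2;    /* sum of squared deviations */
    end;
    s_manual = sqrt(ss / (count - 1));   /* n - 1 in the denominator */
    s_builtin = std(of x[*]);            /* should agree with s_manual */
    drop i;
run;

proc print data = sd_demo;
    var xbar ss s_manual s_builtin;
run;
```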


Standard deviation

What can go wrong

If the mean isn’t a good measure of central tendency, the SD isn’t a good measure of spread.


Standard deviation

SD of salary

proc means data = sashelp.baseball;
    var salary;
run;

Basic Statistical Measures
Location               Variability
Mean      535.9259     Std Deviation          451.11868
Median    425.0000     Variance               203508
Mode      750.0000     Range                  2393
                       Interquartile Range    560.00000


Standard deviation

Alternatives

Median absolute deviation (MAD)
Range and interquartile range
More quantiles
Gini’s mean difference
Variations on MAD
(also see graphics, later)


MAD

What it is

Definition

The median absolute deviation is what it says:
1 Find the median
2 Find each value’s deviation from the median
3 Take absolute values
4 Find the median of those
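The four steps can be sketched in a DATA step. The values are made up and the names (mad_demo, mad_manual, mad_builtin) are hypothetical. By hand: the median is 3, the absolute deviations are 2, 1, 0, 1, 97, and their median is 1. The last line calls the MAD function, which is assumed here to return the same unscaled quantity.

```sas
data mad_demo;
    array x[5] x1-x5 (1 2 3 4 100);     /* made-up values */
    array d[5] d1-d5;
    med = median(of x[*]);              /* step 1: the median */
    do i = 1 to dim(x);
        d[i] = abs(x[i] - med);         /* steps 2 and 3: absolute deviations */
    end;
    mad_manual = median(of d[*]);       /* step 4: median of those */
    mad_builtin = mad(of x[*]);         /* built-in function, for comparison */
    drop i;
run;

proc print data = mad_demo;
    var med mad_manual mad_builtin;
run;
```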


MAD

What can go wrong?

Not very efficient
Not appropriate with asymmetric distributions


Range and interquartile range

What it is

The range runs from the smallest value to the largest
The IQR runs from the 1st quartile to the 3rd quartile
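One way to get both ends of each interval along with its width is to ask PROC MEANS for them directly. This is a sketch, not the code used for the slides; the statistic keywords below (range, q1, q3, qrange) are standard PROC MEANS options. The range and interquartile range it reports should agree with the values PROC UNIVARIATE gives for salary.

```sas
/* Endpoints and widths of the range and the interquartile range */
proc means data = sashelp.baseball min max range q1 q3 qrange maxdec = 2;
    var salary;
run;
```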


Range and interquartile range

What can go wrong

The range is strongly affected by even a single outlier
The IQR is not affected at all by outliers


Range and interquartile range

SAS code

ods select RobustScale;
proc univariate data = sashelp.baseball RobustScale;
    var salary;
run;

Robust Measures of Scale
Measure                   Value       Estimate of Sigma
Interquartile Range       560.0000    415.1285
Gini’s Mean Difference    468.0400    414.7897
MAD                       275.0000    407.7150
Sn                        381.6320    382.9424
Qn                        327.7303    325.9949


Range and interquartile range

Exercises

List 3 variables that would not be well analyzed by the SD and suggest alternatives.


Other measures


Skewness

What it is

Definition

Skewness is the asymmetry of the distribution.

$$\frac{\frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^3}{\left[\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2\right]^{3/2}}$$

Skewness can take any value: negative means left skew, positive means right skew, and 0 means symmetrical.
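As a rough check on the sign convention, the sketch below asks PROC MEANS for the mean, median and skewness of the baseball salaries together. For a right-skewed variable the mean typically sits above the median and the skewness is positive. SAS applies its own small-sample adjustment, so its value need not match a hand calculation from the formula exactly.

```sas
/* Mean above median and positive skewness both point to a right skew */
proc means data = sashelp.baseball n mean median skewness maxdec = 2;
    var salary;
run;
```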


Skewness

Alternatives and problems

One good way to look at skewness is with density plots (to be covered later).

What can go wrong

A single outlier can generate skewness.
Again, if the mean is not an appropriate measure of central tendency, this is not an appropriate measure of skew.


Skewness

Skewness of salary

ods select Moments;
proc univariate data = sashelp.baseball;
    var salary;
run;

Moments
N                  263       Sum Weights         263
Mean               535.93    Sum Observations    140948.507
Std Deviation      451.12    Variance            203508.064
Skewness           1.59      Kurtosis            3.05896473
Coeff Variation    84.18     Std Error Mean      27.82


Kurtosis

What it is

It is a measure of the peakedness of the distribution. However, it is very nonintuitive and hard to interpret. It can be used to indicate a non-normal distribution, but its use beyond that is tricky (and confuses even experienced people).

Better to use graphical measures such as density plots (to be covered later).


Exercises and further reading

Exercises

List 3 variables that are markedly skewed, either to the right or left.


Exercises and further reading

Discussion


Part III

Descriptive graphics


Introduction

Outline

5 Introduction

6 Univariate graphics

7 Bivariate graphics

8 Trivariate and multivariate graphics

9 Time series data

10 Exercises and further reading


General thoughts on statistical graphics - 1

A good graph will:
    Show the data
    Induce the viewer to think about the substance of the data
    Avoid distorting the data
    Present many numbers in a small space


General thoughts on statistical graphics - 2

    Make large data sets coherent
    Encourage the eye to look at different parts of the data
    Reveal several levels of detail
    Serve a clear purpose


General thoughts on statistical graphics - 3

But a good graphic will not:
    Be a substitute for a table
    Be a substitute for a model


General thoughts on statistical graphics - 4

Use of color, shape and so on
Consider the audience
Not all chart junk is bad


Univariate graphics


Univariate discrete data

Introduction

These are usually counts or proportions of something, e.g. the number of Democrats, Republicans and others. Here:

Pie charts should be avoided
Dot charts are often good
A table may be even better
Log scales are sometimes helpful


Univariate discrete data

Pie chart with 51 categories - a mess

[Figure: pie chart with one slice per geographical area; the legend runs alphabetically from Alabama to Wyoming and includes the District of Columbia and Puerto Rico.]


Univariate discrete data

Dot chart with 51 categories

[Figure: dot chart of population (millions, log scale, axis ticks from 1 to 35) by state, listed from California to Wyoming, with points marked by region (Midwest, Northeast, South, West).]


Univariate discrete data

Pie chart with 9 categories - a table or dot chart

[Figure: pie chart with nine slices; slice labels and values:]

East North Central Division    46395654
East South Central Division    18084651
Middle Atlantic Division       40621237
Mountain Division              21784507
New England Division           14303542
Pacific Division               49070441
South Atlantic Division        58398377
West North Central Division    20165794
West South Central Division    35235521


Univariate discrete data

Pie chart with 4 categories - a table or text

[Figure: pie chart of geographical area with four slices: Midwest Region, Northeast Region, South Region, West Region.]


Univariate discrete data

SAS code

The SAS code for the pie charts isn’t shown because you shouldn’t use it. The code for the dot plot is complex; I can e-mail it to you if you want.
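For a simple case, a basic dot chart can be sketched with the DOT statement in PROC SGPLOT. This is not the code behind the 51-category dot chart (that one is ordered, colored by region and on a log scale); it is only a minimal starting point, using sashelp.baseball so that it runs without extra data. The axis label text is an assumption.

```sas
/* One dot per league/division, placed at the mean salary */
proc sgplot data = sashelp.baseball;
    dot div / response = salary stat = mean;
    xaxis label = "Mean 1987 salary in $ thousands";
run;
```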


Univariate continuous data

Introduction

Histograms can be misleading, at least if they are unadorned
Density plots are often better, and several smooths can be used.
Box plots provide a useful summary
When N is small, strip charts can be useful
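One way to adorn a histogram is to overlay density curves on it. The sketch below is an illustration rather than code from the course: it draws the salary histogram with a fitted normal curve and a kernel density estimate on top, which makes it easier to see where the bars and a smooth estimate disagree.

```sas
/* Histogram with a normal curve and a kernel density estimate overlaid */
proc sgplot data = sashelp.baseball;
    histogram salary;
    density salary / type = normal;
    density salary / type = kernel;
run;
```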


Univariate continuous data

Density plot - example

[Figure: "Density plot, salaries". Kernel density estimates of 1987 salary in $ thousands (x axis, -1000 to 3000; y axis, density) with three bandwidths: the default kernel, c = 0.5 and c = 2.]


Univariate continuous data

Density plot - SAS code

proc sgplot data = sashelp.baseball;
    density salary / type = kernel;
    density salary / type = kernel(c = .5)
        curvelabelattrs = (color = red);
    density salary / type = kernel(c = 2)
        curvelabelattrs = (color = green);
    xaxis min = 0;
run;


Univariate continuous data

Box plot - example

[Figure: box plot of 1987 salary in $ thousands (y axis, 0 to 2500); the plot title reads "Density plot, salaries".]

proc sgplot data = sashelp.baseball;
    vbox salary;
run;


Univariate continuous data

Box plot - example, log scale

[Figure: box plot of 1987 salary in $ thousands on a log scale (y-axis ticks at 250, 500, 750, 1250, 1750 and 2500); the plot title reads "Salary by division".]

proc sgplot data = sashelp.baseball;
    vbox salary;
    yaxis type = log logbase = 10 logstyle = linear;
run;


Univariate continuous data

Strip plot - example

[Figure: "Salary strip plot". 1987 salary in $ thousands (y axis, 0 to 2500) plotted against a jitter variable (x axis, 0.90 to 1.10).]


Univariate continuous data

Strip plot - SAS code

data strip;
    set sashelp.baseball;
    jitter = 1 * (ranuni(1234) / 5) + .9;
run;
title "Salary strip plot";
proc sgplot data = strip;
    scatter x = jitter y = salary;
    xaxis min = 0 max = 2 display = none;
run;


Bivariate graphics


Both categorical

Mosaic plots

A little known and under-used plot is the mosaic plot. It is a way of visualizing a crosstabulation. For example, sex and party ID.

[Figure: mosaic plot produced by the FREQ procedure.]

ods select MosaicPlot;
proc freq data = mosaic;
    table party*sex / plots = mosaic;
    weight count;
run;
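The mosaic data set used on the slide is not shown, so here is a sketch that runs on a data set shipped with SAS instead. It cross-tabulates Origin by Type in sashelp.cars; that choice of variables is an assumption made only to have something runnable. ODS Graphics has to be enabled for the plot to appear, and no weight statement is needed because this data set has one row per car rather than pre-computed counts.

```sas
ods graphics on;

/* Mosaic plot of a two-way table from raw (unweighted) rows */
proc freq data = sashelp.cars;
    tables origin*type / plots = mosaicplot;
run;
```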


One categorical

Introduction

When N is relatively small, a strip chart is good - it shows all the data. When N is larger, a parallel boxplot shows a lot of the key information.


One categorical

Parallel boxplot - example

[Figure: "Salary by division". Parallel box plots of 1987 salary in $ thousands (y axis, 0 to 2500) for each league and division (AE, AW, NE, NW).]

title "Salary by division";
proc sgplot data = sashelp.baseball;
    vbox salary / category = div;
run;


One categorical

Strip chart - example

[Figure: "Salary by division, rookies". Strip chart of 1987 salary in $ thousands (x axis, 80 to 160) by league and division (AE, NE, AW, NW).]

title "Salary by division, rookies";
proc sgplot data = sashelp.baseball;
    scatter x = salary y = div;
    where yrmajor le 1;
run;


Neither categorical

The scatter plot

The most common (and one of the best) basic options here is the scatter plot. But there are variations.


Neither categorical

Scatter plot - basic example

[Figure: scatter plot of 1987 salary in $ thousands (y axis, 0 to 2500) against career hits (x axis, 0 to 4000); the plot title reads "Salary by division".]

proc sgplot data = sashelp.baseball;
    scatter x = CrHits y = Salary;
run;


Neither categorical

Scatter plot - log scale

[Figure: scatter plot on log scales, "Salary by division". x-axis: Career Hits (ticks at 100, 500, 1000, 2000, 4000); y-axis: 1987 Salary in $ Thousands (ticks from 250 to 2500).]

proc sgplot data = sashelp.baseball;
   scatter x = CrHits y = Salary;
   xaxis type = log logstyle = linear;
   yaxis type = log logstyle = linear;
run;

Page 79

Bivariate graphics

Neither categorical

Scatter plot - log scale plus loess

[Figure: scatter plot on log scales with a loess fit, "Salary by division". x-axis: Career Hits (100 to 4000); y-axis: 1987 Salary in $ Thousands (100 to 2500); legend: Loess, 1987 Salary in $ Thousands.]

Page 80

Bivariate graphics

Neither categorical

Scatter, log scale with loess

proc sgplot data = sashelp.baseball;
   xaxis label = "Career hits (log scale)"
         type = log logstyle = linear;
   yaxis label = "Salary in thousands of $ (log scale)"
         type = log logstyle = linear;
   scatter x = CrHits y = salary;
   loess x = CrHits y = Salary / nomarkers;
   ellipse x = CrHits y = Salary;
run;

Page 81

Bivariate graphics

Neither categorical

Scatter plot - A fancy example

[Figure: "Scatter plot with density plots". Scatter plot of U.S. states (points labeled with state abbreviations) with a prediction ellipse (α=.05) and marginal density plots. x-axis: Unemployment (%) (2 to 10); y-axis: Infant Mortality (per XXX) (4 to 12).]

Page 82

Bivariate graphics

Neither categorical

Scatter plot - Another fancy example

[Figure: "Box plot w/barchart". Box plots of Weight (120 to 220) by ht (55 to 79), with a bar chart of N (0 to 200000).]

Page 83

Trivariate and multivariate graphics

Outline

5 Introduction

6 Univariate graphics

7 Bivariate graphics

8 Trivariate and multivariate graphics

9 Time series data

10 Exercises and further reading

Page 84

Trivariate and multivariate graphics

All continuous

The scatterplot matrix - example

[Figure: scatterplot matrix, "Salary by division", of 1987 Salary in $ Thousands (0 to 2500), Career Home Runs (0 to 500) and Career Hits (0 to 4000).]

proc sgscatter data = sashelp.baseball;
   matrix CrHits CrHome Salary;
run;

Page 85

Trivariate and multivariate graphics

All continuous

The scatterplot matrix - a more complex example

proc sgscatter data = sashelp.baseball;
   matrix CrHits CrHome CrBB CrRbi /
      markerattrs = (symbol = circlefilled size = 8)
      diagonal = (kernel)
      ellipse
      colorresponse = salary;
run;

Page 86

Trivariate and multivariate graphics

All continuous

Bubble plot

[Figure: bubble plot, "Bubble plot of rookie salaries". x-axis: Career Hits (50 to 175); y-axis: Career Home Runs (0 to 30); bubble size: salary.]

title "Bubble plot of rookie salaries";
proc sgplot data = sashelp.baseball;
   bubble x = CrHits y = CrHome size = salary;
   where yrmajor le 1;
run;

Page 87

Trivariate and multivariate graphics

Some continuous

Coplot

[Figure: coplot (panel of scatter plots), one panel per League and Division (NW, NE, AW, AE). x-axis: Career Hits (0 to 4000); y-axis: Career Home Runs (0 to 500).]

proc sgpanel data = sashelp.baseball;
   panelby div;
   scatter x = CrHits y = CrHome;
run;

Page 88

Trivariate and multivariate graphics

Some continuous

Scatter plot matrix with group variable - example

[Figure: scatter plot matrix, "Several statistics by league and division - 5 years or less", of Career Times at Bat, 1987 Salary in $ Thousands, Career Home Runs and Career Hits, grouped by League and Division (NW, NE, AW, AE).]

proc sgscatter data = sashelp.baseball;
   title "Several statistics by league and division - 5 years or less";
   matrix CrHits CrHome Salary CrAtBat /
      group = div diagonal = (kernel);
   where yrmajor le 5;
run;

Page 89

Trivariate and multivariate graphics

Some continuous

Scatter plot matrix - another example

proc sgscatter data = sashelp.baseball;
   plot (salary) * (nHits nHome NBB nAssts) /
      markerattrs = (symbol = circlefilled size = 8)
      loess
      colorresponse = yrmajor
      colormodel = twocolorramp;
run;

Page 90

Trivariate and multivariate graphics

None continuous

Introduction

When all variables are categorical, generalizations of the mosaic plot can be used.
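As a starting point, a basic two-way mosaic plot can be requested from PROC FREQ. This is a minimal sketch that is not part of the original slides: it assumes a reasonably recent SAS release with ODS Graphics available, and it uses League and Division from sashelp.baseball purely for illustration. Generalizations to three or more categorical variables are not shown.

```sas
ods graphics on;

/* Two-way mosaic plot: tile areas are proportional to cell counts */
proc freq data = sashelp.baseball;
   tables league * division / plots = mosaicplot;
run;
```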

Page 91

Time series data

Outline

5 Introduction

6 Univariate graphics

7 Bivariate graphics

8 Trivariate and multivariate graphics

9 Time series data

10 Exercises and further reading

Page 92

Time series data

Electrical workers over time

[Figure: PROC TIMESERIES output, "Series Values for ELECTRIC". x-axis: DATE (monthly, January 1977 to July 1982); y-axis: electrical workers, thousands (240 to 320).]

Page 93

Time series data

Electrical workers over time

[Figure: PROC TIMESERIES output, "Seasonal Decomposition/Adjustment for ELECTRIC". Four panels over January 1977 to January 1982: Seasonally Adjusted (240 to 320), Irregular (0.97 to 1.03), Seasonal-Irregular (0.925 to 1.050) and Trend-Cycle (240 to 320).]

Page 94

Time series data

Electrical workers over time - SAS code

title "Timeseries decomposition";
proc timeseries data = sashelp.workers out = _null_
                plots = (series decomp);
   id date interval = month;
   var electric;
run;

Page 95

Exercises and further reading

Outline

5 Introduction

6 Univariate graphics

7 Bivariate graphics

8 Trivariate and multivariate graphics

9 Time series data

10 Exercises and further reading

Page 96

Exercises and further reading

Describe a set of variables and say what graph you would use for it and why

Page 97

Exercises and further reading

Discussion

Page 98

Exercises and further reading

Further reading - blog links

Parallel box plots http://www.statisticalanalysisconsulting.com/graphics-for-bivariate-data-parallel-box-plots/

Pie is delicious but not nutritious http://www.statisticalanalysisconsulting.com/graphics-for-univariate-data-pie-is-delicious-but-not-nutritious/

Scatterplots http://www.statisticalanalysisconsulting.com/scatterplots-and-enhancements/

Graphics: The good, the bad and the ugly http://www.statisticalanalysisconsulting.com/graphics-the-good-the-bad-and-the-ugly/

Page 99

Exercises and further reading

Further reading - books

Creating more effective graphs by Naomi Robbins
Visualizing data by William S. Cleveland
The elements of graphing data by William S. Cleveland
A trout in the milk by Howard Wainer

Page 100

Part IV

Inferential statistics

Page 101

From sample to population

A population is the entire set of all the subjects (people or whatever) that you want to study.
A sample is a subset of that population.
A random sample is a sample where all subjects have a definable chance of being selected.

Page 102

Null and alternative hypotheses

The null hypothesis is usually "nothing is going on"
The alternative is "something is going on"
Trial analogy

Page 103

What is a p value?

Definition

If, in the population from which this sample was randomly drawn, the null was strictly true, what is the probability of getting a test statistic at least as extreme as the one we got in a sample the size of the one we have?

In other words, if we do 1000 really silly things, what proportion will come out significant?
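One way to make the "1000 silly things" idea concrete is to simulate it. The sketch below is not part of the original slides; the sample size (30 per group), the seed, and the data set names sim, tt and flagged are arbitrary choices. It generates 1000 two-group experiments in which the null is true by construction, runs a t test on each, and tabulates how often the p value falls below 0.05.

```sas
/* Simulate 1000 experiments in which the null is true */
data sim;
   call streaminit(2015);
   do rep = 1 to 1000;
      do group = 1 to 2;
         do i = 1 to 30;
            y = rand("normal");    /* same distribution in both groups */
            output;
         end;
      end;
   end;
run;

ods graphics off;
ods exclude all;                   /* suppress 1000 sets of printed output */
proc ttest data = sim;
   by rep;
   class group;
   var y;
   ods output ttests = tt;         /* capture the t test tables */
run;
ods exclude none;
ods graphics on;

/* Flag the tests that came out "significant" */
data flagged;
   set tt;
   where method = "Pooled";
   significant = (probt lt 0.05);
run;

proc freq data = flagged;
   tables significant;
run;
```

With the null true, roughly 5% of the tests should be flagged, purely by chance.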

Page 104

Experiments vs. observational studies

In an experiment, subjects are randomly selected and then randomly assigned to a condition
In an observational study, neither of these is true
Some people use the term quasi-experiment where one of the above is true

Page 105

Problems

Not usually the question we want to ask
Strongly affected by sample size

Page 106

The Bayesian approach

Idea
Set a prior - often a uniform prior
Let data modify it.

Advantages
More intuitive
Lets you have a prior

Disadvantages
Hard to set a prior
Uninformed prior usually gives similar results to frequentist approach
Still not the question we are interested in

Page 107

What we want

Effect sizes and measures of their accuracy
Risk-reward analysis

Page 108

Further reading

The Insignificance of Statistical Significance Testing by Douglas Johnson
The Cult of Statistical Significance by Stephen Ziliak and Deirdre McCloskey

Page 109

Part V

The regression family

Page 110

Introduction

Outline

11 Introduction

12 The OLS model

13 Other models for continuous DV

14 The logistic family

15 Count models

16 Multilevel models

17 Exercises and further reading

Page 111

Introduction

What is regression?

Regression is a term for a variety of models relating dependent variables (usually just one) to one or more independent variables.

Page 112

Introduction

Varieties of regression

The type of regression depends on the nature of the dependent variable and on the nature of the relationships.

Continuous - OLS and alternatives (see below)
Dichotomous - Logistic
Categorical (>2 levels) - Multinomial logistic
Ordinal - ordinal logistic
Count - Poisson, negative binomial and variations
Time to event - survival models

Page 113

The OLS model

Outline

11 Introduction

12 The OLS model

13 Other models for continuous DV

14 The logistic family

15 Count models

16 Multilevel models

17 Exercises and further reading

Page 114

The OLS model

What it is

Ordinary least squares (OLS) is the most common regression model, and it is usually what people mean when they say 'regression'.
The model is $Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + e$, where $e$ is the error, which is normally distributed with mean 0 and constant variance.

Page 115

The OLS model

What can go wrong

Overfitting
Nonlinear fits
Nonnormal residuals
Dependent data
Collinearity
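Some of these problems can be screened for with standard PROC REG output. The following is a minimal sketch, not part of the original slides; it assumes the sashelp.baseball data used elsewhere in this course, and the choice of predictors is illustrative only.

```sas
ods graphics on;

/* OLS fit with residual diagnostics and collinearity checks */
proc reg data = sashelp.baseball plots = diagnostics;
   model salary = YrMajor nHits nHome nRBI / vif collin;
run;
quit;
```

The diagnostics panel includes residual and Q-Q plots (useful for spotting nonlinearity and nonnormal residuals), while VIF and COLLIN report on collinearity. Overfitting and dependent data need other checks.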

Page 116

Other models for continuous DV

Outline

11 Introduction

12 The OLS model

13 Other models for continuous DV

14 The logistic family

15 Count models

16 Multilevel models

17 Exercises and further reading

Page 117

Other models for continuous DV

Introduction

Multivariate adaptive regression splines (MARS) - PROC ADAPTIVEREG
Quantile regression - PROC QUANTREG
Transformations - PROC TRANSREG

More information: See my paper at SGF 2015.

Page 118

Other models for continuous DV

MARS

Introduction

MARS models allow extremely flexible curves (called splines) to be fit to data.
MARS models are most useful

In high dimensional spaces
When there is little substantive reason to assume linearity or a low-order polynomial fit

Page 119

Other models for continuous DV

MARS

Advantages and disadvantages of MARS models

Advantages of MARS models:
Very flexible fitting of the relationship between independent and dependent variables
Model selection methods that can sharply reduce the dimension of the model.
SAS implementation of these models extends them to dependent variables in the exponential family.
Can be more accurate than GLM, with greater parsimony

Disadvantages of MARS models:
Hard to interpret
Less familiar

Page 120

Other models for continuous DV

MARS

Example

I modeled baseball salary as a function of various attributes of the players. ADAPTIVEREG got a significantly higher $R^2$ with considerably fewer terms. But the result is very hard to interpret.

proc adaptivereg data = sashelp.baseball
                 plots = all details = bases;
   class team;
   model salary = YrMajor nAtBat nHits nHome nOuts;
run;

Page 121

Other models for continuous DV

Quantile regression

Introduction

There are at least three motivations for quantile regression:
DV is bimodal or multimodal
Highly skewed DV
Substantive interest in the quantiles

Advantages include:
No assumptions about the distribution of the residuals
More flexible hypotheses

Disadvantages include:
Not as powerful as OLS regression when that is the appropriate model
Not robust to high leverage points.

Page 122

Other models for continuous DV

Quantile regression

Example

A quantile regression of baseball salary:

proc quantreg data = sashelp.baseball plots = all;
   model salary = YrMajor nAtBat nHits nHome nOuts /
         quantile = (0.1, 0.5, 0.9);
run;

revealed that the relationship between salary and various player attributes was different at different levels of salary, e.g.:

Number of home runs was more important at high levels of salary.

but this should be viewed with caution because of high leverage points.

Page 123

Other models for continuous DV

TRANSREG

Introduction

Sometimes it makes sense to transform one or more variables.
You can do this in a DATA step, but
  PROC TRANSREG offers many options and allows automation of some tasks
  Some transformations (e.g. splines) are hard or impossible in a DATA step
TRANSREG is very flexible and allows optimal fitting.

Page 124

Other models for continuous DV

TRANSREG

Example

A spline regression of baseball salary

proc transreg data = sashelp.baseball plots = all;
   model identity(salary) = spline(YrMajor nAtBat
                                   nHits nHome nOuts);
run;

showed non-monotonic relationships between salary and performance

Page 125

Other models for continuous DV

TRANSREG

Page 126

The logistic family

Outline

11 Introduction

12 The OLS model

13 Other models for continuous DV

14 The logistic family

15 Count models

16 Multilevel models

17 Exercises and further reading

Page 127

The logistic family

Introduction

When the dependent variable is categorical (either dichotomous, nominal or ordinal) OLS regression is not recommended because

The assumption of normal residuals is violated
The predicted values can be ludicrous (e.g. predicted probabilities below 0 or above 1)

The usual method for these cases is logistic regression (either 'normal', multinomial or ordinal). The key output is odds ratio estimates.

Page 128

The logistic family

What are odds ratios?

In OLS regression the dependent variable is continuous. In logistic, it’s not. How do we go from a 0 - 1 response to a continuous one from −∞ to ∞?

Find odds of something happening for each level of each IV, e.g. odds of men and women voting for Obama. That goes from 0 to ∞.
Take ratio of the odds. That goes from 0 to ∞ as well.
Take log of the ratio for modeling. That goes from −∞ to ∞.
But the OR is easier to interpret.
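A worked example of these steps, using made-up numbers (not from any real poll): suppose 60 of 100 women and 50 of 100 men vote for the candidate.

```latex
\begin{aligned}
\text{odds}_{\text{women}} &= \frac{60/100}{40/100} = 1.5, \qquad
\text{odds}_{\text{men}} = \frac{50/100}{50/100} = 1.0 \\
\text{OR} &= \frac{\text{odds}_{\text{women}}}{\text{odds}_{\text{men}}} = 1.5 \\
\log(\text{OR}) &= \ln 1.5 \approx 0.405
\end{aligned}
```

The model works on the log scale (0.405), but "the odds for women are 1.5 times the odds for men" is the easier statement to interpret.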

Page 129

The logistic family

Logistic regression - examples

Predict or explain purchase of a product vs. no purchase - dichotomous
Predict or explain position on a team - multinomial
Predict or explain likelihood of returning - ordinal

Page 130

The logistic family

What can go wrong

Coding 0 and 1 incorrectly - be careful which response SAS is modelling
Effect coding. For categorical IVs, SAS defaults to effect coding, but reference coding is often better
Quasi-complete and complete separation - slicing the pie too thin
Concordant and discordant in output don’t mean what they seem to
Need to use SLICE to get interaction odds ratios
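The first two pitfalls can be avoided by stating the modelled response level and the coding explicitly. This is a minimal sketch, not part of the original slides; it assumes the sashelp.heart data set, and the predictors are an illustrative choice.

```sas
/* Binary logistic regression with an explicit event level
   and reference (rather than effect) coding */
proc logistic data = sashelp.heart;
   class sex (ref = "Female") / param = ref;
   model status (event = "Dead") = sex ageatstart cholesterol;
run;
```

The Response Profile table in the output confirms which level of the response is being modelled.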

Page 131

The logistic family

Ordinal and multinomial logistic example

When the DV has more than two categories, they can be ordinal or nominal. If ordinal, use PROC LOGISTIC with LINK = clogit. If nominal, use LINK = glogit. Interpretation can be tricky, but is basically a generalization of the dichotomous case.
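A minimal sketch of both cases, not part of the original slides. It assumes the sashelp.heart data set, treating BP_Status as an ordinal response and DeathCause as a nominal one; the predictors are illustrative only.

```sas
/* Ordinal response: cumulative logit model */
proc logistic data = sashelp.heart;
   class sex / param = ref;
   model bp_status = sex ageatstart / link = clogit;
run;

/* Nominal response: generalized logit model */
proc logistic data = sashelp.heart;
   class sex / param = ref;
   model deathcause = sex ageatstart / link = glogit;
run;
```

By default the response levels are ordered by their formatted values. For BP_Status (High, Normal, Optimal) alphabetical order happens to be a sensible ordering, but in general the Response Profile table should be checked before interpreting an ordinal model.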

Page 132

Count models

Outline

11 Introduction

12 The OLS model

13 Other models for continuous DV

14 The logistic family

15 Count models

16 Multilevel models

17 Exercises and further reading

Page 133

Count models

Introduction

When the DV is a count (a non-negative integer), and especially when the counts aren’t very large, OLS is not recommended. Count models such as Poisson or negative binomial regression should be used. PROC GENMOD is used for these analyses.

Page 134

Count models

Examples

How many cell phones does a person own?
How many divorces will a person go through?

Page 135

Count models

What can go wrong?

Overdispersion
Failure to fit
Abundance of 0’s - use ZIP (zero-inflated Poisson) or ZINB (zero-inflated negative binomial) models
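A minimal sketch of the three kinds of count model in PROC GENMOD, not part of the original slides. It assumes the sashelp.baseball data and treats home runs in 1986 (nHome) as the count outcome; the predictors, including the one in the zero-inflation model, are illustrative only.

```sas
/* Poisson regression */
proc genmod data = sashelp.baseball;
   model nHome = nAtBat YrMajor / dist = poisson link = log;
run;

/* Negative binomial regression, one response to overdispersion */
proc genmod data = sashelp.baseball;
   model nHome = nAtBat YrMajor / dist = negbin link = log;
run;

/* Zero-inflated Poisson, for an abundance of zeros */
proc genmod data = sashelp.baseball;
   model nHome = nAtBat YrMajor / dist = zip;
   zeromodel YrMajor / link = logit;
run;
```

In the Poisson output, values of Deviance/DF or Pearson Chi-Square/DF well above 1 in the goodness-of-fit table are a common sign of overdispersion.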

Page 136

Multilevel models

Outline

11 Introduction

12 The OLS model

13 Other models for continuous DV

14 The logistic family

15 Count models

16 Multilevel models

17 Exercises and further reading

Page 137

Multilevel models

Introduction

All the regression models above assume independent errors. When this is violated, things can go very wrong. Multilevel models (MLM) are one way to deal with this.

Page 138

Multilevel models

Examples

Repeated measurements of the same thing on the same people
Measurements on people who are clustered
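A minimal sketch of a multilevel model for clustered data, not part of the original slides. It assumes the sashelp.baseball data and treats players as clustered within teams, which is an illustrative assumption rather than a claim that this is the right model for these data.

```sas
/* Random-intercept model: players (level 1) within teams (level 2) */
proc mixed data = sashelp.baseball;
   class team;
   model logSalary = YrMajor nHits / solution;
   random intercept / subject = team;
run;
```

The RANDOM statement lets each team have its own intercept, so errors for players on the same team are no longer assumed independent.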

Page 139

Exercises and further reading

Outline

11 Introduction

12 The OLS model

13 Other models for continuous DV

14 The logistic family

15 Count models

16 Multilevel models

17 Exercises and further reading

Page 140

Exercises and further reading

Exercises

From your experience, list several regression problems and propose a regression method for each

Page 141

Exercises and further reading

Discussion

Page 142

Exercises and further reading

Further reading - blog links

Simple linear regression http://www.statisticalanalysisconsulting.com/what-is-simple-linear-regression/

Multiple linear regression http://www.statisticalanalysisconsulting.com/what-is-multiple-linear-regression/

Survival analysis http://www.statisticalanalysisconsulting.com/what-is-survival-analysis/

Alternative methods of regression when OLS is not right http://support.sas.com/resources/papers/proceedings15/3412-2015.pdf

Page 143

Exercises and further reading

Further reading - books

Regression Analysis by Example by Samprit Chatterjee and Ali Hadi
Regression Models for Categorical and Limited Dependent Variables by J. Scott Long
Categorical Data Analysis by Alan Agresti

Page 144

Part VI

Multivariate statistics

Page 145

Introduction

Sometimes there is no dependent variable, but you want to be able to figure out what is going on in a huge mass of data.

Page 146

Exploratory factor analysis

Introduction

Factor analysis is a method of finding latent factors in multivariate data. Latent variables are those that can’t be directly measured. Examples:

Personality scales
IQ
Views on complex issues

Page 147

Exploratory factor analysis

Steps involved

Extracting factors - several methods
Rotation - many methods, in two groups

Orthogonal - each factor is uncorrelated with others, easier to interpret but may not be realistic
Oblique - factors can be correlated

Interpretation - EFA is not determinate, much will depend on interpretation

Page 148

Exploratory factor analysis

Example

Factor analysis of current statistics showed 2 factors:

proc factor data = sashelp.baseball r = varimax;
   var nassts nAtBat--nBB nouts;
run;

Rotated Factor Pattern

                                  Factor1    Factor2
nAtBat   Times at Bat in 1986     0.88078    0.37098
nHits    Hits in 1986             0.87357    0.33843
nHome    Home Runs in 1986        0.81700   -0.19594
nRuns    Runs in 1986             0.91078    0.21618
nRBI     RBIs in 1986             0.92417    0.04853
nBB      Walks in 1986            0.74709    0.09339
nAssts   Assists in 1986          0.03736    0.92947
nOuts    Put Outs in 1986         0.45303   -0.03541
nError   Errors in 1986           0.10152    0.87866

Exploratory factor analysis

What can go wrong

GIGO can appear like GIPO - garbage in, pearls out
No simple structure
Unclear number of factors
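
When the number of factors is unclear, a scree plot of the eigenvalues is one common aid. The sketch below assumes ODS Graphics is available and uses the eigenvalue-greater-than-1 rule as the retention criterion; both are illustrative choices, and neither settles the question on its own.

```sas
/* Sketch only: a scree plot to help judge how many factors to keep. */
ods graphics on;

proc factor data = sashelp.baseball
            plots = scree    /* eigenvalues plotted against factor number */
            mineigen = 1;    /* retain factors with eigenvalue > 1        */
   var nAtBat nHits nHome nRuns nRBI nBB nAssts nOuts nError;
run;
```

Look for the point where the plot levels off, and compare solutions with one more and one fewer factor to see which is most interpretable.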

Principal component analysis (PCA)

Introduction

PCA is a dimension reduction method; use it when you have a large number of variables that you want to reduce with minimal loss of information.
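
A minimal sketch of PCA in SAS, using the same baseball variables as the factor analysis example. Keeping three components and the output data set name bb_pcs are assumptions made for illustration.

```sas
/* Sketch only: n = 3 and the data set name bb_pcs are illustrative. */
ods graphics on;

proc princomp data = sashelp.baseball
              out = bb_pcs     /* input data plus component scores Prin1-Prin3 */
              n = 3            /* keep the first three components              */
              plots = scree;   /* eigenvalues and proportion of variance       */
   var nAtBat nHits nHome nRuns nRBI nBB nAssts nOuts nError;
run;
```

By default PROC PRINCOMP works from the correlation matrix, so variables on very different scales (times at bat versus home runs) contribute on an equal footing.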

Principal component analysis (PCA)

What can go wrong

Components may not make sense
Components may not be useful for further analysis
If doing regression, consider partial least squares.
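
For the regression case, a partial least squares fit might be sketched as follows. Using logSalary from sashelp.baseball as the response and leave-one-out cross-validation to pick the number of factors are assumptions made for illustration only.

```sas
/* Sketch only: the response (logSalary) and cv = one are illustrative. */
proc pls data = sashelp.baseball
         method = pls
         cv = one;          /* leave-one-out cross-validation */
   model logSalary = nAtBat nHits nHome nRuns nRBI nBB
                     nAssts nOuts nError;
run;
```

Unlike PCA, PLS chooses its factors to explain the response as well as the predictors, which is why it is often more useful when the goal is prediction.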

Cluster analysis

Introduction

Cluster analysis is a set of methods for finding groups of observations that go together in ways you are not aware of at the start. Examples:

Do patrons of a store fall into groups of people who buy certain items?
Do politicians fall into groups based on their votes on bills?

Cluster analysis

Methods

Agglomerative methods - start with each item separate and gradually combine them using
    A measure of distance
    A measure of linkage
K-means methods - specify the number of clusters and a distance measure and let the algorithm do the work
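
A k-means analysis might be sketched with PROC FASTCLUS, which uses Euclidean distance by default. Standardizing the variables first, asking for three clusters, and the data set names bb_std and bb_kmeans are assumptions made for illustration.

```sas
/* Sketch only: 3 clusters and the data set names are illustrative. */

/* Put the variables on a common scale so that no single */
/* variable dominates the Euclidean distances            */
proc stdize data = sashelp.baseball out = bb_std method = std;
   var nAtBat nHits nHome nRuns nRBI nBB nAssts nOuts nError;
run;

/* K-means: specify the number of clusters and let the algorithm assign players */
proc fastclus data = bb_std maxclusters = 3 out = bb_kmeans;
   var nAtBat nHits nHome nRuns nRBI nBB nAssts nOuts nError;
run;
```

The OUT= data set contains a CLUSTER variable giving each observation's assignment. Results can depend on the starting seeds, so it is worth rerunning with different settings.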

Cluster analysis

Example

Cluster analysis of the same variables

/* Average-linkage clustering; CCC and PSEUDO request criteria for the number of clusters */
proc cluster data = sashelp.baseball method = average ccc pseudo print = 10
             outtree = bb4clust;
   var nAtBat -- nBB nAssts nOuts nError;
run;

Cluster analysis

Example - continued

showed evidence of 3 clusters:

[Figure: The CLUSTER Procedure, Average Linkage Cluster Analysis. Criteria for the Number of Clusters: CCC, Pseudo F, and Pseudo T-Squared, each plotted against Number of Clusters (2 to 10).]

Cluster analysis

Example - continued

with the following attributes:

[Figure: one panel per variable: Errors in 1986, Put Outs in 1986, Assists in 1986, Walks in 1986, RBIs in 1986, Runs in 1986, Home Runs in 1986, Hits in 1986, Times at Bat in 1986.]
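
One way to obtain cluster membership and summarize each variable by cluster is to cut the tree saved in bb4clust with PROC TREE. This is a sketch, not necessarily how the slide's figures were produced; the output data set name bb_clusters is an assumption.

```sas
/* Sketch only: bb_clusters is an illustrative data set name. */

/* Cut the tree from PROC CLUSTER at three clusters */
proc tree data = bb4clust nclusters = 3 out = bb_clusters noprint;
   copy nAtBat nHits nHome nRuns nRBI nBB nAssts nOuts nError;
run;

/* Compare the clusters on each variable */
proc means data = bb_clusters mean maxdec = 1;
   class cluster;
   var nAtBat nHits nHome nRuns nRBI nBB nAssts nOuts nError;
run;
```

PROC TREE writes a CLUSTER variable to the OUT= data set, and the COPY statement carries the analysis variables along so they can be summarized.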

Multidimensional scaling

Introduction

MDS is a method for figuring out how people are judging similarity, or what similarity is based on. There are many options and choices and (relatively) little literature.

Multidimensional scaling

Examples

How do people group politicians?
How do customers group brands of items?
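
The brand question might be sketched with PROC MDS as below. The dissimilarity ratings are invented purely for illustration (they are not real data), and the brand names A to E, the ordinal measurement level, and the two-dimensional solution are all assumptions.

```sas
/* Sketch only: HYPOTHETICAL, made-up dissimilarity ratings among  */
/* five brands (0 = identical, larger = less similar). Only the    */
/* lower triangle of the matrix is filled in.                      */
data brand_dissim;
   input brand $ A B C D E;
   datalines;
A 0 . . . .
B 2 0 . . .
C 6 5 0 . .
D 7 6 2 0 .
E 4 4 5 6 0
;

/* Nonmetric (ordinal) MDS in two dimensions */
proc mds data = brand_dissim level = ordinal dimension = 2
         out = brand_coords;
   id brand;
run;
```

The OUT= data set holds the estimated coordinates. Plotting them shows which brands respondents treat as similar, and interpreting the axes is where most of the judgment comes in.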

Multidimensional scaling

What can go wrong

Overfitting - use training and test sets
Results may not be useful - try different methods

Exercises and further reading

Exercises

Come up with an example of a multivariate method that would be useful in your research or business.

Exercises and further reading

Further reading

Using Multivariate Statistics by Barbara Tabachnick and Linda Fidell

Part VII

Summary and so on

General thoughts

Statistics and data analysis are not tools to be applied in a rote fashion.
Data analysis should illuminate a scientific or business phenomenon or attempt to solve a problem.
The time to consult with a data analyst is as early as possible and as often as possible.

Summary

Descriptive statistics are a vital first step in any analysis
Graphical methods are also vital
Inference allows you to go from a sample to a population, but can have problems
Regression relates a DV to one or more IVs
Multivariate statistics allow you to summarize large data sets

Contact information

Peter Flom
Peter Flom Consulting
www.StatisticalAnalysisConsulting.com
917 488 7176

Thank you!