distribution and outliers

Post on 09-Jan-2016

DESCRIPTION

Distribution and Outliers. Screening. (Significant Effects). Hadlum vs Hadlum. A univariate example that illustrates deviation from a normal pattern. Normal duration. Percentage (n=13634). Duration of Pregnancy. Barnett (1978) Appl. Statist. 27, 242-250.

TRANSCRIPT

Distribution and Outliers

Screening

(Significant Effects)

Hadlum vs Hadlum

A univariate example that illustrates deviation from a normal pattern.

Duration of Pregnancy

Barnett (1978) Appl. Statist. 27, 242-250

[Figure: Normal duration of pregnancy. Vertical axis: percentage (n = 13634)]

[Figure: Normal duration of pregnancy, percentage (n = 13634), with Hadlum Jr. marked]

Comparison of Hadlum Jr. to normal pattern

Model validation

Deviation (residual) = observed value (measurement) - predicted value (model)

residual = $y - \hat{y}$

Normally distributed population

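$$p(y) = \mathrm{const} \cdot \frac{1}{\sigma} \, e^{-\frac{(y-\mu)^2}{2\sigma^2}}$$

$$P(y_i) = \int_{-\infty}^{y_i} p(y)\,dy$$

[Figure label: P(y_i)]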
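The density and cumulative probability of a normal population can be evaluated numerically. The sketch below is only an illustration: the function names are made up here, the constant is taken as the standard 1/sqrt(2*pi), and the values mu = 0 and sigma = 1 are assumptions rather than numbers from the slides.

```python
import math


def normal_density(y, mu, sigma):
    """p(y) for a normal population with mean mu and standard deviation sigma."""
    const = 1.0 / math.sqrt(2.0 * math.pi)
    return const / sigma * math.exp(-((y - mu) ** 2) / (2.0 * sigma ** 2))


def cumulative_probability(y_i, mu, sigma):
    """P(y_i): the integral of p(y) from minus infinity up to y_i."""
    # The normal integral has no elementary closed form; math.erf provides it.
    return 0.5 * (1.0 + math.erf((y_i - mu) / (sigma * math.sqrt(2.0))))


# Illustrative values only (mu = 0 and sigma = 1 are assumptions).
print(normal_density(0.0, 0.0, 1.0))
print(100.0 * cumulative_probability(0.0, 0.0, 1.0))  # cumulative percent
```

Multiplying the cumulative probability by 100 gives the percent scale used on normal distribution paper.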

Normal Population - Cumulative plots

Traditional Graphical paper

Normal distribution paper

$$\%P(y_i) = 100 \cdot P(y_i)$$

Normal plot

1) Sort the observations in increasing order

2) Let each observation represent a percent interval of the normal distribution that equals

100 / (number of observations)

If the observations are normally distributed, they plot like a straight line in the normal plot!

Deviation from straight line implies outlying observations or non-normal distribution
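The two steps of the normal plot can be sketched as follows. This is a minimal illustration, not the presenter's own procedure: the function name and the sample values are hypothetical, and placing each observation at the midpoint of its 100/n percent interval is an assumed convention.

```python
from statistics import NormalDist


def normal_plot_points(observations):
    """Return (theoretical normal quantile, observation) pairs for a normal plot."""
    ordered = sorted(observations)  # step 1: increasing order
    n = len(ordered)
    points = []
    for rank, value in enumerate(ordered, start=1):
        # Step 2: each observation covers 100/n percent of the distribution.
        # Using the midpoint of that interval is an assumption of this sketch.
        percent = 100.0 * (rank - 0.5) / n
        quantile = NormalDist().inv_cdf(percent / 100.0)
        points.append((quantile, value))
    return points


sample = [4.1, 5.0, 3.2, 6.3, 4.8]  # hypothetical values, not from the slides
for quantile, value in normal_plot_points(sample):
    print(f"{quantile:6.2f}  {value}")
```

Plotting the observations against the quantiles gives the same picture as normal distribution paper: normally distributed data fall close to a straight line.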

Skull capacity of the Maoris

Skulls from a cemetery

1230 1380 1364 1630 1410
1348 1260 1420 1360
1540 1380 1445 1545
1318 1470 1410 1378

(maximum: 1630)

Karl Pearson (1931) Tables for Statisticians and Biometricians, Biometric Lab., London

Is the largest skull from a Maori?

Hypothesis:

The Maoris have a smaller skull capacity than the whites - the largest skull is a contaminant (a shipwrecked sailor or missionary?)

Probability plot

Skull Capacity
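One way to put a number on how far the largest skull sits from the others is to standardize it against the remaining sixteen values. This is an assumed, simple check added for illustration; it is not the probability plot itself, and the function name is hypothetical. The capacities are the seventeen values listed above.

```python
from statistics import mean, stdev

# Skull capacities as listed above (Pearson, 1931).
capacities = [1230, 1380, 1364, 1630, 1410, 1348, 1260, 1420, 1360,
              1540, 1380, 1445, 1545, 1318, 1470, 1410, 1378]


def deviation_of_maximum(values):
    """Distance of the largest value from the mean of the rest, in units of their standard deviation."""
    ordered = sorted(values)
    largest, rest = ordered[-1], ordered[:-1]
    return (largest - mean(rest)) / stdev(rest)


print(f"largest skull: {max(capacities)}")
print(f"standard deviations above the rest: {deviation_of_maximum(capacities):.2f}")
```

A large value only flags the point for closer inspection; whether it is a contaminant is a question about the data's origin, not something the number settles.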

What to do with the damned point destroying the curve?

The easy way: Erase it!

Example

P. Garrigues, R. De Sury, M. L. Angelin, J. Bellocq, J. L. Oudin, M. Ewald

Geochimica et Cosmochimica Acta, 52, (1988) 375-384

Data

Sample No.   Predictor (Depth)   Response (MP/P)
2            1290                0.92
3            1590                1.16
4            1770                1.30
5            1920                2.09
6            2250                1.80
7            2480                1.94
8            2670                1.50
9            2805                2.38
10           3015                2.61
11           3180                2.57
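A straight-line fit to the ten samples listed above shows which observations deviate most from the line. The sketch below uses ordinary least squares as an assumed choice of model. The names `fit_line`, `suspects`, and `r_without` are introduced here for illustration, and refitting without the two largest residuals is shown only for comparison, not as a recommendation.

```python
from statistics import mean

# (sample number, depth, response) as listed in the data table.
data = [(2, 1290, 0.92), (3, 1590, 1.16), (4, 1770, 1.30), (5, 1920, 2.09),
        (6, 2250, 1.80), (7, 2480, 1.94), (8, 2670, 1.50), (9, 2805, 2.38),
        (10, 3015, 2.61), (11, 3180, 2.57)]


def fit_line(points):
    """Ordinary least squares: return slope, intercept and correlation coefficient r."""
    xs = [x for _, x, _ in points]
    ys = [y for _, _, y in points]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar, sxy / (sxx * syy) ** 0.5


slope, intercept, r = fit_line(data)

# Deviation = observed value - predicted value, keyed by sample number.
residuals = {no: y - (intercept + slope * x) for no, x, y in data}
suspects = sorted(residuals, key=lambda no: abs(residuals[no]), reverse=True)[:2]

# Refit without the two largest deviations, for comparison only.
_, _, r_without = fit_line([p for p in data if p[0] not in suspects])

print(f"r (all samples) = {r:.3f}")
print(f"largest deviations: samples {sorted(suspects)}")
print(f"r (without them) = {r_without:.3f}")
```

The improvement in r after dropping points says nothing about why those points deviate; that still has to be explained from the samples themselves.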

r² = 0.98

[Figure: two data points marked with "?"]

Robust regression?

Two outliers

Useful tool to avoid thinking?

Sloppy data analyst can find relief in robust regression
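For readers who want to see what a robust fit does with these data, the sketch below uses the Theil-Sen estimator (the median of all pairwise slopes). This estimator is chosen here purely as an example; the slides do not say which robust method was applied, and the function name is hypothetical.

```python
from itertools import combinations
from statistics import median

# Depth and response for samples 2 to 11, as listed in the data table.
depth = [1290, 1590, 1770, 1920, 2250, 2480, 2670, 2805, 3015, 3180]
response = [0.92, 1.16, 1.30, 2.09, 1.80, 1.94, 1.50, 2.38, 2.61, 2.57]


def theil_sen(xs, ys):
    """Theil-Sen line: median pairwise slope and median intercept."""
    slopes = [(y2 - y1) / (x2 - x1)
              for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2)
              if x2 != x1]
    slope = median(slopes)
    intercept = median(y - slope * x for x, y in zip(xs, ys))
    return slope, intercept


slope, intercept = theil_sen(depth, response)
print(f"robust slope = {slope:.6f}, intercept = {intercept:.3f}")
```

The fit downweights the deviating samples automatically, which is exactly why it can hide the question of what caused them.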

Result of “pooled” regression

r=0.995

Observation

r=0.865

Two phenomena influencing the ratio MP/P (predictor)

No prediction possible!

Parallel displacement - perfect result for the one who wants to be “straight-lined”

Let the computer restore harmony and beauty
