problem: diagnosing spina bifida

46
Problem: Diagnosing Spina Bifida The procedure of amniocentesis involves drawing a sample of the amniotic fluid that surrounds an unborn child in its mother’s womb. High concentration of alpha fetoprotein can indicate the condition spina bifida. Concentration of alpha fetoprotein tends to increase with the size of the foetus. Amniocentesis results in miscarriage for 1%. Preliminary tests involve measuring the level of alpha fetoprotein in the mother’s urine.

Upload: matana

Post on 18-Jan-2016

31 views

Category:

Documents


2 download

DESCRIPTION

Problem: Diagnosing Spina Bifida. The procedure of amniocentesis involves drawing a sample of the amniotic fluid that surrounds an unborn child in its mother’s womb. High concentration of alpha fetoprotein can indicate the condition spina bifida. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Problem: Diagnosing Spina Bifida

Problem: Diagnosing Spina BifidaThe procedure of amniocentesis involves drawing a sample of the amniotic fluid that surrounds an unborn child in its mother’s womb. High concentration of alpha fetoprotein can indicate the condition spina bifida.Concentration of alpha fetoprotein tends to increase with the size of the foetus. Amniocentesis results in miscarriage for 1%.

Preliminary tests involve measuring the level of alpha fetoprotein in the mother’s urine.

Page 2: Problem: Diagnosing Spina Bifida

Problem: Diagnosing Spina BifidaFor mothers with normal foetuses, the mean level of alpha fetoprotein is 15.73 moles/litre with a standard deviation of 0.72 moles/litre.

For mothers carrying foetuses with spina bifida, the mean is 23.05 and the standard deviation is 4.08.

In both groups the distribution of alpha fetoprotein appears to be approximately Normally distributed.

23.05 15.73

Page 3: Problem: Diagnosing Spina Bifida

Problem: Diagnosing Spina Bifida

To operate a diagnostic test for spina bifida, set a threshold concentration of alpha fetoprotein, T, say.

If the alpha fetoprotein level is below T, then the foetus is diagnosed as not having spina bifida.

If the level is above T, then further testing is required.

Page 4: Problem: Diagnosing Spina Bifida

Problem: Diagnosing Spina Bifida

If T was set at 17.80 moles/litre:

What is the probability that a foetus with spina bifida is correctly diagnosed?

What is the probability that a foetus not suffering from spina bifida is correctly diagnosed?

If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set T ? What are the implications of setting T at this level?

Page 5: Problem: Diagnosing Spina Bifida

Continuous Random Variables

If a random variable, X, can take any value in some interval of the real line it is called a continuous random variable.E.g.

Hg levels, height, weight, alpha fetoprotein concentration, cell radius, etc. (i.e. usually quantities that are ‘measurements’ )

Page 6: Problem: Diagnosing Spina Bifida

The Standardized Histogram

Example: Dietary Carbohydrate in the Workforce

The average daily intake of carbohydrate in the diet of 5929 people.

Page 7: Problem: Diagnosing Spina Bifida

200 400 600 800Carbohydrate (g/day)

0

.002

.000

.004

The Standardized Histogram(a) The histogram of the data shows the carbohydrate intake:

* Is unimodal (modal class 200 – 225 g/day)

Page 8: Problem: Diagnosing Spina Bifida

200 400 600 800Carbohydrate (g/day)

0

.002

.000

.004

The Standardized Histogram(a) The histogram of the data shows the carbohydrate intake:

* Skewed to larger values (skewed right)

Page 9: Problem: Diagnosing Spina Bifida

200 400 600 800Carbohydrate (g/day)

0

.002

.000

.004

The Standardized Histogram(a) The histogram of the data shows the carbohydrate intake:

* Has huge variability (highest consumers more than 10 times that of lowest consumers)

Page 10: Problem: Diagnosing Spina Bifida

(b) Area between a = 225 and b = 375 shaded

0 6 0 0 8 0 02 25

.002

.000

.004

375

Shaded area = 0.483(Corresponds to 48.3% of observations)

The Standardized Histogram

Page 11: Problem: Diagnosing Spina Bifida

(b) The standardized histogram adjusts the height of the rectangle or bar to relative freq. or proportion divided by interval width so that

Area = Estimated Probability

0 600 800225

.002

.000

.004

375

Shaded area = 0.483(Corresponds to 48.3% of observations)

The Standardized HistogramR

el. F

req

/Wid

th

Page 12: Problem: Diagnosing Spina Bifida

(b) i.e. The area of the ith rectangle tells us what proportion of the data lie in the ith class interval.

0 600 800225

.002

.000

.004

375

Shaded area = 0.483(Corresponds to 48.3% of observations)

The Standardized HistogramR

el. F

req

/Wid

th

Page 13: Problem: Diagnosing Spina Bifida

For a standardized histogram:

The vertical scale is :

Relative frequency / interval width

(density scale)

Total area under the histogram = 1

The proportion of the data between a and b is the area “under” the histogram between a and b.

The Standardized Histogram

Page 14: Problem: Diagnosing Spina Bifida

(c) With approximating curve

Carbohydrate (g/day)

2 0 0 4 0 0 6 0 0 8 0 0

. 004

. 002

0

The Standardized Histogram

You can do this with Fit Distribution > Smooth Curve in JMP.

Page 15: Problem: Diagnosing Spina Bifida

(d) Area between a = 225 and b = 375 shaded

Shaded area = .486 (cf. area = .483 for histogram)

.04

0 600 800225 375

.002

This area is calculated to be 0.486 and is very close to the proportion of people who had carbohydrate intake of between 225 and 375 g/day.

The Standardized Histogram

Page 16: Problem: Diagnosing Spina Bifida

Radius of Maliginant Tumor Cells

In JMP select Histogram Options > Density Axis to create a standardized histogram

The histogram on the left is for cell radii of malignant tumor fine needle aspirations in the breast cancer study from your 2nd assignment. X = radius of a randomly selected malignant tumor cell

We estimate that, P(14 < X < 15) = .10 or a 10% chance

Page 17: Problem: Diagnosing Spina Bifida

AFP Levels in Spina Bifida Cases

In JMP select Histogram Options > Density Axis to create a standardized histogram

The histogram on the left is AFP levels found in the urine of mothers carrying a fetus with spina bifida. X = AFP level of random select mother carrying fetus with spina bifida.

We estimate that, P(22.5 < X < 25) = 2.5 X .10 = .25 or a 25% chance

Page 18: Problem: Diagnosing Spina Bifida

Smooth Density Curves

Take a standardized histogram, decrease the width of the class intervals and increase the number of observations. Then the top of the histogram tends to a smooth curve.

Page 19: Problem: Diagnosing Spina Bifida

Histogram Smooth Density Curves as sample size increases! (AFP Levels)

0.03

0.05

0.08

0.10

Den

sity

10 20 30 40

n = 100 n = 500

n = 10,000 n = 100,000

n = 1,000,000

Page 20: Problem: Diagnosing Spina Bifida

Properties of the Probability Density Function (p.d.f.)

1. f(x) 0 (i.e. the p.d.f. curve stays above the x-axis)

2. P(a X b) = area from a to b beneath the p.d.f

curve

3. Area under the p.d.f. curve = 1

Page 21: Problem: Diagnosing Spina Bifida

Endpoints of Intervals

For a continuous random variable, X, endpoints of intervals are unimportant.

P(a X b) = P(a < X b)

= P(a X < b)

= P(a < X < b)

= area from a to b between the

p.d.f. curve and the x-axis.

(Inclusion or exclusion of the endpoints will not change the area.)

Page 22: Problem: Diagnosing Spina Bifida

Limiting smooth bell shaped symmetric curve is called the Normal p.d.f. curve.

Is symmetric about the mean.

Mean = Median

If a random variable, X, has a Normal distribution with a mean and a standard deviation we write:

X ~ Normal ( , )

The Normal Distribution

50% 50%

Mean

parameters

Page 23: Problem: Diagnosing Spina Bifida

The Normal Distribution

• The Normal distribution is important because: – it fits a lot of data reasonably well; – it can be used to approximate other distributions;

e.g. the binomial distribution– it is important assumption made about the

distributional shape of continuous variables we are working with when using parametric tools for statistical inference (e.g. t-Tests & F-Tests).

Page 24: Problem: Diagnosing Spina Bifida

The Normal Distribution• A Normal distribution is solely determined by and .

(a) Changing Shifts the curve along the axis

Page 25: Problem: Diagnosing Spina Bifida

(b) Increasing Increases the spread and flattens the curve

A Normal distribution is solely determined by and .

The Normal Distribution

Page 26: Problem: Diagnosing Spina Bifida

Spina Bifida Example

Let X be the AFP level found in the urine of mother carrying a foetus with spina bifida.

We will assume that the AFP level is normally distributed with a mean of 23.05 moles/L and a standard deviation of 4.08 moles/L .

AFP Levels for Mothers Carrying Spina Bifida Foetus

Page 27: Problem: Diagnosing Spina Bifida

Spina Bifida Example (Empirical Rule)Approximately 68 % of mothers in this population will have a AFP levels within 1 standard deviation of the mean.

i.e., approximately 68 % of mothers in this population will have AFP levels

between 23.05 – 4.08 and 23.05 + 4.08

= between 18.97 and 27.13

Page 28: Problem: Diagnosing Spina Bifida

Spina Bifida Example (Empirical Rule)Approximately 95 % of mothers in this population will have a AFP levels within 2 standard deviation of the mean.

i.e., approximately 95 % of mothers in this population will have AFP levels

between 23.05 - 2 4.08 and 23.05 + 2 4.08

= between 14.89 and 31.21

Page 29: Problem: Diagnosing Spina Bifida

Spina Bifida Example (Empirical Rule)Approximately 99.73 % of mothers in this population will have a AFP levels within 3 standard deviation of the mean.

i.e., approximately 99.73% of mothers in this population will have AFP levels

between 23.05 - 3 4.08 and 23.05 - 3 4.08

= between 10.81 and 35.29

Page 30: Problem: Diagnosing Spina Bifida

The Normal Distribution

For the Normal Distribution:

A random observation has approximately:– 68% chance of falling within of ; – 95% chance of falling within of ; – 99.7% chance of falling within of .

Or:

In a Normal distribution, approximately:– 68% of observations are within of ; – 95% of observations are within of ; – 99.7% of observations are within of .

Page 31: Problem: Diagnosing Spina Bifida

The Normal Distribution

Probabilities and numbers of standard deviations

Shaded area = 0.683

68% chance of falling between - and +

- +

Shaded area = 0.997

99.7% chance of falling between - 3 and + 3

- 3 + 3

Shaded area = 0.954

95% chance of falling between - 2 and + 2

- 2 + 2

Page 32: Problem: Diagnosing Spina Bifida

Problem: Diagnosing Spina BifidaFor mothers with normal foetuses, the mean level of alpha fetoprotein is 15.73 moles/litre with a standard deviation of 0.72 moles/litre.

For mothers carrying foetuses with spina bifida, the mean is 23.05 and the standard deviation is 4.08.

In both groups the distribution of alpha fetoprotein appears to be approximately Normally distributed.

23.05 15.73

Given this information we want to be able to find probabilities associated with these distributions.

For example we might like to find: P(X > 17.8) or

P(19 < X < 25) etc…

for either group.

Page 33: Problem: Diagnosing Spina Bifida

OR

x

Obtaining Probabilities

Normal distribution probabilities can be obtained from all statistical packages by giving the mean and standard deviation of the distribution.– Most tables in books give the value of P(X x).

– i.e., cumulative or lower tail probabilities.

Area = P(X x)

Page 34: Problem: Diagnosing Spina Bifida

Problem: There are infinitely many normal distributions we might be interested in

• Because there are infinitely many values for and that we might be interested in we would need a book infinitely thick to contain all the possible tables we might need.

• Even for our motivating example we would need to two separate tables to find probabilities for mothers with healthy fetuses and another for mothers carrying fetus with spina bifida.

• We need to have “standard” normal table that allows us to easily find probabilities for all situations.

Page 35: Problem: Diagnosing Spina Bifida

Standardization and the Standard Normal Distribution

If X ~ N() then if we define a new random variable Z as,

Z = _______ ~

We say Z has a standard normal distribution.

This process is called standardization.

X -

N(0,1), i.e. a normal distribution with mean 0 and standard deviation 1.

Page 36: Problem: Diagnosing Spina Bifida

Obtaining Probabilities

Basic method for obtaining probabilities1. Sketch a Normal curve, marking on the mean

and values of interest.

2. Shade the area under the curve corresponding to the required probability.

3. Convert all values to their z-scores

4. Obtain the desired probability using a standard normal table or better yet use JMP.

Page 37: Problem: Diagnosing Spina Bifida

Standard Normal Distribution• a) P(Z > 2.25)

• b) P(Z < 1.28)

0

0

Page 38: Problem: Diagnosing Spina Bifida

Standard Normal Distribution• c) P(Z > .50)

• d) P(Z <-2.33)

0

0

Page 39: Problem: Diagnosing Spina Bifida

Standard Normal Distributione) P(-1.96 < Z < 1.96)

0

0

0

Page 40: Problem: Diagnosing Spina Bifida

Standard Normal Distribution

• f) Find z so that P(Z < z) = .95, i.e. what is the 95th percentile of the standard normal distribution?

0

Page 41: Problem: Diagnosing Spina Bifida

Original problem: Diagnosing Spina Bifida

Recall:

• For normal foetuses =15.73, = 0.72 and for foetuses with spina bifida = 23.05 and = 4.08.

• Assume the threshold for detecting spina bifida is set at 17.8. – (A foetus would be diagnosed as not having

spina bifida if the fetoprotein level is below 17.8)

23.05 15.73

Page 42: Problem: Diagnosing Spina Bifida

15.73 17.8

a) What is the probability that a foetus not suffering from spina bifida is correctly diagnosed?Let X be level of fetoprotein in normal foetus

X ~ Normal (15.73, 0.72)

Original problem: Diagnosing Spina Bifida 23.05 15.73

P(X < 17.8) = P(Z < z-score for 17.8)z-score = (17.8 – 15.73)/.72 = 2.07/.72 = 2.88

What is P(X < 17.8)?

P(X < 17.8) = P(Z < 2.88)

= .99800 2.88

Therefore 99.80% of healthy foetuses will be diagnosed correctly. This is called the specificity of the test, i.e. P(T-|D-).

Page 43: Problem: Diagnosing Spina Bifida

P(Z>-1.29) = 1-P(Z < -1.29) = 1 – 0.099

= 0.901

23.0517.8

b) What is the probability that a foetus with spina bifida is correctly diagnosed?

Let Y be the level of fetoprotein in a spina bifida foetus. Y ~ Normal (23.05, 4.08)P(Y > 17.8) = P(Z > z-score for Y = 17.8)

Original problem: Diagnosing Spina Bifida 23.05 15.73

z-score = (17.8 – 23.05)/4.08

= -5.25/4.08 = -1.29

-1.29 0

Therefore the probability that a foetus with spina bifida is correctly diagnosed is .901 or 90.1% of foetuses with spina bifida will be detected. This is called the sensitivity of the test, i.e. P(T+|D+).

Page 44: Problem: Diagnosing Spina Bifida

Original problem: Diagnosing Spina Bifida

If they wanted to ensure that 99% of foetuses with spina bifida were correctly diagnosed, at what level should they set T ?

23.05 15.73

Find a value T so that if

X ~ Normal (23.05, 4.08)

we will have

P(X > T) = .9900 or P(X < T) = .0100

First find the z-score associated with T by finding z so that

P(Z < z) = .0100

From Normal Table we find

P(Z < -2.33) = .0100 thus

T = x z = 23.05 – 4.08 x 2.33 = 13.54

T = 13.54 ensures 99% of foetuses with spina bifida will be identified. Again this probability is called the sensitivity.

Page 45: Problem: Diagnosing Spina Bifida

Standard Normal Probabilities in JMP• Normal Probability Calculator in JMP from Tutorials

section of course website.

• Here it is ready to calculate probabilities for the standard normal distribution. (= 0, = 1)

Page 46: Problem: Diagnosing Spina Bifida

Arbitrary Normal Probabilities in JMP• Change the mean and standard deviation columns to

contain the desired values. For mothers carrying foetus with spina bifida: X ~ N(23.05,4.08), i.e.

moles/liter & moles/liter

Here we have found P(X < 17.8) and P(X > 17.8)