P-values

Arthur Berg, Pennsylvania State University

Practice Binomial Problems p-values

Consider the data: 1, 2, 3, 4, 5. Calculate the following quantities (feel free to use the computer for all of them):

- mean
- median
- geometric mean
- variance
- standard deviation
- interquartile range
- median absolute deviation
- 95% confidence interval of the mean (try this with and without the computer)
- 90% confidence interval of the mean
- 99.9% confidence interval of the mean
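One way to work through the list above in base R is sketched below. The names `geo_mean` and `ci` are illustrative helpers, not part of the original slides, and `mad()` is called with `constant = 1` so that it returns the raw median absolute deviation (R's default rescales it by 1.4826).

```r
x <- c(1, 2, 3, 4, 5)

mean(x)                          # arithmetic mean
median(x)
geo_mean <- exp(mean(log(x)))    # geometric mean (illustrative helper name)
geo_mean
var(x)                           # sample variance (divides by n - 1)
sd(x)
IQR(x)                           # uses R's default quantile definition (type 7)
mad(x, constant = 1)             # raw median absolute deviation

# t-based confidence interval for the mean, computed "without the computer" style
ci <- function(x, level = 0.95) {
  n <- length(x)
  half <- qt(1 - (1 - level) / 2, df = n - 1) * sd(x) / sqrt(n)
  c(lower = mean(x) - half, upper = mean(x) + half)
}

ci(x, 0.95)
ci(x, 0.90)
ci(x, 0.999)
```

The hand-rolled `ci()` should agree with `t.test(x, conf.level = ...)$conf.int`, which is a quick way to check the manual calculation.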


Example (coin flipping)

- Suppose you flip a coin 100 times and get 50 heads and 50 tails. How likely is that?
- Suppose you flip a coin 100 times and get 70 heads and 30 tails. How unlikely is that?

Example (dice rolling)

Suppose you roll a die 10 times and get five 6’s and five numbers less than 6.

- What are the chances of that?
- What are the chances of observing five or more 6’s?

Example (complications)

A certain surgical procedure has an average complication rate of 20%. Looking back at Dr. X’s cases, only a 10% complication rate was observed in the 60 patients he treated over the past year. Does this represent a significantly lower complication rate?


Combination

- How many ways are there of selecting 4 socks from a drawer filled with 10 socks?
- How many different 3-flavor ice cream cones are there from a Baskin-Robbins ice cream store offering 31 flavors (assuming the order of the scoops does not matter)?
- How many ways are there to select two mice out of a group of five?

Mice: 1, 2, 3, 4, 5

Pairs: 12, 13, 14, 15, 23, 24, 25, 34, 35, 45

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$


> choose(10, 4)

[1] 210

> choose(31, 3)

[1] 4495

> choose(5, 2)

[1] 10
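As a cross-check on the counts above, the pairs of mice can be enumerated explicitly and `choose()` compared with the factorial formula. This is a small sketch; `mouse_pairs` and `pair_labels` are illustrative names.

```r
# Enumerate every unordered pair of mice labelled 1..5
mouse_pairs <- combn(5, 2)                      # 2 x 10 matrix, one pair per column
pair_labels <- apply(mouse_pairs, 2, paste, collapse = "")
pair_labels
ncol(mouse_pairs) == choose(5, 2)

# choose() agrees with n! / (k! (n - k)!)
n <- 10
k <- 4
by_formula <- factorial(n) / (factorial(k) * factorial(n - k))
by_formula
```

For large `n`, `choose()` is preferable to the factorial formula, since `factorial()` overflows to `Inf` long before the binomial coefficient itself does.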


Example (genetic markers)

Suppose five independent genetic markers (not in LD) have been shown to be linked with a certain disease, where the prevalence of a risk genotype in the general population is 10% at each marker. An individual tests negative for the first marker but positive for the other four markers. What is the probability of this event?

$$P(-\,+\,+\,+\,+) = 0.9 \times 0.1 \times 0.1 \times 0.1 \times 0.1$$

What is the probability that a given individual tests positive for four of the five tests (but not necessarily the last four tests)?

$$P(\text{four } +\text{'s}) = 5 \times (0.9)^{1} (0.1)^{4}$$
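The two probabilities above can be checked numerically. The sketch below uses illustrative variable names (`p_pattern`, `p_four`) and compares the second result with the binomial probability mass function.

```r
p <- 0.1                        # prevalence of the risk genotype at each marker

# One specific pattern: negative at marker 1, positive at markers 2 to 5
p_pattern <- (1 - p) * p^4

# Exactly four positives in any position: 5 equally likely arrangements
p_four <- choose(5, 4) * (1 - p)^1 * p^4

p_pattern
p_four
dbinom(4, size = 5, prob = p)   # same value from the binomial pmf
```

The factor of 5 is `choose(5, 4)`: the number of ways to decide which single marker is the negative one.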


> plot(dbinom(0:100, 100, 0.5), type = "h", xlab = "", ylab = "")

[Figure: vertical-line plot of the Binomial(100, 0.5) probabilities; x-axis ticks from 0 to 100, y-axis ticks from 0.00 to 0.08]


Example (coin flipping)

- Suppose you flip a coin 100 times and get 50 heads and 50 tails. How likely is that?
- Suppose you flip a coin 100 times and get 70 heads and 30 tails. How unlikely is that?

> choose(100, 50) * (0.5)^50 * (0.5)^50

[1] 0.07958924

> dbinom(50, 100, 0.5)

[1] 0.07958924

> choose(100, 70) * (0.5)^30 * (0.5)^70

[1] 2.317069e-05

> dbinom(70, 100, 0.5)

[1] 2.317069e-05
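The point probabilities above say how likely exactly 70 heads is. "How unlikely" is more often answered with a tail probability, the chance of a result at least as extreme, which is the idea behind a p-value. A sketch with illustrative variable names:

```r
# P(X >= 70) for X ~ Binomial(100, 0.5): 70 heads or more
upper_tail <- sum(dbinom(70:100, 100, 0.5))
upper_tail_alt <- pbinom(69, 100, 0.5, lower.tail = FALSE)   # P(X > 69)

# Two-sided version: at least as far from 50 in either direction
two_sided <- sum(dbinom(c(0:30, 70:100), 100, 0.5))

c(upper_tail = upper_tail, upper_tail_alt = upper_tail_alt, two_sided = two_sided)

# binom.test() reports the exact two-sided p-value for the same data
binom.test(70, 100, p = 0.5)$p.value
```

Because the fair-coin distribution is symmetric around 50, the two-sided probability is simply twice the upper tail.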


Example (dice rolling)

Suppose you roll a die 10 times and get five 6’s and five numbers less than 6.

- What are the chances of that?
- What are the chances of observing five or more 6’s?

> choose(10, 5) * (1/6)^5 * (5/6)^5

[1] 0.01302381

> dbinom(5, 10, 1/6)

[1] 0.01302381

> round(dbinom(5:10, 10, 1/6), 4)

[1] 0.0130 0.0022 0.0002 0.0000 0.0000 0.0000

> sum(dbinom(5:10, 10, 1/6))

[1] 0.01546197
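The exact tail probability above can also be approximated by simulation, which is a useful sanity check when the exact formula is less obvious. A minimal Monte Carlo sketch; the seed and the number of simulations are arbitrary choices:

```r
set.seed(2018)
n_sims <- 100000

# Each draw is the number of 6's in one run of 10 rolls of a fair die
sixes <- rbinom(n_sims, size = 10, prob = 1/6)

est <- mean(sixes >= 5)                    # simulated P(five or more 6's)
exact <- sum(dbinom(5:10, 10, 1/6))        # exact value from the binomial pmf

c(simulated = est, exact = exact)
```

The simulated proportion should land close to the exact value, with Monte Carlo error shrinking as `n_sims` grows.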


Example (complications)

A certain surgical procedure has an average complication rate of 20%. Looking back at Dr. X’s cases, only a 10% complication rate was observed in the 60 patients he treated over the past year. Does this represent a significantly lower complication rate?

> binom.test(6, 60, p = 0.2)

Exact binomial test

data: 6 and 60

number of successes = 6, number of trials = 60, p-value = 0.05291

alternative hypothesis: true probability of success is not equal to 0.2

95 percent confidence interval:

0.03759127 0.20505774

sample estimates:

probability of success

0.1
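The output above uses the default two-sided alternative, whereas the question asks about a lower rate. A sketch of the one-sided version follows; treating the question as one-sided is an assumption about its intent, not something shown in the original slides.

```r
# One-sided exact test: is the true complication rate below 0.2?
res <- binom.test(6, 60, p = 0.2, alternative = "less")
res$p.value

# The same number directly from the binomial CDF:
# P(X <= 6) when X ~ Binomial(60, 0.2)
pbinom(6, 60, 0.2)
```

Whether a one-sided test is appropriate depends on whether the direction was specified before looking at Dr. X’s data.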


Temperature Data

A Critical Appraisal of 98.6 Degrees F, the Upper Limit of the Normal Body Temperature, and Other Legacies of Carl Reinhold August Wunderlich (1992) Journal of the American Medical Association

What’s Normal? – Temperature, Gender, and Heart Rate (1996) Journal of Statistics Education

temperature dataset (n = 130)

- Body temperature (degrees Fahrenheit)
- Gender
- Heart rate (beats per minute)


- Is the distribution of body temperatures normal?
- Is the true population mean really 98.6 degrees F?
- At what temperature should we consider someone’s temperature to be “abnormal”?
- Is there a significant difference between males and females in normal temperature?
- Is there a relationship between body temperature and heart rate?
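The last three questions map onto standard R calls. The sketch below runs them on simulated stand-in data, because the real file is not reproduced here: the column names `gender` and `hr`, and every number used to generate the data, are assumptions for illustration only.

```r
set.seed(7)

# Simulated stand-in data; column names and parameters are hypothetical
sim <- data.frame(
  gender = rep(c("F", "M"), each = 65),
  temp   = c(rnorm(65, mean = 98.4, sd = 0.7), rnorm(65, mean = 98.1, sd = 0.7))
)
sim$hr <- 74 + 2.4 * (sim$temp - 98.2) + rnorm(130, sd = 7)

# Difference between the sexes: Welch two-sample t-test (R's default)
sex_test <- t.test(temp ~ gender, data = sim)
sex_test

# Relationship between temperature and heart rate: Pearson correlation test
hr_test <- cor.test(sim$temp, sim$hr)
hr_test

# One possible definition of "abnormal": outside the central 95% of readings
cutoffs <- quantile(sim$temp, c(0.025, 0.975))
cutoffs
```

Other choices are equally defensible, for example a regression of heart rate on temperature instead of a correlation test, or a different percentile range for the cutoffs.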


to 41°C (91.5°F to 105.8°F).10 Subjects were instructed not to eat, drink, or smoke for 15 minutes prior to each temperature measurement.

Data Analysis. Data were analyzed using the SAS-PC program on an IBM PS/2 model 80 computer. Because initial descriptive analyses suggested neither strong kurtosis nor skewness for the 700 temperatures, no data transformation was applied.

We used t tests to compare mean temperatures between groups (ie, smoking, sex, race). To examine the simultaneous effects of several demographic factors on temperature, we ran a general linear model in which the dependent variable was temperature; the independent factors included in the model were age, race, smoking, all two- and three-way interactions among the previous three factors, day within study, and time of day. In SAS notation, the model was defined as:

Temperature = Sex Race Smoking Age Time Day
              Sex × Race  Sex × Smoking  Race × Smoking

Comparisons of variance in temperature between days were made using F tests. Linear regression analysis was used to study the effect of baseline temperature on pulse and the effect of age on baseline temperature. Analyses of oral temperature used individual temperature readings as variates; analyses of diurnal temperature oscillations used patient-days as variates.

Results

The 700 temperature recordings from the 148 subjects had a range of 35.6°C (96.0°F) to 38.2°C (100.8°F), overall mean of 36.8°C±0.4°C (98.2°F±0.7°F), median of 36.8°C (98.2°F), and mode of 36.7°C (98.0°F); 37°C (98.6°F) accounted for only 56 (8.0%) of the 700 oral temperature observations recorded (Fig 1). The mean temperature varied diurnally, with a 6 AM nadir and a 4 to 6 PM zenith (Fig 2). The maximum temperature (as reflected by the 99th percentile) varied from a low of 37.2°C (98.9°F) at 6 AM to a high of 37.7°C (99.9°F) at 4 PM. Comparison of initial temperature recordings obtained on admission to the research ward with ones obtained the same hour the day after admission revealed no significant difference in variability (F tests for individual studies, P>.12). Age did not significantly influence temperature within the age range 18 through 40 years (linear regression, P=.99).

[Fig 1. Frequency distribution of 700 baseline oral temperatures obtained during two consecutive days of observation in 148 healthy young male and female volunteers. Arrow indicates location of 37.0°C (98.6°F). x-axis: Temperature, °C (°F), from 35.6 (96) to 37.8 (100); y-axis ticks from 0.025 to 0.125; legend: Female, Male.]

[Fig 2. Mean (solid squares) oral temperatures and temperature ranges according to time of day. The four temperatures shown at each sample time are the 99th percentile (top), 95th percentile (second from top), mean (second from bottom), and 5th percentile (bottom) for each sample set. x-axis: Time of Day; y-axis from 35.0 (95) to 38.3 (101) °C (°F). Values printed in the figure, in °C (°F):]

Time of Day       6 AM         8 AM         12 pm        4 PM         6 PM         12 am
n                 19           144          41           157          57           282
99th percentile   37.1 (98.9)  37.5 (99.5)  37.6 (99.6)  37.7 (99.9)  37.7 (99.8)  37.6 (99.6)
95th percentile   37.1 (98.9)  37.2 (99.0)  37.4 (99.3)  37.4 (99.4)  37.6 (99.6)  37.3 (99.2)
Mean              36.4 (97.6)  36.5 (97.7)  36.8 (98.2)  36.9 (98.4)  36.9 (98.5)  36.8 (98.2)
5th percentile    36.0 (96.8)  35.7 (96.3)  36.1 (97.0)  36.3 (97.3)  36.3 (97.4)  36.1 (97.0)

Women had a slightly higher average oral temperature than men (36.9°C [98.4°F] vs 36.7°C [98.1°F], t test, P<.001, df=698) but did not exhibit a greater mean diurnal temperature oscillation than male counterparts (0.56°C [1.00°F] vs 0.54°C [0.97°F]). Black subjects exhibited a slightly higher mean temperature and a slightly lower average diurnal temperature oscillation than white subjects (36.8°C [98.2°F] vs 36.7°C [98.1°F] and 0.51°C [0.93°F] vs 0.61°C [1.09°F], respectively); these differences approached but did not quite reach statistical significance (t test, P=.06, df=698). Oral temperature recordings of smokers did not differ significantly from those of nonsmokers (data not shown). Statistical analysis using a general linear model, as described in the previous section, yielded results qualitatively identical to those reported above (sex, time of day, P<.001; race, P=.05; age, smoking, and interaction terms, P>.26).

There was a statistically significant linear relationship between temperature and pulse rate (regression analysis, P<.001), with an average increase in heart rate of 4.4 beats per minute for each 1°C rise in temperature (2.44 beats per minute for each 1°F rise in temperature) over the range of temperatures examined (35.6°C to 38.2°C [96.0°F to 100.8°F]).

Comment

Thermometers used by Wunderlich were cumbersome, had to be read in situ,11 and, when used for axillary measurements (Wunderlich's preferred site for monitoring body temperature), required 15 to 20 minutes to equilibrate.12 Today's thermometers are smaller and more reliable and equilibrate more rapidly. In addition, the mouth and rectum have replaced the axilla as the preferred


> d <- read.csv("tempdat.csv")

> hist(d$temp)

[Figure: "Histogram of d$temp"; x-axis ticks from 96 to 101, y-axis (frequency) ticks from 0 to 35]


> shapiro.test(d$temp)

Shapiro-Wilk normality test

data: d$temp

W = 0.9866, p-value = 0.2332

> library(nortest)

> ad.test(d$temp)

Anderson-Darling normality test

data: d$temp

A = 0.5201, p-value = 0.1829

> lillie.test(d$temp)

Lilliefors (Kolmogorov-Smirnov) normality test

data: d$temp

D = 0.0647, p-value = 0.2009


> cvm.test(d$temp)

Cramer-von Mises normality test

data: d$temp

W = 0.082, p-value = 0.1937

> sf.test(d$temp)

Shapiro-Francia normality test

data: d$temp

W = 0.9838, p-value = 0.1113

> pearson.test(d$temp)

Pearson chi-square normality test

data: d$temp

P = 30.1538, p-value = 0.002647

> pearson.test(d$temp, n.classes = 10)$p.val

[1] 0.6881011
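The two Pearson results above differ only in the number of classes, which shows how sensitive a binned chi-square test can be to that choice. The base-R sketch below illustrates the mechanism on simulated data: `pearson_normality()` is a hypothetical helper written for this example (not the `nortest` implementation), and the simulated values are rounded to one decimal place on the assumption that real thermometer readings are recorded that way.

```r
# Pearson chi-square normality test with equiprobable classes (sketch).
# pearson_normality() is a hypothetical helper, not nortest::pearson.test().
pearson_normality <- function(x, n_classes) {
  n <- length(x)
  # Transform by the fitted normal CDF, then bin into n_classes cells that
  # each have expected probability 1 / n_classes under normality
  u <- pnorm(x, mean = mean(x), sd = sd(x))
  cell <- pmin(floor(u * n_classes) + 1, n_classes)
  observed <- tabulate(cell, nbins = n_classes)
  expected <- n / n_classes
  stat <- sum((observed - expected)^2 / expected)
  # Mean and sd were estimated from the data, so use n_classes - 3 df
  p <- pchisq(stat, df = n_classes - 3, lower.tail = FALSE)
  list(statistic = stat, p.value = p, observed = observed)
}

set.seed(1)
x <- round(rnorm(130, mean = 98.25, sd = 0.73), 1)   # simulated, not the real data

# p-value as a function of the number of classes
class_counts <- c(8, 10, 15, 20)
p_values <- sapply(class_counts, function(k) pearson_normality(x, k)$p.value)
setNames(p_values, class_counts)
```

With rounded data, many observations share the same value, so moving a class boundary slightly can shift a whole cluster of ties from one cell to another and change the statistic noticeably.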


> qqnorm(d$temp)

> qqline(d$temp)

[Figure: "Normal Q-Q Plot" of d$temp with reference line; x-axis (theoretical quantiles) ticks from −2 to 2, y-axis (sample quantiles) ticks from 97 to 100]


> t.test(d$temp, mu = 98.6)

One Sample t-test

data: d$temp

t = -5.4548, df = 129, p-value = 2.411e-07

alternative hypothesis: true mean is not equal to 98.6

95 percent confidence interval:

98.12200 98.37646

sample estimates:

mean of x

98.24923
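The t statistic and p-value reported above follow from a short formula: the distance between the sample mean and the hypothesized mean, measured in standard errors. A sketch on simulated data, since `tempdat.csv` is not reproduced here; the simulation parameters are chosen only for illustration.

```r
set.seed(42)
temp <- rnorm(130, mean = 98.25, sd = 0.73)   # simulated stand-in for d$temp
mu0 <- 98.6                                   # hypothesized population mean

n <- length(temp)
se <- sd(temp) / sqrt(n)                      # standard error of the mean
t_stat <- (mean(temp) - mu0) / se
p_two_sided <- 2 * pt(-abs(t_stat), df = n - 1)

c(t = t_stat, df = n - 1, p = p_two_sided)

# t.test() should reproduce the hand computation
fit <- t.test(temp, mu = mu0)
fit$statistic
fit$p.value
```

The numbers will differ from the slide output because the data are simulated; the point is that the hand computation and `t.test()` agree.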


> t.test(d$temp, mu = 98.6, conf.level = 0.999)

One Sample t-test

data: d$temp

t = -5.4548, df = 129, p-value = 2.411e-07

alternative hypothesis: true mean is not equal to 98.6

99.9 percent confidence interval:

98.03268 98.46578

sample estimates:

mean of x

98.24923
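Changing `conf.level` leaves the t statistic and p-value untouched and only widens or narrows the interval. The interval and the test are two views of the same calculation: the hypothesized mean falls outside the (1 − α) confidence interval exactly when the two-sided p-value is below α. A sketch on simulated data, with parameters chosen only for illustration:

```r
set.seed(42)
temp <- rnorm(130, mean = 98.25, sd = 0.73)   # simulated stand-in for d$temp
mu0 <- 98.6

agree <- logical(0)
for (level in c(0.90, 0.95, 0.999)) {
  fit <- t.test(temp, mu = mu0, conf.level = level)
  outside <- mu0 < fit$conf.int[1] || mu0 > fit$conf.int[2]
  rejected <- fit$p.value < 1 - level
  agree <- c(agree, outside == rejected)
  cat(sprintf("level %.3f: CI [%.3f, %.3f], mu0 outside: %s, p < alpha: %s\n",
              level, fit$conf.int[1], fit$conf.int[2], outside, rejected))
}
```

In the slide output above, the p-value of 2.411e-07 is far below 0.001, which matches 98.6 lying outside even the 99.9 percent interval.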


One Sided Test

> t.test(d$temp, alternative = "less", conf.level = 0.99)

One Sample t-test

data: d$temp

t = 1527.877, df = 129, p-value = 1

alternative hypothesis: true mean is less than 0

99 percent confidence interval:

-Inf 98.4007

sample estimates:

mean of x

98.24923
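Note that the call above does not pass `mu`, so `t.test` falls back on its default null value of 0; that is why the output reports t = 1527.877 and a p-value of 1. The upper confidence bound is unaffected, because the interval does not depend on `mu`. A sketch of the one-sided test against 98.6 follows, run on simulated data with illustrative parameters since the original file is not reproduced here.

```r
set.seed(42)
temp <- rnorm(130, mean = 98.25, sd = 0.73)   # simulated stand-in for d$temp

one_sided <- t.test(temp, mu = 98.6, alternative = "less", conf.level = 0.99)
two_sided <- t.test(temp, mu = 98.6)

one_sided$p.value
two_sided$p.value / 2     # equals the one-sided p-value when t is negative
one_sided$conf.int        # lower limit is -Inf; only an upper bound is reported
```

When the observed mean falls on the side named by the alternative, the one-sided p-value is half the two-sided one; when it falls on the other side, the one-sided p-value exceeds 0.5.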
