12 march 2007 andy bogart. 12 march 2007 andy bogart a cooperative effort: university of north...

12 March 2007
Andy Bogart


A cooperative effort:

University of North Dakota National Resource Center on Native American Aging

University of Washington Center for Clinical and Epidemiological Research


A cooperative effort:

Training and practical experience in

Research Design

Statistics and Data Analysis

Manuscript Preparation



Social Engagement

Pap Testing

Age

Disability

Education

?



Scientific Considerations

Specific Aims Development


Statistics and Data Analysis

Hypothesis Testing

Data cleaning, coding, and analysis using SPSS 14

[Figure: Income as a Function of Height. x-axis: Height in Centimeters; y-axis: Yearly Income in Dollars.]

Regression Analysis

[Figure: Lung Function and Smoking Among Children. y-axis: Forced Expiratory Volume (FEV); panels: Males, Females; groups: Non Smokers, Smokers.]

Basic summary statistics


Manuscript Preparation

Creating Tables and Figures

Table 1: Subject Characteristics

                                            Pap test in past 3 years
Subject Characteristics                     Yes (n = 1917)   No (n = 596)   p-value
Age, mean (sd)                              65 (5)           64 (5)         0.003
ADL Count, %                                                                0.008
  None                                      71               69
  1-3 ADLs                                  23               20
  4 or more                                 7                11
BMI Category, %                                                             0.732
  Underweight                               2                2
  Healthy                                   20               19
  Overweight                                28               28
  Obese I                                   26               26
  Obese II                                  13               15
  Obese III                                 12               10
Social engagement (per week), %                                             <0.001
  No meetings                               60               70
  One meeting                               25               22
  Two or more                               15               9
Married or living as married                41               35             0.007
Has at least one personal physician         81               71             <0.001
High School graduate                        80               74             0.002
Insured                                                                     0.382
  No Insurance                              7                6
  IHS Only                                  18               16
  At least one other type of insurance      75               78
Rurality                                                                    0.602
  Urban                                     23               24
  Large rural                               18               16
  Small rural                               20               21
  Isolated rural                            39               40
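The slide does not say which test produced the p-values in Table 1. As a rough sketch only, assuming a pooled two-proportion z-test (equivalent to an uncorrected chi-square test for a 2x2 table), the p-value for a yes/no row can be approximated from the rounded percentages and group sizes. The function name and the reconstruction of counts from percentages are illustrative assumptions, so results will only approximate the reported values.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(pct1, n1, pct2, n2):
    """Two-sided p-value for a difference in proportions (pooled z-test).

    Assumption: counts are rebuilt from rounded percentages, so the
    result is only an approximation of whatever the original analysis gave.
    """
    x1 = round(pct1 / 100 * n1)  # approximate count in group 1
    x2 = round(pct2 / 100 * n2)  # approximate count in group 2
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Rows taken from Table 1 (Yes: n = 1917, No: n = 596)
p_physician = two_proportion_p_value(81, 1917, 71, 596)
p_high_school = two_proportion_p_value(80, 1917, 74, 596)
print(f"personal physician: p = {p_physician:.2g}")
print(f"high school graduate: p = {p_high_school:.2g}")
```

Multi-category rows such as BMI or rurality would need a test over the whole contingency table, which this sketch does not attempt.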

[Figure: Sleep Quality by treatment group. x-axis: Weeks Since Randomization (ticks at 0, 4, 8, 20); y-axis: Sleep Quality by 10cm VAS Score, from 0 (None) to 10 (Best Ever). Treatment groups: Direct from Master, Distance from Master, Direct from Actor, Distance from Actor.]

Direct from Master (n)     24   21   22   19
Distance from Master (n)   25   22   18   20
Direct from Actor (n)      25   21   21   21
Distance from Actor (n)    25   19   20   19

Methods and Results Summaries

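$$
E\left[\widehat{AUC}_{WMR}(t_j)\right]
= E_{N_j}\left[ E\left[ \frac{K_j}{N_j} \,\middle|\, N_j = n \right] \right]
= E_{N_j}\left[ \frac{1}{n}\, E\left[ K_j \mid N_j = n \right] \right]
$$

The random variable $K_j$ is a sum of $n$ Bernoulli experiments, each sharing a common probability of success denoted as $p_j = P(M_1 > M_2 \mid T_1 = t_j,\ T_2 > t_j)$. The expectation of $K_j$ is therefore the sum of the expectations of all $n$ of these Bernoulli random variables. We substitute the expectation $n p_j$ into the expression, and obtain the final result:

$$
E_{N_j}\left[ \frac{1}{n}\, E\left[ K_j \mid N_j = n \right] \right]
= E_{N_j}\left[ \frac{1}{n}\, n p_j \right]
= E_{N_j}\left[ p_j \right]
= p_j
$$

Thus $E\left[\widehat{AUC}_{WMR}(t_j)\right] = p_j = P(M_1 > M_2 \mid T_1 = t_j,\ T_2 > t_j)$, as desired.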

Each individual who is observed to fail at time t_j contributes an estimate of AUC(t_j) consisting of the proportion of those still living whose marker values fall below his or her own. If more than one subject fails at time t_j, then we repeat the above procedure for each subject who fails, using their marker values in turn to play the role of m*(t_j), and recording separately each resulting estimate of AUC(t_j). Multiple estimates of AUC(t_j) are accommodated either by averaging or by implementing a smoothing algorithm, as described below. The estimates of AUC(t_j) are descriptions of the set of subjects who are still at risk: the numerator K_j of the AUC(t_j) estimator sums only over the subset of individuals who are observed to survive beyond time t_j. Similarly, the denominator N_j counts only those individuals who survive beyond t_j. Those
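The procedure described in this excerpt can be sketched in a few lines. This is a minimal illustration rather than the author's implementation: the function name, the input layout (parallel lists of follow-up times, event indicators and marker values) and the toy data are assumptions, tied failure times are handled by simple averaging (the smoothing option is not shown), and marker ties are counted as not below.

```python
def wmr_auc_estimates(times, events, markers):
    """Return {t_j: estimate of AUC(t_j)} for each observed failure time.

    times   -- observed follow-up time for each subject
    events  -- 1 if failure was observed at that time, 0 if censored
    markers -- marker value for each subject
    """
    per_time = {}
    for t_j, failed, m_star in zip(times, events, markers):
        if not failed:
            continue  # only observed failures contribute an estimate
        # N_j: subjects observed to survive beyond t_j
        survivors = [m for t, m in zip(times, markers) if t > t_j]
        n_j = len(survivors)
        if n_j == 0:
            continue  # nobody left to compare against
        # K_j: survivors whose marker falls below the failing subject's marker
        k_j = sum(1 for m in survivors if m < m_star)
        per_time.setdefault(t_j, []).append(k_j / n_j)
    # Average the separate estimates when several subjects fail at the same time
    return {t: sum(v) / len(v) for t, v in sorted(per_time.items())}


# Hypothetical toy data, for illustration only
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 1, 0, 1, 0]
toy_markers = [2.5, 1.0, 3.0, 0.5, 2.0, 1.5]
auc_by_time = wmr_auc_estimates(toy_times, toy_events, toy_markers)
print(auc_by_time)
```

With the toy data, the subject failing at time 1 has a marker above four of the five survivors, so that time contributes 4/5; the two failures at time 2 contribute 1/3 and 1, which are averaged.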

deleterious (increased age, hepatomegaly, and high bilirubin) to prognosis. Differences in the standard errors derived from the p-values reported by Roll et al. and those calculated while fitting the Yale-like model are attributable in part to having fit them on a larger Mayo data set than in the original Yale study.

Table 10: Comparison of Yale Coefficient Estimates

                        Roll et al. (1983), n=280         Yale-like Model, n=312
                        coef.    se*      p-value         coef.     se       p-value
Age                     0.037    0.0120   0.002           0.0339    0.0082   3.6 × 10^-5
Hepatomegaly            0.74     0.3368   0.028           0.4618    0.2148   0.032
Bilirubin > 5 mg/dl     0.82     0.2492   0.001           0.9675    0.2289   2.5 × 10^-5
Bilirubin < 1.5 mg/dl   -0.73    0.3138   0.020           -1.2954   0.2326   2.6 × 10^-8
Portal fibrosis         -1.34    0.4336   0.002           -0.6590   0.2728   0.016

* Standard error estimates derived from p-values reported in Roll et al. (1983)
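The footnote does not spell out how the standard errors were derived from the published p-values. One plausible route, shown here as a sketch under that assumption, is to invert a two-sided Wald test: the z statistic implied by the p-value is the upper (1 - p/2) normal quantile, and the standard error is the absolute coefficient divided by that z. The function name is illustrative; the coefficients and p-values are those in the Roll et al. columns of Table 10.

```python
from statistics import NormalDist


def se_from_p_value(coef, p_value):
    """Standard error implied by a two-sided Wald p-value (assumed test)."""
    z = NormalDist().inv_cdf(1 - p_value / 2)  # |z| that yields this p-value
    return abs(coef) / z


# (coefficient, p-value, se reported in Table 10) from the Roll et al. columns
roll_rows = {
    "Age": (0.037, 0.002, 0.0120),
    "Hepatomegaly": (0.74, 0.028, 0.3368),
    "Bilirubin > 5 mg/dl": (0.82, 0.001, 0.2492),
    "Bilirubin < 1.5 mg/dl": (-0.73, 0.020, 0.3138),
    "Portal fibrosis": (-1.34, 0.002, 0.4336),
}

derived = {name: se_from_p_value(c, p) for name, (c, p, _) in roll_rows.items()}
for name, se in derived.items():
    print(f"{name}: derived se = {se:.4f}, table se = {roll_rows[name][2]}")
```

Because the published p-values are rounded to three decimals, standard errors recovered this way carry that rounding error with them.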

As in the previous example, we present in Table 11 the concordance estimates for each marker along with bootstrapped confidence intervals, and inference on the null hypothesis of no difference between the concordance probabilities given by each estimator. This time each method but one provides evidence in support of the Mayo researchers’ claim that their model performed on par with the Yale model. Harrell’s estimator is the only dissenter in this regard, providing much lower bootstrap p-values of both types for the difference than any other estimator. As seen in bivariate-normal data simulations in section 2, Harrell’s estimator is much higher than the point
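For readers unfamiliar with bootstrapped concordance intervals, the following is a minimal sketch using Harrell's concordance index and a percentile bootstrap. It is not the analysis behind Table 11: the function names, the toy data, the number of resamples, the percentile method, and the convention that a higher marker means higher risk are all assumptions made for illustration.

```python
import random


def harrell_c(times, events, markers):
    """Harrell's concordance index, assuming higher marker = higher risk."""
    concordant, usable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair is usable only if the earlier time is a failure
        for k in range(n):
            if times[k] > times[i]:
                usable += 1
                if markers[i] > markers[k]:
                    concordant += 1
                elif markers[i] == markers[k]:
                    concordant += 0.5  # marker ties count one half
    return concordant / usable if usable else float("nan")


def bootstrap_ci(times, events, markers, n_boot=200, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling subjects with replacement."""
    rng = random.Random(seed)
    n = len(times)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        c = harrell_c([times[i] for i in idx],
                      [events[i] for i in idx],
                      [markers[i] for i in idx])
        if c == c:  # drop resamples with no usable pairs (NaN)
            stats.append(c)
    stats.sort()
    lower = stats[int(alpha / 2 * len(stats))]
    upper = stats[int((1 - alpha / 2) * len(stats)) - 1]
    return lower, upper


# Hypothetical toy data, for illustration only
toy_times = [1, 2, 2, 3, 4, 5]
toy_events = [1, 1, 1, 0, 1, 0]
toy_markers = [2.5, 1.0, 3.0, 0.5, 2.0, 1.5]
c_hat = harrell_c(toy_times, toy_events, toy_markers)
ci_low, ci_high = bootstrap_ci(toy_times, toy_events, toy_markers)
print(c_hat, (ci_low, ci_high))
```

Comparing two markers would require computing the difference in concordance within each resample, so that both estimates come from the same resampled subjects.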

the relative variances: for Harrell, the relative variance increases to 23% above that of the CoxTVC. WMRnone and WMRloess estimators show variances 17% and 20% higher than CoxTVC respectively, and the CoxPH estimator’s variance is 6% higher than the CoxTVC. Results are similar when correlation is instead set to -0.70, with relative variances in most cases being slightly closer to 1 than the results stated above. The non-parametric estimators suffer a larger variance estimate than do the two model-based estimators, as might be expected given that they make fewer distributional assumptions about the data.

[Figure: x-axis: Log of Analysis Time (-2 to 2); y-axis: Area Under ROC(t) (0.0 to 1.0). Methods: Weighted Mean Rank, Cox PH Model, Cox TVC Model.]

Figure 1: AUC(t) Estimates for Bivariate Normal Data Simulation

Further examination of Figure 1 illustrates interesting features visible in the estimates from Table 1. In the region where log-analysis time is between

Manuscript editing


Adjusted Pap Test Receipt Odds Ratio Estimates

[Figure: odds ratio estimates on an axis from 0.0 to 3.0, by Social engagements per week (0, 1, 2 or more) and Number of ADL disabilities (0, 1 to 3, 4 or more).]


Future Directions

Ongoing manuscript assistance

2 analysis projects per year

New participants from UND


Thank you for listening
