Page 1: SAS Macro Coding for Jackknife Repeated Replication

SI Workshop: July 15, 2005 1

SAS Macro Coding for Jackknife Repeated Replication

• Jackknife Repeated Replication (JRR) is well suited to macro coding because the SAS macro language handles iterative, flexible code naturally

• This presentation will demonstrate how to use a general JRR macro to correctly calculate variance estimates for means and regression coefficients (logistic and OLS models)

Analysis of Complex Sample Survey Data

• Data from complex sample surveys must be analyzed using techniques that adjust for the clustering of the sample design

• The standard procedures in SAS, SPSS, and Stata assume a simple random sample, so they do not correctly calculate variances and standard errors for complex sample data

Analysis of Complex Survey Data

• SAS offers SURVEY procedures and Stata offers svy commands, both of which use the Taylor Series Linearization approach

• JRR, a widely used replication approach, offers an alternative to the Taylor Series method

• JRR is flexible and can be adapted to many different types of statistics such as means, regression coefficients, and other statistics of interest

Visual Representation of JRR process

• JRR systematically removes a small portion of the sample, and the statistics of interest are computed repeatedly, once for each sub-sample

• In this example, the cluster with str=42 and secu=2 is deleted and the weight of the cluster with str=42 and secu=1 is doubled

• This process is repeated for each stratum until the entire dataset is covered
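
Written as formulas, the deletion scheme in these bullets and the variance rule stated in the macro's own comments look roughly as follows. The symbols (w_i for the case weight, H for the number of strata, theta-hat with subscript (h) for the estimate from replicate h) are notation introduced here for illustration and do not appear on the slides; in the NCS example H would be 42.

```latex
% Replicate weight for replicate h (one replicate per stratum)
w_i^{(h)} =
\begin{cases}
2\,w_i & \text{if } \mathrm{str}_i = h \text{ and } \mathrm{secu}_i = 1,\\
0      & \text{if } \mathrm{str}_i = h \text{ and } \mathrm{secu}_i = 2,\\
w_i    & \text{otherwise.}
\end{cases}

% JRR variance and standard error of the full-sample estimate
\widehat{\operatorname{var}}_{\mathrm{JRR}}(\hat\theta)
  = \sum_{h=1}^{H} \bigl(\hat\theta_{(h)} - \hat\theta\bigr)^{2},
\qquad
\operatorname{se}(\hat\theta)
  = \sqrt{\widehat{\operatorname{var}}_{\mathrm{JRR}}(\hat\theta)}
```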

SAS JRR Macro: Logistic Regression

*Logistic Regression Jackknife for Analysis of Complex Survey Data****************** ;

*Pat Berglund, July 2003 for Summer Institute Workshop ;

libname d 'd:\sumclass' ;
options compress=yes nofmterr symbolgen ;
options macrogen mprint ;

*create outer jackknife macro with parameters ;
*Parameters to fill in: ;
*ncluster=number of clusters, in the NCS I dataset this is 42 ;
*weight=case weight ;
*depend=dependent variable for the logistic model ;
*preds=predictor variables entered with a space between each one ;
*indata=input dataset* ;

%macro jacklogods(ncluster,weight,depend,preds,indata);

*section 1: jackknife using strata and secu variables to do 42 jackknife selections* ;
*each iteration of do loop selects one strata*secu combination and doubles the contribution of strata=x and secu=1 while setting strata=x and secu=2 to zero ;
*all other combinations stay the same* ;

%let nclust=%eval(&ncluster);

data one;
  set &indata;

%macro wgtcal ;
  %do i=1 %to &nclust ;
    pwt&i=&weight;
    if str=&i and secu=1 then pwt&i=pwt&i*2 ;
    if str=&i and secu=2 then pwt&i=0 ;
  %end;
%mend;

%wgtcal ;
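
The same replicate weights can also be built with a DATA step array instead of a nested macro loop. The sketch below is one possible alternative, not the workshop's code. The dataset toy, its values, and the variables wt and y are made up for illustration; only str, secu, and the pwt1, pwt2, ... naming follow the slides.

```sas
/* Toy data: 2 strata x 2 SECUs. All values are invented for illustration. */
data toy;
  input str secu wt y;
  datalines;
1 1 1.0 10
1 2 1.5 12
2 1 2.0 20
2 2 0.5 16
;
run;

%let nclust = 2;   /* number of strata in the toy data */

data toy_rep;
  set toy;
  array pwt{&nclust} pwt1-pwt&nclust;
  do h = 1 to &nclust;
    pwt{h} = wt;                                        /* start from the full-sample weight */
    if str = h and secu = 1 then pwt{h} = 2 * wt;       /* retained SECU counts double */
    else if str = h and secu = 2 then pwt{h} = 0;       /* deleted SECU drops out */
  end;
  drop h;
run;

proc print data=toy_rep noobs;
run;
```

Because the array loop runs at DATA step execution time, only the array bounds need a macro variable, which can make the generated code easier to read in the log than the expanded macro loop.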

**section 2: run base model/statistic of interest for entire sample using full weight* ;

%macro base ;

ods output parameterestimates=parms (keep=variable estimate ) ;

ods listing close ;

proc logistic des data=ONE ;

model &depend=&preds ;

weight &weight ;

run ;

ods listing ;

proc print data=parms ;

run ;

proc sort data=parms ;

by variable ;

run ;

%mend base ;

%base ;

*Section 3: Run Replicate Models* ;

* replicate models, one for each strata using weight developed in jackknife section 1* ;

*save statistic of interest for use with variance estimation* ;

%macro reps ;

%do j=1 %to &nclust ;

ods output parameterestimates=parms&j

(keep=estimate variable rename=(estimate=estimate&j )) ;

ods listing close ;

proc logistic des data=ONE ;

model &depend=&preds ;

weight pwt&j ;

run ;

proc sort data=parms&j ;

by variable ;

run ;

%end ;

%mend reps;

%reps ;

*Section 4: Merge Base and Replicate files together for calculation of statistics of interest* ;

data rep ;

merge parms

%do k=1 %to &nclust;

parms&k

%end;;

by variable ;

proc print ;

run ;

*Section 5-Calculate complex design corrected variance and standard errors ;
*variance = sum of the squared differences between the base statistic and the replicate statistics ;
*standard error = square root of the sum of the squared differences (variance) ;
*Odds Ratio = exponent of the coefficient ;
*Confidence Intervals = exponent of (coefficient +- 1.96*corrected standard error) ;

ods listing ;

data calculate ;
  set rep ;

%macro it ;
  %do j=1 %to &nclust ;
    sqdiff&j=(estimate-estimate&j)**2;
  %end;
  sumdiff=sum(of sqdiff1-sqdiff&nclust);
  stderr=sqrt(sumdiff) ;
  or=exp(estimate) ;
  *stderr is on the log-odds scale, so build the limits there and then exponentiate ;
  lowor=exp(estimate-(1.96*stderr)) ;
  upor=exp(estimate+(1.96*stderr)) ;
%mend it ;

%it;

run ;

proc print ;
  var variable estimate stderr or lowor upor ;
run ;

%mend jacklogods ;

%jacklogods(42,p2wtv3,deplt1,sexf,d.ncsdxdm3 ) ;

*comparison with SRS logistic regression* ;
proc logistic des data=d.ncsdxdm3 ;
  weight p2wtv3 ;
  model deplt1=sexf ;
run ;

*comparison with SAS surveylogistic ;
proc surveylogistic data=d.ncsdxdm3 ;
  strata str ;
  cluster secu ;
  weight p2wtv3 ;
  model deplt1 (event='1') =sexf ;
run ;

Results from Logistic JRR

Design Corrected Results:

Variable   Estimate   stderr     or        lowor   upor
SEXF       0.7434     0.088842   2.10315   1.767   2.503

SRS Results

Analysis of Maximum Likelihood Estimates

Parameter   DF   Estimate   Std. Error   Wald Chi-Square   Pr > ChiSq
SEXF         1     0.7434       0.0724          105.3802       <.0001

SAS Surveylogistic Results

Parameter   DF   Estimate   Std. Error   Wald Chi-Square   Pr > ChiSq
Intercept    1     2.0084       0.0776          669.6525       <.0001
SEXF         1    -0.7434       0.0889           70.0103       <.0001

Odds Ratio Estimates

Effect   Point Estimate   95% Wald Confidence Limits
SEXF              0.475        0.399      0.566
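
The SURVEYLOGISTIC coefficient has the opposite sign from the JRR and SRS runs (-0.7434 versus 0.7434). This suggests, though the slides do not say so, that this run modeled the other level of the outcome. Under that assumption, taking reciprocals puts the odds ratio and its limits on the same footing as the JRR odds ratio of 2.10315. The dataset and variable names below are illustrative only.

```sas
/* Flip the SURVEYLOGISTIC odds ratio to the other outcome level.      */
/* Input values are copied from the results table above; the idea that */
/* the sign difference comes from outcome coding is an assumption.     */
data or_flip;
  or_survey  = 0.475;
  lcl_survey = 0.399;
  ucl_survey = 0.566;

  or_flipped   = 1 / or_survey;     /* reciprocal of the point estimate  */
  lcl_flipped  = 1 / ucl_survey;    /* limits swap places when inverted  */
  ucl_flipped  = 1 / lcl_survey;
  beta_flipped = log(or_flipped);   /* back to the log-odds scale        */
run;

proc print data=or_flip noobs;
  format or_flipped lcl_flipped ucl_flipped beta_flipped 8.4;
run;
```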

Another approach: Linear Regression

%macro jackgenmod(ncluster,weight,depend,preds,indata);

%let nclust=%eval(&ncluster);

data one;

set &indata;

%macro wgtcal ;

%do i=1 %to &nclust ;

pwt&i=&weight;

if str=&i and secu=1 then pwt&i=pwt&i*2 ;

if str=&i and secu=2 then pwt&i=0 ;

%end;

%mend;

%wgtcal ;

Base Model for OLS

%macro base ;

ods output parameterestimates=parms

(keep=variable estimate ) ;

title "Example of Proc Reg without design correction" ;

proc reg data=ONE ;

model &depend=&preds ;

weight &weight ;

run ;

proc sort data=parms ;

by variable ;

run ;

%mend base ;

%base ;

Replicate Models

%macro reps ;

%do j=1 %to &nclust ;

ods output parameterestimates=parms&j

(keep=estimate variable rename=(estimate=estimate&j )) ;

ods listing close ;

proc reg data=ONE ;

model &depend=&preds ;

weight pwt&j ;

run ;

proc sort data=parms&j ;

by variable ;

run ;

%end ;

%mend reps;

%reps ;

Merge Replicate Datasets with Base Dataset

data rep ;

merge parms

%do k=1 %to &nclust;

parms&k

%end;;

by variable ;

proc print ;

run ;

ods listing ;

Calculate Corrected Standard Errors from Distribution of Replicate Coefficients

data calculate ;

set rep ;

%macro it ;

%do j=1 %to &nclust ;

sqdiff&j=(estimate-estimate&j)**2;

%end;

sumdiff=sum(of sqdiff1-sqdiff&nclust);

stderr=sqrt(sumdiff) ;

%mend it ;

%it;

run ;

Code to Print Results from JRR and Execute Outer Macro

proc print ;

title "Results from JRR for OLS regression" ;

var variable estimate stderr ;

run ;

%mend jackgenmod ;

%jackgenmod(42,p2wtv3,incpers,sexf ag25 ag35 ag45,d.ncsdxdm3 ) ;

Proc SurveyReg Code

proc surveyreg data=d.ncsdxdm3 ;

title "Example of Proc SurveyReg" ;

strata str ;

cluster secu ;

weight p2wtv3 ;

model incpers=sexf ag25 ag35 ag45 ;

run ;

Parameter Estimates from OLS SRS Regression

Variable    DF   Parameter Estimate   Std. Error   t Value
Intercept    1                11077    485.53334     22.81
SEXF         1               -12096    434.45468    -27.84
AG25         1                15227    586.69609     25.95
AG35         1                22194    600.60265     36.95
AG45         1                21404    683.46087     31.32

JRR Results

Results from JRR for OLS regression

Obs   Variable    Estimate    stderr
1     Intercept      11077    529.49
2     AG25           15227    698.83
3     AG35           22194   1026.29
4     AG45           21404   1055.67
5     SEXF          -12096    689.31

Proc SurveyReg Results

Estimated Regression Coefficients

Parameter     Estimate   Standard Error   t Value   Pr > |t|
Intercept    11077.003        532.95062     20.78     <.0001
SEXF        -12095.819        690.29149    -17.52     <.0001
AG25         15227.170        698.54031     21.80     <.0001
AG35         22194.355       1017.50689     21.81     <.0001
AG45         21403.763       1062.42802     20.15     <.0001
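
One way to read the three OLS tables side by side is to compute ratios of the standard errors. The sketch below simply re-enters the standard errors reported on the slides (SRS PROC REG, JRR, and PROC SURVEYREG). The dataset name se_compare and the ratio variable names are invented for illustration.

```sas
/* Standard errors copied from the three OLS result tables on the slides. */
data se_compare;
  length variable $9;
  input variable $ se_srs se_jrr se_taylor;
  ratio_jrr_srs    = se_jrr / se_srs;      /* above 1: SRS understates the SE       */
  ratio_jrr_taylor = se_jrr / se_taylor;   /* near 1: JRR and linearization agree   */
  datalines;
Intercept 485.53334 529.49 532.95062
SEXF 434.45468 689.31 690.29149
AG25 586.69609 698.83 698.54031
AG35 600.60265 1026.29 1017.50689
AG45 683.46087 1055.67 1062.42802
;
run;

proc print data=se_compare noobs;
  format ratio_jrr_srs ratio_jrr_taylor 6.3;
run;
```

The coefficient estimates themselves are identical across the three runs; only the standard errors change with the variance estimation method.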

Conclusions

• JRR is a flexible and convenient alternative to canned software procedures/programs

• Any statistic/procedure can be used within JRR structure, assuming it makes statistical sense

• SAS Macro coding allows parsimonious syntax and is ideal for repetitive and flexible coding
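
The opening slide mentions means as well as regression coefficients, but the deck only shows the regression macros. As a rough illustration of how the same structure might carry over to a weighted mean, here is a minimal sketch. The macro name jackmean, the work datasets prefixed with an underscore, and the toy data are all hypothetical and are not part of the workshop materials.

```sas
/* Toy data: 2 strata x 2 SECUs. All values are invented for illustration. */
data toy;
  input str secu wt y;
  datalines;
1 1 1.0 10
1 2 1.5 12
2 1 2.0 20
2 2 0.5 16
;
run;

%macro jackmean(nstrata, weight, var, indata);

  /* Step 1: replicate weights, one per stratum */
  data _one;
    set &indata;
    array pwt{&nstrata} pwt1-pwt&nstrata;
    do h = 1 to &nstrata;
      pwt{h} = &weight;
      if str = h and secu = 1 then pwt{h} = 2 * &weight;
      else if str = h and secu = 2 then pwt{h} = 0;
    end;
    drop h;
  run;

  /* Step 2: full-sample weighted mean */
  proc means data=_one noprint;
    var &var;
    weight &weight;
    output out=_base (keep=estimate) mean=estimate;
  run;

  /* Step 3: one weighted mean per replicate weight */
  %do j = 1 %to &nstrata;
    proc means data=_one noprint;
      var &var;
      weight pwt&j;
      output out=_rep&j (keep=estimate&j) mean=estimate&j;
    run;
  %end;

  /* Step 4: sum of squared differences gives the JRR variance */
  data _jrr;
    merge _base
      %do j = 1 %to &nstrata; _rep&j %end; ;
    array est{&nstrata} estimate1-estimate&nstrata;
    sumdiff = 0;
    do h = 1 to &nstrata;
      sumdiff = sumdiff + (estimate - est{h})**2;
    end;
    stderr = sqrt(sumdiff);
    keep estimate sumdiff stderr;
  run;

  proc print data=_jrr noobs;
  run;

%mend jackmean;

%jackmean(2, wt, y, toy);
```

The four steps mirror the sections of the regression macros (replicate weights, base statistic, replicate statistics, variance calculation), with PROC MEANS standing in for PROC LOGISTIC or PROC REG.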