Generating Correlated Random Variables

Kriss Harris, Senior Statistician
Kriss.5.Harris@gsk.com




Why?

• I was producing graphs for a SAS Graphics Training Course that will be rolled out soon, and I wanted to control the correlation between the variables.


Previous Method


Use Excel to fill down a column, and then generate another column that was fairly correlated with it.


Generating Correlated Random Variables using the SAS Datastep

data bivariate_final;
  mean1 = 0;     * mean for y1;
  mean2 = 10;    * mean for y2;
  sig1  = 2;     * SD for y1;
  sig2  = 5;     * SD for y2;
  rho   = 0.90;  * correlation between y1 and y2;
  do i = 1 to 100;
    r1 = rannor(1245);   * independent standard normal draws;
    r2 = rannor(2923);
    y1 = mean1 + sig1*r1;
    y2 = mean2 + rho*sig2*r1 + sqrt(sig2**2 - sig2**2*rho**2)*r2;
    output;
  end;
run;
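A short derivation of why the y2 line gives the requested correlation, assuming r1 and r2 are independent standard normal draws (mean 0, variance 1). The term sqrt(sig2**2 - sig2**2*rho**2) is simply sig2*sqrt(1 - rho**2), written below as \sigma_2\sqrt{1-\rho^2}.

```latex
\begin{aligned}
y_1 &= \mu_1 + \sigma_1 r_1, \qquad
y_2 = \mu_2 + \rho\,\sigma_2 r_1 + \sigma_2\sqrt{1-\rho^2}\, r_2 \\
\operatorname{Var}(y_1) &= \sigma_1^2, \qquad
\operatorname{Var}(y_2) = \rho^2\sigma_2^2 + \sigma_2^2(1-\rho^2) = \sigma_2^2 \\
\operatorname{Cov}(y_1, y_2) &= \sigma_1\,\rho\,\sigma_2 \operatorname{Var}(r_1) = \rho\,\sigma_1\sigma_2 \\
\operatorname{Corr}(y_1, y_2) &= \frac{\rho\,\sigma_1\sigma_2}{\sigma_1\sigma_2} = \rho
\end{aligned}
```

With sig1 = 2, sig2 = 5 and rho = 0.90, the covariance is 0.9 × 2 × 5 = 9.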



Figure: y and x for different correlation coefficients.
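A plain-Python sketch (standard library only) of what the figure illustrates: the same two-variable construction as the DATA step, run for several correlation values, with the sample correlation reported for each. The rho values, the sample size of 5000 and the seed are arbitrary illustrative choices, not taken from the slides, and `pearson` and `correlated_pair` are hypothetical helper names.

```python
import math
import random


def pearson(x, y):
    """Sample (Pearson) correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlated_pair(n, mean1, mean2, sig1, sig2, rho, rng):
    """Same formulas as the DATA step: y2 reuses r1, which induces the correlation."""
    y1, y2 = [], []
    for _ in range(n):
        r1, r2 = rng.gauss(0, 1), rng.gauss(0, 1)
        y1.append(mean1 + sig1 * r1)
        y2.append(mean2 + rho * sig2 * r1
                  + math.sqrt(sig2**2 - sig2**2 * rho**2) * r2)
    return y1, y2


rng = random.Random(1245)  # arbitrary seed, for reproducibility only
results = {}
for rho in (-0.9, 0.0, 0.5, 0.9):
    y1, y2 = correlated_pair(5000, 0, 10, 2, 5, rho, rng)
    results[rho] = pearson(y1, y2)
    print(f"target rho = {rho:+.1f}   sample r = {results[rho]:+.3f}")
```

The sample correlations should land close to each target, with the spread shrinking as the sample size grows.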


Generating Correlated Random Variables using Proc IML

• To generate more than two correlated random variables, it is easier to use the Cholesky decomposition method in Proc IML.

• IML = Interactive Matrix Language
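The idea behind the Cholesky method: factor the variance-covariance matrix as C = U'U with U upper triangular, then multiply a row z of independent standard normals by U, so that mean + zU has covariance C. Below is a minimal plain-Python sketch of that idea for any number of variables, assuming C is symmetric positive definite. The third variable (mean 5, variance 1, covariances 1 and 2 with y1 and y2) is hypothetical, added only to show a three-variable case; `cholesky_upper`, `simulate` and `sample_cov` are illustrative names. Like SAS/IML's root and R's chol, `cholesky_upper` returns the upper-triangular factor.

```python
import math
import random


def cholesky_upper(c):
    """Upper-triangular U with U^T U = c, for symmetric positive-definite c
    (Cholesky-Banachiewicz, filled row by row)."""
    n = len(c)
    u = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            s = c[i][j] - sum(u[k][i] * u[k][j] for k in range(i))
            if i == j:
                if s <= 0:
                    raise ValueError("covariance matrix is not positive definite")
                u[i][i] = math.sqrt(s)
            else:
                u[i][j] = s / u[i][i]
    return u


def simulate(means, cov, n, rng):
    """Rows of mean + z*U, where z holds independent standard normal draws."""
    u = cholesky_upper(cov)
    p = len(means)
    rows = []
    for _ in range(n):
        z = [rng.gauss(0, 1) for _ in range(p)]
        # (z U)_j only needs k <= j because U is upper triangular
        rows.append([means[j] + sum(z[k] * u[k][j] for k in range(j + 1))
                     for j in range(p)])
    return rows


def sample_cov(rows):
    """Sample covariance matrix (divisor n - 1) of a list of equal-length rows."""
    n, p = len(rows), len(rows[0])
    m = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[sum((r[a] - m[a]) * (r[b] - m[b]) for r in rows) / (n - 1)
             for b in range(p)] for a in range(p)]


# First two variables as on the slides; the third is hypothetical
means = [0, 10, 5]
cov = [[4, 9, 1],
       [9, 25, 2],
       [1, 2, 1]]

data = simulate(means, cov, 20000, random.Random(1245))
est = sample_cov(data)
for row in est:
    print(" ".join(f"{v:8.3f}" for v in row))
```

The printed sample covariance matrix should be close to `cov`; a matrix that is not positive definite raises an error instead of producing invalid draws.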


Generating Correlated Random Variables using Proc IML

proc iml;
  use bivariate_final;
  read all var {r1} into x3;            /* standard normal draws from the DATA step */
  read all var {r2} into x4;
  read all var {mean1} into mean1;
  read all var {mean2} into mean2;

  x = {4 9, 9 25};                      /* C */
  labels = {y1 y2};                     /* row/column labels used when printing x */
  mattrib x rowname=labels colname=labels;

  Cholesky_decomp = root(x);            /* U */

  matrix_con = x3 || x4;
  mean = mean1 || mean2;

  final_simulated = mean + matrix_con * Cholesky_decomp;   /* RC */
  varnames = {y3 y4};
  create Cholesky_correlation from final_simulated[colname=varnames];
  append from final_simulated;
  close Cholesky_correlation;
quit;
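With C = {4 9, 9 25}, the Cholesky factor is U = {2 4.5, 0 √4.75}, and its entries are exactly the DATA step coefficients: sig1 = 2, rho*sig2 = 4.5 and sqrt(sig2**2 - sig2**2*rho**2) = √4.75. So, given the same r1 and r2, the Proc IML step should reproduce y1 and y2. The sketch below checks this in plain Python, using a closed-form 2×2 Cholesky factor as a stand-in for root(); it is an illustration, not SAS code, and the seed is arbitrary.

```python
import math
import random

mean1, mean2, sig1, sig2, rho = 0, 10, 2, 5, 0.90
c = [[sig1**2, rho * sig1 * sig2],
     [rho * sig1 * sig2, sig2**2]]  # the matrix C = {4 9, 9 25}

# Closed-form upper-triangular Cholesky factor of a 2x2 matrix (U^T U = C)
u11 = math.sqrt(c[0][0])
u12 = c[0][1] / u11
u22 = math.sqrt(c[1][1] - u12**2)

rng = random.Random(2923)
max_diff = 0.0
for _ in range(100):
    r1, r2 = rng.gauss(0, 1), rng.gauss(0, 1)
    # Matrix route: [y3 y4] = [mean1 mean2] + [r1 r2] * U
    y3 = mean1 + r1 * u11
    y4 = mean2 + r1 * u12 + r2 * u22
    # DATA step route
    y1 = mean1 + sig1 * r1
    y2 = mean2 + rho * sig2 * r1 + math.sqrt(sig2**2 - sig2**2 * rho**2) * r2
    max_diff = max(max_diff, abs(y3 - y1), abs(y4 - y2))

print(f"U = [[{u11}, {u12}], [0, {u22:.4f}]]")
print(f"largest difference between the two routes: {max_diff:.2e}")
```

Any difference between the two routes is floating-point rounding only; for two variables the DATA step formula is the Cholesky method written out by hand.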

Notes on the Proc IML code:

• use is similar to set in a DATA step; the read statements read in the simulated data (r1 and r2) and the means.
• x = {4 9, 9 25} is the variance-covariance matrix C: the variances are 2² = 4 and 5² = 25, and the covariance is rho × sig1 × sig2 = 0.9 × 2 × 5 = 9.
• root(x) applies Cholesky's decomposition, returning the upper-triangular matrix U such that U'U = C.
• The || operator concatenates the variables side by side (x3||x4 and mean1||mean2).
• mean + matrix_con * Cholesky_decomp produces the correlated variables (RC).
• create and append output the correlated variables to the data set Cholesky_correlation as y3 and y4.


References

• Generating Multivariate Normal Data by Using Proc IML. Lingling Han, University of Georgia, Athens, GA.


Appendix

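• Correlation coefficient (sample Pearson): $r = \dfrac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$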


R Code - Generating Correlated Random Variables

mean1 = 0
mean2 = 10
sig1 = 2
sig2 = 5
rho = 0.9

r1 = rnorm(100, 0, 1)
r2 = rnorm(100, 0, 1)

y1 = mean1 + sig1*r1
y2 = mean2 + rho*sig2*r1 + sqrt(sig2^2 - sig2^2*rho^2)*r2


R Code - Generating Correlated Random Variables using Matrices

C = matrix(c(4, 9, 9, 25), nrow = 2, ncol = 2)   # variance-covariance matrix
cholc = chol(C)                                    # upper-triangular U with t(U) %*% U = C
R = matrix(c(r1, r2), nrow = 100, ncol = 2, byrow = FALSE)
mean = matrix(c(mean1, mean2), nrow = 100, ncol = 2, byrow = TRUE)
RC = mean + R %*% cholc

This reuses the values of r1 and r2 generated by the earlier R code.