A Shrinkage Regression Approach to Tackle the HLA Region

Charlotte Vignal, Variable Selection Workshop, Vienna, July 26th 2008


Page 1

A Shrinkage Regression Approach to Tackle the HLA Region

Charlotte Vignal

Variable Selection Workshop

Vienna, July 26th 2008

Page 2

Outline

Overview of the HLA system and the challenge of analysing data from the HLA region

Multivariate association test using a Bayesian-inspired shrinkage regression approach

Application to the rheumatoid arthritis case-control study

Conclusion

Page 3

The Human Leukocyte Antigen System

• The major histocompatibility complex (MHC) is a genomic region found in almost all vertebrates; its gene composition and arrangement vary between species (see figure)

• In humans, the MHC is the HLA system

• A set of genes encoding proteins essential to the immune response

• Major role in histocompatibility and protection against pathogens

[Figure: MHC gene organisation in mouse, rat, chimpanzee and human (Kelley et al. Immunogenetics (2005))]

Page 4

The Challenge

Susceptibility to many complex disorders maps to the HLA region

The high degree of correlation within the region hampers the identification of causal variants

Widely used approaches test the effect of one genetic variable at a time

Methods are required that can detect (possibly multiple) causal variants in highly correlated data

Page 5

Multi-SNP Methods can be more Powerful than Single-SNP Analyses

Multivariate logistic regression
  – Problematic when the number of variables greatly exceeds the number of observations (nVars >> nObs)
  – Stepwise procedures can be unstable in the presence of many highly correlated terms

Shrinkage method using Bayesian logistic regression
  – A variable selection approach
  – Based on the Least Absolute Shrinkage and Selection Operator (LASSO) approach (Tibshirani 1996)
  – Fast implementation using the Bayesian Binary Regression (BBR) software for text-categorisation analysis (Genkin et al. 2004, http://www.stat.rutgers.edu/~madigan/BBR)
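
For a logistic model, the LASSO idea can be written as a penalised log-likelihood. The notation here is an assumption for illustration and is not given on the slide: y_i = 1 for a case and 0 for a control, x_ij is the coded value of variable j for subject i, and the intercept β0 is left unpenalised.

```latex
\hat{\beta}_0,\ \hat{\beta}
  = \arg\max_{\beta_0,\,\beta}
    \left\{ \sum_{i=1}^{n} \Big[ y_i \eta_i - \log\!\big(1 + e^{\eta_i}\big) \Big]
    - \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert \right\},
\qquad
\eta_i = \beta_0 + \sum_{j=1}^{p} x_{ij} \beta_j
```

With λ = 0 this reduces to ordinary multivariate logistic regression. Larger λ sets more coefficients exactly to zero, which helps when nVars >> nObs. If each βj is given a Laplace prior with density (λ/2)·exp(−λ|βj|), the log-posterior equals this objective up to a constant, so the posterior mode coincides with the LASSO estimate.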

Page 6

Bayesian Logistic Regression for Variable Selection

Each coefficient βj has a Laplace prior distribution with mode 0 and prior variance ν = 2/λ², where λ is the penalty factor
  – Mode 0 encodes a prior belief of no effect
  – The prior variance determines the strength of this belief and hence the sparseness of the fitted model

The maximum a posteriori (posterior mode) estimates of βj are often exactly zero or else shrunk towards zero

Terms with non-zero βj estimates are included in the final model and treated as significant

The estimated value of βj gives a (shrunk) measure of effect size
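
A one-coefficient toy case shows the "zero or shrunk" behaviour concretely. The slide does not make the following assumption: suppose the log-likelihood for βj is roughly Gaussian around an unpenalised estimate z with standard error s. Adding the log of the Laplace prior (λ/2)·exp(−λ|βj|), whose variance is 2/λ², gives a posterior mode that is a soft-thresholded z. It is exactly zero when |z| ≤ λs², and otherwise moved towards zero by λs². The function names and numbers below are hypothetical.

```python
import math


def laplace_prior_variance(lam: float) -> float:
    """Variance of a Laplace prior with density (lam / 2) * exp(-lam * |b|)."""
    return 2.0 / lam ** 2


def laplace_map_estimate(z: float, se: float, lam: float) -> float:
    """Posterior mode of one coefficient under a Laplace prior with rate lam,
    ASSUMING a Gaussian log-likelihood centred on z with standard error se.

    Maximising -(b - z)**2 / (2 * se**2) - lam * |b| gives soft-thresholding:
    exactly zero if |z| <= lam * se**2, otherwise shrunk towards zero by lam * se**2.
    """
    threshold = lam * se ** 2
    return math.copysign(max(abs(z) - threshold, 0.0), z)


if __name__ == "__main__":
    lam = 62.0  # illustrative value (the penalty chosen later in the talk)
    print("prior variance:", laplace_prior_variance(lam))
    for z in (0.5, 1.2, -1.2):  # hypothetical unpenalised estimates, se = 0.1
        print(z, "->", laplace_map_estimate(z, se=0.1, lam=lam))
```

Here the threshold is 62 × 0.1² = 0.62. The weak estimate (0.5) is set to zero, and the two larger ones keep their sign but lose 0.62 in magnitude. This is the bias towards zero noted on the next slide.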

Page 7

The Density of the Laplace Distribution

[Figure: Laplace density p(x) plotted against x from −10 to 10 for two parameter settings, labelled (0, 1) and (0, 0.5)]

! Effect size estimates are biased towards zero; over-shrinking true effects can cause non-causal correlated variables to be retained
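
The plotted density can be written in the λ parameterisation used for the prior variance ν = 2/λ² as p(x) = (λ/2)·exp(−λ|x|). The sketch below is an illustration and not the code behind the figure. It evaluates the density, checks numerically that the variance is 2/λ², and shows that a larger λ puts more prior mass near zero. The function names and the integration grid are assumptions.

```python
import math


def laplace_pdf(x: float, lam: float) -> float:
    """Laplace density with mode 0 and rate lam: (lam / 2) * exp(-lam * |x|)."""
    return 0.5 * lam * math.exp(-lam * abs(x))


def numeric_variance(lam: float, half_width: float = 60.0, steps: int = 200_000) -> float:
    """Approximate the integral of x**2 * p(x) by a midpoint rule on [-half_width, half_width]."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps):
        x = -half_width + (k + 0.5) * dx
        total += x * x * laplace_pdf(x, lam) * dx
    return total


def prob_within(c: float, lam: float) -> float:
    """Closed-form P(|X| <= c) for the Laplace density above."""
    return 1.0 - math.exp(-lam * c)


if __name__ == "__main__":
    for lam in (1.0, 0.5):
        print(f"lam={lam}: p(0)={laplace_pdf(0.0, lam)}, "
              f"numeric var={numeric_variance(lam):.5f}, 2/lam^2={2.0 / lam ** 2}, "
              f"P(|X|<=1)={prob_within(1.0, lam):.3f}")
```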

Page 8

Application: The Rheumatoid Arthritis Dataset

RA is an autoimmune disease and a complex disorder
  – Estimated genetic contribution of ~30-50%
  – The HLA region is strongly implicated in RA susceptibility
  – Genetic associations have been reported with a biomarker called the shared epitope (SE), defined by a class of alleles at HLA-DRB1
  – The mechanism by which RA is determined is still unknown

Is the SE association the only HLA effect predisposing to RA?

The subjects: 842 RA cases and 957 controls (774 cases and 945 controls with no missing data were analysed)

The independent variables:
  – 2,302 genetic markers, each treated as a continuous variable coded 0, 1 or 2 according to the number of allele copies
  – The shared epitope, treated as a continuous variable coded 0, 1 or 2 according to the number of shared-epitope-positive (SE+) alleles
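
The additive 0/1/2 coding can be sketched as follows. The slides do not say how genotypes were stored, so the genotype string format ("A/G"), the choice of counted allele and the function name are all hypothetical. Missing calls return None, which mirrors dropping subjects with missing data before analysis.

```python
from typing import Optional


def allele_count(genotype: Optional[str], counted_allele: str) -> Optional[int]:
    """Code a genotype such as 'A/G' as 0, 1 or 2 copies of counted_allele.

    Returns None for a missing call (None or empty string) so that subjects
    with missing data can be excluded before model fitting.
    """
    if not genotype:
        return None
    alleles = genotype.split("/")
    if len(alleles) != 2:
        raise ValueError(f"expected two alleles separated by '/', got {genotype!r}")
    return sum(1 for allele in alleles if allele == counted_allele)


if __name__ == "__main__":
    calls = ["A/A", "A/G", "G/G", None]
    print([allele_count(g, "G") for g in calls])  # additive coding of the G allele
    print(allele_count("SE+/SE-", "SE+"))          # same idea for SE+ allele copies
```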

Page 9

The Effect of Shared Epitope on RA

Effect        Comparison              Wald P     OR [95% CI]
SE carriage   SE+ vs. SE-             < 0.0001   5.1 [4.1; 6.3]
SE+ copies    1 copy vs. 0 copies     < 0.0001   3.7 [2.9; 4.6]
              2 copies vs. 1 copy                3.2 [2.4; 4.3]
              2 copies vs. 0 copies              11.8 [8.6; 16.1]

The presence of SE is strongly associated with RA

The risk of RA increases with the number of SE+ allele copies

The objective: to investigate the presence of additional causal variants in the HLA region, possibly correlated with SE
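
The three copy-number odds ratios are consistent with one another. Going from 0 to 2 copies is the same as going from 0 to 1 and then from 1 to 2, so the odds ratios multiply (equivalently, the log-odds ratios add). A quick check with the point estimates from the table:

```python
import math

# Point estimates copied from the table above
or_1_vs_0 = 3.7
or_2_vs_1 = 3.2
or_2_vs_0_reported = 11.8

# Odds ratios for successive comparisons multiply; log-odds ratios add.
or_2_vs_0_implied = or_1_vs_0 * or_2_vs_1
log_or_sum = math.log(or_1_vs_0) + math.log(or_2_vs_1)

print(f"implied OR (2 vs 0 copies):  {or_2_vs_0_implied:.2f}")
print(f"reported OR (2 vs 0 copies): {or_2_vs_0_reported}")
print(f"exp(sum of log-ORs):         {math.exp(log_or_sum):.2f}")
```

The product 3.7 × 3.2 = 11.84 agrees with the reported 11.8 up to rounding. The two per-copy ORs are also of similar size, which roughly fits coding SE as an additive 0/1/2 term.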

Page 10

Specification of the Penalty λ

• Cases and controls permuted 100 times for each λ within each SE group (i.e. the SE effect is retained)

• SE (additive term) included in each model

• λ selected if the number of false positives per model is < 1

λ = 62 was selected for further analyses
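
Permuting case/control status within each SE group keeps the SE–disease association while breaking any association with the other markers. Below is a minimal NumPy sketch of that permutation step only. The function name, toy data and seed are illustrative assumptions, and the model fit and false-positive count for each λ are left out.

```python
import numpy as np


def permute_within_strata(status: np.ndarray, strata: np.ndarray,
                          rng: np.random.Generator) -> np.ndarray:
    """Shuffle case/control labels separately within each stratum (here: SE copy number).

    The number of cases in every stratum is unchanged, so the SE effect is
    retained, while labels are reassigned at random among subjects who share
    the same SE value.
    """
    permuted = status.copy()
    for level in np.unique(strata):
        idx = np.flatnonzero(strata == level)
        permuted[idx] = rng.permutation(status[idx])
    return permuted


if __name__ == "__main__":
    rng = np.random.default_rng(2008)
    se = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])      # SE+ allele copies (toy data)
    status = np.array([0, 0, 1, 0, 1, 1, 1, 1, 1, 0])  # 1 = case, 0 = control (toy data)
    for _ in range(3):
        print(permute_within_strata(status, se, rng))
```

In the procedure described on the slide, this step would run 100 times per candidate λ. Each permuted dataset would be refitted with SE included, and the number of non-SE variables selected would be counted. λ is accepted when the false positives per model fall below 1.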

Page 11

The Effect of Shrinking a True Effect

[Figure: R² between each genetic variable and SE across the HLA region; the genetic variables selected by BLR in addition to SE are shown in blue]

Three of the selected variables are correlated with SE

Shrinking a known effect may cause correlated SNPs to be selected

Page 12

The Effect of Shrinkage on True Effects

To investigate the effect of shrinkage, SE was included twice (SE and SEfake) in the model:

When both SE and SEfake are shrunk, both variables are retained
  – Shrinking a known effect may cause correlated SNPs to be selected

When SE is not shrunk, only SE is retained
  – Correlated SNPs could be eliminated

The shrinkage penalty was therefore not applied to SE in subsequent analyses (λ = 0 for SE)
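
The SE/SEfake experiment can be imitated on simulated data. This sketch is not the BBR implementation. It substitutes plain proximal-gradient descent (ISTA) for L1-penalised logistic regression and adds a per-variable penalty weight, so that a term can be left unshrunk (weight 0, i.e. λ = 0 for that term). The allele frequency, effect size, sample size, λ and all function names are arbitrary assumptions.

```python
import numpy as np


def l1_logistic(X, y, lam, penalty_weights, n_iter=5000, tol=1e-10):
    """Posterior mode of logistic regression with independent Laplace priors
    (rate lam * w_j; w_j = 0 means a flat prior, i.e. no shrinkage).

    Minimises -loglik(b0, b) + lam * sum_j w_j * |b_j| by proximal gradient
    descent. The intercept b0 is never penalised.
    """
    n, p = X.shape
    Xa = np.column_stack([np.ones(n), X])              # prepend intercept column
    w = np.concatenate([[0.0], np.asarray(penalty_weights, dtype=float)])
    step = 4.0 / np.linalg.norm(Xa, 2) ** 2             # 1 / Lipschitz constant of the gradient
    beta = np.zeros(p + 1)
    for _ in range(n_iter):
        prob = 1.0 / (1.0 + np.exp(-(Xa @ beta)))
        grad = Xa.T @ (prob - y)                        # gradient of the negative log-likelihood
        z = beta - step * grad
        new = np.sign(z) * np.maximum(np.abs(z) - step * lam * w, 0.0)  # soft-threshold
        converged = np.max(np.abs(new - beta)) < tol
        beta = new
        if converged:
            break
    return beta[0], beta[1:]


def simulate(n=1000, seed=0):
    """Toy data: SE copies ~ Binomial(2, 0.3), log-odds increase of 1.0 per copy."""
    rng = np.random.default_rng(seed)
    se = rng.binomial(2, 0.3, size=n).astype(float)
    prob = 1.0 / (1.0 + np.exp(-(-1.0 + 1.0 * se)))
    y = (rng.random(n) < prob).astype(float)
    return se, y


if __name__ == "__main__":
    se, y = simulate()
    X = np.column_stack([se, se])                       # SE and an exact copy, SEfake
    _, both_shrunk = l1_logistic(X, y, lam=5.0, penalty_weights=[1.0, 1.0])
    _, se_unshrunk = l1_logistic(X, y, lam=5.0, penalty_weights=[0.0, 1.0])
    print("both penalised   (SE, SEfake):", both_shrunk)
    print("SE not penalised (SE, SEfake):", se_unshrunk)
```

When both copies are penalised, the effect should be split equally and both coefficients stay non-zero. When SE is left unshrunk, SE should absorb the whole effect and SEfake is set exactly to zero. This mirrors the pattern described on the slide.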

Page 13

BLR and Correlated Data

Can the BLR approach distinguish true effects from spurious associations in the presence of correlation?

Four variables correlated with SE were used to evaluate error rates and power

The records of each variable were redistributed between cases and controls to achieve different OR sizes while maintaining the correlation with SE

Error rate and power were assessed by permuting cases and controls:
  – Error rate: frequency of variables selected beyond SE and the simulated correlated variables over 100 permutations
  – Power: frequency with which the simulated variables are selected over 100 permutations
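
Both definitions become simple frequencies over the permutation replicates. Here each replicate is represented by the set of variable names selected in that run. The names ("SE", "sim1", ...) and the toy input are hypothetical. The error rate is computed as the mean number of extra selections per replicate. That is one reading of "frequency of the variables selected" and matches the "false positives per analysis" scale used later, but it is an assumption.

```python
from typing import Iterable, List, Set


def power(selections: List[Set[str]], simulated: str) -> float:
    """Fraction of replicates in which the simulated variable was selected."""
    return sum(simulated in chosen for chosen in selections) / len(selections)


def false_positives_per_replicate(selections: List[Set[str]], known: Iterable[str]) -> float:
    """Mean number of selected variables other than SE and the simulated ones."""
    known_set = set(known)
    return sum(len(chosen - known_set) for chosen in selections) / len(selections)


if __name__ == "__main__":
    # Toy output of four replicates (the analysis above used 100 permutations)
    runs = [
        {"SE", "sim1", "snp010"},
        {"SE", "sim1"},
        {"SE", "snp200", "snp301"},
        {"SE", "sim1"},
    ]
    print(power(runs, "sim1"))                                  # selected in 3 of 4 runs
    print(false_positives_per_replicate(runs, {"SE", "sim1"}))  # (1 + 0 + 2 + 0) / 4
```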

Page 14

Power

• Selection of simulated variables correlated with SE

Variables moderately correlated with SE were selected if OR > 2; variables highly correlated with SE were selected if OR > 5

Page 15

Error Rate

• Selection of simulated variables correlated with SE

Under the null, 1 false positive is expected per analysis (λ = 62); the analysis generated 1 to 2 false positives per analysis

Page 16

ATT-BLR Results Comparison

• Data were analysed with the Armitage Trend Test (ATT) and BLR

• With λ = 62, BLR identified 10 SNPs

• Single-point analysis using ATT identified 109 associated SNPs at α = 4.34e-04 = 1/2302

• Variables selected by BLR are not correlated with SE (R² ≤ 0.05)

SNP (DE)   P ATT-adj   R² (SNP, SE)
SE         4.2e-61
snp292     1.9e-6      0.03
snp576     2.6e-6      0.02
snp271     3.2e-5      0.02
snp645     9.6e-5      8.5e-6
snp068     2.2e-5      0.04
snp384     2.4e-5      0.002
snp465     9.7e-6      0.03
snp156     2.3e-5      0.001
snp225     3.1e-6      0.05

DE: double-exponential (Laplace) prior
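
For comparison, the single-point Armitage trend test can be computed from a 2×3 table of genotype counts. This sketch uses the standard Cochran-Armitage statistic with additive scores (0, 1, 2) and a two-sided normal P-value. It is not the adjusted test behind the "P ATT-adj" column, since the adjustment is not described on the slide. It also evaluates the threshold α = 1/2302. The function name and toy counts are illustrative.

```python
import math
from typing import Sequence, Tuple


def armitage_trend_test(cases: Sequence[int], controls: Sequence[int],
                        scores: Sequence[float] = (0.0, 1.0, 2.0)) -> Tuple[float, float]:
    """Cochran-Armitage trend test for a 2 x k table of genotype counts.

    cases[i] and controls[i] count subjects with genotype i; scores are the
    additive weights (0, 1, 2 allele copies by default).
    Returns (z statistic, two-sided P-value).
    """
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]
    k = len(n)
    T = sum(t * (r * S - s * R) for t, r, s in zip(scores, cases, controls))
    var = sum(scores[i] ** 2 * n[i] * (N - n[i]) for i in range(k))
    var -= 2.0 * sum(scores[i] * scores[j] * n[i] * n[j]
                     for i in range(k) for j in range(i + 1, k))
    var *= R * S / N
    z = T / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail probability
    return z, p


if __name__ == "__main__":
    alpha = 1.0 / 2302                      # threshold used on this slide (about 4.34e-04)
    print(f"alpha = {alpha:.3g}")
    z, p = armitage_trend_test(cases=[10, 20, 30], controls=[30, 20, 10])
    print(z, p, p < alpha)
```

As a sanity check, z² for this toy table is 20. That equals N·r², where r is the correlation between genotype score and case status (N = 120, r² = 1/6).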

Page 17

Additional Analysis: The NEG Distribution

• Data re-analysed using the normal-exponential-gamma (NEG) prior, with parameters set so that 1 false positive is expected per model (Hoggart et al. PLoS (2008))

! NEG has heavier tails, allowing sparser solutions

Page 18

Additional Analysis: The NEG Distribution

NEG identified 4 variables, three of which (snp271, snp384, snp545) were also retained with the DE prior

Variables identified with the NEG prior are less correlated among themselves and with SE than those selected using DE

Three of the selected variables are in genes or regions reported to contribute to RA susceptibility: BAT1 and HLA-DQA1/DQB1

Page 19

Conclusions

BLR appears to perform better than single-point association analysis (ATT) when data are correlated
  – Computationally efficient
  – Identifies fewer positive results (10 vs. 109)
  – Correlation might be handled more effectively

Simulation analyses confirm reasonable power and error rate

Three variables identified with both the DE and NEG priors lie in genes previously implicated in RA

Results suggest the presence of independent RA-associated effects in the HLA region

Page 20

Acknowledgements

David Balding, Imperial College, UK
Clive Hoggart, Imperial College, UK
Aruna Bansal, GSK, UK
The Genetics Division at GSK