Sampling Design in Regional Fine Mapping of a Quantitative Trait Shelley B. Bull, Lunenfeld-Tanenbaum Research Institute, & Dalla Lana School of Public Health, University of Toronto Banff International Research Station Emerging Statistical Challenges and Methods Session 7: GWAS and Beyond II 25 June 2014 Co-authors: Zhijian Chen and Radu Craiu Lunenfeld-Tanenbaum Research Institute & University of Toronto


Page 1: Sampling Design in Regional Fine Mapping of a Quantitative Trait

Shelley B. Bull, Lunenfeld-Tanenbaum Research Institute, & Dalla Lana School of Public Health, University of Toronto

Banff International Research Station

Emerging Statistical Challenges and Methods

Session 7: GWAS and Beyond II

25 June 2014

Co-authors: Zhijian Chen and Radu Craiu Lunenfeld-Tanenbaum Research Institute & University of Toronto

Page 2: Overview

Setting

Studies designed to follow up associations detected in a GWAS

Fine-mapping of a candidate region by sequencing

Aim to identify a functional sequence variant

Approach

Phase I: Quantitative trait with GWAS data (e.g. N = 5000)

Phase II: Two-stage design

Stage 1 sample (n1) – expensive sequencing to identify a smaller set of promising variants

Stage 2 sample (n2) – cost-effective genotyping of selected variants in an independent group

Stratification in Stage 1 according to a promising GWAS tag SNP

Bayesian analysis in Stage 1, incorporating genetic model selection

Page 3: Two-phase Two-stage Design

Page 4: Background

Two-phase designs +/- stratification on tag SNP: Chen et al (2012), Schaid et al (2013), Thomas et al (2013). Earlier: case-cohort designs

Two-stage designs: Skol et al (2007), Thomas et al (2009), Stanhope & Skol (2012)

Bayesian approaches to genetic association: Stephens & Balding (2009), Wakefield (2009), WTCCC/Maller et al (2012)

Genetic model (mis)specification: Joo et al (2010), Spencer et al (2011), Vukcevic et al (2011), Faye et al (2013)

Page 5: Sampling Designs & Sample Allocation

Based on tag SNP genotype (AA, Aa, aa) from the GWAS:

(1) Simple random sampling (SRS) – ignores tag SNP information

(2) Equal (ES) number from each stratum

(3) Oversampled homozygous (HO) – number larger than under SRS

Example: N=5000, MAF=0.2
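
As a rough illustration of the example above, the expected tag-SNP stratum sizes and the SRS and ES allocations can be computed as below. This is only a sketch: it assumes Hardy-Weinberg proportions for the tag SNP with `a` as the minor allele, the function names are hypothetical, and the Stage 1 size passed in is an input rather than a value taken from this slide. The HO allocation is left out because its exact rule is not given here.

```python
def hwe_stratum_sizes(n_total, maf):
    """Expected genotype counts (AA, Aa, aa), 'a' being the minor allele.

    Assumption of this sketch: Hardy-Weinberg proportions hold for the tag SNP.
    """
    p, q = 1.0 - maf, maf
    freqs = {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}
    return {g: n_total * f for g, f in freqs.items()}


def srs_allocation(strata, n1):
    """Expected Stage 1 counts per stratum under simple random sampling."""
    total = sum(strata.values())
    return {g: n1 * size / total for g, size in strata.items()}


def es_allocation(strata, n1):
    """Equal number from each stratum, capped at the stratum size."""
    per_stratum = n1 / len(strata)
    return {g: min(per_stratum, size) for g, size in strata.items()}


if __name__ == "__main__":
    strata = hwe_stratum_sizes(5000, 0.2)
    for name, alloc in (("SRS", srs_allocation(strata, 600)),
                        ("ES", es_allocation(strata, 600))):
        print(name, {g: round(v) for g, v in alloc.items()})
```

With N = 5000 and MAF = 0.2, the rare homozygote stratum is expected to hold about 200 individuals, so an equal allocation draws a far larger share of that stratum than SRS does.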

Page 6: Quantitative Trait Model

QT Model Parameters: θ = (β0, β1, σ²)

Genetic Models: M1= additive, M2= dominant, M3= recessive
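
The three genetic models differ only in how a genotype is turned into the covariate of the trait regression. The sketch below uses the conventional codings of the minor-allele count (0, 1, 2); the slide's own coding is not shown in this transcript, so the function name and the exact 0/1 scaling are assumptions.

```python
def code_genotype(genotypes, model):
    """Map minor-allele counts (0, 1, 2) to the regression covariate X.

    Conventional codings, assumed here:
      additive  -> allele count itself
      dominant  -> 1 if at least one minor allele, else 0
      recessive -> 1 if two minor alleles, else 0
    """
    if model == "additive":
        return [float(g) for g in genotypes]
    if model == "dominant":
        return [1.0 if g >= 1 else 0.0 for g in genotypes]
    if model == "recessive":
        return [1.0 if g == 2 else 0.0 for g in genotypes]
    raise ValueError(f"unknown genetic model: {model}")


if __name__ == "__main__":
    for model in ("additive", "dominant", "recessive"):
        print(model, code_genotype([0, 1, 2], model))
```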

Page 7: Bayesian Inference: Stage 1 sample

• Specify priors for the genetic models and the regression parameters:

p(Mj) = ⅓, p(θ | Mj) = p(θ), p(θ) = p(β0, β1 | σ²) p(σ²), with a normal-inverse-gamma (NIG) form

• Derive model-specific posterior for the regression parameters for a functional sequence variant – analytic when prior is NIG

• Select a genetic model for each seq variant according to the posterior probability wj = p(Mj | data )

• Given selection of a genetic model, compare all seq variants in the region by computing the posterior probability that variant k is functional given all the data, and rank them (the Bayes factor)

p(1) ≥ p(2) ≥ … ≥ p(m)

• Construct a 95% credible set that includes the top-ranked variants such that

p(1) + p(2) + … + p(k) ≥ 0.95 for minimum k
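
The ranking and cumulative-probability rule above can be sketched as follows. The function names are hypothetical. The helper that turns per-variant log Bayes factors into posterior probabilities assumes exactly one functional variant in the region and equal prior probability for every variant; that normalisation is an assumption of this sketch and is not stated on the slide.

```python
import math


def posterior_from_log_bf(log_bfs):
    """Normalise per-variant log Bayes factors into posterior probabilities.

    Assumes one functional variant and equal priors across variants
    (an assumption of this sketch).
    """
    largest = max(log_bfs)
    weights = [math.exp(lb - largest) for lb in log_bfs]  # stable exponentiation
    total = sum(weights)
    return [w / total for w in weights]


def credible_set(post_probs, level=0.95):
    """Indices of the smallest top-ranked set with cumulative probability >= level."""
    order = sorted(range(len(post_probs)),
                   key=lambda k: post_probs[k], reverse=True)
    chosen, cumulative = [], 0.0
    for k in order:
        chosen.append(k)
        cumulative += post_probs[k]
        if cumulative >= level:
            break
    return chosen


if __name__ == "__main__":
    probs = [0.02, 0.70, 0.20, 0.08]
    print(credible_set(probs))
```

The size of the returned list plays the role of m2, the number of variants carried forward to Stage 2 genotyping.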

Page 8: Criteria for a Good Design

Higher probability that the correct genetic model is identified for the sequence variant

Fewer sequence variants selected into the credible set (number and %) (cost)

Higher probability that the functional sequence variant is selected into the credible set (power)

Higher probability that the functional sequence variant is top ranked in the credible set

Page 9: Simulation Design (APOE gene region, 1KG)

Quantitative trait model is

Y = β0 + β1 X + γ 1(X=1) + ϵ

Parameters specified by β0 = 5, β1 = 0.25, σ² = 0.1, 0.5, 1.5, giving σ/β1 = 1.3, 2.8, 4.9
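
The σ/β1 values follow directly from the listed σ² and β1, and the trait model can be simulated in a few lines. The sketch below is illustrative only: it takes X to be the minor-allele count (0, 1, 2), draws genotypes independently under Hardy-Weinberg proportions rather than from the 1KG APOE region, and sets γ to 0, β1 and −β1 for the additive, dominant and recessive models. Those γ values are one parameterisation that reproduces the three genotype-mean patterns; they are an assumption and are not given in this transcript.

```python
import math
import random

BETA0, BETA1 = 5.0, 0.25

# Assumed heterozygote offsets (not given in the transcript).
GAMMA = {"additive": 0.0, "dominant": BETA1, "recessive": -BETA1}


def noise_to_effect_ratio(sigma2, beta1=BETA1):
    """sigma / beta1 for a given residual variance."""
    return math.sqrt(sigma2) / beta1


def simulate_genotypes(n, maf, rng):
    """Minor-allele counts drawn as two independent alleles (HWE assumed)."""
    return [int(rng.random() < maf) + int(rng.random() < maf) for _ in range(n)]


def simulate_trait(genotypes, model, sigma2, rng):
    """Y = beta0 + beta1*X + gamma*1(X=1) + eps, eps ~ N(0, sigma2)."""
    gamma, sd = GAMMA[model], math.sqrt(sigma2)
    return [BETA0 + BETA1 * x + gamma * (1 if x == 1 else 0) + rng.gauss(0.0, sd)
            for x in genotypes]


if __name__ == "__main__":
    rng = random.Random(2014)
    print([round(noise_to_effect_ratio(s2), 1) for s2 in (0.1, 0.5, 1.5)])
    g = simulate_genotypes(5000, 0.2, rng)
    y = simulate_trait(g, "dominant", 0.5, rng)
    print(len(y), round(sum(y) / len(y), 2))
```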

Page 10: Simulation Results: Genetic model selection

Data simulated under additive, dominant and recessive genetic models.

The rate of selecting the true genetic model for the functional variant, using the strong criterion wj > 0.833.

Common seq variant (MAF=0.2)

1000 simulations

Designs: SRS (solid line), ES (dashed line), HO (dotted line)
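
The posterior model probabilities wj behind this selection rule can be sketched with the standard closed-form marginal likelihood of a linear regression under a conjugate normal-inverse-gamma prior. Everything specific below is an assumption for illustration: the hyperparameters `m0`, `V0`, `a0`, `b0`, the 0/1 codings of the dominant and recessive models, and the function names. For reference, wj > 0.833 corresponds to posterior odds of roughly 5 to 1 in favour of model j.

```python
import math

import numpy as np

# Conventional codings of the minor-allele count (assumed, see lead-in).
CODINGS = {
    "additive": lambda g: g,
    "dominant": lambda g: (g >= 1).astype(float),
    "recessive": lambda g: (g == 2).astype(float),
}


def nig_log_marginal(y, x, m0, V0, a0, b0):
    """log p(y | Mj) for y = beta0 + beta1*x + eps under the conjugate prior
    beta | sigma2 ~ N(m0, sigma2 * V0), sigma2 ~ Inverse-Gamma(a0, b0)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(x, dtype=float)])
    n = y.size
    V0_inv = np.linalg.inv(V0)
    Vn_inv = V0_inv + X.T @ X                      # posterior precision factor
    mn = np.linalg.solve(Vn_inv, V0_inv @ m0 + X.T @ y)
    an = a0 + n / 2.0
    bn = b0 + 0.5 * (y @ y + m0 @ V0_inv @ m0 - mn @ Vn_inv @ mn)
    _, logdet_V0 = np.linalg.slogdet(V0)
    _, logdet_Vn_inv = np.linalg.slogdet(Vn_inv)
    return (-0.5 * n * math.log(2.0 * math.pi)
            + 0.5 * (-logdet_Vn_inv - logdet_V0)   # 0.5 * log(|Vn| / |V0|)
            + a0 * math.log(b0) - an * math.log(bn)
            + math.lgamma(an) - math.lgamma(a0))


def model_posteriors(y, g, m0, V0, a0, b0):
    """w_j = p(Mj | data); the equal priors p(Mj) = 1/3 cancel on normalising."""
    g = np.asarray(g, dtype=float)
    log_ml = np.array([nig_log_marginal(y, code(g), m0, V0, a0, b0)
                       for code in CODINGS.values()])
    w = np.exp(log_ml - log_ml.max())
    w /= w.sum()
    return dict(zip(CODINGS, w))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    g = rng.binomial(2, 0.3, size=300)
    y = 5.0 + 1.0 * (g >= 1) + rng.normal(0.0, 0.5, size=300)
    w = model_posteriors(y, g, m0=np.zeros(2), V0=100.0 * np.eye(2), a0=1.0, b0=1.0)
    best = max(w, key=w.get)
    print(best, w[best] > 0.833)
```

Using the same prior for every coding mirrors the slide's p(θ | Mj) = p(θ); in practice the prior scale for β1 may deserve more thought, since the covariate has a different range under each coding.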

Page 11: Simulation Results: Size of the 95% credible set

Data simulated under additive, dominant and recessive genetic models.

Upper panels: common variant (MAF=0.2) with σ/β1=4.9 (m=201)

Lower panels: low frequency variant (MAF=0.02) with σ/β1=2.8 (m=332)

1000 simulations

Designs: SRS (solid line), ES (dashed line), HO (dotted line)

Page 12: Simulation Results: Selection of functional variant

Designs: SRS (solid line), ES (dashed line), HO (dotted line)

Data simulated under additive, dominant and recessive genetic models.

Upper panels: common variant (MAF=0.2) with σ/β1=4.9 (m=201)

Lower panels: low frequency variant (MAF=0.02) with σ/β1=2.8 (m=332)

1000 simulations

Page 13: Simulation Results: Functional variant top ranked

Designs: SRS (solid line), ES (dashed line), HO (dotted line)

Data simulated under additive, dominant and recessive genetic models.

Upper panels: common variant (MAF=0.2) with σ/β1=4.9 (m=201)

Lower panels: low frequency variant (MAF=0.02) with σ/β1=2.8 (m=332)

1000 simulations

Page 14: Simulation Results: Model selection

Data simulated under additive, dominant and recessive genetic models. Cases without model selection (no MS) are analysed under an additive model.

Common seq variant (MAF=0.2), σ/β1=4.9, n1=600, 1000 simulations

Page 15: Simulation Results: Cost Efficiency (CE)

A total of m sequence variants are identified in n1 individuals in stage 1, and a proportion q = m2 / m of them is genotyped in the remaining n2 = N − n1 individuals in stage 2.

Cost depends on c1, the stage 1 per individual sequencing cost, and on c2, the stage 2 per individual per marker genotyping cost.

CE is defined as “Power” / Cost, where “Power” is estimated by the probability that a functional variant falls within the 95% credible set

e.g. if N = 5000, n1=500, c1=$1000, n2=4500, m2=100, and c2=$0.50, then the total two-stage cost is $500,000 + $225,000 = $725,000 compared to a one-stage cost of $5 million.
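
The arithmetic in this example can be checked with a small sketch. The function names are hypothetical, and the power value in the usage lines is a placeholder for illustration, not a simulation result from the talk.

```python
def two_stage_cost(n_total, n1, c1, m2, c2):
    """Stage 1 sequencing of n1 individuals plus Stage 2 genotyping of
    m2 selected variants in the remaining n2 = n_total - n1 individuals."""
    n2 = n_total - n1
    return n1 * c1 + n2 * m2 * c2


def one_stage_cost(n_total, c1):
    """Cost of sequencing every individual."""
    return n_total * c1


def cost_efficiency(power, cost):
    """CE = "Power" / Cost, with "Power" the probability that the functional
    variant falls within the 95% credible set."""
    return power / cost


if __name__ == "__main__":
    two_stage = two_stage_cost(n_total=5000, n1=500, c1=1000, m2=100, c2=0.50)
    one_stage = one_stage_cost(n_total=5000, c1=1000)
    print(two_stage, one_stage)
    # Placeholder power of 0.9, purely for illustration.
    print(cost_efficiency(0.9, two_stage))
```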

Page 16: Comments and Discussion

• Incorporating Bayesian genetic model selection is worthwhile

• Selection of informative individuals for expensive data collection can be a useful strategy in statistical genetic design and analysis

• The simulations confirm the intuition that the efficiency of the tag-stratified sampling strategy increases with tag-seq correlation.

• Winner’s curse effects propagate from the GWAS, but are more complicated

• Cost-efficiency of a two-stage design depends on the relative costs of sequencing versus genotyping – will it remain practical?

• Analysis of the sequence data limited to low frequency and common variants – extensions to rare variants

• Other design options – trait-dependent sampling

• How to conduct joint Bayesian inference for stages 1 and 2?

Page 17: Acknowledgements

Co-Authors:

Zhijian Chen, STAGE Post-doctoral Fellow

Radu Craiu, Dept of Statistical Sciences

Thanks to Laura Faye and Andrew Paterson for helpful discussions, and to referees for improvements to the paper.

To appear in Genetic Epidemiology

Funding

Page 18: Thanks

Page 19: Simulation Results Summary

In stage 1, a total of m variants are sequenced in n1 = 500 individuals, with equal strata sampling (ES) and an additive genetic model.

Size is the number m2 of sequence SNPs in the 95% credible set (% or count). P(Select) is the probability the functional variant is selected into the credible set. P(Rank) is the probability the functional variant is top ranked in the credible set.

Page 20: GWAS Sample Size
