Bias in Studies of the Human Genome

Thomas A. Pearson, MD, PhD
University of Rochester School of Medicine
Visiting Scientist, NHGRI

Post on 18-Dec-2015

Lecture 6: Bias in Studies of the Human Genome

1. Consider the causes of heterogeneity of results in gene association studies.
2. Review the types and sources of bias relevant to human genomic research.
3. Provide examples from genome-wide association studies to illustrate biases or potential for bias.
4. Identify strategies in study design, data collection, statistical analysis, and interpretation which could prevent or minimize bias in human genome research.

Larson, G. The Complete Far Side. 2003.

PLoS Med. 2005 Aug;2(8):e124.

WSJ. 2004 Sep 14.

Only 6/600 Gene-Disease Associations Significant in >75% of Studies (Hirschhorn J et al, Genet Med 2002; 4:45-61)

Disease/Trait               Gene    Polymorphism     Frequency
DVT                         F5      Arg506Gln        0.015
Graves’ Disease             CTLA4   Thr17Ala         0.62
Type 1 DM                   INS     5’ VNTR          0.67
HIV/AIDS                    CCR5    32 bp Ins/Del    0.05-0.07
Alzheimer’s                 APOE    Epsilon 2/3/4    0.16-0.24
Creutzfeldt-Jakob Disease   PRNP    Met129Val        0.37

Possible Explanations of Heterogeneity of Results in Genetic Association Studies

• Biologic mechanisms
  – Genetic heterogeneity
  – Gene-gene interactions
  – Gene-environment interactions
• Spurious mechanisms
  – Inadequacies of genomic markers
  – Type 1 error
  – Limited sample sizes and power
  – Cohort, age, period (secular) effects
  – Bias

Definition of Bias in Human Research

• Sackett (1975): “Any process at any stage of inference which tends to produce results or conclusions that differ systematically from the truth.”

• Gordis (2004): “Any systematic error in the design, conduct, or analysis of a study that results in a mistaken estimate of an exposure’s effect on the risk of disease.”

Effects of Bias on GWAS Results

• False negatives

• False positives

• Inaccurate effect sizes
  – Underestimates
  – Overestimates

Larson, G. The Complete Far Side. 2003.

Types of Bias in Genome Association Studies

• Selection of cases and controls

• Information on genotype or phenotype

• Analysis and presentation of results

• Interpretation of results

20 Types of Biases Potentially Encountered in GWAS

• Common to all human observational studies (N=12)

• Unique or common in GWAS (N=8)
  – Supercase or supercontrol biases
  – Latent case bias
  – Population stratification
  – Hardy-Weinberg disequilibrium
  – Genotyping quality bias
  – Transmission disequilibrium bias
  – Winner’s Curse

Systematic Review of GWAS: NHGRI Catalog of GWAS in Print*

• 109 studies from 3/05 to 3/08.
• Genotyping platforms of density >100,000 SNPs
• Each study reviewed for:
  – Study design
  – Description of case and comparison groups
  – Collection of genotype and other risk factors
  – Presentation of study results
  – Interpretation of study results

*http://www.genome.gov/gwastudies/

Characteristics of 109 GWAS

• Phenotypes
  – Discrete outcomes or traits: 91 in 83 studies
  – Quantitative traits: 40 in 26 studies
• Design of discovery study

  Design                    N     %
  Case-control              77    70.6
  Trio                      4     3.7
  Nested case-control       4     3.7
  Cross-sectional/Cohort    24    22.0

Four Key Requirements for a Bias-Free Case-Control Study

Selection Bias
  – Cases are representative of all those who develop the disease being studied.
  – Controls are representative of all those at risk of developing the disease and eligible to become cases and be included in the study.
  – Ancestral geographical origins and predominant environmental exposures of cases do not differ dramatically from controls.

Information Bias
  – Collection of risk factor and exposure information is the same for cases and controls.

Selection Biases in GWAS: Criteria for Classification

• Misclassification bias: Absence of description or use of adequate means to define cases and/or controls.

• Nonresponse bias: Absence of description of rates of recruitment and participation in cases and/or controls.

• Prevalence-incidence bias: Use of prevalent cases of disease which have sizable short term case-fatality or remission rates.

Larson, G. The Complete Far Side. 2003.

Characteristics of 109 GWAS: Selection of Study Subjects

• Methods of selection/recruitment frequently (30%) described in supplement or other publication.
• Few baseline descriptors of cases/controls
  – Tables comparing cases vs. controls: 36.0%
    • Statistical comparison of cases/controls: 3.5%
  – Participation rates (cases or controls): 9.0%
    • Comparison of participants/nonparticipants: 2.0%
• Most cases (67%) were prevalent cases derived from clinical sources, rather than population-based or incident cases.

GWAS of Type II Diabetes in Mexican-Americans*

• Case-control study design
  – 281 cases with diabetes defined by current Dx/Rx or fasting blood glucose or 2-hour GTT
  – 280 persons from a random population sample whose T2DM status is unknown
• 112,541 SNPs assayed in each person
• 4 genes identified
• ?Misclassification: Substantial prevalence (7-14%) of T2DM likely in controls.

*Hayes MG et al. Diabetes, 9/10/07.
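
The likely consequence of undiagnosed T2DM among controls can be sketched numerically. The example below is a minimal illustration, not an analysis from Hayes et al.: the true odds ratio (1.5) and the risk-allele carrier frequency in true controls (0.30) are hypothetical values, and the function name is made up for this sketch. It treats the control group as a mixture of true controls and latent cases and shows how the observed odds ratio is pulled toward 1.

```python
def observed_odds_ratio(true_or, control_exposure_freq, latent_case_fraction):
    """Odds ratio seen when a fraction of 'controls' are really undiagnosed cases.

    All inputs are hypothetical illustration values, not data from the study.
    """
    odds_controls = control_exposure_freq / (1 - control_exposure_freq)
    odds_cases = true_or * odds_controls
    case_exposure_freq = odds_cases / (1 + odds_cases)
    # The contaminated control group is a mixture of true controls and latent cases.
    mixed_freq = ((1 - latent_case_fraction) * control_exposure_freq
                  + latent_case_fraction * case_exposure_freq)
    odds_mixed = mixed_freq / (1 - mixed_freq)
    return odds_cases / odds_mixed


# 0%, 7%, and 14% latent cases among controls (the 7-14% range is from the slide).
for fraction in (0.0, 0.07, 0.14):
    print(fraction, round(observed_odds_ratio(1.5, 0.30, fraction), 3))
```

Under these assumptions the bias is toward the null: the larger the share of latent cases among controls, the smaller the observed odds ratio, which favors false negatives and underestimated effect sizes.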

Selection Biases in GWAS: Criteria for Classification

• Supercase bias: Use of additional criteria in case selection that increases the chance of a genetic etiology.

• Supercontrol bias: Use of additional criteria in control selection that decreases the chance of a genetic etiology.

• Latent case bias: Inclusion as controls of persons who could never develop the disease even if a gene carrier.

A Case-Control GWAS of Prostate Cancer*

• Discovery Study
  – 1854 cases with symptomatic prostate cancer and diagnosis <60 years or positive family history.
  – 1894 controls with age >50 years and PSA <0.5 ng/ml.
  – Genotyping of 541,129 SNPs
  – 11 new SNPs associated (P < 10^-6)
• Replication Study
  – 3268 cases/3354 controls
  – Genotyping of 11 SNPs
  – 7 SNPs independently associated (P < 10^-7)

*Eeles RA et al. Nat Genet 2/10/08

Prostate Cancer: 7 Novel SNPs in Discovery and Replication Studies

               Discovery            Replication
SNP            OR     95% CI        OR     95% CI
rs2660753      1.52   1.30-1.77     1.18   1.06-1.31
rs9364554      1.28   1.16-1.41     1.17   1.08-1.26
rs6465657      1.30   1.19-1.43     1.12   1.05-1.20
rs7920517      1.39   1.27-1.53     1.22   1.14-1.31
rs10993994     1.62   1.47-1.78     1.25   1.17-1.34
rs7931342      0.79   0.72-0.86     0.84   0.79-0.90
rs7931342      1.39   1.23-1.57     1.03   0.94-1.14

Eeles RA et al. Nat Genet 2/10/08

Latent Cases in a GWAS of Prostate Cancer*

                            Controls
                  Cases     Male     Female
Discovery Study
  Iceland         1890      9312     12060
Replications
  Netherlands     998       1004     1017
  Spain           548       742      874
  Sweden          2893      1781     -
  US-Baltimore    1545      576      -
  US-Chicago      665       368      184
  US-Nashville    526       613      -
  US-Rochester    1140      503      -

*Gudmundsson J et al. Nat Genet 2008; 40:281-3

Selection Biases in GWAS: Criteria for Classification

• Membership bias: Membership in a group may imply a degree of health which differs systematically from that of the general population.

• Population Stratification: Genetic differences between cases and controls unrelated to disease but due to sampling from populations of different ancestries.

• Phenotypic variation bias: The use of different definitions of cases or controls between discovery study and subsequent replications.

Wellcome Trust Case Control Consortium (WTCCC)*

Genotyping: 500,000 SNPs (Affymetrix)
Cases: 2000 persons for each of 7 diseases (bipolar disorder, coronary artery disease, Crohn disease, rheumatoid arthritis, T1DM, T2DM, hypertension)
Controls: 3000 persons without disease
  – 1500 in 1958 British Birth Cohort
  – 1500 UK blood donors

*WTCCC, Nature 2007; 447:661-678.

Population Stratification*

Each population has a unique genetic and social history; ancestral patterns of migration, mating, expansions/bottlenecks, and stochastic variation all yield differences in allele frequencies between populations.

Population stratification: cases and controls have different allele frequencies due to diversity in populations of origin and unrelated to outcome, requiring:
  1) differences in disease prevalence
  2) differences in allele frequencies

*Cardon LR, Palmer LJ, Lancet 2003

Population Stratification and Allelic Association

Full heritage Am. Indian population
  Gm3;5,13,14 prevalence: 1%
  NIDDM prevalence: 40%

Caucasian population
  Gm3;5,13,14 prevalence: 66%
  NIDDM prevalence: 15%

Gm3;5,13,14 haplotype    NIDDM +    NIDDM -
+                        7.8%       29.0%
-                        92.2%      71.0%

OR = 0.27 [0.18, 0.40]

Index of Indian heritage    Gm3;5,13,14 +    Gm3;5,13,14 -
0                           17.8%            19.9%
4                           28.3%            28.8%
8                           35.9%            39.3%

Cardon LR and Palmer LJ, Lancet 2003; 361:598-604, after Knowler et al 1988.
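
How pooling two populations can manufacture an association is easy to check by hand. The sketch below uses the haplotype and NIDDM prevalences quoted on the slide for the two parent populations, but adds two assumptions of its own: within each population the haplotype and disease are independent, and the study sample is a 50/50 mixture of the two populations. It is an illustration of the mechanism, not a reanalysis of the Knowler data, so the pooled value is not expected to match the slide's odds ratio.

```python
def joint_cells(marker_freq, disease_prev):
    """2x2 cell proportions when marker and disease are independent (assumed)."""
    a = marker_freq * disease_prev              # marker +, disease +
    b = marker_freq * (1 - disease_prev)        # marker +, disease -
    c = (1 - marker_freq) * disease_prev        # marker -, disease +
    d = (1 - marker_freq) * (1 - disease_prev)  # marker -, disease -
    return a, b, c, d


def odds_ratio(a, b, c, d):
    return (a * d) / (b * c)


# Prevalences taken from the slide; independence within population is an assumption.
populations = {
    "Full heritage Am. Indian": joint_cells(0.01, 0.40),
    "Caucasian": joint_cells(0.66, 0.15),
}

within = {name: odds_ratio(*cells) for name, cells in populations.items()}

# Hypothetical 50/50 mixture of the two populations.
pooled_cells = [sum(cells) / 2 for cells in zip(*populations.values())]
pooled = odds_ratio(*pooled_cells)

print(within)            # each within-population odds ratio is 1 (no association)
print(round(pooled, 2))  # the pooled odds ratio is well below 1
```

Even though the haplotype has no effect in either population under these assumptions, the pooled table shows an apparently protective association, simply because the population with more disease carries less of the haplotype.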

Unlinked Genetic Markers in Population Stratification

• Population stratification (or any non-random mating) allows marker-allele frequencies to vary among population segments.

• Disease more prevalent in one subpopulation will be associated with any alleles in high frequency in that subpopulation.

• If population stratification exists, it can often be detected by analysis of unlinked marker loci. [Pritchard JK, Rosenberg NA; AJHG 1999; 65:220-228]

Adjusting for Population Stratification in a GWAS of T2DM*

• Case-control study of 661 cases of T2DM and 614 controls from France.

• Genotyping assayed 392,935 SNPs
• SNP 200kb from lactase gene on 2q21:
  – Strong association with T2DM
  – Strong north-south prevalence gradient in France

• Used 20,323 SNPs not related to T2DM as measure of population stratification.

• After adjustment for stratification, most of the association was removed.

*Sladek R et al. Nature 2007; 445: 881-885.
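
One common way to use markers presumed unrelated to disease as a yardstick for stratification is genomic control. The slide does not say which method Sladek et al. applied, so the sketch below should be read as an illustration of the general idea rather than their procedure. The chi-square values are hypothetical, and the function names are made up for this example.

```python
from statistics import median

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.4549


def inflation_factor(null_marker_chi2):
    """Genomic-control lambda from 1-df chi-square statistics of null markers."""
    lam = median(null_marker_chi2) / CHI2_1DF_MEDIAN
    # By convention lambda is not allowed to deflate the test statistics.
    return max(lam, 1.0)


def adjust(chi2_stat, lam):
    """Scale a candidate SNP's chi-square statistic by the inflation factor."""
    return chi2_stat / lam


# Hypothetical association statistics for markers presumed unrelated to disease.
null_stats = [0.2, 0.5, 0.9, 1.4, 3.0]
lam = inflation_factor(null_stats)
print(round(lam, 2), round(adjust(25.0, lam), 2))
```

If the null markers show inflated statistics, the same inflation is assumed to affect the candidate SNP, and dividing by lambda shrinks its test statistic accordingly.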

Phenotypic Variation Bias: Are the cases homogeneous?

• GWAS of Atrial Fibrillation*
  – Sample 1: hospital diagnosis of AF “confirmed by 12-lead ECG.”
  – Sample 2: patients with ischemic stroke or TIA, diagnosis of AF “based on 12-lead ECG.”
  – Sample 3: patients hospitalized with acute stroke “diagnosed with AF.”
  – Sample 4: patients with lone AF or AF plus hypertension referred to arrhythmia service, “AF documented by ECG.”

*Gudbjartsson et al, Nature 2007; 448: 353-357

Information Bias: Systematic differences in data collection between cases and controls

• Genotyping quality bias: Lack of genotyping protocol for exclusion of SNPs for quality control criteria or publication of call rate.
  – Testing for Hardy-Weinberg disequilibrium
  – Transmission disequilibrium testing: differential rate of genotyping error leading to distortion of allele frequency in cases/controls
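
A Hardy-Weinberg check of the kind listed above can be sketched as a 1-df chi-square test on genotype counts, usually run in controls as a genotyping quality screen. The counts below are hypothetical, the function name is made up for this example, and the sketch assumes a biallelic SNP with both alleles observed.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) for departure from Hardy-Weinberg proportions.

    Assumes a biallelic SNP with both alleles present (expected counts > 0).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical genotype counts: one SNP in equilibrium, one with too few heterozygotes.
print(hwe_chi_square(250, 500, 250))
print(hwe_chi_square(400, 200, 400))
```

A heterozygote deficit like the second example is a typical signature of genotype miscalling, which is why a small P value here is usually treated as a reason to exclude the SNP.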

Is DNA Collected and Handled Identically in Cases and Controls?

• T1DM gene association study: cases from GRID Study, controls from 1958 British Birth Cohort Study examining 6322 SNPs.

• Samples from lymphoblastoid cell lines extracted using same protocol in two different laboratories.

• Case and control DNAs randomly ordered with teams masked to case/control status.

• Some extreme associations could not be replicated by second genotyping method.

Clayton DG et al, Nat Genet 2005; 37: 1243-46.

Biases in the Analysis and Presentation of Data

• Environmental exposure information bias: Lack of collection or presentation of known environmental causes of the disease or comparisons between cases and controls.

• Confounding control bias: Lack of statistical adjustment or stratified analysis in presence of potential confounding.

Characteristics of 109 GWAS: Confounding

• Few comparisons of environmental exposures known to predispose to disease between cases and controls.
  – Table comparing cases and controls: 36%
  – Statistical comparison of cases/controls: 3.5%
  – Statistical adjustment for differences: 16%
  – Stratified analysis by confounder group: 16%

Distribution of Three Known Risk Factors for Neovascular AMD in a GWA

Covariate       Cases (n = 96)    Controls (n = 130)
Male sex (%)    68                33
Age (yrs)       75                74
Smokers (%)     63                26

DeWan A et al, Science 2006; 314:989-992.
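
A statistical comparison of cases and controls, of the kind few GWAS reported, can be as simple as a two-proportion z-test. The sketch below applies one to the smoking row of the table. The slide gives only percentages, so the smoker counts are reconstructed by rounding and should be treated as approximate; the function name is made up for this example.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Two-sided z-test for a difference between two independent proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Counts reconstructed from the reported percentages (an approximation).
smokers_cases = round(0.63 * 96)      # about 60 of 96 cases
smokers_controls = round(0.26 * 130)  # about 34 of 130 controls

z, p_value = two_proportion_z(smokers_cases, 96, smokers_controls, 130)
print(round(z, 2), p_value)
```

A difference this large in a known risk factor means smoking is a candidate confounder that the genetic analysis would need to adjust or stratify for.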

Confounding

• Confounder: “A factor that distorts the apparent magnitude of the effect of a study factor on risk. Such a factor is a determinant of the outcome of interest and is unequally distributed among the exposed and the unexposed” (Last, 1983).
  – Associated with exposure
  – Independent cause or predictor of disease
  – Not an intermediate step in causal pathway

[Diagrams: confounder (C), exposure (E), disease (D)]

Aschengrau and Seage, Essentials of Epidemiology in Public Health, 2003.

FTO Variants, Type 2 Diabetes, and Obesity*

Diabetes Association

Cohort           OR      [95% CI]       P
WTCCC phase 1    1.27    [1.16-1.37]    2 x 10^-8
WTCCC phase 2    1.22    [1.12-1.32]    5 x 10^-7
DGI              1.03    [0.91-1.71]    0.25

*Frayling, 2007 and Zeggini, 2007

FTO Variants, Type 2 Diabetes, and Obesity*

BMI Association

                  TT      AT      AA
WTCCC Cases       30.2    30.5    32.0
WTCCC Controls    26.3    26.3    27.1

*Frayling 2007 and Zeggini 2007

FTO Variants, Type 2 Diabetes, and Obesity*

Diabetes Association

Cohort           OR      [95% CI]       P
WTCCC phase 1    1.27    [1.16-1.37]    2 x 10^-8
WTCCC phase 2    1.22    [1.12-1.32]    5 x 10^-7
DGI              1.03    [0.91-1.71]    0.25

Diabetes Association Adjusted for BMI

WTCCC phase 2    1.03    [0.96-1.10]    0.44

*Frayling TM, et al. Science 2007; 316: 889-894.
Zeggini E, et al. Science 2007; 316: 1336-1341.

Dealing with Confounders

• In design
  – Randomize
  – Restrict: confine study subjects to those within specified category of confounder
  – Match: select cases and controls so confounders equally distributed
• In analysis
  – Standardize: for age, gender, time
  – Stratify: separate sample into subsamples according to specified criteria
  – Multivariate analysis: adjust for many confounders

Aschengrau and Seage, Essentials of Epidemiology in Public Health, 2003
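
The "stratify" option can be illustrated with a Mantel-Haenszel summary odds ratio, one standard way to combine stratum-specific 2x2 tables. The counts below are hypothetical and are chosen so that the confounder (for example a high versus low BMI stratum) is associated with both carrier status and disease while the allele has no effect within either stratum.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed controls, c: unexposed cases, d: unexposed controls."""
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio across strata of (a, b, c, d) counts."""
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# Hypothetical counts for two confounder strata.
strata = [
    (60, 30, 40, 20),  # stratum 1: stratum-specific OR = 1
    (10, 40, 20, 80),  # stratum 2: stratum-specific OR = 1
]

crude = odds_ratio(*[sum(column) for column in zip(*strata)])
adjusted = mantel_haenszel_or(strata)
print(round(crude, 2), round(adjusted, 2))
```

Collapsing the strata gives a crude odds ratio of about 1.67, while the stratified summary is 1.0, the same qualitative pattern as an association that disappears after adjustment for the confounder.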

Biases in the Analysis and Presentation of Data (Cont.)

• Alpha error control bias: Lack of correction of level of alpha error accepted as significant.

• Data dredging bias: Lack of replication studies testing hypotheses identified in a discovery study.

• The winner’s curse: The overestimation of the effect size in discovery GWAS at the extremes of their range with inability to replicate the odds ratios due to lack of adequate power to identify the true odds ratio of smaller magnitude.
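
Correction of the alpha level is often done with a Bonferroni adjustment, which divides the accepted alpha by the number of tests. The sketch below is a minimal illustration; Bonferroni is one of several possible corrections and is not named on the slide, and the SNP counts are simply platform sizes mentioned elsewhere in this lecture.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error near alpha."""
    return alpha / n_tests


def significant(p_values, alpha=0.05):
    """Flag which P values pass the Bonferroni-corrected threshold."""
    cutoff = bonferroni_threshold(alpha, len(p_values))
    return [p <= cutoff for p in p_values]


# Thresholds for platforms of different density.
for n_snps in (100_000, 500_000, 541_129):
    print(n_snps, bonferroni_threshold(0.05, n_snps))

# Toy example with three tests: only the first survives correction.
print(significant([0.001, 0.02, 0.04]))
```

With 500,000 SNPs the per-SNP threshold is 10^-7, which is why GWAS discovery thresholds are orders of magnitude stricter than the conventional 0.05.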

Prostate Cancer: 7 Novel SNPs in Discovery and Replication Studies

               Discovery            Replication
SNP            OR     95% CI        OR     95% CI
rs2660753      1.52   1.30-1.77     1.18   1.06-1.31
rs9364554      1.28   1.16-1.41     1.17   1.08-1.26
rs6465657      1.30   1.19-1.43     1.12   1.05-1.20
rs7920517      1.39   1.27-1.53     1.22   1.14-1.31
rs10993994     1.62   1.47-1.78     1.25   1.17-1.34
rs7931342      0.79   0.72-0.86     0.84   0.79-0.90
rs7931342      1.39   1.23-1.57     1.03   0.94-1.14

Eeles RA et al. Nat Genet 2/10/08
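
The shrinkage from discovery to replication in the table is what the winner's curse predicts. The simulation below is a sketch under stated assumptions, not a model of the Eeles data: the true odds ratio (1.15), the standard error of the log odds ratio (0.06), and the z cutoff (4.9, roughly P < 10^-6) are all hypothetical. It repeats the same underpowered discovery study many times and averages the estimates only from the runs that cross the significance threshold.

```python
import math
import random


def simulate_winners_curse(true_or=1.15, se_log_or=0.06, z_cutoff=4.9,
                           n_studies=100_000, seed=1):
    """Mean odds ratio among simulated discovery studies that reach significance.

    All parameter defaults are hypothetical illustration values.
    """
    rng = random.Random(seed)
    true_log_or = math.log(true_or)
    selected = []
    for _ in range(n_studies):
        # One discovery study: a noisy estimate of the true log odds ratio.
        estimate = rng.gauss(true_log_or, se_log_or)
        if estimate / se_log_or > z_cutoff:
            selected.append(estimate)
    if not selected:
        return 0, float("nan")
    # Geometric mean of the odds ratios that passed the threshold.
    mean_selected_or = math.exp(sum(selected) / len(selected))
    return len(selected), mean_selected_or


n_selected, mean_selected_or = simulate_winners_curse()
print(n_selected, round(mean_selected_or, 2))
```

Every estimate that clears the cutoff must exceed exp(4.9 x 0.06), about 1.34, so the "winning" studies overstate a true odds ratio of 1.15 by construction, and an unbiased replication would be expected to fall back toward the true value.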

Larson, G. The Complete Far Side. 2003.

Interpretation Biases in Genomic Research*

• Confirmation bias: evaluating evidence that supports one’s preconceptions differently from evidence that challenges these convictions.

• Rescue bias: discounting data by finding selective faults in the experiments

• Mechanism bias: being less skeptical when underlying science furnishes credibility for the data.

*Kaptchuk TJ. BMJ 2003; 326: 1453-5.

Information to be Included in Initial Report

• Study information:
  – Source of cases and controls
  – Methods used for defining disease or trait
  – Participation rates and flow chart of selection
  – Standard “Table 1,” including rates of missing data
  – Success rate of DNA acquisition, comparability
• Genotyping and quality control procedures
• Results
  – Analysis methods in sufficient detail to understand and reproduce what was done
  – Simple single-locus and multi-marker (haplotype) association analyses
  – Significance of any known 'positive controls'

Chanock, Manolio et al, Nature 2007; 447: 655-660

Controlling Bias in Genomic Research: Design

• Define population to be studied
• Maximize representativeness
• Use standard, reproducible methods for assignment of case/control status
• Use incident cases
• Select controls eligible to become cases
• Estimate and maximize participation rates
• Apply standard genotyping QC methods
• Replicate positive findings on different genotyping platform

Controlling Bias in Genomic Research: Analysis

• Describe sources and methods of ascertaining cases and controls
• Compare participants and non-participants
• Compare cases and controls
• Stratify and adjust for important confounders (including population stratification)
• Stratify and test for important interactions
• Report results of genotyping QC
• Report results of prior known associations

Larson, G. The Complete Far Side. 2003.