Candidate Gene Resource Steering Committee Meeting, July 25, 2006
Candidate Gene Resource
Steering Committee Meeting
July 25, 2006
Goals for Today
• Strengthen relationships among CARE investigators
• Define pilot project (phenotypes & SNPs)
• Establish principles of data release
• Discuss genotyping study design
• Select phenotypes to be analyzed
CARE Governance
• Steering committee
  – Representative of each CARE organization
  – Subcommittees: Data Release, Phenotypes, Study Design, Informatics, SNP Selection, DNA/Genotyping
• NHLBI staff
• NHLBI appointed oversight committee
CARE: Timeline
• RFP released March 2005
• Response submitted July 15, 2005
• Awarded April 1, 2006
• Four-year award
  – Y1: Create DNA and phenotype database
  – Y2: Genotyping
  – Y3/4: Joint analysis and data distribution
Resources Provided by NHLBI
• $18.3M over 4 years to create a resource relating genotype to phenotype across cohorts:
  – Create a consortium among CARE cohorts
  – Build a database of DNA and phenotypes
  – Genotype a common set of SNPs across cohorts
  – Create software tools to enable joint analysis
  – Distribute data as per the CARE data release policy
  – Project management and coordination
• Project manager hired: Deb Farlow
Areas for Discussion Today
• Data Release
• Study Design
• Phenotypes
NHLBI
Current state of genotyping technology
Presentation of informatics tools
Data release
• Data release policy to be established by the CARE steering committee together with NHLBI and local IRBs
• The Broad has proposed a secure, HIPAA-compliant web architecture to implement this policy and to provide an access-controlled environment for data sharing and analysis
Original CARE Study Design
• Candidate gene study
  – 50,000 samples
  – Average 10 SNPs/gene × 1,700 genes = 17,000 SNPs
  – Requirement: $0.01/genotype (fully loaded)
• Whole genome association study
  – 500 cases / 1,000 controls
  – At least 300,000 SNPs genome-wide
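The candidate gene budget follows directly from these figures. A minimal sketch of the arithmetic, assuming the $0.01 fully loaded price applies uniformly to every genotype; the function name `genotyping_budget` is illustrative and not part of any CARE tool:

```python
def genotyping_budget(n_snps: int, n_samples: int, cost_per_genotype: float) -> float:
    """Total cost = number of genotypes (SNPs x samples) x cost per genotype."""
    return n_snps * n_samples * cost_per_genotype


# Figures from the original CARE study design.
snps = 10 * 1700      # average 10 SNPs/gene x 1,700 genes
samples = 50_000
total = genotyping_budget(snps, samples, 0.01)

print(f"{snps:,} SNPs x {samples:,} samples = {snps * samples:,} genotypes")
print(f"Estimated cost: ${total:,.0f}")
```

The result, 850 million genotypes at about $8.5M, matches the candidate gene cost in the summary table.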
Candidate gene study
• Targeted genotyping technology has remained stable: same price and throughput as in the approved proposal
• Key issue: criteria for selecting the 17,000 candidate gene-based SNPs
  – Biological hypotheses
Developments since RFP
• Whole genome scans promise new hypotheses for candidate genes
• Evaluation of coverage / performance of whole genome arrays
• Price for whole genome genotyping technology has improved
Whole genome scanning
• SHARE will genotype 15,000 people from NHLBI cohorts (FHS and TBA)
• RFA for 4-5 whole genome scans
• GAIN, WTCCC, etc.
• Implication: hypotheses that could be confirmed and extended by CARE
• Challenge: timing does not align well with the original CARE timeline
Coverage
Do they work?
Product | Samples | Average call rate | Concordance with HapMap | Trio concordance
Affymetrix 500K (Broad) | 1,200 | 99.10% | 99.10% (48 CEU samples) | 99.9% (60 trios)
Illumina 317K (CIDR*) | 1,400 | 99.80% | 99.85% (8 CEU samples) | 99.85% (10 trios)
* from http://www.cidr.jhmi.edu/human_gwa.html
Do They Work at High Scale? Recent Call Rate Data (at Broad)
Product | Chips | Call rate
Affy 500K | 12,000 | 98.7%
ILMN 317K | 250 | 99.2%
In-Process QC Test: HapMap Sample vs. HapMap
[Figure: concordance of control samples with HapMap genotypes (CNTRL vs. HapMap, n=42); y-axis 97.5% to 100%]
Average = 99.62% over 7,947,748 comparisons
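A concordance rate like the 99.62% above is the fraction of matching genotypes among all comparisons made. The sketch below is hypothetical: it assumes genotypes are two-letter strings, treats heterozygotes as matching regardless of allele order, and skips positions where either genotype is missing (the slide does not say how missing calls were handled).

```python
from typing import Optional, Sequence, Tuple


def concordance(calls: Sequence[Optional[str]],
                reference: Sequence[Optional[str]]) -> Tuple[float, int]:
    """Return (fraction of matching genotypes, number of comparisons).

    Positions where either genotype is missing (None) are skipped, and
    heterozygotes match regardless of allele order ("AG" == "GA").
    """
    if len(calls) != len(reference):
        raise ValueError("calls and reference must have the same length")
    matches = compared = 0
    for call, ref in zip(calls, reference):
        if call is None or ref is None:
            continue
        compared += 1
        if sorted(call) == sorted(ref):
            matches += 1
    if compared == 0:
        raise ValueError("no overlapping non-missing genotypes")
    return matches / compared, compared


# Hypothetical calls for one sample against its reference genotypes.
rate, n = concordance(["AA", "AG", None, "GG", "GA"],
                      ["AA", "GA", "AA", "GG", "AA"])
print(f"{rate:.2%} over {n} comparisons")  # 75.00% over 4 comparisons
```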
QC Statistics: MS and T2D Scans
Metric | T2D scan, Nsp | T2D scan, Sty | MS scan, Nsp | MS scan, Sty
Samples attempted | 1,530 (100%) | 1,558 (100%) | 1,117 (100%) | 867 (100%)
Pass DM (0.26) ≥ 85% | 1,474 (96%) | 1,476 (97%) | 1,040 (93%) | 817 (94%)
Pass BRLMM ≥ 95% | 1,438 (94%) | 1,428 (93%) | 1,008 (90%) | 792 (91%)
Avg call rate, passing samples | 99.10% | 99.00% | 99.00% | 98.70%
Passing SNPs in passing samples | 253,172 (97%) | 230,816 (97%) | 251,248 (96%) | 228,972 (96.1%)
DM vs. BRLMM, 2,500 chips: <5% of chips fail
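The table applies two sample-level screens: a DM call rate of at least 85% (at the 0.26 threshold), then a BRLMM call rate of at least 95%. A sketch of that filtering, assuming the screens are applied in sequence (the falling counts suggest this, but it is an assumption) and that "average call rate" means the BRLMM call rate of passing samples; the `SampleQC` class, sample IDs, and rates are made up for illustration:

```python
from dataclasses import dataclass
from typing import List, Tuple

DM_MIN = 0.85      # DM call rate threshold from the table
BRLMM_MIN = 0.95   # BRLMM call rate threshold from the table


@dataclass
class SampleQC:
    sample_id: str
    dm_call_rate: float     # DM call rate at the 0.26 threshold, as a fraction
    brlmm_call_rate: float  # BRLMM call rate, as a fraction


def apply_qc(samples: List[SampleQC]) -> Tuple[List[SampleQC], List[SampleQC]]:
    """Return (samples passing DM, samples passing DM and then BRLMM)."""
    pass_dm = [s for s in samples if s.dm_call_rate >= DM_MIN]
    pass_brlmm = [s for s in pass_dm if s.brlmm_call_rate >= BRLMM_MIN]
    return pass_dm, pass_brlmm


# Made-up samples for illustration only.
samples = [
    SampleQC("S1", 0.97, 0.991),
    SampleQC("S2", 0.82, 0.960),  # fails the DM screen
    SampleQC("S3", 0.90, 0.930),  # passes DM, fails BRLMM
    SampleQC("S4", 0.93, 0.985),
]
pass_dm, pass_brlmm = apply_qc(samples)
avg_call_rate = sum(s.brlmm_call_rate for s in pass_brlmm) / len(pass_brlmm)
print(f"attempted={len(samples)} pass_dm={len(pass_dm)} "
      f"pass_brlmm={len(pass_brlmm)} avg_call_rate={avg_call_rate:.1%}")
```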
Genotyping Costs per Sample
[Figure: chip cost per sample ($0 to $1,800) from Jul-05 to Jun-07 for Affy 500K, ILMN 317K, ILMN 550K, ILMN 650Y, and MIP (20K)]
WGAS: Then and Now
Original Plan
• Product: Affymetrix 500K
• Total cost per sample: $1,600 (chip + reagents + equipment + labor + IDC)
• Study design: 500 cases / 1,000 controls
• Budget: $2,400,000
WGAS: Then and Now
Now Possible
• Product: Affymetrix 500K
• Total cost per sample: $530 (chip + reagents + equipment + labor + IDC)
• Study design: 4,500 samples
• Budget: $2,400,000
WGAS: Then and Now
January 2007
• Product: Affymetrix 500K
• Total cost per sample: $410 (chip + reagents + equipment + labor + IDC)
• Study design: 5,800 samples
• Budget: $2,400,000
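Each study design is the $2.4M budget divided by the fully loaded per-sample cost. A short sketch of that calculation; the dictionary labels are illustrative, and the slides round the last two results down to 4,500 and 5,800 samples:

```python
BUDGET = 2_400_000  # WGAS budget in dollars

# Fully loaded cost per sample (chip + reagents + equipment + labor + IDC).
COST_PER_SAMPLE = {
    "Original plan": 1600,
    "Now possible (7/06)": 530,
    "January 2007": 410,
}


def max_samples(budget: int, cost_per_sample: int) -> int:
    """Largest whole number of samples the budget can cover."""
    return budget // cost_per_sample


for label, cost in COST_PER_SAMPLE.items():
    print(f"{label}: ${cost:,}/sample -> {max_samples(BUDGET, cost):,} samples")
```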
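In Summary
Date | WGAS SNPs | WGAS samples | WGAS cost | Candidate gene SNPs | Candidate gene samples | Candidate gene cost
7/15/05 | 500,000 | 1,500 | $2.4M | 17,000 | 50,000 | $8.5M
7/25/06 | 500,000 | 4,500 | $2.4M | 17,000 | 50,000 | $8.5M
1/07 | 500,000 | 5,800 | $2.4M | 17,000 | 50,000 | $8.5M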
Conclusions: Genotyping
• Targeted genotyping (custom set of candidate genes) is stable at $0.01 per genotype
• Timing of candidate gene selection
• Whole genome arrays have improved in cost and performance, to about $0.001 per genotype
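The per-genotype figures come from spreading the per-sample cost over every SNP assayed. A sketch, assuming roughly 500,000 SNPs per whole genome sample and using the $530 and $410 per-sample costs from the WGAS slides; the constant and function names are illustrative:

```python
TARGETED_COST_PER_GT = 0.01   # custom candidate gene assay, $/genotype
WGAS_SNPS = 500_000           # approximate SNPs per whole genome sample


def cost_per_genotype(cost_per_sample: float, n_snps: int) -> float:
    """Fully loaded per-sample cost spread over every SNP assayed."""
    return cost_per_sample / n_snps


for label, per_sample in [("July 2006", 530), ("January 2007", 410)]:
    per_gt = cost_per_genotype(per_sample, WGAS_SNPS)
    ratio = TARGETED_COST_PER_GT / per_gt
    print(f"{label}: ${per_gt:.5f}/genotype, "
          f"about {ratio:.0f}x cheaper than targeted genotyping")
```

Both results land near $0.001 per genotype, roughly an order of magnitude below the targeted assay.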
High-Level Workflow for CaRE
• Upload samples, pedigrees, individuals, phenotypes
• Create experiments (samples × features)
• Summarize/filter (PLINK)
• Data vault: QC/curate results
• Design and execute experiments
• Databases: ProjectDB, LIMS DBs, BSP DB, FeatureDB
• Association & statistics viewers
• Cohorts' custom algorithms, viewers
• Web services
• Data compile
• Analysis: GenePattern + CaRE analysis tools
• Production: BSP/GAP + CaRE enhancements
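The "create experiments (samples × features)" and "summarize/filter" steps can be pictured as a genotype matrix whose SNP columns are screened for missingness and allele frequency. This sketch is hypothetical: the 0/1/2 coding, function names, and thresholds are assumptions, and the filters are only analogous in spirit to PLINK's `--geno` and `--maf` options, not a reimplementation of PLINK.

```python
from typing import List, Optional, Sequence

# Hypothetical coding: copies of the alternate allele (0, 1, 2); None = no call.
Genotype = Optional[int]


def call_rate(column: Sequence[Genotype]) -> float:
    """Fraction of samples with a genotype call at this SNP (column must be non-empty)."""
    return sum(g is not None for g in column) / len(column)


def minor_allele_frequency(column: Sequence[Genotype]) -> float:
    """MAF among called genotypes; 0.0 if nothing was called."""
    called = [g for g in column if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1 - freq)


def filter_snps(matrix: List[List[Genotype]], snp_ids: List[str],
                max_missing: float = 0.05, min_maf: float = 0.01) -> List[str]:
    """Keep SNP columns with missingness <= max_missing and MAF >= min_maf."""
    kept = []
    for j, snp in enumerate(snp_ids):
        column = [row[j] for row in matrix]
        if 1 - call_rate(column) <= max_missing and minor_allele_frequency(column) >= min_maf:
            kept.append(snp)
    return kept


# Rows are samples, columns are features (SNPs); values are invented.
matrix = [
    [0, 1, 0],
    [1, None, 0],
    [2, 1, 0],
    [1, 1, 0],
]
snp_ids = ["snp_a", "snp_b", "snp_c"]
print(filter_snps(matrix, snp_ids))  # ['snp_a']
```

Here `snp_b` fails on missingness (25% no-calls) and `snp_c` is monomorphic, so only `snp_a` survives.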
Designing a Pilot
• A trial run for DNA quality, genotyping, phenotype and joint analysis, and publication
• The scale and content of the pilot are still to be refined; this is a topic for today's discussion sessions
Our shared aspiration: the greatest genetic epidemiology experiment to date
CSSCD
Technological Advance
Current 500K assay vs. new 500K assay
How?
• Smaller format
• BRLMM
• Sequence variability (DNA analysis): A/A, A/B, B/B genotype clusters
• Mismatch probes not needed
• Fewer probes needed
• Single format
• No drop in het calls
• Mendel errors per plate; accuracy 99.4%
• Sty/Nsp: one family with 25,000 errors
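Mendel errors are child genotypes that cannot be inherited from the parents' genotypes. A minimal sketch of the check at one biallelic SNP, assuming genotypes are coded as copies of allele B (A/A = 0, A/B = 1, B/B = 2) and that missing calls were removed beforehand; counting errors per plate would sum this over every trio and SNP on the plate:

```python
from itertools import product
from typing import Iterable, Tuple

# Alleles (as copies of B) that a parent with each genotype can transmit.
TRANSMITTABLE = {0: (0,), 1: (0, 1), 2: (1,)}


def is_mendel_consistent(father: int, mother: int, child: int) -> bool:
    """True if the child's genotype can be formed from one allele of each parent."""
    return any(a + b == child
               for a, b in product(TRANSMITTABLE[father], TRANSMITTABLE[mother]))


def count_mendel_errors(trios: Iterable[Tuple[int, int, int]]) -> int:
    """Count (father, mother, child) genotype triples that violate Mendelian inheritance."""
    return sum(not is_mendel_consistent(f, m, c) for f, m, c in trios)


# Invented trio genotypes at a single SNP.
trios = [(0, 0, 1), (0, 2, 1), (1, 1, 2), (2, 2, 0)]
print(count_mendel_errors(trios))  # 2
```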
Coverage of Common Variants by Whole-genome Products
Tag SNPs: Affymetrix Mapping 500K GeneChip; Illumina HumanHap300 BeadChip
Coverage Mostly Provided by Pairwise Correlations
[Figure: example haplotypes]
Specified Multimarker Tests Improve Effective Coverage
[Figure: example haplotypes]
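Pairwise tagging relies on the squared allelic correlation between two SNPs, r² = D² / (p(1 − p) q(1 − q)), where D = p_ij − pq. A sketch computing r² from phased haplotypes; the haplotype strings in the example are invented and not read from the figure:

```python
from typing import Sequence


def pairwise_r2(haplotypes: Sequence[str], i: int, j: int) -> float:
    """Squared allelic correlation r^2 between biallelic SNPs i and j.

    Each haplotype is a string with one allele character per SNP.
    The allele on the first haplotype is used as the reference allele at
    each SNP; r^2 does not depend on this choice because D only changes sign.
    """
    n = len(haplotypes)
    ref_i, ref_j = haplotypes[0][i], haplotypes[0][j]
    p = sum(h[i] == ref_i for h in haplotypes) / n
    q = sum(h[j] == ref_j for h in haplotypes) / n
    p_ij = sum(h[i] == ref_i and h[j] == ref_j for h in haplotypes) / n
    d = p_ij - p * q
    denom = p * (1 - p) * q * (1 - q)
    if denom == 0:
        return 0.0  # a monomorphic SNP has undefined r^2; treated as 0 here
    return d * d / denom


print(pairwise_r2(["AT", "AT", "GC", "GC"], 0, 1))  # 1.0
print(pairwise_r2(["AT", "AC", "GT", "GC"], 0, 1))  # 0.0
```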
Coverage of the Genome
[Figure, two panels (YRI coverage, CEU coverage): fraction of common SNPs captured at r² of 0.8 (y-axis, 0% to 100%) by array (x-axis: Affy100k, Affy500k, Ilmn300k, Ilmn550k), for single markers and 2-marker predictors]
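Coverage in these charts is the fraction of common SNPs captured by an array at r² of 0.8. A sketch of the single-marker version only, assuming each common SNP's r² values with the genotyped tag SNPs are already known and that "captured" means the best r² is at least 0.8; the 2-marker predictors are not modeled, and the example values are invented:

```python
from typing import Dict, List


def coverage(r2_with_tags: Dict[str, List[float]], threshold: float = 0.8) -> float:
    """Fraction of common SNPs whose best r^2 with any tag SNP meets the threshold."""
    if not r2_with_tags:
        raise ValueError("no SNPs to evaluate")
    captured = sum(1 for values in r2_with_tags.values()
                   if values and max(values) >= threshold)
    return captured / len(r2_with_tags)


# Invented r^2 values between each common SNP and the tag SNPs near it.
example = {
    "snp1": [0.95, 0.20],
    "snp2": [0.50, 0.79],
    "snp3": [1.00],
    "snp4": [],          # no tag SNP nearby
}
print(f"{coverage(example):.0%}")  # 50%
```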
Other recent developments
• Whole genome scan planned in 9,000 FHS participants (SHARE)
• Other whole genome scans will be funded (recent NHLBI RFA)