The Agilent Technologies SureSelect™ Platform for Target Enrichment

Focus your next-gen sequencing on DNA that matters

Kimberly Troutman

Field Applications Scientist

January 27th, 2011

Agenda

1 Introduction: SureSelect™

2 Exome Approach for Genetic Diseases

3 Complex Diseases

4 Custom Biomarker Discovery and Profiling

5 Targeted RNA Sequencing

6 Kinome Kit

7 New SureSelect Products


Target Enrichment: A Highly Enabling Process

What?

• Also referred to as genome partitioning, targeted re-sequencing, DNA capture…

• Captures genomic material of interest for next-generation sequencers (e.g. Illumina, SOLiD, 454)

Why?

• Sequence your regions of interest!

• Enables focus on a subset of the genome

• Saves both time and money for downstream sequencing

• Identify homozygous and heterozygous variants in targets relative to the reference genome

[Figure: gDNA and enriched library]

Agilent’s SureSelect™ Platform: Two Options

SureSelect Target Enrichment System*

Developed in collaboration with the Broad Institute (Dr. Chad Nusbaum et al.)

SureSelect DNA Capture Array

Developed in collaboration with Cold Spring Harbor (Dr. Greg Hannon et al.)

*Flagship method, released February 2009

Agilent 60-mer array, 244k & 1M features

3 µg gDNA

1-5 µg gDNA (with WGA) or 20 µg gDNA (unamplified)

Released July 2009

Sequencers: Illumina GAIIx, Illumina HiSeq, SOLiD 3, SOLiD 4, 5500, GS FLX & GS Junior

• Baits: cRNA probes, long (120 bases), biotin labeled

SureSelect Target Enrichment Kit Choices

Product | Target amount (Mb) | Reactions/kit | Product definition
Human X-demo | 3.05 | 5 | Human X-chr exons
Human All Exon v1 | 38 | 5-10,000 | Catalog content from CCDS 2008 plus >1000 ncRNA
Human All Exon Plus | 38-50 plus up to 6.8 of custom content | 5-10,000 | Add custom content to All Exon catalog content
Human All Exon v2 | 44 | 5-10,000 | CCDS Sept. 2009 plus additional RefSeq
Human All Exon 50Mb | 50 | 5-10,000 | GENCODE content; most comprehensive coverage; multiplexable
Kinome | 3.2 | 5-10,000 | All kinases
Indexed custom content | <0.2; 0.2 - 0.49; 0.5 - 1.49; 1.5 - 2.9; 3 - 6.8 | 10-5,000 | Custom offering: Illumina (12 indexes), SOLiD (16 barcodes)

SureSelect Kits Multiplexing Capability

Target enrichment size range | Illumina GA | Illumina HiSeq 2000 | AB SOLiD Octet | AB SOLiD Quadrant | AB SOLiD Flow Cell | AB SOLiD Full Run
<200 Kb targets | 12 | 12 | 16 | 16 | 16 | 16
200 Kb - 499 Kb targets | 12 | 12 | 16 | 16 | 16 | 16
500 Kb - 1.49 Mb targets | 12 | 12 | 5 | 10 | 16 | 16
1.5 Mb - 2.99 Mb targets | 12 | 12 | 3 | 7 | 16 | 16
3.0 Mb - 6.0 Mb targets | 8 | 12 | 2 | 3 | 16 | 16
Human All Exon 38 Mb | 1 | 4 | 0 | 1 | 3 | 7
Human All Exon 50 Mb | 1 | 3 | 0 | 0 | 3 | 5
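
When planning how many indexed samples to pool, the custom-size rows of the multiplexing table can be encoded as a small lookup. This is only a sketch: the function name `max_samples`, the platform labels and the data layout are illustrative and not part of any Agilent tool. The numbers are copied from the slide, which does not state the unit each count applies to, so the protocol should be checked before relying on them.

```python
from bisect import bisect_right

# Upper bounds (in Mb) that separate the custom target-size bins on the slide.
BIN_UPPER_MB = [0.2, 0.5, 1.5, 3.0]

# Column order used in MAX_SAMPLES (labels are illustrative).
PLATFORMS = [
    "Illumina GA", "Illumina HiSeq 2000",
    "SOLiD Octet", "SOLiD Quadrant", "SOLiD Flow Cell", "SOLiD Full Run",
]

# One row per size bin; values copied from the slide.
MAX_SAMPLES = [
    [12, 12, 16, 16, 16, 16],  # < 200 Kb
    [12, 12, 16, 16, 16, 16],  # 200 Kb - 499 Kb
    [12, 12, 5, 10, 16, 16],   # 500 Kb - 1.49 Mb
    [12, 12, 3, 7, 16, 16],    # 1.5 Mb - 2.99 Mb
    [8, 12, 2, 3, 16, 16],     # 3.0 Mb - 6.0 Mb
]


def max_samples(target_mb: float, platform: str) -> int:
    """Look up the multiplexing level listed for a custom target size."""
    if not 0 < target_mb <= 6.0:
        raise ValueError("the custom rows only cover targets up to 6.0 Mb")
    row = bisect_right(BIN_UPPER_MB, target_mb)  # index of the matching size bin
    return MAX_SAMPLES[row][PLATFORMS.index(platform)]


print(max_samples(1.0, "SOLiD Octet"))  # 5
print(max_samples(4.0, "Illumina GA"))  # 8
```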

Agilent SureSelectXT Kits

gDNA kit + Library Prep kit + SureSelect Reagents = SureSelectXT Kit

SureSelectXT Kit – Coupled with an optimized gDNA prep and library prep kit, allows the use of one kit for the entire sample-prep-to-sequencing target enrichment workflow

• Kit composition

• gDNA Isolation – Lysis buffer and enzymes required for isolation

• Library prep – Buffers, reagents, enzymes and indexes needed for prep

• SureSelect Target Enrichment Kit – Hybridization buffers and "baits"

• All kits are available in the XT format: catalog kits and custom content

• SureSelectXT All Exome & SureSelectXT All Exome Plus

• SureSelectXT Human Kinome & SureSelectXT Human X Chromosome

• SureSelectXT Custom from < 200 Kb to > 6.8 Mb (up to 34 Mb in Spring 2011)

• Illumina GAIIx and HiSeq 2000 (Protocol v1.0 Nov 2010)

• SOLiD 3 / 4 and 5500 (Available soon)

SureSelectXT – complete sample-to-sequencer solutions for your target enrichment needs

[Workflow figure: Genomic DNA prep; Library prep (GA, SOLiD); Bioanalyzer, qPCR quant; Sequencer. Manual procedure for small number of samples.]

Exon Capture is a Powerful Tool to Study Mendelian Diseases

• Mendelian diseases are caused by coding mutations (with some exceptions)

• Exons are only ~1-1.4% of the human genome (30-50 Mb)

• Primarily protein coding regions

Advantages:

• Much less sequencing

• ~5% of WGS, so up to 20x more samples

Why coding?

• More interpretable

• Easier to follow up

• Especially adapted to study of Mendelian diseases

• CCDS exons – v1

• CCDS + RefSeq – 38 Mb v2 (Broad)

• GENCODE – 50 Mb (Sanger)

• Includes ncRNA

• All Exons on X chromosomes

• 7674 exons

• 3 Mb
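
The "up to 20x more samples" figure on this slide follows directly from the ~5% sequencing fraction. A minimal sketch of that arithmetic, assuming the same per-base depth for exome and whole-genome runs (the function name is illustrative):

```python
def samples_per_wgs_budget(fraction_of_wgs: float) -> float:
    """Exome samples that fit into the sequencing budget of one whole genome."""
    if not 0 < fraction_of_wgs <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return 1.0 / fraction_of_wgs


# ~5% of whole-genome sequencing per exome sample, as stated on the slide.
print(round(samples_per_wgs_budget(0.05)))  # 20
```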


Applications to Mendelian disorders


… and many more to come


SureSelect Human All Exon Kits

Kit | All Exon v1 | All Exon v2 | All Exon 50 Mb
Content | CCDS Sept. 2008 | CCDS Sept. 2008 + additional RefSeq content including CCDS Sept. 2009 exons | GENCODE and Sanger (includes CCDS and Broad-defined v2 content as well)
CCDS (Nov. 2010) | 89.6% | 98.2% | 99.5%
CNV (Mar. 2010) | 23.98% | 27.49% | 30.62%
Ensembl (Aug. 2010) | 79.9% | 90.9% | 96.2%
miRNA (miRBase 14) | 90.0% | 90.0% | 92.8%
GenBank (6/16/2010) | 75.96% | 89.07% | 90.74%
RefSeq Genes (Nov. 2010) | 85.0% | 96.9% | 99.0%
RefSeq Transcripts (6/16/2010) | 88.85% | 95.07% | 97.50%
Target size | 38 Mb | 44 Mb | 50 Mb
Developed with | Broad | Broad | Sanger

• Human All Exon kits can be customized (PLUS) with up to 6.8 Mb additional custom content

• Human All Exon kits can be multiplexed on SOLiD4 and HiSeq2000

Human All Exon 50Mb – 2x76 bp, 50-60M HQ Reads

[Bar chart. Metrics: % on target +/- 100bp; uniformity (3/4 mean with upper tail); % bases with 1x coverage; % bases with 10x coverage; % bases with 20x coverage. Bar values as extracted: 76.32%, 85.07%, 96.65%, 87.93%, 77.46%]

The most comprehensive Human All Exon content available

38 Mb design = a subset of 50 Mb

Sequencing capacity:

• 0.5-1 sample / lane GAIIx

• 1-3 samples / lane HiSeq

• 5-10 samples /full slide SOLiD4

Chemistry recommended:

• PE 2x76 bp Illumina v4

• PE 50+25 SOLiD

Multiplexing:

• Illumina

• SOLiD

Comparison of SNP Calls with HapMap

[Figure: two bar charts, "Genotype Concordance vs. HapMap" and "Genotype Sensitivity vs. HapMap", each comparing Human All Exon v2 and Human All Exon 50Mb. Panel with series GT is REF, GT is variant HOM, GT is variant HET; values as extracted: 99.1%, 99.2%, 98.4%, 98.0%, 95.7%, 94.9%. Panel with series GT is REF, GT is variant HOM, GT is variant HET, OVERALL; values as extracted: 99.8%, 99.7%, 98.2%, 98.1%, 98.5%, 98.3%, 99.4%, 99.3%.]

All Exon Plus

All Exon Library (CCDS exons, >1000 ncRNAs, 38 Mb) + Your Custom Library (your regions of interest, 6.8 Mb)

Enter Your Custom Regions in eArray

Is the Human All Exon Kit not hitting all of your regions of interest?

Human All Exon Plus Performance


1 tube capture, 1 lane seq. at 2x76 bp on GAIIx = ~2 Gb

[Figure: two bar charts over the categories Exome + 0.87 Mb, Exome + 1.7 Mb, Exome + 3.4 Mb, Exome + 6.8 Mb and Exome Control. "SNP Analysis vs. HapMap": series Sensitivity and Concordance (0-100%). "SNP Analysis vs. dbSNP": number of SNPs, series Concordant, Novel and Mismatched. Values as extracted: 22394, 23224, 24337, 26352, 21976; 4480, 5528, 5816, 6182, 5173.]

Beyond Mendelian Diseases: Complex diseases


25bp deletion

7bp deletion

10bp deletion

11bp deletion

Other Applications of Targeted Re-Sequencing

• Capture any custom genomic regions (introns, exons, UTRs, regulatory, etc.)

• Ideal for biomarker discovery and profiling (e.g. cancer)

• Ideal for custom SNP follow-up

• Ideal for characterization of large sample cohorts

Key enabling features:

• High throughput

• 12 Illumina indexes / up to 96 samples per run

• 16 SOLiD barcodes / up to 128 samples per run

• Only pay for what you capture, scalable from 0.2 to 6.9 Mb (sweet spot for 3rd Gen Seq)

• <0.2 Mb

• 0.2 – 0.5 Mb

• 0.5 – 1.5 Mb

• 1.5 – 3 Mb

• 3 – 6.9 Mb

• Very reproducible, excellent allelic balance for accurate heterozygote calls

• Custom and catalog content (kinome)

• Automation (library prep and capture)
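
The per-run sample counts quoted in the list above are the number of indexes multiplied by the number of independent partitions in a run. The sketch below assumes 8 partitions per run; that factor is inferred from the slide's own totals (96/12 and 128/16) and corresponds to the 8 lanes of an Illumina flow cell, while for SOLiD it is an assumption. The function name is illustrative.

```python
def samples_per_run(indexes: int, partitions: int) -> int:
    """Samples per run when every partition carries one full set of indexes."""
    return indexes * partitions


PARTITIONS_PER_RUN = 8  # assumption inferred from 96/12 and 128/16

print(samples_per_run(12, PARTITIONS_PER_RUN))  # 96  (Illumina indexes)
print(samples_per_run(16, PARTITIONS_PER_RUN))  # 128 (SOLiD barcodes)
```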


Target Enrichment Design Application in eArray

• eArray is a tool to design and order custom microarrays, qPCR primers and SureSelect products (and it is free!!)

• eArray is divided into “Application Spaces”

• Allows for application specific functionality

• Target Enrichment application space features:

• Create custom baits and bait libraries

• Search existing designs/baits

– Catalog and custom

• Upload custom bait designs

• Download design files

• Share designs

• Get quotes

Customize your SureSelect™ Kit

Create your own design or add extra custom sequence to a catalog design – up to 6.8 Mb

[Workflow figure. Labels: Customer A, Customer B; Gene ID 1, Gene ID 2, Gene ID 3; Bait design; Baits #1, #2, #3; Virtual bait library; Library design; Kit size; Quote; eArray Webportal; Order library; DNA bait library; RNA bait library; Assemble kit; Ship kit to customer. Up to 55,000 unique baits.]

https://earray.chem.agilent.com

Sequence Any Genome: eArray XD

The Power of Smaller SureSelect Panels

• Inherited loss-of-function mutations in the tumor suppressor genes BRCA1, BRCA2, and multiple other genes predispose to high risks of breast and/or ovarian cancer. Cancer-associated inherited mutations in these genes are collectively quite common, but individually rare or even private.

• To determine whether massively parallel, “next-generation” sequencing would enable accurate, thorough, and cost-effective identification of inherited mutations for breast and ovarian cancer, we developed a genomic assay to capture [with Agilent’s custom SureSelect], sequence, and detect all mutations in 21 genes, including BRCA1 and BRCA2, with inherited mutations that predispose to breast or ovarian cancer.

• There were zero false-positive calls of nonsense mutations, frameshift mutations, or genomic rearrangements for any gene in any test sample.

• This approach enables widespread genetic testing and personalized risk assessment for breast and ovarian cancer.


Excellent allelic balance

Deletion up to 19bp

Efficient Capture of 5 bp Deletion on Chr X: Menkes Syndrome

hg18_ChrX_77131408_77131467_+ : Wild type Bait Design

CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT

CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA

ATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAG

TTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGC

GTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAG

TATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATT

ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG

ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG

ATCAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTG

CAACCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAA

CCTCATCTT-----AGTAGAGGAAATGAAAAAGCAGATTGAAGCT

SureSelect™ Target Enrichment Kit Efficiently Captures 5 bp Mutant

Readout on Illumina GA
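
The deletion shown in the aligned reads can be located programmatically. The sketch below is illustrative only and is not the analysis pipeline used for the slide: it takes one gapped read as printed above, places its left flank on the wild-type bait sequence, and reports which reference bases the gap removes. The function name `find_deletion` and the returned 0-based offset are assumptions made for this example.

```python
import re

# Wild-type bait sequence as printed on the slide.
REFERENCE = "CTATTGTTTATCAACCTCATCTTATCTCAGTAGAGGAAATGAAAAAGCAGATTGAAGCT"


def find_deletion(read: str, reference: str = REFERENCE):
    """Return (start, deleted_bases) for a read with one gap run, else None.

    `start` is the 0-based offset of the first deleted base in `reference`.
    """
    gap = re.search(r"-+", read)
    if gap is None:
        return None
    left, right = read[:gap.start()], read[gap.end():]
    anchor = reference.find(left)  # place the left flank on the reference
    if anchor < 0:
        return None
    start = anchor + len(left)
    end = start + len(gap.group())
    # The right flank must resume on the reference immediately after the gap.
    if reference[end:end + len(right)] != right:
        return None
    return start, reference[start:end]


print(find_deletion("CTATTGTTTATCAACCTCATCTT-----AGTAGAGGAAATGAAAA"))
# (23, 'ATCTC')
```

Every read on the slide reports the same five missing bases, which is what a consistently captured deletion looks like.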

SureSelect™ RNA Target Enrichment

FIRST IN CLASS

• First RNA Capture product on the market

Custom and catalog kits

• Design kits from 200 Kb to 3.4 Mb using the RNA Target Enrichment space on the eArray portal (PN G7581-G7585)

• RNA Capture Kinome Kit (catalog) containing the same content as the current SureSelect Kit (PN G7580)

SureSelect RNA Enrichment Protocol

Start with 0.1-0.5 µg RNA

• Similar process to DNA target enrichment

• Except that it is a cDNA NGS library

• Protocol time ~ 3-4 days

• Protocols available

• Illumina and SOLiD

• Individual or multiplexed samples

SureSelect “kinome” – Discovery and profiling of biomarkers related to disease and/or drug response

Definition of SureSelect Human Kinome Kit: 3.2Mb (incl. UTRs)

(Original content defined by Prof. René Bernards – NKI)

• 518 putative kinases

• 12 PI3K domain-containing genes

• 13 diglyceride kinases

• 6 PI3K regulatory components

• 9 inositol polyphosphate Kinases

• 9 PIP4/PIP5 Kinases

• 28 genes frequently mutated in human cancer

• 19 genes specifically known to be mutated in breast cancer

• 612 genes total

G. Manning et al Science 298 1912 (2002)

Slide courtesy of Rene Bernards

Kinome Kit Performance – 3-5 samples per GAIIx lane / SOLiD quad

[Figure, three panels over Kinome Index 1 to Kinome Index 5. "Reproducible Performance Across Indexes": % on target +/- 200bp, % bases with 1X coverage, % bases with 10X coverage, % bases with 20X coverage. "Even Index Representation Across Single Lane". "Uniform Read Depth Distribution". Values as extracted: 1.15, 0.84, 1.00, 0.98, 1.15.]

SureSelect + 454

• SureSelect support for both 454 FLX and GS Junior sequencers

• Simplified protocol with a rapid library protocol (<3 hrs), the shortest in-solution capture protocol

• Only 500 ng of starting material required

• Full SureSelect product line available: custom bait libraries from <200 kb to 6.8 Mb capture size or catalog kits (Human All Exon, Kinome and X chromosome)

• Allows for detection of mutations, SNPs, indels, CNVs and fusions/translocations

454 FLX SureSelect Custom Capture: 0.5 Mb

DNA: NA10831
Bait and Pond: 00.5Mb_B
Average read length: 360.3 bases
Total number of bases mapped: 67,157,190 bases
Percentage of reads in targeted regions: 57.07%
Percentage of reads in regions +/- 300bp: 59.35%
Average read depth: 52.7-fold
Percentage of targeted bases covered by...
...at least 1 read: 99.34%
...at least 5 reads: 98.99%
...at least 10 reads: 98.12%
...at least 20 reads: 95.42%
...at least 30 reads: 88.83%
...at least 40 reads: 76.10%

1/4 PicoTiterPlate run, 67 Mb of sequence

>95% of capture sequenced at 20X depth or greater

454 FLX SureSelect Custom Capture: 0.5 Mb

SNP Detection

343 HapMap SNPs were assayed in replicate samples of NA10831

[Chart values as extracted: 98%, 95%, 97%, 100%, 100%, 100%]

Cancer Research – Gene Fusions

Problem: Genomic rearrangement in tyrosine kinase genes

• Can lead to deregulation of cellular signaling and cancer

• Identification of novel TK fusions is laborious

• TKs are attractive therapeutic targets

Solution: SureSelect Custom Capture (908 Kb)

• Based on known cancer-derived TK fusions

• Designed baits to a conserved GXGXXG motif in 90 TKs + ATK and BRAF

• Regions extended to include preceding exons/introns

• 454 long-read sequencing

SureSelect Custom Capture (908 Kb)

SureSelectXT Mouse All Exon Kit

• Agilent SureSelectXT Mouse All Exon Kit

• For SOLiD, Illumina & 454 platforms

• Available in 5 to 10,000 reactions

• Designed against UCSC mm9 / NCBI build 37 (July 2007)

Exon definition derived from Ensembl + RefSeq

• Complete Mouse exome coverage

• 49.6 Mb capture

• 221,784 exons and 24,306 genes

• Excellent coverage uniformity, on-target specificity and SNP detection and accuracy

SureSelectXT Mouse All Exon Kit:

Illumina GAIIx, single lane 2x76 PE, 5.2 Gb

• C3H mouse genomic DNA, 49.6 Mb Mouse All Exon Capture

• On-target reads = 69%

• 98% of bases covered at 1x or greater and the average read depth was 54X

• 84% of the targeted bases were sequenced at a depth of 20X or greater, enabling high-confidence SNP calling
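
For run planning, a rough depth estimate can be made from the yield, the on-target fraction and the target size. The sketch below is an idealized back-of-envelope calculation, not the method behind the slide: it assumes every on-target base is usable and spread evenly, so it should be read as an upper bound. With the slide's numbers it gives roughly 72X against the 54X reported, and the slide does not explain the difference. The function name is illustrative.

```python
def naive_mean_depth(yield_bases: float, on_target_fraction: float,
                     target_bases: float) -> float:
    """Idealized mean depth: on-target bases divided by the target size."""
    return yield_bases * on_target_fraction / target_bases


# Numbers from the slide: 5.2 Gb yield, 69% on-target reads, 49.6 Mb target.
estimate = naive_mean_depth(5.2e9, 0.69, 49.6e6)
print(f"{estimate:.1f}X")  # roughly 72X
```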

SureSelect Mouse All Exon Kit SNP Sensitivity & Concordance

A) Sensitivity of SNP detection relative to the Perlegen Mouse SNP dataset was very high with the SureSelectXT Mouse All Exon Kit / Illumina GAIIx platform, with 99 percent of reference SNPs detected for the C3H and DBA samples. Variant SNPs were also detected at high rates (99 percent) for both the C3H and DBA samples.

B) Of the SNPs detected, concordance with the Perlegen Mouse SNP dataset for both the C3H and DBA samples was 98 percent for the variants and 95 percent overall.

SureSelectXT Catalog Exome Kits

• Coming soon in 2011 - SureSelectXT Exome Kits for…

• Bovine, Canine, Xenopus, and Zebrafish

Acknowledgements

• Collaborators:

• Broad Institute

• Chad Nusbaum et al.

• Stacey Gabriel et al.

• Sheila Fisher et al.

• Sanger Institute

• Daniel Turner et al.

• NKI

• Rene Bernards

• Ian Majewski

• RIKEN Institute

• Yoichi Gondo

• All our early access collaborators (over 20 institutions worldwide)