ngs applications i (ueb-uat bioinformatics course - session 2.1.2 - vhir, barcelona)
DESCRIPTION
Course: Bioinformatics for Biomedical Research (2014). Session: 2.1.2 - Next Generation Sequencing. Technologies and Applications. Part II: NGS Applications I. Statistics and Bioinformatics Unit (UEB) & High Technology Unit (UAT) from Vall d'Hebron Research Institute (www.vhir.org), Barcelona.
TRANSCRIPT
![Page 1: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/1.jpg)
Vall d’Hebron Institut de Recerca (VHIR)
Rosa Prieto
Head of the High Tech Unit
15/05/2014
Institut d’Investigació Sanitària acreditat per l’Instituto de Salud Carlos III (ISCIII)
NEXT GENERATION SEQUENCING TECHNOLOGIES AND APPLICATIONS
COURSE ON BIOINFORMATICS FOR BIOMEDICAL RESEARCH
![Page 2: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/2.jpg)
Index
1. INTRODUCTION TO NGS
2. NGS TECHNOLOGY OVERVIEW
3. NGS APPLICATIONS OVERVIEW
4. WHAT IS NEXT IN SEQUENCING TECHNOLOGIES?
![Page 3: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/3.jpg)
NGS applications
-Amplicon sequencing
-Targeted DNA resequencing
-Exome sequencing
-Whole genome sequencing
-Metagenomics
-RNA sequencing
-Targeted RNA resequencing
-Epigenomics
-Sequencing of free DNA-RNA (plasma/serum)
![Page 4: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/4.jpg)
Considerations for using NGS
-What do I want to sequence? Whole genome, exome, several genes, metagenome, epigenome, RNA-seq...
-How many samples?
-What read length is required?
-Quality and quantity of starting material?
-Size of the nucleic acids to sequence
-Amount of sequence needed: coverage
(Depth of) Coverage: how many times a particular base is sequenced.
30x = each base has been read by 30 sequences (on average)
Depth of coverage = (number of reads * read length) / size of target genome
(Breadth of) Coverage: fraction of the target sequence that has been covered (at a given depth of coverage)
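The two coverage definitions above can be sketched in a few lines of Python. This is only an illustration: the function names and the example numbers (a 5 Mb target, 150-base reads) are assumptions, not values from the course.

```python
import math


def depth_of_coverage(n_reads, read_length, target_size):
    """Mean depth = (number of reads * read length) / size of target."""
    return n_reads * read_length / target_size


def reads_needed(depth, read_length, target_size):
    """Invert the formula: reads required to reach a mean depth (rounded up)."""
    return math.ceil(depth * target_size / read_length)


def breadth_of_coverage(per_base_depth, min_depth=1):
    """Fraction of target positions covered by at least min_depth reads."""
    covered = sum(1 for d in per_base_depth if d >= min_depth)
    return covered / len(per_base_depth)


# Hypothetical example: 1 million reads of 150 bases on a 5 Mb target.
print(depth_of_coverage(1_000_000, 150, 5_000_000))
# Hypothetical per-base depths for a 5-base target, threshold of 10x.
print(breadth_of_coverage([0, 5, 30, 31, 12], min_depth=10))
```

Note that the mean depth says nothing about how evenly the reads are spread, which is why breadth at a given depth is reported separately.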
![Page 5: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/5.jpg)
Considerations for using NGS
Which depth of coverage do I need?
It is an empirical value that depends on the objective of the study and its particular conditions (consensus values may exist)
![Page 6: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/6.jpg)
Amplicon sequencing: viral quasispecies
HCV, HBV and HIV virus populations have special characteristics:
In an infected patient, the virus population shows high rates of mutation and replication. It is a complex mixture of different mutants.
Goal of the study:
Detection and quantification of mutations, or combinations of mutations, that could confer resistance to viral inhibitors in samples from infected patients.
Special interest in mutations present at a low frequency (minor variants).
![Page 7: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/7.jpg)
Amplicon sequencing: viral quasispecies
WHY IS NGS APPROPRIATE FOR THIS KIND OF STUDY?
Minor variants often play an important role in the development of resistance to antiviral treatments in patients, even if they make up a very low percentage of the population.
Minor variants may not be detected by classical sequencing methods: you obtain hundreds of sequences with much effort and at high cost.
NGS detects variants at a very low frequency efficiently: you obtain thousands of sequences at relatively low cost.
454 technology is the most appropriate method in this particular case (long sequences are achieved).
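To see why thousands of reads matter for minor variants, a simple binomial model can help. The sketch below is an assumption-laden illustration, not the method used in the course: it ignores sequencing and PCR errors, treats reads as independent draws from the viral population, and the function name and thresholds are hypothetical.

```python
from math import comb


def prob_detect(n_reads, variant_freq, min_reads=1):
    """Probability of seeing the variant in at least min_reads reads.

    Assumes each read independently carries the variant with
    probability variant_freq (binomial sampling, no sequencing error).
    """
    p_below = sum(
        comb(n_reads, k) * variant_freq**k * (1 - variant_freq) ** (n_reads - k)
        for k in range(min_reads)
    )
    return 1 - p_below


# A 1% variant, requiring at least 10 supporting reads (hypothetical threshold):
print(prob_detect(100, 0.01, min_reads=10))   # hundreds of sequences
print(prob_detect(5000, 0.01, min_reads=10))  # thousands of sequences
```

In practice the minimum number of supporting reads has to be set above the platform's error rate, which this model does not capture.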
![Page 8: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/8.jpg)
Targeted sequencing using gene panels
Array-based capture system
Liquid capture system
![Page 9: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/9.jpg)
Targeted sequencing using gene panels
Illumina
Ion Torrent
![Page 10: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/10.jpg)
Considerations that affect capture efficiency
-Quality and quantity of input DNA
-Repeat elements, tandem repeats and pseudogenes: uneven distribution of coverage
-Extreme GC content: 5’UTR, first exons of genes, promoter regions
-Library insert length and its distribution:
•Different capture platforms recommend different sets of standard practices for sample library preparation.
•As a result of these underlying chemistries, each platform has its own range of recommended fragment sizes. Agilent insert size ranges from 100 to 300 bp, Nimblegen ranges from 150 to 250 bp and TruSeq uses the longest inserts, from 300 to 500 bp.
-Consistent laboratory procedures.
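As a small illustration of the insert-size point, the recommended ranges quoted above can be put in a lookup table and checked against a library's mean insert size. The ranges are the ones on the slide; the dictionary and function names are hypothetical, and treating the range ends as inclusive is an assumption.

```python
# Recommended insert-size ranges in bp, as quoted on the slide.
RECOMMENDED_INSERT_BP = {
    "Agilent": (100, 300),
    "Nimblegen": (150, 250),
    "TruSeq": (300, 500),
}


def compatible_platforms(mean_insert_bp):
    """Return the capture platforms whose recommended range includes the insert size."""
    return [
        platform
        for platform, (low, high) in RECOMMENDED_INSERT_BP.items()
        if low <= mean_insert_bp <= high  # inclusive bounds (assumption)
    ]


print(compatible_platforms(200))  # a 200 bp library
print(compatible_platforms(400))  # a 400 bp library
```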
![Page 11: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/11.jpg)
Sequence capture for cancer genomics
![Page 12: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/12.jpg)
Exome vs. whole genome sequencing
PROS:
• Enabling technologies: NGS machines, open-source algorithms, capture reagents, falling costs, big sample collections
• Exomes are more cost effective (less sequencing for the same coverage): human genome 3.2 Gb vs. human exome approx. 50 Mb (1-2% of the genome)
• Simplified bioinformatics analysis compared to whole genomes
CHALLENGES:
• Many Mendelian disorders still cannot be interpreted
• Rare variants need large sample sizes
• Exome might miss regions of interest (e.g. novel non-coding genes)
• Exome reagents do not capture all exons
• Clinical data sometimes cannot be interpreted successfully
Shendure, Genome Biol 2011
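The "less sequencing for the same coverage" argument can be made concrete with the sizes quoted above (3.2 Gb genome, roughly 50 Mb exome). The sketch below is a back-of-the-envelope calculation only: the 30x depth is an arbitrary example, the names are hypothetical, and it assumes perfect capture, whereas real exome runs lose part of the output to off-target reads.

```python
GENOME_BP = 3_200_000_000  # human genome, 3.2 Gb (from the slide)
EXOME_BP = 50_000_000      # human exome, approx. 50 Mb (from the slide)


def bases_required(target_bp, depth):
    """Total sequenced bases needed to reach a mean depth over a target."""
    return target_bp * depth


depth = 30  # hypothetical target depth
genome_bases = bases_required(GENOME_BP, depth)
exome_bases = bases_required(EXOME_BP, depth)

print(f"Exome as a fraction of the genome: {EXOME_BP / GENOME_BP:.2%}")
print(f"Whole genome at {depth}x: {genome_bases / 1e9:.1f} Gb of sequence")
print(f"Exome at {depth}x: {exome_bases / 1e9:.1f} Gb of sequence")
print(f"Ratio: {genome_bases // exome_bases}x less sequencing for the exome")
```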
![Page 13: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/13.jpg)
Exome sequencing workflow
[Figure: exome sequencing workflow diagram; only the label "emPCR" survives from the original slide]
![Page 14: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/14.jpg)
Illumina exome sequencing
Kits
-Nimblegen EZ capture
-Agilent SureSelect
-Raindance...
Sequencers
![Page 15: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/15.jpg)
Ion exome sequencing
![Page 16: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/16.jpg)
De novo sequencing
Resequencing
Whole genome sequencing
http://www.ncbi.nlm.nih.gov/projects/WGS/WGSprojectlist.cgi
![Page 17: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/17.jpg)
Whole genome sequencing
Sequenced reads
Contigs
Scaffolds
Mapped Scaffolds
Genome map
Long reads (454, PacBio, PE Illumina reads)
Shotgun
![Page 18: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/18.jpg)
Sequencing of the bacterial strain E. coli O104:H4 with GS Junior, MiSeq and PGM.
1. Creation of a reference assembly (Roche GS FLX+ shotgun + 8 kb PE, coverage 32x). It contains 1 chromosome (5.3 Mb) and 2 plasmids. 153 gaps remain, corresponding to unresolved repetitive regions.
2. Sequencing of the same strain using:
• 2 runs of the 454 GS Junior
• 2 316 chips of the Ion Torrent PGM
• 1 run of the MiSeq (2x150 bases)
Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30 (5): 434-441 (2012)
Whole genome sequencing
![Page 19: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/19.jpg)
Conclusions: “One important conclusion from this evaluation is that saying that one has ‘sequenced a bacterial genome’ means different things on different benchtop sequencing platforms”

| | MiSeq | GS Junior | Ion Torrent |
|---|---|---|---|
| Throughput/run | The highest | The lowest | The fastest |
| Errors | The lowest | Intermediate (indels) | Many, especially in homopolymers |
| Read length | Intermediate (2x150 bp) | The longest (520 bp) | The shortest (100 bp) |
| Run time | The longest (27 hr) | Intermediate (9 hr) | The shortest (3 hr) |
| Price per Mb | The cheapest | The most expensive | Intermediate |
| Other considerations | Unfillable gaps | Errors in homopolymers | The worst performance |

Performance comparison of benchtop high-throughput sequencing platforms. Nat. Biotechnol. 30 (5): 434-441 (2012)
Whole genome sequencing
![Page 20: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/20.jpg)
• The small fraction of the genome that varies between individuals may explain differences in susceptibility to disease, in response to drugs or in reaction to environmental factors. The “1000 Genomes Project” will try to establish a map of the human genome that describes as many of its variations as possible, dramatically improving on the information obtained with the HapMap project.
• The project is carried out with the main support of three institutions: the Wellcome Trust Sanger Institute (Hinxton, England), the Beijing Genomics Institute (Shenzhen, China) and the National Human Genome Research Institute, which is part of the NIH (National Institutes of Health, USA).
1000 Genomes Project
![Page 21: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/21.jpg)
Methods:
1-Low coverage (5x) sequencing: SOLiD + Illumina
2-Whole exome sequencing (80× average coverage across a consensus target of 24 Mb spanning more than 15,000 genes): SeqCap EZ Human Exome Library from Nimblegen, and SureSelect All Exon V2 Target Enrichment kit from Agilent.
3-SNP genotyping: Initially all samples were typed using a Sequenom MassArray SNP Genotyping panel of 23 SNPs and one gender-determining assay to establish a genetic fingerprint. After gender concordance was verified, the samples were placed on 96-well plates using the Illumina HumanOmni2.5-Quad v1.0 B SNP array.
1000 Genomes Project
![Page 22: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/22.jpg)
The project will publish the genotype of the volunteers, together with detailed information about their phenotype: medical records, various tests, MRI images, etc. All the information will be available to anyone on the Internet, so that researchers can test hypotheses about the relationships between genotype, environment and phenotype.
Personal Genome Project
![Page 23: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/23.jpg)
ClinVar
MedGen to research the phenotype
http://www.ncbi.nlm.nih.gov/medgen/
GTR (Genetic Testing Registry) to choose appropriate tests
http://www.ncbi.nlm.nih.gov/gtr/
ClinVar to research variant pathogenicity
http://www.ncbi.nlm.nih.gov/clinvar/
NCBI’s Resources for Phenotype (MedGen), Tests (GTR) and Variation (ClinVar)
![Page 24: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/24.jpg)
Patient showing signs compatible with Marfan syndrome:
![Page 25: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/25.jpg)
![Page 26: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/26.jpg)
List of tests for Marfan syndrome (panels included)
![Page 27: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/27.jpg)
![Page 28: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/28.jpg)
![Page 29: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/29.jpg)
![Page 30: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/30.jpg)
Searching ClinVar
NM_000138.4:c.4786C>T
FBN1:c.4786C>T
c.4786C>T
Arg1596Ter
R1596*
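The cDNA-level and protein-level search terms describe the same change: coding position 4786 falls in codon 1596, which is why c.4786C>T pairs with Arg1596Ter. A minimal sketch of that arithmetic follows. The regular expression and function name are hypothetical, and it only handles simple coding substitutions of the form `c.<position><ref>><alt>`, not the full HGVS nomenclature (no intronic offsets, indels or protein-level terms).

```python
import re

# Simple coding substitution with an optional "reference:" prefix (illustrative only).
CDNA_SUBSTITUTION = re.compile(
    r"^(?:(?P<reference>[^:]+):)?c\.(?P<position>\d+)(?P<ref>[ACGT])>(?P<alt>[ACGT])$"
)


def parse_cdna_substitution(expression):
    """Parse e.g. 'NM_000138.4:c.4786C>T'; return None if the pattern does not match."""
    match = CDNA_SUBSTITUTION.match(expression)
    if match is None:
        return None
    position = int(match.group("position"))
    return {
        "reference": match.group("reference"),  # transcript or gene prefix, may be None
        "position": position,
        "ref": match.group("ref"),
        "alt": match.group("alt"),
        "codon": (position - 1) // 3 + 1,       # 1-based codon number
        "codon_base": (position - 1) % 3 + 1,   # 1, 2 or 3 within the codon
    }


for term in ["NM_000138.4:c.4786C>T", "FBN1:c.4786C>T", "c.4786C>T", "R1596*"]:
    print(term, parse_cdna_substitution(term))
```

Protein-level terms such as Arg1596Ter or R1596* fall outside this pattern and return None here.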
![Page 31: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/31.jpg)
Allele summary
• Gene
• Variant type
• Genomic location
• HGVS expressions*
• Molecular consequence*
• Links*
• Frequency*
Phenotype summary
• Names
• Links*
• Age of onset*
• Prevalence*
Interpretation
• Significance
• Review status*
• Accession.version*
* May be provided by NCBI
ClinVar detailed display
![Page 32: NGS Applications I (UEB-UAT Bioinformatics Course - Session 2.1.2 - VHIR, Barcelona)](https://reader031.vdocuments.site/reader031/viewer/2022020207/554ea32fb4c905977e8b476e/html5/thumbnails/32.jpg)