Computational Methods to study Sequencing data -Meenakshi Sharma

Post on 07-Jan-2016

Page 1: Computational Methods to  study Sequencing data

Computational Methods to study Sequencing data

-Meenakshi Sharma

Page 2: Computational Methods to  study Sequencing data

Outline

• Bioinformatics
• Genomics
• Motivation
• Challenges
• Next-Generation-Sequencing Pipeline
  – Sequencing
  – Mapping
  – Assembly
  – Blast

Page 3: Computational Methods to  study Sequencing data

Introduction

• Biology
• Computer Science
• Data Mining
• Statistics
• Applied Mathematics
• Applied Chemistry
• Applied Physics

[Diagram: Bioinformatics, drawing on Biology, Computer Science, and the Applied Sciences]

Page 4: Computational Methods to  study Sequencing data

Definition

• Definition of bioinformatics by the Bioinformatics Definition Committee, National Institutes of Health (NIH), released on July 17, 2000:

“Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.”

Page 5: Computational Methods to  study Sequencing data

Genomics

• Determine the complete DNA sequence of all genetic material contained in an organism

• Analyze and compare the entire genomes of one or more species

• Genome: the complete set of genetic material of an organism, including all of its genes

Page 6: Computational Methods to  study Sequencing data

Genome

Page 7: Computational Methods to  study Sequencing data

Motivation

• Study gene and genome organization
• Study protein structure and functions
• Study metabolic pathways
• Study ecology and environment
• Find potential pathogens

Page 8: Computational Methods to  study Sequencing data

Challenges

Page 9: Computational Methods to  study Sequencing data

Challenges

Page 10: Computational Methods to  study Sequencing data

Challenges

• Knowledge acquisition and knowledge management
• Methods for information and knowledge processing
  – Information retrieval
  – Statistical data analysis
  – High-performance and large-scale computing
  – Applications of new devices and emerging hardware technologies
  – Visualization of data and knowledge
• Legal issues, policy issues, history, ethics

Page 11: Computational Methods to  study Sequencing data

Next-Generation-Sequencing Pipeline

• Sequencing (sample preparation). Output: reads
• Quality analysis (statistics). Output: quality plots
• Assembly. Output: contigs
• Mapping. Output: coverage
• Blast. Output: list of organisms matched
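The quality-analysis step of the pipeline can be illustrated with a minimal sketch. It assumes base qualities arrive as Phred+33 ASCII strings, as in many FASTQ files; that encoding, the function name `mean_quality_per_position`, and the example strings are assumptions for illustration, not taken from the slides. The per-position means it returns are the kind of numbers a quality plot would display.

```python
def mean_quality_per_position(quality_strings, offset=33):
    """Return the mean Phred score at each read position.

    Assumes each character encodes one base quality as chr(score + offset).
    Reads may have different lengths; each position is averaged over the
    reads that are long enough to reach it.
    """
    if not quality_strings:
        return []
    length = max(len(q) for q in quality_strings)
    totals = [0] * length
    counts = [0] * length
    for q in quality_strings:
        for i, ch in enumerate(q):
            totals[i] += ord(ch) - offset
            counts[i] += 1
    return [t / c for t, c in zip(totals, counts)]


# Hypothetical quality strings: 'I' encodes score 40, '5' encodes score 20.
example_qualities = ["IIII", "II5", "I5"]
mean_scores = mean_quality_per_position(example_qualities)
print(mean_scores)
```

A drop in the mean toward the end of the reads is a common reason to trim reads before assembly or mapping.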

Page 12: Computational Methods to  study Sequencing data

Sequencing

• Samples: Healthy Tissue and Infected Tissue
• Library Preparation
• Illumina Sequencer
• Reads from Healthy Sample: ATGCGACTCACCATGGCGACTAGGGCAATTATGTAG
• Reads from Infected Sample: ATGGGTGAATTCATGCGGACTTCGCGTATGATCCGA

Page 13: Computational Methods to  study Sequencing data

Mapping

Reads from Healthy Sample: ATGCGACTCACCATGGCGACTAGGGCAATTATGTAG
Reads from Infected Sample: ATGGGTGAATTCATGCGGACTTCGCGTATGATCCGA

Reference sequences (labeled NC_000018):
ATGATGATGATGATGCGACTCTACCGGCGTA
ATGATGATGATGATACTTCGCGTTCTCGCGTA

[Figure: reads such as ATGCGACTC are aligned to the reference, and the number of reads covering each reference position is recorded as coverage]

Coverage rows shown on the slide:
0000000 2 2 1 5 0 0000000000 3 …
0000000 10 20 12 45 10 0000000000 10 …
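The idea of mapping reads to a reference and counting coverage can be sketched as follows. This is a toy version that only finds exact matches with `str.find`; real read mappers use indexed references and tolerate mismatches and gaps. The reference and the first read are taken from the slide, while the other two reads and the function name `coverage_by_exact_match` are made up for illustration.

```python
def coverage_by_exact_match(reference, reads):
    """Count, for every reference position, how many reads cover it.

    A read is placed at every position where it matches the reference
    exactly; reads with no exact match contribute nothing.
    """
    coverage = [0] * len(reference)
    for read in reads:
        if not read:
            continue  # an empty read would match everywhere
        start = reference.find(read)
        while start != -1:
            for pos in range(start, start + len(read)):
                coverage[pos] += 1
            start = reference.find(read, start + 1)
    return coverage


reference = "ATGATGATGATGATGCGACTCTACCGGCGTA"  # reference shown on the slide
reads = [
    "ATGCGACTC",  # read shown on the slide
    "GCGACTCTA",  # hypothetical read
    "CGGCGTA",    # hypothetical read
]
coverage = coverage_by_exact_match(reference, reads)
print(" ".join(str(c) for c in coverage))
```

Positions that no read reaches keep a coverage of 0, which is why the coverage rows on the slide contain long runs of zeros.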

Page 14: Computational Methods to  study Sequencing data

Comparing coverages in 2 samples

[Plot: Coverage Value for Healthy Tissue and Infected Tissue]
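One simple way to compare coverage between the two samples is a per-position log2 ratio, sketched below. The slides do not say which comparison was used, so the log2 fold change, the pseudocount, and the function name `log2_fold_change` are assumptions. The numbers are the two coverage rows from the mapping slide; assigning the lower row to the healthy sample is also an assumption. The sketch does not correct for differences in sequencing depth between samples.

```python
import math


def log2_fold_change(healthy, infected, pseudocount=1.0):
    """Per-position log2 ratio of infected to healthy coverage.

    The pseudocount avoids division by zero where a sample has no coverage.
    Positive values mean higher coverage in the infected sample.
    """
    if len(healthy) != len(infected):
        raise ValueError("coverage vectors must have the same length")
    return [
        math.log2((i + pseudocount) / (h + pseudocount))
        for h, i in zip(healthy, infected)
    ]


healthy_coverage = [2, 2, 1, 5, 0]        # assumed to be the healthy sample
infected_coverage = [10, 20, 12, 45, 10]  # assumed to be the infected sample
fold_changes = log2_fold_change(healthy_coverage, infected_coverage)
print([round(x, 2) for x in fold_changes])
```

Regions with a consistently high ratio are candidates for sequences that are enriched in the infected tissue.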

Page 15: Computational Methods to  study Sequencing data

Assembly

Short overlapping reads:
ATGCGA TGCGAG TGCGAT TGCGAG
ATGAAA TGAAAA TGAAAA GAAATA

Read sequences:
ATGCGACTCACCATGGCGACTAGGGCAATTATGTAG…
ATGGGTTTATTCATGTCGACTTGTCAGATGATCTAA…

Assembled contigs:
ATGCGAACCATGACTAGATTATGTTTCGCGAACTCCCTATCGAGATTATGTTTCGCGAATGTTTCGCGAGGTGT…
ATGGGTATTCATGTCTTTGTATGATCTAATGGGTAATGGTGTGTATGATCTA…
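Assembly joins overlapping reads into longer contigs. The sketch below uses a greedy overlap merge: repeatedly join the pair of sequences with the longest suffix-prefix overlap. This is one simple approach chosen for illustration, not necessarily the assembler used in the talk; production assemblers rely on overlap or de Bruijn graphs and handle sequencing errors. The helper names and the three example reads (6-mers cut from the start of the healthy-sample read, ATGCGACTCACC…) are illustrative.

```python
def overlap(a, b, min_length=3):
    """Length of the longest suffix of a that equals a prefix of b."""
    for k in range(min(len(a), len(b)), min_length - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_length=3):
    """Merge reads greedily by longest overlap until no overlaps remain."""
    contigs = list(dict.fromkeys(reads))  # drop duplicate reads, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_length)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [
            c for idx, c in enumerate(contigs) if idx not in (best_i, best_j)
        ] + [merged]


example_reads = ["ATGCGA", "GCGACT", "ACTCAC"]
contigs = greedy_assemble(example_reads)
print(contigs)
```

Reads that differ at a position, such as TGCGAG and TGCGAT on the slide, do not merge with each other under this scheme and end up in separate contigs.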

Page 16: Computational Methods to  study Sequencing data

Blast

Assembled contigs:
ATGCGAACCATGACTAGATTATGTTTCGCGAACTCCCTATCGAGATTATGTTTCGCGAATGTTTCGCGAGGTGT…
ATGGGTATTCATGTCTTTGTATGATCTAATGGGTAATGGTGTGTATGATCTA…

Matches for segments of the first contig:
ATGCGAACCATG | papilloma virus
ACTAGATTATGTTTCGCGA | E. coli
ACTCCCTATCGA | human mitochondria
GATTATGTTTCGCGA | human chr 12
ATGTTTCGCGAGGTGT | polio virus
…

Matches for segments of the second contig:
ATGGGTATTCATG | small pox virus
TCTTTGTATGATCTA | human chr 21
ATGGGTAATG | growth factor gene
GTGTGTATGATCTA | human mitochondria
…
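The lookup of a sequence against a database of known organisms can be sketched with a small k-mer index. This is not BLAST itself, which performs heuristic local alignment against large databases; it only shows the underlying seed idea of shared short words. The toy database reuses three labeled segments from the slide as if they were database entries, and the function names and the query are made up for illustration.

```python
def build_kmer_index(database, k):
    """Map every k-mer to the set of database entries that contain it."""
    index = {}
    for name, sequence in database.items():
        for i in range(len(sequence) - k + 1):
            index.setdefault(sequence[i:i + k], set()).add(name)
    return index


def best_match(query, index, k):
    """Return the entry sharing the most k-mers with the query, or None."""
    hits = {}
    for i in range(len(query) - k + 1):
        for name in index.get(query[i:i + k], ()):
            hits[name] = hits.get(name, 0) + 1
    if not hits:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return max(sorted(hits), key=hits.get)


# Toy database built from labeled segments on the slide (illustration only).
toy_database = {
    "papilloma virus": "ATGCGAACCATG",
    "E. coli": "ACTAGATTATGTTTCGCGA",
    "human mitochondria": "ACTCCCTATCGA",
}
kmer_index = build_kmer_index(toy_database, k=8)
match = best_match("CTAGATTATGTTTCG", kmer_index, k=8)
print(match)
```

Running every contig segment through such a lookup yields a list of matched organisms, which is the output the pipeline slide assigns to the Blast step.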

Page 17: Computational Methods to  study Sequencing data

[Figure: summary of the pipeline. Sequencing yields sequencing reads; Assembly yields assembled contigs; Mapping against a reference (NC_989231) yields coverage values; Blast yields matched genes and organisms; Coverage Analysis yields differential coverage.]

Page 19: Computational Methods to  study Sequencing data

Thank you!