BigData in Life Sciences, Genomics and Systems Biology
Harsha Rajasimha
9th September 2015

Upload: harsha-rajasimha

Post on 07-Jan-2017

BigData in Life Sciences, Genomics and Systems Biology

• What is BigData?
• Life sciences, genomics and systems biology
• BigData in life sciences: where is it coming from?
• Genomics and systems biology: BigData challenges
• Making sense of BigData
• Future of BigData in genomics and systems biology

What is BigData?

http://www.ibmbigdatahub.com/infographic/four-vs-big-data

Medicine, Agriculture, Food Safety, Forensics, Epidemiology

The life sciences concern the study of living organisms, including biology, botany, zoology, microbiology, physiology, biochemistry, and related subjects.

Gen“omics”
• Before 2000: one gene at a time, based on prior knowledge
• Now: all ~25,000 genes at once, no prior knowledge necessary

Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism).

OMICS Characteristics
• Comprehensiveness
• Scale
• High-throughput and low-cost technology development
• Rapid data release
• Social and ethical implications

Central Dogma of Molecular Biology

[Diagram: DNA → RNA → PROTEIN → FUNCTION, with labels for transcription (DNA to RNA), reverse transcription (RNA to DNA), and RNAi gene silencing]
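
As a minimal sketch of the DNA → RNA step in the diagram, transcription and reverse transcription can be mimicked on plain strings. This assumes the DNA is given as the coding (sense) strand, so the RNA is the same sequence with T replaced by U. The function names and the example sequence are illustrative and do not come from the slides.

```python
def transcribe(coding_dna: str) -> str:
    """Return the RNA sequence for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def reverse_transcribe(rna: str) -> str:
    """Return the coding-strand DNA sequence for an RNA sequence (U becomes T)."""
    return rna.upper().replace("U", "T")


# Hypothetical nine-base sequence, used only to show the round trip.
rna = transcribe("ATGGCCTAA")
print(rna)                      # AUGGCCUAA
print(reverse_transcribe(rna))  # ATGGCCTAA
```

Real pipelines also handle strand orientation, ambiguous bases, and splicing, none of which this sketch attempts.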

Molecular biology is a branch of science concerning biological activity at the molecular level. The field of molecular biology overlaps with biology and chemistry and in particular, genetics and biochemistry.

Systems biology is the study of systems of biological components, which may be molecules, cells, organisms or entire species. Living systems are dynamic and complex, and their behavior may be hard to predict from the properties of individual parts.

Life Sciences BigData Examples

• Measuring instruments: LIMS, ELNs
• Imaging: molecular and cellular, pathology
• Genomics: personal genomes, aggregate databases, gene expression
• Electronic health records: variety of information, phenotypes
• Literature evidence: PubMed, ISI Web of Science, clinical trials, WWW
• Curated content: biochemical pathways, drug response/resistance

Precision medicine is an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person.

• Biological Mechanism Example:

• Metabolic Pathways

• Systems Biology Graphical Notation

~4800

BigData Use Case 1: Disease Genes Discovery

Obama’s PMI: Need for Large Cohorts

Genome / DNA Sequencing

• Game Changer 1: First human genome sequenced (2001)
• Game Changer 2: Human genome costs <$1K (2014)

Cost is decreasing at the square of Moore’s law: Flatley’s Law
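
One way to read "the square of Moore's law" is that the improvement factor over a given period is the Moore's-law factor squared. The sketch below works through that arithmetic. The two-year doubling period is an assumption (the slides do not state one), the function names are hypothetical, and nothing here is fitted to real sequencing-cost data.

```python
def moore_factor(years: float, doubling_period: float = 2.0) -> float:
    """Improvement factor after `years` if capability doubles every `doubling_period` years."""
    return 2.0 ** (years / doubling_period)


def flatley_factor(years: float, doubling_period: float = 2.0) -> float:
    """Improvement factor if progress runs at the square of the Moore's-law factor."""
    return moore_factor(years, doubling_period) ** 2


# Compare the two curves over a few horizons (assumed 2-year doubling period).
for years in (2, 4, 10):
    print(years, moore_factor(years), flatley_factor(years))
```

Under these assumptions, ten years gives a 32-fold improvement on the Moore's-law curve and a 1024-fold improvement on the squared curve.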

The ability to digitize humans through genomics and genotyping will overturn the practice of medicine. Only a small fraction of the 700,000 medical practitioners in the US are up to speed with genomics...

Personal Genomes, Cancer Genomes, and other genomes

Other BigData Use Cases

• Insurance: cost-benefit analysis of tests
• Health record-guided drug development
• Patient stratification: drug response based on DNA
• Measuring instruments
• FDA Office of Regulatory Affairs: 14 labs, 1000+ instruments, data
• 1000 Genomes, 100K Genomes UK, PMI million cohort: genomes + phenomes
• Biochemical pathways: Reactome, KEGG, etc.

Lots of BigData, not enough analysis

http://searchhealthit.techtarget.com/tip/Big-data-in-health-care-Lots-of-data-but-not-enough-analysis

Solutions to BigData
• Data Storage
• Data Organization
• Data Analytics
• Data Movement
• Data Exchange
• Data Visualization
• BD2K: BigData is worthless
• Data Dissemination: open data, free data, open government

Data Storage

http://blog.zoolz.com

Cloud, Enterprise storage, Planning for tomorrow

Data Management, Retrieval
• Relational databases
• NoSQL databases
• Data use cases

http://www.tomsitpro.com/articles/rdbms-sql-cassandra-dba-developer,2-547-2.html

Data Organization and DBs

http://www.enaxisconsulting.com/images/userfiles/images/MDM-Chart640Final(1).jpg

Business Cases, Continuity, Infrastructure, Governance

E.g., NIH public data repositories

Data Analytics

Data Movement / Transfer
• How is the data expected to move within and outside the infrastructure?
• Bring data to analysis tools, or tools to data?
• From archives to compute storage, from local to cloud
• Network bandwidth considerations
• DAS, NAS, SAN, tapes, RAM, cache
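
The bandwidth question can be made concrete with a back-of-the-envelope estimate of transfer time. This is a rough sketch only: the function name, the decimal unit convention (1 GB = 8,000 megabits), and the `efficiency` factor standing in for protocol overhead are all assumptions, and the example dataset size is hypothetical.

```python
def transfer_hours(size_gb: float, bandwidth_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move `size_gb` gigabytes over a `bandwidth_mbps` link.

    `efficiency` is the assumed fraction of nominal bandwidth actually achieved.
    """
    if size_gb < 0 or bandwidth_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("size must be >= 0, bandwidth > 0, and 0 < efficiency <= 1")
    size_megabits = size_gb * 8 * 1000  # decimal units: 1 GB = 1000 MB = 8000 megabits
    seconds = size_megabits / (bandwidth_mbps * efficiency)
    return seconds / 3600


# Hypothetical 100 GB dataset over a 100 Mbps link and a 1 Gbps link.
for mbps in (100, 1000):
    print(f"{mbps} Mbps: {transfer_hours(100, mbps):.2f} hours")
```

An estimate like this is one input to the "bring data to tools or tools to data" decision: when the transfer time dominates the analysis time, moving the computation to the data becomes more attractive.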

Data Integration and Exchange
• APIs: application programming interfaces for on-demand access
• XML: SBML
• EMRs
• RDF/OWL: BioPAX
• FastQ
• DICOM
• Commons: genomics, cancer, etc.
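
To illustrate one of the exchange formats listed, here is a minimal reader for FastQ records. It assumes the common four-line layout (an `@` header, the sequence, a `+` separator, and a quality string) and Phred+33 quality encoding; files with wrapped sequences or other encodings would need a real parser. The function name and the sample read are illustrative only.

```python
import io
from typing import Iterator, List, TextIO, Tuple


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, List[int]]]:
    """Yield (read_id, sequence, phred_scores) from four-line FastQ records."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of file (or a blank line) stops the reader
            return
        sequence = handle.readline().rstrip("\n")
        separator = handle.readline().rstrip("\n")
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or not separator.startswith("+"):
            raise ValueError(f"malformed FastQ record near {header!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        # Phred+33: quality score = ASCII code of the character minus 33 (assumed encoding).
        scores = [ord(ch) - 33 for ch in quality]
        yield header[1:], sequence, scores


# Hypothetical single-read example held in memory instead of a file.
sample = io.StringIO("@read1\nGATTACA\n+\nIIIIIII\n")
for read_id, seq, scores in read_fastq(sample):
    print(read_id, seq, scores)
```

Yielding records one at a time keeps memory use flat, which matters when a single sequencing run produces files far larger than RAM.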

Data Visualization

Circos plot

Health InfoScape: 7+ million EMRs, from the SENSEable City Lab at MIT and GE healthymagination. Shows the frequency of co-occurrence of medical conditions.

Alignment of 8 Yersinia whole bacterial genomes

BD2K: BigData is worthless

Data Dissemination

Discussion!

Let's do it with BigData!
