BigData in Life Sciences, Genomics and Systems Biology
What is BigData
Life sciences, Genomics and systems biology
BigData in life sciences – where is it coming from?
Genomics and Systems Biology – BigData challenges.
Making sense of BigData
Future of BigData in genomics/SB
Medicine, Agriculture, Food Safety, Forensics, Epidemiology
Life sciences concern the study of living organisms, including biology, botany, zoology, microbiology, physiology, biochemistry, and related subjects.
Gen“omics”
• Before 2000: one gene at a time, based on prior knowledge
• Now: all ~25,000 genes at once – no prior knowledge necessary
Genomics is a discipline in genetics that applies recombinant DNA, DNA sequencing methods, and bioinformatics to sequence, assemble, and analyze the function and structure of genomes (the complete set of DNA within a single cell of an organism).
OMICS Characteristics
• Comprehensiveness
• Scale
• High-throughput and low-cost technology development
• Rapid data release
• Social and ethical implications
Central Dogma of Molecular Biology
Diagram: DNA → RNA → PROTEIN → FUNCTION
Diagram labels: Transcription, Reverse Transcription, RNAi gene silencing
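The transcription and reverse transcription steps named on this slide can be illustrated with a toy sketch. It assumes the input is the coding (sense) strand of the DNA, so the RNA has the same sequence with U in place of T. The function names are hypothetical, and real transcription involves far more than this character swap.

```python
def transcribe(coding_dna):
    """Return the RNA sequence for a DNA coding strand (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def reverse_transcribe(rna):
    """Return the DNA sequence corresponding to an RNA sequence (U becomes T)."""
    return rna.upper().replace("U", "T")


if __name__ == "__main__":
    rna = transcribe("ATGGCT")
    print(rna)
    print(reverse_transcribe(rna))
```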
Molecular biology is a branch of science concerning biological activity at the molecular level. The field of molecular biology overlaps with biology and chemistry and in particular, genetics and biochemistry.
Systems biology is the study of systems of biological components, which may be molecules, cells, organisms or entire species. Living systems are dynamic and complex, and their behavior may be hard to predict from the properties of individual parts.
Life Sciences BigData Examples
• Measuring instruments: LIMS, ELNs
• Imaging: molecular and cellular, pathology
• Genomics: personal genomes, aggregate databases, gene expression
• Electronic Health Records: variety of information, phenotypes
• Literature evidence: PubMed, ISI Web of Science, clinical trials, WWW
• Curated content: biochemical pathways, drug response/resistance
Precision medicine is an emerging approach for disease treatment and prevention that takes into account individual variability in genes, environment, and lifestyle for each person.
Genome / DNA Sequencing
• Game Changer 1: First human genome sequenced (2001)
• Game Changer 2: Human genome costs <$1K (2014)
Cost is decreasing at the square of Moore’s law: Flatley’s Law
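As a rough illustration of what "the square of Moore's law" could mean, the sketch below models Moore's law as a halving of cost every two years, so squaring that factor gives a halving every year. The two-year period and the function names are illustrative assumptions, not figures from the slides.

```python
def moore_factor(years, halving_period=2.0):
    """Fraction of the original cost left after `years`.

    Assumes cost halves once per `halving_period` years (an assumed
    reading of Moore's law, not a value taken from the slides).
    """
    return 0.5 ** (years / halving_period)


def squared_moore_factor(years, halving_period=2.0):
    """Fraction of cost left if the decline is the square of moore_factor."""
    return moore_factor(years, halving_period) ** 2


if __name__ == "__main__":
    for years in (2, 4, 10):
        print(years, moore_factor(years), squared_moore_factor(years))
```

Under these assumptions, a cost that halves every two years under Moore's law would halve every year under the squared rate.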
The ability to digitize humans through genomics and genotyping will overturn the practice of medicine. Only a small fraction of the 700,000 medical practitioners in the US are up to speed with genomics...
Other BigData Use Cases
• Insurance: cost-benefit analysis of tests
• Health record-guided drug development
• Patient stratification – drug response based on DNA
• Measuring instruments
• FDA Office of Regulatory Affairs: 14 labs, 1000+ instruments, data
• 1000genomes, 100K genomes UK, PMI million cohort – genomes + phenomes
• Biochemical pathways: Reactome, KEGG, etc.
Lots of BigData – not enough analysis
http://searchhealthit.techtarget.com/tip/Big-data-in-health-care-Lots-of-data-but-not-enough-analysis
Solutions to BigData
• Data Storage
• Data Organization
• Data Analytics
• Data Movement
• Data Exchange
• Data Visualization
• BD2K: BigData is worthless
• Data Dissemination: Open data, Free data, Open Govt
Data Management, Retrieval
• Relational databases
• NoSQL databases
• Data use cases
http://www.tomsitpro.com/articles/rdbms-sql-cassandra-dba-developer,2-547-2.html
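The contrast between relational and NoSQL storage listed above can be sketched with Python's standard library. This is a minimal illustration: the table names, column names, and sample data are hypothetical, and a plain dictionary of JSON strings stands in for a document store instead of a real NoSQL product.

```python
import json
import sqlite3

# Hypothetical toy data: one sample with two variant calls.
VARIANTS = [("S1", "GENE_A", "A>G"), ("S1", "GENE_B", "C>T")]


def relational_lookup(sample_id):
    """Store samples and variants in two tables and join them on demand."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE samples (sample_id TEXT PRIMARY KEY, organism TEXT)")
    conn.execute("CREATE TABLE variants (sample_id TEXT, gene TEXT, allele TEXT)")
    conn.execute("INSERT INTO samples VALUES (?, ?)", ("S1", "Homo sapiens"))
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?)", VARIANTS)
    rows = conn.execute(
        "SELECT s.organism, v.gene, v.allele "
        "FROM samples AS s JOIN variants AS v ON s.sample_id = v.sample_id "
        "WHERE s.sample_id = ? ORDER BY v.gene",
        (sample_id,),
    ).fetchall()
    conn.close()
    return rows


def document_lookup(sample_id):
    """Keep everything about a sample in one JSON document keyed by its ID."""
    store = {
        "S1": json.dumps({
            "organism": "Homo sapiens",
            "variants": [{"gene": g, "allele": a} for _, g, a in VARIANTS],
        })
    }
    return json.loads(store[sample_id])


if __name__ == "__main__":
    print(relational_lookup("S1"))
    print(document_lookup("S1"))
```

The relational version normalizes the data and pays for a join at query time; the document version retrieves one self-contained record by key. Which fits better depends on the data use cases.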
Data Organization and DBs
http://www.enaxisconsulting.com/images/userfiles/images/MDM-Chart640Final(1).jpg
Business Cases, Continuity, Infrastructure, Governance
E.g., NIH public data repositories
Data Movement / Transfer
• How is the data expected to move within and outside the infrastructure?
• Bring data to analysis tools or tools to data?
• From archives to compute storage, from local to cloud
• Network bandwidth considerations
• DAS, NAS, SAN, tapes, RAM, cache
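The network bandwidth question above comes down to simple arithmetic. The sketch below estimates how long a dataset takes to move over a link. The function name and the default 80% link efficiency are assumptions made for illustration, and decimal units (1 TB = 10^12 bytes) are used throughout.

```python
def transfer_time_hours(size_terabytes, bandwidth_gbps, efficiency=0.8):
    """Estimate hours needed to move a dataset over a network link.

    `efficiency` is an assumed fraction of nominal bandwidth that is
    actually achieved; real links vary widely.
    """
    bits_to_move = size_terabytes * 1e12 * 8
    effective_bits_per_second = bandwidth_gbps * 1e9 * efficiency
    return bits_to_move / effective_bits_per_second / 3600


if __name__ == "__main__":
    for gbps in (1, 10, 100):
        print(gbps, "Gbps:", round(transfer_time_hours(100, gbps), 1), "hours for 100 TB")
```

Estimates like this help decide whether to bring data to the analysis tools or the tools to the data.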
Data Integration and Exchange
• APIs: application programming interfaces for on-demand access
• XML: SBML
• EMRs
• RDF/OWL: BioPAX
• FASTQ
• DICOM
• Commons: genomics, cancer, etc.
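Of the exchange formats listed above, FASTQ is simple enough to sketch a reader for. The example assumes unwrapped four-line records (header, sequence, "+" separator, quality string) and Phred+33 quality encoding. The function name and the two sample reads are made up for illustration, and production work would normally rely on an established parsing library.

```python
import io


def read_fastq(handle):
    """Yield (read_id, sequence, quality_scores) from a FASTQ text handle.

    Assumes each record spans exactly four lines and that qualities
    use Phred+33 encoding.
    """
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return  # end of input
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@") or len(sequence) != len(quality):
            raise ValueError("malformed FASTQ record: " + header)
        yield header[1:], sequence, [ord(char) - 33 for char in quality]


# Two invented reads, used only to exercise the parser.
EXAMPLE = "@read1\nACGT\n+\nIIII\n@read2\nGGCA\n+\n!!!!\n"

if __name__ == "__main__":
    for read_id, sequence, scores in read_fastq(io.StringIO(EXAMPLE)):
        print(read_id, sequence, scores)
```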
Data Visualization
Circos plot
Health InfoScape: 7+ million EMRs, SENSEable City Lab at MIT and GE HealthyMagination. Frequency of co-occurrence of medical conditions.
Alignment of 8 Yersinia whole bacterial genomes
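The co-occurrence frequencies behind a visualization like Health InfoScape can be computed by counting condition pairs per patient. The sketch below is a generic counting approach with invented records; it is not the method used by the Health InfoScape project, and the function name is hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(patient_conditions):
    """Count how many patients have each unordered pair of conditions."""
    counts = Counter()
    for conditions in patient_conditions:
        # Sort and de-duplicate so each pair is counted once per patient
        # and always appears in the same order.
        for pair in combinations(sorted(set(conditions)), 2):
            counts[pair] += 1
    return counts


if __name__ == "__main__":
    # Invented example records, one list of conditions per patient.
    records = [
        ["diabetes", "hypertension"],
        ["diabetes", "hypertension", "asthma"],
        ["asthma"],
    ]
    for pair, count in cooccurrence_counts(records).most_common():
        print(pair, count)
```

The resulting pair counts could serve as edge weights in a network or chord diagram of medical conditions.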