persistent and hackathons smart india hackathon the...© 2017 i4c 3 tm smart india hackathon smart...
Post on 24-May-2020
6 Views
Preview:
TRANSCRIPT
www.i4c.co.in© 2017 i4C
TM
Persistent and Hackathons
Smart India Hackathon
© 2017 i4C2
TM
Digital Transformation
Our country needs audaciousdigital transformation to
reach its potential
25% of India between age of 16-25
Over 300 Million young people
the largest exporter of information technology services and skilled manpower among developing countries
…but a vast majority of its population still lacks the skills to meaningfully participate in the digital economy.
© 2017 i4C3
TM
Smart India Hackathon
Smart India Hackathon is a non-stop digital product development
competition, where problems are posed to technology students for
innovative solutions. It…
Harnesses creativity & expertise of students
Sparks institute-level hackathons
Builds funnel for ‘Startup India’ campaign
Crowdsource solutions for improving governance and quality of life
Provide opportunity to citizens to provide innovative solutions to India’s daunting problems
World’s biggest open innovation model !!How it works
Students submit proposals
Problem statements released by ministries
Evaluation + selection of finalists
Smart India Hackathon finals
© 2014 Persistent Systems Ltd 4
Ingredients for a successful hackathon
• Executive sponsorship• Theme identification
• Broad areas of problem domains, solution requirements identified• Technology available:
• Communication and collaboration• Platforms, hardware• Internal and external resources• Data
• Enablement• Coaches• Demos • Licenses
• Internalization• Innovation as a habit• Traction
• Contact: sinaihackathon@persistent.com shriram@persistent.com
© 2014 Persistent Systems Ltd 5
VEGA: Healthcare data platform
© 2014 Persistent Systems Ltd 6
© 2014 Persistent Systems Ltd 7
Introduction: SanGeniX is a comprehensive and user friendly NGS data analysis solution. It’s rich graphical user interface (GUI) which enables user to perform primary, secondary and tertiary analysis of NGS data in a seamless fashion.
Business Problem: NGS data analysis is very complicated and requires specialized bioinformatics as well as computation resources.
Domain: Genomics
Areas of application: Personalized medicine, Clinical diagnostics, Animal genomics, Plant genomics, and Microbial genomics
Impact: Simplifies analysis and interpretation of NGS data for biologists.
SanGeniX: NGS data analysis solution
© 2016 Persistent Systems Ltd
8
SanGeniX: NGS data analysis with workflows and visualization
Benefits
•Integration of industry standard algorithms ensures data reliability
•Automation - less hands-on time.
•Point-and-click - Easy to use
•Intuitive and innovative data visualization
•Role based access to ensure security and compliance
•Scalable
Business Value
•Increased productivity•Cost effective due to cloud hosted
pay per use solution•Highly customizable to include
customer specific workflows•Allows focus on biological
questions and analysis instead of pondering over data•HIPAA Compliance
Important features
•Predefined workflows for all the common NGS workflows
•Integration of industry standard algorithms
•Flexibility to include custom algorithms
© 2016 Persistent Systems Ltd
9
Graphically enriched visualization
© 2017 Persistent Systems Ltd
www.persistent.com
A comprehensive literature mining solution
for biomedical literature
www.biogyan.com
11© 2017 Persistent Systems Ltd
Introduction
User’s Query
• Gene names
• Biological processes
• Cell-type/Tissue
Collate & Process PubMed articles
• Compute article relevance
• Highlight searched terms & sentences
• Generate holistic graphs & tables
User Action
• Study results
• Mark validity, add comments
• Export results to Spreadsheet, EndNote, BibTeX
Key features:• Allows combinatorial search of genes, cell-types and cellular processes
• Ranks scientific articles based on relevance to the search terms
• Displays biological pathways and 3D structures of queried genes
• Allows offline reading of articles
• Available at: http://biogyan.com/
12© 2017 Persistent Systems Ltd
BioGyan - Screenshots
Biological pathway & 3D structure
Distribution of articles for a queried geneQueried genes & their participation
Result Screen highlighting ranked articles
13© 2017 Persistent Systems Ltd
Market Space
• Life Sciences Research Space:• Academic research labs• Research Hospitals• Pharma/Biotech research labs
• Key market players:• All first and second tier Universities and Pharma/Biotech companies
• Industry Issues:• Literature mining is an integral part of R&D, drug discovery and regulatory filings.
• Key Services:• Licensed product• Literature mining service• Development of customized solution
© 2017 Persistent Systems Ltd
www.persistent.com
Multiomics Project
15© 2017 Persistent Systems Ltd
Need of the hour: Omics data integration
and analysis for precision medicine
Adapted from http://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-015-0108-y
Multi-omics: Integrative biology
Meaningful insights
Disease: Biological make up and environment
Accurate diagnosis, prognosis and
treatment
Precision
Medicine
16© 2017 Persistent Systems Ltd
Multi-omics Platform
Development of a comprehensive
multi-omics data analysis platform
Implementation of diverse analysis
pipelines for addressing use-cases
such as biomarker discovery and
disease sub-typing
Data analytics and advanced
visualization for deriving actionable
outcomes
Open for collaboration
Multi-omics platform for deriving actionable
insights from big biological data
Inter-relationships between omic data (e.g. genomics,
transcriptomics, proteomics, metabolomics)
Leverage bioinformatics, data analytics and machine
learning
Address diverse use-cases (e.g. diagnostic, drug
discovery, precision medicine)
Data
LakeNetwork
Analysis
Data
Analytics
Machine
Learning
17© 2017 Persistent Systems Ltd
Multi-omics approach for biomarker prediction
from large scale data
18© 2017 Persistent Systems Ltd
Biomarker discovery pipelines: Multiple Approaches
Biomarker Prediction: Tertiary/Downstream analysis
Transcriptomics data:mRNA , miRNA
Genomics data(CNV, SNP, etc)
Proteomics data
Clinical data
Statistics/Bioinformatics analysis Pattern recognition: Machine
Learning
Pattern recognition: Data integration methods/tools
Input layer: Multiomics datasets (multiple samples)
Analysis layer: Multiple approach
Features (genes) selection: Statistically significant genes Features (genes) selection:
Contribution to model buildingPrediction layer:
Putative biomarker prediction
Individual dataset Individual/combined dataset Combined dataset
Epigenetics data(Methylation, ChIPSeq, etc)
Metabolomics data
19© 2017 Persistent Systems Ltd
Statistical Approach for biomarker prediction
Outcome: List of cross talking
pathway pairs and network
Tool: Cross-Talk module
Genomics (CNV, SNP)
Transcriptomics(MRNA, MiRNA)
Proteomics(RPPA, Ms /MS)
Metabolomics(LC/GC-MS)
Epigenomics(Methylation, ChIP-Seq)
Protein Identification
Metabolite Identification (HMDB, Mass Bank, METLIN)
Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)
Normalization & Analysis(Limma, EdgeR, DESeq2)
Normalization & Analysis(Galaxy-P, PGA)
Normalization & Analysis(SIMCA-P, R)
Normalization & Analysis(Limma, MACS)
Significant genes/ Genomic regions
Significant Genes/miRNAs
Significantproteins
Significantmetabolites
Significant genomic regions (Methylation, Enriched sites)
Input Layer
Individual Omicanalysis layer
Integrative layer
IntegratedAnalysis
Correlation Analysis | Clustering / Cluster of clusters | Multivariate Analysis | Pathway analysis / Network Analysis
Biomarker Prediction / Tertiary analysis
Literature Annotation
Network based analysis Annotation from Databases
Outcome: Literature based gene process
associationsTool: BioGyan
Outcome: List of risk pathways
Tool: iSubPathway
Miner
Outcome: PPI network, pathway
enrichment, functional
annotationsTool: REACTOME FI
Outcome: Annotations like molecular function, diseases, drugs,
pathways, TFBS etc.Tool: ToppGene
Prediction layer
20© 2017 Persistent Systems Ltd
Machine Learning for biomarker prediction: Approach 1
Transcriptomics(mRNA, miRNA)
Proteomics(RPPA, Ms /MS)
Metabolomics(LC/GC-MS)
Epigenomics(Methylation, ChIP-Seq)
Protein Identification Metabolite Identification (HMDB, Mass Bank, METLIN)
Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)
Normalization & Analysis(Limma, edgeR, DESeq2)
Normalization & Analysis(Galaxy-P, PGA)
Normalization & Analysis(SIMCA-P, R)
Normalization & Analysis(Limma, MACS)
Significant genes/miRNAs
Significant proteins Significant metabolites
Significant genomic regions (Methylation, Enriched sites)
Input Layer
Integrative layer
IntegratedAnalysis
Correlation Analysis | Clustering / Cluster of clusters | Multivariate Analysis | Pathway analysis / Network Analysis
Biomarker Prediction / Tertiary analysis
Literature Annotation Network based analysis Annotation from Databases
Outcome: Literature based gene process associations
Tool: BioGyan
Outcome: List of risk pathways
Tool: iSubPathwayMiner
Outcome: List of cross talking pathway pairs
and networkTool: Cross-Talk module
Outcome: PPI network, pathway enrichment,
functional annotationsTool: REACTOME FI
Outcome: Annotations like molecular function, diseases, drugs, pathways, TFBS etc.
Tool: ToppGene
Model1
Feature Selection
Model2
Feature Selection
Model3
Feature Selection
Model4
Feature Selection
Model5
Feature Selection
Machine learning
Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest
Neural NetworksAdaBoostNaive BayesQDALogistic Regression
Genomics(CNV, SNP)
Significant genes/ genomic regions
Prediction layer
21© 2017 Persistent Systems Ltd
Machine Learning for biomarker prediction: Approach 2
Machine Learning layer
Biomarker Prediction / Tertiary analysis
Literature Annotation Network based analysis Annotation from Databases
Outcome: Literature based gene process associations
Tool: BioGyan
Outcome: List of risk pathways
Tool: iSubPathwayMiner
Outcome: List of cross talking pathway pairs
and networkTool: Cross-Talk module
Outcome: PPI network, pathway enrichment,
functional annotationsTool: REACTOME FI
Outcome: Annotations like molecular function, diseases, drugs, pathways, TFBS etc.
Tool: ToppGene
Feature Selection
Combined Model Machine learning Nearest Neighbors
Linear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest
Neural NetworksAdaBoostNaive BayesQDALogistic Regression
Two or more data set combined
Correlation Analysis | Clustering / Cluster of clusters | Multivariate Analysis
Genomics (CNV, SNP)
Transcriptomics(MRNA, MiRNA)
Proteomics(RPPA, Ms /MS)
Metabolomics(LC/GC-MS)
Epigenomics(Methylation, ChIP-Seq)
Protein Identification Metabolite Identification (HMDB, Mass Bank, METLIN)
Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)
Normalization & Analysis(Limma, EdgeR, DESeq2)
Normalization & Analysis(Galaxy-P, PGA)
Normalization & Analysis(SIMCA-P, R)
Normalization & Analysis(Limma, MACS)
Significant genes/ Genomic regions
Significant Genes/miRNAs
Significant proteins Significant metabolites
Significant genomic regions (Methylation, Enriched sites)
Integrated Input Layer
Prediction layer
22© 2017 Persistent Systems Ltd
Data Integration methods for biomarker prediction
Integrated layer
Biomarker Prediction / Tertiary analysis
Literature Annotation Network based analysis Annotation from Databases
Outcome: Literature based gene process associations
Tool: BioGyan
Outcome: List of risk pathways
Tool: iSubPathwayMiner
Outcome: List of cross talking pathway pairs
and networkTool: Cross-Talk module
Outcome: PPI network, pathway enrichment,
functional annotationsTool: REACTOME FI
Outcome: Annotations like molecular function, diseases, drugs, pathways, TFBS etc.
Tool: ToppGene
mixOmics MOcluster iClusterplus
Cluster membership of samplesFeature Ranking Visualization: Heatmaps, plots, etc
OncoImpact
Driver Genes
Genomics (CNV, SNP)
Transcriptomics(MRNA, MiRNA)
Proteomics(RPPA, Ms /MS)
Metabolomics(LC/GC-MS)
Epigenomics(Methylation, ChIP-Seq)
Protein Identification Metabolite Identification (HMDB, Mass Bank, METLIN)
Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)
Normalization & Analysis(Limma, EdgeR, DESeq2)
Normalization & Analysis(Galaxy-P, PGA)
Normalization & Analysis(SIMCA-P, R)
Normalization & Analysis(Limma, MACS)
Significant genes/ Genomic regions
Significant Genes/miRNAs
Significant proteins Significant metabolites
Significant genomic regions (Methylation, Enriched sites)
Input Layer
Prediction layer
23© 2017 Persistent Systems Ltd
Current Use Case:
TCGA Breast cancer data analysis for biomarker
prediction
Gene expression data(read counts per gene)
NormalTNBC
Non-TNBC
Data integration methods
MOclustermixOmicsiClusterPlus
High scoring genes
Biomarker prediction: Tertiary analysis
TNBC biomarkers from literature
Machine learning
Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest
Automated feature selection
Neural NetworksAdaBoostNaive BayesQDALogistic Regression
Statistical analysis
Statistically significant genes
DESeq2edgeRLimma
PubMed TCGA (Breast cancer datasets)
Data filtering: Quality control
Copy number variation data(Segmentation mean data)
Statistical analysis
Machine learning
GISTICCNTools
Statistically significant genes
Automated feature selection
Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest
Neural NetworksAdaBoostNaive BayesQDALogistic Regression
Data filtering: Quality control
Machine learning
Automated feature
selection
Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest
Neural NetworksAdaBoostNaive BayesQDALogistic Regression
766 samples 766 samples
831 samples
831 samples 825 samples 825 samples
top related