persistent and hackathons smart india hackathon the...© 2017 i4c 3 tm smart india hackathon smart...

23
www.i4c.co.in © 2017 i4C TM Persistent and Hackathons Smart India Hackathon

Upload: others

Post on 24-May-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

www.i4c.co.in© 2017 i4C

TM

Persistent and Hackathons

Smart India Hackathon

Page 2: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2017 i4C2

TM

Digital Transformation

Our country needs audaciousdigital transformation to

reach its potential

25% of India between age of 16-25

Over 300 Million young people

the largest exporter of information technology services and skilled manpower among developing countries

…but a vast majority of its population still lacks the skills to meaningfully participate in the digital economy.

Page 3: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2017 i4C3

TM

Smart India Hackathon

Smart India Hackathon is a non-stop digital product development

competition, where problems are posed to technology students for

innovative solutions. It…

Harnesses creativity & expertise of students

Sparks institute-level hackathons

Builds funnel for ‘Startup India’ campaign

Crowdsource solutions for improving governance and quality of life

Provide opportunity to citizens to provide innovative solutions to India’s daunting problems

World’s biggest open innovation model !!How it works

Students submit proposals

Problem statements released by ministries

Evaluation + selection of finalists

Smart India Hackathon finals

Page 4: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2014 Persistent Systems Ltd 4

Ingredients for a successful hackathon

• Executive sponsorship• Theme identification

• Broad areas of problem domains, solution requirements identified• Technology available:

• Communication and collaboration• Platforms, hardware• Internal and external resources• Data

• Enablement• Coaches• Demos • Licenses

• Internalization• Innovation as a habit• Traction

• Contact: [email protected] [email protected]

Page 5: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2014 Persistent Systems Ltd 5

VEGA: Healthcare data platform

Page 6: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2014 Persistent Systems Ltd 6

Page 7: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2014 Persistent Systems Ltd 7

Introduction: SanGeniX is a comprehensive and user friendly NGS data analysis solution. It’s rich graphical user interface (GUI) which enables user to perform primary, secondary and tertiary analysis of NGS data in a seamless fashion.

Business Problem: NGS data analysis is very complicated and requires specialized bioinformatics as well as computation resources.

Domain: Genomics

Areas of application: Personalized medicine, Clinical diagnostics, Animal genomics, Plant genomics, and Microbial genomics

Impact: Simplifies analysis and interpretation of NGS data for biologists.

SanGeniX: NGS data analysis solution

Page 8: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2016 Persistent Systems Ltd

8

SanGeniX: NGS data analysis with workflows and visualization

Benefits

•Integration of industry standard algorithms ensures data reliability

•Automation - less hands-on time.

•Point-and-click - Easy to use

•Intuitive and innovative data visualization

•Role based access to ensure security and compliance

•Scalable

Business Value

•Increased productivity•Cost effective due to cloud hosted

pay per use solution•Highly customizable to include

customer specific workflows•Allows focus on biological

questions and analysis instead of pondering over data•HIPAA Compliance

Important features

•Predefined workflows for all the common NGS workflows

•Integration of industry standard algorithms

•Flexibility to include custom algorithms

Page 9: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2016 Persistent Systems Ltd

9

Graphically enriched visualization

Page 10: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2017 Persistent Systems Ltd

www.persistent.com

A comprehensive literature mining solution

for biomedical literature

www.biogyan.com

Page 11: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

11© 2017 Persistent Systems Ltd

Introduction

User’s Query

• Gene names

• Biological processes

• Cell-type/Tissue

Collate & Process PubMed articles

• Compute article relevance

• Highlight searched terms & sentences

• Generate holistic graphs & tables

User Action

• Study results

• Mark validity, add comments

• Export results to Spreadsheet, EndNote, BibTeX

Key features:• Allows combinatorial search of genes, cell-types and cellular processes

• Ranks scientific articles based on relevance to the search terms

• Displays biological pathways and 3D structures of queried genes

• Allows offline reading of articles

• Available at: http://biogyan.com/

Page 12: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

12© 2017 Persistent Systems Ltd

BioGyan - Screenshots

Biological pathway & 3D structure

Distribution of articles for a queried geneQueried genes & their participation

Result Screen highlighting ranked articles

Page 13: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

13© 2017 Persistent Systems Ltd

Market Space

• Life Sciences Research Space:• Academic research labs• Research Hospitals• Pharma/Biotech research labs

• Key market players:• All first and second tier Universities and Pharma/Biotech companies

• Industry Issues:• Literature mining is an integral part of R&D, drug discovery and regulatory filings.

• Key Services:• Licensed product• Literature mining service• Development of customized solution

Page 14: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

© 2017 Persistent Systems Ltd

www.persistent.com

Multiomics Project

Page 15: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

15© 2017 Persistent Systems Ltd

Need of the hour: Omics data integration

and analysis for precision medicine

Adapted from http://bmcmedgenomics.biomedcentral.com/articles/10.1186/s12920-015-0108-y

Multi-omics: Integrative biology

Meaningful insights

Disease: Biological make up and environment

Accurate diagnosis, prognosis and

treatment

Precision

Medicine

Page 16: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

16© 2017 Persistent Systems Ltd

Multi-omics Platform

Development of a comprehensive

multi-omics data analysis platform

Implementation of diverse analysis

pipelines for addressing use-cases

such as biomarker discovery and

disease sub-typing

Data analytics and advanced

visualization for deriving actionable

outcomes

Open for collaboration

Multi-omics platform for deriving actionable

insights from big biological data

Inter-relationships between omic data (e.g. genomics,

transcriptomics, proteomics, metabolomics)

Leverage bioinformatics, data analytics and machine

learning

Address diverse use-cases (e.g. diagnostic, drug

discovery, precision medicine)

Data

LakeNetwork

Analysis

Data

Analytics

Machine

Learning

Page 17: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

17© 2017 Persistent Systems Ltd

Multi-omics approach for biomarker prediction

from large scale data

Page 18: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

18© 2017 Persistent Systems Ltd

Biomarker discovery pipelines: Multiple Approaches

Biomarker Prediction: Tertiary/Downstream analysis

Transcriptomics data:mRNA , miRNA

Genomics data(CNV, SNP, etc)

Proteomics data

Clinical data

Statistics/Bioinformatics analysis Pattern recognition: Machine

Learning

Pattern recognition: Data integration methods/tools

Input layer: Multiomics datasets (multiple samples)

Analysis layer: Multiple approach

Features (genes) selection: Statistically significant genes Features (genes) selection:

Contribution to model buildingPrediction layer:

Putative biomarker prediction

Individual dataset Individual/combined dataset Combined dataset

Epigenetics data(Methylation, ChIPSeq, etc)

Metabolomics data

Page 19: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

19© 2017 Persistent Systems Ltd

Statistical Approach for biomarker prediction

Outcome: List of cross talking

pathway pairs and network

Tool: Cross-Talk module

Genomics (CNV, SNP)

Transcriptomics(MRNA, MiRNA)

Proteomics(RPPA, Ms /MS)

Metabolomics(LC/GC-MS)

Epigenomics(Methylation, ChIP-Seq)

Protein Identification

Metabolite Identification (HMDB, Mass Bank, METLIN)

Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)

Normalization & Analysis(Limma, EdgeR, DESeq2)

Normalization & Analysis(Galaxy-P, PGA)

Normalization & Analysis(SIMCA-P, R)

Normalization & Analysis(Limma, MACS)

Significant genes/ Genomic regions

Significant Genes/miRNAs

Significantproteins

Significantmetabolites

Significant genomic regions (Methylation, Enriched sites)

Input Layer

Individual Omicanalysis layer

Integrative layer

IntegratedAnalysis

Correlation Analysis | Clustering / Cluster of clusters | Multivariate Analysis | Pathway analysis / Network Analysis

Biomarker Prediction / Tertiary analysis

Literature Annotation

Network based analysis Annotation from Databases

Outcome: Literature based gene process

associationsTool: BioGyan

Outcome: List of risk pathways

Tool: iSubPathway

Miner

Outcome: PPI network, pathway

enrichment, functional

annotationsTool: REACTOME FI

Outcome: Annotations like molecular function, diseases, drugs,

pathways, TFBS etc.Tool: ToppGene

Prediction layer

Page 20: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

20© 2017 Persistent Systems Ltd

Machine Learning for biomarker prediction: Approach 1

Transcriptomics(mRNA, miRNA)

Proteomics(RPPA, Ms /MS)

Metabolomics(LC/GC-MS)

Epigenomics(Methylation, ChIP-Seq)

Protein Identification Metabolite Identification (HMDB, Mass Bank, METLIN)

Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)

Normalization & Analysis(Limma, edgeR, DESeq2)

Normalization & Analysis(Galaxy-P, PGA)

Normalization & Analysis(SIMCA-P, R)

Normalization & Analysis(Limma, MACS)

Significant genes/miRNAs

Significant proteins Significant metabolites

Significant genomic regions (Methylation, Enriched sites)

Input Layer

Integrative layer

IntegratedAnalysis

Correlation Analysis | Clustering / Cluster of clusters | Multivariate Analysis | Pathway analysis / Network Analysis

Biomarker Prediction / Tertiary analysis

Literature Annotation Network based analysis Annotation from Databases

Outcome: Literature based gene process associations

Tool: BioGyan

Outcome: List of risk pathways

Tool: iSubPathwayMiner

Outcome: List of cross talking pathway pairs

and networkTool: Cross-Talk module

Outcome: PPI network, pathway enrichment,

functional annotationsTool: REACTOME FI

Outcome: Annotations like molecular function, diseases, drugs, pathways, TFBS etc.

Tool: ToppGene

Model1

Feature Selection

Model2

Feature Selection

Model3

Feature Selection

Model4

Feature Selection

Model5

Feature Selection

Machine learning

Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest

Neural NetworksAdaBoostNaive BayesQDALogistic Regression

Genomics(CNV, SNP)

Significant genes/ genomic regions

Prediction layer

Page 21: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

21© 2017 Persistent Systems Ltd

Machine Learning for biomarker prediction: Approach 2

Machine Learning layer

Biomarker Prediction / Tertiary analysis

Literature Annotation Network based analysis Annotation from Databases

Outcome: Literature based gene process associations

Tool: BioGyan

Outcome: List of risk pathways

Tool: iSubPathwayMiner

Outcome: List of cross talking pathway pairs

and networkTool: Cross-Talk module

Outcome: PPI network, pathway enrichment,

functional annotationsTool: REACTOME FI

Outcome: Annotations like molecular function, diseases, drugs, pathways, TFBS etc.

Tool: ToppGene

Feature Selection

Combined Model Machine learning Nearest Neighbors

Linear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest

Neural NetworksAdaBoostNaive BayesQDALogistic Regression

Two or more data set combined

Correlation Analysis | Clustering / Cluster of clusters | Multivariate Analysis

Genomics (CNV, SNP)

Transcriptomics(MRNA, MiRNA)

Proteomics(RPPA, Ms /MS)

Metabolomics(LC/GC-MS)

Epigenomics(Methylation, ChIP-Seq)

Protein Identification Metabolite Identification (HMDB, Mass Bank, METLIN)

Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)

Normalization & Analysis(Limma, EdgeR, DESeq2)

Normalization & Analysis(Galaxy-P, PGA)

Normalization & Analysis(SIMCA-P, R)

Normalization & Analysis(Limma, MACS)

Significant genes/ Genomic regions

Significant Genes/miRNAs

Significant proteins Significant metabolites

Significant genomic regions (Methylation, Enriched sites)

Integrated Input Layer

Prediction layer

Page 22: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

22© 2017 Persistent Systems Ltd

Data Integration methods for biomarker prediction

Integrated layer

Biomarker Prediction / Tertiary analysis

Literature Annotation Network based analysis Annotation from Databases

Outcome: Literature based gene process associations

Tool: BioGyan

Outcome: List of risk pathways

Tool: iSubPathwayMiner

Outcome: List of cross talking pathway pairs

and networkTool: Cross-Talk module

Outcome: PPI network, pathway enrichment,

functional annotationsTool: REACTOME FI

Outcome: Annotations like molecular function, diseases, drugs, pathways, TFBS etc.

Tool: ToppGene

mixOmics MOcluster iClusterplus

Cluster membership of samplesFeature Ranking Visualization: Heatmaps, plots, etc

OncoImpact

Driver Genes

Genomics (CNV, SNP)

Transcriptomics(MRNA, MiRNA)

Proteomics(RPPA, Ms /MS)

Metabolomics(LC/GC-MS)

Epigenomics(Methylation, ChIP-Seq)

Protein Identification Metabolite Identification (HMDB, Mass Bank, METLIN)

Normalization & Analysis(Gistic2, CNTOOLS, MAFTOOLS)

Normalization & Analysis(Limma, EdgeR, DESeq2)

Normalization & Analysis(Galaxy-P, PGA)

Normalization & Analysis(SIMCA-P, R)

Normalization & Analysis(Limma, MACS)

Significant genes/ Genomic regions

Significant Genes/miRNAs

Significant proteins Significant metabolites

Significant genomic regions (Methylation, Enriched sites)

Input Layer

Prediction layer

Page 23: Persistent and Hackathons Smart India Hackathon the...© 2017 i4C 3 TM Smart India Hackathon Smart India Hackathon is a non-stop digital product development competition, where problems

23© 2017 Persistent Systems Ltd

Current Use Case:

TCGA Breast cancer data analysis for biomarker

prediction

Gene expression data(read counts per gene)

NormalTNBC

Non-TNBC

Data integration methods

MOclustermixOmicsiClusterPlus

High scoring genes

Biomarker prediction: Tertiary analysis

TNBC biomarkers from literature

Machine learning

Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest

Automated feature selection

Neural NetworksAdaBoostNaive BayesQDALogistic Regression

Statistical analysis

Statistically significant genes

DESeq2edgeRLimma

PubMed TCGA (Breast cancer datasets)

Data filtering: Quality control

Copy number variation data(Segmentation mean data)

Statistical analysis

Machine learning

GISTICCNTools

Statistically significant genes

Automated feature selection

Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest

Neural NetworksAdaBoostNaive BayesQDALogistic Regression

Data filtering: Quality control

Machine learning

Automated feature

selection

Nearest NeighborsLinear SVMRBF SVMGaussian ProcessDecision TreeRandom Forest

Neural NetworksAdaBoostNaive BayesQDALogistic Regression

766 samples 766 samples

831 samples

831 samples 825 samples 825 samples