GraphConnect
the power of graphs to analyze biological data
about me
who am i ...
Davy Suvee @DSUVEE
➡ big data architect @ datablend - continuum
• provide big data and nosql consultancy
• 5 years of hands-on expertise in the pharma/biotech sector
massive data
big data in pharma
full genome sequencing
complex data
biological networks
scalable number crunching platform
visual insights-driven platform
graphs!!
outlier detection platform
big data in pharma (2 specific use cases)
neo4j, mongodb/cassandra and gephi
euretos - brain
neo4j, mongodb, solr and prefuse
gene expression clustering
➡ oncology data set:
★ 4,800 samples
★ 27,000 genes
➡ Question:
★ for a particular subset of samples, which genes are co-expressed?
storing gene expressions (mongodb)
{
  "_id" : { "$oid" : "4f1fb64a1695629dd9d916e3" },
  "sample_name" : "122551hp133a21.cel",
  "genomics_id" : 122551,
  "sample_id" : 343981,
  "donor_id" : 143981,
  "sample_type" : "Tissue",
  "sample_site" : "Ascending colon",
  "pathology_category" : "MALIGNANT",
  "pathology_morphology" : "Adenocarcinoma",
  "pathology_type" : "Primary malignant neoplasm of colon",
  "primary_site" : "Colon",
  "expressions" : [
    { "gene" : "X1_at", "expression" : 5.54217719084415 },
    { "gene" : "X10_at", "expression" : 3.92335121981739 },
    { "gene" : "X100_at", "expression" : 7.81638155662255 },
    { "gene" : "X1000_at", "expression" : 5.44318512260619 },
    …
  ]
}
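To compare two samples, their expression values first have to be lined up gene by gene. The following is a minimal Python sketch of that step, assuming the document layout shown on the slide (an "expressions" list of "gene"/"expression" pairs). The helper name `expression_vector` is hypothetical and the sample is a trimmed-down copy of the slide's document, not a real query result.

```python
from typing import Dict, List


def expression_vector(sample: Dict, genes: List[str]) -> List[float]:
    """Return the expression values of `sample` in the order given by `genes`."""
    # Index the embedded list by gene name so lookups do not depend on list order.
    by_gene = {entry["gene"]: entry["expression"] for entry in sample["expressions"]}
    return [by_gene[gene] for gene in genes]


# Trimmed-down copy of the slide's document (only a few fields and genes kept).
sample = {
    "sample_name": "122551hp133a21.cel",
    "genomics_id": 122551,
    "expressions": [
        {"gene": "X1_at", "expression": 5.54217719084415},
        {"gene": "X10_at", "expression": 3.92335121981739},
        {"gene": "X100_at", "expression": 7.81638155662255},
    ],
}

vector = expression_vector(sample, ["X100_at", "X1_at"])
print(vector)
```

Using one fixed gene order for every sample keeps the resulting vectors comparable position by position.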
correlating samples (mongodb/map-reduce)
pearson correlation
x   y
43  99
21  65
25  79
42  75
57  87
59  81
r ≈ 0.53
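The correlation for the six x/y pairs on this slide can be checked with a few lines of Python. This is a sketch of the textbook Pearson formula written in terms of running sums, not the map-reduce job used in the talk (which the slides do not show). The sum-based form is chosen here because partial sums can be computed independently and added up afterwards, which is the property a map-reduce formulation would rely on. The function name `pearson` is an assumption.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient computed from running sums."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_xx - sum_x ** 2) * (n * sum_yy - sum_y ** 2))
    if denominator == 0:
        raise ValueError("correlation is undefined for a constant series")
    return numerator / denominator


# The x/y values from the slide.
xs = [43, 21, 25, 42, 57, 59]
ys = [99, 65, 79, 75, 87, 81]
r = pearson(xs, ys)
print(round(r, 4))
```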
co-expression graph (neo4j)
➡ create a node for each sample
➡ if correlation between two samples >= 0.8, create an edge between both nodes
[figure: sample nodes 122551, 122552 and 122553; a "correlated" edge between two of them with value: 0.86]
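The node-and-edge rule can be sketched in plain Python before anything is written to neo4j. The 0.8 threshold and the three sample ids come from the slides. The correlation values per pair and the function name `co_expression_edges` are made up for illustration, and the sketch builds in-memory lists instead of calling a neo4j API.

```python
from typing import Dict, List, Tuple

THRESHOLD = 0.8  # threshold stated on the slide


def co_expression_edges(
    correlations: Dict[Tuple[int, int], float],
    threshold: float = THRESHOLD,
) -> List[Tuple[int, int, float]]:
    """Keep one (sample_a, sample_b, value) edge per pair at or above the threshold."""
    return [
        (a, b, value)
        for (a, b), value in sorted(correlations.items())
        if value >= threshold
    ]


# Hypothetical pairwise correlations for the three sample ids on the slide.
correlations = {
    (122551, 122552): 0.86,
    (122551, 122553): 0.91,
    (122552, 122553): 0.42,
}

# One node per sample, whether or not it ends up with an edge.
nodes = sorted({sample for pair in correlations for sample in pair})
edges = co_expression_edges(correlations)
print(nodes)
print(edges)
```

Each resulting tuple would map to one "correlated" relationship carrying the value as a property.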
co-expression visualisation (gephi)
euretos - brain
➡ pubmed: 23 million biomedical articles
• 1300 new ones added every day
• google-like search interface
➡ reading an article ...
• malaria is transferred by mosquitoes
euretos - brain
authors references
euretos - brain
ooooooh crap ...
euretos - brain
➡ nanopub (nanopub.org)
• the smallest unit of publishable information
➡ assertion
• subject: malaria
• predicate: transferred by
• object: mosquito
➡ provenance
• how this came to be (meta-data)
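The assertion/provenance split can be pictured as a small data structure. This is a minimal Python sketch, not the nanopub.org schema and not the model used in brain: the class names and the provenance key are assumptions, and only the malaria triple is taken from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class Assertion:
    """A single subject / predicate / object statement."""
    subject: str
    predicate: str
    object: str


@dataclass
class Nanopub:
    """An assertion plus the meta-data describing how it came to be."""
    assertion: Assertion
    provenance: Dict[str, str] = field(default_factory=dict)


nanopub = Nanopub(
    assertion=Assertion(subject="malaria", predicate="transferred by", object="mosquito"),
    # Placeholder provenance entry; real meta-data fields are not shown in the slides.
    provenance={"source": "pubmed"},
)
print(nanopub.assertion)
```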
euretos - brain
➡ unfortunately, malaria is encoded in various ways ...
db1      db2     db3
malaria  P22384  AQ879
(all three identify the same concept: malaria)
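Resolving database-specific identifiers to one shared concept can be sketched as a lookup table. The three identifiers and database names are the ones on the slide. Pairing them positionally (db1/malaria, db2/P22384, db3/AQ879) is an assumption, as are the names `CONCEPT_BY_IDENTIFIER` and `resolve`.

```python
from typing import Dict, Optional, Tuple

# (database, identifier) -> shared concept; the pairing is assumed from the slide layout.
CONCEPT_BY_IDENTIFIER: Dict[Tuple[str, str], str] = {
    ("db1", "malaria"): "malaria",
    ("db2", "P22384"): "malaria",
    ("db3", "AQ879"): "malaria",
}


def resolve(database: str, identifier: str) -> Optional[str]:
    """Return the shared concept for a database-specific identifier, or None if unknown."""
    return CONCEPT_BY_IDENTIFIER.get((database, identifier))


print(resolve("db2", "P22384"))
print(resolve("db3", "unknown"))
```

Keying on the (database, identifier) pair avoids clashes when two databases reuse the same identifier for different things.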
euretos - brain
malaria -[transferred by]-> mosquito
euretos - brain
➡ brain (http://www.euretos.com/brain)
• exploration and analysis platform
• millions of concepts/triples/nanopubs
• pubmed, uniprot, omim, pubchem, ...
➡ architectural stack
• meta-data is stored in mongodb
• graph in neo4j
• swing interface connecting to rest endpoints
brain
Questions?
Follow us
twitter.com/data_blend
www.datablend.be
[email protected]
0499/05.00.89
datablend - continuum