Access and Analyze Graphs in Oracle Spatial Network Data Model using Cytoscape
Susie Stephens
Principal Product Manager, Life Sciences
Oracle
Outline
– Network Data Model Overview
– Case Studies
– NDM / Cytoscape Demo
Network Concept
A network is a graph representation for modeling objects of interest and their relationships. It contains the following elements:
– Nodes: objects of interest
– Links: relationships between nodes
– Paths: ordered lists of connected links
[Figure: example network with nodes S1, X0, X1, X2 and links labeled k1 · X0, k2 · S1, k3 · S1]
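The three elements above can be sketched as plain data structures. This is a minimal illustration in Python, not the NDM API: the class names `Node`, `Link`, and `Path`, their fields, and the sample nodes A, B, C are assumptions chosen only to mirror the definitions on the slide.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    """An object of interest (hypothetical fields)."""
    node_id: int
    name: str


@dataclass(frozen=True)
class Link:
    """A relationship between two nodes, with an optional cost."""
    link_id: int
    start: Node
    end: Node
    cost: float = 1.0


@dataclass
class Path:
    """An ordered list of links."""
    links: List[Link]

    def is_connected(self) -> bool:
        # Each link must start at the node where the previous link ended.
        return all(a.end == b.start for a, b in zip(self.links, self.links[1:]))

    @property
    def cost(self) -> float:
        # The path cost is the sum of its link costs.
        return sum(link.cost for link in self.links)


# Illustrative data only.
a, b, c = Node(1, "A"), Node(2, "B"), Node(3, "C")
ab, bc = Link(1, a, b, 2.0), Link(2, b, c, 3.0)
path = Path([ab, bc])
```

Reversing the link order, as in `Path([bc, ab])`, yields a list of links that is no longer a connected path, which is why the ordering matters in the definition.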
Network Data Model Overview
– Data model for managing graph (link-node) structures
– Supports a variety of network structures (hierarchical, directed, undirected, random, scale-free)
– Framework for applying network constraints and rules (e.g. path length, cost, minimum bounding rectangle)
Graph Analysis Functions
– Pathway Connectivity
– Shortest Path / All Paths
– Tracing (Accessibility)
– Within Distance
– Nearest Neighbor
– Minimum Bounding Rectangle
– Minimum Cost Spanning Tree
– All of the above analyses with constraints (depth, cost, distance)
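To make two of the listed functions concrete, here is a rough Python sketch of a within-cost search and a nearest-neighbor query built on Dijkstra's algorithm. It is an illustration under assumptions, not NDM's implementation: the function names, the adjacency-list format, and the sample graph are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Adjacency list: node -> list of (neighbor, link cost). Hypothetical format.
Graph = Dict[str, List[Tuple[str, float]]]


def within_cost(graph: Graph, source: str, max_cost: float) -> Dict[str, float]:
    """Return every node reachable from source with total cost <= max_cost."""
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if cost > best.get(node, float("inf")):
            continue  # stale heap entry, a cheaper route was already found
        for neighbor, weight in graph.get(node, []):
            new_cost = cost + weight
            # The cost constraint prunes the search as it expands.
            if new_cost <= max_cost and new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    del best[source]
    return best


def nearest_neighbors(graph: Graph, source: str, k: int) -> List[Tuple[str, float]]:
    """Return the k reachable nodes with the lowest path cost from source."""
    reachable = within_cost(graph, source, float("inf"))
    return sorted(reachable.items(), key=lambda item: (item[1], item[0]))[:k]


# Illustrative directed graph.
graph: Graph = {
    "A": [("B", 1.0), ("C", 4.0)],
    "B": [("C", 1.0), ("D", 5.0)],
    "C": [("D", 1.0)],
    "D": [],
}
```

With this sample graph, `within_cost(graph, "A", 2.0)` reaches B and C but not D, because the cheapest route to D costs 3.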
Network Schema
Network Metadata
– Name, Type, Node/Link/Path Table Information
Network Tables
– Node Table: Node_ID, Node_Type, Geometry, …
– Link Table: Link_ID, Link_Type, Start_Node_ID, End_Node_ID, Cost, Geometry, …
– Path Table (Path-Link Table): Path_ID, Start_Node_ID, End_Node_ID, Cost, Geometry, Path Links, …
Application information is added to the network schema
– Add additional columns in node, link, and path tables directly
– Add foreign key(s) to node, link, and path tables
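The two ways of attaching application information can be sketched with an in-memory SQLite database standing in for Oracle. This is only an illustration: the DDL is not Oracle's, geometry columns are omitted, and the `compound_name` column and `enzyme_info` table are hypothetical application additions. The sample row values are taken from the pathway tables later in the deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node (
    node_id       INTEGER PRIMARY KEY,
    node_type     TEXT,
    compound_name TEXT      -- application column added directly (hypothetical)
);
CREATE TABLE enzyme_info (  -- hypothetical application table
    enzyme_id INTEGER PRIMARY KEY,
    ec_number TEXT
);
CREATE TABLE link (
    link_id       INTEGER PRIMARY KEY,
    link_type     TEXT,
    start_node_id INTEGER NOT NULL REFERENCES node(node_id),
    end_node_id   INTEGER NOT NULL REFERENCES node(node_id),
    cost          REAL DEFAULT 1,
    enzyme_id     INTEGER REFERENCES enzyme_info(enzyme_id)  -- application foreign key
);
""")

conn.execute("INSERT INTO node VALUES (1, 'compound', 'Pyruvate')")
conn.execute("INSERT INTO node VALUES (3, 'compound', 'Oxaloacetate')")
conn.execute("INSERT INTO enzyme_info VALUES (1, '6.4.1.1')")
conn.execute("INSERT INTO link VALUES (8, 'reaction', 1, 3, 1, 1)")

# Resolve each link to its start compound, end compound, and enzyme.
rows = conn.execute("""
    SELECT s.compound_name, e.compound_name, z.ec_number
    FROM link l
    JOIN node s ON s.node_id = l.start_node_id
    JOIN node e ON e.node_id = l.end_node_id
    JOIN enzyme_info z ON z.enzyme_id = l.enzyme_id
""").fetchall()
```

Keeping application data behind a foreign key leaves the network tables narrow, while a direct column avoids a join for attributes that are read with every node.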
NDM APIs
PL/SQL Package (server side)
– Network data query and management in Oracle10g
– Maintains referential integrity and validation
– Supports link/node/path updates
Java API (mid-tier or client side)
– Network loading/storage
– Network analysis
– Network creation/editing
Java Functional Extensibility
– Network, Node, Link, Path are Java interfaces
– Application-based network analysis extensions
Visualization Tools
Bundled Java visualizer and APIs for third-party tools and application development
[Screenshots: Cytoscape, Oracle, Tom Sawyer]
Case Study: Metabolic Pathway Modeling
Identify reaction paths between two given chemical compounds
1. Modeling
– Directed Network
– Chemical Compounds as Nodes
– Enzymes as Links
– Reaction Paths as Paths
2. Analysis
– Shortest Path Analysis
– All Path Analysis
Pathway Data Definition in NDM

Node table

NODE_ID NODE_NAME ACT COSTS SAMPLE_ID           ENTRY_ID
------- --------- --- ----- ------------------- --------
      1 C00022    Y       1 Pyruvate                  49
      2 C00122    Y       1 Fumarate                  50
      3 C00036    Y       1 Oxaloacetate              51
      4 C05379    Y       1 Oxalosuccinate            52
      5 C00074    Y       1 Phosphoenolpyruvate       53
      6 C00024    Y       1 Acetyl-CoA                54
      7 C00149    Y       1 (S)-Malate                55
      8 C00311    Y       1 Isocitrate                56
      9 C00417    Y       1 cis-Aconitate             57
     10 C00042    Y       1 Succinate                 58

Link table

LINK_ID LINK_NAME            START_NODE_ID END_NODE_ID ACT COST SAMPLE_ID
------- -------------------- ------------- ----------- --- ---- ---------------------------------------
      1 1.1.1.42 (rn:R00268)             4          19 Y      1 isocitrate dehydrogenase (NADP)
      2 1.1.1.42 (rn:R00268)            19           4 Y      1 isocitrate dehydrogenase (NADP)
      3 1.1.1.42 (rn:R00268)             4          20 Y      1 isocitrate dehydrogenase (NADP)
      4 1.1.1.42 (rn:R00268)            20           4 Y      1 isocitrate dehydrogenase (NADP)
      5 4.1.1.49 (rn:R00341)             3          19 Y      1 phosphoenolpyruvate carboxykinase (ATP)
      6 4.1.1.49 (rn:R00341)             3           5 Y      1 phosphoenolpyruvate carboxykinase (ATP)
      7 1.1.1.37 (rn:R00342)             3           7 Y      1 malate dehydrogenase
      8 6.4.1.1 (rn:R00344)              1           3 Y      1 pyruvate carboxylase
Source: Susumu Goto, Kyoto University
Shortest Path Analysis
Source: Susumu Goto, Kyoto University
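As a rough illustration of the two analyses named for this case study, the sketch below runs a shortest-path search and an all-paths search over the eight link rows shown in the link table. It is plain Python, not the NDM Java or PL/SQL API; the function names and the `max_depth` parameter are assumptions.

```python
import heapq
from collections import defaultdict

# (link_id, start_node_id, end_node_id, cost), copied from the link table rows.
LINKS = [
    (1, 4, 19, 1), (2, 19, 4, 1), (3, 4, 20, 1), (4, 20, 4, 1),
    (5, 3, 19, 1), (6, 3, 5, 1), (7, 3, 7, 1), (8, 1, 3, 1),
]


def build_adjacency(links):
    """Directed adjacency list: start node -> [(end node, cost, link_id)]."""
    adj = defaultdict(list)
    for link_id, start, end, cost in links:
        adj[start].append((end, cost, link_id))
    return adj


def shortest_path(adj, source, target):
    """Dijkstra's algorithm; returns (total cost, node list) or None."""
    heap = [(0, source, [source])]
    settled = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == target:
            return cost, path
        if node in settled:
            continue
        settled.add(node)
        for nxt, weight, _ in adj.get(node, []):
            if nxt not in settled:
                heapq.heappush(heap, (cost + weight, nxt, path + [nxt]))
    return None


def all_paths(adj, source, target, max_depth=10):
    """Depth-first enumeration of simple paths, bounded by max_depth nodes."""
    results = []

    def walk(node, path):
        if node == target:
            results.append(path)
            return
        if len(path) > max_depth:
            return
        for nxt, _, _ in adj.get(node, []):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                walk(nxt, path + [nxt])

    walk(source, [source])
    return results


adj = build_adjacency(LINKS)
pyruvate_to_malate = shortest_path(adj, 1, 7)
```

Here node 1 is Pyruvate and node 7 is (S)-Malate, so the result corresponds to the route via Oxaloacetate (node 3), using pyruvate carboxylase followed by malate dehydrogenase. Because the links are directed, the reverse query from node 7 to node 1 finds no path.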
NDM Reference
“We are glad to hear that Oracle are developing the new Network Data Model specialized for the bioinformatics field. Efficient and effective pathway analysis will be achieved, especially for pathway computation from large amount of protein interaction and reaction data, using graph algorithms embedded in the model.”
- Professor and Director Minoru Kanehisa, Bioinformatics Center, Institute for Chemical Research, Kyoto University
Case Study: Protein Interactions
Infer functions of unknown proteins from their relationships with well-known proteins
1. Modeling
– Undirected Network
– Proteins as Nodes
– Interactions as Links
2. Analysis
– No. of Degrees (Neighboring Nodes)
– Shortest Path Analysis
– Within-Cost Analysis (Nearest Neighbor Analysis)
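One way to picture how these analyses could support function inference is a simple neighbor vote: count the known functions of annotated proteins within a few interaction hops of the unknown protein. The sketch below is an assumption for illustration, not the method used in the case study; the protein names, annotations, and function names are all hypothetical.

```python
from collections import Counter, deque

# Hypothetical interaction pairs and known annotations.
INTERACTIONS = [("P1", "P2"), ("P2", "P3"), ("P3", "P4"), ("P2", "P5")]
ANNOTATIONS = {"P1": "kinase", "P3": "kinase", "P4": "transport"}


def build_undirected(pairs):
    """Undirected adjacency sets: every interaction is stored in both directions."""
    adj = {}
    for a, b in pairs:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def degree(adj, protein):
    """Number of neighboring nodes."""
    return len(adj.get(protein, ()))


def within_hops(adj, source, max_hops):
    """Breadth-first search; returns {protein: hop count} within max_hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    dist.pop(source)
    return dist


def infer_function(adj, annotations, protein, max_hops=1):
    """Most common annotation among proteins within max_hops, or None."""
    votes = Counter(
        annotations[p] for p in within_hops(adj, protein, max_hops) if p in annotations
    )
    return votes.most_common(1)[0][0] if votes else None


adj = build_undirected(INTERACTIONS)
```

In this toy data P5 has no annotated direct neighbor, so a one-hop vote returns nothing, while widening the search to two hops picks up P1 and P3.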
NDM Reference
"Oracle 10g's Network Data Model feature is great for building a semantic work infrastructure. Oracle 10g's graphical representation is an excellent tool for planning our Y2H protein interaction data storage needs and for building a signaling network from our Nature-AfCS Molecule Pages Database."
- Joshua Li, Sr. Computational Scientist, San Diego Supercomputer Center / UCSD
Case Study: Data Integration
Integrate native biological databases by creating rules for network routes. Support multiple ontologies and jump in and out of database space for analysis.
1. Modeling
– Directed Networks
– Classes as Nodes
– Interactions as Links
2. Analysis
– No. of Degrees (Neighboring Nodes)
– Shortest Path Analysis
– Within-Cost Analysis (Nearest Neighbor Analysis)
Integration Architecture
[Figure: architecture diagram. Labels: NREF, EMBL, GO, KEGG, BIND, AFCS (native formats); distributed database layer; NDM layer (semantic layer) with nodes, edges, graph; network route]
– Data type determines available routes
– Routes can be determined using semantics
NDM Reference
"Oracle Database 10g's spatial network data model will assist in advancing our research and enable us to accelerate the drug discovery and development process. The incredible amount of research data we generate every day places unique challenges on our database. With integrated data manipulation and analytic tools, our scientists can be more efficient in processing and running computations."
- Aram Adourian, Ph.D., vice president, Computational Sciences, Beyond Genomics
RDF Data Model
– An RDF data model to store RDF statements, including reification
– Java Ntriple2NDM converter for loading existing RDF data
– An RDF_MATCH function which can be used in SQL to find graph patterns in RDF (similar to SPARQL)
– Will be released as part of Oracle Database 10.2 later this year
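The idea of finding graph patterns in RDF can be sketched as matching triple patterns with shared variables. The Python below is a conceptual illustration only: it does not use or reproduce the RDF_MATCH signature, and the function names, the `?variable` convention, and the sample triples are assumptions.

```python
# Hypothetical RDF-style statements as (subject, predicate, object) tuples.
TRIPLES = [
    ("Pyruvate", "substrateOf", "PyruvateCarboxylase"),
    ("PyruvateCarboxylase", "produces", "Oxaloacetate"),
    ("Oxaloacetate", "substrateOf", "MalateDehydrogenase"),
    ("MalateDehydrogenase", "produces", "Malate"),
]


def match_pattern(triples, pattern, binding):
    """Yield extensions of binding for every triple that fits one pattern."""
    for triple in triples:
        candidate = dict(binding)
        matched = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree afterwards.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield candidate


def match_graph(triples, patterns):
    """Join several triple patterns on their shared variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for binding in bindings
            for extended in match_pattern(triples, pattern, binding)
        ]
    return bindings


# Which compound is converted into which product, and by which enzyme?
results = match_graph(
    TRIPLES,
    [("?compound", "substrateOf", "?enzyme"), ("?enzyme", "produces", "?product")],
)
```

The shared `?enzyme` variable is what turns two separate triple patterns into a two-link graph pattern.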
Summary
– NDM simplifies network management and analysis by providing an open and generic model
– NDM provides reliability, performance and security
– A number of visualization tools are integrated with NDM
More Information
NDM Whitepaper
– http://www.oracle.com/technology/products/spatial/pdf/10g_network_model_twp.pdf
NDM Paper published in IEEE with LS Case Studies
– http://www.oracle.com/technology/industries/life_sciences/pdf/ls_ieee_graph.pdf
NDM presentation for Life Sciences
– http://www.oracle.com/technology/industries/life_sciences/presentations/biopathways704_graph_modeling.pdf
Cytoscape Plugin
– http://www.oracle.com/technology/industries/life_sciences/ls_sample_code.html
Cytoscape
– http://cytoscape.org/
Questions and Answers