neo4p dcbpw-2015
(Perl)-[:speaks]->(Neo4j)
Mark A. Jensen
https://github.com/majensen/rest-neo4p.git
• Perler since 2000
• CPAN contributor (MAJENSEN) since 2009
• BioPerl Core Developer
• Director, Genomic Data Programs, Leidos Biomedical Research Inc. (FNLCR)
• @thinkinator, LinkedIn
Not my sponsor, but could be yours!
• http://www.perlfoundation.org/how_to_write_a_proposal
Motivation
• Cancer Genomics: Biospecimen, Clinical, Analysis Data
– complex
– growing
– evolving technologies
– evolving policies
– need for precise accounting
• Graph models are well-suited to this world
[Figure: example graph model]
• Nodes: Patient, TumorSample, NormalSample, Clinical, Extract, Data File
• Relationships: derived_from, analysis_of
• Properties: age, diagnosis, stage; date shipped
select bar.name
from bar, bar_baz, baz, baz_goob, goob, goob_squirrel, squirrel,
     squirrel_spam, spam, spam_eggs, eggs, eggs_foo, foo
where bar.id = bar_baz.bar_id and bar_baz.baz_id = baz.id
  and baz.id = baz_goob.baz_id and baz_goob.goob_id = goob.id
  and goob.id = goob_squirrel.goob_id and goob_squirrel.squirrel_id = squirrel.id
  and squirrel.id = squirrel_spam.squirrel_id and squirrel_spam.spam_id = spam.id
  and spam.id = spam_eggs.spam_id and spam_eggs.eggs_id = eggs.id
  and eggs_foo.eggs_id = eggs.id and eggs_foo.foo_id = foo.id
  and foo.name = 'zloty';
match (f:foo)-[*5..8]-(b:bar) where f.name = 'zloty' return b.name
Neo4j
• “Native” graph DB engine (currently in v2.2)
– Written in Java, but
– Very complete REST API
– Custom query language: Cypher
– Free community edition
– Lots of community support, including many “language drivers”
• Not the only one out there, but probably the most widely used (certainly the best marketed)
Design Goals
• "OGM" – Perl 5 objects backed by the graph
• User should never have to deal with a REST endpoint*
*Unless she wants to.
• User should never/only have to deal with Cypher queries†
†Unless he wants/doesn’t want to.
• Robust enough for production code
– System should approach complete coverage of the REST service
– System should be robust to REST API changes and server backward-compatible (or at least version-aware)
• Take advantage of the self-describing features of the API
REST::Neo4p core objects
• Are Node, Relationship, Index
– Index objects represent legacy (v1.0) indexes
– v2.0 “background” indexes handled in Schema
• Are blessed scalar refs: "Inside-out object" pattern
– the scalar value is the item ID (or index name)
– For any object $obj, $$obj (the ID) is exactly what you need for constructing the API calls
• Are subclasses of Entity
– Entity does the object table handling, JSON-to-object conversion and HTTP agent calls
– Isolates most of the kludges necessary to handle the few API inconsistencies that exist(ed)
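The blessed-scalar-ref idea can be illustrated with a small core-Perl sketch. This is not the REST::Neo4p source: the Sketch::Entity and Sketch::Node packages, the %TABLE object table, and the base URL are hypothetical names chosen for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an entity base class; NOT the REST::Neo4p source.
package Sketch::Entity;

my %TABLE;    # object table: per-ID data lives here, not inside the object

sub new {
    my ($class, $id, %props) = @_;
    my $self = bless \$id, $class;      # the object is a blessed scalar ref
    $TABLE{$class}{$id} = {%props};     # properties are kept outside of it
    return $self;
}

sub get_property {
    my ($self, $key) = @_;
    return $TABLE{ ref $self }{$$self}{$key};
}

# Build the kind of URL a REST call would need, straight from $$self.
sub url {
    my ($self, $base) = @_;
    return "$base/" . $self->_path . "/$$self";
}

package Sketch::Node;
our @ISA = ('Sketch::Entity');
sub _path { 'node' }

package main;

my $node = Sketch::Node->new(42, name => 'TumorSample');
my $url  = $node->url('http://localhost:7474/db/data');
print "$$node\n";                          # dereferencing yields the ID
print $node->get_property('name'), "\n";   # looked up in the object table
print "$url\n";
```

Because the object carries nothing but its ID, all per-object state sits in one table that the base class owns.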
Batch Calls
• Certain situations (database loading, e.g.) make sense to batch: do many things in one API call rather than many single calls
• REST API provides this functionality
• How to make it "natural" in the context of working with objects?
– Use Perl prototyping sugar to create a "batch block"
Example:
Rather than call the server for every line, you can mix in REST::Neo4p::Batch, and then use a batch {} block:
Calls within the block are collected and deferred
How does that work?
• Agent module isolates all bona fide calls
– very few kludges to core object modules req'd
• batch() puts the agent into “batch mode” and executes wrapped code
– agent stores incoming calls as JSON in a queue
• After wrapped code is executed, batch() switches agent back to normal mode and has it call the batch endpoint with the queue contents
• Batch processes the response and creates objects if requested
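The mechanism described above can be sketched in core Perl. This is an illustration under stated assumptions, not the REST::Neo4p::Batch implementation: Sketch::Agent, its request/flush/discard methods, and the in-memory @SENT log are hypothetical, and no HTTP traffic is involved.

```perl
use strict;
use warnings;

# Hypothetical agent; NOT the REST::Neo4p::Agent implementation.
package Sketch::Agent;

my $BATCH_MODE = 0;
my @QUEUE;    # deferred calls
my @SENT;     # log of simulated round trips to the server

sub request {
    my ($class, $method, $path, $body) = @_;
    my $call = { method => $method, to => $path, body => $body };
    if ($BATCH_MODE) {
        push @QUEUE, $call;            # defer: remember the call, send nothing
        return scalar(@QUEUE) - 1;     # the job index stands in for a response
    }
    push @SENT, [$call];               # normal mode: one round trip per call
    return 'sent';
}

sub flush {
    my @jobs = splice @QUEUE;          # empty the queue
    push @SENT, \@jobs if @jobs;       # one round trip for the whole batch
    return scalar @jobs;
}

sub discard     { @QUEUE = (); return }
sub batch_mode  { my $class = shift; $BATCH_MODE = shift if @_; return $BATCH_MODE }
sub round_trips { return scalar @SENT }

package main;

# The (&) prototype lets callers write  batch { ... };  without 'sub'.
sub batch (&) {
    my ($code) = @_;
    Sketch::Agent->batch_mode(1);
    my $ok  = eval { $code->(); 1 };   # run the wrapped code, deferring calls
    my $err = $@;
    Sketch::Agent->batch_mode(0);      # always switch back to normal mode
    unless ($ok) { Sketch::Agent->discard; die $err }
    return Sketch::Agent->flush;       # send the queued jobs in one call
}

my $jobs = batch {
    Sketch::Agent->request(POST => '/node', { name => "node_$_" }) for 1 .. 100;
};
print "$jobs jobs, ", Sketch::Agent->round_trips, " round trip(s)\n";
```

The wrapped code needs no changes to run in batch mode, since only the agent knows whether a call is sent or queued.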
Agent
• Is transparent
– But can always see it with REST::Neo4p->agent
– Agent module alone meant to be useful and independent
• Elicits and uses the API self-discovery feature on connect()
• Isolates all HTTP requests and responses
• Captures and distinguishes API and HTTP errors
– emits REST::Neo4p::Exceptions objects
• [Instance] Is a subclass of a "real" user agent:
– LWP::UserAgent
– Mojo::UserAgent, or
– HTTP::Thin
Working within API Self-Description
• Get the list of actions with
– $agent->available_actions
• And AUTOLOAD will provide (see pod for args):
– $agent->get_<action>()
– $agent->put_<action>()
– $agent->post_<action>()
– $agent->delete_<action>()
• Other accessors, e.g. node(), return the appropriate URL for your server
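How AUTOLOAD can turn a list of advertised actions into get_/put_/post_/delete_ methods is sketched below in core Perl. It is a hypothetical illustration, not the REST::Neo4p::Agent code: the Sketch::Agent package, the hard-coded action map, and the returned description string are assumptions, and no request is actually sent.

```perl
use strict;
use warnings;

# Hypothetical agent; NOT the REST::Neo4p::Agent implementation.
package Sketch::Agent;

our $AUTOLOAD;

sub new {
    my ($class, %actions) = @_;
    # A real agent would fill this map from the server's self-description;
    # here it is hard-coded for illustration.
    return bless { actions => {%actions} }, $class;
}

sub available_actions {
    my $self = shift;
    return sort keys %{ $self->{actions} };
}

sub AUTOLOAD {
    my ($self, @args) = @_;
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                     # strip the package name
    return if $name eq 'DESTROY';          # never autoload the destructor
    my ($verb, $action) = $name =~ /^(get|put|post|delete)_(\w+)$/
        or die "No such method: $name\n";
    my $url = $self->{actions}{$action}
        or die "Server does not advertise action '$action'\n";
    # A real agent would issue the HTTP request; the sketch only describes it.
    return join ' ', uc($verb), join('/', $url, @args);
}

package main;

my $agent = Sketch::Agent->new(
    node         => 'http://localhost:7474/db/data/node',
    relationship => 'http://localhost:7474/db/data/relationship',
);
print join(', ', $agent->available_actions), "\n";
print $agent->get_node(42), "\n";
print $agent->delete_relationship(7), "\n";
```

Nothing about the endpoints is compiled in, so an action that the server stops advertising fails with a clear error instead of a request to a stale URL.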
Schemas - Use Case
You start out with a set of well-categorized things that have some well-defined relationships. Each thing will be represented as a node, that's fine. But,
You want to guarantee (to your client, for example) that
1. You can classify every node you add or read unambiguously into a well-defined group (you know everything that’s in there);
2. You never relate two nodes belonging to particular groups in a way that doesn't make sense according to your well-defined relationships (you can find everything that’s in there).
Schema Helps
• REST::Neo4p::Schema – Access the (limited) schema functionality of Neo4j server
– Create indexes
– Maintain uniqueness of nodes within Label classes
• REST::Neo4p::Constrain – An add-in for constraining (or validating)
– property values
– connections (relationships) based on node properties
– relationship types
according to flexible specifications
Constrain/Constraint
• Multiple modes:
– Automatic (throws exception if constraint violated)
– Manual (validation function returns false if constraint violated)
– Suspended (lift constraint processing when desired)
• Freeze/Thaw (in JSON) constraint specifications for reuse
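The difference between the three modes can be shown with a small core-Perl sketch. It does not use the REST::Neo4p::Constrain API: Sketch::Constrain, its mode/validate/check methods, and the Patient property specification are hypothetical and exist only to illustrate the behavior listed above.

```perl
use strict;
use warnings;

# Hypothetical validator; NOT the REST::Neo4p::Constrain API.
package Sketch::Constrain;

my $MODE    = 'automatic';    # automatic | manual | suspended
my %ALLOWED = ( Patient => { age => qr/^\d+$/, diagnosis => qr/\S/ } );

sub mode { my $class = shift; $MODE = shift if @_; return $MODE }

# True if every property is allowed for the group and matches its pattern.
sub validate {
    my ($class, $group, $props) = @_;
    my $spec = $ALLOWED{$group} or return 0;
    for my $key (keys %$props) {
        my $pattern = $spec->{$key} or return 0;
        return 0 unless $props->{$key} =~ $pattern;
    }
    return 1;
}

# Called wherever a node would be created.
sub check {
    my ($class, $group, $props) = @_;
    return 1 if $MODE eq 'suspended';             # constraint processing lifted
    return 1 if $class->validate($group, $props);
    die "Constraint violated for $group\n" if $MODE eq 'automatic';
    return 0;                                     # manual: the caller decides
}

package main;

my $bad  = { age => 'unknown' };
my $died = !eval { Sketch::Constrain->check(Patient => $bad); 1 };
print "automatic: ", ($died ? "exception" : "ok"), "\n";

Sketch::Constrain->mode('manual');
my $manual = Sketch::Constrain->check(Patient => $bad);
print "manual: ", ($manual ? "valid" : "invalid"), "\n";

Sketch::Constrain->mode('suspended');
my $suspended = Sketch::Constrain->check(Patient => $bad);
print "suspended: ", ($suspended ? "passes" : "blocked"), "\n";
```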
Cypher Queries
• REST::Neo4p::Query takes a familiar, DBI-like approach
– Prepare, execute, fetch
– "rows" returned are arrays containing scalars, Node objects, and/or Relationship objects
• Simple Perl data structures can be requested instead if desired
– If a query returns a path, a Path object (a simple container) is returned
Cypher Queries
• Transactions are supported when you have v2.0.1 server or greater
– started with REST::Neo4p->begin_work()
– committed with REST::Neo4p->commit()
– canceled with REST::Neo4p->rollback()
(here, the class looks like the database handle in DBI, in fact…)
Future Directions/Contribution Ideas
• Test on v2.2 server and fix any issues
• Make Neo4p closer to an ORM (require explicit push/pull from backend server)
• Sunset v1.0 support
– Completely touch-free testing within transactions
– Integrate node labels better
• Make batch response parsing more efficient
– e.g., don't stream if response is not huge
• Add traversal functionality
• Beautify and deodorize
Thanks!
https://github.com/majensen/rest-neo4p.git