neo4p dcbpw-2015


Post on 17-Jul-2015


(Perl)-[:speaks]->(Neo4j)

Mark A. Jensen

https://github.com/majensen/rest-neo4p.git

• Perler since 2000

• CPAN contributor (MAJENSEN) since 2009

• BioPerl Core Developer

• Director, Genomic Data Programs, Leidos Biomedical Research Inc (FNLCR)

• @thinkinator, LinkedIn

Not my sponsor, but could be yours!

• http://www.perlfoundation.org/how_to_write_a_proposal


Motivation

• Cancer Genomics: Biospecimen, Clinical, Analysis Data

– complex

– growing

– evolving technologies

– evolving policies

– need for precise accounting

• Graph models are well-suited to this world

[Figure: example graph model. Nodes: Patient, Clinical, TumorSample, NormalSample, Extract, Data File. Relationships: derived_from, analysis_of. Properties: age, diagnosis, stage; date shipped.]

Graph vs RDBMS

[Figure: graph connecting nodes foo, bar, baz, spam, eggs, squirrel, goob.]

select bar.name
from bar, bar_baz, baz, baz_goob, goob, goob_squirrel, squirrel,
     squirrel_spam, spam, spam_eggs, eggs, eggs_foo, foo
where bar.id = bar_baz.bar_id and bar_baz.baz_id = baz.id
  and baz.id = baz_goob.baz_id and baz_goob.goob_id = goob.id
  and goob.id = goob_squirrel.goob_id and goob_squirrel.squirrel_id = squirrel.id
  and squirrel.id = squirrel_spam.squirrel_id and squirrel_spam.spam_id = spam.id
  and spam.id = spam_eggs.spam_id and spam_eggs.eggs_id = eggs.id
  and eggs_foo.eggs_id = eggs.id and eggs_foo.foo_id = foo.id
  and foo.name = 'zloty';

match (f:foo)-[*5..8]-(b:bar) where f.name = 'zloty' return b.name

Neo4j

• “Native” graph DB engine (currently in v2.2)

– Written in Java, but

– Very complete REST API

– Custom query language: Cypher

– Free community edition

– Lots of community support, including many “language drivers”

• Not the only one out there, but probably the most widely used (certainly the best marketed)

Neo4p

[Code slide; callouts: Create Node, Label Node, Create Unique Node, Add a Prop, Link Nodes, Load/Use Index]

Design Goals

• "OGM" – Perl 5 objects backed by the graph

• User should never have to deal with a REST endpoint*

*Unless she wants to.

• User should never/only have to deal with Cypher queries†

†Unless he wants/doesn’t want to.

• Robust enough for production code

– System should approach complete coverage of the REST service

– System should be robust to REST API changes and server backward-compatible (or at least version-aware)

• Take advantage of the self-describing features of the API

REST::Neo4p core objects

• Are Node, Relationship, Index

– Index objects represent legacy (v1.0) indexes

– v2.0 “background” indexes handled in Schema

• Are blessed scalar refs: "Inside-out object" pattern

– the scalar value is the item ID (or index name)

– For any object $obj, $$obj (the ID) is exactly what you need for constructing the API calls

• Are subclasses of Entity

– Entity does the object table handling, JSON-to-object conversion and HTTP agent calls

– Isolates most of the kludges necessary to handle the few API inconsistencies that exist(ed)
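The blessed-scalar-ref idea can be sketched in core Perl. This is a toy, not the module's code: the class name My::Entity, the %TABLE object table, and the url() helper are all assumptions made for illustration. The only server detail relied on is that node URLs have the form <data root>/node/<id>.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration; not REST::Neo4p::Entity.
package My::Entity;

my %TABLE;    # object table: ID => hash of properties

sub new {
    my ($class, $id, $props) = @_;
    $TABLE{$id} = { %{ $props || {} } };
    return bless \$id, $class;    # the object is a reference to its own ID
}

# Properties live in the table, keyed by the ID the object points at.
sub get_property {
    my ($self, $key) = @_;
    return $TABLE{$$self}{$key};
}

# $$self is all that is needed to build the URL for an API call.
sub url {
    my ($self, $base) = @_;
    return "$base/node/" . $$self;
}

package main;

my $node = My::Entity->new(42, { name => 'zloty' });

print "id:   ", $$node, "\n";
print "name: ", $node->get_property('name'), "\n";
print "url:  ", $node->url('http://127.0.0.1:7474/db/data'), "\n";
```

Because the object carries nothing but its ID, any two handles on the same node see the same state in the table.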

Batch Calls

• In certain situations (database loading, e.g.) it makes sense to batch: do many things in one API call rather than many single calls

• REST API provides this functionality

• How to make it "natural" in the context of working with objects?

– Use Perl prototyping sugar to create a "batch block"


Example:

Rather than call the server for every line, you can mix in REST::Neo4p::Batch, and then use a batch {} block:

Calls within the block are collected and deferred

You can execute more complex logic within the batch block, and keep the objects beyond it:


But miracles are not yet implemented:

Object here doesn't really exist yet…

How does that work?

• Agent module isolates all bona fide calls

– very few kludges to core object modules req'd

• batch() puts the agent into “batch mode” and executes wrapped code

– agent stores incoming calls as JSON in a queue

• After wrapped code is executed, batch() switches agent back to normal mode and has it call the batch endpoint with the queue contents

• Batch processes the response and creates objects if requested
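A minimal sketch of that mechanism, assuming a toy agent rather than the real one: Toy::Agent, its request() method and the $AGENT variable are invented for illustration, and nothing is sent over HTTP. The job layout (method, to, body, id, and "{0}"-style references to earlier jobs) follows the Neo4j REST batch format. The second argument mirrors the 'keep_objs' style of option, but it is ignored here.

```perl
use strict;
use warnings;
use JSON::PP;

# Toy stand-in for the agent (hypothetical; not REST::Neo4p::Agent).
package Toy::Agent;

sub new {
    my $class = shift;
    my $self  = { batch_mode => 0, queue => [], sent => [] };
    return bless $self, $class;
}

# Every API call funnels through this one method.
sub request {
    my ($self, $method, $path, $body) = @_;
    my %job = (method => $method, to => $path);
    $job{body} = $body if defined $body;
    if ($self->{batch_mode}) {
        $job{id} = scalar @{ $self->{queue} };    # job IDs count from 0
        push @{ $self->{queue} }, \%job;
        return '{' . $job{id} . '}';              # reference to this job's result
    }
    push @{ $self->{sent} }, \%job;               # normal mode: "send" at once
    return $path;
}

package main;

my $AGENT = Toy::Agent->new;

# The (&@) prototype lets callers write: batch { ... } 'keep_objs';
sub batch (&@) {
    my ($code, $mode) = @_;                       # $mode is unused in this toy
    $AGENT->{queue}      = [];
    $AGENT->{batch_mode} = 1;
    my $ok  = eval { $code->(); 1 };
    my $err = $@;
    $AGENT->{batch_mode} = 0;                     # always back to normal mode
    die $err unless $ok;
    # A real agent would now POST this to the batch endpoint and parse the reply.
    return JSON::PP->new->canonical->encode($AGENT->{queue});
}

my $payload = batch {
    my $foo = $AGENT->request(POST => '/node', { name => 'foo' });
    my $bar = $AGENT->request(POST => '/node', { name => 'bar' });
    $AGENT->request(POST => "$foo/relationships", { to => $bar, type => 'linked' });
} 'keep_objs';

print "$payload\n";
```

The placeholders returned inside the block also show why "miracles are not yet implemented": until the queue is sent, an object created in the block is only a promise of a result.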

HTTP Agent


Agent

• Is transparent

– But can always see it with REST::Neo4p->agent

– Agent module alone meant to be useful and independent

• Elicits and uses the API self-discovery feature on connect()

• Isolates all HTTP requests and responses

• Captures and distinguishes API and HTTP errors

– emits REST::Neo4p::Exceptions objects

• [Instance] Is a subclass of a "real" user agent:

– LWP::UserAgent

– Mojo::UserAgent, or

– HTTP::Thin

Working within API Self-Description


• Get the list of actions with

– $agent->available_actions

• And AUTOLOAD will provide (see pod for args):

– $agent->get_<action>()

– $agent->put_<action>()

– $agent->post_<action>()

– $agent->delete_<action>()

• Other accessors, e.g. node(), return the appropriate URL for your server
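How AUTOLOAD can turn a discovered action list into get_/put_/post_/delete_ methods is sketched below. This is a toy under stated assumptions: Toy::Agent is hypothetical, the action table is passed in by hand instead of being read from the server's service root, and the methods only return a description of the request they would make. The real argument conventions are in the pod.

```perl
use strict;
use warnings;

# Hypothetical stand-in; not REST::Neo4p::Agent.
package Toy::Agent;

our $AUTOLOAD;

sub new {
    my ($class, %actions) = @_;           # action name => URL, as a server would advertise
    my $self = { actions => { %actions } };
    return bless $self, $class;
}

sub available_actions {
    my $self = shift;
    return sort keys %{ $self->{actions} };
}

sub DESTROY { }                            # keep object teardown out of AUTOLOAD

# Any call of the form <verb>_<action> is resolved here at run time.
sub AUTOLOAD {
    my ($self, @args) = @_;
    (my $name = $AUTOLOAD) =~ s/.*:://;
    my ($verb, $action) = $name =~ /^(get|put|post|delete)_(\w+)$/
        or die "No such method: $name\n";
    my $url = $self->{actions}{$action}
        or die "Action not advertised by server: $action\n";
    return join ' ', uc($verb), join('/', $url, @args);
}

package main;

my $agent = Toy::Agent->new(
    node   => 'http://127.0.0.1:7474/db/data/node',
    cypher => 'http://127.0.0.1:7474/db/data/cypher',
);

print join(', ', $agent->available_actions), "\n";
print $agent->get_node(42), "\n";
print $agent->post_cypher, "\n";
```

Since the method names come from the advertised table, a server that adds or renames an action needs no code change in the agent.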

Schemas - Use Case

You start out with a set of well-categorized things that have some well-defined relationships. Each thing will be represented as a node; that's fine. But you want to guarantee (to your client, for example) that:

1. You can classify every node you add or read unambiguously into a well-defined group (you know everything that’s in there);

2. You never relate two nodes belonging to particular groups in a way that doesn't make sense according to your well-defined relationships (you can find everything that’s in there).

Schema Helps

• REST::Neo4p::Schema – Access the (limited) schema functionality of Neo4j server

– Create indexes

– Maintain uniqueness of nodes within Label classes

• REST::Neo4p::Constrain – An add-in for constraining (or validating)

– property values

– connections (relationships) based on node properties

– relationship types

according to flexible specifications

App-level Constraints

[Code slides; callout: Will throw at Record 5]

Constrain/Constraint

• Multiple modes:

– Automatic (throws exception if constraint violated)

– Manual (validation function returns false if constraint violated)

– Suspended (lift constraint processing when desired)

• Freeze/Thaw (in JSON) constraint specifications for reuse
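The three modes and the JSON round trip can be illustrated with a toy validator. Everything here is an assumption made for the sketch: Toy::Constraint, the check() function, the $MODE switch and the spec layout (property name mapped to a regex source string) are hypothetical and are not the REST::Neo4p::Constrain interface.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in; not REST::Neo4p::Constraint.
package Toy::Constraint;

sub new {
    my ($class, %spec) = @_;    # tag => name, properties => { key => regex source }
    my $self = { %spec };
    return bless $self, $class;
}

# True only if every constrained property is present and matches its pattern.
sub validate {
    my ($self, $props) = @_;
    for my $key (sort keys %{ $self->{properties} }) {
        my $pattern = $self->{properties}{$key};
        return 0 unless defined($props->{$key}) && $props->{$key} =~ /$pattern/;
    }
    return 1;
}

# Freeze/thaw: the spec is plain data, so it serializes to JSON as-is.
sub freeze {
    my $self = shift;
    my %copy = %$self;
    return JSON::PP->new->canonical->encode(\%copy);
}

sub thaw {
    my ($class, $json) = @_;
    return $class->new(%{ JSON::PP->new->decode($json) });
}

package main;

my $MODE = 'automatic';    # or 'manual' or 'suspended'

sub check {
    my ($constraint, $props) = @_;
    return 1 if $MODE eq 'suspended';                 # constraint processing lifted
    my $ok = $constraint->validate($props);
    die "constraint '$constraint->{tag}' violated\n"
        if !$ok && $MODE eq 'automatic';              # automatic: throw
    return $ok;                                       # manual: report true/false
}

my $c   = Toy::Constraint->new(tag => 'patient', properties => { age => '^\d+$' });
my $bad = { age => 'unknown' };

my $threw = eval { check($c, $bad); 1 } ? 0 : 1;
$MODE = 'manual';
my $manual = check($c, $bad);
$MODE = 'suspended';
my $suspended = check($c, $bad);

my $copy = Toy::Constraint->thaw($c->freeze);

print "automatic threw: $threw\n";
print "manual returned: $manual\n";
print "suspended returned: $suspended\n";
print "thawed copy accepts age 63: ", $copy->validate({ age => 63 }), "\n";
```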


Cypher Queries

• REST::Neo4p::Query takes a familiar, DBI-like approach

– Prepare, execute, fetch

– "rows" returned are arrays containing scalars, Node objects, and/or Relationship objects

• Simple Perl data structures can be requested instead if desired

– If a query returns a path, a Path object (a simple container) is returned


Cypher Queries

• Prepare and execute with parameter substitutions


Do This!

Not This!
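The contrast can be sketched without a server by comparing the two ways of getting a value into a query. The label foo and property name are borrowed from the earlier Cypher example, and the value O'Brien is made up. The request body shown (a query string plus a separate params map) is the shape accepted by the legacy Cypher endpoint of the 2.x REST API; that the module uses that endpoint for a given call is an assumption here.

```perl
use strict;
use warnings;
use JSON::PP;

my $name = q{O'Brien};    # a value that happens to contain a quote

# Not this: the value is spliced into the query text, so the stray
# quote changes the query itself.
my $interpolated = "MATCH (f:foo) WHERE f.name = '${name}' RETURN f";

# Do this: the query text is fixed and the value travels separately.
my $query  = 'MATCH (f:foo) WHERE f.name = {name} RETURN f';
my %params = (name => $name);

my $payload = JSON::PP->new->canonical->encode(
    { query => $query, params => \%params }
);

print "$interpolated\n";
print "$payload\n";
```

With parameters, the same prepared query text can also be executed repeatedly with different values.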

Cypher Queries

• Transactions are supported when you have v2.0.1 server or greater

– started with REST::Neo4p->begin_work()

– committed with REST::Neo4p->commit()

– canceled with REST::Neo4p->rollback()

(here, the class looks like the database handle in DBI, in fact…)
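The DBI-like shape of that interface can be mimicked with a toy class whose class methods hold the transaction state. This is a simplification under stated assumptions: Toy::Neo4p and its run() method are hypothetical, and the toy buffers statements until commit, whereas a real client talks to the server at each step. The JSON it emits is the "statements" body that the Neo4j 2.x transactional endpoint accepts.

```perl
use strict;
use warnings;
use JSON::PP;

# Hypothetical stand-in; not REST::Neo4p.
package Toy::Neo4p;

my $in_txn  = 0;
my @pending = ();

sub begin_work {
    die "already in a transaction\n" if $in_txn;
    $in_txn  = 1;
    @pending = ();
    return 1;
}

# Queue one Cypher statement with its parameters.
sub run {
    my ($class, $statement, $params) = @_;
    die "not in a transaction\n" unless $in_txn;
    push @pending, { statement => $statement, parameters => $params || {} };
    return scalar @pending;
}

# Hand back the request body and close the transaction.
sub commit {
    die "not in a transaction\n" unless $in_txn;
    my $body = JSON::PP->new->canonical->encode({ statements => [@pending] });
    $in_txn  = 0;
    @pending = ();
    return $body;
}

# Throw the queued statements away.
sub rollback {
    die "not in a transaction\n" unless $in_txn;
    $in_txn  = 0;
    @pending = ();
    return 1;
}

package main;

Toy::Neo4p->begin_work;
Toy::Neo4p->run('CREATE (f:foo {name: {name}})', { name => 'zloty' });
Toy::Neo4p->rollback;                       # the CREATE is discarded

Toy::Neo4p->begin_work;
Toy::Neo4p->run('MATCH (f:foo) RETURN count(f)');
my $body = Toy::Neo4p->commit;

print "$body\n";
```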


DBI – DBD::Neo4p

• Yes, you can really do this:

• Row returns: choice of full objects or simple Perl structures

Future Directions/Contribution Ideas

• Test on v2.2 server and fix any issues

• Make Neo4p closer to an ORM (require explicit push/pull from backend server)

• Sunset v1.0 support

– Completely touch-free testing within transactions

– Integrate node labels better

• Make batch response parsing more efficient

– e.g., don't stream if response is not huge

• Add traversal functionality

• Beautify and deodorize
