python-graph-lovestory

python-graph-lovestory Documentation Release 0.99 Amirouche Boubekki April 21, 2013



CONTENTS

1 Walkthrough

1.1 Blueprints


Here you will find documentation about the components of my Python graph love story. I hope you like it, because I do.


CHAPTER

ONE

WALKTHROUGH

There is no tutorial that goes straight to the matter of «Developing web applications using graph databases». Instead, read the following subjects in order. Each one introduces a library that is part of the stack and deals with specific matters, so complexity is introduced gradually as you discover the stack, and you get to know all the good parts but also all the bad parts before you start.

1.1 Blueprints

1.1.1 Kesako ?

Blueprints lets you use several graph databases through the same API. It can be used to embed a graph database in your Python program. If several processes need to access the same database, it is not what you need. python-blueprints provides pyjnius-powered bindings of Tinkerpop's Java Blueprints.

1.1.2 Installation

There is no binary package for now, so you may have some difficulties installing python-blueprints on Windows and MacOS machines, but it is possible.

Follow the cli dance:

mkvirtualenv --system-site-packages coolprojectname
pip install cython git+git://github.com/kivy/pyjnius.git blueprints

You are ready for some graph database awesomeness in Python.

1.1.3 Getting started with core API

The python-blueprints API is straightforward: it is basically the Blueprints API in Python. If you know Neo4j's python-embedded, the API is similar but not the same.

Create a graph

Creating a graph is just a matter of knowing where to store the files and which backend you want to use. Currently only Neo4j and OrientDB are supported.

For the purpose of the tutorial, we will use /tmp/ as storage directory.

Using Neo4j:


from printemps.core import Graph

graph = Graph('neo4j', '/tmp/')

Getting OrientDB running is very similar:

from printemps.core import Graph

graph = Graph('orientdb', 'local:/tmp/')

A Wiki model

The following is exactly the same for both OrientDB and Neo4j. To make it easier for everybody to understand how graphs work, we will model a wiki while introducing the base API of any graph database used with printemps.core.

A wiki will be a set of pages which have several revisions.

Create and modify edge and vertex

To create a vertex just call Graph.create_vertex() method inside a transaction:

with graph.transaction():
    wiki = graph.create_vertex()

There is no Vertex.save() method nor Edge.save(); the elements are automatically persisted if the transaction succeeds.
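To make the "persisted if the transaction succeeds" behaviour concrete, here is a minimal sketch of how such a context manager could work. ToyGraph, its committed list and its error handling are hypothetical stand-ins written for this illustration; they are not the python-blueprints implementation, which delegates to the Java backend.

```python
from contextlib import contextmanager


class ToyGraph:
    """Hypothetical in-memory stand-in, not the python-blueprints Graph."""

    def __init__(self):
        self.committed = []   # elements that survived a transaction
        self._pending = None  # elements created in the open transaction

    @contextmanager
    def transaction(self):
        self._pending = []
        try:
            yield self
        except Exception:
            # the block failed: drop everything created inside it
            self._pending = None
            raise
        else:
            # the block succeeded: persist the new elements
            self.committed.extend(self._pending)
            self._pending = None

    def create_vertex(self):
        if self._pending is None:
            raise RuntimeError("create_vertex() called outside a transaction")
        vertex = {}
        self._pending.append(vertex)
        return vertex


graph = ToyGraph()

# a transaction that succeeds: the vertex is kept
with graph.transaction():
    wiki = graph.create_vertex()
    wiki["title"] = "Printemps Wiki"

# a transaction that fails: the vertex is discarded
try:
    with graph.transaction():
        graph.create_vertex()
        raise ValueError("something went wrong")
except ValueError:
    pass
```

Only the vertex from the first block ends up in graph.committed, which mirrors the rule that nothing needs an explicit save() call.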

If you want to know the identifier of the wiki in the database, to store it somewhere or learn it by heart, you can use Vertex.id(). Edge.id() does the same for edges.

Both vertices and edges work like a dictionary: you can set and get properties. They are persisted if you do it inside a transaction; I don't know what happens outside transactions. Let's give a title and a description to our wiki vertex:

with graph.transaction():
    wiki['title'] = 'Printemps Wiki'
    wiki['description'] = 'My first graph based wiki'

Keys are always strings, values can be:

• strings

• integers

• list of strings

• list of integers

We will see later how it can be done, it’s very natural for Python programmers.
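The rules on keys and values can be written down as a small check. This is only a sketch of the list above: is_valid_property is a hypothetical helper, not part of the library, and rejecting booleans is an assumption of this sketch (in Python, bool is a subclass of int).

```python
def is_valid_property(key, value):
    """Return True if key and value match the types listed above."""
    if not isinstance(key, str):
        return False

    def is_scalar(item, kind):
        # assumption of this sketch: booleans do not count as integers
        return isinstance(item, kind) and not isinstance(item, bool)

    if is_scalar(value, str) or is_scalar(value, int):
        return True
    if isinstance(value, list):
        # a list must be homogeneous: all strings or all integers
        return (all(is_scalar(item, str) for item in value)
                or all(is_scalar(item, int) for item in value))
    return False
```

For example, is_valid_property('tags', ['graph', 'wiki']) is accepted, while a list mixing strings and integers is not.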

Now we will create a page, a page will be vertex too:

with graph.transaction():
    frontpage = graph.create_vertex()
    frontpage['title'] = 'Welcome to Printemps Wiki'

The page needs to be linked to the wiki as a part of it. For that matter there is a method Graph.create_edge(start, label, end) that can be used like this:


with graph.transaction():
    partof = graph.create_edge(wiki, 'part of', frontpage)

An edge has three important methods. They do nothing but return the value we are interested in; since those values are not editable, you access them through methods:

• Edge.start() returns the vertex where the edge is starting, in the case of partof it’s wiki vertex

• Edge.end() returns the vertex where the edge is ending, in the case of partof it’s frontpage vertex

• Edge.label() returns the label of the edge, in the case of partof it’s the string ’part of’

In general, every object you think of is a vertex, but sometimes «objects» are modeled as edges: those are links. An object representing a link between two objects is an edge. If the link involves more than two objects, then it can be represented as a hyperedge.

Note: this is an advanced topic, you can skip it.

The idea behind the hyperedge is that a vertex can be linked to several other vertices using only one special edge, the hyperedge, which means the edge starts at one vertex and ends at several vertices. Here is an example representation of a hyperedge:


This can be modeled in a graph using only vertices and simple edges, with an intermediate vertex which serves as a hub for several edges that link to the end vertices of the hyperedge. Here is the pattern illustrated:


Hyperedges are not part of popular graph databases as is, so you have to use the intermediate vertex pattern.

To sum up, link objects with more than two objects involved in the link are the exception among link objects and are represented as vertices.
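The intermediate vertex pattern can be sketched on plain data, without any database. In this sketch edges are (start, label, end) tuples and vertices are plain strings; expand_hyperedge, the 'member' label and the example names are all made up for the illustration.

```python
def expand_hyperedge(start, label, ends, hub_name):
    """Turn one hyperedge into simple edges going through a hub vertex."""
    # one simple edge from the start vertex to the hub
    edges = [(start, label, hub_name)]
    # one simple edge from the hub to each end vertex of the hyperedge
    for end in ends:
        edges.append((hub_name, "member", end))
    return edges


edges = expand_hyperedge(
    "meeting", "attended by", ["alice", "bob", "carol"], hub_name="attendees"
)
```

A hyperedge with three end vertices becomes four simple edges and one extra vertex, the hub.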

Navigation

Stay away with your motors, sails and emergency fire lighters: it's just plain Python, even though you can do it in a boat too, but this is not my issue at the present moment.

Before advancing any further, let's sum up: we have a graph with two vertices and one edge. It can be represented as follows:


Because we like the wiki so much, we know its identifier by heart and stored it in a variable named wiki_identifier. We can retrieve the wiki vertex like so:

wiki = graph.vertex(wiki_identifier)

Vertices have two kinds of edges:

• Vertex.incomings(): a generator yielding edges that end at this vertex, currently there is none on wiki

• Vertex.outgoings(): a generator yielding edges that start at this vertex, currently there is only one.

To retrieve the frontpage, we can call next() on wiki.outgoings() to retrieve the first and only edge as the first hop, then navigate to the frontpage using Edge.end() as the second hop:

link = next(wiki.outgoings())
frontpage = link.end()

We got our frontpage vertex back, Ulysses himself wouldn't believe it. It's not the same object though.
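For readers without a database at hand, the two-hop navigation can be reproduced with a toy in-memory model. The classes below are hypothetical stand-ins that only borrow the method names used in this walkthrough. Unlike the real bindings, they hand back the very same Python object on every lookup.

```python
class Edge:
    """Hypothetical stand-in for an edge: fixed start, label and end."""

    def __init__(self, start, label, end):
        self._start, self._label, self._end = start, label, end

    def start(self):
        return self._start

    def label(self):
        return self._label

    def end(self):
        return self._end


class Vertex:
    """Hypothetical stand-in for a vertex that knows its graph."""

    def __init__(self, graph):
        self._graph = graph

    def outgoings(self):
        # generator over the edges that start at this vertex
        return (e for e in self._graph.edges if e.start() is self)

    def incomings(self):
        # generator over the edges that end at this vertex
        return (e for e in self._graph.edges if e.end() is self)


class ToyGraph:
    def __init__(self):
        self.edges = []

    def create_vertex(self):
        return Vertex(self)

    def create_edge(self, start, label, end):
        edge = Edge(start, label, end)
        self.edges.append(edge)
        return edge


graph = ToyGraph()
wiki = graph.create_vertex()
frontpage = graph.create_vertex()
graph.create_edge(wiki, "part of", frontpage)

link = next(wiki.outgoings())  # first hop: from the wiki vertex to its edge
found = link.end()             # second hop: from the edge to the frontpage
```

The same walk works in the other direction with next(frontpage.incomings()).start().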

More vertices and more edges

What we have right now is only a wiki with a page and its title, but there is no content and there are no revisions. For that matter we will use more edges and more vertices. Before the actual code, which reuses all of the above, we will have a look at what we are going to build:


This is one of the normalized graphs that can be used to represent the wiki. Every graph structure that solves this problem has its strengths; this one happens, I think, to be the simplest.

First let's create a function that creates a revision for a given page, given a body text. If you followed the whole tutorial it should be easy to understand, and even if you happen to be here by mistake, I think it is semantically expressive enough to be understood by any Python programmer:

def create_revision(graph, page, body):
    with graph.transaction():
        # find the highest revision number among the page's edges
        max_revision = 0
        for link in page.outgoings():
            max_revision = max(link['revision'], max_revision)
        new_revision = max_revision + 1
        # create the vertex first
        revision = graph.create_vertex()
        revision['body'] = body
        # link the edge and annotate it
        link = graph.create_edge(page, 'revised as', revision)
        link['revision'] = new_revision

create_revision does the following:


1. Look for the highest revision in edges linked to page

2. Increment the revision number for the new revision

3. Create the new revision

4. Link it to page with the proper revision property on the link edge

A basic wiki would only need to fetch the last revision. That is what we do in the following fetch_last_revision function:

def fetch_last_revision(graph, page):
    max_revision = 0
    last_revision = None
    for link in page.outgoings():
        # keep the vertex reached through the highest-numbered edge
        if link['revision'] > max_revision:
            max_revision = link['revision']
            last_revision = link.end()
    return last_revision  # if it returns None, the page is empty
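The revision numbering and the lookup of the newest revision can be traced on plain data, without a database. In this sketch each dict plays the role of a 'revised as' edge and its "end" entry plays the role of the revision vertex; next_revision_number and last_revision_body are hypothetical helper names used only here.

```python
def next_revision_number(links):
    """Highest revision number found on the links, plus one (1 if none)."""
    return max((link["revision"] for link in links), default=0) + 1


def last_revision_body(links):
    """Body reached through the highest-numbered link, or None if empty."""
    if not links:
        return None
    newest = max(links, key=lambda link: link["revision"])
    return newest["end"]["body"]


# build two revisions the same way create_revision numbers them
links = []
for body in ["first draft", "typo fixed"]:
    number = next_revision_number(links)
    links.append({"revision": number, "end": {"body": body}})
```

After the loop the links carry the revision numbers 1 and 2, and last_revision_body(links) returns the body of the second one.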

That is all! Creating a page is very similar to this, so I won't repeat the same code... Oh! I almost forgot about the list of strings as property. The following function adds the tags passed as arguments, which must be strings, as the tags property of the last revision:

def add_tags(graph, page, *tags):
    rev = fetch_last_revision(graph, page)
    # properties must be set inside a transaction
    with graph.transaction():
        rev['tags'] = list(tags)

The basics are straightforward. Getting links working between pages is left as an exercise to the reader.

Index

Graph databases have indices. To create an index of vertices use the following code:

pages = graph.index.create('pages', graph.VERTEX)

To create an index of edges do this:

revisions = graph.index.create('revisions', graph.EDGE)

Then you can put a vertex in an index using put(key, value, element):

pages.put('page', 'page', page)

The key and value parameters are not really interesting in the above example, but an index can be that simple. You can use key and value to have a fine-grained index of related elements. For instance, the following snippet builds an index for revisions, separating minor and major revisions and sorting them by date of revision:

revisions.put('all', 'today', r2)
revisions.put('all', 'yesterday', r1)
revisions.put('all', 'before', r0)
revisions.put('minor', 'today', r2)
revisions.put('major', 'yesterday', r1)
revisions.put('all', 'before', r0)
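One way to picture the key and value namespace is a mapping from (key, value) pairs to lists of elements. The ToyIndex class below is a hypothetical in-memory sketch of that idea, not the library's Index, and the strings 'r0', 'r1' and 'r2' stand in for revision edges.

```python
from collections import defaultdict


class ToyIndex:
    """Hypothetical stand-in: elements grouped by (key, value) pairs."""

    def __init__(self):
        self._entries = defaultdict(list)

    def put(self, key, value, element):
        # file the element under the (key, value) namespace
        self._entries[(key, value)].append(element)

    def get(self, key, value):
        # generator over the elements stored under (key, value)
        yield from self._entries.get((key, value), [])


revisions = ToyIndex()
revisions.put("all", "today", "r2")
revisions.put("all", "yesterday", "r1")
revisions.put("minor", "today", "r2")
revisions.put("major", "yesterday", "r1")

minor_today = list(revisions.get("minor", "today"))
major_today = list(revisions.get("major", "today"))
```

Here minor_today holds only 'r2' and major_today is empty, because nothing was stored under ('major', 'today').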

You can use Graph.index.get(name) to retrieve an index:

index = graph.index.get('pages')

To retrieve an index content, use Index.get, like this:


# use the same key and value that were passed to put()
found_pages = index.get('page', 'page')
first_page = next(found_pages)

That’s almost all the index API, for more please refer to the API documentation.

End

When you have finished working with the database, don't forget to call Graph.close().
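A simple way not to forget the call is to let the standard library's contextlib.closing do it. The sketch below uses a hypothetical ToyGraph with a close() method; whether the real Graph object can be wrapped the same way is an assumption that relies only on it exposing close().

```python
from contextlib import closing


class ToyGraph:
    """Hypothetical stand-in exposing only close()."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


# closing() calls graph.close() when the block exits, even on error
with closing(ToyGraph()) as graph:
    pass  # work with the graph here
```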

More

If you still struggle with the API, here it is with more comments:

• from blueprints import Graph

– Graph(name, path): remember that name is the lower-case name of the database and that the path for OrientDB is prefixed with local:.

– Graph.transaction() is a context manager, thus used with the with statement, that starts a transaction. Elements are automatically saved, and you must always do mutating operations in a transaction.

– Graph.create_vertex() creates a vertex in a transaction.

– Graph.create_edge(start, label, end) creates an edge in a transaction, starting at the start vertex and ending at the end vertex, with label as label. The tutorial doesn't say much about labels, so I add here that it's a way to know which edge is which when there are several edges starting and ending at the same vertices.

– Graph.vertex(id) and Graph.edge(id): the former retrieves the vertex with id as identifier and the latter the edge.

– Graph.close() cleans up your database after you have finished working.

– Graph.edges() and Graph.vertices() were not presented because, in my opinion, they should not be used outside debugging in an application where speed matters.

• An element is a vertex or an edge. Both are usable as a dict to get and set values, but they can only be mutated in a transaction. Every element can be deleted with its delete() method in a transaction.

• Vertex: you don't import the Vertex class, you get instances from Graph.create_vertex() or Graph.vertex(id), or by hopping through Edge.end() or Edge.start().

– Vertex.outgoings() is a generator over the edges that start from the current vertex; each edge retrieved implies a hop.

– Vertex.incomings() is a generator over the edges that end at the current vertex; each edge retrieved implies a hop.

• Edge: similarly, edges are not imported. They are created with Graph.create_edge(start, label, end), retrieved with Graph.edge(id), and obtained by iterating over the Vertex.outgoings() and Vertex.incomings() generators.

– Edge.start() retrieves the starting vertex via a hop

– Edge.end() retrieves the ending vertex via a hop

– Edge.label() retrieves the label associated with the edge.

• Similarly, you don't import the Index class, but create one using Graph.index.create(name, ELEMENT), where ELEMENT should be one of Graph.EDGE or Graph.VERTEX, or retrieve the index by its name using Graph.index.get(name).


– Index.put(key, value, element) puts element in the key, value namespace.

– Index.get(key, value) retrieves the index content; this is a generator over the index content.

Hops are a metric used to compute the complexity of a query.
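As a rough illustration of the metric, the sketch below counts hops on a plain adjacency mapping. It assumes the counting convention suggested in the navigation section: retrieving an edge is one hop and following it to its end vertex is another. count_hops and the example graph are hypothetical; the library does not expose such a function as far as this document shows.

```python
from collections import deque


def count_hops(outgoing, start, target):
    """Fewest hops from start to target, or None if target is unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        vertex, hops = queue.popleft()
        if vertex == target:
            return hops
        for neighbour in outgoing.get(vertex, []):
            if neighbour not in seen:
                seen.add(neighbour)
                # one hop to fetch the edge, one more to reach its end
                queue.append((neighbour, hops + 2))
    return None


# the wiki graph of this walkthrough, as vertex -> end vertices
outgoing = {
    "wiki": ["frontpage"],
    "frontpage": ["revision 1", "revision 2"],
}
```

Under this convention reaching the frontpage from the wiki costs two hops and reaching one of its revisions costs four.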

1.1.4 Moar doc

blueprints Package

blueprints Package

edge Module

element Module

graph Module

index Module

java Module

vertex Module

Subpackages
