Page 1
django-rdflib and PostgreSQLthe best of both worlds
Ştefan Talpalaru http://od-eon.com/blogs/stefan/
Page 2
About me
- Python/Django user for the last 3 years- Linux user from the 20th century- Gentoo user since first contact- founder and CTO at Odeon Consulting Group Pte Ltd
Page 3
Outline
- RDF- full-text search- django-fts-odeon- django-rdflib- BAMS
Page 6
the triple:subject - predicate - object
Page 7
the abbreviation for "terminal nerve" in the Swanson 2004 nomenclature for ratis 'tn'
Page 8
"terminal nerve" - "nomenclature" - "Swanson 2004""terminal nerve" - "species" - "rat""terminal nerve" - "abbreviation" - "tn"
Page 9
"terminal nerve" - "abbreviation" - "tn"
Page 10
http://brancusi1.usc.edu/brain_parts/terminal-nerve-3/
Page 11
http://brancusi1.usc.edu/RDF/abbreviation/
Page 12
namespaces are syntactic sugarbams = http://brancusi1.usc.edu/RDF/bams:abbreviation
Page 13
http://brancusi1.usc.edu/brain_parts/terminal-nerve-3/http://brancusi1.usc.edu/RDF/abbreviation/"tn"
Page 14
simple model, simple storage schemes - triplestores
Page 15
the quad:subject - predicate - object - context
Page 16
labeled graphsmarking subsets of the data
Page 17
further triple/quad store complications
Page 18
optimizing frequent operations(like fetching subjects with a certain rdf:type)
Page 19
indexing single elements and pairs(to speed up lookups)
Page 21
because string matching is not good enough
Page 23
stemming algorithms
Page 24
PostgreSQLSnowball version of the Porter algorithm
Page 25
spelling dictionaries
Page 27
thesaurus - map phrases to words
Page 28
flexibility in search term ordering
Page 30
ranking by relevance
Page 31
highlighting results
Page 33
fork of django-fts
Page 34
focus on the pgsql back-end
Page 35
better Django integration- custom manager - the fts methods now survive chaining- management command to update indices
Page 36
result highlighting support
Page 37
from django.db import modelsimport fts
class Literals(fts.SearchableModel): lexical = models.TextField() search_objects = fts.SearchManager(fields=('lexical',))
Page 38
Literals.search_objects.search('foo bar', rank_field='rank')
Page 40
- pure Python library- place it in a git repository and have it work on various setups
Page 41
dependencies:- Django- PostgreSQL- South- pyparsing for the SPARQL parser- django-fts-odeon
Page 42
forked from rdflib
Page 43
rdflib looked poorly maintained at the time
Page 44
developers started dropping features instead of fixing the bugs
Page 45
first they came for SPARQL (now in a library called "rdfextras")
Page 46
then they came for most of the storage back-ends(MySQL and PostgreSQL included)
Page 47
why stick with it?
Page 48
nice pythonic query API
Page 49
no better alternative
Page 50
easier to fix/enhance than to rewrite
Page 51
Django integration
Page 52
the easy part: reusing the db connection
Page 53
from rdflib.store.MySQL import SQLfrom django.db import connection, transaction
class PostgreSQL(SQL):... def _connect(self, db=None): return connection def commit(self): transaction.commit_unless_managed()
def rollback(self): transaction.rollback_unless_managed()
Page 54
the interesting part:- using a South migration to create the db tables- creating (unmanaged) models for rdflib's tables
Page 55
from rdflib.term import Literal, URIRef, BNode, Variablefrom django_rdflib.utils import get_rdflib_store_graphstore, graph = get_rdflib_store_graph()
Page 56
triple1 = (URIRef('http://foo.com/subject1'), URIRef('http://bar.com/pred1'), Literal('obj1'))triple2 = (URIRef('http://foo.com/subject2'), URIRef('http://bar.com/pred2'), Literal('obj2'))graph.add(triple1)graph.add(triple2)graph.commit()
Page 57
quad1 = (URIRef('http://foo.com/subject1'), None, None, None )quad2 = (URIRef('http://foo.com/subject2'), None, None, None )graph.removeN([quad1, quad2])
Page 58
lit_str = """fooBar"""
q = """SELECT ?s ?p WHERE { ?s ?p \"""%s\""" .}""" % (lit_str, )
for (s, p) in graph.query(q): pprint((s, p))
Page 59
BAMS - brancusi1.usc.eduBrain Architecture Management System
Mihail Bota - Associate Professor (Research) of Biological Sciences, University of Southern California
Page 60
collating neurobiological data from scientific publications
Page 61
PHP/MySQL -> Python/Django/PostgreSQL/RDF
Page 68
Foundational Model of Connectivity(Swanson & Bota, 2010)
Page 74
https://github.com/odeoncg/django-fts-odeonhttps://github.com/odeoncg/django-rdflib