2014-11 apacheconeu : lizard - clustering an rdf triplestore
TRANSCRIPT
Why Cluster?➢ Service Resilience
○ Failures○ Server admin and security patches
➢ Performance / scale○ More hardware : CPU, RAM, system bus
Why Cluster?➢ Alternatives
○ Full clustering○ Master-slave (load scale only ; update issues)○ Application visible partitioning
Who am I?➢ Committer on Apache Jena
○ Deploy/operate Jena/TDB in ££job.
➢ W3C○ Co-editor on SPARQ 1.0 and 1.1 query lang
spaces○ RDF 1.1 (on syntax, inc. SPARQL alignment)○ ASF’s W3C AC representative
Acknowledgements➢ Apache➢ Partial funding : InnovateUK* ➢ Users
○ For the discussion and encouragement
* Used to be the Technology Strategy Board.UK Department for Business, Innovation & Skills
Outline➢ TDB Design➢ SPARQL Execution➢ Lizard Design➢ Back to SPARQL
TDB➢ Custom RDF Database
○ Quads + Triples
➢ Custom index code○ Threaded B+Trees
➢ Dictionary terms○ NodeId – 8 bytes○ Inline terms (numbers, dates, dateTimes)
TDB : Node Table
TDB : Indexes➢ Indexes are covering
○ Range scans○ All key, no value○ No "triple table"
SPARQL Execution{ ?x :p 123 . }
Convert to NodeIds
Look in POS to get all PO?, assign S to ?x
123 is an inline constant in TDB.
{ ?x :p 123 . ?x :q ?v . }
A database joinIndex join (Loop+substitution)
Index join (= loop) on :x1 :q ?vwhere :x1 is the value of ?x
Index Implementation➢ TDB uses threaded B+Trees for indexes
○ 8K blocks 100-way B+Tree○ Threaded : scan only touches data blocks
SPO SPO SPO ------ ------ ------
Ptr Ptr ------ ------ ------
SPO SPO SPO SPO ------ ------
Ptr Ptr Ptr ------ ------
SPO SPO SPO SPO SPO SPO SPO SPO SPO SPO ------ ------
Choices
Where to introduce distribution?
Query and Update
Indexes / B+Trees Node table / Objects
Blocks Key → Value Store
This Does Not Work (very well)
➢ Impedance mismatch○ Too much data moving about○ Little parallelism○ Bad cold-start
Distribute the storageK->V store
Index access on query processor
Query and Update
B+Trees Objects
Blocks Key→Value
Distribute
➢ Distribute the indexes○ With modified index access
➢ Distribute the nodes➢ Comms : Apache Thrift
Query and Update
B+Trees Objects
Blocks Key→Value
Clustered Node Table➢ Node Table
○ N replicas; Read R / Write We.g. W=N and R =1 =>Complete copies of node table on each data server
○ ReplaceableRequirement: NodeId for naming
Clustered Indexes➢ Indexes
○ Shard by subject○ Replica shards○ Compound access operations
Clustered IndexesIndex
Shard 1 Shard 2 Shard 3
Machine 1 Machine 2
Configuration 1
Configuration 2
Modified SPARQL Execution➢ Different unit of index access
○ subject + several predicates(subj, pred1, pred2, pred3, …)
➢ Different join algorithms○ Merge join○ Parallel hash join