Rule-Based Data Access ALASKA Experimental work CSP Conclusion

Data Access over Large Semi-Structured Databases

A Generic Approach Towards Rule-Based Systems

Bruno PAIVA LIMA DA SILVA

Thesis defense, Monday 13 January 2014

Reviewers
Ollivier HAEMMERLÉ, IRIT, Université Toulouse III
Fabien GANDON, INRIA, Sophia-Antipolis

Examiners
Odile PAPINI, LSIS, Université Aix-Marseille II

Thesis supervisor
Marie-Laure MUGNIER, LIRMM, Université Montpellier II

Thesis co-supervisors
Jean-François BAGET, INRIA, LIRMM
Madalina CROITORU, LIRMM, Université Montpellier II

Data Access over Large Semi-Structured Databases

PAIVA LIMA DA SILVA Bruno 1 / 48


Ontology-Based Data Access

In this work, we address the Ontology-Based Data Access (OBDA) problem, also known as Ontological Conjunctive Query Answering (OCQA).

This is a cross-domain problem spanning:

Knowledge Representation (KR)

Artificial Intelligence (AI)

Databases


Ontology-Based Data Access

Given:

Knowledge base (KB)

Factual knowledge (F)

Ontology (Universal knowledge) (O)

(Boolean) Conjunctive Query (Q)

OBDA consists in checking whether the query Q can be deduced from the knowledge base, i.e. whether there is an answer to the query in the KB.


Example

Example of a knowledge base:

(a) Alice lives and works in Paris.

(b) Alice is a computer analyst.

(c) Bob lives in Montpellier and works in Nîmes.

(d) Bob is a school teacher.

Ontology:

1. "If someone is a school teacher, then he teaches Latin."

2. "If someone is a computer analyst, then he knows programming."

3. "If someone lives in Montpellier and works in Nîmes, then he drives to work."

Given the following queries:

Q1: Is there a school teacher who lives in Montpellier?

Q2: Is there anyone who drives to work?


Ontology

There are two common methods for using ontological content:

Forward chaining: the hypotheses of the rules are queried against the facts, and the conclusions of the rules that apply are added to the facts, until the base is saturated.

Backward chaining: the initial query is decomposed/rewritten according to the rules of the ontology. The resulting queries are then evaluated against the knowledge base.


Approaches for OBDA

There are currently two distinct ways to represent ontological data:

Rules (RBDA)

Description Logics

Figure: Approaches for the OBDA problem.


Table of contents

1 Rule-Based Data Access

2 ALASKA

3 Experimental work

4 CSP

5 Conclusion

1 Rule-Based Data Access

Formalism

We use a decidable subset of First-Order Logic (FOL) to represent the problem:

Terms: constants or variables
Alice, Bob, Paris, Montpellier, Latin, Programming, x, y, z, ...

Predicates: relations between terms
man/1, woman/1, drives-to-work/1, lives/2, teaches/2, knows/2, ...

Atoms: a single piece of information
man(Bob), computerAnalyst(Alice), lives(Bob,Nîmes)

Facts: a fact is a set of atoms
{city(Montpellier)}, {person(x),lives(x,Montpellier),works(x,Nîmes)}, ...

Rules: composed of two facts, a body and a head

1. [body] {schoolTeacher(x)} → [head] {teaches(x,Latin)}
2. [body] {computerAnalyst(x)} → [head] {knows(x,Programming)}
3. [body] {lives(x,Montpellier),works(x,Nîmes)} → [head] {drives-to-work(x)}
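The formalism above maps naturally onto a few small data types. The following Python sketch is only an illustration: the names Atom, Rule, is_variable and variables are hypothetical, and marking variables with a leading "?" is a convention assumed here, not one taken from the slides.

```python
from typing import FrozenSet, NamedTuple, Tuple


class Atom(NamedTuple):
    """A predicate applied to a tuple of terms, e.g. lives(Bob, Montpellier)."""
    predicate: str
    args: Tuple[str, ...]


# A fact is a set of atoms.
Fact = FrozenSet[Atom]


class Rule(NamedTuple):
    """A rule is composed of two facts: a body and a head."""
    body: Fact
    head: Fact


def is_variable(term: str) -> bool:
    # Assumed convention for this sketch: variables start with "?".
    return term.startswith("?")


def variables(fact) -> set:
    """Return the set of variables occurring in a fact."""
    return {t for atom in fact for t in atom.args if is_variable(t)}


# Rule 3 of the example ontology.
rule3 = Rule(
    body=frozenset({
        Atom("lives", ("?x", "Montpellier")),
        Atom("works", ("?x", "Nîmes")),
    }),
    head=frozenset({Atom("drives-to-work", ("?x",))}),
)
```

In this encoding a query is simply another Fact, which matches the slide's view that a query is a fact.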


Problem definition

According to this formalism, we have:

Factual knowledge (F): A fact or the union of facts

Ontology (O): A set of rules

Conjunctive Query (Q): A query is a fact

F |= Q
Problem: finding substitutions (Entailment)

{F, O} |= Q
Problem: finding substitutions using ontological information (Rule-Entailment)
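Both problems can be stated in terms of substitutions. The block below gives the standard homomorphism reading of entailment for facts seen as existentially closed conjunctions of atoms. The symbols \sigma, vars and terms are notation introduced here for illustration, not taken from the slides, and the second line assumes rules applied by forward chaining.

```latex
F \models Q \iff \exists\, \sigma : \mathrm{vars}(Q) \to \mathrm{terms}(F)
  \ \text{such that}\ \sigma(Q) \subseteq F

\{F, O\} \models Q \iff \exists\, F' \ \text{obtained from } F
  \text{ by finitely many applications of rules of } O
  \ \text{such that}\ F' \models Q
```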


Elementary operations

The efficiency of RBDA depends on the efficiency of a few elementary operations:

In forward chaining:

Calling the entailment algorithm for each rule hypothesis and inserting small new pieces of information into the KB.

Finding an answer to the query in a saturated knowledge base.

In backward chaining:

[Rewriting queries according to the ontology]

Calling the entailment algorithm for each rewriting.
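The forward-chaining operations listed above can be sketched on the Alice/Bob example as follows. This is a naive illustration, not the algorithm used in the thesis: the function names homomorphisms and saturate are hypothetical, variables are marked with a leading "?" by assumption, and every head variable is assumed to also occur in the rule body.

```python
def is_variable(term):
    # Assumed convention for this sketch: variables start with "?".
    return term.startswith("?")


def homomorphisms(query, facts, subst=None):
    """Yield every substitution that maps all atoms of `query` into `facts`."""
    subst = subst or {}
    if not query:
        yield dict(subst)
        return
    (pred, args), rest = query[0], query[1:]
    for fpred, fargs in facts:
        if fpred != pred or len(fargs) != len(args):
            continue
        new = dict(subst)
        ok = True
        for a, f in zip(args, fargs):
            if is_variable(a):
                if new.setdefault(a, f) != f:  # variable already bound elsewhere
                    ok = False
                    break
            elif a != f:  # constant mismatch
                ok = False
                break
        if ok:
            yield from homomorphisms(rest, facts, new)


def saturate(facts, rules):
    """Apply the rules until no new atom can be added."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            # Entailment call on the rule hypothesis (materialized before inserting).
            for subst in list(homomorphisms(body, facts)):
                for pred, args in head:
                    atom = (pred, tuple(subst.get(a, a) for a in args))
                    if atom not in facts:
                        facts.add(atom)  # insert a small new piece of information
                        changed = True
    return facts


F = {
    ("lives", ("Alice", "Paris")), ("works", ("Alice", "Paris")),
    ("computerAnalyst", ("Alice",)),
    ("lives", ("Bob", "Montpellier")), ("works", ("Bob", "Nîmes")),
    ("schoolTeacher", ("Bob",)),
}
O = [
    ([("schoolTeacher", ("?x",))], [("teaches", ("?x", "Latin"))]),
    ([("computerAnalyst", ("?x",))], [("knows", ("?x", "Programming"))]),
    ([("lives", ("?x", "Montpellier")), ("works", ("?x", "Nîmes"))],
     [("drives-to-work", ("?x",))]),
]

saturated = saturate(F, O)
q2 = [("drives-to-work", ("?x",))]
answers = [s["?x"] for s in homomorphisms(q2, saturated)]
```

Q2 has no answer in F alone. It only gets one once rule 3 has added drives-to-work(Bob) to the saturated base.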


Context

Semi-structured data [Abiteboul, 1997]: knowledge bases with "irregular, partial or implicit structure", "schema is ignored", "schema evolving rapidly", "difficult distinction between schema and data", etc.

Emergence of very large semi-structured knowledge bases, e.g. the web.

Emergence of databases with different data models (see NoSQL).


Research question

Given the problem (RBDA) and the context, our research question is the following:

“How to provide a platform providing under a unified logical framework different implementation approaches for addressing the Rule-Based Data Access problem?”


State of the art

We have studied different existing tools/approaches for addressing the RBDA problem according to their ability to:

Represent a knowledge base under our formalism.

Perform conjunctive queries.

Represent rules and perform conjunctive queries in the presence of rules.

Handle data stored in secondary memory (disk).


Prolog

Representing a knowledge base: OK

next-to(d1,d2).
next-to(d2,t).
reads(d1,n).
reads(d2,n).
reads(t,n).
same-suit(d1,d2).
dressed-in(d1,Black).
dressed-in(t,Blue).

Conjunctive queries: Native

Conjunctive queries with rules: Native

Does not handle data in secondary memory


CoGITaNT

Representing a knowledge base: OK

[Figure: conceptual graph over the nodes d1, d2, n and black, with the relations next-to, reads, same-suit and dressed-in.]

Conjunctive queries: Native

Conjunctive queries with rules: Native

Does not handle data in secondary memory


Relational databases

Representing a knowledge base: OK

next-to
t1    t2
d1    d2
d2    t

reads
t1    t2
d1    n
d2    n
t     n

dressed-in
t1    t2
d1    Black
t     Blue

same-suit
t1    t2
d1    d2

Conjunctive queries: Native with SQL

Conjunctive queries with rules: Not native

Handles data in secondary memory
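To illustrate why conjunctive queries are native in SQL, here is a small sketch using Python's standard sqlite3 module. It stores one two-column table per predicate, as on the slide, and evaluates an example query as a join. The query itself and the table names (hyphens replaced by underscores so that they are valid SQL identifiers) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One table per binary predicate, with columns t1 and t2.
tables = {
    "next_to": [("d1", "d2"), ("d2", "t")],
    "reads": [("d1", "n"), ("d2", "n"), ("t", "n")],
    "dressed_in": [("d1", "Black"), ("t", "Blue")],
    "same_suit": [("d1", "d2")],
}
for name, rows in tables.items():
    conn.execute(f"CREATE TABLE {name} (t1 TEXT, t2 TEXT)")
    conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", rows)

# Q = {next-to(x,y), reads(x,z), reads(y,z), dressed-in(x,Black)}
# Each atom becomes one table alias; shared variables become join conditions
# and constants become selections.
sql = """
SELECT DISTINCT a.t1 AS x, a.t2 AS y, b.t2 AS z
FROM next_to AS a, reads AS b, reads AS c, dressed_in AS d
WHERE b.t1 = a.t1
  AND c.t1 = a.t2 AND c.t2 = b.t2
  AND d.t1 = a.t1 AND d.t2 = 'Black'
"""
answers = conn.execute(sql).fetchall()
```

Rules have no such direct counterpart in plain SQL, which is the "not native" entry on the slide.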


Triple stores

Representing a knowledge base: OK

d1 next-to d2.
d2 next-to t.
d1 reads n.
d2 reads n.
t reads n.
d1 same-suit d2.
d1 dressed-in Black.
t dressed-in Blue.

Conjunctive queries: Native with SPARQL

Conjunctive queries with rules: Not native

Handles data in secondary memory
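A conjunction of binary atoms translates almost one-to-one into a SPARQL basic graph pattern. The sketch below only builds the query text; it does not contact a triple store. The function name to_sparql, the "?" variable convention and the http://example.org/ prefix are all assumptions made for this illustration.

```python
def is_variable(term):
    # Assumed convention for this sketch: variables start with "?".
    return term.startswith("?")


def to_sparql(query, prefix="http://example.org/"):
    """Translate a conjunction of binary atoms into SPARQL query text."""
    def node(term):
        # Variables are kept as-is, constants become IRIs under the prefix.
        return term if is_variable(term) else f"<{prefix}{term}>"

    variables = sorted({t for _, args in query for t in args if is_variable(t)})
    patterns = [
        f"  {node(subject)} <{prefix}{predicate}> {node(obj)} ."
        for predicate, (subject, obj) in query
    ]
    # A query without variables is Boolean, so it is emitted as an ASK query.
    head = "SELECT " + " ".join(variables) if variables else "ASK"
    return head + " WHERE {\n" + "\n".join(patterns) + "\n}"


q = [("next-to", ("?x", "?y")), ("dressed-in", ("?x", "Black"))]
text = to_sparql(q)
```

Each atom p(s,o) becomes the triple pattern "s p o", and variables shared between atoms become shared SPARQL variables.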


Graph databases

Representing a knowledge base: OK

[Figure: graph with nodes labelled d1 (with the property dressed-in: black), d2 and n, connected by same-suit, next-to and reads edges.]

Conjunctive queries: Not native in every GDBMS

Conjunctive queries with rules: Not native

Handles data in secondary memory


Recap

The table below recaps the information from the previous slides:

System                 KB rep.   F |= Q            F, R |= Q    Sec. memory
Prolog                 Yes       Native            Native       No
CoGITaNT               Yes       Native            Native       No
Relational databases   Yes       Native (SQL)      Not native   Yes
Triple stores          Yes       Native (SPARQL)   Not native   Yes
Graph databases        Yes       Not native        Not native   Yes

Figure: Table comparing the features of the studied methods.


Research question

Back to the research question:

“How to provide a platform providing under a unified logical framework different implementation approaches for addressing the Rule-Based Data Access problem?”

[Figure: the FOL-based formalism placed above relational databases, triple stores and graph databases.]

2 ALASKA

ALASKA platform

ALASKA: Abstract Logic-based Architecture Storage systems & Knowledge base Analysis

Its goal is to enable the generic integration of heterogeneous storage systems.

This integration relies on an intermediary language, which is our logical formalism.


Features & details

Multi-layered architecture: the program goes from higher-level operations down to disk I/O functions.

Classes and interfaces in the abstract layer ensure that all connected systems use a common datatype (based on our formalism).

Written in Java: some loss in efficiency, but database connection libraries are often directly available.

Systems connected so far: Jena TDB, SQLite, DEX, Neo4j [non-definitive list]


Class diagram

[Figure: four-layer class diagram. Application layer (1): KRR operations. Abstract layer (2): the interfaces IFact, IAtom and ITerm, with the classes Predicate, Atom and Term. Translation layer (3): GDB connectors, RDB connectors and TS connectors. Data layer (4): graph databases (GDB), relational databases (RDB) and triple stores (TS).]

Figure: Class diagram of ALASKA architecture.


Elementary operations

The IFact interface defines methods that need to be implemented by every storage system connected to ALASKA.

The implementation of each of these methods differs according to the data model of the store. The main ones are:

Adding an atom (inserting new information)

Listing the neighbourhood of a term (enumerating)

Verifying the existence of an atom (checking)
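The three operations above can be pictured as an abstract interface with one implementation per store. The sketch below is a hypothetical Python analogue, not the actual Java IFact interface: the class and method names are invented for illustration, and the neighbourhood of a term is interpreted here as the set of atoms in which the term occurs.

```python
from abc import ABC, abstractmethod


class FactStore(ABC):
    """Hypothetical analogue of an IFact-like interface (names are assumptions)."""

    @abstractmethod
    def add_atom(self, predicate, args):
        """Insert a new atom into the store."""

    @abstractmethod
    def neighbourhood(self, term):
        """Enumerate the atoms in which `term` occurs."""

    @abstractmethod
    def contains(self, predicate, args):
        """Check whether the atom exists in the store."""


class InMemoryStore(FactStore):
    """Simplest possible implementation, standing in for a real connector."""

    def __init__(self):
        self._atoms = set()
        self._by_term = {}  # term -> set of atoms mentioning it

    def add_atom(self, predicate, args):
        atom = (predicate, tuple(args))
        self._atoms.add(atom)
        for term in atom[1]:
            self._by_term.setdefault(term, set()).add(atom)

    def neighbourhood(self, term):
        return sorted(self._by_term.get(term, ()))

    def contains(self, predicate, args):
        return (predicate, tuple(args)) in self._atoms


store = InMemoryStore()
for pred, args in [("friend", ("Alice", "Bob")), ("friend", ("Alice", "Carl")),
                   ("friend", ("Bob", "Carl")), ("enemy", ("Bob", "Diana"))]:
    store.add_atom(pred, args)
around_bob = store.neighbourhood("Bob")
```

A relational, triple-store or graph connector would implement the same three methods with SQL statements, triple patterns or graph traversals respectively.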


Adding an atom

F: {friend(Alice,Bob), friend(Alice,Carl), friend(Bob,Carl)}
Adding the atom enemy(Bob,Diana) to F

At ALASKA level:
F: {friend(Alice,Bob), friend(Alice,Carl), friend(Bob,Carl), enemy(Bob,Diana)}

In a triples store:

Alice friend Bob.
Alice friend Carl.
Bob friend Carl.
Bob enemy Diana.

In a relational database:

friend
t1      t2
Alice   Bob
Alice   Carl
Bob     Carl

enemy
t1      t2
Bob     Diana

In a graph database:

[Figure: graph with nodes Alice, Bob, Carl and Diana; friend edges Alice → Bob, Alice → Carl and Bob → Carl; enemy edge Bob → Diana.]


Backtracking algorithm

Using the methods detailed above, we have written a generic backtracking algorithm:

Very simple backtracking algorithm, with no optimization.

Every time it needs to read information from the knowledge base, it uses functions defined in the ALASKA abstract layer.

Using these functions ensures that the algorithm works with every system connected to ALASKA; its efficiency then depends on how efficiently the store performs the elementary reasoning operations.
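A store-independent backtracking search of this kind can be sketched as follows. This is not the thesis implementation: SetStore is a tiny in-memory stand-in for a connected system, and the method names contains, neighbourhood and all_atoms, as well as the "?" variable convention, are assumptions made for the example. The point is that backtrack only talks to the store through those methods.

```python
class SetStore:
    """In-memory stand-in for a store reached through an abstract layer."""

    def __init__(self, atoms):
        self.atoms = set(atoms)

    def contains(self, atom):            # checking
        return atom in self.atoms

    def neighbourhood(self, term):       # enumerating around a term
        return [a for a in self.atoms if term in a[1]]

    def all_atoms(self):                 # fallback when no term is known
        return list(self.atoms)


def is_variable(term):
    return term.startswith("?")


def backtrack(query, store, subst=None):
    """Yield every substitution mapping the query atoms into the store."""
    subst = subst or {}
    if not query:
        yield dict(subst)
        return
    pred, args = query[0]
    bound = tuple(subst.get(a, a) for a in args)   # apply current bindings
    known = [t for t in bound if not is_variable(t)]
    if len(known) == len(bound):
        candidates = [(pred, bound)] if store.contains((pred, bound)) else []
    elif known:
        candidates = store.neighbourhood(known[0])
    else:
        candidates = store.all_atoms()
    for cpred, cargs in candidates:
        if cpred != pred or len(cargs) != len(bound):
            continue
        new = dict(subst)
        if all(new.setdefault(b, c) == c if is_variable(b) else b == c
               for b, c in zip(bound, cargs)):
            yield from backtrack(query[1:], store, new)   # go one atom deeper


store = SetStore([
    ("friend", ("Alice", "Bob")), ("friend", ("Alice", "Carl")),
    ("friend", ("Bob", "Carl")), ("enemy", ("Bob", "Diana")),
])
q = [("friend", ("Alice", "?x")), ("friend", ("?x", "?y"))]
answers = sorted((s["?x"], s["?y"]) for s in backtrack(q, store))
```

Swapping SetStore for a class backed by a database changes the cost of each call but not the search itself.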


Use cases for ALASKA

ALASKA can also be/has also been used for:

Implementing rules algorithms

Storing/querying graphs in the Qualinca project.

A for Abstract: any KR/AI application requiring transparent read and write access to disk-stored information may use ALASKA.

3 Experimental work

ALASKA for RBDA

Our current goal is to use ALASKA to evaluate the efficiency of the connected systems on elementary operations:

Storage tests: measuring the time taken and the disk space used when storing increasingly large knowledge bases on disk.

Querying tests: measuring the time each system takes to answer a set of queries using different algorithms/query engines.


Storage workflow

[Figure: storage workflow across the four layers. Layer (1): RDF file, input manager and RDF parser. Layer (2): IFact manager. Layer (3): IFact to graph, IFact to RDB and IFact to triples translations. Layer (4): graph DB, relational DB and triples store.]

Figure: Workflow for storing a knowledge base in RDF using ALASKA.


Buffering algorithm

1. Storing all information at once: ruled out by the context (very large knowledge bases).

2. Storing piece by piece: highly inefficient in transactional systems.

We have used an algorithm with a read-and-write atom buffer:

Parsing level: using N-Triples files instead of RDF/XML, reading the file and filling the buffer.

Transaction management: manual management of transactions, limiting write operations on disk.

Garbage collection: recycling Java objects in order to reduce GC work and thus memory overload.
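The buffering idea can be illustrated with Python's standard sqlite3 module: atoms are read one at a time, accumulated in a fixed-size buffer, and written in one transaction per full buffer. This is a sketch under assumptions, not the ALASKA algorithm itself: the function name store_buffered, the single three-column table and the buffer size are all hypothetical.

```python
import sqlite3


def store_buffered(conn, atoms, buffer_size=1000):
    """Insert (subject, predicate, object) atoms in batches.

    Returns the number of transactions committed.
    """
    conn.execute("CREATE TABLE IF NOT EXISTS atoms (s TEXT, p TEXT, o TEXT)")
    buffer = []   # the same list object is reused for every batch
    commits = 0

    def flush():
        nonlocal commits
        if buffer:
            with conn:  # one explicit transaction per flushed buffer
                conn.executemany("INSERT INTO atoms VALUES (?, ?, ?)", buffer)
            buffer.clear()
            commits += 1

    for atom in atoms:          # atoms may be a generator reading a file
        buffer.append(atom)
        if len(buffer) >= buffer_size:
            flush()
    flush()                     # write the last, partially filled buffer
    return commits


conn = sqlite3.connect(":memory:")
triples = ((f"s{i}", "reads", "n") for i in range(2500))
commits = store_buffered(conn, triples, buffer_size=1000)
count = conn.execute("SELECT COUNT(*) FROM atoms").fetchone()[0]
```

With 2500 atoms and a buffer of 1000, the store sees three transactions instead of 2500, while never holding more than 1000 atoms in memory.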


Storage results

Using our platform, we have evaluated the storage efficiency of different storage systems:

[Figure: the knowledge base is transformed into IFact, then stored in a relational database, a graph database and a triples store.]

[Plot: storage time (s, scale ×10^4) against knowledge base size (millions of triples, 0 to 100) for Jena TDB, DEX, SQLite and Neo4J.]


Results

Size of the stored knowledge bases

System     5M         20M        40M        75M        100M
DEX        55 MB      214.2 MB   421.7 MB   785.1 MB   1.0 GB
Neo4J      157.4 MB   629.5 MB   1.2 GB     2.3 GB     3.1 GB
Sqlite     767.4 MB   2.9 GB     6.0 GB     11.6 GB    15.5 GB
Jena TDB   1.1 GB     3.9 GB     7.5 GB     13.9 GB    18.1 GB
RDF File   533.2 MB   2.1 GB     4.2 GB     7.8 GB     10.4 GB

Figure: Storage time and KB sizes in different systems
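
One way to read the size table is to relate each store to the size of the source RDF file. The short computation below does this for the 100M column only; the figures are copied from the table, while the ratios are derived here and are not part of the original slides.

```python
# Sizes for 100M triples, copied from the table (same unit for every entry).
size_100m = {"DEX": 1.0, "Neo4J": 3.1, "Sqlite": 15.5, "Jena TDB": 18.1}
rdf_file_100m = 10.4


def relative_size(store_size, source_size):
    """Size of the stored knowledge base relative to the source RDF file."""
    return store_size / source_size


ratios = {
    name: round(relative_size(size, rdf_file_100m), 2)
    for name, size in size_100m.items()
}
for name, ratio in ratios.items():
    print(f"{name}: {ratio} x the RDF file")
```

A ratio below 1 means the stored knowledge base takes less space than the RDF file it was loaded from; a ratio above 1 means it takes more.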


Querying workflow

[Diagram: a query Q goes through the abstract architecture and is evaluated by one of three routes (Q → SQL, Q → SPARQL, or the generic backtrack algorithm) against F stored in a graph DB, a relational DB or a triple store.]

Figure: Workflow for querying with ALASKA.
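
The Q → SQL route can be illustrated with a small translator. This is a sketch under assumptions, not the translation used in ALASKA: facts are assumed to live in a single `triples(s, p, o)` table, atoms are binary tuples `(predicate, subject, object)`, variables are marked with a leading `?`, and the query is assumed to contain at least one variable.

```python
import sqlite3


def is_var(term):
    """Assumed convention for this sketch: variables start with '?'."""
    return term.startswith("?")


def to_sql(query):
    """Translate a conjunction of binary atoms into one SELECT with self-joins."""
    where, params, first_seen = [], [], {}
    for i, (pred, subj, obj) in enumerate(query):
        where.append(f"t{i}.p = ?")
        params.append(pred)
        for column, term in (("s", subj), ("o", obj)):
            ref = f"t{i}.{column}"
            if not is_var(term):
                where.append(f"{ref} = ?")  # constant: filter on its value
                params.append(term)
            elif term in first_seen:
                where.append(f"{ref} = {first_seen[term]}")  # repeated variable: join
            else:
                first_seen[term] = ref  # first occurrence: becomes an output column
    select = ", ".join(f"{ref} AS {var[1:]}" for var, ref in first_seen.items())
    tables = ", ".join(f"triples AS t{i}" for i in range(len(query)))
    return f"SELECT DISTINCT {select} FROM {tables} WHERE {' AND '.join(where)}", params
```

For a query such as `[("creator", "?X", "PaulErdoes"), ("creator", "?X", "?Y")]`, the function returns one statement that joins the table with itself on the shared variable, which can then be run with `conn.execute(sql, params)`.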


Querying results

1 creator(X,PaulErdoes) . creator(X,Y)

Returns the papers created by Paul Erdoes, together with the creators of each of those papers.

2 type(X,Article) . journal(X,Journal1-1940) . creator(X,Y)

Returns the creators of all the elements that are articles and were published in Journal 1 (1940).

[Plots, one panel per query (Q1 and Q2): query time in ms (y-axis, up to 60 for Q1 and up to 400 for Q2) against knowledge base size in thousands of triples (x-axis, 0 to 100), for Jena(BT), Jena(SPARQL), DEX(BT), Neo4J(BT) and Sqlite(SQL).]
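
The curves labelled BT rely on a generic backtrack algorithm. A minimal version of such an algorithm is sketched below; it is an illustration under assumptions rather than the ALASKA code: facts are plain tuples held in a list (instead of being read through a store), and variables are marked with a leading `?`.

```python
def is_var(term):
    """Assumed convention for this sketch: variables start with '?'."""
    return term.startswith("?")


def match(atom, fact, subst):
    """Return subst extended so that atom maps onto fact, or None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    extended = dict(subst)
    for term, value in zip(atom[1:], fact[1:]):
        if is_var(term):
            # Bind the variable, or verify that an earlier binding agrees.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def backtrack(query, facts, subst=None):
    """Yield every substitution that maps all atoms of query into facts."""
    subst = {} if subst is None else subst
    if not query:
        yield subst
        return
    first, rest = query[0], query[1:]
    for fact in facts:  # try each fact for the first atom, undo on failure
        extended = match(first, fact, subst)
        if extended is not None:
            yield from backtrack(rest, facts, extended)
```

This naive version scans all the facts for each atom and applies no filtering or ordering heuristic.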


Conclusions

The results of the querying tests have revealed two different open issues:

1 Difficulty of scaling when performing less complex queries

2 Generic backtrack algorithm inefficient on more complex queries, even for smaller KBs

Our generic algorithm needs to be optimized: suitable optimizations are known in the domain of CSP (constraint satisfaction problems).

Instead of implementing them one by one, we have chosen to integrate a CSP solver within ALASKA, to benefit from such optimizations.


New querying workflow

[Diagram: a query Q goes through the abstract architecture and is evaluated by one of four routes (Q → SQL, Q → SPARQL, the generic backtrack algorithm, or the CSP solver) against F stored in a graph DB, a relational DB or a triple store.]

Figure: Workflow for querying with ALASKA.


Transformation #1

Example:

F: {computerAnalyst(Alice), schoolTeacher(Bob), fireman(Carl), computerAnalyst(Diana),
lives(Alice,Paris), lives(Bob,Montpellier), lives(Carl,Paris), lives(Diana,Paris),
likes(Bob,Alice), likes(Diana,Carl)}

Term numbering: 1 Alice, 2 Bob, 3 Carl, 4 Diana, 5 Paris, 6 Montpellier

Q = {likes(x,y), computerAnalyst(y), lives(y,Paris)}

Initial network: x [1,2,3,4,5,6], y [1,2,3,4,5,6], Paris [5]; constraints likes (on x, y), lives (on y, Paris), cpuA (on y).

Network after reading F: x [2,4], y [1], Paris [5]; likes: (2,1), (4,3); lives: (1,5), (3,5), (4,5); cpuA: values 1, 4.

Solutions to the network: {{x = 2, y = 1, Paris = 5}}

Answers to the query: {{x = Bob, y = Alice, Paris = Paris}}
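
The example can be reproduced with a few lines of code. The sketch below numbers the terms of F, builds one domain per term of Q and one table constraint per atom, then enumerates assignments by brute force. The exhaustive search stands in for a real CSP solver, which would propagate constraints instead; `build_network` and `solve` are names chosen for this illustration.

```python
from itertools import product

facts = [
    ("computerAnalyst", "Alice"), ("schoolTeacher", "Bob"), ("fireman", "Carl"),
    ("computerAnalyst", "Diana"),
    ("lives", "Alice", "Paris"), ("lives", "Bob", "Montpellier"),
    ("lives", "Carl", "Paris"), ("lives", "Diana", "Paris"),
    ("likes", "Bob", "Alice"), ("likes", "Diana", "Carl"),
]
query = [("likes", "x", "y"), ("computerAnalyst", "y"), ("lives", "y", "Paris")]
variables = {"x", "y"}


def build_network(facts, query, variables):
    """Number the terms, then build domains and table constraints from Q and F."""
    ids = {}
    for _pred, *args in facts:
        for term in args:
            ids.setdefault(term, len(ids) + 1)  # numbering by first appearance
    domains = {}
    for _pred, *args in query:
        for term in args:
            # A variable may take any term id; a constant only its own id.
            domains[term] = sorted(ids.values()) if term in variables else [ids[term]]
    constraints = []
    for pred, *scope in query:
        allowed = {tuple(ids[a] for a in args) for p, *args in facts if p == pred}
        constraints.append((tuple(scope), allowed))
    return ids, domains, constraints


def solve(domains, constraints):
    """Brute-force search: keep the assignments that satisfy every constraint."""
    names = list(domains)
    solutions = []
    for values in product(*(domains[n] for n in names)):
        assignment = dict(zip(names, values))
        if all(tuple(assignment[t] for t in scope) in allowed
               for scope, allowed in constraints):
            solutions.append(assignment)
    return solutions


ids, domains, constraints = build_network(facts, query, variables)
names_by_id = {number: term for term, number in ids.items()}
solutions = solve(domains, constraints)
answers = [{var: names_by_id[v] for var, v in s.items()} for s in solutions]
print(solutions)  # [{'x': 2, 'y': 1, 'Paris': 5}]
print(answers)    # [{'x': 'Bob', 'y': 'Alice', 'Paris': 'Paris'}]
```

Note that this transformation reads every fact of F to build the constraint tables.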


Transformation

Once again, reading the whole KB prior to an operation is forbidden.

In order to turn our problem into a CSP, a different transformation is needed:

The network is built according to the query, but no information from the KB is read at construction time.

Introduction of a new constraint type: once it is triggered, it adds a new constraint to the network and then reads the feasible tuples from the KB.

Introduction of a threshold value: constraints are only triggered when the size of the domain of one of their variables falls below that value.
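
A threshold-triggered constraint of this kind might look like the sketch below. It is a simplified illustration, not the constraint implemented in ALASKA: the class name `LazyConstraint`, the `fetch` callback (a stand-in for a KB lookup) and the use of Python sets as domains are all assumptions, and the triggered constraint is merged into the same object instead of being added to the network as a separate constraint.

```python
class LazyConstraint:
    """Binary constraint whose tuples are read from the KB only once triggered."""

    def __init__(self, scope, fetch, threshold):
        self.scope = scope          # pair of variable names, e.g. ("x", "y")
        self.fetch = fetch          # hypothetical KB lookup: (position, values) -> tuples
        self.threshold = threshold  # domain size below which the constraint triggers
        self.tuples = None          # nothing is read at construction time

    def propagate(self, domains):
        """Trigger if a domain in scope is small enough, then filter the domains."""
        if self.tuples is None:
            small = [i for i, var in enumerate(self.scope)
                     if len(domains[var]) < self.threshold]
            if not small:
                return False  # still dormant: the KB is not touched
            i = min(small, key=lambda k: len(domains[self.scope[k]]))
            # Read only the tuples compatible with the smallest domain.
            self.tuples = self.fetch(i, domains[self.scope[i]])
        live = {t for t in self.tuples
                if all(t[k] in domains[var] for k, var in enumerate(self.scope))}
        for k, var in enumerate(self.scope):
            domains[var] = {t[k] for t in live}
        return True


def make_fetch(pairs):
    """Build a fetch callback over an in-memory set of pairs (stand-in for the KB)."""
    def fetch(position, values):
        return {t for t in pairs if t[position] in values}
    return fetch
```

With a large threshold the constraint fires early and reads more tuples; with a small one it waits until another constraint has narrowed a domain.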


Transformation #2

Example: Q = {p(x, y), q(y, A), r(x, A)}

Successive states of the network (constraints p, q, r):

1 Network built from the query only: x [1:N], y [1:N], A [1:N].

2 Constraint A=1 added: x [1:N], y [1:N], A [1].

3 Constraints enum q(?,A) and enum r(?,A) added: x [x1, x2, ..., xn1], y [y1, y2, ..., yn2], A [1].

4 Constraint check p(xi, yj) added: x [x'1, x'2, ..., x'k1], y [y'1, y'2, ..., y'k2], A [1].

Figure: Example of a CSP network representing a query.
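
The order of operations in this example (enumerate from the constant first, then check the remaining atom) can be sketched as follows. The contents of `kb` are invented for the illustration, and `enum_first`, `check` and `answer` are hypothetical helpers; in ALASKA these lookups would go through the storage system.

```python
# Hypothetical KB contents: each predicate maps to a set of pairs of term ids.
kb = {
    "p": {(10, 20), (11, 21), (12, 20)},
    "q": {(20, 1), (21, 1), (22, 2)},
    "r": {(10, 1), (12, 1), (13, 2)},
}


def enum_first(kb, pred, second):
    """enum pred(?, second): all v such that pred(v, second) holds in the KB."""
    return {a for a, b in kb[pred] if b == second}


def check(kb, pred, first, second):
    """check pred(first, second): membership test for a single pair."""
    return (first, second) in kb[pred]


def answer(kb, a):
    """Evaluate Q = {p(x, y), q(y, A), r(x, A)} with the constant A mapped to a."""
    ys = enum_first(kb, "q", a)  # enum q(?, A) narrows the domain of y
    xs = enum_first(kb, "r", a)  # enum r(?, A) narrows the domain of x
    # check p(xi, yj) is applied only to the remaining candidate pairs.
    return sorted((x, y) for x in xs for y in ys if check(kb, "p", x, y))


print(answer(kb, 1))  # [(10, 20), (12, 20)]
```

Only the tuples reachable from the constant are read, which is why a query needs at least one constant for this scheme to start.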


Current issues

So far, several issues have appeared when performing conjunctive queries over large knowledge bases using a CSP solver:

Queries must contain a constant term in order for the solver to start.

The maximum number of explored terms per variable (N) is very low (around 30K-35K).

The backtracking process of the solver needs to be manually defined.


Contributions

The achievements of this work are:

A generic, logic-based software architecture allowing communication with different storage systems through a unified language.

The ability to use such an architecture to write higher-level reasoning programs independently of how the data to be manipulated is stored.

An algorithm that enables the storage of a very large knowledge base on disk on a single machine while avoiding exhausting main memory.

The use of such an architecture to perform efficiency comparisons on the elementary operations of the RBDA problem.

The transformation of the problem of querying a knowledge base into a constraint solving problem and its adaptation to large knowledge bases.


Future work

Implementation and integration of rule algorithms into ALASKA.

Enabling ALASKA to integrate Description Logics.

Using indexing techniques for the CSP version of our problem when propagation alone is not enough to find substitutions.


The end

Thank you for your attention!
