2nd project update: Integrating SWI-Prolog for semantic reasoning in Bioclipse
Gives a short background on the Semantic Web and shows how Prolog is intended to be used from inside the Bioclipse research software for RDF data handling.
2nd Status report of degree project
Integrating Blipkit/BioProlog for semantic reasoning in Bioclipse
Samuel Lampa, 2010-01-25
Project blog: http://saml.rilspace.com
Some background...
What is “Semantic Web”?
“Enabling more powerful use of information”
Main goals:
● Data availability (on the web)
● Machine-readability of data
● Knowledge integration
● Automatic “conclusion drawing”
  ● “Reasoning”, using Reasoners →
This project compares two reasoners: Pellet and Blipkit
Research question
How do biochemical questions
formulated as Prolog queries
compare to other solutions
available in Bioclipse in terms of
speed and expressiveness?
Semantic Reasoners
● Pellet/Jena
  ● Uses W3C languages
    – OWL (Class definitions)
    – RDF (Facts)
    – SPARQL (Querying)
● Blipkit/BioProlog
  ● Uses Prolog, with W3C languages “on top”
    – Class definitions, facts and queries either in W3C languages (“on top” of Prolog) or in pure Prolog!
What is Prolog?
● State facts and rules
● Execute by running queries over these facts and rules
● Unique features:
  ● Backtracking
  ● “Closed-world assumption”
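The two features named above can be illustrated with a minimal sketch. The facts and the predicate names fewDonors/1 and donorsUnknown/1 are invented for illustration and are not taken from the project code.

```prolog
% Hypothetical facts (illustration only).
hasHBondDonorsCount(substanceX, 3).
hasHBondDonorsCount(substanceY, 7).

% Backtracking: Prolog tries each matching fact in turn. When the
% comparison fails for one substance, it goes back and tries the next.
fewDonors(Substance) :-
    hasHBondDonorsCount(Substance, Count),
    Count =< 5.

% Closed-world assumption: whatever cannot be proven from the known
% facts is treated as false. \+ succeeds when its goal has no proof.
donorsUnknown(Substance) :-
    \+ hasHBondDonorsCount(Substance, _).
```

With these facts, the query `?- fewDonors(S).` finds substanceX only, and `?- donorsUnknown(substanceZ).` succeeds because nothing is stated about substanceZ.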
Prolog code example
% === SOME FACTS ===
hasHBondDonorsCount( substanceX, 3 ). % “substance X has 3 H-bond donors”
% etc …

% === A RULE ("RULE OF FIVE" À LA PROLOG) ===
isDrugLike( Substance ) :-
    hasHBondDonorsCount( Substance, HBDonors ),
    HBDonors =< 5,
    hasHBondAcceptorsCount( Substance, HBAcceptors ),
    HBAcceptors =< 10,
    hasMolecularWeight( Substance, MW ),
    MW < 500.

% === QUERYING THE RULE ===
?- isDrugLike(substanceX).
true.

?- isDrugLike(X).
X = substanceX ;
X = substanceY.
Annotations on the rule:
● Head (before “:-”) and Body (after “:-”)
● “:-” means implication (“If [body] then [head]”)
● Comma means conjunction (“and”)
● Capitalized terms are always variables
Annotations on the queries:
● Testing a specific atom (“substanceX”)
● By submitting a variable (“X”), it is bound, one solution at a time, to every instance that satisfies the “isDrugLike” rule
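For readers who want to run the example, here is a self-contained sketch. The slide only gives one fact, so all other property values and substanceZ are invented for illustration. The helper drugLikeSubstances/1 is also an assumption; it uses the standard findall/3 to collect every solution in a list instead of stepping through them with “;”.

```prolog
% Hypothetical facts (values invented for illustration).
hasHBondDonorsCount(substanceX, 3).
hasHBondDonorsCount(substanceY, 2).
hasHBondDonorsCount(substanceZ, 8).

hasHBondAcceptorsCount(substanceX, 6).
hasHBondAcceptorsCount(substanceY, 4).
hasHBondAcceptorsCount(substanceZ, 12).

hasMolecularWeight(substanceX, 341.4).
hasMolecularWeight(substanceY, 180.2).
hasMolecularWeight(substanceZ, 733.9).

% The rule from the slide. Note that "less than or equal"
% is written =< in Prolog.
isDrugLike(Substance) :-
    hasHBondDonorsCount(Substance, HBDonors),
    HBDonors =< 5,
    hasHBondAcceptorsCount(Substance, HBAcceptors),
    HBAcceptors =< 10,
    hasMolecularWeight(Substance, MW),
    MW < 500.

% Hypothetical helper: collect all drug-like substances in one list.
drugLikeSubstances(Substances) :-
    findall(S, isDrugLike(S), Substances).
```

With these facts, `?- drugLikeSubstances(L).` gives `L = [substanceX, substanceY]`, since substanceZ already fails the H-bond donor test.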
Where are we now?
Project plan
What is done so far?
● Integration of Blipkit in Bioclipse
  ● Done: General purpose methods
  ● Done: Found usage strategy for combined use of Bioclipse JS scripting and Prolog
● Comparing Prolog and Pellet
  ● Done: Simple performance testing
  ● Now: Stuck on NMR spectrum similarity search
    – (No backtracking on arithmetic operators in SPARQL)
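To show why backtracking combined with arithmetic matters here, the following is a rough sketch of a tolerance-based peak search in Prolog. It is not the project's implementation: the predicate names spectrum_peak/2, has_peak_near/3 and similar_spectrum/3, the shift values and the tolerance are all assumptions made for illustration.

```prolog
% Hypothetical facts: spectrum_peak(Spectrum, ShiftInPpm).
spectrum_peak(spectrumA, 17.6).
spectrum_peak(spectrumA, 18.3).
spectrum_peak(spectrumA, 22.6).
spectrum_peak(spectrumB, 17.5).
spectrum_peak(spectrumB, 40.1).

% A spectrum has a peak "near" Shift if any of its peaks lies within
% Tolerance ppm. Prolog backtracks over the peaks until one passes
% the arithmetic test.
has_peak_near(Spectrum, Shift, Tolerance) :-
    spectrum_peak(Spectrum, Candidate),
    abs(Candidate - Shift) =< Tolerance.

% A spectrum is similar if every reference shift has a nearby peak.
similar_spectrum(ReferenceShifts, Tolerance, Spectrum) :-
    setof(S, Shift^spectrum_peak(S, Shift), Spectra),
    member(Spectrum, Spectra),
    forall(member(Ref, ReferenceShifts),
           has_peak_near(Spectrum, Ref, Tolerance)).
```

With these facts, `?- similar_spectrum([17.6, 18.3], 0.3, S).` finds spectrumA only: spectrumB has a peak near 17.6 but none near 18.3.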
What is left?
● Integration of Prolog / Blipkit
  ● Refinements?
● Comparing Prolog and Pellet
  ● NMR spectrum similarity search
    – Investigate use of OWL in querying
    – Other options? SWRL?
  ● ChEMBL data
  ● Toxicity data (opentox.org)
Example Bioclipse / Prolog script
blipkit.init();
blipkit.loadRDFToProlog("nmrshiftdata.100.rdf.xml");

// Define a “convenience prolog method”
blipkit.loadPrologCode(" \
    hasPeak( Subject, Predicate ) :- \
        rdf_db:rdf( Subject, \
            'http://www.nmrshiftdb.org/onto#hasPeak', \
            Predicate ). \
");

// Call the convenience method (which in turn executes its
// “body”), and return all matching results as an array
var resultList = blipkit.queryProlog(["hasPeak", "10", "Subject", "Predicate"]);
Annotations on the script:
● loadPrologCode: Prolog rule to load into the Prolog engine
● queryProlog arguments, in order: Prolog method to call, limit on the number of results, Prolog variables
Current status of research question
● Performance
  ● Prolog won so far. Exceptions?
● Usability
  ● Prolog very convenient for iterative wrapping of complex logic. Can RDF/OWL/SPARQL replicate this?
● Where do RDF/OWL/SPARQL excel?
Project plan
Thank you!
Project blog: http://saml.rilspace.com
Project plan – Current version
Project plan – Proposed version