Motivation Approach Evaluation Conclusion and Future Work
DEER: RDF Data Extraction and Enrichment Framework
Mohamed Ahmed Sherif
September 15, 2015
Mohamed Ahmed Sherif — DEER 1/31
Outline
1 Motivation
2 Approach
3 Evaluation
4 Conclusion and Future Work
RDF Extraction & Enrichment
Need for enriched datasets
Tourism
Question Answering
Enhanced Reality
...
RDF extraction and enrichment
Triples to be added to the original KB and/or
Triples to be deleted from the original KB
Example
jamendo:336993 a mo:MusicArtist
jamendo:336993 foaf:name "GeoSuPhat"
jamendo:336993 mo:biography "... artist name of George (Pete) Peterson. Born and raised near the shores of Lake Michigan and its cold windy winters he was Influenced by bands like Pink Floyd, Kraftwerk, Todd Rundgren, ELP, Hawkwind and his love of computers and synthesis. After serving in the military and a few years going to college in Colorado where he studied electronics and computer programming. George moved to Europe and settled down in Germany where he had met his wife Christine. ..."
DEER Enrichment Functions
Idea
Generate enrichment data based on existing data
Input/output are RDF datasets
Current Enrichment Functions
Dereferencing
Linking
NLP
Conformation
Filter
DEER Enrichment Operators
Idea
Create complex workflows by connecting modules
Input/output are RDF datasets
Current Enrichment Operators
Clone
Merge
DEER Specification Paradigm
d1 → Dereferencing → d2
d2 → Clone → d3, d4
d3 → NLP → d5
d4 → Filter → d6
d5, d6 → Merge → d7
d7 → AuthorityConform → d8
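The workflow above is a small directed graph of datasets and processing steps. As a minimal sketch (not DEER's actual scheduler), the following Python orders the steps so that each one runs only after its input datasets exist. The `STEPS` dictionary and the `execution_order` helper are hypothetical names introduced for illustration only.

```python
# Hypothetical encoding of the workflow: step -> (input datasets, output datasets).
STEPS = {
    "Dereferencing": (["d1"], ["d2"]),
    "Clone": (["d2"], ["d3", "d4"]),
    "NLP": (["d3"], ["d5"]),
    "Filter": (["d4"], ["d6"]),
    "Merge": (["d5", "d6"], ["d7"]),
    "AuthorityConform": (["d7"], ["d8"]),
}


def execution_order(steps, initial):
    """Return step names in an order where every input dataset is already available."""
    available = set(initial)
    pending = dict(steps)
    order = []
    while pending:
        # Steps whose inputs have all been produced so far.
        ready = [name for name, (inputs, _) in pending.items()
                 if all(d in available for d in inputs)]
        if not ready:
            raise ValueError("workflow has unsatisfiable inputs or a cycle")
        for name in ready:
            order.append(name)
            available.update(pending.pop(name)[1])
    return order


ORDER = execution_order(STEPS, ["d1"])
print(ORDER)
```

Steps that become ready in the same round (here NLP and Filter) do not depend on each other, so a real engine could run them in either order.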
Manual DEER Specification
@prefix : <http://geoknow.org/specsontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo: <http://www.w3.org/2003/01/geo/wgs84_pos#> .

# Datasets: d1 is read from DBpedia, d8 is written to a file
:d1 a :Dataset ;
    :hasUri <http://dbpedia.org/resource/Berlin> ;
    :fromEndPoint <http://dbpedia.org/sparql> .
:d2 a :Dataset .
:d3 a :Dataset .
:d4 a :Dataset .
:d5 a :Dataset .
:d6 a :Dataset .
:d7 a :Dataset .
:d8 a :Dataset ;
    :outputFile "DeerBerlin.ttl" ;
    :outputFormat "Turtle" .

# d1 -> Dereferencing -> d2
:deref a :Module, :DereferencingModule ;
    rdfs:label "Dereferencing module" ;
    :hasInput :d1 ;
    :hasOutput :d2 ;
    :hasParameter :derefParam1 .
:derefParam1 a :ModuleParameter, :DereferencingModuleParameter ;
    :hasKey "inputProperty1" ;
    :hasValue geo:lat .

# d2 -> Clone -> d3, d4
:clone a :Operator, :CloneOperator ;
    rdfs:label "Clone operator" ;
    :hasInput :d2 ;
    :hasOutput :d3, :d4 .

# d3 -> NLP -> d5
:nlp a :Module, :NLPModule ;
    rdfs:label "NLP module" ;
    :hasInput :d3 ;
    :hasOutput :d5 ;
    :hasParameter :nlpPram1, :nlpPram2 .
:nlpPram1 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "useFoxLight" ;
    :hasValue "OFF" .
:nlpPram2 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "askEndPoint" ;
    :hasValue false .

# d4 -> Filter -> d6
:filter a :Module, :FilterModule ;
    rdfs:label "Filter module" ;
    :hasInput :d4 ;
    :hasOutput :d6 ;
    :hasParameter :filterPram1 .
:filterPram1 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "triplesPattern" ;
    :hasValue "?s <http://dbpedia.org/ontology/abstract> ?o" .

# d6, d5 -> Merge -> d7
:merge a :Operator, :MergeOperator ;
    rdfs:label "Merge operator" ;
    :hasInput :d6, :d5 ;
    :hasOutput :d7 .

# d7 -> Authority Conformation -> d8
:aconform a :Module, :AuthorityConformationModule ;
    rdfs:label "Authority Conformation module" ;
    :hasInput :d7 ;
    :hasOutput :d8 ;
    :hasParameter :aconformPram1, :aconformPram2 .
:aconformPram1 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "sourceSubjectAuthority" ;
    :hasValue "http://dbpedia.org" .
:aconformPram2 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "targetSubjectAuthority" ;
    :hasValue "http://deer.org" .
Manual KB Enrichment
Manual customized enrichment pipelines
⊕ Leads to the expected results
⊖ Time consuming
⊖ Cannot be ported easily to other datasets
Automatic KB Enrichment
Enrichment pipeline $M : \mathcal{K} \to \mathcal{K}$ that maps a KB $K$ to an enriched KB $K'$ with $K' = M(K)$.
$M$ is an ordered list of atomic enrichment functions $m \in \mathcal{M}$:
$$M = \begin{cases} \phi & \text{if } K = K', \\ (m_1, \ldots, m_n), \text{ where } m_i \in \mathcal{M},\ 1 \le i \le n & \text{otherwise.} \end{cases}$$
Research questions
1 How to create self-configuring atomic enrichment functions $m \in \mathcal{M}$?
2 How to automatically generate an enrichment pipeline $M$?
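The pipeline definition above can be illustrated with a minimal Python sketch, assuming a KB is modelled as a set of (subject, predicate, object) triples and each atomic enrichment function maps one such set to another. The two toy functions and the `apply_pipeline` helper are hypothetical and only show the composition K′ = M(K); they are not DEER's enrichment functions.

```python
from functools import reduce


def add_label(kb):
    """Toy atomic enrichment function: adds one triple (illustrative only)."""
    return kb | {(":Ibuprofen", "rdfs:label", "Ibuprofen")}


def drop_comments(kb):
    """Toy atomic enrichment function: deletes all rdfs:comment triples."""
    return frozenset(t for t in kb if t[1] != "rdfs:comment")


def apply_pipeline(pipeline, kb):
    """Apply the ordered functions (m1, ..., mn) left to right: K' = M(K)."""
    return reduce(lambda current, m: m(current), pipeline, kb)


KB = frozenset({
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "rdfs:comment", "some text"),
})
ENRICHED_KB = apply_pipeline((add_label, drop_comments), KB)
```

An empty pipeline leaves the KB unchanged, which corresponds to the case K = K′ in the definition.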
Running Example
Dataset: DrugBank
Goal: Gather information about companies related to drugs for a market study
[Figure: :Aspirin, :Paracetamol, :Ibuprofen and :Quinine are each of type (a) :Drug. :Ibuprofen and :Aspirin are linked by owl:sameAs to db:Ibuprofen and db:Aspirin. db:Ibuprofen carries the rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."]
Atomic Enrichment Functions
I. Dereferencing atomic enrichment function
Datasets are linked (e.g., using owl:sameAs)
Dereferences a pre-specified set of predicates
Adds the found predicates to the source dataset
[Figure: the running-example graph; the rdfs:comment of db:Ibuprofen is dereferenced via owl:sameAs and added to :Ibuprofen]
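The dereferencing step described above can be sketched in a few lines of Python. This is a simplified in-memory illustration, assuming both datasets are plain sets of triples; a real implementation would fetch the linked resources over HTTP or SPARQL. The function name `dereference` and the constants are hypothetical.

```python
def dereference(source, remote, predicates, link="owl:sameAs"):
    """Copy values of the selected predicates from linked remote resources."""
    added = set()
    for subject, predicate, obj in source:
        if predicate != link:
            continue
        # obj is the linked remote resource; look up its selected predicates.
        for r_subject, r_predicate, r_object in remote:
            if r_subject == obj and r_predicate in predicates:
                added.add((subject, r_predicate, r_object))
    return source | added


COMMENT = "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
SOURCE = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Aspirin", "a", ":Drug"),
    (":Aspirin", "owl:sameAs", "db:Aspirin"),
}
REMOTE = {("db:Ibuprofen", "rdfs:comment", COMMENT)}

DEREFERENCED = dereference(SOURCE, REMOTE, {"rdfs:comment"})
```

Only :Ibuprofen gains a triple here, because the toy remote dataset holds no rdfs:comment for db:Aspirin.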
Self-Configuration
I. Dereferencing Enrichment Functions
Finds the set of predicates Dp from the enriched CBDs (concise bounded descriptions) that are missing from the source CBDs
Non-enriched CBD of Ibuprofen
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
Enriched CBD of Ibuprofen
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
:Ibuprofen :relatedCompany :BootsCompany
:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
Dp = {:relatedCompany, rdfs:comment}
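The computation of Dp above amounts to a set difference over predicates. A minimal sketch, assuming each CBD is a set of (subject, predicate, object) triples; `missing_predicates` is a hypothetical helper name.

```python
def missing_predicates(source_cbd, enriched_cbd):
    """Predicates that occur in the enriched CBD but not in the source CBD."""
    source_predicates = {p for _, p, _ in source_cbd}
    enriched_predicates = {p for _, p, _ in enriched_cbd}
    return enriched_predicates - source_predicates


SOURCE_CBD = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
ENRICHED_CBD = SOURCE_CBD | {
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
    (":Ibuprofen", "rdfs:comment", "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."),
}

DP = missing_predicates(SOURCE_CBD, ENRICHED_CBD)
```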
Self-Configuration
I. Dereferencing Enrichment Functions
Dereferences Dp = {:relatedCompany, rdfs:comment}
[Figure: CBD of Ibuprofen within the running-example graph, where db:Ibuprofen carries only an rdfs:comment]
Finds only rdfs:comment, adds it to the source dataset
Dereferencing enriched CBD of Ibuprofen
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
Atomic Enrichment Functions
II. NLP atomic enrichment function
Datatype objects contain unstructured information
Uses Named Entity Recognition to extract implicit data
Adds extracted entities to the source dataset
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
:Ibuprofen fox:relatedTo :BootsCompany
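To make the NLP step concrete, here is a toy Python sketch in which a simple dictionary lookup stands in for a real named entity recognition tool. The `Literal` wrapper, the `GAZETTEER` dictionary and the `nlp_enrich` function are all hypothetical; only the predicate fox:relatedTo and the example text come from the slides.

```python
from typing import NamedTuple


class Literal(NamedTuple):
    """Marks an object as a datatype value rather than a resource."""
    text: str


# Hypothetical entity dictionary: surface form -> resource.
GAZETTEER = {"Boots company": ":BootsCompany"}


def nlp_enrich(kb, gazetteer, link_predicate="fox:relatedTo"):
    """Scan literal objects for known mentions and link the subject to them."""
    added = set()
    for subject, _, obj in kb:
        if isinstance(obj, Literal):
            for mention, entity in gazetteer.items():
                if mention in obj.text:
                    added.add((subject, link_predicate, entity))
    return kb | added


NLP_INPUT = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "rdfs:comment",
     Literal("Ibuprofen was extracted by the research arm of Boots company during the 1960s ...")),
}
NLP_OUTPUT = nlp_enrich(NLP_INPUT, GAZETTEER)
```

Resource objects such as :Drug are skipped because only literals can carry unstructured text.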
Self-Configuration
II. NLP Enrichment Function
Extracts all possible named entity types
Adds extracted entities to the source dataset
NLP enriched CBD of Ibuprofen
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
:Ibuprofen fox:relatedTo :BootsCompany
Atomic Enrichment Functions
III. Predicate conformation atomic enrichment function
Enriched datasets may contain diverse ontologies
Predicate conformation maps a set of pre-specified predicates to a target ontology
[Figure: in the CBD of :Ibuprofen, the predicate fox:relatedTo that links to :BootsCompany is replaced by :relatedCompany]
Self-Configuration
III. Predicate conformation Enrichment Function
Finds lists of predicates Ps and Pt from the source and target datasets, respectively, that share the same subjects and objects
Replaces each Ps with its respective Pt
NLP enriched CBD of Ibuprofen
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
:Ibuprofen fox:relatedTo :BootsCompany (replaced by :relatedCompany)
Enriched CBD of Ibuprofen (positive example target)
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
:Ibuprofen :relatedCompany :BootsCompany
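The self-configuration described above can be sketched as a lookup of predicate pairs that connect the same subject and object in both CBDs. This is a simplified illustration, assuming at most one target predicate per subject/object pair; `learn_predicate_mapping` and `conform` are hypothetical names.

```python
def learn_predicate_mapping(source_cbd, target_cbd):
    """Map each source predicate Ps to a target predicate Pt with equal subject and object."""
    # Simplification: if several target predicates share (s, o), the last one wins.
    target_by_pair = {(s, o): p for s, p, o in target_cbd}
    mapping = {}
    for s, p, o in source_cbd:
        target_predicate = target_by_pair.get((s, o))
        if target_predicate is not None and target_predicate != p:
            mapping[p] = target_predicate
    return mapping


def conform(kb, mapping):
    """Rewrite every mapped predicate, leaving all other triples unchanged."""
    return {(s, mapping.get(p, p), o) for s, p, o in kb}


SHARED = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
}
NLP_CBD = SHARED | {(":Ibuprofen", "fox:relatedTo", ":BootsCompany")}
TARGET_CBD = SHARED | {(":Ibuprofen", ":relatedCompany", ":BootsCompany")}

MAPPING = learn_predicate_mapping(NLP_CBD, TARGET_CBD)
CONFORMED = conform(NLP_CBD, MAPPING)
```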
KB Enrichment Refinement Operator
Input
Set of atomic enrichment functions $\mathcal{M}$
Set of positive examples $E$
Refinement Operator
$\rho(M) = \bigcup_{\forall m \in \mathcal{M}} M +\!\!+\, m$ (where $+\!\!+$ is the list append operator)
Output
Enrichment pipeline $M$
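A minimal sketch of the refinement operator, assuming pipelines are Python tuples of function names. The `refine` helper is hypothetical, and its optional `skip_used` flag is an added assumption for cases where a function should not be repeated within one pipeline; the formula itself places no such restriction.

```python
def refine(pipeline, functions, skip_used=False):
    """rho(M): all pipelines obtained by appending one atomic function to M."""
    children = []
    for m in functions:
        if skip_used and m in pipeline:
            continue  # assumption: optionally avoid repeating a function
        children.append(pipeline + (m,))
    return children


FUNCTIONS = ["m1", "m2", "m3"]
ROOT_CHILDREN = refine((), FUNCTIONS)
M1_CHILDREN = refine(("m1",), FUNCTIONS, skip_used=True)
```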
Positive Example
Non-enriched CBD of Ibuprofen
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
Enriched CBD of Ibuprofen
:Ibuprofen a :Drug
:Ibuprofen owl:sameAs db:Ibuprofen
:Ibuprofen :relatedCompany :BootsCompany
:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..."
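A positive example like the one above lets a candidate pipeline be scored by comparing its output with the enriched CBD. The sketch below assumes a triple-level comparison, which the slides do not spell out; `precision_recall_f` is a hypothetical helper.

```python
def precision_recall_f(produced, expected):
    """Triple-level precision, recall and F-measure of produced vs. expected CBD."""
    true_positives = len(produced & expected)
    precision = true_positives / len(produced) if produced else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


EXPECTED = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
    (":Ibuprofen", "rdfs:comment", "Ibuprofen was extracted ..."),
}
# Output of a pipeline that only dereferenced rdfs:comment.
PRODUCED = EXPECTED - {(":Ibuprofen", ":relatedCompany", ":BootsCompany")}

P, R, F = precision_recall_f(PRODUCED, EXPECTED)
```

In this toy case every produced triple is expected, but one expected triple is missing, so precision is perfect while recall is not.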
Learning Algorithm
1 Start with the empty enrichment pipeline $M = \bot$
2 Self-configure all $m_i \in \mathcal{M}$, add each as a child of $\bot$
3 Select the most promising node
4 Expand the most promising node
[Figure: refinement tree, one level per line]
⊥
(m1) (m2) (m3)
(m1,m2) (m1,m3) (m3,m1) (m3,m2)
(m3,m2,m1)
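The four steps above can be sketched as a best-first search over pipelines. This is a simplified illustration rather than the authors' implementation: it selects purely by score (no complexity penalty), omits self-configuration, and uses a toy scoring function. `learn_pipeline`, `toy_score` and `TARGET` are hypothetical names.

```python
def learn_pipeline(functions, score, max_iterations=10):
    """Best-first search; `score` maps a pipeline (tuple) to a value in [0, 1]."""
    frontier = [()]  # leaf pipelines; () is the empty pipeline
    best = ()
    for _ in range(max_iterations):
        node = max(frontier, key=score)  # most promising leaf
        if score(node) > score(best):
            best = node
        if score(best) == 1.0:
            break  # optimal pipeline found
        # Expand: replace the node by its children in the frontier.
        frontier.remove(node)
        frontier.extend(node + (m,) for m in functions if m not in node)
        if not frontier:
            break
    return best


TARGET = ("dereferencing", "nlp", "conformation")


def toy_score(pipeline):
    """Toy stand-in for the F-measure: share of TARGET matched as a prefix."""
    matched = 0
    for got, wanted in zip(pipeline, TARGET):
        if got != wanted:
            break
        matched += 1
    return matched / len(TARGET)


LEARNED = learn_pipeline(["dereferencing", "nlp", "conformation"], toy_score)
```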
Most Promising Node Selection
Node complexity $c(n)$
Linear combination of the node's children count and level
Node fitness $f(n)$
Difference between the F-measure of the node's enrichment pipeline and its weighted complexity, $f(n) = F(n) - \omega \cdot c(n)$
$\omega$ controls the tradeoff between
Greedy search ($\omega = 0$)
Search strategies closer to breadth-first search ($\omega > 0$)
Most promising node
The leaf node with the maximum fitness in the whole refinement tree
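The selection rule above can be sketched as follows. The coefficients of the linear combination in c(n) are not given on the slide, so the weights `a` and `b` below are assumptions set to 1; the `Node` class and helper names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    pipeline: tuple
    f_measure: float
    level: int
    children: list = field(default_factory=list)


def complexity(node, a=1.0, b=1.0):
    """c(n): linear combination of children count and level (weights assumed)."""
    return a * len(node.children) + b * node.level


def fitness(node, omega):
    """f(n) = F(n) - omega * c(n)."""
    return node.f_measure - omega * complexity(node)


def leaves(node):
    """Yield all leaf nodes of the refinement tree rooted at `node`."""
    if not node.children:
        yield node
    else:
        for child in node.children:
            yield from leaves(child)


def most_promising(root, omega):
    """Leaf with the maximum fitness in the whole tree."""
    return max(leaves(root), key=lambda n: fitness(n, omega))


DEEP = Node(("m2", "m1"), f_measure=0.7, level=2)
SHALLOW = Node(("m1",), f_measure=0.6, level=1)
ROOT = Node((), 0.0, 0, [SHALLOW, Node(("m2",), 0.5, 1, [DEEP])])
```

With ω = 0 the deeper leaf wins on F-measure alone; with ω = 0.75 the level penalty makes the shallower leaf more promising.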
Experimental Setup
Datasets
1 manual experimental enrichment pipeline for Jamendo
2 manual experimental enrichment pipelines for DrugBank
5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion)
Learning Algorithm
6 atomic enrichment functions
Termination criterion:
Maximum of 10 iterations reached
Optimal enrichment pipeline found (F-score = 1)
Configuration of the Search Strategy
Node fitness $f(n) = F(n) - \omega \cdot c(n)$
$\omega$ controls the tradeoff between
Greedy search ($\omega = 0$)
Search strategies closer to breadth-first search ($\omega > 0$)
Result: $\omega = 0.75$ leads to the best results
ω      P     R     F
0      1.0   0.99  0.99
0.25   1.0   0.99  0.99
0.50   1.0   0.99  0.99
0.75   1.0   1.0   1.0
1.0    1.0   0.99  0.99
Effect of Positive Examples
Manual M      Examples count  Size of M  Time M(KB)  Size of learned M′  Time M′(KB)  Learn time  Iterations count  F-score
M1 DBpedia    1               1          0.2         1                   1.6          1.3         1                 1.0
M1 DBpedia    2               1          0.2         1                   1.8          1.3         1                 1.0
M2 DBpedia    1               2          23.3        1                   0.1          0.2         1                 0.99
M2 DBpedia    2               2          15          2                   17           0.3         9                 0.99
M3 DBpedia    1               3          14.7        3                   15.2         6.1         9                 0.99
M3 DBpedia    2               3          15          2                   15.1         0.1         9                 0.99
M4 DBpedia    1               4          0.4         2                   0.1          0.7         2                 0.99
M4 DBpedia    2               4          0.6         2                   0.3          0.9         2                 0.99
M5 DBpedia    1               5          22          2                   0.1          0.7         2                 1.0
M5 DBpedia    2               5          25.5        2                   0.2          0.9         2                 1.0
M1 DrugBank   1               2          3.5         1                   4.1          0.1         10                0.99
M1 DrugBank   2               2          3.6         1                   3.4          0.1         10                0.99
M2 DrugBank   1               3          25.2        1                   0.1          0.1         10                0.99
M2 DrugBank   2               3          22.8        1                   0.1          0.1         10                0.99
M1 Jamendo    1               1          10.9        2                   10.6         0.1         2                 0.99
M1 Jamendo    2               1          10.4        2                   10.4         0.1         1                 0.99
Conclusion and Future Work
Conclusion
Introduced DEER
Presented self-configuring atomic enrichment functions
Presented an approach for learning enrichment pipelines based on a refinement operator
Future Work
Parallelize the algorithm across several CPUs and add load balancing
Support directed acyclic graphs as enrichment specifications by allowing datasets to be split and merged
Pro-active enrichment strategies and active learning
Implement more enrichment functions and operators
Thank You! Questions?
Mohamed Sherif
Augustusplatz 10
D-04109 Leipzig
[email protected]
http://aksw.org/MohamedSherif
http://aksw.org/Projects/DEER
Automating RDF Dataset Transformation and Enrichment by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann in 12th Extended Semantic Web Conference, Portoroz, Slovenia, 2015