deer - rdf data extraction and enrichment framework

Motivation Approach Evaluation Conclusion and Future Work

DEER: RDF Data Extraction and Enrichment Framework

Mohamed Ahmed Sherif

September 15, 2015

Mohamed Ahmed Sherif — DEER 1/31


Outline

1 Motivation

2 Approach

3 Evaluation

4 Conclusion and Future Work


RDF Extraction & Enrichment

Need for enriched datasets

Tourism

Question Answering

Enhanced Reality

...

RDF extraction and enrichment

Triples to be added to the original KB and/or triples to be deleted from the original KB


Example

jamendo:336993 a mo:MusicArtist ;
    foaf:name "GeoSuPhat" ;
    mo:biography "... artist name of George (Pete) Peterson. Born and raised near the shores of Lake Michigan and its cold windy winters he was Influenced by bands like Pink Floyd, Kraftwerk, Todd Rundgren, ELP, Hawkwind and his love of computers and synthesis. After serving in the military and a few years going to college in Colorado where he studied electronics and computer programming. George moved to Europe and settled down in Germany where he had met his wife Christine. ..." .


DEER Enrichment Functions

Idea

Generate enrichment data based on existing data

Input/output are RDF datasets

Current Enrichment Functions

Dereferencing

Linking

NLP

Conformation

Filter


DEER Enrichment Operators

Idea

Create complex workflows by connecting modules

Input/output are RDF datasets

Current Enrichment Operators

Clone

Merge


DEER Specification Paradigm

d1 → Dereferencing → d2

d2 → Clone → d3, d4

d3 → NLP → d5

d4 → Filter → d6

d5, d6 → Merge → d7

d7 → Authority Conform → d8


Manual DEER Specification

@prefix :     <http://geoknow.org/specsontology/> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix geo:  <http://www.w3.org/2003/01/geo/wgs84_pos#> .

# Datasets: d1 is read from DBpedia, d8 is written to a file
:d1 a :Dataset ;
    :hasUri <http://dbpedia.org/resource/Berlin> ;
    :fromEndPoint <http://dbpedia.org/sparql> .
:d2 a :Dataset .
:d3 a :Dataset .
:d4 a :Dataset .
:d5 a :Dataset .
:d6 a :Dataset .
:d7 a :Dataset .
:d8 a :Dataset ;
    :outputFile "DeerBerlin.ttl" ;
    :outputFormat "Turtle" .

# d1 → Dereferencing → d2
:deref a :Module, :DereferencingModule ;
    rdfs:label "Dereferencing module" ;
    :hasInput :d1 ;
    :hasOutput :d2 ;
    :hasParameter :derefParam1 .
:derefParam1 a :ModuleParameter, :DereferencingModuleParameter ;
    :hasKey "inputProperty1" ;
    :hasValue geo:lat .

# d2 → Clone → d3, d4
:clone a :Operator, :CloneOperator ;
    rdfs:label "Clone operator" ;
    :hasInput :d2 ;
    :hasOutput :d3, :d4 .

# d3 → NLP → d5
:nlp a :Module, :NLPModule ;
    rdfs:label "NLP module" ;
    :hasInput :d3 ;
    :hasOutput :d5 ;
    :hasParameter :nlpPram1, :nlpPram2 .
:nlpPram1 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "useFoxLight" ;
    :hasValue "OFF" .
:nlpPram2 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "askEndPoint" ;
    :hasValue false .

# d4 → Filter → d6 (parameter name matches its definition below)
:filter a :Module, :FilterModule ;
    rdfs:label "Filter module" ;
    :hasInput :d4 ;
    :hasOutput :d6 ;
    :hasParameter :filterPram1 .
:filterPram1 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "triplesPattern" ;
    :hasValue "?s <http://dbpedia.org/ontology/abstract> ?o" .

# d6, d5 → Merge → d7
:merge a :Operator, :MergeOperator ;
    rdfs:label "Merge operator" ;
    :hasInput :d6, :d5 ;
    :hasOutput :d7 .

# d7 → Authority Conformation → d8
:aconform a :Module, :AuthorityConformationModule ;
    rdfs:label "Authority Conformation module" ;
    :hasInput :d7 ;
    :hasOutput :d8 ;
    :hasParameter :aconformPram1, :aconformPram2 .
:aconformPram1 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "sourceSubjectAuthority" ;
    :hasValue "http://dbpedia.org" .
:aconformPram2 a :ModuleParameter, :NLPModuleParameter ;
    :hasKey "targetSubjectAuthority" ;
    :hasValue "http://deer.org" .
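The specification describes a small dataflow graph: every module or operator reads its :hasInput datasets and writes its :hasOutput datasets. The following Python sketch shows one way such a graph could be executed. It is not DEER's implementation: the `run_workflow` helper, the in-memory triple sets, the toy input values, and the `identity` stand-in used in place of the Dereferencing, NLP and Authority Conformation modules are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
# (input dataset names, output dataset names, function from inputs to outputs)
Step = Tuple[List[str], List[str], Callable[[List[KB]], List[KB]]]


def run_workflow(steps: List[Step], initial: Dict[str, KB]) -> Dict[str, KB]:
    """Run each step as soon as all of its input datasets are available."""
    datasets = dict(initial)
    pending = list(steps)
    while pending:
        ready = [s for s in pending if all(name in datasets for name in s[0])]
        if not ready:
            raise ValueError("unsatisfied inputs or cyclic specification")
        for inputs, outputs, fn in ready:
            results = fn([datasets[name] for name in inputs])
            datasets.update(zip(outputs, results))
        pending = [s for s in pending if not any(s is r for r in ready)]
    return datasets


# Hypothetical stand-ins for the modules and operators of the specification.
def identity(inputs: List[KB]) -> List[KB]:
    return [inputs[0]]


def clone(inputs: List[KB]) -> List[KB]:
    return [inputs[0], inputs[0]]


def merge(inputs: List[KB]) -> List[KB]:
    return [frozenset().union(*inputs)]


def keep_abstracts(inputs: List[KB]) -> List[KB]:
    return [frozenset(t for t in inputs[0] if t[1] == "dbo:abstract")]


# Toy input with placeholder values (not real DBpedia data).
d1 = frozenset({
    ("dbr:Berlin", "dbo:abstract", "toy abstract"),
    ("dbr:Berlin", "geo:lat", "toy latitude"),
})
steps: List[Step] = [
    (["d1"], ["d2"], identity),        # Dereferencing (stand-in)
    (["d2"], ["d3", "d4"], clone),     # Clone operator
    (["d3"], ["d5"], identity),        # NLP (stand-in)
    (["d4"], ["d6"], keep_abstracts),  # Filter on dbo:abstract
    (["d6", "d5"], ["d7"], merge),     # Merge operator
    (["d7"], ["d8"], identity),        # Authority Conformation (stand-in)
]
result = run_workflow(steps, {"d1": d1})
```

Because a step only runs once its inputs exist, the order in which the steps are listed does not matter, which mirrors the declarative style of the Turtle specification.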


Manual KB Enrichment

Manual customized enrichment pipelines

⊕ Leads to the expected results

⊖ Time consuming

⊖ Cannot be ported easily to other datasets


Automatic KB Enrichment

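Enrichment pipeline $M : \mathcal{K} \to \mathcal{K}$ that maps a KB $K$ to an enriched KB $K'$ with $K' = M(K)$.

$M$ is an ordered list of atomic enrichment functions $m \in \mathcal{M}$:

$$M = \begin{cases} \phi & \text{if } K = K', \\ (m_1, \dots, m_n), \text{ where } m_i \in \mathcal{M},\ 1 \le i \le n & \text{otherwise.} \end{cases}$$

Research questions

1 How to create self-configuring atomic enrichment functions $m \in \mathcal{M}$?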

2 How to automatically generate an enrichment pipeline M?
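Read as code, an enrichment pipeline is a list of functions applied in order, and the empty pipeline leaves the KB unchanged. A minimal Python sketch under that reading follows; the triple representation and the two atomic functions `add_label` and `drop_comments` are hypothetical and only serve to show the composition.

```python
from functools import reduce
from typing import Callable, FrozenSet, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
EnrichmentFunction = Callable[[KB], KB]


def apply_pipeline(pipeline: Sequence[EnrichmentFunction], kb: KB) -> KB:
    """Apply m1, ..., mn in order; the empty pipeline returns kb unchanged."""
    return reduce(lambda current, m: m(current), pipeline, kb)


# Two hypothetical atomic enrichment functions.
def add_label(kb: KB) -> KB:
    # Adds a triple to the KB.
    return kb | {(":Ibuprofen", "rdfs:label", "Ibuprofen")}


def drop_comments(kb: KB) -> KB:
    # Deletes triples from the KB.
    return frozenset(t for t in kb if t[1] != "rdfs:comment")


kb = frozenset({
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "rdfs:comment", "..."),
})
enriched = apply_pipeline([add_label, drop_comments], kb)
```

The example covers both kinds of change named on the motivation slide: one function adds triples and the other deletes them.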


Running Example

Dataset: DrugBank

Goal: Gather information about companies related to drugs for a market study

:Aspirin a :Drug ;
    owl:sameAs db:Aspirin .
:Paracetamol a :Drug .
:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen .
:Quinine a :Drug .
db:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .

Atomic Enrichment Functions

I. Dereferencing atomic enrichment function

Datasets are linked (e.g., using owl:sameAs)

Dereferences a pre-specified set of predicates

Adds the found predicates to the source dataset

:Aspirin a :Drug ;
    owl:sameAs db:Aspirin .
:Paracetamol a :Drug .
:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .
:Quinine a :Drug .
db:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .
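The dereferencing step can be sketched in Python as follows. This is an illustration only: DEER dereferences linked resources over the Web, whereas here the linked dataset is passed in as an in-memory set of triples, and the `dereference` function and its signature are assumptions.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def dereference(source: KB, linked: KB, predicates: Set[str]) -> KB:
    """Copy values of the given predicates from owl:sameAs targets to the source resources."""
    added = set()
    for s, p, o in source:
        if p != "owl:sameAs":
            continue
        for s2, p2, o2 in linked:
            if s2 == o and p2 in predicates:
                added.add((s, p2, o2))
    return frozenset(source | added)


source = frozenset({
    (":Aspirin", "a", ":Drug"),
    (":Aspirin", "owl:sameAs", "db:Aspirin"),
    (":Paracetamol", "a", ":Drug"),
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Quinine", "a", ":Drug"),
})
# In-memory stand-in for the linked dataset.
linked = frozenset({("db:Ibuprofen", "rdfs:comment", COMMENT)})

enriched = dereference(source, linked, {"rdfs:comment"})
```

Only :Ibuprofen gains a triple here, because it is the only source resource whose linked counterpart carries the requested predicate.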

Self-Configuration

I. Dereferencing Enrichment Functions

Finds the set of predicates Dp from the enriched CBDs (concise bounded descriptions) that are missing from the source CBDs

Non-enriched CBD of Ibuprofen

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen .

Enriched CBD of Ibuprofen

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
    :relatedCompany :BootsCompany .

Dp = {:relatedCompany, rdfs:comment}
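Computing Dp amounts to a set difference over the predicates of the two CBDs. A minimal Python sketch, assuming CBDs are given as sets of (subject, predicate, object) tuples; the function name `missing_predicates` is hypothetical.

```python
from typing import FrozenSet, Set, Tuple

Triple = Tuple[str, str, str]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def missing_predicates(source_cbd: FrozenSet[Triple],
                       enriched_cbd: FrozenSet[Triple]) -> Set[str]:
    """Predicates used in the enriched CBD but absent from the source CBD."""
    enriched_predicates = {p for _, p, _ in enriched_cbd}
    source_predicates = {p for _, p, _ in source_cbd}
    return enriched_predicates - source_predicates


source_cbd = frozenset({
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
})
enriched_cbd = source_cbd | {
    (":Ibuprofen", "rdfs:comment", COMMENT),
    (":Ibuprofen", ":relatedCompany", ":BootsCompany"),
}

dp = missing_predicates(source_cbd, enriched_cbd)
```

For the Ibuprofen example, `dp` contains exactly :relatedCompany and rdfs:comment, matching the Dp on the slide.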

Self-Configuration

I. Dereferencing Enrichment Functions

Dereferences Dp = {:relatedCompany, rdfs:comment}

CBD of Ibuprofen

:Aspirin a :Drug ;
    owl:sameAs db:Aspirin .
:Paracetamol a :Drug .
:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen .
:Quinine a :Drug .
db:Ibuprofen rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .

Finds only rdfs:comment, adds it to the source dataset

Dereferencing enriched CBD of Ibuprofen

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." .

Atomic Enrichment Functions

II. NLP atomic enrichment function

Datatype objects contain unstructured information

Uses Named Entity Recognition to extract implicit data

Adds the extracted entities to the source dataset

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
    fox:relatedTo :BootsCompany .

Self-Configuration

II. NLP Enrichment Function

Extracts all possible named entity types

Adds the extracted entities to the source dataset

NLP enriched CBD of Ibuprofen

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
    fox:relatedTo :BootsCompany .

Atomic Enrichment Functions

III. Predicate conformation atomic enrichment function

Enriched datasets may contain diverse ontologies

Predicate conformation maps a set of pre-specified predicates to a target ontology

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
    fox:relatedTo :BootsCompany .

fox:relatedTo → :relatedCompany

Self-Configuration

III. Predicate conformation Enrichment Function

Finds lists of predicates Ps and Pt from the source and target datasets, respectively, with the same subject and object

Replaces each Ps with its respective Pt

NLP enriched CBD of Ibuprofen

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
    fox:relatedTo :BootsCompany .

fox:relatedTo → :relatedCompany

Enriched CBD of Ibuprofen (positive example target)

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
    :relatedCompany :BootsCompany .
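The self-configuration of predicate conformation can be sketched in Python: pair up source and target triples that share subject and object but differ in predicate, then rewrite the source predicates. The function names are hypothetical, and the sketch assumes at most one target predicate per (subject, object) pair, which is a simplification.

```python
from typing import Dict, FrozenSet, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]

COMMENT = ("Ibuprofen was extracted by the research arm of "
           "Boots company during the 1960s ...")


def learn_predicate_mapping(source_cbd: KB, target_cbd: KB) -> Dict[str, str]:
    """Map each source predicate Ps to the target predicate Pt that links
    the same subject and object."""
    # Simplification: keeps one target predicate per (subject, object) pair.
    target_index = {(s, o): p for s, p, o in target_cbd}
    mapping: Dict[str, str] = {}
    for s, p, o in source_cbd:
        pt = target_index.get((s, o))
        if pt is not None and pt != p:
            mapping[p] = pt
    return mapping


def conform(kb: KB, mapping: Dict[str, str]) -> KB:
    """Replace every mapped predicate; leave all other triples untouched."""
    return frozenset((s, mapping.get(p, p), o) for s, p, o in kb)


common = {
    (":Ibuprofen", "a", ":Drug"),
    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen"),
    (":Ibuprofen", "rdfs:comment", COMMENT),
}
nlp_enriched = frozenset(common | {(":Ibuprofen", "fox:relatedTo", ":BootsCompany")})
target = frozenset(common | {(":Ibuprofen", ":relatedCompany", ":BootsCompany")})

mapping = learn_predicate_mapping(nlp_enriched, target)
conformed = conform(nlp_enriched, mapping)
```

On the Ibuprofen example the learned mapping sends fox:relatedTo to :relatedCompany, and applying it turns the NLP enriched CBD into the positive example target.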

KB Enrichment Refinement Operator

Input

Set of atomic enrichment functions $\mathcal{M}$

Set of positive examples $E$

Refinement Operator

$$\rho(M) = \bigcup_{\forall m \in \mathcal{M}} M \mathbin{+\!\!+} m \qquad (\mathbin{+\!\!+} \text{ is the list append operator})$$

Output

Enrichment pipeline $M$
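Read literally, the refinement operator returns one longer pipeline per atomic function, each obtained by appending that function to M. A short Python sketch, with pipelines as tuples of function names; the `allow_repeats` flag is an assumption added for illustration, since the refinement tree on the Learning Algorithm slide shows no pipeline that repeats a function.

```python
from typing import List, Sequence, Tuple

Pipeline = Tuple[str, ...]


def refine(pipeline: Pipeline, functions: Sequence[str],
           allow_repeats: bool = True) -> List[Pipeline]:
    """Return every pipeline obtained by appending one atomic function."""
    return [
        pipeline + (m,)
        for m in functions
        if allow_repeats or m not in pipeline
    ]


functions = ["m1", "m2", "m3"]
from_empty = refine((), functions)
from_m1 = refine(("m1",), functions)
from_m1_no_repeats = refine(("m1",), functions, allow_repeats=False)
```

Refining the empty pipeline yields (m1), (m2), (m3). With `allow_repeats=False`, refining (m1) yields (m1, m2) and (m1, m3), the children shown in the tree.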

Positive Example

Non-enriched CBD of Ibuprofen

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen .

Enriched CBD of Ibuprofen

:Ibuprofen a :Drug ;
    owl:sameAs db:Ibuprofen ;
    rdfs:comment "Ibuprofen was extracted by the research arm of Boots company during the 1960s ..." ;
    :relatedCompany :BootsCompany .

Learning Algorithm

1 Start with the empty enrichment pipeline M = ⊥

2 Self-configure all $m_i \in \mathcal{M}$, add each as a child of ⊥

3 Select the most promising node

4 Expand the most promising node

Refinement tree after three expansions:

⊥ → (m1), (m2), (m3)

(m1) → (m1,m2), (m1,m3)

(m3) → (m3,m1), (m3,m2)

(m3,m2) → (m3,m2,m1)

Most Promising Node Selection

Node complexity c(n)

Linear combination of the node's children count and level

Node fitness f(n)

Difference between the F-measure of the node's enrichment pipeline and its weighted complexity, f(n) = F(n) − ω · c(n)

ω controls the tradeoff between

Greedy search (ω = 0)

Search strategies closer to breadth-first search (ω > 0)

Most promising node

The leaf node with the maximum fitness in the whole refinement tree
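The learning loop and the node selection can be put together in a compact Python sketch. Several points are assumptions made for illustration and are not taken from the slides: the complexity uses unit coefficients (children count plus level), the atomic functions are pre-configured toy stand-ins rather than self-configured modules, and F(n) is computed over triples against a single positive example.

```python
from dataclasses import dataclass, field
from typing import Callable, FrozenSet, List, Sequence, Tuple

Triple = Tuple[str, str, str]
KB = FrozenSet[Triple]
Fn = Callable[[KB], KB]
COMMENT = "Ibuprofen was extracted by the research arm of Boots company ..."


def f_measure(output: KB, target: KB) -> float:
    tp = len(output & target)
    if tp == 0:
        return 0.0
    precision, recall = tp / len(output), tp / len(target)
    return 2 * precision * recall / (precision + recall)


@dataclass
class Node:
    pipeline: Tuple[Fn, ...]
    output: KB
    level: int
    children: List["Node"] = field(default_factory=list)


def fitness(node: Node, target: KB, omega: float) -> float:
    complexity = len(node.children) + node.level  # assumed unit coefficients
    return f_measure(node.output, target) - omega * complexity


def leaves(node: Node) -> List[Node]:
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in leaves(child)]


def learn(functions: Sequence[Fn], source: KB, target: KB,
          omega: float = 0.75, max_iterations: int = 10) -> Node:
    root = best = Node((), source, 0)
    for _ in range(max_iterations):
        # Select the most promising leaf and expand it with every function.
        node = max(leaves(root), key=lambda n: fitness(n, target, omega))
        node.children = [Node(node.pipeline + (m,), m(node.output), node.level + 1)
                         for m in functions]
        for child in node.children:
            if f_measure(child.output, target) > f_measure(best.output, target):
                best = child
        if f_measure(best.output, target) == 1.0:
            break
    return best


# Toy stand-ins for dereferencing, NLP and predicate conformation.
def add_comment(kb: KB) -> KB:
    return kb | {(":Ibuprofen", "rdfs:comment", COMMENT)}


def add_company(kb: KB) -> KB:
    if any(p == "rdfs:comment" and "Boots" in o for _, p, o in kb):
        return kb | {(":Ibuprofen", "fox:relatedTo", ":BootsCompany")}
    return kb


def conform(kb: KB) -> KB:
    return frozenset((s, ":relatedCompany" if p == "fox:relatedTo" else p, o)
                     for s, p, o in kb)


source = frozenset({(":Ibuprofen", "a", ":Drug"),
                    (":Ibuprofen", "owl:sameAs", "db:Ibuprofen")})
target = source | {(":Ibuprofen", "rdfs:comment", COMMENT),
                   (":Ibuprofen", ":relatedCompany", ":BootsCompany")}
best = learn([add_comment, add_company, conform], source, target)
```

On this toy example, with ω = 0.75 and at most 10 iterations, the search returns the pipeline (add_comment, add_company, conform), whose output equals the target CBD.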


Experimental Setup

Datasets

1 manual experimental enrichment pipeline for Jamendo

2 manual experimental enrichment pipelines for DrugBank

5 manual experimental enrichment pipelines for DBpedia (AdministrativeRegion)

Learning Algorithm

6 atomic enrichment functions

Termination criterion:

Maximum number of iterations (10) reached, or

Optimal enrichment pipeline found (F-score = 1)

Configuration of the Search Strategy

Node fitness f(n) = F(n) − ω · c(n)

ω controls the tradeoff between

Greedy search (ω = 0)

Search strategies closer to breadth-first search (ω > 0)

Result: ω = 0.75 leads to the best results

ω      P      R      F
0      1.0    0.99   0.99
0.25   1.0    0.99   0.99
0.50   1.0    0.99   0.99
0.75   1.0    1.0    1.0
1.0    1.0    0.99   0.99

Effect of Positive Examples

Manual M       Examples  Size   Time    Size of      Time     Learn  Iterations  F-score
               count     of M   M(KB)   learned M′   M′(KB)   time   count
M1 DBpedia     1         1      0.2     1            1.6      1.3    1           1.0
               2         1      0.2     1            1.8      1.3    1           1.0
M2 DBpedia     1         2      23.3    1            0.1      0.2    1           0.99
               2         2      15      2            17       0.3    9           0.99
M3 DBpedia     1         3      14.7    3            15.2     6.1    9           0.99
               2         3      15      2            15.1     0.1    9           0.99
M4 DBpedia     1         4      0.4     2            0.1      0.7    2           0.99
               2         4      0.6     2            0.3      0.9    2           0.99
M5 DBpedia     1         5      22      2            0.1      0.7    2           1.0
               2         5      25.5    2            0.2      0.9    2           1.0
M1 DrugBank    1         2      3.5     1            4.1      0.1    10          0.99
               2         2      3.6     1            3.4      0.1    10          0.99
M2 DrugBank    1         3      25.2    1            0.1      0.1    10          0.99
               2         3      22.8    1            0.1      0.1    10          0.99
M1 Jamendo     1         1      10.9    2            10.6     0.1    2           0.99
               2         1      10.4    2            10.4     0.1    1           0.99


Conclusion and Future Work

Conclusion

Introduced DEER

Presented self-configuring atomic enrichment functions

Presented an approach for learning enrichment pipelines based on a refinement operator

Future Work

Parallelize the algorithm across several CPUs and add load balancing

Support directed acyclic graphs as enrichment specifications by allowing datasets to be split and merged

Pro-active enrichment strategies and active learning

Implement more enrichment functions and operators

Thank You! Questions?

Mohamed Sherif

Augustusplatz 10

D-04109 Leipzig

[email protected]

http://aksw.org/MohamedSherif

http://aksw.org/Projects/DEER

Automating RDF Dataset Transformation and Enrichment by Mohamed Ahmed Sherif, Axel-Cyrille Ngonga Ngomo, and Jens Lehmann in 12th Extended Semantic Web Conference, Portoroz, Slovenia, 2015
