SEMashup - ENsEN at AIMashup 2014, by M. Alsarem and P. Portier

Post on 13-Jan-2015

SEMashup -Mazen Alsarem & Pierre-Edouard Portier 1

How to enhance Web snippets with Linked Data?

Mazen Alsarem & Pierre-Edouard Portier

Laboratory LIRIS, INSA de Lyon, France

SEMashup

Given the query “epimenides knossos paradox”, among the first results returned by the Google search engine we find these snippets:

We enhance these snippets:

Our snippet highlights an alternative excerpt to better summarize the conceptual content of the document.

Alternative excerpt:

Our snippet also accentuates concepts that are present in the document and related to the user's information need as expressed by her query.

Important concepts:

After clicking the concept “Epimenides”:

Auto-scrolling to an instance of the concept “Epimenides” in the underlying document:

How is it done?

A mashup of Web of Data services

We use the DBpedia Spotlight service to extract concepts from the document.
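As an illustration of this step, here is a minimal sketch of how a Spotlight annotation could be requested and read. The endpoint URL, the `confidence` value, and the helper names `build_request` and `extract_concepts` are assumptions made for this example; the slides do not say which deployment or parameters were used. The sample answer is hand-written to mimic the JSON shape of the service, it is not real output.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed public endpoint; the slides do not name the deployment that was used.
SPOTLIGHT_URL = "https://api.dbpedia-spotlight.org/en/annotate"


def build_request(text, confidence=0.5):
    """Prepare (but do not send) an annotation request for `text`."""
    query = urlencode({"text": text, "confidence": confidence})
    return Request(SPOTLIGHT_URL + "?" + query,
                   headers={"Accept": "application/json"})


def extract_concepts(response):
    """Map each DBpedia URI to the surface forms that mention it."""
    concepts = {}
    for resource in response.get("Resources", []):
        concepts.setdefault(resource["@URI"], []).append(resource["@surfaceForm"])
    return concepts


# Hand-written sample shaped like a Spotlight JSON answer (not real output).
sample = {
    "Resources": [
        {"@URI": "http://dbpedia.org/resource/Epimenides",
         "@surfaceForm": "Epimenides", "@offset": "0"},
        {"@URI": "http://dbpedia.org/resource/Knossos",
         "@surfaceForm": "Knossos", "@offset": "14"},
        {"@URI": "http://dbpedia.org/resource/Epimenides",
         "@surfaceForm": "Epimenides", "@offset": "40"},
    ]
}

request = build_request("Epimenides of Knossos")
concepts = extract_concepts(sample)
```

Sending `request` with `urllib.request.urlopen` and decoding the body with `json.load` would yield a dictionary of the same shape as `sample`.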

We query a DBpedia SPARQL endpoint to find existing triples between the concepts.
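One way such a query could look is sketched below: it asks for every triple whose subject and object both belong to the set of extracted concepts. The helper name `triples_between` and the exact query shape are assumptions for illustration; the slides do not show the query that was actually sent.

```python
def triples_between(uris):
    """Build a SPARQL query for triples whose subject and object are both in `uris`."""
    values = " ".join("<%s>" % uri for uri in uris)
    return (
        "SELECT ?s ?p ?o WHERE {\n"
        "  VALUES ?s { %s }\n"
        "  VALUES ?o { %s }\n"
        "  ?s ?p ?o .\n"
        "  FILTER (?s != ?o)\n"
        "}" % (values, values)
    )


query = triples_between([
    "http://dbpedia.org/resource/Bertrand_Russell",
    "http://dbpedia.org/resource/Logic",
])
```

The resulting string can be submitted to any SPARQL 1.1 endpoint that serves DBpedia data.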

[Figure: graph of the extracted DBpedia resources (dbp_res:Bertrand_Russell, dbp_res:Logic, dbp_res:Mathematics, dbp_res:Zondervan, dbp_res:Grand_Rapids,_Michigan, dbp_res:Callimachus, dbp_res:Alexandria) linked by the predicates dbp_ont:mainInterest, dbp_prop:deathPlace and dbp_prop:headquarters.]

To benefit from Linked Data, we need to select which concepts to extend.

We propose to rank the concepts by their importance relative to the user's information need.

To do this efficiently, we cannot rely only on the small graph we built, but we need to go back to the textual content of the document.

Therefore, we introduce a new iterative SVD algorithm.

To each concept, we associate a text made of its abstract and of the sentences of the document that contain its instances.

We build a concept-stem matrix whose entries are frequencies.

We compute a first SVD.

We give more importance to the concepts and the stems close to the query, then compute a second SVD.

In the reduced SVD space, we measure how the norms of the concepts and the stems evolved.
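The steps above can be sketched with NumPy as follows. This is only an illustration under stated assumptions: the counts in the toy matrix are made up, the multiplicative `factor`, the rank `k`, and the choice of which rows and columns count as "close to the query" are hypothetical, and the slides do not specify how the weighting is actually applied.

```python
import numpy as np


def reduced_norms(matrix, k):
    """Norms of the rows (concepts) and columns (stems) in the rank-k SVD space."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    row_coords = u[:, :k] * s[:k]       # concept coordinates
    col_coords = vt[:k, :].T * s[:k]    # stem coordinates
    return np.linalg.norm(row_coords, axis=1), np.linalg.norm(col_coords, axis=1)


def stress(matrix, row_ids, col_ids, factor=2.0):
    """Scale the chosen rows and columns (assumed here to be close to the query)."""
    out = matrix.astype(float).copy()
    out[row_ids, :] *= factor
    out[:, col_ids] *= factor
    return out


# Toy concept-stem frequency matrix (made-up counts).
concepts = ["Epimenides", "Knossos", "Paradox", "Zondervan"]
stems = ["epimenid", "cretan", "liar", "paradox", "publish"]
freq = np.array([[3, 2, 1, 1, 0],
                 [1, 3, 0, 0, 0],
                 [1, 1, 2, 3, 0],
                 [0, 0, 0, 0, 2]])

# First SVD, re-weighting, second SVD, then compare the norms.
before_c, before_s = reduced_norms(freq, k=2)
weighted = stress(freq, row_ids=[0, 1, 2], col_ids=[0, 3])
after_c, after_s = reduced_norms(weighted, k=2)

concept_shift = after_c - before_c
stem_shift = after_s - before_s
```

In this toy setting the concept unrelated to the query ("Zondervan") shares no stem with the others, so it stays outside the rank-2 space and its norm does not move, whereas the stressed concepts do.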

Evolution of the norms of the concepts in the reduced SVD space, between iterations 1 and 2:

[Figure: chart with one entry per concept: dbp:Epimenides, dbp:Knossos, dbp:Paradox.]

The stems and the concepts that moved the most will be stressed at the next iteration; the stems that barely moved will be removed.

Concepts linked by a predicate to concepts selected for stressing will also be stressed.
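A possible reading of this selection rule is sketched below. The cut-offs (`top_n`, `tolerance`), the one-hop propagation, and all the numbers are hypothetical; the slides give neither thresholds nor the exact propagation rule.

```python
def pick_stressed(shifts, top_n):
    """Names whose norm changed the most between two iterations."""
    ranked = sorted(shifts, key=lambda name: abs(shifts[name]), reverse=True)
    return set(ranked[:top_n])


def pick_removed(stem_shifts, tolerance):
    """Stems whose norm barely changed."""
    return {stem for stem, delta in stem_shifts.items() if abs(delta) < tolerance}


def propagate(stressed, triples):
    """Also stress concepts linked by a predicate to a stressed concept (one hop)."""
    extra = set()
    for subject, _predicate, obj in triples:
        if subject in stressed:
            extra.add(obj)
        if obj in stressed:
            extra.add(subject)
    return stressed | extra


# Made-up norm changes and one illustrative triple.
concept_shifts = {"Epimenides": 0.9, "Knossos": 0.4, "Paradox": 0.7, "Crete": 0.1}
stem_shifts = {"epimenid": 0.8, "liar": 0.5, "publish": 0.01}
triples = [("Knossos", "ex:locatedIn", "Crete")]

stressed = propagate(pick_stressed(concept_shifts, top_n=3), triples)
removed = pick_removed(stem_shifts, tolerance=0.05)
```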

We use a DBpedia SPARQL endpoint to find new triples about the most important resources.

In a pre-processing step, we kept only the DBpedia predicates that carry enough information (we discarded the predicates whose concatenated objects had a low entropy).
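To make the entropy filter concrete, here is a small sketch that computes the Shannon entropy of the concatenated objects of each predicate and keeps the predicates above a threshold. Measuring entropy over characters, the threshold value, and the predicate `ex:hasFlag` with its objects are assumptions for illustration; the slides do not state the unit or the cut-off.

```python
import math
from collections import Counter


def char_entropy(text):
    """Shannon entropy, in bits per character, of the characters of `text`."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def informative_predicates(objects_by_predicate, threshold):
    """Keep predicates whose concatenated objects reach the entropy threshold."""
    return [predicate
            for predicate, objects in objects_by_predicate.items()
            if char_entropy("".join(objects)) >= threshold]


# Made-up object values for two predicates.
objects_by_predicate = {
    "dbp_prop:deathPlace": ["Alexandria", "Athens", "Rome"],
    "ex:hasFlag": ["1", "1", "1"],
}

kept = informative_predicates(objects_by_predicate, threshold=1.0)
```

A predicate whose objects are always the same value has zero entropy and is dropped, which matches the intuition that it tells the reader nothing new.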

In order to rank the triples of the extended graph and build the snippet, we compute a CP (CANDECOMP/PARAFAC) tensor decomposition of the graph.

In order to take into account the types of the predicates, we choose to do a tensor decomposition instead of a decomposition of the adjacency matrix (each horizontal slice of the tensor represents the adjacency matrix for one given predicate).
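The sketch below shows one way to build such a tensor, with one adjacency slice per predicate, and to factor it with a plain alternating-least-squares CP routine written in NumPy. Putting the predicate on the first axis, the rank, the iteration count, and scoring each triple by its reconstructed value are assumptions for illustration; the slides do not say how the factors are turned into a ranking. The toy triples pair resources and predicates for the sake of the example only.

```python
import numpy as np


def build_tensor(triples):
    """Binary tensor x[predicate, subject, object], one slice per predicate."""
    entities = sorted({t[0] for t in triples} | {t[2] for t in triples})
    predicates = sorted({t[1] for t in triples})
    x = np.zeros((len(predicates), len(entities), len(entities)))
    for subject, predicate, obj in triples:
        x[predicates.index(predicate), entities.index(subject), entities.index(obj)] = 1.0
    return x, entities, predicates


def khatri_rao(a, b):
    """Column-wise Kronecker product; row index is i * b.shape[0] + j."""
    return np.einsum("ir,jr->ijr", a, b).reshape(a.shape[0] * b.shape[0], -1)


def cp_als(x, rank, iters=50, seed=0):
    """CP decomposition of a 3-way tensor by alternating least squares."""
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rank)) for dim in x.shape]
    for _ in range(iters):
        for mode in range(3):
            first, second = [factors[m] for m in range(3) if m != mode]
            unfolded = np.moveaxis(x, mode, 0).reshape(x.shape[mode], -1)
            gram = (first.T @ first) * (second.T @ second)
            factors[mode] = unfolded @ khatri_rao(first, second) @ np.linalg.pinv(gram)
    return factors


def reconstruct(factors):
    """Rebuild the tensor as a sum of rank-one terms."""
    return np.einsum("pr,sr,or->pso", *factors)


# Toy triples (illustrative pairing of resources and predicates).
triples = [
    ("Bertrand_Russell", "mainInterest", "Logic"),
    ("Bertrand_Russell", "mainInterest", "Mathematics"),
    ("Zondervan", "headquarters", "Grand_Rapids"),
    ("Callimachus", "deathPlace", "Alexandria"),
]

x, entities, predicates = build_tensor(triples)
approx = reconstruct(cp_als(x, rank=2))


def score(triple):
    """Reconstructed value of a triple, used here as its (assumed) importance."""
    subject, predicate, obj = triple
    return approx[predicates.index(predicate), entities.index(subject), entities.index(obj)]


ranked = sorted(triples, key=score, reverse=True)
```

Decomposing the tensor rather than a single adjacency matrix keeps the slices for different predicates separate, so two resources linked by `mainInterest` are not confused with two resources linked by `headquarters`.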

Thank you!

And, please, come see the live demo!

http://demo.ensen-insa.org
