Towards Integration of Web Data into a coherent Educational Data Graph

DESCRIPTION

Paper presented at the LILE Workshop at WWW 2013 (Rio de Janeiro, Brazil). Full paper URL: http://www2013.org/companion/p419.pdf

TRANSCRIPT

Page 1: Towards Integration of Web Data into a coherent Educational Data Graph

Motivation: Data on the Web

10/04/23 Lile 2013 – Rio de Janeiro

Some eyecatching opener illustrating growth and or diversity of web data

Towards Integration of Web Data into a coherent Educational Data Graph

LILE 2013: 3rd International Workshop on Learning and Education with the Web of Data
14 May 2013, Rio de Janeiro, Brazil

Davide Taibi – Besnik Fetahu – Stefan Dietze (CNR – ITD, IT) (L3S Research Center, DE)

Page 2: Towards Integration of Web Data into a coherent Educational Data Graph

Outline

• Linked Open Data serving data-intensive applications

• Heterogeneity of datasets and schemas

• Is Linked Open Data really that easy to use, and what is it all about?
– Interlinking of datasets only at a superficial level
– Different schemas for similar resource classes across datasets
– Non-structured resource descriptions
– Best-case scenario: very abstract topic definitions
– Difficult to query for a subset of resources and datasets for a specific topic

• Our approach
– Schema level integration
– Enhanced dataset & resource descriptions
– Instance level integration
– Scalable annotation extraction
– Clustering and correlation of datasets

Page 3: Towards Integration of Web Data into a coherent Educational Data Graph

Introduction

• Large amounts of publicly available Linked Open Data of educational relevance

• Difficulties in providing large-scale integration

• Dataset and resource description annotation

• Clustering and dataset interlinking

Educational Data

Page 4: Towards Integration of Web Data into a coherent Educational Data Graph

Steps towards a Linked Education Data Graph

Page 5: Towards Integration of Web Data into a coherent Educational Data Graph

Schema Level Integration

http://data.linkededucation.org/ns/linked-education.rdf

Page 6: Towards Integration of Web Data into a coherent Educational Data Graph

Schema Level Integration

http://data.linkededucation.org/ns/linked-education.rdf

LinkedUniversities Dataset

Page 7: Towards Integration of Web Data into a coherent Educational Data Graph

Schema Level Integration

• VoID-based schema:
– http://data.linkededucation.org/ns/linked-education.rdf
– Dataset cataloging and classification
– Mappings (types, properties)

• Datasets:
– LinkedUniversities Dataset
– mEducator
– Europeana
– DBLP-L3S
– BBC programmes
– ACM publications

• Imported resources for clustering experiments:
– 6 million distinct resources
– 97 million RDF triples
– 21.6 GB of data

• SPARQL endpoint:
– http://okkam.l3s.uni-hannover.de:8880/openrdf-workbench/repositories/linked-learning-rdf
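As a rough illustration of how the VoID-based catalog could be queried, the sketch below builds a SPARQL protocol GET URL that asks for every void:Dataset and its triple count. The function name build_query_url is hypothetical, the query shape is an assumption about how the catalog is modelled, and it is also an assumption that the endpoint on the slide accepts plain SPARQL GET requests (it may no longer be reachable). No request is sent.

```python
from urllib.parse import urlencode

# Endpoint as printed on the slide; whether it answers SPARQL protocol
# GET requests (or is still online) is an assumption.
ENDPOINT = (
    "http://okkam.l3s.uni-hannover.de:8880/"
    "openrdf-workbench/repositories/linked-learning-rdf"
)

# Assumed query shape: list catalogued datasets with their optional size.
QUERY = """
PREFIX void: <http://rdfs.org/ns/void#>
SELECT ?dataset ?triples WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset void:triples ?triples }
}
"""


def build_query_url(endpoint, query):
    """Return a GET URL carrying the query in the 'query' parameter."""
    return f"{endpoint}?{urlencode({'query': query})}"


url = build_query_url(ENDPOINT, QUERY)
print(url)
```

Building the URL separately from sending it keeps the sketch usable with any HTTP client.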

Page 8: Towards Integration of Web Data into a coherent Educational Data Graph

Instance-level integration

<http://dbpedia.org/page/Gravitation>

<http://dbpedia.org/page/Strong>

<http://dbpedia.org/page/Dense>

• DBpedia Spotlight as NER & NED tool

• Annotation of unstructured content

• Selective & Scalable annotation

• Annotate tokens of different size
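Annotating tokens of different size can be pictured as generating every run of 1 to k consecutive tokens and sending each distinct run to the annotator only once. The sketch below is a minimal illustration under that assumption; token_chunks and the sample tokens are hypothetical, and the call to DBpedia Spotlight itself is left out.

```python
def token_chunks(tokens, max_size):
    """Yield each distinct run of 1..max_size consecutive tokens once."""
    seen = set()
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            chunk = " ".join(tokens[start:start + size])
            if chunk not in seen:  # skip chunks that were already produced
                seen.add(chunk)
                yield chunk


# Hypothetical input; "strong" appears twice but is emitted only once.
chunks = list(token_chunks(["strong", "gravitation", "field", "strong"], 2))
print(chunks)
```

Deduplicating before annotation is one way to keep the number of annotation requests down when the same chunk recurs.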

Page 9: Towards Integration of Web Data into a coherent Educational Data Graph

Instance-level integration

Characteristics of enrichments

• Disambiguation

• Acronym detection (e.g. “dns”, “gmt”)

• Synonym detection (e.g. “globe”, “earth”)

• Context detection (e.g. “apple” the fruit vs. “apple” the computer company)

<http://dbpedia.org/page/Gravitation>

Page 10: Towards Integration of Web Data into a coherent Educational Data Graph

Correlation and Clustering

Gravitation

Equations

Earth

• Annotations used to construct a network of resources, with edges based on common resource annotations.
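The network construction described above can be sketched as follows: two resources are connected whenever they share at least one annotation, and the edge records which annotations they share. The function build_network, the resource ids and the sample annotations are hypothetical; the slides do not specify the actual data structures.

```python
from collections import defaultdict
from itertools import combinations


def build_network(annotations):
    """Map each resource pair (a, b) to the set of annotations they share.

    annotations: dict of resource id -> set of entity labels.
    """
    # Invert the mapping: which resources carry each entity?
    by_entity = defaultdict(set)
    for resource, entities in annotations.items():
        for entity in entities:
            by_entity[entity].add(resource)

    # Every pair of resources sharing an entity gets an edge.
    edges = defaultdict(set)
    for entity, resources in by_entity.items():
        for a, b in combinations(sorted(resources), 2):
            edges[(a, b)].add(entity)
    return dict(edges)


# Hypothetical resources annotated with the entities shown on the slide.
annotations = {
    "r1": {"Gravitation", "Earth"},
    "r2": {"Gravitation", "Equations"},
    "r3": {"Equations"},
}
network = build_network(annotations)
print(network)
```

Inverting the mapping first avoids comparing every pair of resources directly, which matters once the number of resources grows.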

Page 11: Towards Integration of Web Data into a coherent Educational Data Graph

Correlation and Clustering

• Methods used for clustering:
– Based on the shared enrichments
– Naïve
– Based on the ef-irf (Enrichment Frequency-Inverse Resource Frequency) index
– Jaccard
– Cosine

Different thresholds were used to generate clusters.
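The similarity measures named above can be sketched as below. The slides do not give the ef-irf formula, so the weighting used here (enrichment count times log(N / number of resources containing the enrichment)) is an assumption made by analogy with tf-idf. Which measure is paired with which representation, the threshold value and all names are likewise hypothetical.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity between two collections of enrichments."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ef_irf_vectors(resources):
    """Weight each enrichment as ef * log(N / rf) (assumed, tf-idf style)."""
    n = len(resources)
    rf = Counter(e for ens in resources.values() for e in set(ens))
    return {
        r: {e: count * math.log(n / rf[e]) for e, count in Counter(ens).items()}
        for r, ens in resources.items()
    }


def cosine(u, v):
    """Cosine similarity between two sparse weight vectors (dicts)."""
    dot = sum(w * v.get(e, 0.0) for e, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Hypothetical enrichments per resource (repeats count towards ef).
resources = {
    "r1": ["Gravitation", "Earth", "Earth"],
    "r2": ["Gravitation", "Equations"],
    "r3": ["Equations", "Earth"],
}
vectors = ef_irf_vectors(resources)

# Keep the pairs whose similarity reaches an illustrative threshold.
threshold = 0.3
pairs = [
    (a, b)
    for a in resources
    for b in resources
    if a < b and jaccard(resources[a], resources[b]) >= threshold
]
print(pairs)
print(cosine(vectors["r1"], vectors["r3"]))
```

Raising or lowering the threshold changes how many pairs survive, which is one way different thresholds lead to different clusters.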

Page 12: Towards Integration of Web Data into a coherent Educational Data Graph

Evaluation

Three evaluation stages:

• Quantitative & Qualitative
– Assess annotation accuracy for exhaustive and scalable approaches
– Measure standard precision/recall metrics
– 250 resources for each dataset used for assessment

• Performance
– Gains in terms of scalability
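For reference, the standard precision/recall metrics mentioned above can be computed as in this minimal sketch. The function name and the sample annotation sets are hypothetical and are not taken from the evaluation data.

```python
def precision_recall(predicted, relevant):
    """Return (precision, recall) for a set of predicted annotations."""
    predicted, relevant = set(predicted), set(relevant)
    true_positives = len(predicted & relevant)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: 3 extracted annotations, 4 judged relevant.
p, r = precision_recall(
    {"Gravitation", "Earth", "Apple_Inc."},
    {"Gravitation", "Earth", "Equations", "Mass"},
)
print(p, r)
```

Precision is the share of extracted annotations that are correct; recall is the share of relevant annotations that were extracted.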

Page 13: Towards Integration of Web Data into a coherent Educational Data Graph

Quantitative Evaluation

Context             #Resources  #Annotations  #Entity Types
ACM                        249           200            239
mEducator                  250           495            355
BBC                        250          1364            769
LinkedUniversities         243           166            283
DBLP                       250           295            161
Europeana                  249           938            672
Total                     1491          3458            937

• Number of extracted entities is related to the length of a textual description in a resource

• For long texts up to 87 distinct entities and more than 200 entity type associations

Page 14: Towards Integration of Web Data into a coherent Educational Data Graph

Qualitative Evaluation

• Human evaluators to measure annotation accuracy

• 2000 annotations for both (exhaustive and scalable) approaches were assessed

• Number of evaluators for the first approach was 32, with an average of 63 tasks per user, while for the second, there were 23 users with an average of 87 completed tasks

            Precision  Recall
Exhaustive       0.82   0.429
Scalable         0.77   0.687
∆[E-S]          -0.05   +0.26

Page 15: Towards Integration of Web Data into a coherent Educational Data Graph

Performance Evaluation

Size-k  No Filtering  Filtered: resource level  Filtered: dataset level
1              53089                     24850                     7464
2              51346                     17919                    13281
3              49603                     11800                     9607
4              47871                      7793                     6432
5              46153                      5184                     4289
6              44480                      3529                     2922

• Reduction of textual content to be analyzed for the annotation phase:
– Keeping only terms tagged {NN, NNP, NNPS} reduces the amount of text by almost 40%
– For various token sizes, the reduction reaches up to 86%

• Complexity of the NER task with DBpedia Spotlight:
– Reduction of HTTP requests
– Avoid annotating similar chunks of text

• Significant gains in terms of execution time: 3.5 hrs vs. 20 mins
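The filtering step can be illustrated with the sketch below, which keeps only tokens carrying one of the noun tags listed above and reports how much of the input was dropped. It assumes the tokens have already been tagged by some part-of-speech tagger; the sample sentence, its tags and the function names are hypothetical, and the toy reduction figure is not meant to reproduce the reported percentages. The last line simply works out the ratio between the two reported execution times.

```python
# Noun tags named on the slide.
KEEP_TAGS = {"NN", "NNP", "NNPS"}


def filter_nouns(tagged_tokens):
    """Keep only the tokens whose POS tag is in KEEP_TAGS."""
    return [token for token, tag in tagged_tokens if tag in KEEP_TAGS]


def reduction(tagged_tokens):
    """Fraction of tokens removed by the noun filter."""
    total = len(tagged_tokens)
    if total == 0:
        return 0.0
    return 1 - len(filter_nouns(tagged_tokens)) / total


# Hypothetical pre-tagged sentence.
tagged = [
    ("The", "DT"), ("Earth", "NNP"), ("has", "VBZ"), ("a", "DT"),
    ("strong", "JJ"), ("gravitation", "NN"), ("field", "NN"),
]
kept = filter_nouns(tagged)
print(kept, round(reduction(tagged), 2))

# Ratio of the reported execution times: 3.5 hours against 20 minutes.
speedup = (3.5 * 60) / 20
print(speedup)
```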

Page 16: Towards Integration of Web Data into a coherent Educational Data Graph

Conclusion

• Large-scale educational data-graph

• Well-interlinked datasets at schema and instance level

• Enhanced dataset and resource description

• Scalable annotation procedure

• EF-IRF clustering approach

• Clusters and correlated datasets

Page 17: Towards Integration of Web Data into a coherent Educational Data Graph

Thank you! Questions?
