Linked Data for Federation of OER Data & Repositories

Linked Data for Open Educational Data Sharing and Repository Federation

Stefan Dietze (L3S Research Center, DE, @stefandietze, http://purl.org/dietze)

02/04/13

Upload: stefan-dietze

Post on 27-Jan-2015

DESCRIPTION

An overview of the alternatives and opportunities for using Linked Data principles and datasets to provide federated access to distributed OER repositories. The talk was given at the ARIADNE/GLOBE convening (http://ariadne-eu.org/content/open-federations-2013-open-knowledge-sharing-education) at LAK 2013, Leuven, Belgium, on 8 April 2013.

TRANSCRIPT

Page 1: Linked Data for Federation of OER Data & Repositories

Motivation Data on the Web

Some eye-catching opener illustrating the growth and/or diversity of Web data

Page 2: Linked Data for Federation of OER Data & Repositories

Linked Data

De-facto standard for sharing data on the Web

Vision: well-connected graph of open Web data

W3C standards (RDF, SPARQL) to expose data

Persistent URIs to interlink datasets

Domain                  Number of datasets  Triples         %        (Out-)Links  %
Media                   25                  1,841,852,061   5.82 %   50,440,705   10.01 %
Geographic              31                  6,145,532,484   19.43 %  35,812,328   7.11 %
Government              49                  13,315,009,400  42.09 %  19,343,519   3.84 %
Publications            87                  2,950,720,693   9.33 %   139,925,218  27.76 %
Cross-domain            41                  4,184,635,715   13.23 %  63,183,065   12.54 %
Life sciences           41                  3,036,336,004   9.60 %   191,844,090  38.06 %
User-generated content  20                  134,127,413     0.42 %   3,449,143    0.68 %
Total                   295                 31,634,213,770           503,998,829

Source: http://lod-cloud.net/state, September 2011

Media Ontology

FOAF

Gene Ontology

FMA Ontology

BIBO

Geo Ontology

DBpedia Ontology

Dublin Core

rNews

Page 3: Linked Data for Federation of OER Data & Repositories

Option 1: LD for integration of heterogeneous APIs & data

Use case: biomedical education in

=> http://metamorphosis.med.duth.gr/ Metamorphosis+ Tailored (L)CMS plugins

=> http://www.meducator3.net/

Data/services integration & retrieval/search APIs

? Educational Web Resources

Page 4: Linked Data for Federation of OER Data & Repositories

Data/services integration & retrieval/search APIs Linked Educational Resources

http://linkededucation.org/meducator

Approach:

1) On-the-fly queries via “SmartLink” (Linked Data registry execution engine for open APIs)

2) Data lifting from heterogeneous repositories using the “SmartLink” API and lifting specifications

3) Data enrichment (via DBpedia, Freebase, BioPortal) & clustering, e.g. to identify correlated resources

Goal: improvement of distributed (non-LD) data with public LOD vocabularies; tighter interlinking to provide a coherent graph of educational data (across disparate stores)

http://purl.org/smartlink

Schemas: OAI-DC, LOM, …

Formats: XML, JSON

Interfaces: OAI-PMH, REST, SOAP

Option 1: LD for integration of heterogeneous APIs & data

Educational Web Resources
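
The data-lifting step of the approach above can be sketched roughly as follows. This is only an illustration, assuming a Dublin Core style XML record: the `lift_record` helper, the `LIFTING_SPEC` table, the sample record and the `http://example.org/resource/` base URI are all hypothetical and are not taken from SmartLink, whose actual lifting specifications are not shown in the slides.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core elements used in OAI-DC records.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical lifting specification: XML element -> RDF property URI.
LIFTING_SPEC = {
    f"{{{DC}}}title": "http://purl.org/dc/terms/title",
    f"{{{DC}}}subject": "http://purl.org/dc/terms/subject",
}

# Hypothetical sample record; a real one would come from a repository API.
SAMPLE = """<record xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:identifier>vp-1002</dc:identifier>
  <dc:title>Virtual patient 1002, infections and HPV</dc:title>
  <dc:subject>Virology</dc:subject>
</record>"""


def lift_record(xml_text, base_uri="http://example.org/resource/"):
    """Turn one XML metadata record into (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    subject = base_uri + root.findtext(f"{{{DC}}}identifier")
    triples = []
    for child in root:
        prop = LIFTING_SPEC.get(child.tag)  # elements without a mapping are skipped
        if prop is not None and child.text:
            triples.append((subject, prop, child.text.strip()))
    return triples


triples = lift_record(SAMPLE)
for triple in triples:
    print(triple)
```

One such mapping table would be needed per source schema (OAI-DC, LOM, …), which is where the description overhead of this option comes from.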

Page 5: Linked Data for Federation of OER Data & Repositories

Option 1: LD for integration of heterogeneous APIs & data

LD vocabularies for disambiguation & clustering

<led:Resource-OpenLearn-2139393292>

<led:title>…viral…disease…</led:title>

</led:Resource-OpenLearn-2139393292>

<led:Resource-BBC-519215>

<led:title>…virus…</led:title>

</led:Resource-BBC-519215>

<led:Resource-mEducator-2139393292>

<led:title>Virtual patient 1002, infections & HPV</led:title>

</led:Resource-mEducator-2139393292>

DBpedia entities shown in the figure: db:Viral Infections, db:Human Papilloma Virus, db:Life Sciences, db:Disease

Stefan Dietze 08/04/13
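
As a rough sketch of how shared vocabulary entities can surface correlated resources: the annotations below are assumed for illustration only (the slide does not state which entity is attached to which resource), and `group_by_entity` and `correlated` are hypothetical helper names.

```python
from collections import defaultdict

# Assumed enrichment output: resource -> DBpedia entities detected in its metadata.
ANNOTATIONS = {
    "led:Resource-OpenLearn-2139393292": {"db:Viral_Infections", "db:Disease"},
    "led:Resource-BBC-519215": {"db:Viral_Infections"},
    "led:Resource-mEducator-2139393292": {
        "db:Human_Papilloma_Virus",
        "db:Viral_Infections",
    },
}


def group_by_entity(annotations):
    """Invert the annotations: entity -> set of resources annotated with it."""
    index = defaultdict(set)
    for resource, entities in annotations.items():
        for entity in entities:
            index[entity].add(resource)
    return index


def correlated(annotations, resource):
    """Return the other resources sharing at least one entity with `resource`."""
    index = group_by_entity(annotations)
    related = set()
    for entity in annotations[resource]:
        related |= index[entity]
    related.discard(resource)
    return related


print(sorted(correlated(ANNOTATIONS, "led:Resource-BBC-519215")))
```

Because the three titles use different wording ("viral", "virus", "HPV"), plain string matching would miss the connection that a shared entity makes explicit.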

Page 6: Linked Data for Federation of OER Data & Repositories

Option 1: LD for integration of heterogeneous APIs & data

Some issues/challenges

On-the-fly data integration, but issues with regard to:

Annotation and description overhead: data lifting requires a well-defined lifting specification for each API

Performance: distributed queries (multiple HTTP requests), on-the-fly data lifting and processing

Scalability: query performance decreases as the number of repositories and/or the amount of data grows

Educational Web Resources
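
To make the performance point concrete, here is a small sketch of an on-the-fly federated search that fans one keyword out to several repositories in parallel. The repositories are local stubs with simulated latency; `make_repository`, `federated_search` and the sample records are hypothetical and do not reflect the actual mEducator or SmartLink implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def make_repository(name, delay, records):
    """Build a stub search function that simulates one remote repository."""
    def search(keyword):
        time.sleep(delay)  # stands in for the HTTP round trip
        return [(name, r) for r in records if keyword.lower() in r.lower()]
    return search


# Hypothetical repositories and records.
REPOSITORIES = [
    make_repository("repo-a", 0.05, ["Virus basics", "Anatomy"]),
    make_repository("repo-b", 0.05, ["HPV virus case"]),
    make_repository("repo-c", 0.05, ["Statistics"]),
]


def federated_search(keyword, repositories):
    """Query all repositories concurrently and merge the hits in repository order."""
    with ThreadPoolExecutor(max_workers=len(repositories)) as pool:
        result_lists = pool.map(lambda search: search(keyword), repositories)
        return [hit for hits in result_lists for hit in hits]


results = federated_search("virus", REPOSITORIES)
print(results)
```

Even with parallel requests, the response time is bounded below by the slowest repository, and every additional source adds lifting and merging work per query.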

Page 7: Linked Data for Federation of OER Data & Repositories

Option 2: large-scale data harvesting and LD-ification

Linked Data for automated cross-platform integration

Step 1 – Alignment of types/properties (figure labels: <dc:title>, <akt:has-title>, ?; OER, Publication, VideoLecture; LinkedUniversities, educational videos)

Step 2 – Linking of resources

LD and non-LD data

6 million distinct (but linked) resources

97 million RDF triples

21.6 GB of data

Schema: http://data.linkededucation.org/ns/linked-education.rdf

SPARQL: http://data.linkededucation.org/request/linked-learning/sparql

12/03/13 Mathieu d'Aquin, Stefan Dietze
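
The alignment step can be illustrated with a small sketch that rewrites source-specific properties to one shared property. The choice of `dcterms:title` as the target, the sample triples and the `align` helper are assumptions made for this example; the mappings actually used are defined in the schema linked on this slide.

```python
# Hypothetical alignment table: source property -> shared property.
PROPERTY_ALIGNMENT = {
    "dc:title": "dcterms:title",
    "akt:has-title": "dcterms:title",
}

# Hypothetical triples harvested from two different sources.
HARVESTED = [
    ("ex:lecture-1", "dc:title", "Introduction to virology"),
    ("ex:paper-7", "akt:has-title", "Linked Data in education"),
    ("ex:paper-7", "ex:year", "2012"),
]


def align(triples, mapping):
    """Replace each mapped predicate; predicates without a mapping stay unchanged."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


aligned = align(HARVESTED, PROPERTY_ALIGNMENT)
for triple in aligned:
    print(triple)
```

After alignment, a single query on the shared property reaches titles from both sources, which is what allows cross-platform retrieval over the harvested data.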

Page 8: Linked Data for Federation of OER Data & Repositories

Option 2: large-scale data harvesting and LD-ification Linked Data for automated cross-platform integration

Larger-scale data processing, but issues with regard to:

Scalability and performance of data storage (potential solutions: distributed RDF storage, map/reduce, etc.)

Poor query performance (on large-scale datasets)

Redundant data maintenance => periodic data imports

Maintenance of different identifiers (in the case of non-LD sources: URIs vs internal IDs)
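
One possible way to handle the identifier issue for non-LD sources is to derive a stable URI from the repository name and the internal ID, so that repeated imports yield the same URI. This is a sketch under assumptions: the `http://example.org/resource/` namespace and the `mint_uri` and `split_uri` helpers are hypothetical and not the scheme used by the project.

```python
from urllib.parse import quote, unquote

BASE = "http://example.org/resource/"  # hypothetical namespace


def mint_uri(repository, internal_id):
    """Build a deterministic URI from a repository name and its internal ID."""
    return f"{BASE}{quote(repository, safe='')}/{quote(str(internal_id), safe='')}"


def split_uri(uri):
    """Recover (repository, internal_id) from a URI produced by mint_uri."""
    repository, internal_id = uri[len(BASE):].split("/")
    return unquote(repository), unquote(internal_id)


uri = mint_uri("OpenLearn", 2139393292)
print(uri)
print(split_uri(uri))
```

Percent-encoding both parts keeps the mapping reversible even when an internal ID contains a slash.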

Page 9: Linked Data for Federation of OER Data & Repositories

Option 3: dataset cataloging and query federation

LinkedUp approach [ http://linkedup-project.eu ]

“LinkedUp/Linked Education cloud” as (expanded) subset of the LOD cloud: CKAN – “The DataHub” (http://datahub.io) for data collection in the dedicated group “linked-education”

Public RDF vocabulary of datasets (“Linked Education Catalog”): classification of datasets according to, e.g., represented types, disciplines, data quality

Additional integration datasets: dataset links and coreferences => providing a unified view on educational data => Linked Education Graph

Infrastructure, unified (SPARQL) endpoint & APIs for distributed/federated querying

Educational Datasets

LinkedUp LinkedUp Dataset Catalog Data Interlinking & Correlation
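
Federated querying over several cataloged endpoints could, for instance, rely on the `SERVICE` keyword of SPARQL 1.1 Federated Query. The slides do not say which mechanism LinkedUp uses, so the sketch below is an assumption: `build_federated_query` is a hypothetical helper, `http://example.org/sparql` is a placeholder endpoint, and the use of `dcterms:title` in the pattern is illustrative only.

```python
def build_federated_query(endpoints, pattern, variables=("?resource", "?title")):
    """Return a SPARQL 1.1 query that sends the same pattern to every endpoint."""
    branches = [
        f"  {{ SERVICE <{endpoint}> {{ {pattern} }} }}" for endpoint in endpoints
    ]
    body = "\n  UNION\n".join(branches)
    return f"SELECT {' '.join(variables)} WHERE {{\n{body}\n}}"


ENDPOINTS = [
    # Endpoint listed earlier in the talk.
    "http://data.linkededucation.org/request/linked-learning/sparql",
    # Placeholder for a second cataloged dataset.
    "http://example.org/sparql",
]

query = build_federated_query(
    ENDPOINTS,
    "?resource <http://purl.org/dc/terms/title> ?title .",
)
print(query)
```

A catalog that records which types and properties each dataset contains would let such a query be sent only to the endpoints that can actually answer it.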

Page 10: Linked Data for Federation of OER Data & Repositories

Linked Education Cloud & Catalog

http://datahub.io/group/linked-education

http://data.linkededucation.org/linkedup/catalog/

Page 11: Linked Data for Federation of OER Data & Repositories

Option 3: dataset cataloging and query federation

Sparse knowledge / metadata about datasets

http://datahub.io/dataset/lak-dataset

Resource Types?

Topics & disciplines?

Quality & availability?

http://datahub.io/group/linked-education

Page 12: Linked Data for Federation of OER Data & Repositories

Option 3: dataset cataloging and query federation

Co-occurrence of (mapped) types

Page 13: Linked Data for Federation of OER Data & Repositories

Option 3: dataset cataloging and query federation

Dataset graph (according to type co-occurrence)
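
A minimal sketch of how type co-occurrence and the derived dataset graph might be computed. The dataset names and their types are invented for illustration, and `type_cooccurrence` and `dataset_graph` are hypothetical helpers, not the method used to produce the figures in the talk.

```python
from collections import Counter
from itertools import combinations

# Invented catalog excerpt: dataset -> set of (mapped) types it contains.
DATASET_TYPES = {
    "dataset-a": {"foaf:Person", "bibo:Article"},
    "dataset-b": {"foaf:Person", "bibo:Article", "foaf:Document"},
    "dataset-c": {"foaf:Document"},
}


def type_cooccurrence(dataset_types):
    """Count in how many datasets each pair of types appears together."""
    counts = Counter()
    for types in dataset_types.values():
        counts.update(combinations(sorted(types), 2))
    return counts


def dataset_graph(dataset_types):
    """Link two datasets when they share types; the weight is the number shared."""
    edges = {}
    for a, b in combinations(sorted(dataset_types), 2):
        shared = len(dataset_types[a] & dataset_types[b])
        if shared:
            edges[(a, b)] = shared
    return edges


print(type_cooccurrence(DATASET_TYPES))
print(dataset_graph(DATASET_TYPES))
```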

Page 14: Linked Data for Federation of OER Data & Repositories

Option 3: dataset cataloging and query federation

Detection of topics and dataset similarities

Approach

Enriching sample resources from each dataset with DBpedia entities/categories

Linking resources to LOD entities & categories via

Top-ranked categories/topics in Linked Education Catalog & their frequency

DBpedia Category               Total
Management                     180
Academia                       151
Social_sciences                131
Philosophy_of_science          125
Design                         120
Sociology_index                117
Systems_science                117
Anthropology                   116
Universities_and_colleges      116
Economics                      114
Scientific_method              111
Cognitive_science              110
Systems                        107
Sociological_terms             104
Neuropsychological_assessment  100
Concepts_in_metaphysics        96
Developmental_psychology       93
Political_philosophy           89
Cybernetics                    88
Education                      87
Philosophy_of_education        86
Arts                           77
Critical_thinking              73
Biology                        71
Political_science_terms        71
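
Category frequencies like those above, and a simple dataset similarity derived from them, could be computed along these lines. The per-dataset category lists are invented, and Jaccard overlap is an assumed similarity measure chosen for this sketch; the talk does not state which measure was used.

```python
from collections import Counter

# Invented enrichment output: dataset -> categories of its sampled resources.
DATASET_CATEGORIES = {
    "dataset-a": ["Management", "Academia", "Management"],
    "dataset-b": ["Academia", "Education"],
}


def category_frequency(dataset_categories):
    """Count how often each category occurs across all sampled resources."""
    counts = Counter()
    for categories in dataset_categories.values():
        counts.update(categories)
    return counts


def jaccard(categories_a, categories_b):
    """Share of distinct categories that two datasets have in common."""
    a, b = set(categories_a), set(categories_b)
    return len(a & b) / len(a | b) if a | b else 0.0


frequency = category_frequency(DATASET_CATEGORIES)
similarity = jaccard(
    DATASET_CATEGORIES["dataset-a"], DATASET_CATEGORIES["dataset-b"]
)
print(frequency)
print(similarity)
```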

Page 15: Linked Data for Federation of OER Data & Repositories
Page 16: Linked Data for Federation of OER Data & Repositories

Summary and outlook

Summary

Different ways of using LD for federation of OER repositories

Linked Education data catalog (http://linkedup-project.eu, http://data.linkededucation.org/linkedup/catalog/): Linked Data-based catalog of open educational datasets (gradual addition of metadata about, e.g., types and topics)

On the way: exposing non-LD educational data according to LD principles (e.g. the LAK dataset)

Future work

Data interlinking: complementary dataset of links between datasets and actual data/resources

Query federation and dedicated APIs

Exploitation in innovative educational scenarios and applications => LinkedUp Challenge (http://linkedup-challenge.org)

40,000 EUR prize budget

Large network of organisations in LD & TEL

Dedicated data and support

Series of affiliated events at major conferences (WWW2013, ESWC2013, OKCON, LAK2013…)

Page 17: Linked Data for Federation of OER Data & Repositories

LAK Challenge / LA & Linked Data Tutorial in a nutshell

Stefan Dietze

http://www.solaresearch.org/events/lak/lak-data-challenge/

http://linkedu.eu/event/lak2013-linkeddata-tutorial/