20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Page 1: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Workshop 7d, 30 October 2014, eChallenges e-2014. Copyright 2014, Ubitech Ltd.

Business Value Creation from Linked Data Analytics: The LinDA Approach

Anastasios Zafeiropoulos, Eleni Fotopoulou
Ubitech Ltd./R&D Department
Athens, Greece

Page 2: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Introduction

• A wide range of data sources is available today, but the means to exploit them optimally and to carry out advanced analysis are lacking.

• Concepts need to be properly interconnected in order to examine the relationships among entities represented in different datasets.

• An approach for the extraction of Linked Data analytics is presented, based on a set of tools for transforming and interlinking public and private datasets and for running analyses over them.

• The proposed approach aims to enhance the ability of public and private sector organizations to provide usable Linked Data, while offering SMEs the opportunity to perform advanced algorithmic analysis.

Page 3: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Challenges (1)

• Need to manage structured and unstructured data in multiple formats, which in some cases lack a representation based on defined schemas.

• Data aggregation from distributed sources
– increasing wealth of dataset cross-linkage;
– SPARQL queries cannot readily be executed, as their constituent triple patterns span multiple datasets.

• Compilation of proper and meaningful datasets to be provided to the analytics tools
– review the datasets and prepare them in the proper format;
– extract knowledge from the data through interlinking, inference and analytics extraction;
– maintain and update the data regularly;
– eliminate co-references among the available data.
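
The points on triple patterns that span datasets and on co-reference elimination can be illustrated with a small sketch. It assumes two in-memory datasets held as sets of (subject, predicate, object) tuples and a mapping that plays the role of owl:sameAs links. All URIs, predicate names and helper functions are invented for illustration and are not part of LinDA.

```python
# Hypothetical example data: two datasets that describe the same company
# under different URIs (a co-reference).
DATASET_A = {
    ("http://a.example/org/42", "ex:name", "Acme Ltd."),
    ("http://a.example/org/42", "ex:sector", "Manufacturing"),
}
DATASET_B = {
    ("http://b.example/company/acme", "ex:emissions", "1200"),
}
# Assumed co-reference links (in RDF these would be owl:sameAs triples).
SAME_AS = {"http://b.example/company/acme": "http://a.example/org/42"}


def canonical(term, same_as):
    """Follow co-reference links until a canonical term is reached."""
    seen = set()
    while term in same_as and term not in seen:
        seen.add(term)
        term = same_as[term]
    return term


def merge(datasets, same_as):
    """Union the datasets after rewriting subjects and objects to canonical terms."""
    merged = set()
    for dataset in datasets:
        for s, p, o in dataset:
            merged.add((canonical(s, same_as), p, canonical(o, same_as)))
    return merged


def join_on_subject(triples, pred_1, pred_2):
    """Evaluate the two triple patterns { ?s pred_1 ?x . ?s pred_2 ?y }."""
    first = {}
    for s, p, o in triples:
        if p == pred_1:
            first.setdefault(s, []).append(o)
    return sorted(
        (s, x, o) for s, p, o in triples if p == pred_2 for x in first.get(s, [])
    )


merged = merge([DATASET_A, DATASET_B], SAME_AS)
rows = join_on_subject(merged, "ex:name", "ex:emissions")
print(rows)
```

Without the co-reference step the same join over the plain union of the two datasets returns no rows, because the two patterns never share a subject.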

Page 4: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Challenges (2)

• Handling data with different quality characteristics regarding accuracy, consistency, timeliness, completeness, relevance, interpretability and trustworthiness
– set of information quality assessment metrics;
– some of the indicators cannot be automatically assessed;
– data quality assessment is performed only on a small sample of the data, which decreases the precision of the quality scores.

• Need to process high-volume data in some cases, and to have the capacity to apply suitable algorithms and evaluate their results.

• Learning curve for the adoption of Linked Data technologies by SMEs and public administrations.
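
The remark on sample-based quality assessment can be made concrete with a small sketch. It assumes a simple completeness metric (share of required fields that are filled) and hypothetical records. The metric definition, field names and sample size are illustrative assumptions, not the metrics used in LinDA.

```python
import random


def completeness(records, required_fields):
    """Share of required field slots that hold a non-empty value."""
    total = len(records) * len(required_fields)
    if total == 0:
        return 0.0
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return filled / total


# Hypothetical records: every fourth one lacks a "sector" value.
records = [
    {"name": f"org-{i}", "sector": "" if i % 4 == 0 else "energy"}
    for i in range(1000)
]
fields = ["name", "sector"]

# Score computed over the whole dataset.
full_score = completeness(records, fields)

# Scores computed over five small random samples of 20 records each.
rng = random.Random(0)  # fixed seed so that the run is repeatable
sample_scores = [completeness(rng.sample(records, 20), fields) for _ in range(5)]

print(full_score)
print(sample_scores)
```

Each sample yields its own estimate of the full-data score, and the spread between the estimates is the loss of precision that the slide refers to.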

Page 5: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Linked Data Analytics: the LinDA approach

• Linked Data: a set of best practices for representing and connecting structured information on the web.

• LinDA addresses one of the most significant challenges in using and publishing Linked Data: the renovation and conversion of existing data formats into structures that support semantic enrichment and interlinking of data.

• The proposed approach builds upon the collection of data from available data sources, their transformation into a suitable format (e.g. RDF) and their interlinking to create extended linked datasets, which are fed as input to the analytics extraction process.
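
As a rough illustration of the transformation step, the sketch below turns CSV rows into N-Triples statements. The namespace, the column names and the row-to-resource mapping are assumptions made for this example. It is not the LinDA transformation tooling, and it assumes that identifiers and column names are already safe to use inside URIs.

```python
import csv
import io

BASE = "http://example.org/"  # hypothetical namespace

CSV_TEXT = """id,name,city
1,Acme Ltd.,Athens
2,Beta SA,Patras
"""


def escape_literal(value):
    """Escape the characters that N-Triples string literals require."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def csv_to_ntriples(text, id_column="id"):
    """Map each CSV row to one resource and each other column to a property."""
    lines = []
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"<{BASE}resource/{row[id_column]}>"
        for column, value in row.items():
            if column == id_column or not value:
                continue  # skip the key column and empty cells
            predicate = f"<{BASE}property/{column}>"
            lines.append(f'{subject} {predicate} "{escape_literal(value)}" .')
    return lines


triples = csv_to_ntriples(CSV_TEXT)
print("\n".join(triples))
```

Interlinking would then add statements that connect these resources to resources in other datasets.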

Page 6: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

The LinDA Approach

Page 7: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Linked Data Analytics – Algorithms Categorization

• A library of basic and robust data analytics functionality is provided.

• Design and deployment of workflows for algorithm execution, based on the categorization of the algorithms:
– Classifiers for identifying to which of a set of categories a new observation belongs, based on a training set;
– Clusterers for grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups;
– Statistical and Forecasting Analysis for discovering interesting relations between variables and providing information regarding future trends;
– Attribute Selection (evaluators and search methods) algorithms for selecting a subset of relevant features for use in model construction, based on evaluation metrics.
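
To make the classifier category concrete, here is a minimal nearest-centroid classifier in plain Python. It is a generic textbook technique chosen for brevity, with invented training data. It is not one of the LinDA algorithms, which rely on Weka and R.

```python
from collections import defaultdict
from math import dist


def train_centroids(samples):
    """Compute one mean vector (centroid) per class from (features, label) pairs."""
    grouped = defaultdict(list)
    for features, label in samples:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(rows) for column in zip(*rows))
        for label, rows in grouped.items()
    }


def classify(centroids, features):
    """Assign the class whose centroid is closest in Euclidean distance."""
    return min(centroids, key=lambda label: dist(centroids[label], features))


# Hypothetical training set: two numeric features and a class label.
training_set = [
    ((1.0, 1.0), "low"),
    ((1.5, 2.0), "low"),
    ((8.0, 9.0), "high"),
    ((9.0, 8.5), "high"),
]

centroids = train_centroids(training_set)
print(classify(centroids, (2.0, 1.5)))
print(classify(centroids, (7.5, 9.5)))
```

The training set fixes the centroids once, and each new observation is then assigned to a category without retraining.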

Page 8: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Integrated Components

• Based on existing open-source platforms for extracting data analytics.

• Weka open-source tool.

• R project for statistical computing.

• Customized end-user applications for selected business domains, targeted at reducing the overall complexity in the configuration of the algorithms and the preparation and management of the linked datasets.

Page 9: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Analytics Ecosystem Components

Page 10: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Interconnection with LinDA Components

Page 11: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Production of Linked Data Analytics

• Supporting RDF input and output.

• RDF to CSV transformation in the Publication and Consumption Framework.

• Enriched CSV input loaded to the analytics tool: metadata for initiating RDF URI, submitted query, analytics process id, analytics process description, storage options at LinDA repository.

• RDF output available in the LinDA repository.

• Analytics output available for creation of visualisations (where appropriate).
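
One way to picture the RDF to CSV step is sketched below: triples are pivoted into one row per subject, and the metadata fields named above are written as comment lines ahead of the table. The "#" comment convention, the field keys and the sample values are assumptions for illustration only. The slides do not specify how LinDA encodes this metadata.

```python
import csv
import io

# Hypothetical query result: (subject, predicate, object) rows.
TRIPLES = [
    ("ex:station1", "ex:no2", "41.5"),
    ("ex:station1", "ex:pm10", "22.0"),
    ("ex:station2", "ex:no2", "35.2"),
]

# Metadata fields named on the slide; the keys and values are placeholders.
METADATA = {
    "rdf_uri": "http://example.org/analytics/1",
    "query": "SELECT ?s ?p ?o WHERE { ?s ?p ?o }",
    "process_id": "1",
    "process_description": "demo run",
}


def triples_to_csv(triples, metadata):
    """Pivot triples into one row per subject, preceded by '#' metadata lines."""
    predicates = sorted({p for _, p, _ in triples})
    rows = {}
    for s, p, o in triples:
        # A later value for the same subject and predicate overwrites the earlier one.
        rows.setdefault(s, {})[p] = o
    out = io.StringIO()
    for key, value in metadata.items():
        out.write(f"# {key}: {value}\n")
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["subject", *predicates])
    for subject in sorted(rows):
        writer.writerow([subject, *(rows[subject].get(p, "") for p in predicates)])
    return out.getvalue()


csv_text = triples_to_csv(TRIPLES, METADATA)
print(csv_text)
```

Missing values become empty cells, which tabular analytics tools typically read as missing data.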

Page 12: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Elaborated Ontologies

• FOAF Ontology: describes persons and their activities;

• PROV Ontology: represents and interchanges provenance information generated in different systems and under different contexts;

• SIO Ontology: simple upper-level ontology comprising essential types and relations for the rich description of arbitrary (real, hypothesized, virtual, fictional) objects, processes and their attributes.
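
As an illustration of how such vocabularies could describe an analytics run, the sketch below emits a few Turtle statements using real PROV and FOAF terms (prov:Activity, prov:Entity, prov:Agent, prov:used, prov:wasGeneratedBy, prov:wasAssociatedWith, foaf:Person, foaf:name). The ex: namespace, the identifiers and the helper function are hypothetical. The slides do not state how LinDA applies these ontologies, so this is only one plausible modelling, and it assumes the name contains no characters that need escaping.

```python
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/",  # hypothetical namespace
}


def describe_run(process_id, input_id, output_id, user_id, user_name):
    """Return Turtle text linking an analytics run to its input, output and user."""
    header = "\n".join(f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items())
    body = "\n".join(
        [
            f"ex:{process_id} a prov:Activity ;",
            f"    prov:used ex:{input_id} ;",
            f"    prov:wasAssociatedWith ex:{user_id} .",
            f"ex:{input_id} a prov:Entity .",
            f"ex:{output_id} a prov:Entity ;",
            f"    prov:wasGeneratedBy ex:{process_id} .",
            f"ex:{user_id} a prov:Agent, foaf:Person ;",
            f'    foaf:name "{user_name}" .',
        ]
    )
    return header + "\n\n" + body + "\n"


turtle = describe_run("run1", "dataset1", "result1", "analyst1", "Example Analyst")
print(turtle)
```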

Page 13: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Pilots & Business Value

• Business Intelligence Pilot

• Media Analytics Pilot

• Environment Analytics Pilot

• Redesign of current business processes

• Reduction in overall complexity and administration overhead

• Deployment of novel services

• Short Demo

Page 14: 20141030 LinDa Workshop echallenges2014 - Linked Data Analytics

Thank you! Questions?

Anastasios Zafeiropoulos, Senior R&D Architect

Ubitech Ltd. | www.ubitech.eu

info@LinDA-project.eu | LinDA-project.eu | @LinDA_FP7 | LinDAFP7