Payola: ESWC 2014 Demo Poster


PAYOLA

Payola is a web framework for analyzing and visualizing Linked Data. It enables users to build their own instances of Linked Data Visualization Model (LDVM) pipelines. Payola provides an LDVM analyzer editor in which SPARQL queries and custom plugins can be combined.

First, the user defines a set of data sources, such as SPARQL endpoints or RDF files, as input data and then connects other plugins to them. Join and Union plugins enable users to analyze a dataset created from multiple datasets stored in separate SPARQL endpoints. It is also possible to transform the results of an analyzer with a custom transformer. When the pipeline is evaluated, the user can choose a visualizer to see the results in various forms. Throughout the LDVM pipeline, all data is RDF, and the user can download the results in the form of an RDF file.

Payola also offers collaborative features. A user can create an analyzer and share it with the rest of the Payola users, which enables them to run the analyzer as well as to create a new analytical plugin based on it. As analytical plugins have parameters that affect their behavior, an analyzer-based plugin may also have parameters, chosen from the parameters of the plugins of the original analyzer. This feature supports the formation of an ecosystem where expert users create analyzers for those who are less experienced. Combining those analyzers into new ones enables even inexperienced users to create a complex analyzer with less effort.
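The parameter derivation can be pictured roughly as follows. This is a minimal sketch only: the class names, the `exposed` mapping, and the evaluation interface are invented for illustration and do not reflect Payola's actual plugin API.

```python
# Illustrative sketch (not Payola's API): an analyzer wrapped as a new plugin
# that re-exposes a chosen subset of the parameters of its inner plugins.

class Plugin:
    def __init__(self, name, parameters):
        self.name = name
        self.parameters = dict(parameters)   # parameter name -> current value

    def evaluate(self, graph):
        raise NotImplementedError


class Analyzer:
    """A chain of plugins evaluated in order."""

    def __init__(self, plugins):
        self.plugins = plugins

    def evaluate(self, graph):
        for plugin in self.plugins:
            graph = plugin.evaluate(graph)
        return graph


class AnalyzerBasedPlugin(Plugin):
    """Wraps a shared analyzer; `exposed` maps new parameter names to
    (inner plugin, inner parameter name) pairs chosen by the author."""

    def __init__(self, name, analyzer, exposed):
        defaults = {new: inner.parameters[param]
                    for new, (inner, param) in exposed.items()}
        super().__init__(name, defaults)
        self.analyzer = analyzer
        self.exposed = exposed

    def evaluate(self, graph):
        # Push the exposed parameter values down to the inner plugins,
        # then run the whole analyzer as a single step.
        for new, value in self.parameters.items():
            inner, param = self.exposed[new]
            inner.parameters[param] = value
        return self.analyzer.evaluate(graph)
```

The author of the derived plugin decides which inner parameters become visible, so less experienced users see only the knobs that matter for their task.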

It is possible to extend Payola with custom plugins for analysis and visualization. For instance, a user can upload the code of a new analytical plugin via our web interface. The framework compiles the code and integrates the created plugin into the application immediately.
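Payola compiles the uploaded code on the server; as a loose analogy only, the sketch below turns an uploaded snippet into a live plugin object at runtime. The names `load_plugin` and `PLUGIN_REGISTRY` are hypothetical and not part of Payola.

```python
# Loose analogy (not Payola's actual mechanism): dynamically turning an
# uploaded code snippet into a registered plugin object at runtime.

PLUGIN_REGISTRY = {}   # hypothetical registry: class name -> plugin class

def load_plugin(source_code, class_name):
    """Execute the uploaded snippet in an isolated namespace and register
    the resulting plugin class under the given name."""
    namespace = {}
    exec(compile(source_code, "<uploaded-plugin>", "exec"), namespace)
    plugin_class = namespace[class_name]
    PLUGIN_REGISTRY[class_name] = plugin_class
    return plugin_class

# Example upload: a trivial analytical plugin that passes its input through.
snippet = """
class PassThroughPlugin:
    def evaluate(self, graph):
        return graph
"""
load_plugin(snippet, "PassThroughPlugin")
```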

The latest Payola version offers a one-click solution for presenting the results of an LDVM pipeline in a chosen visualizer. When an LDVM pipeline is created, it is assigned a unique URL. When a user accesses such a URL, Payola automatically loads the pipeline and creates the desired visualization. To speed things up, it caches analyzer results so that it can serve more users in a shorter time without repeated analysis evaluation.
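A minimal sketch of such a cache, assuming results are keyed by the pipeline's unique URL and expire after a fixed time; the names and the time-to-live are assumptions, not Payola's actual implementation.

```python
# Assumed design: cache analyzer results keyed by the pipeline's unique URL,
# so repeated visits are served without re-running the analysis.

import time

CACHE_TTL_SECONDS = 15 * 60        # hypothetical expiry
_result_cache = {}                 # pipeline URL -> (timestamp, RDF result)

def evaluate_pipeline(url):
    # Stand-in for the real analysis evaluation producing RDF results.
    return f"<results of pipeline {url}>"

def results_for(url):
    cached = _result_cache.get(url)
    if cached and time.time() - cached[0] < CACHE_TTL_SECONDS:
        return cached[1]                       # serve the cached result
    result = evaluate_pipeline(url)            # otherwise evaluate the analyzer
    _result_cache[url] = (time.time(), result)
    return result
```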

JIŘÍ HELMICH (1,2), JAKUB KLÍMEK (3,2), MARTIN NEČASKÝ (1)

1 Charles University in Prague, Faculty of Mathematics and Physics, Malostranské nám. 25, 118 00 Praha 1, Czech Republic, {helmich, necasky}@ksi.mff.cuni.cz

2 University of Economics, Prague, Nám. W. Churchilla 4, 130 67 Praha 3, Czech Republic

3 Czech Technical University in Prague, Faculty of Information Technology, Thákurova 9, 160 00 Praha 6, Czech Republic, [email protected]

LINKED DATA VISUALIZATION MODEL

LDVM is an adaptation of the general Data State Reference Model (DSRM) to the specifics of visualizing RDF and Linked Data. It is an abstract data process inspired by a typical knowledge discovery process. We extend DSRM with three additional concepts: analyzers, transformers, and visualizers. They denote reusable software components that can be chained to form an LDVM instance.

LDVM resembles a pipeline: it starts with raw source data (not necessarily RDF) and ends with a visualization of that data.

It is organized into four stages that the source data needs to pass through. Within the stages, there are operators that allow in-stage data transformations: Analytical SPARQL Operators, Visualization Operators, and View Operators.

http://payola.cz

[Figure: the LDVM pipeline. Stages: Source RDF and non-RDF Data → Analytical RDF Abstraction → Visualization RDF Abstraction → View, connected by the Data Transformation, Visualization Transformation, and Visual Mapping Transformation; Analytical SPARQL Operators, Visualization Operators, and View Operators act within the stages; the labelled components are Analyzer, Visualization Transformer, and Visualizer.]

1. Source RDF and non-RDF data: raw data that can be RDF or adhere to other data models and formats (e.g., XML, CSV), as well as semi-structured or even unstructured data (e.g., HTML pages or raw text).

2. Analytical abstraction: extraction and representation of relevant data in RDF obtained from the source data.

3. Visualization abstraction: preparation of an RDF data structure required by a particular visualization technique (e.g., 1D, 2D, 3D or multi-dimensional data, tree data, etc.).

4. View: creation of a visualization for the end user. Payola uses a variety of third-party JavaScript libraries such as D3.js to create the visualizations.

Data is propagated through the LDVM pipeline by applying three types of transformation operators (a minimal sketch of such a pipeline follows the list):

1. Data transformation: transforms the raw data represented in a source data model or format into a representation in the RDF data model; the result forms the base for creating the analytical RDF abstraction.

2. Visualization transformation: transforms the obtained analytical abstraction into a visualization abstraction.

3. Visual mapping transformation: maps the visualization abstraction data structure to a concrete visual structure on the screen using a particular visualization technique specified by a set of parameters.
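Read as code, an LDVM pipeline instance is a composition of these three transformations between the four stages. The sketch below uses rdflib; the function bodies are illustrative placeholders (it assumes the source is already Turtle and simply copies all triples), not Payola's implementation.

```python
# Illustrative chain of the three LDVM transformations between the four stages.
# The concrete queries and mappings are placeholders, not Payola code.

from rdflib import Graph

def data_transformation(raw_source):
    """Source data -> analytical RDF abstraction (here: parse Turtle input)."""
    g = Graph()
    g.parse(data=raw_source, format="turtle")
    return g

def visualization_transformation(analytical):
    """Analytical abstraction -> visualization abstraction (a SPARQL CONSTRUCT)."""
    query = """
        CONSTRUCT { ?s ?p ?o }
        WHERE     { ?s ?p ?o }
    """
    return analytical.query(query).graph

def visual_mapping_transformation(visualization):
    """Visualization abstraction -> view (plain triples handed to a JS visualizer)."""
    return [(str(s), str(p), str(o)) for s, p, o in visualization]

def run_pipeline(raw_source):
    analytical = data_transformation(raw_source)
    visualization = visualization_transformation(analytical)
    return visual_mapping_transformation(visualization)
```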

[Figure: example LDVM pipelines over Czech Linked Data, organized into columns Source RDF data, Analyzers, Transformers, and Views. Source datasets include ARES Business Entities, COI.CZ, Geocoordinates, Institutions of public power (OVM), Consolidated Law, NUTS codes, LAU regions, Demography, Budgets, Exchange rates, CPV 2008, Elections results, Research projects, Czech Public Contracts, Court decisions, RUIAN, TED Public Contracts, and OVM Agendas; they feed components such as Governmental, Business entities, Geographical, Statistical, COI.CZ, Populated cities, Hierarchy, DataCube, and GEO.]

While experimenting with statistical data, we have encountered Linked Data datasets which contain statistical data but do not use the Data Cube Vocabulary (DCV). Since we have a visualizer using DCV, we implemented a tool capable of mapping non-cube RDF data to a form compliant with DCV, as a plugin usable in LDVM analyzers. While creating a new LDVM analyzer in Payola, a user can also create a new instance of the DCV analytical plugin. On its input, the plugin receives arbitrary RDF data and, based on a user-defined pattern, maps the data to a specified DCV data structure definition. The user is asked to supply a URL containing at least one DCV data structure definition (DSD) in RDF, is presented with a list of the available DSDs, and after selecting one, a new analytical plugin is created for this DSD. This plugin can then be used by other Payola users without the need to specify the URL with the DSD, and it becomes a part of our extensible library of reusable DCV analyzers.
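Collecting the available DSDs from the supplied URL amounts to looking for instances of qb:DataStructureDefinition in the fetched RDF. A small rdflib sketch of this step (not Payola's actual code):

```python
# Sketch: fetch RDF from a user-supplied URL and list the Data Cube data
# structure definitions (DSDs) it contains.

from rdflib import Graph, Namespace, RDF

QB = Namespace("http://purl.org/linked-data/cube#")

def available_dsds(url):
    g = Graph()
    g.parse(url)   # rdflib picks the parser from the response content type
    return sorted(set(g.subjects(RDF.type, QB.DataStructureDefinition)))
```

The user then picks one of the returned DSDs, and a new analytical plugin is created for it.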

To be able to map an arbitrary dataset into a form compliant with DCV, the plugin needs the user to specify the data mapping. Based on DCV, this could be partially automated in the future. The process is based on the query-by-example principle. The plugin shows the user a generic graph visualization of a preview of the input that will be processed by the DCV analytical plugin and lets them select a pattern: step by step, the application asks them to mark a vertex representing one of the dimensions, measures, or attributes of the chosen DSD (red vertices). To narrow down the volume of the results, or to specify more sophisticated patterns, the user can also mark vertices (green ones) that refine the pattern but do not represent any DSD component. Based on the given example, the plugin produces a SPARQL query. When executed against a SPARQL endpoint, it creates new links between existing resources and components of the DSD.
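The generated query could look roughly like the following hypothetical SPARQL update, which types the matched resources as observations and attaches their values to the DSD's components. The ex: properties, the dataset IRI, and the pattern itself are invented; the real query is derived from the red and green vertices the user marked, and the poster does not show its exact form.

```python
# Hypothetical example of the kind of SPARQL the plugin could generate from a
# marked pattern; every ex:* term below is invented for illustration.

GENERATED_QUERY = """
PREFIX qb: <http://purl.org/linked-data/cube#>
PREFIX ex: <http://example.org/ns#>

INSERT {
    ?obs a qb:Observation ;
         qb:dataSet        ex:mappedDataset ;
         ex:refArea        ?area ;     # dimension marked as a red vertex
         ex:populationSize ?value .    # measure marked as a red vertex
}
WHERE {
    ?obs ex:about    ?area ;           # green vertices refine this pattern
         ex:hasValue ?value .
}
"""
```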

The resulting plugin can be used in various ways in an LDVM analyzer. Connected directly to a data source, it works as a filter and transformer which selects only data related to the specified DSD and maps it to DCV at the same time. It can also be beneficial to use the plugin as an inner analytical operator to filter and map processed data, since with DCV the data becomes snowflake-shaped and can be easier to work with in further analytical steps. Or, as the final plugin of an analyzer, it can transform the results of a non-DCV analysis into DCV in the same way a visualization transformation does.

Our approach is based on the idea of describing the expected input of an LDVM component with an input signature and the expected output with an output data sample. The signature and the data sample are provided by the creator of the component. Each component can then check whether its input signature is compatible with the output data sample of the previous component.

The input signature comprises a set of SPARQL ASK queries which should be inexpensive so that they can be evaluated quickly on a number of datasets. The output data sample is a small RDF data sample that shows the format of the output of the component.

The input signature of one component is compatible with the output data sample of another component when all the SPARQL ASK queries of the signature evaluate to true on the data sample. Our rationale is to provide a simple and lightweight solution that allows checking the compatibility of a number of components without complex reasoning.
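A compatibility check of this kind fits in a few lines; the sketch below (an assumed shape, using rdflib) evaluates every ASK query of an input signature against a small Turtle output data sample.

```python
# Assumed shape of the compatibility check: a signature is compatible with a
# data sample when every SPARQL ASK query in it evaluates to true on the sample.

from rdflib import Graph

def is_compatible(input_signature, output_data_sample_ttl):
    """input_signature: iterable of SPARQL ASK query strings;
    output_data_sample_ttl: a small RDF sample serialized as Turtle."""
    sample = Graph()
    sample.parse(data=output_data_sample_ttl, format="turtle")
    return all(sample.query(ask).askAnswer for ask in input_signature)

# Example: a component whose input signature expects Data Cube observations.
signature = [
    "PREFIX qb: <http://purl.org/linked-data/cube#> ASK { ?o a qb:Observation }",
]
sample = """
@prefix qb: <http://purl.org/linked-data/cube#> .
<http://example.org/obs/1> a qb:Observation .
"""
assert is_compatible(signature, sample)
```

Because the data samples are small, the check stays cheap even when one component is tested against many candidate predecessors.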

[Poster panels: output data sample and input signature examples; exploration mode.]