LOD2 Webinar Series: Zemanta / OpenRefine

LOD2 Webinar . 29.11.2011 . Page 1 http://lod2.eu Creating Knowledge out of Interlinked Data

Post on 07-May-2015

DESCRIPTION

This webinar, part of the LOD2 webinar series, presents Zemanta and its LODRefine, a LOD-enabled version of OpenRefine (previously Google Refine) that is part of the LOD2 stack. LODRefine extends the cleansing and linking functionality of OpenRefine by providing means to reconcile and augment your data with DBpedia or any other SPARQL endpoint, extract named entities using the Zemanta API, export data in one of the RDF formats, and, more recently, exploit available crowdsourcing services. In the webinar we will work through several tasks that demonstrate the ease of use and versatility of LODRefine. If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series: http://lod2.eu/BlogPost/webinar-series

TRANSCRIPT

Page 1: LOD2 Webinar Series: Zemanta / Open refine

Page 2: LOD2 Webinar Series: Zemanta / Open refine

LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers. The partners come from 12 countries and are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany. LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project demonstrates the benefits in three scenarios: Media and Publishing, Corporate Data Intranets, and eGovernment.

Page 3: LOD2 Webinar Series: Zemanta / Open refine

Once per month the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle. Stay with us and learn more about acquisition, editing, composing, connected applications, and finally publishing Linked Open Data.

Page 4: LOD2 Webinar Series: Zemanta / Open refine

LODRefine: LOD-enabled OpenRefine
The tool for cleansing, linking and augmenting data
by Mateja Verlic, Zemanta

Page 5: LOD2 Webinar Series: Zemanta / Open refine

Company

Zemanta brings useful content to bloggers, and connects authors to their peers and publishers to marketers.

•  Content research services
•  Content enrichment tools

Our role in LOD2
•  Web scale link & text mining from unstructured data
•  Tools for cleansing data and crowdsourcing of cleansing

Dr. Mateja Verlič

Page 6: LOD2 Webinar Series: Zemanta / Open refine

Presentation outline

•  Terminology briefing
•  Introduction to LODRefine
•  The core: OpenRefine
•  LOD-friendly extensions
•  Demonstration
•  Q & A

Page 7: LOD2 Webinar Series: Zemanta / Open refine

Reconciling

Def: to reconcile
•  To reestablish a close relationship between.
•  To make compatible or consistent.
(The Free Dictionary)
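
In the data-cleaning sense, reconciling means matching a messy cell value to a known entity. The sketch below illustrates the idea with plain string similarity from the Python standard library. It is not how LODRefine or OpenRefine score candidates; the function name, the threshold and the sample titles are assumptions made for illustration.

```python
from difflib import SequenceMatcher


def reconcile(value, candidates, threshold=0.8):
    """Return (best_candidate, score), or (None, score) if nothing is close enough.

    Hypothetical helper: the 0.8 threshold is an arbitrary illustrative choice.
    """
    def normalize(text):
        # Lower-case and collapse runs of whitespace before comparing.
        return " ".join(text.lower().split())

    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, normalize(value), normalize(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    if best_score >= threshold:
        return best, best_score
    return None, best_score


# Illustrative data only.
known_titles = ["The Great Gatsby", "Gone Girl"]
match, score = reconcile("  the great gatsby ", known_titles)
```

Values that fall below the threshold come back as `None`, which corresponds to a cell that still needs a manual decision.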

Page 8: LOD2 Webinar Series: Zemanta / Open refine

Augmenting / extending

Def: to augment
•  To make (something already developed or well under way) greater, as in size, extent, or quantity. (The Free Dictionary)

Page 9: LOD2 Webinar Series: Zemanta / Open refine

Crowdsourcing

Def: crowdsourcing
•  The act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a crowd), through an open call.

Page 10: LOD2 Webinar Series: Zemanta / Open refine

Introduction to LODRefine

LOD-enabled OpenRefine
Google Refine ==> OpenRefine
LODGrefine ==> LODRefine
•  Supporting DBpedia (and Freebase)
•  Supporting crowdsourcing
•  Exporting RDF
•  Extracting named entities

Page 11: LOD2 Webinar Series: Zemanta / Open refine

LODRefine’s place in LOD life cycle

Page 12: LOD2 Webinar Series: Zemanta / Open refine

OpenRefine

Cross-platform server-client application
•  Runs locally
•  No dataset

Supports:
•  Faceted browsing
•  Regular expressions
•  GREL expressions
•  Extensions

Example expression: value.split(",")[0].strip()
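
The expression on this slide, which appears to be a GREL cell transform, reads the same way in Python: split the cell on commas, keep the first piece, and trim the surrounding whitespace. The sketch below applies it to a few invented rows; the function name and the sample data are assumptions, not part of the webinar.

```python
def first_field(value):
    """Keep the text before the first comma, without surrounding whitespace."""
    return value.split(",")[0].strip()


# Illustrative rows only; a real project would take these from a column.
rows = ["Gone Girl , Gillian Flynn", "  Inferno, Dan Brown", "no comma here"]
cleaned = [first_field(row) for row in rows]
print(cleaned)
```

A cell without a comma is returned unchanged apart from trimming, so the transform is safe to apply to a whole column.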

Page 13: LOD2 Webinar Series: Zemanta / Open refine

OpenRefine

Page 14: LOD2 Webinar Series: Zemanta / Open refine

The Extensions

Extend functionalities of OpenRefine

Developed by
•  Zemanta: DBpedia extension, Crowdsourcing
•  DERI: RDF Refine
•  Free Your Metadata Group: Named Entity Extraction extension

Page 15: LOD2 Webinar Series: Zemanta / Open refine

RDF Refine extension

Reconciliation and interlinking
•  DBpedia
•  Any SPARQL endpoint or RDF dump
•  Support for Apache Stanbol

Exporting RDF
•  Defining graph shape before exporting
•  Using custom vocabularies or importing existing ones

Webpage: http://refine.deri.ie/
Github: https://github.com/fadmaa/grefine-rdf-extension
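
To make "defining graph shape before exporting" concrete, the sketch below maps table columns to predicates and serialises one row as N-Triples. It does not use the RDF Refine extension or its file format. The mapping dictionary, the function names and the example.org subject URI are assumptions; the predicates come from the Dublin Core terms vocabulary.

```python
DCT = "http://purl.org/dc/terms/"


def escape_literal(text):
    """Escape the characters that N-Triples does not allow raw inside a literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def row_to_ntriples(subject_uri, row, mapping):
    """Turn one table row into N-Triples lines, following a column-to-predicate mapping."""
    lines = []
    for column, predicate in mapping.items():
        value = row.get(column)
        if not value:
            continue  # skip empty cells instead of emitting empty literals
        if value.startswith(("http://", "https://")):
            obj = "<%s>" % value  # reconciled cell: emit a link to the resource
        else:
            obj = '"%s"' % escape_literal(value)  # plain cell: emit a literal
        lines.append("<%s> <%s> %s ." % (subject_uri, predicate, obj))
    return lines


# Hypothetical graph shape and row.
mapping = {"title": DCT + "title", "author": DCT + "creator", "note": DCT + "description"}
row = {"title": 'The "Big" Book', "author": "http://dbpedia.org/resource/Dan_Brown", "note": ""}
triples = row_to_ntriples("http://example.org/book/1", row, mapping)
```

Cells that hold a URI after reconciliation become links, which is what turns an exported table into interlinked data rather than a list of strings.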

Page 16: LOD2 Webinar Series: Zemanta / Open refine

RDF Refine extension - reconciling

Page 17: LOD2 Webinar Series: Zemanta / Open refine

DBpedia extension

Extending reconciled data with columns from DBpedia
•  RDF extension recommended

Extracting Named Entities using Zemanta API
•  API key required

Webpage: http://code.zemanta.com/sparkica
Github: https://github.com/sparkica/dbpedia-extension
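
Extending reconciled data amounts to asking a SPARQL endpoint for one property of every resource that a column was matched to. The sketch below only builds such a query as a string. It is not the DBpedia extension's own code, and sending the query to an endpoint is deliberately left out. The function name and the example resource and property URIs are assumptions.

```python
def build_extend_query(resource_uris, property_uri):
    """Build a SPARQL SELECT that fetches one property for several resources."""
    uris = list(resource_uris)
    for uri in uris + [property_uri]:
        # Reject characters that would break out of the <...> IRI syntax.
        if any(ch in uri for ch in '<>" \n'):
            raise ValueError("not a plain IRI: %r" % uri)
    values = " ".join("<%s>" % uri for uri in uris)
    return (
        "SELECT ?s ?value WHERE {\n"
        "  VALUES ?s { %s }\n"
        "  ?s <%s> ?value .\n"
        "}" % (values, property_uri)
    )


# Illustrative URIs; any reconciled column and property could be used instead.
query = build_extend_query(
    ["http://dbpedia.org/resource/Gone_Girl_(novel)"],
    "http://dbpedia.org/ontology/author",
)
print(query)
```

Each `?s`/`?value` pair in the result would fill one cell of the new column, keyed by the reconciled resource.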

Page 18: LOD2 Webinar Series: Zemanta / Open refine

DBpedia extension – extending data

Page 19: LOD2 Webinar Series: Zemanta / Open refine

DBpedia extension – extracting entities

Page 20: LOD2 Webinar Series: Zemanta / Open refine

NER extension

Extracts named entities from unstructured text

Currently supports
•  Alchemy API
•  DBpedia Lookup
•  Zemanta API

API keys required

Webpage: http://freeyourmetadata.org/named-entity-extraction/
Github: https://github.com/RubenVerborgh/Refine-NER-Extension

Page 21: LOD2 Webinar Series: Zemanta / Open refine

NER extension – extracting entities

Page 22: LOD2 Webinar Series: Zemanta / Open refine

Crowdsourcing extension

Support for
•  Creating new crowdsourcing jobs
•  Publishing data on CrowdFlower service
•  Multiple labor channels (Amazon MT)
•  CrowdFlower API key required

Job templates
•  Evaluating reconciliation results
•  Finding information (e.g. URLs)

Webpage: http://code.zemanta.com/sparkica/
Github: https://github.com/sparkica/crowdsourcing
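
Once crowd workers have judged reconciliation results, their answers have to be combined into one verdict per row. The sketch below does this with a simple majority vote. It does not call the CrowdFlower API and is not taken from the crowdsourcing extension; the function name, the 0.6 agreement threshold and the sample judgments are assumptions.

```python
from collections import Counter, defaultdict


def aggregate_judgments(judgments, min_agreement=0.6):
    """Combine (row_id, answer) pairs into one answer per row by majority vote.

    Rows whose top answer falls below min_agreement get None, meaning
    "needs another look". The threshold is an illustrative choice.
    """
    votes = defaultdict(list)
    for row_id, answer in judgments:
        votes[row_id].append(answer)

    results = {}
    for row_id, answers in votes.items():
        answer, count = Counter(answers).most_common(1)[0]
        agreement = count / len(answers)
        results[row_id] = (answer if agreement >= min_agreement else None, agreement)
    return results


# Hypothetical judgments: three workers on row 1, two on row 2.
judgments = [(1, "correct"), (1, "correct"), (1, "wrong"),
             (2, "correct"), (2, "wrong")]
results = aggregate_judgments(judgments)
```

Row 1 is accepted because two of three workers agree, while row 2 is left undecided because its votes are split evenly.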

Page 23: LOD2 Webinar Series: Zemanta / Open refine

Crowdsourcing extension – create job from template

Page 24: LOD2 Webinar Series: Zemanta / Open refine

Crowdsourcing extension – upload data

Page 25: LOD2 Webinar Series: Zemanta / Open refine

Availability of LODRefine & extensions

Page 26: LOD2 Webinar Series: Zemanta / Open refine

Availability of LODRefine & extensions

Page 27: LOD2 Webinar Series: Zemanta / Open refine

Demonstration

Top 50 summer books by Forbes
•  Creating project
•  Preparing data
•  Reconciling, extending data with DBpedia

Reconciliation evaluation for NHL players (links extracted from blogs)
•  Create crowdsourcing job from template
•  Upload data to CrowdFlower

Page 28: LOD2 Webinar Series: Zemanta / Open refine

Thanks for your attention! http://lod2.eu

Contact
Zemanta, Celovska 32, SI-1000 Ljubljana, Slovenia

Presenter
Mateja Verlic
Email: [email protected]
Twitter: @sparkica
Skype: mverlic

LODRefine and extensions – resources

LODRefine
Webpage: http://code.zemanta.com/sparkica
Github: https://github.com/sparkica/OpenRefine/tree/lodrefine

Extensions
DBpedia extension: https://github.com/sparkica/dbpedia-extension
Crowdsourcing extension: https://github.com/sparkica/crowdsourcing
Refine-stats extension: https://github.com/sparkica/refine-stats
Utilities extension: https://github.com/sparkica/utilities

Other extensions – resources

RDF extension
Webpage: http://refine.deri.ie/
Github: https://github.com/fadmaa/grefine-rdf-extension

NER extension
Webpage: http://freeyourmetadata.org/named-entity-extraction/
Github: https://github.com/RubenVerborgh/Refine-NER-Extension

LOD2 project & Webinars
LOD2 project: http://lod2.eu
Webinar series: http://lod2.eu/BlogPost/webinar-series

OpenRefine Resources
Google Group: https://groups.google.com/forum/#!forum/openrefine
Github: https://github.com/OpenRefine/OpenRefine/
Wiki: https://github.com/OpenRefine/OpenRefine/wiki

Page 29: LOD2 Webinar Series: Zemanta / Open refine

Credits
Jingle: R.E.M., Martin Kaltenböck, Florian Kondert
Coordination: Thomas Thurner, Martin Kaltenböck
Moderation: Martin Kaltenböck
Presented by: Mateja Verlič

Page 30: LOD2 Webinar Series: Zemanta / Open refine

Hope you enjoyed staying with us. If you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations!

Don't forget to register for our next webinars:
•  26.02.2013: DBpedia Spotlight (University of Mannheim)
•  27.03.2013: CKAN and publicdata.eu (Open Knowledge Foundation)

Have a great day and don't forget ...

Page 31: LOD2 Webinar Series: Zemanta / Open refine
