LOD2 Webinar Series: Zemanta / OpenRefine
DESCRIPTION
This webinar, part of the LOD2 webinar series, presents Zemanta and its LODRefine, a LOD-enabled version of OpenRefine (previously Google Refine) that is part of the LOD2 stack. LODRefine extends the cleansing and linking functionality of OpenRefine by providing means to reconcile and augment your data with DBpedia or any other SPARQL endpoint, extract named entities using the Zemanta API, export data in one of the RDF formats, and, more recently, exploit available crowdsourcing services. In the webinar we will work through several tasks that demonstrate the ease of use and versatility of LODRefine. If you are interested in Linked (Open) Data principles and mechanisms, LOD tools & services, and concrete use cases that can be realised using LOD, then join us in the free LOD2 webinar series: http://lod2.eu/BlogPost/webinar-series
TRANSCRIPT
LOD2 Webinar . 29.11.2011 . Page 1 http://lod2.eu
Creating Knowledge out of Interlinked Data
LOD2 is a large-scale integrating project co-funded by the European Commission within the FP7 Information and Communication Technologies Work Programme. This 4-year project comprises leading Linked Open Data technology researchers, companies, and service providers. The partners come from 12 countries and are coordinated by the Agile Knowledge Engineering and Semantic Web Research Group at the University of Leipzig, Germany. LOD2 will integrate and syndicate Linked Data with existing large-scale applications. The project shows the benefits in the scenarios of Media and Publishing, Corporate Data intranets and eGovernment.
Once per month the LOD2 webinar series offers a free webinar about tools and services along the Linked Open Data Life Cycle. Stay with us and learn more about acquisition, editing, composing, connected applications – and finally publishing Linked Open Data.
LODRefine – LOD-enabled OpenRefine
The tool for cleansing, linking and augmenting data
by Mateja Verlič, Zemanta
Company
Zemanta brings useful content to bloggers, connects authors to their peers and publishers to marketers.
• Content research services
• Content enrichment tools
Our role in LOD2
• Web-scale link & text mining from unstructured data
• Tools for cleansing data and crowdsourcing of cleansing
Dr. Mateja Verlič
Presentation outline
• Terminology briefing
• Introduction to LODRefine
• The core: OpenRefine
• LOD-friendly extensions
• Demonstration
• Q & A
Reconciling
Def: to reconcile
• To reestablish a close relationship between.
• To make compatible or consistent.
(The Free Dictionary)
Augmenting / extending
Def: to augment
• To make (something already developed or well under way) greater, as in size, extent, or quantity.
(The Free Dictionary)
Crowdsourcing
Def: crowdsourcing
• The act of outsourcing tasks, traditionally performed by an employee or contractor, to an undefined, large group of people or community (a crowd), through an open call.
Introduction to LODRefine
LOD-enabled OpenRefine
Google Refine ==> OpenRefine
LODGrefine ==> LODRefine
• Supporting DBpedia (and Freebase)
• Supporting crowdsourcing
• Exporting RDF
• Extracting named entities
LODRefine’s place in LOD life cycle
OpenRefine
Cross-platform server-client application
• Runs locally
• No dataset
Supports:
• Faceted browsing
• Regular expressions
• GREL expressions
• Extensions
value.split(",")[0].strip()
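The expression on the slide, value.split(",")[0].strip(), keeps the text before the first comma and trims the surrounding whitespace. The same chain happens to be valid Python, so its effect can be sketched outside OpenRefine. The function name and the sample values below are illustrative assumptions, not material from the webinar:

```python
def first_field(value: str) -> str:
    """Return the text before the first comma, without surrounding whitespace."""
    return value.split(",")[0].strip()


# Illustrative cell values only.
samples = ["  Ljubljana , Slovenia", "Leipzig, Germany", "no comma here "]
cleaned = [first_field(s) for s in samples]
print(cleaned)
```

A cell without a comma is returned whole (only trimmed), because split then yields a single-element list.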
The Extensions
Extend functionalities of OpenRefine
Developed by
• Zemanta: DBpedia extension, Crowdsourcing
• DERI: RDF Refine
• Free Your Metadata Group: Named Entity Extraction extension
RDF Refine extension
Reconciliation and interlinking
• DBpedia
• Any SPARQL endpoint or RDF dump
• Support for Apache Stanbol
Exporting RDF
• Defining graph shape before exporting
• Using custom vocabularies or importing existing ones
Webpage: http://refine.deri.ie/
GitHub: https://github.com/fadmaa/grefine-rdf-extension
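To make "defining graph shape before exporting" more concrete, here is a minimal sketch of the idea: a mapping from column names to predicates is fixed first, then each row is serialised as N-Triples. This is not the RDF Refine implementation; the column names, the example.org subject IRI and the choice of Dublin Core predicates are assumptions made for illustration.

```python
# Hypothetical "graph shape": column name -> predicate IRI.
SHAPE = {
    "title": "http://purl.org/dc/terms/title",
    "author": "http://purl.org/dc/terms/creator",
}


def escape_literal(text: str) -> str:
    """Escape the characters that N-Triples does not allow raw in a literal."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def row_to_ntriples(subject_iri: str, row: dict, shape: dict = SHAPE) -> list:
    """Serialise one table row as N-Triples lines, following the given shape."""
    triples = []
    for column, predicate in shape.items():
        value = row.get(column)
        if value:  # skip empty or missing cells
            triples.append(
                f'<{subject_iri}> <{predicate}> "{escape_literal(value)}" .'
            )
    return triples


# Illustrative row only.
row = {"title": 'The "Example" Book', "author": "Jane Doe"}
lines = row_to_ntriples("http://example.org/book/1", row)
print("\n".join(lines))
```

Keeping the shape in one mapping means the vocabulary can be swapped without touching the serialisation code.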
RDF Refine extension - reconciling
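Reconciling a cell value against a SPARQL endpoint essentially means asking the endpoint which resources carry a matching label. The sketch below only builds such a query as a string; it is a simplified assumption about what a lookup could look like (exact label match, no scoring or fuzzy matching) and is not taken from the extension's source. The function name and parameters are hypothetical.

```python
def build_label_query(name: str, lang: str = "en", limit: int = 5) -> str:
    """Build a SPARQL query that finds resources whose rdfs:label equals `name`."""
    # Escape backslashes and quotes so the name is a valid SPARQL string literal.
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT DISTINCT ?candidate WHERE {\n"
        f'  ?candidate rdfs:label "{escaped}"@{lang} .\n'
        "}\n"
        f"LIMIT {limit}"
    )


query = build_label_query("Ljubljana")
print(query)
```

The resulting text could be sent to any SPARQL endpoint, for example the public DBpedia one; a real reconciliation service would additionally rank the candidates it gets back.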
DBpedia extension
Extending reconciled data with columns from DBpedia
• RDF extension recommended
Extracting Named Entities using Zemanta API
• API key required
Webpage: http://code.zemanta.com/sparkica
GitHub: https://github.com/sparkica/dbpedia-extension
DBpedia extension – extending data
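Once cells are reconciled to DBpedia resources, extending the data amounts to fetching one property per new column for those resources. A rough sketch of such a query builder follows; it illustrates the principle only and is not the extension's own code. The function name is hypothetical and the two resources and the property are example choices.

```python
def build_extension_query(resource_iris: list, property_iri: str) -> str:
    """Build a SPARQL query returning `property_iri` values for each resource."""
    values = " ".join(f"<{iri}>" for iri in resource_iris)
    return (
        "SELECT ?resource ?value WHERE {\n"
        f"  VALUES ?resource {{ {values} }}\n"
        f"  ?resource <{property_iri}> ?value .\n"
        "}"
    )


# Example inputs only: two reconciled cells and one property to add as a column.
query = build_extension_query(
    [
        "http://dbpedia.org/resource/Ljubljana",
        "http://dbpedia.org/resource/Leipzig",
    ],
    "http://dbpedia.org/ontology/country",
)
print(query)
```

Each row of the result pairs a reconciled resource with a value, which maps directly onto a new column next to the reconciled one.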
DBpedia extension – extracting entities
NER extension
Extracts named entities from unstructured text
Currently supports
• Alchemy API
• DBpedia Lookup
• Zemanta API
API keys required
Webpage: http://freeyourmetadata.org/named-entity-extraction/
GitHub: https://github.com/RubenVerborgh/Refine-NER-Extension
NER extension – extracting entities
Crowdsourcing extension
Support for
• Creating new crowdsourcing jobs
• Publishing data on CrowdFlower service
• Multiple labor channels (Amazon MT)
• CrowdFlower API key required
Job templates
• Evaluating reconciliation results
• Finding information (e.g. URLs)
Webpage: http://code.zemanta.com/sparkica/
GitHub: https://github.com/sparkica/crowdsourcing
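For the "evaluating reconciliation results" template, the data handed to the crowd is essentially a table pairing each original cell value with the candidate link a worker should judge. The sketch below prepares such a table as CSV text. It deliberately makes no call to the CrowdFlower service, and the column names, function name and sample rows are assumptions for illustration, not the extension's actual format.

```python
import csv
import io

# Hypothetical column names for the units a worker would judge.
FIELDS = ["cell_value", "candidate_url"]


def build_job_csv(rows: list) -> str:
    """Serialise (cell value, candidate URL) pairs as CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Placeholder rows only; real rows would come from the reconciled column.
rows = [
    {"cell_value": "Player A", "candidate_url": "http://example.org/resource/Player_A"},
    {"cell_value": "Player B", "candidate_url": "http://example.org/resource/Player_B"},
]
job_csv = build_job_csv(rows)
print(job_csv)
```

One row per judgement keeps the task small, which suits the micro-task style of crowdsourcing described in the terminology briefing.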
Crowdsourcing extension – create job from template
Crowdsourcing extension – upload data
Availability of LODRefine & extensions
Demonstration
Top 50 summer books by Forbes
• Creating project
• Preparing data
• Reconciling, extending data with DBpedia
Reconciliation evaluation for NHL players (links extracted from blogs)
• Create crowdsourcing job from template
• Upload data to CrowdFlower
Thanks for your attention! http://lod2.eu
Contact
Zemanta
Celovska 32, SI-1000 Ljubljana, Slovenia
Presenter
Mateja Verlič
Email: [email protected]
Twitter: @sparkica
Skype: mverlic
LODRefine and extensions – resources
LODRefine
• Webpage: http://code.zemanta.com/sparkica
• GitHub: https://github.com/sparkica/OpenRefine/tree/lodrefine
Extensions
• DBpedia extension: https://github.com/sparkica/dbpedia-extension
• Crowdsourcing extension: https://github.com/sparkica/crowdsourcing
• Refine-stats extension: https://github.com/sparkica/refine-stats
• Utilities extension: https://github.com/sparkica/utilities
Other extensions – resources
RDF extension
• Webpage: http://refine.deri.ie/
• GitHub: https://github.com/fadmaa/grefine-rdf-extension
NER extension
• Webpage: http://freeyourmetadata.org/named-entity-extraction/
• GitHub: https://github.com/RubenVerborgh/Refine-NER-Extension
LOD2 project & Webinars
• LOD2 project: http://lod2.eu
• Webinar series: http://lod2.eu/BlogPost/webinar-series
OpenRefine Resources
• Google Group: https://groups.google.com/forum/#!forum/openrefine
• GitHub: https://github.com/OpenRefine/OpenRefine/
• Wiki: https://github.com/OpenRefine/OpenRefine/wiki
Credits
Jingle: R.E.M., Martin Kaltenböck, Florian Kondert
Coordination: Thomas Thurner, Martin Kaltenböck
Moderation: Martin Kaltenböck
Presented by: Mateja Verlič
Hope you enjoyed staying with us. If you need more detailed information, visit us at www.lod2.eu and let us know how we can improve to meet your expectations!
Don’t forget to register for our next webinars:
26.02.2013 – DBpedia Spotlight (University of Mannheim)
27.03.2013 – CKAN and publicdata.eu (Open Knowledge Foundation)
Have a great day and don’t forget ...