WordLift for Digital Publishers and how to create an Open Database of Knowledge

Andrea Volpini @cyberandy @multilingweb - Dipartimento di Informatica, Sapienza Università di Roma 6th July 2015 WordLift for Digital Publishers

Upload: andrea-volpini

Post on 06-Aug-2015


TRANSCRIPT

Page 1: WordLift for Digital Publishers and how to create an Open Database of Knowledge

Andrea Volpini @cyberandy

@multilingweb - Dipartimento di Informatica, Sapienza Università di Roma 6th July 2015

WordLift for Digital Publishers

Page 2

Hello, I am: @cyberandy

This fine event is hosted by: @multilingweb // LIDER

This workshop is about: future of journalism, opendata, @wordliftit v3, @mico_project

No.8 - MARK ROTHKO

Page 3

Meet Your Audience

Page 4

Some are humans and some… are not.

Astro Boy Comic

Page 5

“Hi Stacey! Would you like me to read your favourite news?”

Page 6

“OK Hound, when will the sun rise in Japan two days before Christmas in 2021?”

Friendly, helpful and intelligent: a completely new class of voice-enabled assistants has just arrived.

Page 7

(Humans)… creating value with News

• BANKS & INVESTORS: ANTI-MONEY LAUNDERING COMPLIANCE AND INVESTMENT STRATEGIES

• LAW FIRMS: CHECKING IF THERE ARE ONGOING OR PAST LEGAL PROCEEDINGS

• POLICY MAKERS: NEWS AS VALUABLE INPUT IN THE LAW MAKING PROCESS

• BUSINESS: CREATING BUSINESS VALUE AND TAKING DECISIONS BY READING NEWS

Beta Testing the Apocalypse - TOM KACZYNSKI

Page 8

Meet Your New Colleagues

Page 9

Text Generation Algorithms can interpret your data and turn it into meaningful, personalised content.

The Associated Press announced last year that its corporate earnings stories and sports stories are written automatically.

Logan Ingalls / Flickr

Page 10

Analysts expect higher profit for Paychex when the company reports its fourth quarter results on Tuesday, July 1, 2014. The consensus estimate is calling for profit of 40 cents a share, reflecting a rise from 38 cents per share a year ago.

Your New Colleague…the Algorithm has just written a new piece.
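A piece like the Paychex preview above can be produced by filling a sentence template from structured data. The following is a minimal sketch of that idea only; it is not the system used by the Associated Press, and the `EarningsPreview` fields and the `render_preview` function are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class EarningsPreview:
    """Structured input for one story; all field names are illustrative."""
    company: str
    quarter: str          # e.g. "fourth"
    report_day: str       # e.g. "Tuesday, July 1, 2014"
    estimate_cents: int   # consensus earnings-per-share estimate
    year_ago_cents: int   # earnings per share one year earlier


def render_preview(p: EarningsPreview) -> str:
    """Fill a fixed sentence template, choosing the wording from the data."""
    if p.estimate_cents > p.year_ago_cents:
        outlook, change = "higher", "a rise from"
    elif p.estimate_cents < p.year_ago_cents:
        outlook, change = "lower", "a decline from"
    else:
        outlook, change = "flat", "no change from"
    return (
        f"Analysts expect {outlook} profit for {p.company} when the company "
        f"reports its {p.quarter} quarter results on {p.report_day}. "
        f"The consensus estimate is calling for profit of {p.estimate_cents} "
        f"cents a share, reflecting {change} {p.year_ago_cents} cents per "
        f"share a year ago."
    )


# Figures taken from the example story on this slide
paychex = EarningsPreview("Paychex", "fourth", "Tuesday, July 1, 2014", 40, 38)
print(render_preview(paychex))
```

The template is fixed and only the figures and a few word choices vary, which is why this kind of story is a natural fit for automation.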

Page 11

but remember… you still are

“Uniquely Human”

Pay a visit to http://nextdraft.com/

Page 12

“If our role as journalists is to help communities better organize their knowledge and themselves, then it is apparent that we are in the service business and that we must draw on many tools, including content, and place value on the relationships we build with members of our communities, which will also take many forms. Thus we are in the relationship business.”

Jeff Jarvis

Human Factor is key!

Page 13

Introducing

Page 14

MEANINGFULLY ORGANISE YOUR CONTENT

A Semantic Editor for WordPress for journalists and bloggers to:

ASSIST THE WRITING PROCESS WITH CONTEXTUAL INFORMATION

ADD STRUCTURED METADATA

ENRICH CONTENT SUGGESTING IMAGES, LINKS AND WIDGETS

RECOMMEND RELEVANT CONTENT TO READERS

BUILD AN OPEN DATASET (ENTITIES + ANNOTATIONS + CONTENT)

Page 15

ASSIST THE WRITING PROCESS WITH CONTEXTUAL INFORMATION

Fact-based information is derived from open datasets and is contextually relevant to the article. Editors can choose which datasets are used for the enrichment.

Page 16

ENRICH CONTENT SUGGESTING IMAGES, LINKS AND WIDGETS

Relevant, free-to-use photos and illustrations from the Commons community.

Meaningful navigation systems for internal interlinking.

Page 17

RECOMMEND RELEVANT CONTENT

Bringing to the audience an overview of all the content being written around a specific topic using the chord widget.

Content evolution over time

INTRODUCING THE NAVIGATOR WIDGET

Creates links to entity pages and related articles by using the WHO, WHERE, WHAT and WHEN classifications.

type: BlogPosting /2015/07/04/coopers-endurance-crew/

• WHO /entity/michael-caine (schema:Person)

• WHO /entity/nasa (schema:Organization)

• WHERE /entity/earth (schema:Place)
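One way to picture how such a widget could pick related articles is to rank other posts by the entities they share with the current one. The sketch below assumes a simple in-memory mapping from posts to classified entities; it is not WordLift's implementation, and every post other than the one shown on the slide is invented for illustration.

```python
from collections import defaultdict

# Hypothetical annotations: post path -> {classification: set of entity paths}.
# The first post mirrors the slide; the other two are invented examples.
ANNOTATIONS = {
    "/2015/07/04/coopers-endurance-crew/": {
        "WHO": {"/entity/michael-caine", "/entity/nasa"},
        "WHERE": {"/entity/earth"},
    },
    "/2015/06/20/example-post-about-nasa/": {
        "WHO": {"/entity/nasa"},
    },
    "/2015/05/02/example-post-about-rome/": {
        "WHERE": {"/entity/rome"},
    },
}


def related_posts(post, annotations):
    """Rank the other posts by how many entities they share with `post`."""
    mine = set().union(*annotations[post].values())
    shared = defaultdict(set)
    for other, classes in annotations.items():
        if other == post:
            continue
        for entities in classes.values():
            shared[other] |= entities & mine
    ranked = [(p, sorted(e)) for p, e in shared.items() if e]
    # Most shared entities first; ties broken by post path for stable output
    return sorted(ranked, key=lambda item: (-len(item[1]), item[0]))


for path, entities in related_posts("/2015/07/04/coopers-endurance-crew/", ANNOTATIONS):
    print(path, "shares", entities)
```

Because the links come from shared entities rather than free-text tags, two posts are connected only when they mention the same WHO, WHERE, WHAT or WHEN.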

Page 18

ADD STRUCTURED METADATA

The blog post, its entities (dct:references), publishing information (schema:datePublished and schema:dateModified), the author (schema:author), and the number of comments (schema:interactionCount) are published as Linked Open Data and embedded in the page as schema.org markup for on-page SEO.

http://data.redlink.io/91/be2/post/Interstellar.html
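To make the list of properties concrete, here is a small sketch that assembles them into a JSON-LD document. The serialisation actually used by WordLift is not stated on the slide, so JSON-LD is an assumption, as are the `blog_post_jsonld` helper and all the example values (URLs, author name, dates, comment count).

```python
import json


def blog_post_jsonld(url, headline, author, published, modified,
                     comments, entity_uris):
    """Describe a blog post with the properties named on the slide."""
    return {
        "@context": {
            "schema": "http://schema.org/",
            "dct": "http://purl.org/dc/terms/",
        },
        "@id": url,
        "@type": "schema:BlogPosting",
        "schema:headline": headline,
        "schema:author": {"@type": "schema:Person", "schema:name": author},
        "schema:datePublished": published,
        "schema:dateModified": modified,
        # interactionCount takes a text value such as "UserComments:5"
        "schema:interactionCount": f"UserComments:{comments}",
        # Entities mentioned in the post, linked with dct:references
        "dct:references": [{"@id": uri} for uri in entity_uris],
    }


# All values below are placeholders for illustration
doc = blog_post_jsonld(
    url="http://example.org/post/interstellar",
    headline="Interstellar",
    author="Example Author",
    published="2015-07-04",
    modified="2015-07-05",
    comments=5,
    entity_uris=["http://example.org/entity/wormhole"],
)
print(json.dumps(doc, indent=2))
```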

Page 19

BUILD AN OPEN DATASET (ENTITIES + ANNOTATIONS + CONTENT)

Editors identify the basic 'WHO, WHAT, WHEN and WHERE' of an article and structure information around it by creating new entities in their custom vocabulary. Content, vocabulary and annotations constitute the publisher’s knowledge graph and can be queried via SPARQL.
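As an illustration of querying such a knowledge graph, the sketch below builds a SPARQL query for "all posts that reference a given entity" and encodes it as a GET request, as allowed by the SPARQL 1.1 Protocol. The endpoint address is a placeholder, the `posts_about` helper is hypothetical, and the use of dct:references and schema:datePublished assumes the dataset follows the vocabulary named elsewhere in these slides. No network request is made.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Placeholder endpoint; the real address depends on where the dataset is hosted
ENDPOINT = "https://example.org/sparql"

QUERY_TEMPLATE = """\
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX schema: <http://schema.org/>
SELECT ?post ?published WHERE {{
  ?post dct:references <{entity}> ;
        schema:datePublished ?published .
}}
ORDER BY DESC(?published)
"""


def posts_about(entity_uri):
    """Return a GET URL that asks the endpoint for posts about an entity."""
    # Reject characters that could break out of the <...> IRI in the query
    if any(ch in entity_uri for ch in '<>"{}|\\^`') or any(ch.isspace() for ch in entity_uri):
        raise ValueError(f"not a safe IRI: {entity_uri!r}")
    query = QUERY_TEMPLATE.format(entity=entity_uri)
    return ENDPOINT + "?" + urlencode({"query": query})


url = posts_about("http://example.org/entity/nasa")
print(url)
# The query can be recovered from the URL, which shows it was encoded intact
print(parse_qs(urlsplit(url).query)["query"][0])
```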

Page 20

How does a blog post look in the knowledge graph?

Special thanks to @dvcama :)

owl:sameAs connects entities detected in the blog post, such as Wormhole, with the same entity on DBpedia and Freebase.
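The effect of these links can be sketched with a handful of triples. In the example below only the DBpedia URI follows a real naming pattern; the local entity URI and the Freebase identifier are placeholders, and `same_as_closure` is a hypothetical helper, not part of any library.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Placeholder URIs for the publisher's own entity and for Freebase
LOCAL = "http://data.example.org/entity/wormhole"
DBPEDIA = "http://dbpedia.org/resource/Wormhole"
FREEBASE = "http://freebase.example.org/placeholder-id"

TRIPLES = [
    (LOCAL, OWL_SAME_AS, DBPEDIA),
    (LOCAL, OWL_SAME_AS, FREEBASE),
]


def same_as_closure(uri, triples):
    """Collect every URI linked to `uri` through owl:sameAs.

    owl:sameAs is symmetric and transitive, so links are followed in both
    directions until no new URI is found.
    """
    seen, stack = {uri}, [uri]
    while stack:
        current = stack.pop()
        for subject, predicate, obj in triples:
            if predicate != OWL_SAME_AS:
                continue
            for a, b in ((subject, obj), (obj, subject)):
                if a == current and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return seen - {uri}


# Starting from DBpedia we reach the local entity and, through it, Freebase
print(sorted(same_as_closure(DBPEDIA, TRIPLES)))
```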

Page 21

Let’s move now to a real-world use case where ecologists, journalists and visionaries stand to defend the natural world and to promote peace.

Starting this coming September, WordLift and the technologies of MICO (for cross-media analysis) are going to be used and validated by Greenpeace Italy on their subscribers' magazine website (magazine.greenpeace.it).

Page 22

Technology Stack

Text, Legacy Data, Audio/Images

1. CONTENT ANALYSIS

2. CONTENT DISCOVERY

3. LINKED DATA PUBLISHING

• Enterprise Linked Data

• Content Analysis

• Semantic Search

• Semantic Media Analysis and Search

MICO is a three-year EU-funded research project (grant no. 610480) that brings to the platform:

• Cross-Media Extraction

• Cross-Media Metadata Publishing

• Cross-Media Querying

• Cross-Media Recommendation

Media extractors available in MICO today: animal detection, video quality, temporal segmentation, automatic speech recognition, speech-music discrimination, face detection and audio tampering detection.

Page 23

Multimedia Retrieval Cross-Media Querying: Introducing the SPARQL extension SPARQL-MM, which adds multimedia specific features to the standard query language for the Semantic Web.

How can we help Greenpeace Italy?

• Connect videos with text using cross-media recommendations

• Provide compact contextual information for media assets

• Create new discovery paths for their readers and subscribers

Spatio-Temporal Object Model in SPARQL-MM

“Point me to scenes within videos where Barack Obama is standing to the left of the MD of Greenpeace while talking about whale hunting”

Find out more about the SPARQL extension SPARQL-MM in this presentation by Thomas Kurz
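The query quoted above combines a spatial relation ("to the left of") with a temporal one ("while"). The sketch below illustrates those two relations on annotated video fragments in plain Python; it does not use SPARQL-MM syntax or function names, and the `Fragment` type, the labels and the sample coordinates are all assumptions made for illustration. The spoken topic would be a third condition and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fragment:
    """A labelled region of a video: a horizontal extent plus a time span."""
    label: str
    x: float       # left edge of the bounding box, in pixels
    width: float   # box width, in pixels
    start: float   # seconds
    end: float     # seconds


def left_of(a, b):
    """True when a's box ends before b's box begins on the x axis."""
    return a.x + a.width <= b.x


def overlap(a, b):
    """Shared time span of two fragments, or None if they never coincide."""
    start, end = max(a.start, b.start), min(a.end, b.end)
    return (start, end) if start < end else None


def scenes(fragments, left_label, right_label):
    """Time spans where `left_label` appears to the left of `right_label`."""
    found = []
    for a in fragments:
        for b in fragments:
            if a.label == left_label and b.label == right_label and left_of(a, b):
                span = overlap(a, b)
                if span:
                    found.append(span)
    return sorted(found)


# Invented detections for two people in one video
FRAGMENTS = [
    Fragment("person-a", x=100, width=80, start=10.0, end=25.0),
    Fragment("person-b", x=300, width=90, start=20.0, end=40.0),
    Fragment("person-b", x=20, width=50, start=50.0, end=60.0),
    Fragment("person-a", x=100, width=80, start=52.0, end=58.0),
]
print(scenes(FRAGMENTS, "person-a", "person-b"))
```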

Page 24

Lessons learned so far…

• The bond between data and journalism is growing stronger, and even for an independent news organisation like Greenpeace, providing context and clarity and building relationships (and knowledge graphs) is vital

• Algorithms are great and AI has entered the newsrooms, but journalists should preserve their authorship and role when crafting content: always leave control in the hands of humans

• Providing immediate added value in the UX of semantic apps like WordLift is key to engaging journalists, not only marketers and management

• Tags don’t help organise content; named entities are much better

• Linked Data is a service, NOT a technology: users want to see images, meaningful links, recommendations and interactive widgets - they don’t care about underlying technologies like RDF and SPARQL

• Creating datasets as a side effect of editing content helps journalists make an impact and connect with policy makers, businesses and other communities.

Page 25

JOIN.WORDLIFT.IT

Thank you!

“[SLIDES] Creating an open database of knowledge by tagging the WHO, WHAT, WHERE, WHEN of your contents #journalism”

mico-project.eu wordlift.it insideout.io

Page 26

CREDITS

This presentation is the result of many inspiring ideas and amazing work from media experts, journalists and technologists. Here is the list:

Wilfried Runde of Deutsche Welle, “In Praise of Robots and Humans”

Justin Kosslyn from Google Ideas, on thinking about how journalists' work gets used

Luca Rosati from News to Experience

BBC News Labs, A manifesto for structured journalism

Any idea, graphic or meme belonging to us is available for sharing, copying and remixing under the Creative Commons license 3.0.

This presentation and the work behind it were partially developed within the MICO project (Media in Context - European Commission 7th Framework Programme, grant agreement no: 610480).

FIND OUT MORE ABOUT OUR PRODUCTS

Video Hosting Platform, Semantic Editor, Semantic Search