Building the Inform Semantic Publishing Ecosystem: from Author to Audience

Upload: vitalai

Post on 09-May-2015

DESCRIPTION

ISWC Presentation for Inform Technologies

TRANSCRIPT

Page 1: Building the Inform Semantic Publishing Ecosystem: from Author to Audience

More Meaning. Better Results.

Building the Inform Semantic Publishing Ecosystem: from Author to Audience

Marc Hadfield
VP, Research & Development
[email protected]

Page 2

Marc Hadfield

• Semantic Technology, Computer Science

• Inform Technologies (Head of R&D)
‣ Semantic Technologies applied to Content Analysis & Distribution

• Alitora Systems (Co-Founder / CTO)
‣ Life Science Semantic Technology, Research, Big Data Analytics, Semantic HPC
‣ Life Science Natural Language Processing

• Columbia Genome Center
‣ NLP applied to Life Science Research Articles

• LCconnect (CTO)
‣ Letter-of-Credit Exchange

Page 3

Semantics in Publishing…

• Ongoing Theme at ISWC 2010…
‣ NY Times
‣ Facebook (OpenGraph)
‣ Elsevier
‣ BBC

Page 4

What is Inform?

• Inform is a content enrichment solution designed to increase consumer engagement, page views and revenue.

• We provide a hosted Semantic Web Service for content publishers that:
1. Reads your article before you publish it
2. Turns main topics and entities (people, places, companies, organizations) into links
3. Provides feeds of related web content when you publish it

• New Direction: Optimizing Content Distribution via Direct Channels
‣ Web users are moving away from destination web sites, but still want the destination web site content.

• Companies utilizing Inform include:

Page 5

Connecting your content

[Diagram labels: Audio, Video & Blogs from the Web; Articles from the Web; Content from Inform; Your Affiliates’ Content; Your Content; Affiliated Content; Licensed Content]

Entity                       Type      Score
Google Street View           Topic     0.90
Google                       Company   1.00
Ireland                      Place     0.70
Norway                       Place     0.70
South Africa                 Place     0.70
Sweden                       Place     0.70
Brian McClendon              Person    0.80
Mountain View, California    Place     0.60
Wi-Fi                        Topic     0.50
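
The entity list above, together with the "turns main topics and entities into links" step, can be sketched roughly as follows. This is a minimal illustration and not Inform's implementation: the `Annotation` class, the `select` and `link_entities` helpers, the reading of the number as a relevance score, the threshold, and the `example.com` URL pattern are all assumptions made for the example. Only the labels, types and scores come from the slide.

```python
import re
from dataclasses import dataclass
from urllib.parse import quote


@dataclass(frozen=True)
class Annotation:
    label: str    # surface form of the topic or entity
    kind: str     # "Topic", "Company", "Place" or "Person"
    score: float  # number shown on the slide (assumed to be a relevance score)


# Labels, types and scores are copied from the slide.
ANNOTATIONS = [
    Annotation("Google Street View", "Topic", 0.90),
    Annotation("Google", "Company", 1.00),
    Annotation("Ireland", "Place", 0.70),
    Annotation("Norway", "Place", 0.70),
    Annotation("South Africa", "Place", 0.70),
    Annotation("Sweden", "Place", 0.70),
    Annotation("Brian McClendon", "Person", 0.80),
    Annotation("Mountain View, California", "Place", 0.60),
    Annotation("Wi-Fi", "Topic", 0.50),
]


def select(annotations, min_score):
    """Keep annotations scoring at least min_score, highest score first."""
    kept = [a for a in annotations if a.score >= min_score]
    return sorted(kept, key=lambda a: a.score, reverse=True)


def link_entities(text, annotations, base_url="https://example.com/topic/"):
    """Wrap each annotated label found in text in a link (hypothetical URL scheme)."""
    # Longest labels first, so "Google Street View" wins over "Google".
    labels = sorted((a.label for a in annotations), key=len, reverse=True)
    if not labels:
        return text
    pattern = re.compile("|".join(re.escape(label) for label in labels))
    return pattern.sub(
        lambda m: f'<a href="{base_url}{quote(m.group(0))}">{m.group(0)}</a>',
        text,
    )
```

A single regex pass is used so that text inside an already generated link is never matched a second time.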

Page 6

Related Content Widgets

Page 7

Inform Topic Pages, Micro Sites

Page 8

My Job: Building the Semantic Platform…

• “Silo”-ed Semantic Technology → Semantic Web
‣ Aligned with Wikipedia, Leverage Linked Data for Mash-Ups
‣ RDFa, SKOS, Semantic SEO

• Semantic / NLP Engine
‣ Improve Features, Quality

• Semantic Data Infrastructure
‣ Scalable Infrastructure

• Semantic Data Analysis
‣ Algorithms (Topology of Graphs), Inference
‣ “PageRank” on semantic data

• Personalization, Usage Analysis

• Micro Sites
‣ Clusters of Topics, Generating Rich Content Experience

• Distributing to Social Platforms
‣ i.e. Facebook

Page 9

Inform: Author to Audience

Page 10

Leverage Inform Taxonomy

Page 11

Inside the Semantic System Architecture

Author
‣ Content Creation Services
‣ Semantic Data Repository
‣ Semantic Data Analysis
‣ Content Selection Algorithms
‣ Webservices
‣ Content Distribution Services
Audience

Page 12

Content Creation

• Article Creation Tool (ACT)
‣ Author Tools
‣ Embed in CMS, Tumblr / WordPress Plugin

• Publisher Portal
‣ Editorial Tool
‣ Content Feeds

• Web Crawl

• Summarizer
‣ Create smart “blurbs” to advertise article

• LinkedData
‣ Freebase, Wikipedia, DBpedia, et cetera.

Page 13

ACT Tool

Page 14

ACT Tool

Page 15

ACT Tool, Tumblr, WordPress

Page 16

Publisher Portal

Page 17

Summarizer

Page 18

Semantic Data Repository

• Data Master / Data Node
‣ Federated Semantic Data Managers
‣ SPARQL Triplestore (scalable cluster)
‣ Semantic Search
‣ Search Indexes (Semi-Structured and Full-Text Search)
‣ Lucene/Siren (Sindice)
‣ Facets, Frequency Counts
‣ Cache (In-Memory)
‣ Blob Store (Voldemort)
‣ Listener to Activity (Flume)
‣ User Activity (clicks)
‣ Content Activity (content updates)
‣ Near Real-Time Trends, Analysis
‣ Compute Algorithms (Stored Procedures in Groovy)
‣ Long Term Content Archive (offline)
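
As a rough illustration of two ideas on this slide, pattern queries over triples and facet frequency counts, here is a toy in-memory stand-in. It is only a sketch: the slide names a SPARQL triplestore and Lucene/Siren indexes, neither of which is used here, and the `TripleStore` class, its method names, and the sample article identifiers are hypothetical.

```python
from collections import Counter


class TripleStore:
    """Toy in-memory triple store; a stand-in, not a SPARQL engine."""

    def __init__(self):
        self._triples = set()

    def add(self, subject, predicate, obj):
        self._triples.add((subject, predicate, obj))

    def match(self, subject=None, predicate=None, obj=None):
        """Return sorted triples matching the pattern; None is a wildcard."""
        return sorted(
            t for t in self._triples
            if (subject is None or t[0] == subject)
            and (predicate is None or t[1] == predicate)
            and (obj is None or t[2] == obj)
        )

    def facet(self, predicate):
        """Count how often each object value occurs for a predicate."""
        return Counter(o for _, p, o in self._triples if p == predicate)


def build_sample_store():
    """Hypothetical sample data: two articles and the entities they mention."""
    store = TripleStore()
    store.add("article:1", "mentions", "Google")
    store.add("article:1", "mentions", "Ireland")
    store.add("article:2", "mentions", "Google")
    store.add("article:2", "mentions", "Wi-Fi")
    return store
```

Calling `build_sample_store().facet("mentions")` yields per-entity counts of the kind a faceted search interface would display next to each filter.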

Page 19

Semantic Data Analysis

• Natural Language Processing
‣ Rules & Machine Learning, Training
‣ 500K articles per day, 4,000 unique sites
‣ Text Extraction, Section/Sentence Extraction
‣ Tokenization, Part-of-Speech, Noun/Verb Phrases
‣ Entity Extraction, Entity Normalization
‣ Topic Extraction, Summarization, Clustering

• User Activity
‣ User Model (Personalization)

• Semantic Inference
‣ F-Logic, Multi-Domain
‣ Linked Data Mash-Ups

• Semantic Graph Topology
‣ Entity / Property Importance Metrics, Ranking, “PageRank”
‣ Which triples in LinkedData are interesting?
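
The “PageRank” idea for ranking entities in a semantic graph can be sketched with the standard power-iteration form of PageRank. The slide does not say which variant or parameters Inform uses, so the damping factor, the iteration count, and the sample edges between entities below are assumptions chosen only for illustration.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over directed (source, target) edges."""
    nodes = sorted({node for edge in edges for node in edge})
    outgoing = {node: [] for node in nodes}
    for source, target in edges:
        outgoing[source].append(target)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly.
        dangling = sum(rank[node] for node in nodes if not outgoing[node])
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n
            for node in nodes
        }
        for node in nodes:
            for target in outgoing[node]:
                new_rank[target] += damping * rank[node] / len(outgoing[node])
        rank = new_rank
    return rank


# Hypothetical "refers to" edges between entities, for illustration only.
SAMPLE_EDGES = [
    ("Brian McClendon", "Google"),
    ("Google Street View", "Google"),
    ("Wi-Fi", "Google Street View"),
    ("Google", "Mountain View, California"),
]
```

In this toy graph, entities that many others point to (such as "Google") end up ranked above entities with no incoming edges (such as "Wi-Fi"), which is the sense in which graph topology can hint at which nodes are interesting.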

Page 20

Content Selection Algorithms

• Model of User, Personalization
‣ Social Networks provide Context

• Semantic Analysis of Content

• Algorithms
‣ Maximize Relevancy / Relatedness (Meets Editorial Criteria)
‣ Maximize Click-Through
‣ Cute Kitten vs. Engagement Issue
‣ Maximize Monetization

Goal: Content Exchange
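
One way to read the tension between these objectives is that ranking by click-through alone would surface the "cute kitten" item regardless of relevance. The sketch below shows one possible way to handle that: require a minimum relevance first, then rank what remains by expected revenue. The `Candidate` fields, the threshold, the scoring formula, and the sample numbers are all assumptions for illustration, not the algorithm described in the talk.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    title: str
    relevance: float          # 0..1, relatedness to the current article
    click_rate: float         # estimated click-through rate
    revenue_per_click: float  # estimated revenue for one click


def select_content(candidates, min_relevance=0.5, limit=3):
    """Drop items below the relevance gate, then rank by expected revenue."""
    eligible = [c for c in candidates if c.relevance >= min_relevance]
    ranked = sorted(
        eligible,
        key=lambda c: c.click_rate * c.revenue_per_click,
        reverse=True,
    )
    return ranked[:limit]


# Hypothetical candidates with made-up numbers.
SAMPLE_CANDIDATES = [
    Candidate("Cute kitten video", 0.1, 0.30, 0.02),
    Candidate("Street View privacy ruling", 0.9, 0.05, 0.10),
    Candidate("Wi-Fi data collection explained", 0.7, 0.08, 0.05),
]
```

With these sample values the kitten video has the highest click rate but never reaches the ranking step, because it fails the relevance gate.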

Page 21

Webservices

• REST
‣ Outputs RDF / JSON Data

• Natural Language Processing
‣ Article to Semantic MetaData

• Related Content
‣ Inputs: Content, Personalization, Algorithm
‣ Articles
‣ Semantic Mash-Ups
‣ Topics
‣ Entities

• Semantic Query, Site Search

• Storage, Content Repository

Page 22

Content Distribution Services

• Customer Destinations (Traditional Business)
‣ Deep Integration

• Publisher Widgets
‣ Levels of Lightweight Integration
‣ Example: Related-Content-Widget in JavaScript

• Inform.com
‣ Topic Pages

• Micro Sites
‣ Several Thousand Owned-and-Operated Domains/Sites, Topic Driven

• Social Networks
‣ Facebook

Tools:

• Semantic SEO
‣ RDFa, SKOS

Page 23

Semantic MetaData, RDFa

http://inspector.sindice.com
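
To make the RDFa and SKOS combination concrete, the following sketch generates an HTML fragment that marks an entity mention as a `skos:Concept` with a `skos:prefLabel`, using the standard RDFa attributes `prefix`, `about`, `typeof` and `property`. The `rdfa_concept` helper and the choice of a DBpedia IRI as the identifier are assumptions; the markup Inform actually emits is not shown in the transcript.

```python
from html import escape

SKOS_NS = "http://www.w3.org/2004/02/skos/core#"


def rdfa_concept(label, resource_iri):
    """Return an HTML span annotating label as a skos:Concept (illustrative)."""
    return (
        f'<span prefix="skos: {SKOS_NS}" '
        f'about="{escape(resource_iri, quote=True)}" typeof="skos:Concept">'
        f'<span property="skos:prefLabel">{escape(label)}</span>'
        "</span>"
    )


# Example with an assumed Linked Data identifier.
EXAMPLE_MARKUP = rdfa_concept("Google", "http://dbpedia.org/resource/Google")
```

Markup of this shape can be pasted into an RDFa inspection tool to check which triples a parser extracts from the page.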

Page 24

Facebook App

Page 25

Using Facebook OpenGraph

Relevancy Algorithm:

Combine:
• Trending / Popular Topics
• Trending / Popular Articles
• Personalization “Liked” Topics
• Personalization “Liked” Articles
• User Profiles (“Users like you…”)
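
The slide lists the signals to combine but not how they are combined. One simple possibility, shown here purely as an assumption, is a weighted average of per-signal scores in the range 0 to 1. The signal names, the `relevancy` function, and the weights are hypothetical.

```python
# One key per signal named on the slide (names are illustrative).
SIGNALS = (
    "trending_topic",
    "trending_article",
    "liked_topic",
    "liked_article",
    "similar_users",
)


def relevancy(signals, weights):
    """Weighted average of signal scores; missing signals count as 0."""
    total_weight = sum(weights[name] for name in SIGNALS)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * signals.get(name, 0.0) for name in SIGNALS)
    return weighted / total_weight


# Equal weights as a neutral starting point (an assumption, not from the talk).
EQUAL_WEIGHTS = {name: 1.0 for name in SIGNALS}
```

For example, an article on a trending topic that the user has also "liked" would score `relevancy({"trending_topic": 1.0, "liked_topic": 1.0}, EQUAL_WEIGHTS)`, which counts two of the five signals.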

Page 26

Facebook “Liked” Topics

Page 27

Facebook Article Stream

Page 28

Inform: Author to Audience via Semantics

Page 29

Thanks for your attention!

Questions?

Contact Information:
Marc Hadfield
[email protected]