Open Calais Release 4.0

Posted on 28-Nov-2014


DESCRIPTION

A brief, entry-level overview of version 4.0 of the Calais Web service. Calais 4.0 automatically connects publishers to the exploding ecosystem of Linked Data assets on the Web, and helps them syndicate their metadata to reach downstream readers via search engines, news aggregators, 'related stories' recommendation services and more.

TRANSCRIPT

Calais

Thomson Reuters Calais Initiative: Calais 4.0 ~ January 14, 2009

Thomas (“Tom”) Tague and Krista Thomas

Overview

• We're going to discuss five basic topics:

– What is Calais?

– Why we’re doing it & what our goals are

– How it works / What’s under the hood?

– A few examples

– Where it’s headed

Calais? What’s Calais?

As seen from the U.K. & the Continent

As seen from North America

As seen by us

Calais? What’s Calais?

• A semantic metadata generation service that extracts entities, facts and events from unstructured text

• Creates linkages from extracted entities to the Linked Data ecosystem

• Provides a transportation layer for rich semantic metadata from producers to consumers

• Details to follow…

Why We’re Doing It

• Two simple answers:

– Hyper-evolution of capabilities – better, faster, stronger

– The walled garden content world

Our Goals / The Capabilities We Want to Deploy

• Let’s state them here and then walk through why we have these goals

1. Derive semantic metadata from textual assets

2. Use that semantic metadata to create entry points into the linked data ecosystem

3. Provide a simple mechanism for the sharing of semantic metadata about textual content assets

1: Semantics from Text: The Text Problem

• People consume text

• Most of it isn’t semantically enabled

• Most of it won’t be semantically enabled

• This isn’t about standards – microformats vs RDFa vs whatever.

• Why: Latency, cost and short shelf-life

1: Semantics from Text: The Text Problem

• Target areas where:

– The economics don’t support metadata creation

– The value of metadata is potentially high

– The value of aggregated metadata is potentially extremely high

[Figure: Tweets, Blogs, News, Scient. Pubs and Great Novels plotted on two axes, Latency and Shelf Life, each ranging from seconds to years]

2: Getting from Text to the Linked Data Ecosystem

The Linked Data Cloud

3: Semantic Metadata Transport Layer

• I’m a content producer. We’ve loaded the car with rich semantic metadata

– I’m sharing it within my four walls

– How do I transport it to my consumers?

– RSS / Atom, XML, proprietary data feeds, content APIs
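One way to picture the transport question: a producer that already publishes RSS could let document-level and entity-level URIs ride along in each feed item under an extension namespace, so consumers receive the metadata with the content instead of re-extracting it. The sketch below is illustrative only; the namespace `http://example.org/semantic-meta#` and the element names `meta:document` and `meta:entity` are hypothetical and are not the format Calais itself uses.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace for semantic metadata in a feed.
# Not an official Calais or RSS vocabulary.
META_NS = "http://example.org/semantic-meta#"
ET.register_namespace("meta", META_NS)


def build_rss_item(title, link, doc_uri, entity_uris):
    """Build an RSS 2.0 <item> carrying a document-level metadata URI and
    entity-level URIs alongside the usual title and link."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # Document-level pointer: a consumer can dereference this URI to fetch
    # the full metadata rather than running extraction again.
    ET.SubElement(item, f"{{{META_NS}}}document").text = doc_uri
    # Entity-level pointers into the Linked Data ecosystem.
    for uri in entity_uris:
        ET.SubElement(item, f"{{{META_NS}}}entity").text = uri
    return item


item = build_rss_item(
    "Example headline",
    "http://example.org/story/1",
    "http://example.org/doc/abc123",
    ["http://example.org/entity/acme", "http://example.org/entity/paris"],
)
xml_text = ET.tostring(item, encoding="unicode")
print(xml_text)
```

Using a namespace keeps standard RSS readers working: they ignore elements they do not understand, while metadata-aware consumers can pick them up.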

How it Works – Under the Hood of Calais


[Diagram: Calais Web Service architecture. Components shown: ClearForest NLP Engine (Rule Base, Lexicons), RDF, Disambiguation Engine, Reference Data Assets, Metadata Management (Document Level Metadata, Entity Level Linked Data, and …), Output Formatting, Stat Tools]
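To make the stages in the diagram concrete, here is a toy pipeline: lexicon lookup, a pattern rule, then disambiguation of each mention against reference data to attach a canonical URI. It is a minimal sketch under obvious simplifying assumptions (tiny hand-written lexicon, one regex rule, exact-match lookup, invented `example.org` URIs) and is not how the ClearForest engine or the Calais disambiguation engine actually work.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Toy lexicon (surface form -> entity type). Illustrative only.
LEXICON = {
    "Thomson Reuters": "Company",
    "Paris": "City",
    "Tom Tague": "Person",
}

# Toy "reference data assets": (name, type) -> canonical URI (hypothetical).
REFERENCE_DATA = {
    ("Thomson Reuters", "Company"): "http://example.org/resource/Thomson_Reuters",
    ("Paris", "City"): "http://example.org/resource/Paris_France",
}

# Toy rule: a capitalised word followed by "Inc" or "Corp" is a Company.
COMPANY_RULE = re.compile(r"\b[A-Z][A-Za-z]+ (?:Inc|Corp)\b")


@dataclass
class Entity:
    text: str
    type: str
    start: int
    uri: Optional[str]  # None when disambiguation finds no match


def disambiguate(text: str, etype: str) -> Optional[str]:
    """Map a mention to a canonical URI via the reference data, if known."""
    return REFERENCE_DATA.get((text, etype))


def extract(document: str) -> List[Entity]:
    found = {}  # (start offset, text) -> type; avoids duplicate mentions
    # Stage 1: lexicon lookup.
    for surface, etype in LEXICON.items():
        for m in re.finditer(re.escape(surface), document):
            found[(m.start(), surface)] = etype
    # Stage 2: pattern rules (lexicon hits take precedence).
    for m in COMPANY_RULE.finditer(document):
        found.setdefault((m.start(), m.group(0)), "Company")
    # Stage 3: disambiguate each mention, returned in document order.
    return [
        Entity(text, etype, start, disambiguate(text, etype))
        for (start, text), etype in sorted(found.items())
    ]


ents = extract("Tom Tague of Thomson Reuters spoke in Paris about Acme Corp.")
for e in ents:
    print(e.type, e.text, e.uri)
```

Note that "Acme Corp" is found by the rule but gets no URI, which mirrors the general point that extraction and linking to reference data are separate steps.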

How You Can Use It – the SemHead version

• Send unstructured text

– Get back document categorization, entities, facts and events, with document- and entity-level URIs

• Syndicate metadata

– Send unstructured text

– Share / syndicate the document GUID

• Access endpoints

– Use entity-level URIs

– Access entity-level Linked Data endpoints & TR Content
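The three usage modes above can be sketched from the consumer's side: read a response, group the extracted entities by type, hand downstream partners only the document-level identifier, and collect entity-level URIs to dereference. The JSON shape, field names (`doc_uri`, `entities`, `type`, `name`, `uri`) and all URIs below are invented for readability; they are not Calais' actual output format, and no real service is called.

```python
import json
from collections import defaultdict

# Hypothetical, simplified response; not the real Calais output format.
SAMPLE_RESPONSE = json.dumps({
    "doc_uri": "http://example.org/doc/abc123",
    "entities": [
        {"type": "Company", "name": "Thomson Reuters",
         "uri": "http://example.org/entity/tr"},
        {"type": "City", "name": "Paris",
         "uri": "http://example.org/entity/paris"},
        {"type": "Company", "name": "Acme Corp",
         "uri": "http://example.org/entity/acme"},
    ],
})


def entities_by_type(response_text):
    """Mode 1: group extracted entity names by entity type."""
    data = json.loads(response_text)
    grouped = defaultdict(list)
    for ent in data["entities"]:
        grouped[ent["type"]].append(ent["name"])
    return dict(grouped)


def syndication_stub(response_text, article_url):
    """Mode 2: pass the article plus its document-level identifier
    downstream, rather than the full metadata payload."""
    data = json.loads(response_text)
    return {"article": article_url, "metadata": data["doc_uri"]}


def entity_endpoints(response_text):
    """Mode 3: distinct entity-level URIs a consumer could dereference."""
    data = json.loads(response_text)
    return sorted({ent["uri"] for ent in data["entities"]})


print(entities_by_type(SAMPLE_RESPONSE))
print(syndication_stub(SAMPLE_RESPONSE, "http://example.org/story/1"))
print(entity_endpoints(SAMPLE_RESPONSE))
```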

Entities, Facts & Events

• Entities: Anniversary, City, Company, Continent, Country, Currency, EmailAddress, EntertainmentAwardEvent, Facility, FaxNumber, Holiday, IndustryTerm, MarketIndex, MedicalCondition, MedicalTreatment, Movie, MusicAlbum, MusicGroup, NaturalDisaster, NaturalFeature, OperatingSystem, Organization, Person, PhoneNumber, Product, ProgrammingLanguage, ProvinceOrState, PublishedMedium, RadioProgram, RadioStation, Region, SportsEvent, SportsGame, SportsLeague, Technology, TVShow, TVStation, URL

• Facts & events: Acquisition, Alliance, AnalystEarningsEstimate, AnalystRecommendation, Bankruptcy, BonusShares, BusinessRelation, Buybacks, CompanyAffiliates, CompanyCustomer, CompanyEarningsAnnouncement, CompanyEarningsGuidance, CompanyInvestment, CompanyLegalIssues, CompanyLocation, CompanyMeeting, CompanyReorganization, CompanyTechnology, CompanyTicker, ConferenceCall, CreditRating, EmploymentRelation, FamilyRelation, FDAPhase, IPO, JointVenture, ManagementChange, Merger, MovieRelease, MusicAlbumRelease, PatentFiling, PatentIssuance, PersonAttributes, PersonCommunication, PersonEducation, PersonEmailAddress, PersonPolitical, PersonPoliticalPast, PersonProfessional, PersonProfessionalPast, PersonRelation, PersonTravel, Quotation, SecondaryIssuance, StockSplit

Extending Calais’ Reach

More than just a web service – a growing collection of tools and applications to make it valuable in the real world

[Diagram: tools and applications built around Calais, grouped as Browser Extensions, Content Management Tools, Development Tools & Libraries, and Applications. Items shown: Gnosis, WordPress, Drupal, UIMA, PHP, Ruby, Java, .NET, TopBraid, RSS Tagger, Powerhouse, LinkedFacts, Wirecatch, FeedShaver, and more…]

Calais progress to date

• Launched in late January 2008

• 9,000 developers have joined OpenCalais.com

• Approx. 1 million content ‘transactions’ per day

• Delivered four major update releases

• Lots of interesting apps

– The Mail & Guardian Online (http://www.mg.co.za/)

– www.powerhousemuseum.com

– Gist.whistlehog.com

– http://www.semanticproxy.com

Example: The Mail & Guardian Online, South African newspaper

Using Calais to metatag new and historical articles, and:

1. Build an A-Z index of topics

2. Pull out related articles or pictures automatically

3. Create news alerts on companies or people

4. Pull up maps for the countries named in articles

5. Predict readers’ interests based on browsing habits

6. Create tag clouds showing popular subjects, people, etc.

Using Calais to optimize search and navigation; drive consumer engagement
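Two of the uses listed above, automatic related articles and tag clouds, follow directly once every article carries a set of extracted entities. The sketch below assumes that setup with invented sample data and uses Jaccard overlap of entity sets as the relatedness score; that scoring choice is an assumption for illustration, not the Mail & Guardian's or Calais' actual method.

```python
from collections import Counter

# Hypothetical article id -> extracted entity names (invented sample data).
ARTICLES = {
    "a1": {"Johannesburg", "Eskom", "ANC"},
    "a2": {"Johannesburg", "Eskom"},
    "a3": {"Cape Town", "ANC"},
    "a4": {"Durban"},
}


def jaccard(a, b):
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related_articles(article_id, articles, top_n=2):
    """Rank other articles by shared entities; drop those with no overlap."""
    target = articles[article_id]
    scored = [
        (jaccard(target, ents), other)
        for other, ents in articles.items()
        if other != article_id
    ]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best score, then id
    return [other for _, other in scored[:top_n]]


def tag_cloud(articles, top_n=3):
    """Count how many articles mention each entity (tag-cloud weights)."""
    counts = Counter(name for ents in articles.values() for name in ents)
    return counts.most_common(top_n)


print(related_articles("a1", ARTICLES))  # ['a2', 'a3']
print(tag_cloud(ARTICLES))
```

Here `a2` shares two of three distinct entities with `a1` (score 2/3) and `a3` shares one of four (score 1/4), while `a4` shares none and is dropped.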

Example: Gist - today’s news filtered by people, places & events

GIST uses Calais to prioritize stories, rank newsmakers & reveal trends / reader demand. It automatically aggregates multiple news sources and slots stories into topics.

Example: The Powerhouse Museum in Sydney

Using Calais to tag historical archives & using tags as search terms

Example: IT Healthcare News

Using Calais to surface ambient “related content”

Examples

• Those are examples of first-generation uses. Some of what we’re seeing in the pipeline:

– Social Resume analysis

– Investigative Journalism*

– Museum metadata coalitions

Investigative Journalism

[Diagram: FOIA contract documents and news are each run through the Calais Web Service; the extracted relations (Company:Person, FamilyRelation, Company:Contract, Company:Affiliation) are merged into a "Big Fuzzy Graph"]
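The value of the "big fuzzy graph" is that relations extracted from separate sources can be joined to surface indirect connections, for example from a person's family tie to a government contract. The sketch below assumes relation triples have already been extracted; the names, relation labels and the choice of breadth-first search to find a shortest chain are all illustrative assumptions, not the workflow shown on the slide.

```python
from collections import defaultdict, deque

# Hypothetical relation triples from two sources (all names invented).
CONTRACT_TRIPLES = [
    ("Acme Corp", "Company:Contract", "Agency X"),
    ("Beta Ltd", "Company:Contract", "Agency X"),
]
NEWS_TRIPLES = [
    ("Jane Doe", "FamilyRelation", "John Doe"),
    ("John Doe", "Company:Person", "Beta Ltd"),
    ("Acme Corp", "Company:Affiliation", "Gamma Inc"),
]


def build_graph(*triple_sources):
    """Merge triples from any number of sources into one undirected graph."""
    graph = defaultdict(set)
    for triples in triple_sources:
        for subj, rel, obj in triples:
            graph[subj].add((obj, rel))
            graph[obj].add((subj, rel))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search for a shortest chain of relations.
    Returns a list of (from, relation, to) steps, or None if unconnected."""
    if start not in graph or goal not in graph:
        return None
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nbr, rel in sorted(graph[node]):
            if nbr not in prev:
                prev[nbr] = (node, rel)
                queue.append(nbr)
    if goal not in prev:
        return None
    steps = []
    node = goal
    while prev[node] is not None:
        parent, rel = prev[node]
        steps.append((parent, rel, node))
        node = parent
    return list(reversed(steps))


graph = build_graph(CONTRACT_TRIPLES, NEWS_TRIPLES)
for step in find_path(graph, "Jane Doe", "Agency X"):
    print(step)
```

Neither source alone links Jane Doe to Agency X; the chain only appears once the news relations and the contract relations sit in the same graph.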

What’s new in Release 4?

• Release 4 – What’s New?

– Linked data for approximately 25 entities

– A start at Thomson Reuters contributed content

– Metadata hosting and transport

– Basic French

– Published RDFS Ontology

– New entities / relationships

• Products

• Competitive intelligence

• Expanded document level categorization

What’s in the Pipeline?

• 2009 (this is a fuzzy list)

– Person disambiguation @ domain level?

– Other disambiguation

– Dramatic expansion of endpoints (entities & events)

– Calais as hub

– Exposure of the IDE?

– User managed lexicons

– Languages

– Opt-in SPARQL Endpoint?

• www.opencalais.com

– Gallery – code and applications examples

– Forums

– Documentation
