Open Calais Release 4.0
Thomson Reuters Calais Initiative: Calais 4.0, January 14, 2009
Thomas (“Tom”) Tague and Krista Thomas
Overview
• We’re going to discuss five basic topics:
– What is Calais?
– Why we’re doing it & what our goals are
– How it works / What’s under the hood?
– A few examples
– Where it’s headed
Calais? What’s Calais?
As seen from the U.K. & the Continent
As seen from North America
As seen by us
Calais? What’s Calais?
• A semantic metadata generation service that extracts entities, facts and events from unstructured text
• Creates links from extracted entities to the linked data ecosystem
• Provides a transportation layer for rich semantic metadata from producers to consumers
• Details to follow….
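One way to picture the three roles above is as a single result object: a document identifier, typed entities that may carry a Linked Data URI, and facts or events that tie entities together. The Python sketch below is an illustrative assumption; the class and field names are hypothetical and are not the Calais output schema, which is delivered as RDF.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Entity:
    # Hypothetical field names; Calais itself returns RDF, not these objects.
    type: str                  # e.g. "Company", "Person", "City"
    name: str                  # surface form found in the text
    uri: Optional[str] = None  # Linked Data URI, when the entity was linked


@dataclass(frozen=True)
class Relation:
    type: str                  # fact/event type, e.g. "Acquisition"
    args: tuple                # the entities taking part, in order


@dataclass
class ExtractionResult:
    doc_guid: str              # document-level identifier to share downstream
    entities: list = field(default_factory=list)
    relations: list = field(default_factory=list)

    def linked_entities(self):
        """Entities that can serve as entry points into the Linked Data cloud."""
        return [e for e in self.entities if e.uri is not None]
```

Keeping the document GUID separate from the entity URIs mirrors the split the slides draw between sharing metadata about a document and following links out from individual entities.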
Why We’re Doing It
• Two simple answers:
– Hyper-evolution of capabilities – better, faster, stronger
– The walled garden content world
Our Goals / The Capabilities We Want to Deploy
• Let’s state them here and then walk through why we have these goals
1. Derive semantic metadata from textual assets
2. Use that semantic metadata to create entry points into the linked data ecosystem
3. Provide a simple mechanism for the sharing of semantic metadata about textual content assets
1: Semantics from Text: The Text Problem
• People consume text
• Most of it isn’t semantically enabled
• Most of it won’t be semantically enabled
• This isn’t about standards – microformats vs RDFa vs whatever.
• Why: Latency, cost and short shelf-life
1: Semantics from Text: The Text Problem
• Target areas where:
– The economics don’t support metadata creation
– The value of metadata is potentially high
– The value of aggregated metadata is potentially extremely high
[Figure: tweets, blogs, news, scientific publications and great novels placed on two axes, latency and shelf life, each running from seconds to years]
2: Getting from Text to the Linked Data Ecosystem
The Linked Data Cloud
3: Semantic Metadata Transport Layer
• I’m a content producer. We’ve loaded the car with rich semantic metadata
– I’m sharing it within my four walls
– How do I transport it to my consumers?
– RSS/Atom, XML, proprietary data feeds, content APIs
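To make the transport question concrete, the sketch below attaches a document GUID and entity URIs to an RSS 2.0 item. RSS 2.0 allows extension elements in their own XML namespace; the `sem` namespace, its element names and the example URIs are hypothetical, chosen only for illustration, and are not a Calais or Thomson Reuters format.

```python
import xml.etree.ElementTree as ET

# Hypothetical extension namespace; RSS 2.0 permits namespaced extension elements.
SEM_NS = "http://example.org/ns/semantic-tags#"
ET.register_namespace("sem", SEM_NS)


def build_item(title, link, doc_guid, entities):
    """Build an RSS 2.0 <item> that carries semantic metadata alongside the story.

    entities: iterable of (name, uri) pairs, e.g. produced by an extraction step.
    """
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # The document-level identifier lets consumers look up the full metadata later.
    ET.SubElement(item, f"{{{SEM_NS}}}docGuid").text = doc_guid
    # One extension element per entity, with its Linked Data URI as an attribute.
    for name, uri in entities:
        ET.SubElement(item, f"{{{SEM_NS}}}entity", {"uri": uri}).text = name
    return item


item = build_item(
    "Acme buys Widget Co",
    "http://example.org/story/1",
    "doc-123",
    [("Acme Corp", "http://example.org/entity/acme")],
)
print(ET.tostring(item, encoding="unicode"))
```

Feed readers that do not understand the extension namespace simply ignore those elements, so the same feed still works for ordinary consumers.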
How it Works – Under the Hood of Calais
How it Works – Under the Hood of Calais
Diagram components:
• Calais Web Service
• ClearForest NLP Engine: Rule Base, Lexicons
• RDF
• Disambiguation Engine
• Reference Data Assets
• Metadata Management: Document Level Metadata; Entity Level Linked Data and …
• Output Formatting
• Stat Tools
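The flow in the diagram (extraction, disambiguation against reference data, output formatting) can be mimicked with a toy Python sketch. It is not how the ClearForest engine works: a lexicon lookup stands in for the rule base and lexicons, a dictionary stands in for the reference data assets, and every term, URI and predicate below is invented for illustration.

```python
import re

# Toy stand-ins for the diagram's components; all values are invented examples.
LEXICON = {"acme": "Company", "paris": "City", "jane doe": "Person"}
REFERENCE_DATA = {("Company", "acme"): "http://example.org/company/acme"}


def extract(text):
    """NLP stage (toy): find lexicon entries as whole words, case-insensitively."""
    lowered = text.lower()
    found = []
    for term, etype in LEXICON.items():
        if re.search(rf"\b{re.escape(term)}\b", lowered):
            found.append((etype, term))
    return found


def disambiguate(mentions):
    """Disambiguation stage (toy): attach a URI when reference data knows the entity."""
    return [(etype, term, REFERENCE_DATA.get((etype, term))) for etype, term in mentions]


def format_output(doc_id, resolved):
    """Output stage: emit N-Triples-style lines, only for entities that were linked."""
    lines = []
    for etype, term, uri in resolved:
        if uri is not None:
            lines.append(f'<{uri}> <http://example.org/ns#type> "{etype}" .')
            lines.append(f"<{uri}> <http://example.org/ns#mentionedIn> <{doc_id}> .")
    return lines
```

Separating the stages this way shows why disambiguation matters: only mentions that resolve to a known reference entity produce linkable output.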
How You Can Use It – the SemHead version
• Send unstructured text
– Get back document categorization, entities, facts and events, with document- and entity-level URIs
• Syndicate metadata
– Send unstructured text
– Share/syndicate the document GUID
• Access endpoints
– Use the entity-level URI
– Access entity-level Linked Data endpoints & TR content
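Following an entity-level URI is ordinary Linked Data practice: an HTTP GET on the URI, with an `Accept` header asking for RDF instead of HTML. The sketch below uses only Python's standard `urllib.request`; the entity URI is a hypothetical placeholder, and whether a given endpoint honours `application/rdf+xml` is an assumption to check against the Calais documentation.

```python
import urllib.request


def entity_request(entity_uri, accept="application/rdf+xml"):
    """Prepare (but do not send) a GET for an entity-level Linked Data URI.

    Linked Data servers commonly use HTTP content negotiation: the same URI can
    return HTML for browsers or RDF for programs, depending on the Accept header.
    """
    return urllib.request.Request(entity_uri, headers={"Accept": accept}, method="GET")


req = entity_request("http://example.org/entity/acme")  # hypothetical URI
# urllib.request.urlopen(req) would fetch it; omitted so the sketch runs offline.
```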
Entities, Facts & Events
• Entities: Anniversary, City, Company, Continent, Country, Currency, EmailAddress, EntertainmentAwardEvent, Facility, FaxNumber, Holiday, IndustryTerm, MarketIndex, MedicalCondition, MedicalTreatment, Movie, MusicAlbum, MusicGroup, NaturalDisaster, NaturalFeature, OperatingSystem, Organization, Person, PhoneNumber, Product, ProgrammingLanguage, ProvinceOrState, PublishedMedium, RadioProgram, RadioStation, Region, SportsEvent, SportsGame, SportsLeague, Technology, TVShow, TVStation, URL
• Facts & events: Acquisition, Alliance, AnalystEarningsEstimate, AnalystRecommendation, Bankruptcy, BonusShares, BusinessRelation, Buybacks, CompanyAffiliates, CompanyCustomer, CompanyEarningsAnnouncement, CompanyEarningsGuidance, CompanyInvestment, CompanyLegalIssues, CompanyLocation, CompanyMeeting, CompanyReorganization, CompanyTechnology, CompanyTicker, ConferenceCall, CreditRating, EmploymentRelation, FamilyRelation, FDAPhase, IPO, JointVenture, ManagementChange, Merger, MovieRelease, MusicAlbumRelease, PatentFiling, PatentIssuance, PersonAttributes, PersonCommunication, PersonEducation, PersonEmailAddress, PersonPolitical, PersonPoliticalPast, PersonProfessional, PersonProfessionalPast, PersonRelation, PersonTravel, Quotation, SecondaryIssuance, StockSplit
Extending Calais’ Reach
More than just a web service: a growing collection of tools and applications to make Calais valuable in the real world
• Browser extensions: Gnosis
• Content management tools: WordPress, Drupal
• UIMA
• Development tools & libraries: PHP, Ruby, Java, .NET
• Applications: TopBraid, RSS Tagger, Powerhouse, LinkedFacts, Wirecatch, FeedShaver
• And more…
Calais progress to date
• Launched in late January 2008
• 9,000 developers have joined OpenCalais.com
• Approx. 1 million content ‘transactions’ per day
• Delivered four major update releases
• Lots of interesting apps
– The Mail & Guardian Online (http://www.mg.co.za/)
– www.powerhousemuseum.com
– Gist.whistlehog.com
– http://www.semanticproxy.com
Example: The Mail & Guardian Online, South African newspaper
Using Calais to metatag new and historical articles, and to:
1. Build an index of topics A-Z
2. Pull out related articles or pictures automatically
3. Create news alerts on companies or people
4. Pull up maps for the countries named in articles
5. Predict readers’ interests based on browsing habits
6. Create tag clouds showing popular subjects, people, etc.
Using Calais to optimize search and navigation and to drive consumer engagement
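Item 6 above (tag clouds) is simple arithmetic once each article carries extracted entity names. A minimal Python sketch, assuming each tag is counted once per article and font sizes are scaled linearly between two hypothetical bounds; the Mail & Guardian's actual method is not described in the slides.

```python
from collections import Counter


def tag_cloud(articles, min_size=10, max_size=32):
    """Map each tag to a font size proportional to how many articles carry it.

    articles: iterable of tag collections (e.g. entity names per article).
    Linear scaling between min_size and max_size is an assumed design choice.
    """
    counts = Counter(tag for tags in articles for tag in set(tags))
    if not counts:
        return {}
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {
        tag: round(min_size + (n - lo) * (max_size - min_size) / span)
        for tag, n in counts.items()
    }
```

With very skewed counts, a logarithmic scale is a common alternative to the linear one used here.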
Example: Gist - today’s news filtered by people, places & events
Gist uses Calais to prioritize stories, rank newsmakers and reveal trends and reader demand. It automatically aggregates multiple news sources and slots the stories into topics.
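Ranking newsmakers across sources can be sketched in a few lines. The rule below (distinct sources first, then total mentions, then name as a tie-breaker) is an illustrative assumption, not Gist's documented algorithm; the input is assumed to be (source, person names) pairs taken from extracted Person entities.

```python
from collections import defaultdict


def rank_newsmakers(stories, top_n=3):
    """Rank people by how many distinct sources mention them, then by mentions.

    stories: list of (source, [person names]) pairs. The ranking rule is an
    illustrative assumption, not Gist's actual algorithm.
    """
    sources = defaultdict(set)
    mentions = defaultdict(int)
    for source, people in stories:
        for person in people:
            sources[person].add(source)
            mentions[person] += 1
    # Breadth of coverage first, then volume, then alphabetical for stable ties.
    ranked = sorted(mentions, key=lambda p: (-len(sources[p]), -mentions[p], p))
    return ranked[:top_n]
```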
Example: The Powerhouse Museum in Sydney
Using Calais to tag historical archives & using tags as search terms
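Using tags as search terms amounts to an inverted index from each tag to the records that carry it. A minimal Python sketch, assuming tags are matched case-insensitively and multiple terms are combined with AND; the record ids and tags are made up.

```python
from collections import defaultdict


def build_tag_index(records):
    """records: mapping of record id -> iterable of tags (e.g. extracted entity names)."""
    index = defaultdict(set)
    for record_id, tags in records.items():
        for tag in tags:
            index[tag.lower()].add(record_id)
    return index


def search(index, *terms):
    """Return ids of records carrying every requested tag (case-insensitive AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()
```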
Example: IT Healthcare News
Using Calais to surface ambient “related content”
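One common way to surface related content from entity tags is Jaccard similarity between tag sets. The sketch below shows that generic technique, not necessarily what IT Healthcare News does; the `k` and `threshold` defaults and the sample articles are assumptions.

```python
def jaccard(a, b):
    """Overlap of two tag sets: |A ∩ B| / |A ∪ B| (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def related(target_tags, catalogue, k=2, threshold=0.1):
    """Return up to k article ids whose tags overlap the target's, best first."""
    scored = [(jaccard(target_tags, tags), art_id) for art_id, tags in catalogue.items()]
    scored = [(s, a) for s, a in scored if s >= threshold]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [art_id for _, art_id in scored[:k]]
```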
Examples
• Those are examples of first-generation uses. Some of what we’re seeing in the pipeline:
– Social Resume analysis
– Investigative Journalism*
– Museum metadata coalitions
Investigative Journalism
• FOIA contract documents → Calais Web Service → Company:PersonFamilyRelation
• News → Calais Web Service → Company:Contract, Company:Affiliation
• Both streams → Big Fuzzy Graph
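The “big fuzzy graph” idea can be sketched as merging relation triples from both document streams into one graph and searching it for chains that link two parties. The triples, names and relation labels below are hypothetical, and breadth-first search is just one simple way to find the shortest connecting chain.

```python
from collections import defaultdict, deque


def build_graph(*relation_sets):
    """Merge (subject, relation, object) triples from several sources into one
    undirected graph; each edge remembers which relation types link the pair."""
    graph = defaultdict(dict)
    for triples in relation_sets:
        for subj, rel, obj in triples:
            graph[subj].setdefault(obj, set()).add(rel)
            graph[obj].setdefault(subj, set()).add(rel)
    return graph


def connection_path(graph, start, goal):
    """Breadth-first search for the shortest chain of links between two nodes."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, {}):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None  # no chain connects the two nodes
```

Neither stream alone links the agency to the official in the test data; the chain only appears once both sets of extracted relations sit in the same graph.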
What’s new in Release 4?
• Release 4 – What’s New?
– Linked data for approximately 25 entities
– A start at Thomson Reuters contributed content
– Metadata hosting and transport
– Basic French
– Published RDFS Ontology
– New entities / relationships
• Products
• Competitive intelligence
• Expanded document level categorization
What’s in the Pipeline?
• 2009 (this is a fuzzy list)
– Person disambiguation @ domain level?
– Other disambiguation
– Dramatic expansion of endpoints (entities & events)
– Calais as hub
– Exposure of the IDE?
– User managed lexicons
– Languages
– Opt-in SPARQL Endpoint?
• www.opencalais.com
– Gallery – code and applications examples
– Forums
– Documentation