ny freebase workshop 10 dec 2009
DESCRIPTION
Intro slides for NY Freebase Workshop on Dec 10, 2009
TRANSCRIPT
Freebase New York Workshop
10 Dec 2009
Metaweb confidential – do not distribute
Presenters
Robert Cook
Jamie Taylor
Will Moffat
Today’s Workshop
9:30 – Intro
10:30 – Prepackaged Freebase solutions
12:30 – Lunch
1:15 – Connecting your data to Freebase
2:30 – Freebase in the data service ecosystem
3:30 – Wrap up, “office hours”
Agenda
Intro to Freebase
Freebase as an identity directory
The Freebase platform
Metaweb
Technology company based in San Francisco
~60 person team of engineers and business people
Venture funded, with long-term outlook
Focused on Freebase.com platform
Freebase is a database of entities
One entity per thing in the world
Stable, long-lived identifiers
Inclusive policy
Practical data
Focus on available data
People, places, products, etc.
/en/sienna_miller
/en/sony_dsc_s750
/en/frost_nixon_2008
Data to build apps
Names, images, descriptions
Dates, measurements and relationships
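As a rough illustration of how an app might ask for this kind of data, the sketch below builds an MQL read query for one of the entity ids listed above. The helper name `topic_query` and the choice of properties are assumptions made for illustration; the code only constructs the query and does not contact any service.

```python
import json


def topic_query(entity_id):
    """Build an MQL query asking for basic display data about one entity.

    In MQL, a null value means "fill this in" and an empty list means
    "return every value".
    """
    return {
        "id": entity_id,  # stable identifier, e.g. "/en/sienna_miller"
        "name": None,     # serialized as JSON null
        "type": [],       # all types asserted on the entity
    }


query = topic_query("/en/sienna_miller")
print(json.dumps(query, indent=2))
```

Keeping the query as a plain dictionary means it can be serialized with `json.dumps` and extended with further properties as an app needs them.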
Actresses (37,079)
Football Players (16,568)
Cheeses (488)
Musical Instruments (1,034)
Airports (11,556)
TV Programs (33,630)
arrested_develop
Related entities are connected, forming a graph
Current stats:
• ~10M entities
• ~275M facts
• Continuous data input, cleanup, and syncing
• ~1,800 “types”
—Celebrity
—Movie
—TV show
—Book
—Company
—Location
—Sports team
—Product
—Etc.
Each entity contains rich, structured metadata
Entities are language independent
As a writeable graph, Freebase gets better over time
• Add (or remove) entities
• Add (or remove) metadata (facts, keys, translations, etc.)
• Extend and improve the schemas
Bulk data into Freebase
15 person group dedicated to algorithmic data import, processing, and tools development
Reconciliation, reconciliation, reconciliation
Critical part of everything we do
Automate wherever possible
Crowdsource for tasks requiring human judgment (semi-automated)
Pipelined, ongoing syncing with large external sources (Wikipedia, partners, etc.)
Reconciliation
Guaranteeing one entity per thing in the world
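To make the idea concrete, here is a deliberately simplified sketch of reconciliation: incoming names are normalized and matched against known entities, and anything that does not match is set aside for human review. This is not Metaweb's pipeline. Matching on names alone is an assumption made to keep the example short, and the function names are hypothetical.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents and punctuation, and collapse whitespace."""
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    cleaned = "".join(c if c.isalnum() else " " for c in no_accents.lower())
    return " ".join(cleaned.split())


def reconcile(incoming, known):
    """Match incoming names to known entity ids.

    `known` maps entity id -> display name. Returns (matched, needs_review),
    where `matched` maps each recognized incoming name to an entity id and
    `needs_review` lists the names that found no match.
    """
    index = {normalize(name): entity_id for entity_id, name in known.items()}
    matched, needs_review = {}, []
    for name in incoming:
        entity_id = index.get(normalize(name))
        if entity_id is None:
            needs_review.append(name)  # left for human judgment
        else:
            matched[name] = entity_id
    return matched, needs_review


known = {"/en/sienna_miller": "Sienna Miller"}
matched, needs_review = reconcile(["sienna  MILLER", "Siena Miller"], known)
print(matched)
print(needs_review)
```

A realistic matcher would also compare types, dates, and relationships before merging, since two different things can share a name.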
“US Politicians who have taken more than $30K from foreign companies”
Freebase is open
Open platform means more data
Creative Commons Attribution (CC-BY) licensing
Apps
Robust set of APIs
HTTP/REST
SLAs for higher volume users (typically >100K API calls per day)
Hosted developer platform for building tools and apps on top of the data
External site data and/or keys
TV episode (715,032)
The TVDB, TV Rage, etc.
Beer (3,100) The Oxford Bottled Beers Database
A global community is actively improving it
Curating existing data
sprocketonline: Jet Engines
spatialed: Hummingbirds
tfmorris: Maritime museums
Creating new data sets
The community is defining new schemas
Top-level domains
∙ American football
∙ Anime/Manga
∙ Architecture
∙ Astronomy
∙ Automotive
∙ Aviation
∙ Awards
∙ Baseball
∙ Basketball
∙ Bicycles
∙ Biology
∙ Boats
∙ Broadcast
∙ Business
∙ Celebrities
∙ Chemistry
∙ Comics
∙ Common
∙ Computers
∙ Conferences
∙ Cricket
∙ Data World
∙ Digicams
∙ Education
∙ Engineering
∙ Event
∙ Clothing and Textiles
∙ Fictional Universes
∙ Film
∙ Food & Drink
∙ Freebase
∙ Games
∙ Geology
∙ Government
∙ Hobbies and Interests
∙ Ice Hockey
∙ Influence
∙ Internet
∙ Language
∙ Law
∙ Library
∙ Location
∙ Martial Arts
∙ Measurement Unit
∙ Media Common
∙ Medicine
∙ Metaweb Types
∙ Meteorology
∙ Military
∙ Music
∙ Olympics
∙ Opera
∙ Organization
∙ People
∙ Geography
∙ Projects
∙ Protected Places
∙ Publishing
∙ Radio
∙ Rail
∙ Religion
∙ Royalty
∙ Soccer
∙ Spaceflight
∙ Sports
∙ Symbols
∙ Tennis
∙ Theater
∙ Time
∙ Transportation
∙ Travel
∙ TV
∙ Video Games
∙ Visual Art
Agenda
Intro to Freebase
Freebase as an identity directory
The Freebase platform
Everybody is creating entities
Topic pages
User profiles
Artist pages
Other fans
Relevant apps
Millions of users are helping them
(Movies, Celebrities, Companies, Products, etc.)
@robcook (Person) #sxsw09 (Event)
Freebase is connecting these entities together
Will Smith (Actor)
/index.html?curid=154698
/name/nm0000226
/BandsAndArtists/S/Smith,_Will
willsmith.com
/artist/Will+Smith
/Will-Smith/e/B000APUOJC
/people/s/will_smith
/RoleDisplay/86971
/artist/Will+Smith
/WillSmith
/music/Will+Smith
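One way to picture an identity directory is as a two-way index between external site keys and a single entity id. The sketch below is a minimal in-memory version, not Freebase's implementation. The entity id `/en/will_smith` is a hypothetical placeholder, and the namespace labels `site_1` to `site_3` are placeholders too, because the slide lists the keys without naming the sites.

```python
# Hypothetical entity id; the slide does not show the actual identifier.
WILL_SMITH = "/en/will_smith"

# Keys copied from the slide; the namespace labels are placeholders.
KEYS = [
    ("site_1", "/index.html?curid=154698", WILL_SMITH),
    ("site_2", "/name/nm0000226", WILL_SMITH),
    ("site_3", "/BandsAndArtists/S/Smith,_Will", WILL_SMITH),
]


def build_directory(keys):
    """Index keys both ways: (namespace, key) -> entity and entity -> keys."""
    by_key, by_entity = {}, {}
    for namespace, key, entity_id in keys:
        by_key[(namespace, key)] = entity_id
        by_entity.setdefault(entity_id, []).append((namespace, key))
    return by_key, by_entity


def translate(by_key, by_entity, namespace, key, target_namespace):
    """Map a key on one site to the same entity's key on another site."""
    entity_id = by_key.get((namespace, key))
    if entity_id is None:
        return None  # unknown key: nothing to translate
    for other_namespace, other_key in by_entity[entity_id]:
        if other_namespace == target_namespace:
            return other_key
    return None  # entity known, but it has no key in the target namespace


by_key, by_entity = build_directory(KEYS)
print(translate(by_key, by_entity, "site_2", "/name/nm0000226", "site_1"))
```

Because every key resolves to one entity, adding a new site only requires adding its keys; no pairwise mapping between sites is needed.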
An entity directory can power new applications
Example:
1. Each film review is tagged with the corresponding movies in Freebase, e.g. The Incredibles (film), Alfie (film)
2. When the page loads, it grabs data from Freebase (images, film info and links) to enhance the article
3. Freebase also returns links to related WSJ film reviews the user might enjoy (based on genre, director, actors, release year, etc.)
4. A Freebase search box allows the user to quickly find any film review in the WSJ archives
Agenda
Intro to Freebase
Freebase as an identity directory
The Freebase platform
Freebase architecture
Query editor
Querying Freebase
“Russian cosmonauts”
[{
  "type": "/spaceflight/astronaut",
  "name": null,
  "/people/person/nationality": "russia"
}]
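For readers who want to see how a query like this travels over HTTP, the sketch below wraps it in a `{"query": ...}` envelope and encodes it as a URL parameter. The endpoint URL is an assumption that does not come from the slides, the helper names are hypothetical, and the code only builds and decodes the URL; it never sends a request.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed read endpoint; the slides do not give the service URL.
MQLREAD_URL = "http://api.freebase.com/api/service/mqlread"


def mqlread_url(query, base=MQLREAD_URL):
    """Wrap an MQL query in an envelope and encode it as a GET URL."""
    envelope = {"query": query}
    return base + "?" + urlencode({"query": json.dumps(envelope)})


def query_from_url(url):
    """Recover the MQL query from a URL built by mqlread_url."""
    params = parse_qs(urlsplit(url).query)
    return json.loads(params["query"][0])["query"]


# The "Russian cosmonauts" query; Python None becomes JSON null.
cosmonauts = [{
    "type": "/spaceflight/astronaut",
    "name": None,
    "/people/person/nationality": "russia",
}]

url = mqlread_url(cosmonauts)
print(url)
```

The outer list in the query asks for every matching entity rather than a single one, and the round trip through `query_from_url` is a cheap way to confirm the encoding is lossless.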
Querying Freebase
“Tropical storms in the 90s”
[{
  "type": "/meteorology/tropical_cyclone",
  "name": null,
  "formed>=": "1990",
  "a:formed<": "2000"
}]
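The range constraint in the tropical storm query can be generated for any decade. The sketch below is one way to do that; the helper name `decade_query` is hypothetical. It mirrors the slide in putting an `a:` prefix on the second constraint, which is MQL's way of labeling a repeated use of the same property, and it wraps the query in a list so that more than one result can come back.

```python
import json


def decade_query(type_id, prop, start_year):
    """Build an MQL query for items whose `prop` date falls within a decade."""
    return [{
        "type": type_id,
        "name": None,
        # Lower bound, inclusive: dates on or after the first year.
        prop + ">=": str(start_year),
        # Upper bound, exclusive: the "a:" prefix labels the second use of
        # the same property, as on the slide.
        "a:" + prop + "<": str(start_year + 10),
    }]


storms_90s = decade_query("/meteorology/tropical_cyclone", "formed", 1990)
print(json.dumps(storms_90s, indent=2))
```

Using an inclusive lower bound and an exclusive upper bound keeps adjacent decades from overlapping.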
Querying Freebase
“French actresses born pre-WWII”
[{
  "type": "/film/actor",
  "name": null,
  "/people/person/gender": "female",
  "/people/person/date_of_birth": null,
  "/people/person/date_of_birth<=": "1939",
  "/people/person/nationality": "France",
  "sort": "/people/person/date_of_birth"
}]
ACRE
Server-side JavaScript + web page templating
WSJ (and other) applications developed
Advanced APIs
Code sharing – programmer ecosystem
ACRE IDE
Other platform services
Freebase suggest
Lucene-based topic search interface
Blob store (text, image thumbnailing)
Reconciliation service
Extended MQL
www.freebase.com
blog.freebase.com
twitter.com/fbase