querying the web
DESCRIPTION
A discussion of the various ways that data on the web can be published and queried. Why SQL is not the right tool for this.
TRANSCRIPT
Querying the Web
SlipstreamUSA :: April 2, 2008
“Information wants to be free” Stewart Brand, Whole Earth Review, May 1985
“Data is the Next Intel Inside” Tim O’Reilly, September 2005
“The internet is my hard drive” Bruce Schneier, February 2008
Freebase
Metaweb Query Language
Request:
{ "type" : "/medicine/physician",
  "name" : "Michael Maher" }
Response:
{ "code": "/api/status/ok",
  "result": {
    "type": "/medicine/physician",
    "name": "Michael Maher",
    "gender": "Male",
    "education": "Leeds University"
  }
}
JSON
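As a rough illustration of why JSON suits this kind of exchange, the sketch below serialises the query from the slide and unpacks the response envelope with Python's standard json module. The response text is copied from the slide rather than fetched from a live service, and the helper name extract_result is hypothetical.

```python
import json

# The query from the slide, written as an ordinary Python dict.
query = {"type": "/medicine/physician", "name": "Michael Maher"}

# The response as shown on the slide (not fetched from a live service).
response_text = """
{ "code": "/api/status/ok",
  "result": {
    "type": "/medicine/physician",
    "name": "Michael Maher",
    "gender": "Male",
    "education": "Leeds University"
  }
}
"""


def extract_result(text):
    """Hypothetical helper: check the status code and return the result object."""
    envelope = json.loads(text)
    if envelope.get("code") != "/api/status/ok":
        raise ValueError("query failed: %r" % envelope.get("code"))
    return envelope["result"]


request_body = json.dumps(query)        # text that would be sent as the query
result = extract_result(response_text)  # plain dict, ready to use
print(result["name"], result["education"])
```

The query and the answer have the same shape, so the client needs no schema or mapping layer: both are plain nested dictionaries.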
REST
REpresentational State Transfer
Less rigorous equivalent of SOAP
Data are considered to be resources
Every resource has a unique address
Layered over HTTP:
  Client/Server separation
  Stateless
  Cacheable
Request:
GET http://rest.georgejames.com/product/Serenji/
Response:
Name=Serenji
Price=195.00
OrderCode=H1001
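The request and response above can be handled with very little client code. The following is a minimal sketch, assuming the response body is always a list of Name=Value lines as on the slide. The helper names resource_url and parse_response are hypothetical, and the sample body is copied from the slide rather than fetched over the network.

```python
from urllib.parse import quote

BASE_URL = "http://rest.georgejames.com"  # host taken from the slide


def resource_url(collection, key):
    """Every resource has a unique address: /<collection>/<key>/."""
    return f"{BASE_URL}/{quote(collection)}/{quote(key)}/"


def parse_response(body):
    """Turn Name=Value lines into a dict (assumed response format)."""
    record = {}
    for line in body.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, value = line.partition("=")
        record[name] = value
    return record


# Response body as shown on the slide.
sample_body = "Name=Serenji\nPrice=195.00\nOrderCode=H1001\n"

url = resource_url("product", "Serenji")
product = parse_response(sample_body)
print(url, product)
```

Because the request is a plain stateless GET on a unique address, any HTTP cache between client and server can serve repeat requests without involving the origin server.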
Amazon S3
S3 :: Simple Storage Service
Online storage space
$0.15 per Gbyte per month for storage
~ $0.20 per Gbyte for data transfer
Storage request:
PUT http://s3.amazonaws.com/[bucket-name]/[key-name]
Retrieval request:
GET http://s3.amazonaws.com/[bucket-name]/[key-name]
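To make the addressing scheme and the pricing concrete, here is a small sketch that builds an object URL and estimates a monthly bill from the figures quoted on the slide. The bucket name, key name and helper names are made up for illustration, the prices are the 2008 figures above, and the authentication headers that real S3 requests require are left out.

```python
from urllib.parse import quote

S3_ENDPOINT = "http://s3.amazonaws.com"  # endpoint as shown on the slide
STORAGE_USD_PER_GB_MONTH = 0.15          # slide figure (2008)
TRANSFER_USD_PER_GB = 0.20               # approximate slide figure (2008)


def object_url(bucket, key):
    """Address used for both PUT (store) and GET (retrieve)."""
    return f"{S3_ENDPOINT}/{quote(bucket)}/{quote(key, safe='/')}"


def monthly_cost(stored_gb, transferred_gb):
    """Rough monthly cost in USD: storage plus data transfer."""
    cost = (stored_gb * STORAGE_USD_PER_GB_MONTH
            + transferred_gb * TRANSFER_USD_PER_GB)
    return round(cost, 2)


# Hypothetical bucket and key, for illustration only.
url = object_url("my-bucket", "reports/2008/april.csv")
estimate = monthly_cost(stored_gb=100, transferred_gb=10)
print(url, estimate)
```

The same URL serves for storing and retrieving; only the HTTP verb changes.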
Amazon SimpleDB
Storage request:
https://sdb.amazonaws.com/?Action=PutAttributes
  &Attribute.0.Name=Color&Attribute.0.Value=Blue
  &Attribute.1.Name=Size&Attribute.1.Value=Med
  &Attribute.2.Name=Price&Attribute.2.Value=14.99
  &AWSAccessKeyId=[valid access key id]
  &DomainName=MyDomain
  &ItemName=Item123
Retrieval request:
https://sdb.amazonaws.com/?Action=GetAttributes
  &AWSAccessKeyId=[valid access key id]
  &DomainName=MyDomain
  &ItemName=Item123
Retrieval response:
<GetAttributesResult>
  <Attribute><Name>Color</Name><Value>Blue</Value></Attribute>
  <Attribute><Name>Size</Name><Value>Med</Value></Attribute>
  <Attribute><Name>Price</Name><Value>14.99</Value></Attribute>
</GetAttributesResult>
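The SimpleDB exchange above amounts to a query string going out and a small XML document coming back. The sketch below builds the PutAttributes URL and parses the GetAttributes response using only the Python standard library. It assumes the simplified request and response shown on the slide: a real request also needs a signature and timestamp, which are omitted here, and the helper names and the access key "EXAMPLEKEY" are placeholders.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

SDB_ENDPOINT = "https://sdb.amazonaws.com/"  # endpoint as shown on the slide


def put_attributes_url(domain, item, attributes, access_key_id):
    """Build the storage request; signing parameters are deliberately omitted."""
    params = [("Action", "PutAttributes")]
    for i, (name, value) in enumerate(attributes.items()):
        params.append((f"Attribute.{i}.Name", name))
        params.append((f"Attribute.{i}.Value", str(value)))
    params += [("AWSAccessKeyId", access_key_id),
               ("DomainName", domain),
               ("ItemName", item)]
    return SDB_ENDPOINT + "?" + urlencode(params)


def parse_get_attributes(xml_text):
    """Collect the Name/Value pairs from a response shaped like the slide's."""
    root = ET.fromstring(xml_text)
    return {a.findtext("Name"): a.findtext("Value")
            for a in root.iter("Attribute")}


# Response body as shown on the slide.
sample_response = (
    "<GetAttributesResult>"
    "<Attribute><Name>Color</Name><Value>Blue</Value></Attribute>"
    "<Attribute><Name>Size</Name><Value>Med</Value></Attribute>"
    "<Attribute><Name>Price</Name><Value>14.99</Value></Attribute>"
    "</GetAttributesResult>"
)

url = put_attributes_url(
    "MyDomain", "Item123",
    {"Color": "Blue", "Size": "Med", "Price": "14.99"},
    access_key_id="EXAMPLEKEY",  # placeholder, not a real key
)
item = parse_get_attributes(sample_response)
print(url)
print(item)
```

Note that every value comes back as a string, including the price: the store imposes no schema or typing on the attributes.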
Astoria
Astoria in action
Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Categories
Response:
Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers
Response:
Request: /Customers[FRANK]
Response:
Request: /Customers[FRANK]/Orders
Response:
A variety of response formats:
  POX
  Web3S (Web, Structured, Schema’d and Searchable)
  ATOM
  JSON
JSON request: /Customers[FRANK]?$format=json
Response:
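The URL patterns in the Astoria requests above (entity set, key in square brackets, navigation to related data, optional $format) can be sketched as a small URL builder. This assumes the relative requests on the slides resolve against the Northwind service root shown in the first request. The function name astoria_url and its parameters are hypothetical, and the sketch only composes addresses without contacting the service.

```python
# Service root taken from the first Astoria request on the slides.
SERVICE_ROOT = "http://astoria.sandbox.live.com/northwind/northwind.rse"


def astoria_url(entity_set, key=None, navigation=None, fmt=None):
    """Compose /<EntitySet>[<key>]/<Navigation>?$format=<fmt>."""
    path = f"/{entity_set}"
    if key is not None:
        path += f"[{key}]"        # select one entity by key
    if navigation is not None:
        path += f"/{navigation}"  # follow a relationship
    url = SERVICE_ROOT + path
    if fmt is not None:
        url += f"?$format={fmt}"  # ask for a specific response format
    return url


all_customers = astoria_url("Customers")
one_customer = astoria_url("Customers", key="FRANK")
customer_orders = astoria_url("Customers", key="FRANK", navigation="Orders")
customer_as_json = astoria_url("Customers", key="FRANK", fmt="json")
print(customer_orders)
print(customer_as_json)
```

The query is expressed entirely in the address, which is what makes each result a cacheable, linkable resource.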
Where is all this information going to come from?
Crowdsourcing
Jeff Howe, Wired Magazine, June 2006
Delegating an activity to a large number of unidentified individuals
Small finite tasks
Quantity more important than quality
The sum is greater than the parts
Examples:
  Wikipedia
Google Maps
Crowdsourcing
Examples:
  Wikipedia
  Galaxy Zoo
  Amazon Mechanical Turk
  Google route planner
Consequences:
  Drives down the cost of data
  Data owners may not be the traditional incumbents
  Client / user needs to discriminate between good and bad data
What does this mean for you?
Data Provider
  Publish data via simple APIs
  Your data may have unexpected value
  Innovative usage
  Usage can enhance the quality of your data
Data Consumer
  Many potential data sources
  Explosive growth in available data
  Quality of the data is potentially lower
  …but this is outweighed by quantity and richness
Technical
  Caché database is an ideal container
  Dynamic / extensible data structure
  Weak data typing
  High performance and scalability
The Internet is the Database
Thank you
Questions?