adaptive blue java nyc meetup
DESCRIPTION
Presentation of Glue, http://getglue.com, a browser add-on made by AdaptiveBlue. In-depth discussion of how we use Amazon Web Services and Semantic Algorithms.
TRANSCRIPT
AdaptiveBlue @Java NYC Meetup
April 20, 2009
Alex Iskold, Founder/CEO
http://getglue.com
Agenda
About AdaptiveBlue
Glue: The Network of People and Things
Glue: Building on Amazon Web Services
Glue: Semantic Technology Stack
About AdaptiveBlue
Founded in 2006, based in New York
Funded by USV and RRE
Focuses on enhancing browsing experience
Launched BlueOrganizer and Glue add-ons for Firefox and SmartLinks Widgets for blogs
Get Glue. The Network That Sticks With You.
http://getglue.com
What is Glue?
Glue is a contextual network that uses semantic technology to automatically connect people around everyday things - books, music, movies, stars, artists, stocks, wine, restaurants and more.
1. Contextual: Glue is distributed and appears when it makes sense on popular sites.
2. Automatic: Users participate in Glue just by browsing their favorite sites.
3. Simple: Glue removes the friction involved in networking - the network comes to you.
Glue Demo
Glue: Building on Amazon Web Services
AWS-based Architecture
Client Layer: Browser Add-Ons, Widgets, iPhones, Facebook Apps, API Clients
Load Balancer Layer: Round Robin DNS, Load Balancer 1, Load Balancer 2
Web Service Layer: Host 1 (EC2) . . . Host N (EC2), each running Glue Web Service and Batch Services
Database Layer:
Amazon SimpleDB: Interactions between People and Things
Amazon S3: Object Database/People Profiles
Rackspace MySQL: User accounts, Analytics
AdaptiveBlue AWS Stack
Relating People and Things (SimpleDB)
Records of people’s interactions around things are stored in SimpleDB Domains using duplication for fast access.
Transactional and Batch Support (EC2)
Web Service Requests and batches are distributed through EC2 instances.
Storing Object Meta Data (S3)
XML representation of millions of books, music, movies, etc. is stored using Amazon S3.
Amazon SimpleDB in a Nutshell
Idea:
Create a flat database with auto-indexed tables.
Main Features:
Each attribute is indexed.
Record structure is flexible.
Basic operators in queries.
Supports sorting.
Diagram: a Client issues Put record, Get record and Query records calls against a SimpleDB Domain holding Record 1 (Key1, Attributes: A1, A2…) through Record N (Key2, Attributes: A1, A2…).
How Glue uses SimpleDB
Each record is duplicated into an Object Domain and a Person Domain.
The key is a combination of USER_ID and OBJECT_KEY.
A djb2 hash is used to calculate the domain for each record.
Records for each USER, and likewise for each OBJECT, end up inside the same domain.
Diagram: an Interaction Record (Key1, Attributes: A1, A2…) is written to the Object Domains (OD1, OD2, … ODN) and to the People Domains (PD1, PD2, … PDN).
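The domain-selection step described above can be sketched in Java roughly as follows. This is a minimal illustration, not AdaptiveBlue's code: the domain count, the `OD`/`PD` name prefixes (borrowed from the diagram labels), the `/` separator in the item key, and the choice to hash the object key for Object Domains and the user ID for People Domains are all assumptions. Only `djb2` itself (seed 5381, multiply by 33, add each character) is the standard algorithm.

```java
final class GlueDomainRouter {
    // Hypothetical number of domains per family; the talk does not state it.
    static final int DOMAIN_COUNT = 10;

    /** Standard djb2 string hash: hash = hash * 33 + c, seeded with 5381. */
    static long djb2(String s) {
        long hash = 5381L;
        for (int i = 0; i < s.length(); i++) {
            hash = ((hash << 5) + hash) + s.charAt(i);
        }
        return hash;
    }

    /** Maps a key to one of DOMAIN_COUNT domains, e.g. "OD1" .. "OD10". */
    static String domainFor(String prefix, String key) {
        // floorMod keeps the index non-negative even if the hash overflows.
        long index = Math.floorMod(djb2(key), (long) DOMAIN_COUNT);
        return prefix + (index + 1);
    }

    /** Assumed: the Object Domain is chosen from the object key alone. */
    static String objectDomain(String objectKey) {
        return domainFor("OD", objectKey);
    }

    /** Assumed: the People Domain is chosen from the user ID alone. */
    static String personDomain(String userId) {
        return domainFor("PD", userId);
    }

    /** Record key combining USER_ID and OBJECT_KEY; the separator is a guess. */
    static String itemKey(String userId, String objectKey) {
        return userId + "/" + objectKey;
    }

    public static void main(String[] args) {
        String userId = "user42"; // hypothetical user
        String objectKey = "books/kite_runner/khaled_hosseini";
        System.out.println(itemKey(userId, objectKey)
                + " is written to " + objectDomain(objectKey)
                + " and " + personDomain(userId));
    }
}
```

Hashing only the user ID (or only the object key) is what would keep all of one user's or one object's records in a single domain, so a per-user or per-object query touches one domain.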
Amazon S3 in a Nutshell
Idea:
Put/Get objects into buckets based on unique keys.
Main Features:
Public/Private access.
Support for large objects.
Diagram: a Client issues Put object and Get object calls against Amazon S3 (Bucket 1 … Bucket N).
How Glue Uses S3
Object Bucket: XML files with object information
People Bucket: XML files with user and friends info
XML is serialized as a string and written to S3.
Each file has a unique key: OBJECT_ID or USER_ID/profile, etc.
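The key scheme above can be illustrated with a small Java sketch. It deliberately avoids any real AWS client: the in-memory map stands in for a bucket, and the class name, method names and XML element names are all hypothetical. Only the key shapes (`OBJECT_ID` and `USER_ID/profile`) and the idea of writing XML as a string come from the slide.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

final class GlueObjectStore {
    // In-memory stand-in for one S3 bucket: unique key -> serialized XML.
    private final Map<String, String> bucket = new ConcurrentHashMap<>();

    /** Object files are keyed by OBJECT_ID. */
    static String objectKey(String objectId) {
        return objectId;
    }

    /** Profile files are keyed by USER_ID/profile. */
    static String profileKey(String userId) {
        return userId + "/profile";
    }

    void put(String key, String xml) {
        bucket.put(key, xml);
    }

    /** Returns the stored XML, or null if the key is absent. */
    String get(String key) {
        return bucket.get(key);
    }

    /** Serializes a profile to an XML string; element names are made up. */
    static String profileXml(String userId, String displayName) {
        return "<profile><id>" + escape(userId) + "</id><name>"
                + escape(displayName) + "</name></profile>";
    }

    /** Escapes the characters that would break XML element content. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```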
Amazon EC2 in a Nutshell
Machine Image (OS + Apps)
Usage:
Create Machine Image
Deploy the image to S3
Start 1 or more instances
Use it as regular machine(s)
Main Options:
Dynamic/Static IPs
Choose cores
Choose locations
Persistence via EBS
How Glue uses EC2
Diagram: Round Robin DNS sits in front of Load Balancer 1 and Load Balancer 2, which front Host 1 . . . Host N (EC2/Rackspace), each running Glue Web Service and Batch Services.
Web Service processes transactional requests.
Batch Services are time-based and run on sets of USERS and OBJECTS.
The system scales by equally partitioning Data and Requests.
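One way to picture the equal partitioning of a batch over hosts is the Java sketch below. The slides do not say how Glue actually splits its sets of USERS and OBJECTS, so the round-robin assignment and every name here are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

final class BatchPartitioner {
    /**
     * Splits keys into `workers` slices whose sizes differ by at most one,
     * assigning keys to slices in round-robin order.
     */
    static <T> List<List<T>> partition(List<T> keys, int workers) {
        if (workers <= 0) {
            throw new IllegalArgumentException("workers must be positive");
        }
        List<List<T>> slices = new ArrayList<List<T>>();
        for (int w = 0; w < workers; w++) {
            slices.add(new ArrayList<T>());
        }
        for (int i = 0; i < keys.size(); i++) {
            slices.get(i % workers).add(keys.get(i));
        }
        return slices;
    }
}
```

Each slice could then be handed to the batch service on one host, so adding hosts shrinks every slice by the same proportion.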
Glue: Semantic Technologies Stack
Semantic Technology Stack
Concept Definition
Server-based XML schemas for things (nouns): books, music, movies, stocks, wines, recipes, etc.
Recognition Algorithms
Recognition of things in Pages, Links and Text
Identity Algorithms
Correlation of the same thing from different pages across the web.
Semantic Technology Stack: Concept Definitions
1. XML-based: A schema file resides on the server for each type.
2. Data Composition: Each type has attributes (e.g. a book has an author, etc.)
3. Extensible: New types can be plugged into the engine dynamically.
Semantic Technology Stack: Identity Algorithms
1. Key-based: Each object in the system has a unique key, depending on its type: books/kite_runner/khaled_hosseini
2. Attribute-based: Keys are based on the combination of attributes (e.g. title/author)
3. Normalized: Multiple transformations and validations are applied to raw text to generate the keys.
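A rough Java sketch of attribute-based key generation follows. The slides only say that "multiple transformations and validations" are applied, so the specific steps here (trimming, dropping a leading article from titles, stripping accents, lowercasing, collapsing punctuation and spaces to underscores) are assumptions picked so that the slide's own example key comes out; the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.Locale;

final class ObjectKeys {
    /** Assumed normalization: strip accents, lowercase, non-alphanumerics -> "_". */
    static String normalize(String raw) {
        String s = Normalizer.normalize(raw.trim(), Normalizer.Form.NFD)
                .replaceAll("\\p{M}+", "");       // drop combining accent marks
        s = s.toLowerCase(Locale.ROOT);
        s = s.replaceAll("[^a-z0-9]+", "_");      // collapse everything else
        return s.replaceAll("^_+|_+$", "");       // trim stray underscores
    }

    /** Titles additionally lose a leading English article (an assumption). */
    static String normalizeTitle(String rawTitle) {
        String withoutArticle = rawTitle.trim()
                .replaceFirst("(?i)^(the|a|an)\\s+", "");
        return normalize(withoutArticle);
    }

    /** Builds a key of the form books/<title>/<author>. */
    static String bookKey(String title, String author) {
        return "books/" + normalizeTitle(title) + "/" + normalize(author);
    }

    public static void main(String[] args) {
        System.out.println(bookKey("The Kite Runner", "Khaled Hosseini"));
    }
}
```

Because the key is derived only from normalized attributes, the same book found on two different sites maps to the same key, which is what lets identity be established across pages.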
Semantic Technology Stack: Recognition Algorithms
1. Extraction: The first phase of recognition is based on processing elements of the page: an XML-based framework for parsing the DOM, used both by the Java backend and the JavaScript client.
2. Cleaning: The second phase of recognition is an asynchronous query of multiple web services/APIs. For books we query Amazon, for movies Netflix, etc., and then normalize and merge the results.
3. Caching: Clean objects are cached. Misses/false positives are patched manually.
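The cleaning and caching phases can be sketched in Java with `CompletableFuture`. This is only an illustration under stated assumptions: each external service is modeled as a plain function from the extracted text to a map of attributes with non-null values, the merge rule (first source to supply an attribute wins) is invented, and no real Amazon or Netflix API is called.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

final class ObjectCleaner {
    // Hypothetical lookups standing in for external web services/APIs.
    private final List<Function<String, Map<String, String>>> sources;
    // Cache of already cleaned objects, keyed by the extracted text.
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    ObjectCleaner(List<Function<String, Map<String, String>>> sources) {
        this.sources = sources;
    }

    Map<String, String> clean(String extracted) {
        Map<String, String> cached = cache.get(extracted);
        if (cached != null) {
            return cached; // cache hit: no service is queried again
        }
        // Start every lookup asynchronously before waiting on any of them.
        List<CompletableFuture<Map<String, String>>> pending = new ArrayList<>();
        for (Function<String, Map<String, String>> source : sources) {
            pending.add(CompletableFuture.supplyAsync(() -> source.apply(extracted)));
        }
        // Merge in source order; the first source to supply an attribute wins.
        Map<String, String> merged = new LinkedHashMap<>();
        for (CompletableFuture<Map<String, String>> future : pending) {
            for (Map.Entry<String, String> e : future.join().entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue().trim());
            }
        }
        cache.put(extracted, merged);
        return merged;
    }
}
```

A manual patch for a miss or false positive could be modeled as writing a corrected entry straight into the cache, though the slides do not describe how that is actually done.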
http://getglue.com
http://twitter/[email protected]