AdaptiveBlue @Java NYC Meetup

April 20, 2009

Alex Iskold, Founder/CEO, http://getglue.com

Agenda

1. About AdaptiveBlue
2. Glue: The Network of People and Things
3. Glue: Building on Amazon Web Services
4. Glue: Semantic Technology Stack

About AdaptiveBlue

Founded in 2006, based in New York

Funded by USV and RRE

Focuses on enhancing the browsing experience

Launched the BlueOrganizer and Glue add-ons for Firefox and SmartLinks widgets for blogs

Get Glue. The Network That Sticks With You.

http://getglue.com

What is Glue?

Glue is a contextual network that uses semantic technology to automatically connect people around everyday things: books, music, movies, stars, artists, stocks, wine, restaurants and more.

1. Contextual: Glue is distributed and appears when it makes sense on popular sites.

2. Automatic: Users participate in Glue just by browsing their favorite sites.

3. Simple: Glue removes the friction involved in networking; the network comes to you.

Glue Demo

Glue: Building on Amazon Web Services

AWS-based Architecture

Client Layer: Browser Add-Ons, Widgets, iPhones, Facebook Apps, API Clients

Load Balancer Layer: Round Robin DNS in front of Load Balancer 1 and Load Balancer 2

Web Service Layer: Host 1 . . . Host N (EC2), each running the Glue Web Service and Batch Services

Database Layer: Amazon SimpleDB (interactions between people and things), Amazon S3 (object database and people profiles), Rackspace MySQL (user accounts and analytics)

AdaptiveBlue AWS Stack

Relating People and Things (SimpleDB)

Records of people’s interactions around things are stored in SimpleDB domains and duplicated for fast access.

Transactional and Batch Support (EC2)

Web service requests and batch jobs are distributed across EC2 instances.

Storing Object Metadata (S3)

XML representations of millions of books, music, movies, etc. are stored in Amazon S3.

Amazon SimpleDB in a Nutshell

Idea:

Create a flat database with auto-indexed tables.

Main Features:

Each attribute is indexed.
Record structure is flexible.
Queries support basic operators.
Queries support sorting.

(Diagram: a Client can put, get and query records in a SimpleDB domain; records 1 … N each have a key and attributes A1, A2, …)
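To make the "flat database with auto-indexed tables" idea concrete, here is a minimal in-memory sketch in Java. It is not the SimpleDB API: the class `FlatDomain`, its `put`/`get`/`query` methods and the `user|object` key format in the usage example are hypothetical. Automatic indexing is modeled as a map from each (attribute, value) pair to the set of record keys, which is one plausible way to get "every attribute is indexed" with a free-form record structure.

```java
import java.util.*;

/** Hypothetical in-memory model of a flat, auto-indexed domain (not the real SimpleDB API). */
class FlatDomain {
    // key -> (attribute name -> value); records need not share a structure
    private final Map<String, Map<String, String>> records = new HashMap<>();
    // attribute name -> value -> keys: every attribute is indexed on write
    private final Map<String, Map<String, Set<String>>> index = new HashMap<>();

    void put(String key, Map<String, String> attributes) {
        remove(key);  // drop any previous version so the index stays consistent
        records.put(key, new HashMap<>(attributes));
        for (Map.Entry<String, String> e : attributes.entrySet()) {
            index.computeIfAbsent(e.getKey(), a -> new HashMap<>())
                 .computeIfAbsent(e.getValue(), v -> new TreeSet<>())
                 .add(key);
        }
    }

    Map<String, String> get(String key) {
        return records.get(key);
    }

    /** Equality query on one attribute, results sorted by another attribute. */
    List<String> query(String attr, String value, String sortBy) {
        Set<String> keys = index.getOrDefault(attr, Map.of()).getOrDefault(value, Set.of());
        List<String> result = new ArrayList<>(keys);
        result.sort(Comparator.comparing(k -> records.get(k).getOrDefault(sortBy, "")));
        return result;
    }

    private void remove(String key) {
        Map<String, String> old = records.remove(key);
        if (old == null) return;
        for (Map.Entry<String, String> e : old.entrySet()) {
            index.get(e.getKey()).get(e.getValue()).remove(key);
        }
    }

    public static void main(String[] args) {
        FlatDomain domain = new FlatDomain();
        domain.put("u1|books/kite_runner", Map.of("user", "u1", "type", "book", "date", "2009-04-02"));
        domain.put("u1|movies/casablanca", Map.of("user", "u1", "type", "movie", "date", "2009-04-01"));
        domain.put("u2|books/kite_runner", Map.of("user", "u2", "type", "book", "date", "2009-03-30"));
        System.out.println(domain.query("user", "u1", "date"));  // [u1|movies/casablanca, u1|books/kite_runner]
    }
}
```

Indexing on write makes queries cheap at the cost of extra work on every `put`, which mirrors the trade-off of a store where every attribute is indexed.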

How Glue Uses SimpleDB

Each interaction record is duplicated into an Object domain and a People domain.
The key is a combination of USER_ID and OBJECT_KEY.
The djb2 hash is used to calculate the domain for each record, so that all records for a given USER, and all records for a given OBJECT, end up inside the same domain.

Object Domains: OD1, OD2, … ODN
People Domains: PD1, PD2, … PDN
Each domain holds interaction records of the form Key, Attributes: A1, A2, …
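The domain-selection scheme can be sketched in Java as follows, under stated assumptions: the Object domain is chosen by hashing OBJECT_KEY and the People domain by hashing USER_ID (one reading of "records for each USER and each OBJECT inside the same domain"), the domain names `OD1…ODN`/`PD1…PDN` come from the slide, and the `|` separator in the record key plus the class name `DomainRouter` are hypothetical. The hash is the classic djb2 (start at 5381, multiply by 33, add each byte), computed over UTF-8 bytes and kept as an unsigned 32-bit value; C implementations vary with the width of `unsigned long`.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of djb2-based domain selection for duplicated interaction records. */
class DomainRouter {
    private final int domainCount;  // N, the number of Object (and People) domains

    DomainRouter(int domainCount) {
        if (domainCount <= 0) throw new IllegalArgumentException("domainCount must be positive");
        this.domainCount = domainCount;
    }

    /** Classic djb2: h = h * 33 + byte, starting from 5381, masked to unsigned 32 bits. */
    static long djb2(String s) {
        long hash = 5381;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            hash = ((hash << 5) + hash + (b & 0xFF)) & 0xFFFFFFFFL;
        }
        return hash;
    }

    /** Record key combining USER_ID and OBJECT_KEY (the '|' separator is an assumption). */
    static String recordKey(String userId, String objectKey) {
        return userId + "|" + objectKey;
    }

    /** Every record about the same OBJECT hashes to the same Object domain. */
    String objectDomain(String objectKey) {
        return "OD" + (djb2(objectKey) % domainCount + 1);
    }

    /** Every record by the same USER hashes to the same People domain. */
    String peopleDomain(String userId) {
        return "PD" + (djb2(userId) % domainCount + 1);
    }

    public static void main(String[] args) {
        DomainRouter router = new DomainRouter(16);
        String user = "user42";
        String object = "books/kite_runner/khaled_hosseini";
        // The same record is written twice: once into a People domain, once into an Object domain.
        System.out.println(router.peopleDomain(user) + " <- " + recordKey(user, object));
        System.out.println(router.objectDomain(object) + " <- " + recordKey(user, object));
    }
}
```

Because the domain depends only on the hashed id, "all interactions of this user" or "everyone who touched this object" becomes a query against a single domain instead of N.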

Amazon S3 in a Nutshell

Idea:

Put/get objects into buckets based on unique keys.

Main Features:

Public/private access.
Support for large objects.

(Diagram: a Client puts and gets objects in Amazon S3 buckets 1 … N.)

How Glue Uses S3

Object Bucket: XML files with object information

People Bucket: XML files with user and friends info

XML is serialized as a string and written to S3. Each file has a unique key: OBJECT_ID, USER_ID/profile, etc.
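A minimal sketch of the "serialize XML as a string, store it under a unique key" pattern, using only the JDK's DOM and `javax.xml.transform` APIs. Assumptions: a `HashMap` stands in for the People bucket (no S3 client is shown), and the element names `profile`, `friends` and `friend` as well as the class `ProfileStore` are hypothetical. Only the `USER_ID/profile` key format comes from the slide.

```java
import java.io.StringWriter;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical sketch: serialize a profile to an XML string and store it under a unique key. */
class ProfileStore {
    // Stand-in for the People bucket; a real system would write to S3 instead.
    private final Map<String, String> peopleBucket = new HashMap<>();

    static String profileKey(String userId) {
        return userId + "/profile";
    }

    static String toXml(String userId, List<String> friendIds) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element profile = doc.createElement("profile");
            profile.setAttribute("id", userId);
            doc.appendChild(profile);
            Element friends = doc.createElement("friends");
            profile.appendChild(friends);
            for (String friendId : friendIds) {
                Element friend = doc.createElement("friend");
                friend.setAttribute("id", friendId);
                friends.appendChild(friend);
            }
            // Serialize the DOM tree to a plain string, without the <?xml ...?> declaration.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML serialization failed", e);
        }
    }

    void saveProfile(String userId, List<String> friendIds) {
        peopleBucket.put(profileKey(userId), toXml(userId, friendIds));
    }

    String loadProfile(String userId) {
        return peopleBucket.get(profileKey(userId));
    }

    public static void main(String[] args) {
        ProfileStore store = new ProfileStore();
        store.saveProfile("user42", List.of("user7", "user9"));
        System.out.println(store.loadProfile("user42"));
    }
}
```

Since the key is derived from the id, reading a profile needs no lookup table: `profileKey(userId)` is the whole addressing scheme.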

Amazon EC2 in a Nutshell

Machine Image (OS + Apps)

Usage:

1. Create a machine image.
2. Deploy the image to S3.
3. Start one or more instances.
4. Use them as regular machines.

Main Options:

Dynamic/static IPs
Choice of cores
Choice of locations
Persistence via EBS

How Glue Uses EC2

(Diagram: Round Robin DNS in front of Load Balancer 1 and Load Balancer 2, which route to Host 1 . . . Host N (EC2/Rackspace), each running the Glue Web Service and Batch Services.)

The Web Service processes transactional requests.
Batch Services are time-based and run on sets of USERS and OBJECTS.

The system scales by equally partitioning data and requests.
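The two kinds of partitioning on this slide can be sketched in Java, with the caveat that this is a simplification: round robin is modeled in application code rather than in DNS and load balancers, and batch work is split by hashing each USER or OBJECT id, which gives roughly (not exactly) equal slices. The class `Partitioner` and the host names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch: transactional requests round-robin, batch work partitioned by id. */
class Partitioner {
    private final List<String> hosts;
    private final AtomicInteger next = new AtomicInteger();

    Partitioner(List<String> hosts) {
        if (hosts.isEmpty()) throw new IllegalArgumentException("need at least one host");
        this.hosts = List.copyOf(hosts);
    }

    /** Transactional requests: rotate through the hosts so each gets an equal share. */
    String nextHost() {
        // floorMod keeps the index valid even after the counter overflows to negative values
        int i = Math.floorMod(next.getAndIncrement(), hosts.size());
        return hosts.get(i);
    }

    /** Batch services: split a set of USER or OBJECT ids into one slice per host. */
    List<List<String>> partition(List<String> ids) {
        List<List<String>> slices = new ArrayList<>();
        for (int i = 0; i < hosts.size(); i++) slices.add(new ArrayList<>());
        for (String id : ids) {
            // The same id always lands on the same host, so batch runs are repeatable.
            slices.get(Math.floorMod(id.hashCode(), hosts.size())).add(id);
        }
        return slices;
    }

    public static void main(String[] args) {
        Partitioner p = new Partitioner(List.of("host1", "host2", "host3"));
        for (int r = 0; r < 4; r++) System.out.println(p.nextHost());  // host1, host2, host3, host1
        System.out.println(p.partition(List.of("u1", "u2", "u3", "u4", "u5")));
    }
}
```

Adding a host widens both the rotation and the number of batch slices, which is the sense in which the system scales by adding machines.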

Glue: Semantic Technology Stack

Semantic Technology Stack

Concept Definition

Server-based XML schemas for things (nouns): books, music, movies, stocks, wines, recipes, etc.

Recognition Algorithms

Recognition of things in Pages, Links and Text

Identity Algorithms

Correlation of the same thing from different pages across the web.

Semantic Technology Stack: Concept Definitions

1. XML-based: A schema file resides on the server for each type.

2. Data Composition: Each type has attributes (e.g. a book has an author, etc.)

3. Extensible: New types can be plugged into the engine dynamically.
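The "one XML schema per type, plugged in dynamically" idea can be sketched in Java with the JDK's DOM parser. The schema format `<type name="..."><attribute name="..."/></type>` and the class `TypeRegistry` are hypothetical; the actual Glue schema files are not shown in this deck.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical registry of thing types, each defined by a small XML schema. */
class TypeRegistry {
    private final Map<String, List<String>> attributesByType = new HashMap<>();

    /** Parses <type name="..."><attribute name="..."/>...</type> and registers it at runtime. */
    void register(String schemaXml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(schemaXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("unreadable schema", e);
        }
        Element type = doc.getDocumentElement();
        if (!"type".equals(type.getTagName())) {
            throw new IllegalArgumentException("root element must be <type>");
        }
        List<String> attributes = new ArrayList<>();
        NodeList nodes = type.getElementsByTagName("attribute");  // document order
        for (int i = 0; i < nodes.getLength(); i++) {
            attributes.add(((Element) nodes.item(i)).getAttribute("name"));
        }
        attributesByType.put(type.getAttribute("name"), List.copyOf(attributes));
    }

    List<String> attributesOf(String typeName) {
        return attributesByType.getOrDefault(typeName, List.of());
    }

    public static void main(String[] args) {
        TypeRegistry registry = new TypeRegistry();
        registry.register("<type name=\"book\"><attribute name=\"title\"/><attribute name=\"author\"/></type>");
        registry.register("<type name=\"wine\"><attribute name=\"winery\"/><attribute name=\"vintage\"/></type>");
        System.out.println(registry.attributesOf("book"));  // [title, author]
    }
}
```

New types need only a new schema string or file; no engine code changes, which is the extensibility property the slide describes.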

Semantic Technology Stack: Identity Algorithms

1. Key-based: Each object in the system has a unique key, depending on its type: books/kite_runner/khaled_hosseini

2. Attribute-based: Keys are based on a combination of attributes (e.g. title/author).

3. Normalized: Multiple transformations and validations are applied to raw text to generate the keys.
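A minimal Java sketch of attribute-based, normalized key generation that reproduces the slide's example key. The specific transformations are assumptions, not Glue's actual rules: accents are stripped via Unicode NFD decomposition, text is lowercased, a hypothetical list of leading articles ("the", "a", "an") is dropped, and runs of non-alphanumeric characters become underscores.

```java
import java.text.Normalizer;
import java.util.Locale;

/** Hypothetical key normalizer; the exact rules are assumptions. */
class ObjectKeys {
    // Assumed list of leading articles dropped from titles.
    private static final String[] LEADING_ARTICLES = {"the ", "a ", "an "};

    static String normalize(String raw) {
        String s = Normalizer.normalize(raw, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "")   // remove combining marks (accents)
                .toLowerCase(Locale.ROOT)
                .trim();
        for (String article : LEADING_ARTICLES) {
            if (s.startsWith(article)) {
                s = s.substring(article.length());
                break;
            }
        }
        s = s.replaceAll("[^a-z0-9]+", "_");  // runs of other characters become one '_'
        return s.replaceAll("^_+|_+$", "");   // trim leading/trailing underscores
    }

    /** Key for a book: type prefix plus normalized title and author. */
    static String bookKey(String title, String author) {
        return "books/" + normalize(title) + "/" + normalize(author);
    }

    public static void main(String[] args) {
        System.out.println(bookKey("The Kite Runner", "Khaled Hosseini"));  // books/kite_runner/khaled_hosseini
    }
}
```

Because both pages and API results pass through the same normalizer, "The Kite Runner" on one site and "Kite Runner" on another map to the same key, which is what lets the same thing be correlated across the web.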

Semantic Technology Stack: Recognition Algorithms

1. Extraction: The first phase of recognition processes the elements of the page. An XML-based framework for parsing the DOM is used both by the Java backend and by the JavaScript client.

2. Cleaning: The second phase of recognition asynchronously queries multiple web services/APIs. For books we query Amazon, for movies Netflix, etc., and then normalize and merge the results.

3. Caching: Clean objects are cached. Misses and false positives are patched manually.
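The cleaning and caching phases can be sketched in Java with `CompletableFuture` and `ConcurrentHashMap`. This is an illustration under assumptions: the external services are modeled as plain `Function`s (no real Amazon or Netflix client is shown), "normalize" is reduced to trimming values, merging lets earlier sources win conflicts, and the class `Recognizer` with its `patch` method is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.stream.Collectors;

/** Hypothetical sketch of the cleaning and caching phases of recognition. */
class Recognizer {
    // Stand-ins for external lookups such as a book or movie catalog: raw text -> attributes.
    private final List<Function<String, Map<String, String>>> sources;
    // Clean objects, keyed by the raw text that was recognized.
    private final Map<String, Map<String, String>> cache = new ConcurrentHashMap<>();

    Recognizer(List<Function<String, Map<String, String>>> sources) {
        this.sources = List.copyOf(sources);
    }

    /** Cleaning: query all sources asynchronously, then merge (earlier sources win conflicts). */
    Map<String, String> clean(String rawText) {
        List<CompletableFuture<Map<String, String>>> pending = sources.stream()
                .map(source -> CompletableFuture.supplyAsync(() -> source.apply(rawText)))
                .collect(Collectors.toList());
        Map<String, String> merged = new LinkedHashMap<>();
        for (CompletableFuture<Map<String, String>> future : pending) {
            // Minimal normalization: trim whitespace from every value.
            future.join().forEach((attr, value) -> merged.putIfAbsent(attr, value.trim()));
        }
        return merged;
    }

    /** Caching: a cache hit skips the remote lookups entirely. */
    Map<String, String> recognize(String rawText) {
        return cache.computeIfAbsent(rawText, this::clean);
    }

    /** Manual patch for a miss or false positive: overwrite the cached object. */
    void patch(String rawText, Map<String, String> corrected) {
        cache.put(rawText, Map.copyOf(corrected));
    }
}
```

All lookups start before any result is awaited, so the cleaning latency is roughly that of the slowest source rather than the sum of all of them; the cache then removes that cost for repeat recognitions.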

