active cloud db at cloudcomp '10
Active Cloud DB: A RESTful Software-as-a-Service for Language
Agnostic Access to Distributed Datastores
Chris Bunch, Jonathan Kupferman, Chandra Krintz
Wednesday, October 27, 2010
CloudComp 2010
1
Who’s Using NoSQL?
2
and many others!
Do It Yourself!
• Pick a datastore
• Learn how the interfaces SHOULD work
• Learn how the interfaces REALLY work
• Migrate to a non-relational data model
• Each of these is non-trivial!
3
Trouble in Paradise
4
(at least they’re honest about it)
The Problem
• No way to compare databases with real applications
• No standard on what a real test is
• Too many variables in the equation
• Topology, query language, data model, APIs, consistency settings (to name a few)
5
You Need A Better Way
• Need a platform to:
• Easily evaluate datastores
• Quickly evaluate datastores
• Evaluate datastores on similar metrics
6
Our Contribution
• Active Cloud DB: A Google App Engine app that exposes the DB via REST
• Exposes string key/value DB
• Speeds up repeated operations via caching
• Works on Google or AppScale
• Free access to BigTable
7
8
Realistically Speaking
• One test takes ~2 hours
• In one day at work you could generate a graph comparing:
• HBase
• Cassandra
• Google BigTable
• Amazon SimpleDB
9
RESTful Interface
• GET /resources/key ➜ get
• POST /resources/key (with value) ➜ put
• DELETE /resources/key ➜ delete
• GET /resources ➜ query (get all)
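The four routes above can be sketched as a small Ruby client. This is a minimal sketch, not the project's own client code: `build_request` and `send_request` are hypothetical helper names, the host is the placeholder used elsewhere in these slides, and sending the value as the raw POST body is an assumption (the slides only say "with value").

```ruby
require "net/http"
require "uri"
require "erb"

# Placeholder host; substitute your own App Engine or AppScale deployment.
BASE_URL = "http://your-app.appspot.com"

# Map a datastore operation onto the HTTP method and path listed on the slide.
# Returns an unsent Net::HTTP request object.
def build_request(operation, key = nil, value = nil)
  path = key ? "/resources/#{ERB::Util.url_encode(key)}" : "/resources"
  case operation
  when :get
    Net::HTTP::Get.new(path)
  when :put
    request = Net::HTTP::Post.new(path)
    request.body = value.to_s # assumption: the value travels as the raw body
    request
  when :delete
    Net::HTTP::Delete.new(path)
  when :query
    Net::HTTP::Get.new("/resources")
  else
    raise ArgumentError, "unknown operation: #{operation}"
  end
end

# Send a prepared request and return the response body as a string.
def send_request(request, base_url = BASE_URL)
  uri = URI.parse(base_url)
  Net::HTTP.start(uri.host, uri.port) { |http| http.request(request).body }
end
```

For example, `send_request(build_request(:put, "book1", "Dune"))` would store a value and `send_request(build_request(:get, "book1"))` would read it back, assuming a deployment is reachable at `BASE_URL`.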
10
Caching Support
• Leverages Memcache API / memcached
• Provides a Least-Recently-Used Cache
• Write-through caching strategy - all puts / deletes are written to the cache
• Generational caching strategy - queries use a generation number
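The two strategies above can be illustrated with a short Ruby sketch. It is an illustration under stated assumptions rather than the actual implementation: `CachedStore` is a hypothetical class, plain hashes stand in for the datastore and for memcached, the generation number lives in an instance variable instead of the cache, and LRU eviction is left out.

```ruby
# Hypothetical stand-in: hashes replace the real datastore and memcached.
class CachedStore
  attr_reader :db_reads

  def initialize
    @db = {}         # stand-in for the datastore
    @cache = {}      # stand-in for memcached
    @generation = 0  # bumped on every write so old query results go stale
    @db_reads = 0    # counts how often we fall through to the datastore
  end

  # Write-through: the put goes to the datastore and the cache together.
  def put(key, value)
    @db[key] = value
    @cache["kv:#{key}"] = value
    @generation += 1
  end

  # Deletes are applied to the cache as well as the datastore.
  def delete(key)
    @db.delete(key)
    @cache.delete("kv:#{key}")
    @generation += 1
  end

  def get(key)
    cache_key = "kv:#{key}"
    return @cache[cache_key] if @cache.key?(cache_key)
    @db_reads += 1
    value = @db[key]
    @cache[cache_key] = value unless value.nil?
    value
  end

  # Generational: the query result is cached under the current generation,
  # so any write makes later queries miss and re-read the datastore.
  def query
    cache_key = "query:#{@generation}"
    return @cache[cache_key] if @cache.key?(cache_key)
    @db_reads += 1
    @cache[cache_key] = @db.dup
  end
end
```

In this sketch, stale `query:N` entries are never read again after a write; with a real least-recently-used cache they would eventually be evicted.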
11
Bookstore App
• Four prototypes available that use Active Cloud DB:
• Ruby on Rails
• Ruby (through Sinatra)
• Python (via Django)
• Python (through web.py)
12
13
The Actual Code
• With BigTable:
• val = `curl -X GET http://your-app.appspot.com/resources/#{key}`
• Or in AppScale:
• val = `curl -X GET http://128.111.55.223:8080/resources/#{key}`
14
AppScale
• Originally presented at CloudComp 2009
• An open-source implementation of the Google App Engine APIs
• Automatically configures and deploys cloud infrastructures to run your application
• includes database deployment
15
• Supported Datastores as of AppScale 1.4:
• HBase, Hypertable
• MySQL
• Cassandra, Voldemort, Scalaris
• MongoDB
• MemcacheDB
• Amazon SimpleDB
16
17
Not Good Enough
• AppScale / GAE solve the problem for Python and Java
• But only with certain APIs
• And with certain restrictions
• Need something general purpose
• All languages, no restrictions
18
But how do we test it?
• Cassandra 0.5.0 / MemcacheDB 1.2.1β
• Place 1000 items in the database and time:
• Get, put, query, delete operations
• Nine accessor threads
• Standard deployment model
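The batch test described above could be sketched in Ruby roughly as follows. This is a hedged sketch, not the authors' harness: `InMemoryStore`, `timed_parallel`, and `run_batch` are hypothetical names, the in-memory store stands in for calls to the REST interface, and splitting the keys evenly across the nine threads is an assumption about how the accessor threads share the work.

```ruby
require "benchmark"

# Hypothetical thread-safe stand-in for the REST-backed datastore.
class InMemoryStore
  def initialize
    @data = {}
    @lock = Mutex.new
  end

  def put(key, value)
    @lock.synchronize { @data[key] = value }
  end

  def get(key)
    @lock.synchronize { @data[key] }
  end

  def delete(key)
    @lock.synchronize { @data.delete(key) }
  end

  def query
    @lock.synchronize { @data.dup }
  end
end

# Run the given block once per key, one thread per slice, and return the
# elapsed wall-clock time in seconds.
def timed_parallel(slices)
  Benchmark.realtime do
    slices.map { |slice| Thread.new { slice.each { |key| yield key } } }.each(&:join)
  end
end

# Time put, get, query, and delete over `items` keys using `threads` threads.
def run_batch(store, items: 1000, threads: 9)
  keys = Array.new(items) { |i| "key#{i}" }
  slices = keys.each_slice((items / threads.to_f).ceil).to_a
  {
    put:    timed_parallel(slices) { |key| store.put(key, "value") },
    get:    timed_parallel(slices) { |key| store.get(key) },
    query:  Benchmark.realtime { store.query },
    delete: timed_parallel(slices) { |key| store.delete(key) }
  }
end
```

Calling `run_batch(InMemoryStore.new)` returns a hash of timings per operation; swapping the stand-in for an object that issues HTTP requests would time a real deployment.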
19
20
21
22
A different type of test
• Workload model
• 10000 random operations selected
• 50/30/20 get/put/query ratio
• Constrained to 16 nodes
• Performed on initially empty database
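One way to draw such a workload is sketched below in Ruby. The function name `generate_workload` and the fixed seed are assumptions added for illustration; the slides specify only the operation count and the 50/30/20 ratio.

```ruby
# Draw `count` operations at random with a 50/30/20 get/put/query split.
# The seed is a hypothetical parameter, used here only for repeatable runs.
def generate_workload(count: 10_000, seed: 42)
  rng = Random.new(seed)
  Array.new(count) do
    roll = rng.rand # float in [0, 1)
    if roll < 0.5
      :get
    elsif roll < 0.8
      :put
    else
      :query
    end
  end
end
```

Each generated operation would then be issued against an initially empty database, so early gets and queries are expected to return little or nothing.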
23
24
25
26
Future Work
• Performance impact of:
• Cache size
• Millions of items in DB
• Overhead of Active Cloud DB
• Transaction support
27
Related Work
• BigTable as a Web Service
• Not open source, HBase-like API
• Yahoo! Cloud Serving Benchmark [SoCC10]
• Doesn’t run applications
• No automation - you set up the DB, you set up the schemas, etc.
28
Active Cloud DB is Open for Business
• Open source - free to use
• Customize your own batch test or workload test
• Access it via any programming language
• Bookstore applications included
29
Thanks!
• Download Active Cloud DB and AppScale:
• http://appscale.cs.ucsb.edu
• To my advisor, Chandra Krintz
• To the AppScale team, especially co-lead Navraj Chohan
30