scalability 09262012
DESCRIPTION
Here's a PDF of the slides from my talk at http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/events/62000472/
TRANSCRIPT
Cloudant, 9-26-2012
My Background
Cloudant CoFounder, Chief Scientist
Assistant Professor, Particle Physics (U. Washington, Affiliate)
Background: machine learning, analysis, big data, globally distributed systems
The face of big data
http://abstract.cs.washington.edu/~shwetak/
The face of big data
“The future is stranger and sooner than you think” Reid Hoffman, LinkedIn/Greylock
The face of big data
Perfect Storm
Big Data
Parallel Processing
HTML5/JS
9M Trained Developers
Mobile
Focus on your Application, not data operations
If your data is stuck in the warehouse...
... you’re losing
Founded (2009) by leading MIT data scientists
Funded by Y Combinator & Avalon
Global network of 20+ data centers -- Application Data Network (ADN)
Built on leading NoSQL standard: most durable data store on planet
10,000 users and growing.
Data Layer for the Web
Cloudant: Akamai of dynamic content
Cloudant Product Line
• Application State: Hyper-Scalable Document Store (JSON+HTTP), MVCC, secondary indexes for flexible query
• Application Data Security: accounts/API keys, data sharing, permission roles
• Application Analytics: fully integrated (incremental) MapReduce engine
• Application Search: fully integrated (incremental) Lucene + Geospatial
• Application Object Storage: images, audio, video...
• Application State Distribution: cloud <==> tablet <==> PC <==> mobile
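The "incremental" MapReduce idea in the list above can be illustrated with a toy sketch: instead of recomputing a view over every document, only a new or changed document is mapped and folded into a stored result. This is a plain-Python illustration under stated assumptions, not Cloudant's engine or API; `map_doc` and `IncrementalCount` are hypothetical names introduced here.

```python
from collections import Counter


def map_doc(doc):
    """Map step: emit (key, 1) pairs, here one pair per tag in the doc."""
    return [(tag, 1) for tag in doc.get("tags", [])]


class IncrementalCount:
    """Toy incremental view: keeps reduced counts and each doc's last emits."""

    def __init__(self):
        self.totals = Counter()   # reduced result, key -> count
        self.emitted = {}         # doc id -> pairs emitted by its last revision

    def update(self, doc):
        # Retract what an earlier revision of this doc contributed.
        for key, value in self.emitted.get(doc["_id"], []):
            self.totals[key] -= value
            if self.totals[key] == 0:
                del self.totals[key]
        # Map only the changed doc and fold its pairs into the totals.
        pairs = map_doc(doc)
        for key, value in pairs:
            self.totals[key] += value
        self.emitted[doc["_id"]] = pairs


view = IncrementalCount()
view.update({"_id": "a", "tags": ["hadoop", "nosql"]})
view.update({"_id": "b", "tags": ["nosql"]})
view.update({"_id": "a", "tags": ["hbase"]})  # new revision of doc "a"
```

Each `update` touches only the pairs belonging to one document, so the cost of keeping the result current depends on the size of the change rather than the size of the database.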
API Compatible
You do this:
We give you:
Cloudant Install
That’s It
API Examples
Write a doc...from the browser
No client install necessary
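The slide's code did not survive extraction. As a rough sketch of what "write a doc" looks like over a JSON+HTTP document API of the CouchDB style, a document is stored with a single HTTP PUT to `/{db}/{doc_id}`. The example below only builds the request, in Python rather than browser JavaScript; the host, database name, document id, and the helper `build_put_doc` are placeholders assumed for illustration.

```python
import json
import urllib.request


def build_put_doc(base_url, db, doc_id, doc):
    """Build (but do not send) an HTTP PUT that stores `doc` as JSON."""
    url = f"{base_url}/{db}/{doc_id}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host, database, and document id (assumptions, not real endpoints).
req = build_put_doc(
    "https://account.example.com",
    "talks",
    "scalability-2012",
    {"topic": "scalability", "city": "Seattle"},
)
# Sending it would be urllib.request.urlopen(req), given valid credentials.
```

Because the request is plain HTTP with a JSON body, any HTTP-capable client can issue it, which is the point of "no client install necessary".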
API Examples
Create Secondary Indexes
Query Those indexes
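The original slide content is missing here too. In a CouchDB-compatible API, a secondary index is usually defined as a view inside a design document and queried with a GET on `/{db}/_design/{ddoc}/_view/{view}`. The sketch below builds such a design document and a query URL without contacting any server; the host, database, view name `by_city`, and helper `view_url` are assumptions made for illustration.

```python
import json
from urllib.parse import urlencode

# A design document holding one view. The map function is JavaScript source
# stored as a string; "_count" is a built-in reduce function.
design_doc = {
    "_id": "_design/lookup",
    "views": {
        "by_city": {
            "map": "function (doc) { if (doc.city) { emit(doc.city, 1); } }",
            "reduce": "_count",
        }
    },
}


def view_url(base_url, db, ddoc, view, **params):
    """Build the GET URL for querying a view; parameter values are JSON-encoded."""
    query = urlencode({k: json.dumps(v) for k, v in params.items()})
    url = f"{base_url}/{db}/_design/{ddoc}/_view/{view}"
    return f"{url}?{query}" if query else url


# Placeholder host and database (assumptions).
url = view_url("https://account.example.com", "talks", "lookup", "by_city",
               key="Seattle", reduce=False)
```

The design document itself is stored like any other document, so creating an index and querying it both reduce to ordinary HTTP requests.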
http://examples.cloudant.com/lobby-search/_design/lookup/index.html
Global Data Network
Cloudant scales within & between data centers
Availability, low-latency
Anatomy of the Data Layer
US-EAST “Node”
Single-tenant cluster
Horizontally Scalable DB
• Fault tolerant
• Always consistent
• Schemaless (NoSQL)
• Automatic sharding
• Distributed, parallel analytics
• Incremental, chainable MapReduce
• Full-text search
Multi-tenant cluster
PUT {document}
Edge Database Cluster
Disconnected Devices
AP-JP
Filtered Replication & Sync
Secondary Data Centers (for DR & distributed access)
EU-NL
Single-Tenant or Multi-Tenant
https://cloudant.com/blog/cloudant-labs-on-google-spanner/
Why It Matters
>1. Visualization Wins
http://sosolimited.com/blog/2012/07/from-tweets-to-lightshow/
>2. Prepare For Success
Three #1 apps, from 6 to 90 servers in weeks
>3. Scale Invariance
>3. Scale Invariance
Cloud
mobile/tablet
desktop
Goal: Megabytes to Petabytes
>3. Scale Invariance
‘Carry Small, Live Large’
Single user experience at vastly different scales
>4. No Preferred Frame
So why do you have a global ‘write master’?
>4. No Preferred Frame
This simple document...
...establishes Continuous Pipe from Europe to US
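The document shown on this slide was an image and is not in the transcript. In the CouchDB-compatible API, a continuous replication is typically described by a small JSON document with `source`, `target`, and `continuous` fields, stored in a `_replicator` database. The sketch below builds such a document under that assumption; the database URLs and the helper `replication_doc` are hypothetical.

```python
import json


def replication_doc(source, target, continuous=True):
    """Build a replication document in the CouchDB `_replicator` style."""
    return {"source": source, "target": target, "continuous": continuous}


# Placeholder database URLs for a European and a US cluster (assumptions).
EU = "https://account.example.com/eu-db"
US = "https://account.example.com/us-db"

eu_to_us = replication_doc(EU, US)
us_to_eu = replication_doc(US, EU)  # the reverse direction

print(json.dumps(eu_to_us, indent=2))
```

Swapping `source` and `target` gives the reverse pipe, and both documents can exist at once, so neither side has to act as the single write master.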
>4. No Preferred Frame
And you can do the reverse...
...at the same time
>4. No Preferred Frame
Write local, live global
What could you do with relaxed constraints?
>4. No Preferred Frame
One click (continuous) Import
[Figure: Data Import. Data Size [GB], Disk Size [GB], and Documents [Million] plotted against Time [sec], 0 to 14000; both vertical axes run from 0 to 18.]
Actual Customer Data, France to Amsterdam
Big and Getting Bigger
Big and Getting Bigger
• And of course, we are hiring
Languages: erlang, scala, c, javascript, python, clojure, html5, iOS, Android, ruby/chef
Sample problems in the Seattle office
Create file format optimized for (huge) structured time-series data
Integrate Cubism into two-tier application stack
Profile creation of 100M databases (real customer)
PIG / HIVE integration
Prototype read-in-place Hadoop connector