scalability 09262012

Mike Miller, CoFounder, Chief Scientist, @mlmilleratmit

Post on 16-May-2015

DESCRIPTION

Here's a pdf of the slides from my talk at http://www.meetup.com/Seattle-Hadoop-HBase-NoSQL-Meetup/events/62000472/

TRANSCRIPT

Page 1: Scalability 09262012

Mike Miller, CoFounder, Chief Scientist

@mlmilleratmit

Page 2: Scalability 09262012

Cloudant, 9-26-2012

My Background

Cloudant CoFounder, Chief Scientist

Assistant Professor, Particle Physics (U. Washington, Affiliate)

Background: machine learning, analysis, big data, globally distributed systems

Page 3: Scalability 09262012

The face of big data

http://abstract.cs.washington.edu/~shwetak/

Page 4: Scalability 09262012

The face of big data

Page 5: Scalability 09262012

“The future is stranger and sooner than you think” Reid Hoffman, LinkedIn/Greylock

The face of big data

Page 6: Scalability 09262012

Perfect Storm

Big Data

Parallel Processing

HTML5/JS

9M Trained Developers

Mobile

Page 7: Scalability 09262012

Focus on your application, not data operations

Page 8: Scalability 09262012

If your data is stuck in the warehouse...

... you’re losing

Page 9: Scalability 09262012

Founded (2009) by leading MIT data scientists

Funded by Y Combinator & Avalon

Global network of 20+ data centers -- Application Data Network (ADN)

Built on leading NoSQL standard: most durable data store on planet

10,000 users and growing.

Data Layer for the Web

Cloudant: Akamai of dynamic content

Page 10: Scalability 09262012

Cloudant Product Line

• Application State: Hyper-Scalable Document Store (JSON+HTTP); MVCC; Secondary indexes for flexible query

• Application Data Security: Accounts/API keys, data sharing, permission roles

• Application Analytics: Fully Integrated (Incremental) MapReduce engine

• Application Search: Fully Integrated (Incremental) Lucene + Geospatial

• Application Object Storage: images, audio, video...

• Application State Distribution: cloud <==> tablet <==> PC <==> mobile

API Compatible
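
To make the "(Incremental) MapReduce" idea on this slide concrete, here is a minimal sketch of an incrementally maintained count: when one document changes, only that document's contribution is retracted and re-applied, rather than recomputing over all data. This is a toy illustration, not Cloudant's engine; the class name `IncrementalCount` and the sample documents are hypothetical.

```python
from collections import defaultdict


class IncrementalCount:
    """Keep per-key counts current as documents are added, changed, or removed."""

    def __init__(self, map_fn):
        self.map_fn = map_fn            # doc -> iterable of keys to emit
        self.emitted = {}               # doc_id -> keys emitted on last update
        self.counts = defaultdict(int)  # reduced value: count per key

    def update(self, doc_id, doc):
        # Retract whatever this document contributed previously.
        for key in self.emitted.pop(doc_id, []):
            self.counts[key] -= 1
            if self.counts[key] == 0:
                del self.counts[key]
        # Apply the new contribution; doc=None is treated as a deletion.
        if doc is not None:
            keys = list(self.map_fn(doc))
            self.emitted[doc_id] = keys
            for key in keys:
                self.counts[key] += 1


# Hypothetical documents, for illustration only.
index = IncrementalCount(lambda doc: [doc["type"]] if "type" in doc else [])
index.update("a", {"type": "image"})
index.update("b", {"type": "audio"})
index.update("a", {"type": "audio"})  # only document "a" is reprocessed
result = dict(index.counts)
```

The work per update is proportional to what the changed document emits, which is the property the word "incremental" points at.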

Page 11: Scalability 09262012

You do this:

We give you:

Cloudant Install

That’s It

Page 12: Scalability 09262012

API Examples

Write a doc...from the browser

No client install necessary
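
The slide's own snippet is not in the transcript. As a rough sketch of what a JSON-over-HTTP document write looks like, the code below builds (but does not send) a PUT request with the Python standard library. The slide shows the write coming from a browser; Python is used here only to make the request shape explicit. The account URL, database name, and document ID are hypothetical, and the `/<db>/<doc_id>` path assumes a CouchDB-style HTTP API.

```python
import json
import urllib.request
from urllib.parse import quote


def build_put_doc_request(base_url, db, doc_id, doc):
    """Build an HTTP PUT request that would store `doc` as a JSON document.

    Assumes a CouchDB-style path layout: <base_url>/<db>/<doc_id>.
    """
    url = f"{base_url.rstrip('/')}/{quote(db, safe='')}/{quote(doc_id, safe='')}"
    body = json.dumps(doc).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical account, database, and document, for illustration only.
req = build_put_doc_request(
    "https://example-account.cloudant.com",
    "lobby",
    "visitor-001",
    {"name": "Ada", "checked_in": True},
)
# urllib.request.urlopen(req) would send it; credentials are omitted here.
```

Because the write is plain HTTP with a JSON body, any HTTP-capable client can issue it, which is the point of "no client install necessary".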

Page 13: Scalability 09262012

API Examples

Create Secondary Indexes

Query those indexes
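
The slide's code is not in the transcript. The sketch below shows one plausible shape for both steps, assuming the CouchDB-style view API: a design document whose JavaScript map function defines a secondary index, and a URL that queries it. The view name `by_company`, the `company` field, the database name, and the account URL are all hypothetical.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical design document defining one secondary index ("view").
# The map function is JavaScript, stored as a string inside the JSON document.
DESIGN_DOC = {
    "_id": "_design/lookup",
    "views": {
        "by_company": {
            "map": "function (doc) { if (doc.company) { emit(doc.company, 1); } }",
            "reduce": "_count",
        }
    },
}


def view_query_url(base_url, db, ddoc, view, **params):
    """Build the GET URL for a view query; parameter values are JSON-encoded."""
    query = urlencode({k: json.dumps(v) for k, v in params.items()})
    path = (
        f"{base_url.rstrip('/')}/{quote(db)}"
        f"/_design/{quote(ddoc)}/_view/{quote(view)}"
    )
    return f"{path}?{query}" if query else path


# Look up all rows whose emitted key equals "Cloudant", without reducing.
url = view_query_url(
    "https://example-account.cloudant.com", "lobby", "lookup", "by_company",
    key="Cloudant", reduce=False,
)
```

Creating the index amounts to storing `DESIGN_DOC` like any other document; querying it is a GET on the resulting URL.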

Page 14: Scalability 09262012

http://examples.cloudant.com/lobby-search/_design/lookup/index.html

Page 15: Scalability 09262012

Global Data Network

Cloudant scales within & between data centers

Availability, low latency

Page 16: Scalability 09262012

Anatomy of the Data Layer

US-EAST “Node”

Single-tenant cluster

Horizontally Scalable DB

• Fault tolerant

• Always consistent

• Schemaless (NoSQL)

• Automatic sharding

• Distributed, parallel analytics

• Incremental, chainable MapReduce

• Full-text search

Multi-tenant cluster

PUT {document}

Edge Database Cluster

Disconnected Devices

AP-JP

Filtered Replication & Sync

Secondary Data Centers (for DR & distributed access)

EU-NL

Single-Tenant or Multi-Tenant

Page 17: Scalability 09262012

https://cloudant.com/blog/cloudant-labs-on-google-spanner/

Page 18: Scalability 09262012

Why It Matters

Page 20: Scalability 09262012

>2. Prepare For Success

Three #1 apps, from 6 to 90 servers in weeks

Page 21: Scalability 09262012

>3. Scale Invariance

Page 22: Scalability 09262012

>3. Scale Invariance

Cloud

mobile/tablet

desktop

Goal: Megabytes to Petabytes

Page 23: Scalability 09262012

>3. Scale Invariance

‘Carry Small, Live Large’

Single user experience at vastly different scales

Page 24: Scalability 09262012

>4. No Preferred Frame

So why do you have a global ‘write master’?

Page 25: Scalability 09262012

>4. No Preferred Frame

This simple document...

...establishes a continuous pipe from Europe to US
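
The document shown on the slide is not in the transcript. As a hedged sketch, assuming the CouchDB-style replication format (a JSON document with `source`, `target`, and `continuous` fields), it might look like the dictionary built below. The database URLs are hypothetical, and `reverse_replication` is an illustrative helper, not part of any API.

```python
import json


def replication_doc(source_url, target_url, continuous=True):
    """Return a CouchDB-style replication document as a plain dict."""
    return {
        "source": source_url,
        "target": target_url,
        "continuous": continuous,
    }


def reverse_replication(doc):
    """Swap source and target to describe replication in the other direction."""
    return {**doc, "source": doc["target"], "target": doc["source"]}


# Hypothetical database URLs in two regions, for illustration only.
eu_to_us = replication_doc(
    "https://eu.example.com/events",
    "https://us.example.com/events",
)
us_to_eu = reverse_replication(eu_to_us)

# JSON body that would be submitted to start the Europe-to-US replication.
payload = json.dumps(eu_to_us, indent=2)
```

Running both documents at once gives two-way sync between the regions, with neither side acting as a global write master.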

Page 26: Scalability 09262012

>4. No Preferred Frame

And you can do the reverse...

...at the same time

Page 27: Scalability 09262012

>4. No Preferred Frame

Write local, live global

What could you do with relaxed constraints?

Page 28: Scalability 09262012

>4. No Preferred Frame

One click (continuous) Import

[Figure: Data Import. Data Size [GB], Disk Size [GB], and Documents [M] plotted against Time [sec] from 0 to 14000; left axis Size [GB], right axis Doc Count [Million], each from 0 to 18.]

Actual Customer Data

France to Amsterdam

Page 29: Scalability 09262012

Big and Getting Bigger

Page 30: Scalability 09262012

Big and Getting Bigger

• And of course, we are hiring

Languages: erlang, scala, c, javascript, python, clojure, html5, iOS, Android, ruby/chef

Sample problems in the Seattle office

Create file format optimized for (huge) structured time-series data

Integrate Cubism into two-tier application stack

Profile creation of 100M databases (real customer)

PIG / HIVE integration

Prototype read-in-place Hadoop connector