1 An Overview of Cloud Computing @ Yahoo! Raghu Ramakrishnan Chief Scientist, Audience and Cloud Computing Research Fellow, Yahoo! Research Reflects many discussions with: Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata and joint work with the Sherpa team, in particular: Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca and Nick Puz in Y! Chuck Neerdaels, P.P. Suryanarayanan and many others in CCDI

Post on 22-Dec-2015


1

An Overview of Cloud Computing @ Yahoo!

Raghu Ramakrishnan
Chief Scientist, Audience and Cloud Computing
Research Fellow, Yahoo! Research

Reflects many discussions with: Eric Baldeschwieler, Jay Kistler, Chuck Neerdaels, Shelton Shugar, and Raymie Stata

and joint work with the Sherpa team, in particular:
Brian Cooper, Utkarsh Srivastava, Adam Silberstein, Rodrigo Fonseca and Nick Puz in Y! Research
Chuck Neerdaels, P.P. Suryanarayanan and many others in CCDI

2

Questions

• What is cloud computing?
– Horizontal and functional services

• What’s it going to change?
– Software business models, science, life

• How many clouds will there be?
– 1, 2, 3, infinity

• What’s new in cloud computing?
– HPC grids, ASPs, hosted services, Multics (!)
– Emerging “cloud stack” to support a broad class of programs, including data intensive applications

3

SCENARIOS
Pie-in-the-sky

4

Living in the Clouds

• We want to start a new website, FredsList.com
• Our site will provide listings of items for sale, jobs, etc.
• As time goes on, we’ll add more features
– And illustrate how more cloud capabilities (and corresponding infrastructure components) are used as needed
• List of capabilities/components is illustrative, not exhaustive
• Our cloud provides a “dataset” abstraction
– FredsList doesn’t worry about the underlying components

5

Step 1: Listings Scenario

FredsList wants to store listings as (key, category, description):

1234323, transportation, For sale: one bicycle, barely used
5523442, childcare, Nanny available in San Jose
215534, wanted, Looking for issue 1 of Superman comic book

DECLARE DATASET Listings AS
( ID String PRIMARY KEY,
  Category String,
  Description Text )

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS)]

6

Step 2: System Evolution

Fred belatedly realizes prices are useful information!

ALTER DATASET Listings
ADD (Price Float)

Schemas are flexible, and evolve. Not every record in a dataset has values defined for all fields declared for the dataset:

1234323, transportation, For sale: one bicycle, barely used
5523442, childcare, Nanny available in San Jose
215534, wanted, Looking for issue 1 of Superman comic book

vs.

32138, camera, Nikon D40, USD 300

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS)]
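The schema evolution on this slide can be sketched in Python. This is an illustrative in-memory model only, not the PNUTS interface: the `Dataset` class and its `add_field`, `insert`, and `get` methods are hypothetical names chosen for the example. It shows a field being declared after records already exist, with older records simply having no value for it.

```python
class Dataset:
    """Toy dataset with a declared but sparsely populated schema (hypothetical)."""

    def __init__(self, name, fields, primary_key):
        self.name = name
        self.fields = dict(fields)      # field name -> type name
        self.primary_key = primary_key
        self.records = {}               # primary key value -> record dict

    def add_field(self, field, type_name):
        # Analogous to ALTER DATASET ... ADD: existing records are not rewritten.
        self.fields[field] = type_name

    def insert(self, **values):
        unknown = set(values) - set(self.fields)
        if unknown:
            raise KeyError(f"undeclared fields: {sorted(unknown)}")
        self.records[values[self.primary_key]] = values

    def get(self, key, field):
        # A declared field with no stored value reads as None.
        return self.records[key].get(field)


listings = Dataset(
    "Listings",
    {"ID": "String", "Category": "String", "Description": "Text"},
    primary_key="ID",
)
listings.insert(ID="1234323", Category="transportation",
                Description="For sale: one bicycle, barely used")
listings.add_field("Price", "Float")
listings.insert(ID="32138", Category="camera",
                Description="Nikon D40", Price=300.0)
```

The bicycle listing keeps working after `Price` is added; it just has no price, which is the "not every record has values for all fields" point.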

7

Step 3: Search

FredsList’s customers quickly ask for keyword search (“bicycle”, “dvd’s”, “nanny”)

ALTER Listings
SET Description SEARCHABLE

Federation of systems offering different capabilities

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS), Search (Vespa), Messaging (Tribble)]

8

Step 4: Photos

FredsList decides to add photos/videos to listings

ALTER Listings
ADD Photo BLOB

Foreign key: photo → listing

Federation of systems offering different performance points

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS), Search (Vespa), Storage (MObStor), Messaging (Tribble)]

9

Step 5: Data Analysis

FredsList wants to analyze its listings to get statistics about category, do geocoding, etc.

ALTER Listings
MAKE ANALYZABLE

Batch export to the compute grid supports:
• Pig query to analyze categories
• Hadoop program to geocode data
• Hadoop program to generate fancy pages for listings

Foreign key: photo → listing

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS), Search (Vespa), Storage (MObStor), Compute (Grid), Messaging (Tribble)]

10

Step 6: Performance

FredsList wants to reduce its data access latency

ALTER Listings
MAKE CACHEABLE

And by now, Fred is global, and wants geo-replication!

Foreign key: photo → listing

[Diagram: FredsList.com application → Simple Web Service APIs → Database (PNUTS), Search (Vespa), Storage (MObStor), Compute (Grid, fed by batch export), Caching (memcached), Messaging (Tribble)]

11

Data Serving vs. Analysis

• Very different workloads, requirements
• Data from serving system is one of many kinds of data (click streams are another common kind, as are syndicated feeds) to be analyzed and integrated
• The result of analysis often goes right back into serving system

12

EYES TO THE SKIES
Motherhood-and-Apple-Pie

13

Why Clouds?

• On-demand infrastructure to create a fundamental shift in the OE curve:
– Do things we can’t do
– Build more robustly, more efficiently, more globally, more completely, more quickly, for a given budget

• Cloud services should do the heavy lifting of scaling & high-availability
– Today, this is done at the app-level, which is not productive

14

Requirements for Cloud Services

• Multitenant. A cloud service must support multiple, organizationally distant customers.

• Elasticity. Tenants should be able to negotiate and receive resources/QoS on-demand.

• Resource Sharing. Ideally, spare cloud resources should be transparently applied when a tenant’s negotiated QoS is insufficient, e.g., due to spikes.

• Horizontal scaling. It should be possible to add cloud capacity in small increments; this should be transparent to the tenants of the service.

• Metering. A cloud service must support accounting that reasonably ascribes operational and capital expenditures to each of the tenants of the service.

• Security. A cloud service should be secure in that tenants are not made vulnerable because of loopholes in the cloud.

• Availability. A cloud service should be highly available.

• Operability. A cloud service should be easy to operate, with few operators. Operating costs should scale linearly or better with the capacity of the service.

15

Types of Cloud Services

• Two kinds of cloud services:
– Horizontal (“Platform”) Cloud Services
• Functionality enabling tenants to build applications or new services on top of the cloud
– Functional Cloud Services
• Functionality that is useful in and of itself to tenants. E.g., various SaaS instances, such as Salesforce.com; Google Analytics and Yahoo!’s IndexTools; Yahoo! properties aimed at end-users and small businesses, e.g., flickr, Groups, Mail, News, Shopping
• Could be built on top of horizontal cloud services or from scratch
• Yahoo! has been offering these for a long while (e.g., Mail for SMB, Groups, Flickr, BOSS, Ad exchanges)

16

Opening Up Yahoo! Search

Phase 1
Giving site owners and developers control over the appearance of Yahoo! Search results.

Phase 2
BOSS takes Yahoo!’s open strategy to the next level by providing Yahoo! Search infrastructure and technology to developers and companies to help them build their own search experiences.

18

BOSS Offerings

BOSS offers two options for companies and developers and has partnered with top technology universities to drive search experimentation, innovation and research into next generation search.

API
A self-service, web services model for developers and start-ups to quickly build and deploy new search experiences.

CUSTOM
Working with 3rd parties to build a more relevant, brand/site specific web search experience. This option is jointly built by Yahoo! and select partners.

ACADEMIC
Working with the following universities to allow for wide-scale research in the search field:
• University of Illinois Urbana Champaign
• Carnegie Mellon University
• Stanford University
• Purdue University
• MIT
• Indian Institute of Technology Bombay
• University of Massachusetts

(Slide courtesy Prabhakar Raghavan)

19

Partner Examples

20

Horizontal Cloud Services

• Horizontal cloud services are foundations on which tenants build applications or new services. They should be:
– Semantics-free. Must be “generic infrastructure,” and not tied to specific app-logic.
• May provide the ability to inject application logic through well-defined APIs
– Broadly applicable. Must be broadly applicable (i.e., it can't be intended for just one or two properties).
– Fault-tolerant over commodity hardware. Must be built using inexpensive commodity hardware, and should mask component failures.

• While each cloud service provides value, the power of the cloud paradigm will depend on a collection of well-chosen, loosely coupled services that collectively make it easy to quickly develop and operate innovative web applications.

22

Yahoo! Cloud Stack

[Diagram: layered stack of horizontal cloud services]
EDGE: YCS, YCPI, Brooklyn, …
WEB: VM/OS, yApache, …
APP: VM/OS, …
STORAGE: Sherpa, MObStor, …
BATCH: Hadoop, …

Cross-cutting: Provisioning (Self-serve); Monitoring/Metering/Security; Data Highway
Other labels in the diagram: Serving Grid, PHP, App Engine

23

Yahoo! CCDI Thrust Areas

• Fast Provisioning and Machine Virtualization: On demand, deliver a set of hosts imaged with desired software and configured against standard services
– Multiple hosts may be multiplexed onto the same physical machine.

• Batch Storage and Processing: Scalable data storage optimized for batch processing, together with computational capabilities

• Operational Storage: Persistent storage that supports low-latency updates and flexible retrieval

• Edge Content Services: Support for dealing with network topology, communication protocols, caching, and BCP

Rest of today’s talk

24

Web Data Management

Large data analysis (Hadoop)
• Scan oriented workloads
• Focus on sequential disk I/O
• $ per cpu cycle

Structured record storage (PNUTS/Sherpa)
• CRUD
• Point lookups and short scans
• Index organized table and random I/Os
• $ per latency

Blob storage (SAN/NAS)
• Object retrieval and streaming
• Scalable file storage
• $ per GB

25

Hadoop: Batch Storage/Analysis

Why is batch processing important?

• Whether it’s
– response-prediction for advertising
– machine-learned relevance for Search, or
– content optimization for audience,
data-intensive computing is increasingly central to everything Yahoo! does
– Hadoop is central to addressing this need

• Hadoop is a case-study in our cloud vision
– Processes enormous amounts of data
– Provides horizontal scaling and fault-tolerance for our users
– Allows those users to focus on their app logic

[Diagram: stack of layers: [Workflow], High-level query layer (Pig), Map-Reduce, HDFS]

26

The World Has Changed

• Web serving applications need:
– Scalability!
• Preferably elastic
– Flexible schemas
– Geographic distribution
– High availability
– Reliable storage

• Web serving applications can do without:
– Complicated queries
– Strong transactions

27

MObStor

• Yahoo!’s next-generation globally replicated, virtualized media object storage service

• Better provisioning, easy migration, replication, better BCP, and performance

• New features (Evergreen URLs, CDN integration, REST API, …)

• The object metadata problem addressed using Sherpa, though MObStor is focused on blob storage.

28

Storage & Delivery Stack

29

PNUTS / SHERPA

To Help You Scale Your Mountains of Data

30

CCDI—Research Collaboration

Yahoo! Research

• Raghu Ramakrishnan
• Brian Cooper
• Utkarsh Srivastava
• Adam Silberstein
• Rodrigo Fonseca

CCDI

• Chuck Neerdaels
• P.P.S. Narayan
• Kevin Athey
• Toby Negrin
• Plus Dev/QA teams

31

Yahoo! Serving Storage Problem

– Small records – 100KB or less

– Structured records – lots of fields, evolving

– Extreme data scale - Tens of TB

– Extreme request scale - Tens of thousands of requests/sec

– Low latency globally - 20+ datacenters worldwide

– High Availability - outages cost $millions

– Variable usage patterns - as applications and users change

33

What is PNUTS/Sherpa?

• Parallel database
• Geographic replication
• Structured, flexible schema
• Hosted, managed infrastructure

CREATE TABLE Parts (
  ID VARCHAR,
  StockNumber INT,
  Status VARCHAR
  …
)

Example table (shown replicated in the diagram):

A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E

35

What Will It Become?

[Diagram: the example table (records A through F) shown as three replicas]

Indexes and views

36

Design Goals

Scalability
• Thousands of machines
• Easy to add capacity
• Restrict query language to avoid costly queries

Geographic replication
• Asynchronous replication around the globe
• Low-latency local access

High availability and fault tolerance
• Automatically recover from failures
• Serve reads and writes despite failures

Consistency
• Per-record guarantees
• Timeline model
• Option to relax if needed

Multiple access paths
• Hash table, ordered table
• Primary, secondary access

Hosted service
• Applications plug and play
• Share operational cost

37

Technology Elements

[Diagram: layered components, each with its responsibilities]

Applications
PNUTS API / Tabular API

PNUTS
• Query planning and execution
• Index maintenance

Distributed infrastructure for tabular data
• Data partitioning
• Update consistency
• Replication

YDOT FS
• Ordered tables

YDHT FS
• Hash tables

Tribble
• Pub/sub messaging

Zookeeper
• Consistency service

YCA: Authorization

38

Data Manipulation

• Per-record operations
– Get
– Set
– Delete

• Multi-record operations
– Multiget
– Scan
– Getrange

• Web service (RESTful) API
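The operations listed above can be illustrated with a small in-memory stand-in. This is a sketch under stated assumptions, not the PNUTS web service API: the `RecordStore` class is hypothetical, the method names simply mirror the slide, and the half-open `[low, high)` semantics chosen for `getrange` are an assumption made for the example.

```python
from bisect import bisect_left


class RecordStore:
    """In-memory stand-in for a table exposing the operations listed above."""

    def __init__(self):
        self._rows = {}

    # Per-record operations
    def get(self, key):
        return self._rows.get(key)

    def set(self, key, record):
        self._rows[key] = record

    def delete(self, key):
        self._rows.pop(key, None)

    # Multi-record operations
    def multiget(self, keys):
        # Keys that do not exist are simply omitted from the result.
        return {k: self._rows[k] for k in keys if k in self._rows}

    def scan(self):
        # Full scan in key order.
        return [(k, self._rows[k]) for k in sorted(self._rows)]

    def getrange(self, low, high):
        # Keys in the half-open interval [low, high), in key order (assumed semantics).
        keys = sorted(self._rows)
        selected = keys[bisect_left(keys, low):bisect_left(keys, high)]
        return [(k, self._rows[k]) for k in selected]


store = RecordStore()
for name, price in [("Apple", 1), ("Banana", 2), ("Grape", 12), ("Kiwi", 8)]:
    store.set(name, {"Price": price})
store.delete("Kiwi")
```

Sorting on every range request keeps the sketch short; a real ordered table would maintain its keys in sorted order.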

39

Tablets—Hash Table

Name         Description                             Price
Grape        Grapes are good to eat                  $12
Lime         Limes are green                         $9
Apple        Apple is wisdom                         $1
Strawberry   Strawberry shortcake                    $900
Orange       Arrgh! Don’t get scurvy!                $2
Avocado      But at what price?                      $3
Lemon        How much did you pay for this lemon?    $1
Tomato       Is this a vegetable?                    $14
Banana       The perfect fruit                       $2
Kiwi         New Zealand                             $8

Rows are stored in hash order. Tablet boundaries in the hash space: 0x0000, 0x2AF3, 0x911F, 0xFFFF
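A minimal sketch of how a key could be mapped to a tablet in a hash-organized table, using the boundary values shown on the slide. The hash function is an assumption: the slide does not say which one is used, so MD5 truncated to 16 bits stands in here, and `key_hash`, `tablet_for_hash`, and `tablet_for_key` are hypothetical helper names. Treating each boundary as the inclusive lower end of a tablet is also an assumption.

```python
import hashlib
from bisect import bisect_right

# Interior tablet boundaries from the slide; the hash space is 0x0000..0xFFFF.
BOUNDARIES = [0x2AF3, 0x911F]


def key_hash(key: str) -> int:
    """Map a key to a 16-bit value (MD5 is an arbitrary choice for this sketch)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:2], "big")


def tablet_for_hash(h: int) -> int:
    """Return the index of the tablet whose hash interval contains h."""
    return bisect_right(BOUNDARIES, h)


def tablet_for_key(key: str) -> int:
    return tablet_for_hash(key_hash(key))
```

Because neighbouring keys hash to unrelated values, this layout spreads load evenly but gives no useful ordering for range scans, which is what the ordered table on the next slide addresses.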

40

Tablets—Ordered Table

Name         Description                             Price
Apple        Apple is wisdom                         $1
Avocado      But at what price?                      $3
Banana       The perfect fruit                       $2
Grape        Grapes are good to eat                  $12
Kiwi         New Zealand                             $8
Lemon        How much did you pay for this lemon?    $1
Lime         Limes are green                         $9
Orange       Arrgh! Don’t get scurvy!                $2
Strawberry   Strawberry shortcake                    $900
Tomato       Is this a vegetable?                    $14

Rows are stored in key order. Tablet boundaries in the key space: A, H, Q, Z

41

Flexible Schema

Posted date   Listing id   Item    Price
6/1/07        424252       Couch   $570
6/1/07        763245       Bike    $86
6/3/07        211242       Car     $1123
6/5/07        421133       Lamp    $15

Additional fields present on only some records: Color (Red); Condition (Good, Fair)

42

Detailed Architecture

[Diagram: Clients, REST API, Routers, Tablet Controller, and Storage units in the local region; Tribble links the local region to remote regions]

43

Tablet Splitting and Balancing

• Each storage unit has many tablets (horizontal partitions of the table)
• Tablets may grow over time
• Overfull tablets split
• Storage unit may become a hotspot
• Shed load by moving tablets to other servers

[Diagram legend: Storage unit, Tablet]
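The two mechanisms on this slide, splitting overfull tablets and moving tablets off a hot storage unit, can be sketched as follows. The function names, the size threshold, and the policy of moving the lightest tablet from the busiest unit to the least busy one are all assumptions made for illustration; the slide does not describe the actual policy.

```python
def split_overfull(tablets, max_records):
    """Split any tablet (a sorted list of keys) holding more than max_records in half."""
    result = []
    for tablet in tablets:
        if len(tablet) > max_records:
            mid = len(tablet) // 2
            result.extend([tablet[:mid], tablet[mid:]])
        else:
            result.append(tablet)
    return result


def rebalance_once(units):
    """Move one tablet from the most loaded storage unit to the least loaded one.

    `units` maps a storage-unit name to {tablet_id: load}. Returns
    (tablet_id, source, destination), or None if nothing can be moved.
    """
    def total(name):
        return sum(units[name].values())

    hot = max(units, key=total)
    cold = min(units, key=total)
    if hot == cold or not units[hot]:
        return None
    # Pick the lightest tablet on the hot unit so the move is cheap (assumed policy).
    tablet = min(units[hot], key=units[hot].get)
    units[cold][tablet] = units[hot].pop(tablet)
    return tablet, hot, cold
```

For example, with `{"su1": {"t1": 50, "t2": 30}, "su2": {"t3": 10}}`, one call moves `t2` from `su1` to `su2`.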

44

QUERY PROCESSING

45

Accessing Data

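[Diagram: client, router, and storage units (SU)]

1. Get key k (client to router)
2. Get key k (router to storage unit)
3. Record for key k (storage unit to router)
4. Record for key k (router to client)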

46

Bulk Read

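[Diagram: client, scatter/gather server, and storage units (SU)]

1. The key set {k1, k2, … kn} is sent to the scatter/gather server
2. The scatter/gather server issues Get k1, Get k2, Get k3, … to the storage units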
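The scatter/gather pattern on this slide can be sketched in a few lines. This is an illustration, not the actual server: `scatter_gather` is a hypothetical function, and `locate` and `fetch` are caller-supplied hooks standing in for the router lookup and the storage-unit call. The sketch batches keys per storage unit and fetches sequentially; the slide shows individual Get requests, and a real server would likely issue them in parallel.

```python
from collections import defaultdict


def scatter_gather(keys, locate, fetch):
    """Group keys by storage unit, fetch each group, and merge the results.

    `locate(key)` returns a storage-unit id; `fetch(unit, keys)` returns a
    dict of key -> record. Both are hypothetical hooks supplied by the caller.
    """
    by_unit = defaultdict(list)
    for key in keys:
        by_unit[locate(key)].append(key)       # scatter
    merged = {}
    for unit, unit_keys in by_unit.items():
        merged.update(fetch(unit, unit_keys))  # gather
    return merged


# Toy data: two storage units holding three records.
DATA = {"su1": {"k1": "r1", "k3": "r3"}, "su2": {"k2": "r2"}}


def locate(key):
    return "su1" if key in DATA["su1"] else "su2"


def fetch(unit, keys):
    return {k: DATA[unit][k] for k in keys if k in DATA[unit]}


result = scatter_gather(["k1", "k2", "k3"], locate, fetch)
```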

47

Range Queries in YDOT

• Clustered, ordered retrieval of records

Tablets (sorted by key):
Apple, Avocado, Banana, Blueberry
Canteloupe, Grape, Kiwi, Lemon
Lime, Mango, Orange
Strawberry, Tomato, Watermelon

Router's interval map (tablet start key → storage unit):
(start of key space) → Storage unit 1
Canteloupe → Storage unit 3
Lime → Storage unit 2
Strawberry → Storage unit 1

Example: the range query Grapefruit…Pear? is split by the router into Grapefruit…Lime? and Lime…Pear?

[Diagram: Router in front of Storage unit 1, Storage unit 2, Storage unit 3]
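The router's handling of the Grapefruit…Pear query can be sketched with a sorted list of boundary keys and a binary search. The boundaries and storage-unit assignments are taken from the slide; the function name `split_range` and the half-open `[low, high)` interval convention are assumptions for the example.

```python
from bisect import bisect_right

# Router's interval map, as on the slide: each tablet starts at a boundary key.
# Keys below "Canteloupe" live on storage unit 1, and so on.
BOUNDARIES = ["Canteloupe", "Lime", "Strawberry"]
STORAGE_UNITS = [1, 3, 2, 1]


def split_range(low, high):
    """Split [low, high) into per-tablet sub-ranges, in key order.

    Returns a list of (start, end, storage_unit) tuples.
    """
    pieces = []
    start = low
    i = bisect_right(BOUNDARIES, low)   # index of the tablet containing `low`
    while start < high:
        # The piece ends at the next tablet boundary or at `high`, whichever is first.
        if i < len(BOUNDARIES) and BOUNDARIES[i] < high:
            end = BOUNDARIES[i]
        else:
            end = high
        pieces.append((start, end, STORAGE_UNITS[i]))
        start = end
        i += 1
    return pieces
```

Calling `split_range("Grapefruit", "Pear")` yields one piece for storage unit 3 (Grapefruit to Lime) and one for storage unit 2 (Lime to Pear), matching the split shown on the slide.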

48

Updates

[Diagram: client, routers, message brokers, and storage units (SU); numbered arrows as labeled on the slide]

1. Write key k
2. Write key k
3. Write key k
4.
5. SUCCESS
6. Write key k
7. Sequence # for key k
8. Sequence # for key k

49

ASYNCHRONOUS REPLICATION AND CONSISTENCY

50

Asynchronous Replication

51

Consistency Model

• Goal: Make it easier for applications to reason about updates and cope with asynchrony

• What happens to a record with primary key “Alice”?

[Timeline diagram along a Time axis: Record inserted, then a series of Updates producing v. 1 through v. 8 (Generation 1), then Delete]

As the record is updated, copies may get out of sync.

52

Example: Social Alice

[Diagram: copies of Alice's (User, Status) record in two regions, West and East. The copies show "Alice: Busy" and "Alice: Free", and other views of the record show "Alice: ???".]

Record Timeline: Busy, Free, Free

53

Consistency Model

In general, reads are served using a local copy

[Timeline diagram: v. 1 through v. 8 (Generation 1) along a Time axis, with stale versions and the current version marked; operation shown: Read]

54

Consistency Model

But application can request and get current version

[Timeline diagram: v. 1 through v. 8 (Generation 1) along a Time axis, with stale versions and the current version marked; operation shown: Read up-to-date]

55

Consistency Model

Or variations such as “read forward”: while copies may lag the master record, every copy goes through the same sequence of changes

[Timeline diagram: v. 1 through v. 8 (Generation 1) along a Time axis, with stale versions and the current version marked; operation shown: Read ≥ v.6]

56

Consistency Model

Achieved via per-record primary copy protocol
(To maximize availability, record masterships are automatically transferred if a site fails)

Can be selectively weakened to eventual consistency (local writes that are reconciled using version vectors)

[Timeline diagram: v. 1 through v. 8 (Generation 1) along a Time axis, with stale versions and the current version marked; operation shown: Write]

57

Consistency Model

Test-and-set writes facilitate per-record transactions

[Timeline diagram: v. 1 through v. 8 (Generation 1) along a Time axis, with stale versions and the current version marked; operation shown: Write if = v.7, which returns ERROR]
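The read and write variants shown across the Consistency Model slides can be tied together in one small sketch. It models a single record with one master timeline and one lagging replica. The class and method names (`TimelineRecord`, `read_any`, `read_latest`, `read_at_least`, `write_if_version`) are hypothetical labels for the slide's Read, Read up-to-date, Read ≥ v.6, and Write if = v.7 operations, not the actual API.

```python
class VersionError(Exception):
    pass


class TimelineRecord:
    """One record: a master timeline plus a replica that may lag behind it."""

    def __init__(self, value):
        self.versions = [value]     # versions[i] holds version i + 1
        self.replica_version = 1    # newest version the local replica has applied

    @property
    def current_version(self):
        return len(self.versions)

    def write(self, value):
        # All writes are ordered by the master copy.
        self.versions.append(value)
        return self.current_version

    def write_if_version(self, expected, value):
        # Test-and-set write: succeeds only if the record is still at `expected`.
        if self.current_version != expected:
            raise VersionError(f"current version is {self.current_version}")
        return self.write(value)

    def replicate(self, upto=None):
        # The replica only moves forward along the same timeline as the master.
        target = self.current_version if upto is None else upto
        self.replica_version = max(self.replica_version,
                                   min(target, self.current_version))

    def read_any(self):
        # Served from the local copy; may be stale.
        return self.replica_version, self.versions[self.replica_version - 1]

    def read_latest(self):
        return self.current_version, self.versions[-1]

    def read_at_least(self, min_version):
        if min_version > self.current_version:
            raise VersionError(f"version {min_version} does not exist yet")
        # Use the local copy if it is new enough, else fall back to the master.
        if self.replica_version >= min_version:
            return self.read_any()
        return self.read_latest()


rec = TimelineRecord("v1")
for i in range(2, 9):
    rec.write(f"v{i}")      # master is now at v. 8
rec.replicate(upto=5)       # the local replica has only seen up to v. 5
```

With the master at v. 8 and the replica at v. 5, `rec.read_any()` returns the stale v. 5, `rec.read_at_least(6)` falls back to the master, and `rec.write_if_version(7, ...)` raises an error, as on the slide.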

58

Consistency Techniques

• Per-record mastering
– Each record is assigned a “master region”
• May differ between records
– Updates to the record forwarded to the master region
– Ensures consistent ordering of updates

• Tablet-level mastering
– Each tablet is assigned a “master region”
– Inserts and deletes of records forwarded to the master region
– Master region decides tablet splits

• These details are hidden from the application
– Except for the latency impact!
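Per-record mastering can be sketched as a router that forwards each write to the record's master region, which assigns a sequence number before the update is propagated asynchronously. Everything named here (`Region`, `MasteredTable`, `write`, `propagate`) is hypothetical, and making the first writer's region the master is a simplification chosen for the example rather than something the slides state.

```python
class Region:
    def __init__(self, name):
        self.name = name
        self.store = {}     # key -> (sequence number, value)


class MasteredTable:
    """Routes each write to the record's master region, then fans it out."""

    def __init__(self, region_names):
        self.regions = {n: Region(n) for n in region_names}
        self.master_of = {}     # key -> name of the record's master region
        self.seq = {}           # key -> last sequence number issued by the master
        self.pending = []       # updates not yet applied at non-master regions

    def write(self, origin, key, value):
        # Assumption: the region of the first write becomes the record's master.
        master = self.master_of.setdefault(key, origin)
        # Forwarding to a remote master is where the extra latency comes from.
        forwarded = master != origin
        self.seq[key] = self.seq.get(key, 0) + 1
        update = (key, self.seq[key], value)
        self.regions[master].store[key] = update[1:]
        self.pending.append((master, update))
        return forwarded

    def propagate(self):
        # Asynchronous replication: apply queued updates in master order.
        for master, (key, seq, value) in self.pending:
            for name, region in self.regions.items():
                if name != master:
                    region.store[key] = (seq, value)
        self.pending.clear()


table = MasteredTable(["West", "East"])
first = table.write("West", "Alice", "Busy")    # West becomes Alice's master
second = table.write("East", "Alice", "Free")   # forwarded from East to West
before = table.regions["East"].store.get("Alice")
table.propagate()
```

Because a single region orders all updates to a given record, every replica ends up applying the same sequence, which is the "consistent ordering of updates" bullet above.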

59

Mastering

[Diagram: the example table shown as three replicas, one of which is labeled Tablet master]

A 42342 E
B 42521 W
C 66354 W
D 12352 E
E 75656 C
F 15677 E

60

Bulk Insert/Update/Replace

[Diagram: Client with Source Data → Bulk manager → storage units]

1. Client feeds records to bulk manager
2. Bulk loader transfers records to SU’s in batches
• Bypass routers and message brokers
• Efficient import into storage unit

61

Bulk Load in YDOT

• YDOT bulk inserts can cause performance hotspots

• Solution: preallocate tablets

62

Index Maintenance

• How to have lots of interesting indexes and views, without killing performance?

• Solution: Asynchrony!
– Indexes/views updated asynchronously when base table updated
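Asynchronous index maintenance can be sketched as a write path that only updates the base table and logs the change, with a separate step that applies the log to the index later. `IndexedTable` and its methods are hypothetical names, and the in-process queue stands in for whatever messaging layer carries the updates in the real system.

```python
from collections import deque


class IndexedTable:
    """Base table with a secondary index that is brought up to date lazily."""

    def __init__(self, index_field):
        self.index_field = index_field
        self.rows = {}          # primary key -> record
        self.index = {}         # field value -> set of primary keys
        self.log = deque()      # pending (key, old record, new record) entries

    def set(self, key, record):
        # The write path only touches the base table and appends to the log.
        self.log.append((key, self.rows.get(key), record))
        self.rows[key] = record

    def apply_index_updates(self):
        # Run later (for example by a background worker) to catch the index up.
        while self.log:
            key, old, new = self.log.popleft()
            if old is not None:
                self.index.get(old[self.index_field], set()).discard(key)
            self.index.setdefault(new[self.index_field], set()).add(key)

    def lookup(self, value):
        return sorted(self.index.get(value, set()))


t = IndexedTable("Category")
t.set("1234323", {"Category": "transportation"})
t.set("32138", {"Category": "camera"})
stale = t.lookup("camera")      # index has not caught up yet
t.apply_index_updates()
fresh = t.lookup("camera")
```

The trade-off is visible in `stale` versus `fresh`: writes stay cheap, but an index lookup may briefly miss a record that the base table already contains.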

63

SHERPA IN CONTEXT

64

Types of Record Stores

• Query expressiveness

Spectrum from Simple to Feature rich:
S3: Object retrieval
PNUTS: Retrieval from single table of objects/records
Oracle: SQL

65

Types of Record Stores

• Consistency model

Spectrum from Best effort to Strong guarantees:
S3: Eventual consistency
PNUTS: Timeline consistency
Oracle: ACID

Callouts: Object-centric consistency; Program centric consistency

66

Types of Record Stores

• Data model

Spectrum from Flexibility, Schema evolution to Optimized for Fixed schemas:
CouchDB, PNUTS, Oracle

Callouts: Object-centric consistency; Consistency spans objects

67

Types of Record Stores

• Elasticity (ability to add resources on demand)

Spectrum from Inelastic to Elastic, with intermediate points:
Limited (via data distribution)
VLSD (Very Large Scale Distribution/Replication)

Systems shown: Oracle, PNUTS, S3

68

Data Stores Comparison

• User-partitioned SQL stores
– Microsoft Azure SDS
– Amazon SimpleDB
Versus PNUTS:
• More expressive queries
• Users must control partitioning
• Limited elasticity

• Multi-tenant application databases
– Salesforce.com
– Oracle on Demand
Versus PNUTS:
• Highly optimized for complex workloads
• Limited flexibility to evolving applications
• Inherit limitations of underlying data management system

• Mutable object stores
– Amazon S3
Versus PNUTS:
• Object storage versus record management

69

Application Design Space

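[Diagram: two-by-two design space. Horizontal axis: Records versus Files. Vertical axis: Get a few things versus Scan everything.]

Systems placed in the space: Sherpa, MObStor, Everest, Hadoop, YMDB, MySQL, Filer, Oracle, BigTable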

70

Alternatives Matrix

[Matrix; cell values are not present in the transcript]

Columns: Elastic, Operability, Global low latency, Availability, Structured access, Updates, Consistency model, SQL/ACID

Rows: Sherpa, Y! UDB, MySQL, Oracle, HDFS, BigTable, Dynamo, Cassandra

71

Further Reading

Efficient Bulk Insertion into a Distributed Ordered Table (SIGMOD 2008)
Adam Silberstein, Brian Cooper, Utkarsh Srivastava, Erik Vee, Ramana Yerneni, Raghu Ramakrishnan

PNUTS: Yahoo!'s Hosted Data Serving Platform (VLDB 2008)
Brian Cooper, Raghu Ramakrishnan, Utkarsh Srivastava, Adam Silberstein, Phil Bohannon, Hans-Arno Jacobsen, Nick Puz, Daniel Weaver, Ramana Yerneni

Asynchronous View Maintenance for VLSD Databases (SIGMOD 2009, to appear)
Parag Agrawal, Adam Silberstein, Brian F. Cooper, Utkarsh Srivastava and Raghu Ramakrishnan

Cloud Storage Design in a PNUTShell
Brian F. Cooper, Raghu Ramakrishnan, and Utkarsh Srivastava
Beautiful Data, O’Reilly Media, 2009 (to appear)

72

QUESTIONS?
