SEEDING CLOUD-BASED SERVICES: DISTRIBUTED RATE LIMITING (DRL) Kevin Webb, Barath Raghavan, Kashi Vishwanath, Sriram Ramabhadran, Kenneth Yocum, and Alex C. Snoeren

Post on 16-Dec-2015


Seeding the Cloud

Technologies to deliver on the promise of cloud computing

Previously: Process data in the cloud (Mortar)
  Data produced/stored across providers
  Find Ken Yocum or Dennis Logothetis for more info

Today: Control resource usage: “cloud control” with DRL
  Use resources at multiple sites (e.g., CDN)
  Multiple sites complicate resource accounting and control
  DRL provides cost control

DRL Overview

Example: Cost control in a Content Distribution Network
Abstraction: Enforce a global rate limit across multiple sites

Simple example: 10 flows, each limited as if there were a single, central limiter

[Figure: three Src / Limiter / Dst paths coordinated by DRL. Labels: 10 flows, 2 flows, 8 flows; 100 KB/s, 20 KB/s, 80 KB/s.]
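The figure's numbers are consistent with splitting one global limit in proportion to flow counts. A minimal sketch of that arithmetic follows; the function name, the limiter names, and the even split when there are no flows are assumptions for illustration, not taken from the DRL code.

```python
def proportional_shares(global_limit, flow_counts):
    """Split global_limit across limiters in proportion to their flow counts."""
    total = sum(flow_counts.values())
    if total == 0:
        # No demand anywhere: fall back to an even split (an assumption).
        even = global_limit / len(flow_counts)
        return {name: even for name in flow_counts}
    return {name: global_limit * n / total for name, n in flow_counts.items()}


# Hypothetical limiter names; the counts and the 100 KB/s limit come from the slide.
shares = proportional_shares(100.0, {"limiter_a": 2, "limiter_b": 8})
print(shares)  # {'limiter_a': 20.0, 'limiter_b': 80.0}
```

With 2 and 8 flows under a 100 KB/s limit, the shares come out as 20 KB/s and 80 KB/s, so each of the 10 flows sees the same 10 KB/s it would get behind a single central limiter.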

Goals & Challenges

Up to now:
  Develop architecture and protocols for distributed rate limiting (SIGCOMM 07)
  Particular approach (FPS) is practical in the wide area

Current goals:
  Move DRL out of the lab and impact real services
  Validate SIGCOMM results in real-world conditions
  Provide an Internet testbed with the ability to manage bandwidth in a distributed fashion
  Improve usability of PlanetLab

Challenges:
  Run-time overheads: CPU, memory, communication
  Environment: link/node failures, software quirks

PlanetLab

World-wide test bed for networking and systems research
Resources donated by universities, labs, etc.
Experiments divided into VMs called “slices” (Vservers)

[Figure: PlanetLab architecture. A controller (PLC API, web server, PostgreSQL, Linux 2.6) connects over the Internet to nodes; each node runs Linux 2.6 with Vservers hosting Slice 1, Slice 2, ..., Slice N.]

PlanetLab Use Cases

PlanetLab needs DRL!
  Donated bandwidth
  Ease of administration

Machine room: Limit local-area nodes to a single rate
Per slice: Limit experiments in the wide area
Per organization: Limit all slices belonging to an organization

PlanetLab Use Cases

Machine room: Limit local-area nodes to a single rate

[Figure: five nodes, each running DRL and labeled 1 MBps, with an aggregate label of 5 MBps.]

DRL Design

Each limiter runs a main event loop:
  Estimate: Observe and record outgoing demand
  Allocate: Determine rate share of each node
  Enforce: Drop packets

Two allocation approaches:
  GRD: Global random drop (packet granularity)
  FPS: Flow proportional share
    Flow count as proxy for demand

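[Figure: limiter loop. Input traffic passes through Estimate, Allocate (FPS), and Enforce to become output traffic; the loop runs at a regular interval and exchanges updates with other limiters.]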
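The Estimate / Allocate / Enforce loop can be sketched as a small class. This is a simplified illustration, not the DRL implementation: the class and method names are hypothetical, allocation is reduced to a plain flow-count proportion, and enforcement is modeled as capping a rate rather than dropping individual packets.

```python
from dataclasses import dataclass, field


@dataclass
class Limiter:
    name: str
    global_limit: float                               # shared limit, e.g. in KB/s
    local_flows: int = 0                              # latest local demand estimate
    peer_flows: dict = field(default_factory=dict)    # last counts heard from peers
    local_limit: float = 0.0

    def estimate(self, observed_flows):
        """Estimate: record locally observed demand (flow count as proxy)."""
        self.local_flows = observed_flows

    def receive_update(self, peer, flows):
        """Store the most recent flow count announced by another limiter."""
        self.peer_flows[peer] = flows

    def allocate(self):
        """Allocate: take a share of the global limit proportional to local flows."""
        total = self.local_flows + sum(self.peer_flows.values())
        if total == 0:
            # No demand anywhere: even split across known limiters (an assumption).
            self.local_limit = self.global_limit / (1 + len(self.peer_flows))
        else:
            self.local_limit = self.global_limit * self.local_flows / total
        return self.local_limit

    def enforce(self, offered_rate):
        """Enforce: pass at most local_limit; the excess would be dropped."""
        return min(offered_rate, self.local_limit)

    def tick(self, observed_flows, offered_rate):
        """One pass of the loop, run once per regular interval."""
        self.estimate(observed_flows)
        self.allocate()
        return self.enforce(offered_rate)


a = Limiter("a", global_limit=100.0)
a.receive_update("b", 8)                       # peer reports 8 flows
sent = a.tick(observed_flows=2, offered_rate=50.0)
print(a.local_limit, sent)
```

Here limiter `a` sees 2 of the 10 total flows, so it allocates itself 20 of the 100 units and caps the 50 offered units at 20.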

Implementation Architecture

Abstractions
  Limiter
    Communication
    Manages identities
  Identity
    Parameters (limit, interval, etc.)
    Machines and subsets

Built upon standard Linux tools…
  Userspace packet logging (Ulogd)
  Hierarchical Token Bucket
  Mesh & gossip update protocols

Integrated with PlanetLab software

[Figure: implementation pipeline. Input data passes through Estimate (Ulogd), FPS, and Enforce (HTB) to become output data; the loop runs at a regular interval.]

Estimation using ulogd

Userspace logging daemon
  Already used by PlanetLab for efficient abuse tracking
  Packets tagged with slice ID by IPTables
  Receives outgoing packet headers via netlink socket

DRL implemented as ulogd plug-in
  Gives us efficient flow accounting for estimation
  Executes the Estimate, Allocate, Enforce loop
  Communicates with other limiters
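Flow accounting from tagged packet headers can be illustrated by counting distinct 5-tuples per slice within one interval. The sketch below is not the ulogd plug-in interface (which is written in C); the dictionary layout of a header record and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def count_flows(packet_headers):
    """Count distinct flows per slice from an iterable of header records.

    Each record is a hypothetical dict with the slice tag and the usual
    5-tuple fields; a flow is identified by its 5-tuple.
    """
    flows = defaultdict(set)
    for h in packet_headers:
        five_tuple = (h["src"], h["sport"], h["dst"], h["dport"], h["proto"])
        flows[h["slice_id"]].add(five_tuple)
    return {slice_id: len(tuples) for slice_id, tuples in flows.items()}


# Made-up sample headers (documentation address ranges), one interval's worth.
headers = [
    {"slice_id": 7, "src": "192.0.2.1", "sport": 4000, "dst": "198.51.100.2", "dport": 80, "proto": "tcp"},
    {"slice_id": 7, "src": "192.0.2.1", "sport": 4000, "dst": "198.51.100.2", "dport": 80, "proto": "tcp"},
    {"slice_id": 7, "src": "192.0.2.1", "sport": 4001, "dst": "198.51.100.2", "dport": 80, "proto": "tcp"},
    {"slice_id": 9, "src": "192.0.2.1", "sport": 5000, "dst": "198.51.100.9", "dport": 53, "proto": "udp"},
]
flow_counts = count_flows(headers)
print(flow_counts)  # {7: 2, 9: 1}
```

Repeated packets of the same flow do not increase the count, which is why the flow count can serve as a demand proxy that is independent of packet rate.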

Enforcement with Hierarchical Token Bucket

Linux Advanced Routing & Traffic Control

Hierarchy of rate limits

Enforces DRL’s rate limit

Packets attributed to leaves (slices)

Packets move up, borrowing from parents

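[Figure: HTB tree with Root at the top, A and X below it, and leaves B, C, D, Y, Z. A 1500b packet arrives; token counts are shown as 1000b, 100b, 600b, and then as 0b, 0b, 200b.]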
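One reading of the figure's numbers is that a 1500b packet drains a leaf holding 1000b, then its parent holding 100b, then takes the remaining 400b from the root, leaving 0b, 0b, and 200b. The sketch below models only that simplified "spend upward" picture; the real Linux HTB qdisc has more involved borrowing rules, and the class and function names here are hypothetical.

```python
class Bucket:
    """A node in a token-bucket hierarchy (simplified model of the slide)."""

    def __init__(self, name, tokens, parent=None):
        self.name = name
        self.tokens = tokens
        self.parent = parent


def try_send(leaf, packet_size):
    """Spend tokens from the leaf upward; return True if the packet may be sent."""
    # Collect the path from the leaf to the root.
    path = []
    node = leaf
    while node is not None:
        path.append(node)
        node = node.parent
    # Refuse without spending anything if the whole path cannot cover the packet.
    if sum(b.tokens for b in path) < packet_size:
        return False
    # Drain each level in turn, borrowing the remainder from the parent.
    remaining = packet_size
    for b in path:
        spent = min(b.tokens, remaining)
        b.tokens -= spent
        remaining -= spent
        if remaining == 0:
            break
    return True


root = Bucket("Root", 600)
a = Bucket("A", 100, parent=root)
b = Bucket("B", 1000, parent=a)
ok = try_send(b, 1500)
print(ok, b.tokens, a.tokens, root.tokens)  # True 0 0 200
```

A second 1500b packet would be refused in this model, because only 200b remain along the path until the buckets are replenished.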

Enforcement with Hierarchical Token Bucket

Uses same tree structure as PlanetLab

Efficient control of sub-trees
Updated every loop
Root limits whole node
Replenish each level

[Figure: HTB tree with Root at the top, A and X below it, and leaves B, C, D, Y, Z.]
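Updating an HTB class's rate on every loop could be done with the standard `tc class change` command. The helper below only builds the argument list and does not run it; the device name, class IDs, and the choice to set `ceil` equal to `rate` are assumptions for illustration, and the slides do not say how DRL itself applies the update.

```python
def htb_rate_update_argv(dev, parent, classid, rate_kbyte_per_s):
    """Build (but do not run) a tc command that resets one HTB class's rate."""
    # In tc's unit syntax, "kbps" means kilobytes per second ("kbit" is kilobits).
    rate = f"{int(rate_kbyte_per_s)}kbps"
    return ["tc", "class", "change", "dev", dev,
            "parent", parent, "classid", classid,
            "htb", "rate", rate, "ceil", rate]


# Hypothetical device and class IDs; 20 KB/s is the share from the earlier example.
argv = htb_rate_update_argv("eth0", "1:1", "1:10", 20)
print(" ".join(argv))
```

Running the resulting command requires root privileges and an existing HTB qdisc and class with those IDs on the device.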

Citadel Site

The Citadel (2 nodes)
  Wanted 1 Mbps traffic limit
  Added (horrible) traffic shaper
  Poor responsiveness (2 to 15 seconds)

Running right now!
  Cycles on and off every four minutes
  Observe DRL’s impact without ground truth

[Figure labels: Shaper, DRL.]

Citadel Results – Outgoing Traffic

Data logged from running nodes

Takeaways:
  Without DRL, way over limit
  One node sending more than the other

[Figure: outgoing traffic over time against the 1 Mbit/s limit, with DRL alternating between On and Off periods.]

Citadel Results – Flow Counts

[Figure: number of flows over time.]

FPS uses flow count as proxy for demand

Citadel Results – Limits and Weights

[Figure: rate limit and FPS weight over time.]

Lessons Learned

Flow counting is not always the best proxy for demand
  FPS state transitions were irregular
  Added checks and dampening/hysteresis in problem cases

Can estimate after enforce
  Ulogd only shows packets after HTB
  FPS is forgiving to software limitations

HTB is difficult
  HYSTERESIS variable
  TCP segmentation offloading
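The dampening/hysteresis added for irregular FPS state transitions is not detailed in the slides. As a generic illustration of the two ideas, the sketch below smooths a noisy signal with an exponentially weighted moving average and switches state only when the signal crosses separate upper and lower thresholds. All names, thresholds, and the smoothing factor are hypothetical.

```python
def ewma(previous, sample, alpha=0.25):
    """Dampening: blend a new sample into the running estimate."""
    return alpha * sample + (1 - alpha) * previous


class HysteresisSwitch:
    """Turn on above `high`, off below `low`, and hold state in between."""

    def __init__(self, low, high, state=False):
        if low >= high:
            raise ValueError("low must be below high")
        self.low = low
        self.high = high
        self.state = state

    def update(self, value):
        if self.state and value < self.low:
            self.state = False
        elif not self.state and value > self.high:
            self.state = True
        return self.state


# Hypothetical signal: fraction of the local limit in use, sampled per interval.
switch = HysteresisSwitch(low=0.8, high=0.95)
states = [switch.update(v) for v in [0.9, 0.96, 0.9, 0.85, 0.79, 0.9]]
print(states)  # [False, True, True, True, False, False]
```

Values that wobble between the two thresholds leave the state unchanged, which is what suppresses rapid back-and-forth transitions.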

Ongoing work

Other use cases
Larger-scale tests
Complete PlanetLab administrative interface
Standalone version

Continue DRL rollout on PlanetLab
  UCSD’s PlanetLab nodes soon

Questions?

Code is available from PlanetLab svn
http://svn.planet-lab.org/svn/DistributedRateLimiting/

Citadel Results