capriccio: scalable threads for internet services rob von behren, jeremy condit, feng zhou, geroge...

Capriccio: Scalable Threads for Internet Services

Rob von Behren, Jeremy Condit, Feng Zhou, Geroge Necula and Eric Brewer

University of California at Berkeley

Presenter: Olusanya Soyannwo

Outline

Motivation Background Goals Approach Experiments Results Related work Conclusion & Future work

EECS Advanced Operating Systems Northwestern University

Motivation

Increasing scalability demands for Internet services

Hardware improvements are limited by existing software

Current implementations are event based


Background : Event Based Systems - Drawbacks

Events systems hide the control flowDifficult to understand and debug

Programmers need to match related events

Burdens programmers


Goals: Capriccio

Support for existing thread API

Scalability to hundreds of thousands of threads

Automate application-specific customization


Approach: Capriccio

Thread packageCooperative scheduling

Linked stacksAddress the problem of stack

allocation for large numbers of threadsCombination of compile-time and run-

time analysis Resource-aware scheduler


Approach: User Level Thread – The Choice

POSIX API(-)Complex preemption

(-)Bad interaction with Kernel scheduler

Performance Ease thread synchronization overhead No kernel crossing for preemptive threading More efficient memory management at user level

Flexibility Decoupling user and kernel threads allows faster innovation Can use new kernel thread features without changing application

code Scheduler tailored for applications


Approach: User Level Thread – Disadvantages

Additional OverheadReplacing blocking calls with non-

blocking calls

Multiple CPU synchronization


Approach: User Level Thread – Implementation Context Switches

Built on top of Edgar Toernig’s coroutine library Fast context switches when threads voluntarily yield

I/O Capriccio intercepts blocking I/O calls Uses epoll for asynchronous I/O

Scheduling Very much like an event-driven application Events are hidden from programmers

Synchronization Supports cooperative threading on single-CPU machines

• Requires only Boolean checks


Approach: Linked Stack

The problem: fixed stacksOverflow vs. wasted spaceLimits thread numbers

The solution: linked stacksAllocate space as neededCompiler analysis

• Add runtime checkpoints • Guarantee enough space until next check

Fixed Stacks

Linked Stack


Approach: Linked Stack Parameters

MaxPathMinChunk

Steps Break cycles Trace back

Special Cases Function pointers External calls

5

4

2

6

3

3

2

3

MaxPath = 8



MaxPathMinChunk



5

4

2

3

3

2

3

MaxPath = 8

6



MaxPathMinChunk



5

3

MaxPath = 8

6

3

2

2

4

3



MaxPathMinChunk


Special Cases Function pointers External calls MaxPath = 8

6

3

2

2

4

3

3

3


Approach: Scheduling

Advantages of event-based schedulingTailored for applicationsWith event handlers

Events provide two important pieces of information for schedulingWhether a process is close to

completionWhether a system is overloaded


Approach: Scheduling -The Blocking Graph

Close

Write

WriteReadSleep

ThreadcreateMain

Thread-based View applications as

sequence of stages, separated by blocking calls

Analogous to event-based scheduler


Approach: Resource-aware Scheduling Track resources used along BG edges

Memory, file descriptors, CPU Predict future from the past Algorithm

• Increase use when underutilized• Decrease use near saturation

Advantages Operate near the knee w/o thrashing Automatic admission control


Experiment: Threading Microbenchmarks

SMP, two 2.4 GHz Xeon processors 1 GB memory two 10 K RPM SCSI Ultra II hard

drives Linux 2.5.70 Compared Capriccio, LinuxThreads,

and Native POSIX Threads for Linux


Experiment: Thread Scalability

Producer-consumer microbenchmark LinuxThreads begin to degrade after

20 threads NPTL degrades after 100 Capriccio scales to 32K producers

and consumers (64K threads total)


Results: Thread Primitive - Latency

Capriccio LinuxThreads NPTL

Thread creation

21.5 21.5 17.7

Thread context switch

0.24 0.71 0.65

Uncontended mutex lock

0.04 0.14 0.15


Results: Thread Scalability


Results: I/O performance

Network performanceToken passing among pipesSimulates the effect of slow client links10% overhead compared to epollTwice as fast as both LinuxThreads

and NPTL when more than 1000 threads

Disk I/O comparable to kernel threads


Results: Runtime Overhead

Tested Apache 2.0.44 Stack linking

73% slowdown for null call 3-4% overall

Resource statistics 2% (on all the time) 0.1% (with sampling)

Stack traces 8% overhead


Results: Web Server Performance


Related Work

Programming Model of high concurrency Event based models are a result of poor thread implementations

User-Level Threads Capriccio is unique

Kernel Threads NPTL

Application Specific Optimization SPIN & Exokernel Burden on programmers Portability

Asynchronous I/O Stack Management

Using heap requires a garbage collector (ML of NJ)


Related Work (cont’d)

Resource Aware SchedulingSeveral similar to capriccio

Future Work

ThreadingMulti-CPU supportKernel interface

(enabled) Compile-time techniquesVariations on linked stacksStatic blocking graph

SchedulingMore sophisticated prediction


Conclusion

Capriccio simplifies high concurrency Scalable & high performanceControl over concurrency model

• Stack safety• Resource-aware scheduling• Enables compiler support, invariants

Issues Additional burden to programmer Resource controlled sched.? What

hysteresis?


OTHER GRAPHS

capriccio: scalable threads for internet services rob von behren, jeremy condit, feng zhou, geroge...

Documents

olusanya soyannwo slide

kernel threads

new kernel thread

io calls

programmers synchronization

scalable threads

problem of stack allocation

io capriccio intercepts