R2: An application-level kernel for record and replay. Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, Z. Zhang (MSR Asia, Tsinghua, MIT), OSDI’08. Shimin Chen, LBA Reading Group


Page 1: Shimin Chen LBA Reading Group

R2: An application-level kernel for record and replay
Z. Guo, X. Wang, J. Tang, X. Liu, Z. Xu, M. Wu, M. F. Kaashoek, Z. Zhang (MSR Asia, Tsinghua, MIT), OSDI’08

Shimin Chen

LBA Reading Group

Page 2: Shimin Chen LBA Reading Group

What is R2?

Library-based record & replay
  Intercept calls, record in log, replay from log

Novel features:
  Allow users (app developers) to decide at which interface to record and replay
  A set of annotations for the interface calls

Implementation:
  Windows
  Supports the Win32, MPI, and SQLite APIs

Page 3: Shimin Chen LBA Reading Group

Outline

Introduction
Design overview
Execution orders
Annotations for optimization
Implementation
Evaluation
Summary

Page 4: Shimin Chen LBA Reading Group

Choosing an Interface for Record & Replay

Must choose a “cut” in the call graph

Above interface: executed during record and during replay

Below interface: executed during record. Replayed from log.
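
The cut can be illustrated with a minimal sketch, assuming a hypothetical `Interposer` helper (not R2's actual API). Code above the interface (`run_app`) runs in both modes; the function below the interface runs only while recording, and its results come from the log during replay.

```python
import time


class Interposer:
    """Wraps functions at the chosen interface (hypothetical helper, not R2's API)."""

    def __init__(self, mode, log=None):
        assert mode in ("record", "replay")
        self.mode = mode
        self.log = [] if log is None else list(log)
        self._cursor = 0

    def wrap(self, func):
        def stub(*args):
            if self.mode == "record":
                result = func(*args)  # code below the interface really runs
                self.log.append((func.__name__, result))
                return result
            # Replay: the real function is never called, the log answers.
            _name, result = self.log[self._cursor]
            self._cursor += 1
            return result

        return stub


def run_app(now):
    """Code above the interface: executed during record and during replay."""
    first = now()
    second = now()
    return second - first


recorder = Interposer("record")
recorded_delta = run_app(recorder.wrap(time.monotonic))


def monotonic():
    raise RuntimeError("code below the interface must not run during replay")


replayer = Interposer("replay", recorder.log)
replayed_delta = run_app(replayer.wrap(monotonic))
```

The replay run sees exactly the timestamps that were recorded, so `replayed_delta` equals `recorded_delta` even though the clock is never read again.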

Page 5: Shimin Chen LBA Reading Group

Isolation Rule

RULE 1 (ISOLATION) All instances of unrecorded reads and writes to a variable should be either below or above the interposed interface.

Isolate variables above the interface and variables below the interface

Can hold for Windows
  For example, as long as R2 intercepts the complete set of file functions, file descriptors can be recorded

Page 6: Shimin Chen LBA Reading Group

Non-Determinism Rule

Any source of non-determinism should be below the interposed interface.

Sources of non-determinism:
1. Calls that receive external data
2. Shared-memory inter-process communication
3. Variables shared by multiple threads

R2 can handle 1.
For 2 and 3, the developer must choose a higher-level interface that hides the effects (e.g., lock and unlock for spinlocks)

Page 7: Shimin Chen LBA Reading Group

Terminology!

interposed interface

R2 records the output of R2 syscalls, the input of R2 upcalls, and their ordering

Page 8: Shimin Chen LBA Reading Group

Execution Control

R2 tracks the state of every thread with a replay/system mode bit
  The mode bit is updated when crossing the interface
  Recording is avoided for R2 syscalls made from R2 system space

When a user invokes R2 with an application:
1. R2’s initial state is in system space
2. The “main” is treated as an upcall (recorded, going into replay space)
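
A minimal sketch of the mode bit, assuming hypothetical `syscall` and `upcall` decorators (these names are illustrative, not R2's generated stubs). Each thread starts in system space; an upcall switches to replay space, a syscall switches back, and a syscall is logged only when its caller was in replay space.

```python
import threading

_state = threading.local()  # holds the per-thread mode bit
log = []


def in_replay_space():
    # Threads start in system space, as R2 does when it is first invoked.
    return getattr(_state, "replay_space", False)


def syscall(func):
    """Crossing downward: replay space -> system space."""
    def stub(*args):
        caller_in_replay = in_replay_space()
        _state.replay_space = False
        try:
            result = func(*args)
        finally:
            _state.replay_space = caller_in_replay
        if caller_in_replay:  # calls made from system space are not recorded
            log.append(("syscall", func.__name__, result))
        return result
    return stub


def upcall(func):
    """Crossing upward: system space -> replay space."""
    def stub(*args):
        previous = in_replay_space()
        log.append(("upcall", func.__name__, args))
        _state.replay_space = True
        try:
            return func(*args)
        finally:
            _state.replay_space = previous
    return stub


@syscall
def get_config():
    return 42


@syscall
def helper():
    return get_config() + 1  # nested syscall issued from system space


@upcall
def main():
    return helper()


result = main()
```

Only `main` and `helper` appear in `log`; the nested `get_config` call happens inside system space and is skipped.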

Page 9: Shimin Chen LBA Reading Group

Memory Management

R2 ensures the following in the replay space:

malloc/free return the same addresses
  R2 replay space uses a dedicated memory pool

Stack locations are the same
  R2 replay space uses a separate stack per thread
  R2 system space uses different stacks

R2 syscalls that return memory buffers, e.g., getcwd(NULL,0), return them at the same locations
  The returned buffer is copied to space allocated from the replay pool
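
Why a dedicated pool gives identical addresses can be sketched as follows. `ReplayPool`, its base address, and its size-bucketed free list are assumptions made for illustration; the slides do not describe R2's allocator internals. The point is only that the pool's state depends solely on the application's own malloc/free sequence, never on allocations made in system space.

```python
class ReplayPool:
    """Toy allocator over a private address range (hypothetical, not R2's)."""

    def __init__(self, base=0x10000000, size=1 << 20):
        self.limit = base + size
        self.next_free = base
        self.free_blocks = {}  # rounded size -> list of reusable addresses
        self.live = {}         # address -> rounded size

    def malloc(self, size):
        size = (size + 15) & ~15  # round up to a 16-byte multiple
        bucket = self.free_blocks.get(size)
        if bucket:
            addr = bucket.pop()
        else:
            addr = self.next_free
            if addr + size > self.limit:
                raise MemoryError("replay pool exhausted")
            self.next_free += size
        self.live[addr] = size
        return addr

    def free(self, addr):
        size = self.live.pop(addr)
        self.free_blocks.setdefault(size, []).append(addr)


def app_allocations(pool):
    """The application's allocation pattern, identical in record and replay."""
    a = pool.malloc(24)
    b = pool.malloc(100)
    pool.free(a)
    c = pool.malloc(20)  # same rounded size as a, so a's block is reused
    return [a, b, c]


record_addrs = app_allocations(ReplayPool())
replay_addrs = app_allocations(ReplayPool())
```

Because nothing outside the replay space touches the pool, the record run and the replay run hand out the same addresses.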

Page 10: Shimin Chen LBA Reading Group

Annotation and Code Generation

Developers annotate interface calls. R2 can then automatically generate stub code for record and replay.

Direction: in/out
Buffer: bsize(return)
  The buffer will be recorded.

This example is simple. If a C++ object is to be recorded, serialization and deserialization should be provided via operator overloading on streams.
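
What generated stubs might do for an out buffer annotated with bsize(return) can be sketched as below. `make_stubs` and `fake_recv` are hypothetical names; R2 generates its stubs from annotated prototypes, and this only mimics the resulting behavior: record the return value plus that many bytes of the buffer, then write them back during replay.

```python
def make_stubs(func, out_param_index):
    """Build record/replay stubs for a call whose output buffer holds
    `return value` valid bytes (the bsize(return) case)."""

    def record_stub(log, *args):
        n = func(*args)
        buf = args[out_param_index]
        # Record only the valid prefix of the buffer, not its full capacity.
        log.append((n, bytes(buf[:max(n, 0)])))
        return n

    def replay_stub(log_iter, *args):
        n, data = next(log_iter)
        args[out_param_index][:len(data)] = data  # refill the caller's buffer
        return n

    return record_stub, replay_stub


def fake_recv(buffer, length):
    """Stand-in for a receive call: fills the buffer, returns the byte count."""
    payload = b"hello"[:length]
    buffer[:len(payload)] = payload
    return len(payload)


record_recv, replay_recv = make_stubs(fake_recv, out_param_index=0)

log = []
buf_record = bytearray(64)
n_record = record_recv(log, buf_record, 64)

buf_replay = bytearray(64)
n_replay = replay_recv(iter(log), buf_replay, 64)
```

The log holds 5 bytes rather than the 64-byte capacity, which is the saving the buffer-size annotation is meant to express.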

Page 11: Shimin Chen LBA Reading Group

Annotation for Asynchronous Operation

Start asynchronous file read

Call back

Key to identify the call

prepare indicates that ReadFileEx issues an asynchronous I/O request keyed by lpOverlapped;

commit indicates that the request keyed by lpOverlapped is completed and the transferred data size is cbTransferred.
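
The prepare/commit pairing can be sketched as a small bookkeeping table. `AsyncRecorder` and the string key are hypothetical; the sketch only assumes what the slide states: prepare registers a request under a key, and commit completes the request with that key and a transferred size.

```python
class AsyncRecorder:
    """Tracks outstanding asynchronous requests by key (illustrative only)."""

    def __init__(self):
        self.pending = {}  # key -> buffer that the request will fill
        self.log = []

    def prepare(self, key, buffer):
        # The request is issued; the data is not available yet.
        self.pending[key] = buffer
        self.log.append(("prepare", key))

    def commit(self, key, transferred):
        # The completion callback arrived: record the bytes actually transferred.
        buffer = self.pending.pop(key)  # KeyError if there was no matching prepare
        self.log.append(("commit", key, bytes(buffer[:transferred])))


recorder = AsyncRecorder()
overlapped = "req-1"           # stands in for the lpOverlapped pointer
buf = bytearray(16)

recorder.prepare(overlapped, buf)
buf[:3] = b"abc"               # the system fills the buffer some time later
recorder.commit(overlapped, 3)
```

Recording the data at commit time rather than at the original call is what lets the log capture a buffer that is filled after the call returns.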

Page 12: Shimin Chen LBA Reading Group

Outline

Introduction
Design overview
Execution orders
Annotations for optimization
Implementation
Evaluation
Summary

Page 13: Shimin Chen LBA Reading Group

How to track execution orders?

Tracking causality
  R2 syscall – R2 upcall causality: callback
    See previous example
  R2 syscall – R2 syscall causality: sync(key)

Page 14: Shimin Chen LBA Reading Group

Recording Event Order (Lamport Clock)

Thread t’s clock c(t); event e’s clock c(e)
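
The clock update rules from this slide did not survive in the transcript. The sketch below uses the standard Lamport clock rules as an assumption, not necessarily the exact formulation in the paper: a thread increments c(t) for each event, and an event caused by an event on another thread advances c(t) past the cause's clock.

```python
class ThreadClock:
    """Per-thread Lamport clock c(t); methods return the event clock c(e)."""

    def __init__(self):
        self.value = 0

    def local_event(self):
        self.value += 1
        return self.value

    def caused_event(self, cause_clock):
        # The new event must be ordered after its cause.
        self.value = max(self.value, cause_clock) + 1
        return self.value


t1, t2 = ThreadClock(), ThreadClock()

e_open = t1.local_event()            # c = 1 on thread 1
e_signal = t1.local_event()          # c = 2 on thread 1
e_init = t2.local_event()            # c = 1 on thread 2, concurrent with t1
e_wait = t2.caused_event(e_signal)   # caused by the signal, so c = 3
```

Events linked by causality always get increasing clocks, while concurrent events such as `e_open` and `e_init` may share a value.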

Page 15: Shimin Chen LBA Reading Group

Replaying Event Order

Total-order recording + total-order replaying
  Use a token to serialize execution

Causal-order recording + total-order replaying
  Before replay, generate a total order based on the recorded causal order
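
One way to derive a total order from recorded causal order, assuming each event carries a Lamport clock, is to sort by clock and break ties by thread id. The event tuples and the tie-breaking rule are illustrative choices; the slides do not say how R2 resolves ties.

```python
# Each event is (thread_id, name, lamport_clock); the values are made up.
events = [
    ("t2", "wait", 3),
    ("t1", "open", 1),
    ("t2", "init", 1),
    ("t1", "signal", 2),
]


def total_order(recorded):
    """Sort by clock, then thread id. Any causal predecessor has a smaller
    clock, so it always precedes its successors in the resulting order."""
    return sorted(recorded, key=lambda e: (e[2], e[0]))


schedule = total_order(events)
names = [name for _thread, name, _clock in schedule]
```

Here `names` comes out as open, init, signal, wait: the signal still precedes the wait it caused, and the two concurrent events are placed in an arbitrary but fixed order.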

Page 16: Shimin Chen LBA Reading Group

Outline

Introduction
Design overview
Data transfers
Execution orders
Annotations for optimization
Implementation
Evaluation
Summary

Page 17: Shimin Chen LBA Reading Group

Reducing Log Size for Frequent Calls

Some calls are frequent and usually return the same value (e.g., GetLastError on Windows returns 0 in most cases)

“cache” annotation
  R2 caches the last return value
  R2 avoids recording the return value of subsequent calls until it changes
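
The effect of the cache annotation can be sketched as change-only logging. The log layout used here, a list of (call index, value) pairs, is an assumption for illustration; the slides do not describe R2's log format.

```python
def record_cached(return_values):
    """Log a return value only when it differs from the previous one."""
    log = []
    last = object()  # sentinel that never equals a real return value
    for index, value in enumerate(return_values):
        if value != last:
            log.append((index, value))
            last = value
    return log


def replay_cached(log, n_calls):
    """Rebuild every return value from the change-only log."""
    changes = dict(log)
    values, current = [], None
    for index in range(n_calls):
        if index in changes:
            current = changes[index]
        values.append(current)
    return values


# A GetLastError-like pattern: mostly 0, with an occasional error code.
observed = [0, 0, 0, 5, 0, 0]
log = record_cached(observed)
replayed = replay_cached(log, len(observed))
```

Six calls produce three log entries, and replay still returns the original sequence.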

Page 18: Shimin Chen LBA Reading Group

Reproduce annotation

Some data can be reproduced at replay time without recording
  For example, reading file data from local disk

Can annotate with “reproduce”
  R2 will execute the call during replay
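
A minimal sketch of the reproduce idea for local file reads, using hypothetical `record_read` and `replay_read` helpers. With `reproduce=True` the log entry carries no payload and replay re-executes the read; this assumes the file is unchanged between record and replay.

```python
import os
import tempfile


def read_file(path):
    with open(path, "rb") as f:
        return f.read()


def record_read(path, log, reproduce):
    data = read_file(path)
    # With reproduce, only the call itself is logged, not the data it returned.
    log.append(("read_file", path, None if reproduce else data))
    return data


def replay_read(entry):
    _name, path, data = entry
    # No recorded payload means the call is executed again during replay.
    return read_file(path) if data is None else data


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "piece.bin")
    with open(path, "wb") as f:
        f.write(b"x" * 1000)

    full_log, small_log = [], []
    record_read(path, full_log, reproduce=False)
    record_read(path, small_log, reproduce=True)
    replayed = replay_read(small_log[0])
```

The entry in `small_log` stays the same size however large the file is, while `full_log` grows with every byte read.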

Page 19: Shimin Chen LBA Reading Group

All the Annotations

Page 20: Shimin Chen LBA Reading Group

Outline

Introduction
Design overview
Data transfers
Execution orders
Defining your own syscalls
Annotations for optimization
Implementation
Evaluation
Summary

Page 21: Shimin Chen LBA Reading Group
Page 22: Shimin Chen LBA Reading Group

Detecting Un-recorded Non-Determinism

R2 records R2 syscall signature (e.g., name) and checks it during replay

Detect mismatch and report
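
The check can be sketched as a name comparison on each replayed call. `ReplayMismatch` and `replay_next` are hypothetical names, and the signature here is just the function name, as in the slide's example.

```python
class ReplayMismatch(Exception):
    """Raised when replay diverges from the recorded call sequence."""


def replay_next(log, cursor, called_name):
    recorded_name, result = log[cursor]
    if recorded_name != called_name:
        raise ReplayMismatch(
            f"log entry {cursor}: recorded {recorded_name!r}, "
            f"but replay called {called_name!r}"
        )
    return result


log = [("open", 3), ("read", 128)]

first = replay_next(log, 0, "open")  # matches the log, returns the recorded value

try:
    replay_next(log, 1, "write")     # the replayed run took a different path
    mismatch = None
except ReplayMismatch as exc:
    mismatch = str(exc)
```

A mismatch means some non-determinism above the interface went unrecorded, so the replay has taken a different path from the recorded run.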

Page 23: Shimin Chen LBA Reading Group

Outline

Introduction
Design overview
Data transfers
Execution orders
Defining your own syscalls
Annotations for optimization
Implementation
Evaluation
Summary

Page 24: Shimin Chen LBA Reading Group
Page 25: Shimin Chen LBA Reading Group

Questions to be Answered:
  How much effort is required to annotate the syscall/upcall interface?
  How important are annotations to successful replay of applications?
  How much does R2 slow down applications during recording?
  How effective are custom syscall layers and annotations (cache and reproduce) in reducing log size and optimizing performance?

Replay is not evaluated:
  “However, the replayed application without any debugging interaction runs much faster than when recording (e.g., a replay run of BitTorrent file downloading is 13x faster).”

Page 26: Shimin Chen LBA Reading Group

Experimental Setup

All machines:
  2.0 GHz Xeon dual-core CPU, 4 GB memory
  two 250 GB, 7200 rpm disks
  running Windows Server 2003 Service Pack 2
  interconnected via a 1 Gbps switch

Unless explicitly specified:
  the application data and R2 log files are kept on the same disk
  total-order recording & execution
  all optimizations (i.e., cache and reproduce) are turned off

Page 27: Shimin Chen LBA Reading Group

Annotation Effort

The paper says:
  Win32 syscall interface (500+ calls): one person-week
  MPI and SQLite: two person-days each

Page 28: Shimin Chen LBA Reading Group

Performance without optimization

Apache is configured with 250 threads. ApacheBench mimics 50 concurrent clients downloading 64 KB web pages. Each configuration executes 500,000 requests.

Page 29: Shimin Chen LBA Reading Group

Customized R2 Syscall Layers

Query: compute vertex degrees in a social network:

SELECT COUNT(*) FROM edge GROUP BY src_uid;

The data set is ~3MB large.

FILE / MEM chooses where SQLite stores temporary data

Page 30: Shimin Chen LBA Reading Group

Cache Annotation for Apache

Profiling shows that 5 R2 syscalls account for more than 50% of all syscalls

Using cache annotation reduces the log size from 21.99MB to 18.1MB.

Page 31: Shimin Chen LBA Reading Group

Reproduced File I/O (BitTorrent)

1 machine seeds a 4GB file, upload bandwidth is limited to 8MB/s.

10 machines download the file concurrently.

Average log size is reduced from 17.1GB to 5.4GB by reproduce.

Page 32: Shimin Chen LBA Reading Group

Reproduced Network I/O

GE and PU are two MPI benchmarks.

MPI functions are annotated with the reproduce annotation so that the messages are not recorded but reproduced during replay.

Page 33: Shimin Chen LBA Reading Group

Summary

Library-based record and replay in software
Annotation and automatic generation of stub code for record and replay
Impressively, supports many Win32 applications

But cannot handle un-recorded non-determinism, e.g., data races in the replay space