R2: An Application-Level Kernel for Record and Replay. Zhenyu Guo, Xi Wang, Jian Tang, Xuezheng Liu, Zhilei Xu, Ming Wu, M. Frans Kaashoek, and Zheng Zhang. Microsoft Research Asia, Tsinghua University, MIT CSAIL


Page 1: R2: An Application-Level Kernel for Record and Replay

Zhenyu Guo, Xi Wang, Jian Tang, Xuezheng Liu, Zhilei Xu, Ming Wu, M. Frans Kaashoek, and Zheng Zhang

Microsoft Research Asia, Tsinghua University, MIT CSAIL

Page 2: What & Why?

• What
  – Record one run and replay it for debugging

• Why
  – Some bugs are difficult to reproduce by re-executing
    • E.g., with specific network message orders
  – Hard to apply comprehensive analysis with no interference at runtime
    • E.g., predicate checking at every write on variable “state”

Page 3: Record

D:\> set R2_MODE=Record
D:\> R2.exe signatureUpdate.exe srg.tango0
start check @ 19:12:47.11
… downloading …
4356 entries downloaded

Page 4: Faithful Replay

D:\> set R2_MODE=Replay
D:\> R2.exe signatureUpdate.exe srg.tango0
start check @ 19:12:47.11
… downloading …
4356 entries downloaded

[Callout: nondeterminism]

Page 5: State of the art

• Virtual machine approaches
  – Dunlap et al., ReVirt: Enabling Intrusion Analysis through Virtual-Machine Logging and Replay, OSDI’02
  – Replay the application and the operating system
  – Difficult to deploy

• Library approaches
  – Geels et al., Replay debugging for distributed applications, USENIX’06
  – Replay the application only
  – Easy to deploy and lightweight
  – Cannot replay challenging system applications (e.g., with asynchronous I/O)

Page 6: Library approach

[Figure: an interposition interface sits between the app and the OS kernel and is backed by a replay log. Record: the app calls read, the interface calls the native read and logs the output. Replay: the app calls read and the interface returns the logged output in place of the native read.]
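The record and replay paths of an interposed read can be sketched as follows. This is a minimal illustration, not R2's code: Mode, LogEntry, ReplayLog, NativeRead, and interposed_read are hypothetical names, and the native call is passed in as a function object so the sketch stays self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <deque>
#include <functional>
#include <vector>

// Hypothetical names throughout; this is not R2's actual code.
enum class Mode { Record, Replay };

// One logged call: the return value and the bytes written to the caller's buffer.
struct LogEntry {
    long ret;
    std::vector<char> data;
};

using ReplayLog = std::deque<LogEntry>;
using NativeRead = std::function<long(int fd, void* buf, std::size_t n)>;

// Record: call the native function and log its output.
// Replay: skip the native call and hand back the logged output.
long interposed_read(Mode mode, ReplayLog& log, const NativeRead& native,
                     int fd, void* buf, std::size_t n) {
    if (mode == Mode::Record) {
        long ret = native(fd, buf, n);
        LogEntry e{ret, {}};
        if (ret > 0) {
            const char* p = static_cast<const char*>(buf);
            e.data.assign(p, p + ret);  // log only the bytes actually produced
        }
        log.push_back(e);
        return ret;
    }
    assert(!log.empty() && "replay log exhausted");
    LogEntry e = log.front();
    log.pop_front();
    if (!e.data.empty()) std::memcpy(buf, e.data.data(), e.data.size());
    return e.ret;
}
```

During replay the native function is never invoked, so the application sees the recorded bytes even if the file or socket behind fd no longer exists.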

Page 7: Problems for replaying system applications using the library approach

• Cannot record all operations with non-deterministic behavior
  – The code is not a function: spin lock assembly code
  – Functions may have unclear semantics: socketcall, ioctl

• Can be heavyweight for some applications
  – Log size too large: read

• Previous work does not address these problems well

Page 8: R2’s approach

Problems                   R2                                          Example
spin lock assembly code    Find high-level function enclosing it       spin_lock (long var) { __asm … }
socketcall                 Find functions with clear semantics         recv
read                       Find function with less I/O                 sqlite_exec

Allow developers to select functions that can be easily & efficiently replayed

Page 9: Overview of R2

Goal: Capture all nondeterminism

Step 1: select a replayable set of functions
    spin_lock, recv, sqlite_exec, …

Step 2: annotate functions with R2-supplied keywords so R2 knows what to record
    int recv (
        [in] SOCKET socket,
        [out, bsize(return)] void *buf,
        [in] int nbytes,
        [in] int flag );

Step 3: R2 generates function stubs for faithful replay automatically

Step 4: compile and run

Page 10: Which functions to select: f1?

[Figure: call graph with nodes main, f1, f2, socket, recv, network, and log, shown for recording and for replay. With f1 as the selected function, replay ends in "INVALID SOCKET, CRASH!!!"]

Page 11: A replay interface must be a call graph cut

[Figure: the same call graph (main, f1, f2, socket, recv, network, log), shown for recording and for replay, with two cuts labeled 1 and 2.]

Page 12: An incorrect cut

[Figure: the call graph (main, f1, f2, socket, recv, network) with cut 2 and a shared variable g_state. Recording: write 1 -> 2, read 2. Replay: read 1. "Incorrect value!!!"]

Page 13: Rules for a correct cut (or replay interface)

RULE 1 (NONDETERMINISM): Any source of nondeterminism should be below the interposed interface.

RULE 2 (ISOLATION): All instances of unrecorded reads and writes to a variable should be either below or above the interposed interface.

[Figure: the call graph (main, f1, f2, socket, recv, network) with the variable g_state and the cut.]
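The two rules can be checked mechanically once every function is assigned to a side of the interface. The sketch below assumes a deliberately simple model: Side, Cut, satisfies_rule1, and satisfies_rule2 are hypothetical names, and which functions touch g_state is an assumption made for the example rather than something the slides state.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of a candidate cut; names are illustrative, not R2's.
enum class Side { Above, Below };  // Above = re-executed during replay

struct Cut {
    std::map<std::string, Side> side;                           // function -> side of the interface
    std::vector<std::string> nondet_sources;                    // functions that touch nondeterminism
    std::map<std::string, std::vector<std::string>> accessors;  // variable -> functions with unrecorded access
};

// Rule 1: every source of nondeterminism lies below the interface.
bool satisfies_rule1(const Cut& c) {
    for (const std::string& f : c.nondet_sources)
        if (c.side.at(f) != Side::Below) return false;
    return true;
}

// Rule 2: all unrecorded accesses to a variable lie on the same side.
bool satisfies_rule2(const Cut& c) {
    for (const auto& kv : c.accessors) {
        const std::vector<std::string>& fs = kv.second;
        for (const std::string& f : fs)
            if (c.side.at(f) != c.side.at(fs.front())) return false;
    }
    return true;
}
```

A cut that keeps socket and recv below the interface passes Rule 1; moving only one of the functions that touch g_state below the interface fails Rule 2.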

Page 14: Find a good cut in practice

• It can be difficult to identify a correct cut in a complex call graph

• Module APIs are good candidates
  – They encapsulate internal state well

Page 15: Implementation Challenges

• Deterministic memory footprint
• Execution order in multi-threaded apps
• Reuse intercepted functions
• Identify function side effects
• Threads created by implementation

Page 16: Deterministic memory footprint

• What?
  – Memory state and its evolution must be the same during recording and replay

• Why?
  – Different memory addresses may lead to different control flow

Page 17: Memory problem: a typical network application

struct iocb
{
    OVERLAPPED olp;
    char buf[BUFSIZE];
    …
};

Thread 1:
    cb = (struct iocb *)malloc(...);
    ReadFileEx(hSocket, cb->buf, BUFSIZE, (OVERLAPPED *)cb, 0);

Thread 2:
    GetQueuedCompletionStatus(hPort, &size, ..., (OVERLAPPED **)&cb, ...);
    void * buf = cb->buf;
    ...

[Figure: address 0x104CFF00 appears in the recorded run and in the log next to the network message; the other run shows 0x104CEF00. "INVALID ADDRESS, CRASH!!!"]

Page 18: Why different memory addresses?

• The tool and the application are in the same address space

• The tool inherently runs differently during record and replay

• Intercepted functions are not executed during replay
  – Missed memory requests inside the functions
  – Modules not loaded during replay

Page 19: Isolation using space split

[Figure: the replay interface splits the process into replay space (app) and system space (libraries), both on top of the OS kernel.]

Page 20: Memory Isolation

[Figure: within the user memory address space, malloc calls from replay space (app) are served by a deterministic memory pool and malloc calls from system space (libraries) by a native memory pool; the replay interface separates the two spaces, with the OS kernel below.]
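The effect of splitting the pools can be sketched with a toy allocator. Everything here is an assumption made for illustration: DeterministicPool is a hypothetical bump allocator that hands out offsets into a private arena, and run() simulates a tool that allocates from the native heap only while recording.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical bump allocator standing in for the deterministic memory pool.
class DeterministicPool {
public:
    explicit DeterministicPool(std::size_t capacity) : arena_(capacity), next_(0) {}

    // Returns the offset of the new block inside the arena.
    std::size_t allocate(std::size_t size) {
        std::size_t aligned = (size + 15) / 16 * 16;  // round up to 16 bytes
        if (aligned > arena_.size() - next_) throw std::bad_alloc();
        std::size_t off = next_;
        next_ += aligned;
        return off;
    }

private:
    std::vector<char> arena_;
    std::size_t next_;
};

// Simulates one run. The tool allocates from the native heap only while
// recording, yet the application receives the same offsets in both modes
// because its requests never share a pool with the tool's.
std::vector<std::size_t> run(bool recording) {
    DeterministicPool pool(1024);
    std::vector<std::size_t> app_offsets;
    for (int i = 0; i < 4; ++i) {
        if (recording) {
            void* tool_buf = std::malloc(100 + i);  // tool-side allocation, native pool
            std::free(tool_buf);
        }
        app_offsets.push_back(pool.allocate(24));   // app-side allocation
    }
    return app_offsets;
}
```

With a single shared heap, the extra tool-side allocations during recording could shift the addresses the application receives; with separate pools the application's sequence depends only on its own requests.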

Page 21: Handle data transfer across interface

[xpointer] char* getcwd (NULL, 0);

[Figure: replay space (app) and system space (libraries) with the deterministic and native memory pools inside the user memory address space; a block allocated by malloc inside the call is shown on both sides of the replay interface.]

Page 22: Memory footprints are deterministic now

struct iocb
{
    OVERLAPPED olp;
    char buf[BUFSIZE];
    …
};

Thread 1:
    cb = (struct iocb *)malloc(...);
    ReadFileEx(hSocket, cb->buf, BUFSIZE, (OVERLAPPED *)cb, 0);

Thread 2:
    GetQueuedCompletionStatus(hPort, &size, ..., (OVERLAPPED **)&cb, ...);
    void * buf = cb->buf;
    ...

[Figure: the address in the log and the address seen by the application are both 0x104CFF00.]

Deterministic$execution$order$in$multi.threaded$applications

• What?– If$function$A$is$executed$after$function$B$during$recording(happensAbefore$relation)$,$the$same$order$must$be$enforced$during$replay.

• Why?–May$lead$to$replay$failure$if$it$has$a$different$order$during$replay

23

Page 24: An Example

struct iocb
{
    OVERLAPPED olp;
    char buf[BUFSIZE];
    …
};

Thread 1:
    cb = (struct iocb *)malloc(...);
    ReadFileEx(hSocket, cb->buf, BUFSIZE, (OVERLAPPED *)cb, 0);

Thread 2:
    GetQueuedCompletionStatus(hPort, &size, ..., (OVERLAPPED **)&cb, ...);
    void * buf = cb->buf;
    ...

[Figure: the log holds address 0x104CFF00. "INVALID ADDRESS, CRASH!!!"]

Page 25: Happens-before relations

• Intra-thread
  – Always the same during recording and replay

• Inter-thread (challenges)
  – Callbacks
  – Thread synchronization
  – Resource manipulation
  – Asynchronous I/O

Page 26: Capture and maintain happens-before relations

T1: QueueUserWorkItem([callback] Function,
T2: Function (…)

T3: ReleaseMutex [sync(hMutex)] (hMutex, …
T4: WaitForSingleObject [sync(hMutex)] (hMutex, …)

• With the annotations, R2 can capture and enforce these relations
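One simple way to capture and enforce such an order is to log the sequence in which threads pass annotated sync points and to gate them in that sequence during replay. The sketch below assumes a single global sequence and hypothetical names (OrderLog, record, try_enter); the mechanism R2 actually uses may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical order log for events that share a sync key.
class OrderLog {
public:
    // Recording: note that thread `tid` passed its sync point.
    void record(int tid) { order_.push_back(tid); }

    // Replay: thread `tid` may pass only if it is next in the recorded order.
    // A real implementation would block the thread instead of returning false.
    bool try_enter(int tid) {
        if (next_ < order_.size() && order_[next_] == tid) {
            ++next_;
            return true;
        }
        return false;
    }

private:
    std::vector<int> order_;
    std::size_t next_ = 0;
};
```

If T3's ReleaseMutex was recorded before T4's WaitForSingleObject returned, the log holds 3 then 4, and during replay T4 is held back until T3 has passed its sync point.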

Page 27: Summary of Annotations

annotation           scope    description

Data Transfer
in                   param    input (read-only) parameter
out                  param    output (mutable) parameter
bsize(val)           param    modified size of an array buffer (val can be any expr)
xpointer(kind)       param    address allocated internally (null, thread, or process)
prepare(key, buf)    func     prepare asynchronous data transfer
commit(key, size)    func     commit asynchronous data transfer

Execution Order
callback             param    callback function pointer (upcall)
sync(key)            func     causality among syscalls/upcalls (key can be any expr)

Optimization
cache                func     cache for reducing log size
reproduce            func     reproduce I/O for reducing log size

• Some keywords are standard, reused from the standard annotation language (SAL)

Page 28: Annotations allow Automated Code Generation

Annotated Prototype

    int read (
        [in] int fd,
        [out, bsize(return)] void *buf,
        [in] unsigned int nbytes);

PHP Code Template

    BEGIN_SLOT(record_<?=$f->name?>, <?=$f->name?>)
        logger << <?=$f->name?>_signature << current_tid;
    <?if (is_syscall($f)) {?>
        logger << return_value;
    <?}?>
    <?$direction = is_syscall($f) ? 'out' : 'in';?>
    <?foreach ($f->params as $p) {
        if ($p->has($direction)) {
            if ($p->has('bsize')) {?>
        logger.write(<?=$p->name?>, <?=$p->val('bsize')?>);
    <?      } else {?>
        logger << <?=$p->name?>;
    <?      }
        }
    }?>
    END_SLOT

C++ Code Snippet

    BEGIN_SLOT(record_read, read) // “record_read” after native “read”
        logger << read_signature << current_tid;
        logger << return_value;
        logger.write(buf, return_value);
    END_SLOT
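The generation step can be mimicked in a few lines of C++ in place of the PHP template. This is a sketch under stated assumptions: Param, Prototype, and generate_record_slot are hypothetical names, only the syscall case is handled, and bsize(return) is assumed to have been mapped to the identifier return_value beforehand.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical description of one annotated parameter.
struct Param {
    std::string name;
    bool out;           // annotated [out]
    std::string bsize;  // bsize(...) expression, empty when absent
};

// Hypothetical description of one annotated prototype.
struct Prototype {
    std::string name;
    std::vector<Param> params;
};

// Emits the record slot for a syscall-style function: signature, thread id,
// return value, then every [out] parameter (sized by bsize when present).
std::string generate_record_slot(const Prototype& f) {
    std::string s = "BEGIN_SLOT(record_" + f.name + ", " + f.name + ")\n";
    s += "    logger << " + f.name + "_signature << current_tid;\n";
    s += "    logger << return_value;\n";
    for (const Param& p : f.params) {
        if (!p.out) continue;  // [in] parameters are not logged for syscalls
        if (!p.bsize.empty())
            s += "    logger.write(" + p.name + ", " + p.bsize + ");\n";
        else
            s += "    logger << " + p.name + ";\n";
    }
    s += "END_SLOT\n";
    return s;
}
```

Feeding it the read prototype (fd and nbytes as [in], buf as [out] with bsize set to return_value) yields a slot that logs the signature, the thread id, the return value, and the buffer contents, matching the shape of the snippet on the slide.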

Page 29: R2 implementation

• Runs on Windows
  – R2 runtime (~20 kloc)

• Annotated three interfaces
  – Win32
  – MPI
  – SQLite

Page 30: R2 can replay challenging system applications

Category              Software
Web server            Apache, lighttpd, Null HTTPd
Database              SQLite, Berkeley DB, MySQL
Distributed system    libtorrent, Nyx, PacificA
Virtual machine       Lua, Parrot, Python
Network client        cURL, PuTTY, Wget
Misc.                 zip, MPICH

• Replays challenging system software (e.g., those with async I/O)
• No modifications to applications, but requires annotations to the interface

Page 31: Evaluation

• Questions to answer
  – Annotation effort
  – Overall performance
  – Effectiveness of customized interface

Page 32: Experiment Platform

• CPU: 2.0 GHz Xeon dual-core
• Memory: 4 GB
• Disk: two 250 GB, 7200 /s
• Switch: 1 Gbps
• OS: Windows Server 2003 Service Pack 2

Page 33: Annotation effort

Interface    #functions / #reuse    Annotation effort          Kloc (autogen)
Win32        1301 / 1239            ~one person-week (500+)    110.2
MPI          191 / -                ~2 person-days             22.2
SQLite       153 / -                ~2 person-days             15.7

Annotate once, use many times!!!

Page 34: R2 is lightweight in general

• Apache, file size = 64 KB, client concurrency = 50, Win32 interface

[Figure: bar chart of normalized execution time (axis from 0.94 to 1.1) for the R2 configurations native, stub (no log), and recording; bars annotated 1% and 9%.]

• For some applications, the overhead may be larger

Page 35: Example: SQLite

• SELECT COUNT(*) FROM edge GROUP BY src_uid

• File size = 3 MB, Win32 interface

[Figure: two bar charts by interface. Log size (MB, log scale from 1 to 1000) for win32, annotated 300x. Time (s, from 0 to 60) for win32 and native, annotated 2x.]

Page 36: Solution: choose an interface with less I/O

[Figure: layers Application, SQLite API, Filesystem API, OS Kernel, with replay space and system space marked. The query SELECT COUNT(*) FROM edge GROUP BY src_uid crosses the SQLite API; file I/O (database file, intermediate swapping data) crosses the Filesystem API.]

Page 37: Raising interface reduces overhead

[Figure: the same two bar charts with the sqlite interface added. Log size (MB, log scale from 1 to 1000): win32 300x, sqlite ~1x. Time (s, from 0 to 60): win32 2x, sqlite ~1x, compared with native.]

Page 38: R2 is useful beyond replay

• Space split can help in-process tools
  – E.g., in a model checker
    • Define what you do/don’t want to check

• Space split + annotation + code generation: a powerful combination that we have applied to many other projects
  – Hang Cure for dynamically curing hang problems (EuroSys’08)
  – Towards Automatic Inference of Task Hierarchies in Complex Systems (HotDep’08)
  – MPIWiz: Subgroup reproducible replay of MPI applications (PPoPP’09)
  – Model checker for distributed systems (submitted)

Page 39: Related work

• Library-based replay: liblog (USENIX’06), Jockey (AADEBUG’05), RecPlay (TOCS’99)
• Domain-specific replay: ML, MPI, Java
• Whole-system replay: hardware (Strata, ASPLOS’06), VM (ReVirt, OSDI’02; iDNA, VEE’06)
• Annotations: SAL (ICSE’06), Deputy (OSDI’06)
• Isolation: XFI (OSDI’06), Nooks (SOSP’03)

• Main distinction: allow developers to select an easy & efficient replay interface

Page 40: Conclusion

• A set of rules for selecting an interface that
  – can be made replay faithful (correctness)
  – costs less record and replay overhead (performance)

• A set of keywords describing side effects that helps R2 generate stubs

• A replay/system space split to make the interface replay faithful

• A Win32 implementation with low overhead that can replay challenging system applications

Page 41: Discussion

• How to pick a good interface?
  – API calls
  – Log size
  – Debug visibility
  – Tradeoffs…

Page 42: Discussion

• Long-running execution
  – Checkpointing?
  – What states to save?

Page 43: Discussion

• Do we need FAITHFUL replay?
  – Probabilistic replay…

Page 44: Discussion

• Multithreading
  – Data race?
  – Multi-core replay

Page 45: Discussion

• Many ways to implement R&R
  – Library
  – Virtual machine
  – Mozilla’s rr

Page 46: Discussion

• Other ways to debug
  – Deterministic execution
  – Model checking
  – Grep TODO

Page 47: Thanks!


Page 49: (R2 latest) How to follow the two rules faithfully: static analysis to remove annotation effort (and potential annotation error)!

• Bipartite flow graph between functions and variables
• Static analysis to get this graph
• Dynamic profiling for edge weight
• Min-cut on the graph for the near-optimal replay interface (in terms of log size)

[Figure: flow graph for an example with the functions Main, fopen, fread, and MD5_Digest and the variables fileName, fd, a 1 MB buffer, and a 16-byte signature. Two cuts are marked: R2 Cut and iRNA Cut.]
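The min-cut step can be illustrated with a textbook max-flow computation, since the maximum flow equals the weight of the minimum cut. The sketch below uses Edmonds-Karp on a small capacity matrix; the node numbering and the edge weights in example_capacities are assumptions modeled on the figure (the fopen, fileName, and fd edges are left out), not profiled data.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <queue>
#include <vector>

// Edmonds-Karp max-flow on a capacity matrix. By the max-flow/min-cut
// theorem the result equals the weight of the cheapest cut between s and t.
long max_flow(std::vector<std::vector<long>> cap, int s, int t) {
    const int n = static_cast<int>(cap.size());
    long flow = 0;
    while (true) {
        // Breadth-first search for a shortest augmenting path.
        std::vector<int> parent(n, -1);
        parent[s] = s;
        std::queue<int> q;
        q.push(s);
        while (!q.empty() && parent[t] == -1) {
            int u = q.front();
            q.pop();
            for (int v = 0; v < n; ++v)
                if (parent[v] == -1 && cap[u][v] > 0) {
                    parent[v] = u;
                    q.push(v);
                }
        }
        if (parent[t] == -1) return flow;  // no augmenting path left
        long push = LONG_MAX;
        for (int v = t; v != s; v = parent[v])
            push = std::min(push, cap[parent[v]][v]);
        for (int v = t; v != s; v = parent[v]) {
            cap[parent[v]][v] -= push;  // use up forward capacity
            cap[v][parent[v]] += push;  // add residual capacity
        }
        flow += push;
    }
}

// Hypothetical weights in bytes: node 0 = fread (nondeterministic source),
// 1 = 1 MB buffer, 2 = MD5_Digest, 3 = 16-byte signature, 4 = Main.
std::vector<std::vector<long>> example_capacities() {
    std::vector<std::vector<long>> cap(5, std::vector<long>(5, 0));
    cap[0][1] = 1048576;  // fread writes the buffer
    cap[1][2] = 1048576;  // MD5_Digest reads the buffer
    cap[2][3] = 16;       // MD5_Digest writes the signature
    cap[3][4] = 16;       // Main reads the signature
    return cap;
}
```

On this graph the cheapest cut between fread and Main costs 16 bytes, which corresponds to logging the signature instead of the 1 MB buffer.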