Hypervisor-Assisted Application Checkpointing for High Availability
Min Lee
Joint work with A. S. Krishnakumar, P. Krishnan, Navjot Singh, Shalini Yajnik
© 2009 Avaya Inc. All rights reserved. 2
Introduction
Virtualization technology
– Is being widely adopted
– Has proven its usefulness
– Most applications run well
  • They run natively
Some important applications don’t run well
– Certain operations cannot run natively
– Instead they use hypercalls
– Our target: application checkpointing
Xen Virtual Machine Monitor
[Figure: Xen architecture, layers from top to bottom]
Applications (one set per virtual machine)
Modified guest OS (one per virtual machine)
Virtual hardware (vCPU, vDisk, vNIC, vMemory, etc.)
Xen hypervisor
Physical hardware (CPU, disk, NIC, memory, etc.)
(Taken/adapted from ‘Xen and co.’ slides)
High Availability Approaches
Categories
– Application-transparent
  • No changes to application or guest
  • Xen-specific: Remus, Kemari
– Application-assisted
  • Application implements the checkpointing logic
  • Flexible and lightweight
We are targeting
– Application-assisted under virtualization
  • Xen-specific
  • Applicable to general hypervisors
Hypervisor-Assisted Application Checkpointing
Application checkpointing
– Provides transactional properties to the traditional heap
  • Makes the heap highly available
– Processes survive failures
– Has performance issues in Xen
Our technique improves application-checkpointing performance in Xen
High Availability
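[Figure: a primary process runs List_add() and List_del(); its changes pass through a "magical mirror" to a backup. The primary crashes at a List_add(); the backup takes over and runs List_add().]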
Transaction APIs
List of dirty pages
– Written pages
mprotect() system call
– Write-protect
– SIGSEGV signal
APIs:
int declare(addr, size);
void undeclare(Tid);
void Tstart(Tid);
void Tend(Tid, dirty_pages);
Examples:
– Without transactions: List_add();
  As a transaction: Tstart(); List_add(); Tend();
– Without transactions: List_add(); List_del(); List_add(); List_del();
  As a transaction: Tstart(); List_add(); List_del(); List_add(); List_del(); Tend();
PT – Existing Approach
Get dirty pages
Tstart();
List_add();    (writes to page 5)
List_add();    (writes to page 7)
Tend();
…
handler() {
    mprotect(unprotect);
    add_to_dirty_pages();
}
Declare() {}
Undeclare() {}
[Figure: process’ view (virtual pages 1 to 12); pages 5 and 7 end up in the dirty-page list]
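The PT steps on this slide (write-protect the heap at Tstart(), catch the fault, unprotect the page, record it as dirty) might look roughly like the following on Linux. This is only a sketch under assumptions: the names pt_declare, pt_tstart, pt_tend, and on_fault, and the fixed MAX_PAGES limit, are invented for illustration and are not the implementation from the talk. Calling mprotect() inside a signal handler is common practice for this kind of tracking, but mprotect() is not on POSIX's list of async-signal-safe functions.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define MAX_PAGES 64                      /* hypothetical limit for the sketch */

static char  *heap_base;                  /* start of the transactional heap */
static size_t heap_pages;                 /* heap length in pages */
static size_t page_size;
static volatile size_t dirty[MAX_PAGES];  /* 0-based indices of written pages */
static volatile size_t n_dirty;

/* SIGSEGV handler: unprotect the faulting page and add it to the dirty list.
 * The faulting store is restarted when the handler returns. */
static void on_fault(int sig, siginfo_t *si, void *ctx) {
    (void)sig; (void)ctx;
    uintptr_t addr = (uintptr_t)si->si_addr;
    uintptr_t base = (uintptr_t)heap_base;
    if (addr < base || addr >= base + heap_pages * page_size)
        _exit(1);                         /* a real crash, not a tracked write */
    size_t idx = (addr - base) / page_size;
    mprotect(heap_base + idx * page_size, page_size, PROT_READ | PROT_WRITE);
    if (n_dirty < MAX_PAGES)
        dirty[n_dirty++] = idx;
}

/* Hypothetical declare(): map the heap and install the fault handler. */
static int pt_declare(size_t pages) {
    assert(pages <= MAX_PAGES);
    page_size = (size_t)sysconf(_SC_PAGESIZE);
    heap_pages = pages;
    heap_base = mmap(NULL, pages * page_size, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (heap_base == MAP_FAILED)
        return -1;
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = on_fault;
    return sigaction(SIGSEGV, &sa, NULL);
}

/* Tstart(): clear the dirty list and write-protect the whole heap. */
static void pt_tstart(void) {
    n_dirty = 0;
    mprotect(heap_base, heap_pages * page_size, PROT_READ);
}

/* Tend(): report how many distinct pages were written. */
static size_t pt_tend(void) {
    return n_dirty;
}
```

Each first write to a page costs one fault, one signal delivery, and one mprotect() call, which is the per-dirty-page overhead the talk attributes to PT.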
PT Call-Flow
Pure user-level
[Figure: call flow across the User, OS, and Hypervisor layers. For every dirty page: mprotect(), page fault, signal, mprotect(), with two TLB flushes.]
Approaches
                      PT-based         Emulation-based   Scan-based
Pure user space       PT (existing)    Emulation
Hypervisor-assisted   PTxen            Emulxen           Scanxen
PT is the existing approach; the other entries are our approaches.
Our Approaches
Emulation
Under the condition
– Most transactions are small
Tstart() {}
List_add();    (write logged as (Addr1, 100))
List_add();    (write logged as (Addr2, 200))
Tend();
…
handler() {
    emulate();
    log_to_write_buffer();
}
Declare();
Undeclare();
[Figure: process’ view (virtual pages 1 to 12)]
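The write buffer of (address, value) pairs shown on this slide can be pictured with a small sketch. It does not emulate faulting instructions; instead a hypothetical helper, emulated_store, stands in for emulate() plus log_to_write_buffer(). All names, the record layout, and LOG_CAPACITY are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 128   /* hypothetical buffer size */

/* One logged write: a heap offset and the value stored there. */
struct write_rec {
    size_t   offset;
    uint32_t value;
};

struct write_log {
    struct write_rec recs[LOG_CAPACITY];
    size_t count;
};

/* Stand-in for emulate() + log_to_write_buffer(): perform the store on the
 * primary heap and append an (offset, value) record to the log. */
static void emulated_store(struct write_log *log, uint32_t *heap,
                           size_t offset, uint32_t value) {
    assert(log->count < LOG_CAPACITY);
    heap[offset] = value;
    log->recs[log->count].offset = offset;
    log->recs[log->count].value = value;
    log->count++;
}

/* At Tend(): apply the logged writes to the backup heap in order, then
 * clear the log. Returns the number of records applied. */
static size_t replay(struct write_log *log, uint32_t *backup) {
    size_t n = log->count;
    for (size_t i = 0; i < n; i++)
        backup[log->recs[i].offset] = log->recs[i].value;
    log->count = 0;
    return n;
}
```

In this model the work and the data to ship grow with the number of writes rather than the number of pages, which is consistent with the slide's condition that most transactions are small.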
Hypervisor-Assisted: User-to-hypervisor call
Going through the OS is unnecessary overhead
– Talk directly to Xen
Move checkpointing to the Xen level
– Add a new interrupt vector
  • 0x80: system call
  • 0x82: hypercall from guest OS
  • 0x84: hypercall from user (newly added)
The Xen-based approaches need no changes to the guest OS.
PTxen
Implement PT in Xen
Tstart() {}
List_add();    (writes to page 5)
List_add();    (writes to page 7)
Tend();
…
Declare();
Undeclare() {}
----- Xen -----
Process1, (1-12)
page_fault() {
    mprotect(unprotect);
    add_to_dirty_pages();
}
[Figure: process’ view (virtual pages 1 to 12); pages 5 and 7 end up in the dirty-page list]
Emulxen
Emulation in Xen
Tstart() {}
List_add();    (write logged as (Addr1, 100))
List_add();    (write logged as (Addr2, 200))
Tend();
…
Declare();
Undeclare();
----- Xen -----
Process1, (1-12)
page_fault() {
    emulate();
    log_to_write_buffer();
}
[Figure: process’ view (virtual pages 1 to 12)]
Scanxen
Idea
– Scan the page table rather than trapping writes
– Hardware sets the dirty bit
Tstart() {}
List_add();    (writes to page 5)
List_add();    (writes to page 7)
Tend();
…
Declare();
Undeclare();
----- Xen -----
Process1, (1-12)
scan_page_table() {
    collect_dirty_bit();
    add_to_dirty_pages();
}
[Figure: process’ view (virtual pages 1 to 12); marked pages have the dirty bit set in the page table]
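The scan step can be sketched over a simulated page table: an array of 64-bit entries in which bit 6 plays the role of the x86 dirty bit. This is an illustration only; it does not touch real page tables, and the function name and array layout are assumptions. A real implementation would also have to handle stale TLB entries after clearing dirty bits, which this sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PTE_DIRTY (UINT64_C(1) << 6)   /* dirty-bit position in an x86 PTE */

/* Scan n simulated page-table entries. Record the 0-based index of every
 * entry whose dirty bit is set, and clear the bit so the next transaction
 * starts clean. Returns the number of dirty pages found. */
static size_t scan_page_table(uint64_t *ptes, size_t n, size_t *dirty_pages) {
    size_t found = 0;
    for (size_t i = 0; i < n; i++) {
        if (ptes[i] & PTE_DIRTY) {
            dirty_pages[found++] = i;
            ptes[i] &= ~PTE_DIRTY;
        }
    }
    assert(found <= n);
    return found;
}
```

The loop visits every entry regardless of how many pages were written, so its cost follows the size of the declared heap rather than the number of writes.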
Microbenchmark
10000 transactions, 10 MB heap size
Microbenchmark
Transactional heap size
– For simplicity, the whole heap is protected
Transaction
– Writes per page (wpp)
  • # of writes per page
– Pages per transaction (ppt)
  • # of unique pages written
– # of writes = wpp * ppt
Scanxen
– Impacted only by heap size
– Not by wpp, ppt, or transaction size
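One way to picture a microbenchmark transaction with these parameters is a loop that writes wpp bytes into each of ppt distinct pages. The function name and the choice of one byte per write are assumptions for illustration, not the benchmark code from the talk.

```c
#include <assert.h>
#include <stddef.h>

/* Perform one synthetic transaction: wpp writes to each of ppt distinct
 * pages of the heap. Returns the total number of writes, wpp * ppt.
 * The caller must provide a heap of at least ppt * page_size bytes. */
static size_t run_transaction(unsigned char *heap, size_t page_size,
                              size_t wpp, size_t ppt) {
    assert(wpp <= page_size);
    size_t writes = 0;
    for (size_t p = 0; p < ppt; p++) {
        for (size_t w = 0; w < wpp; w++) {
            heap[p * page_size + w] = (unsigned char)(w + 1);
            writes++;
        }
    }
    return writes;
}
```

For example, wpp = 16 with ppt = 8 gives a transaction of 128 writes.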
PT vs PTxen
PTxen shows a 10x speedup
PT and PTxen are impacted by ppt
[Figure: time in seconds (0 to 14) versus ppt (1 to 8) for PT and PTxen; for each approach the curves for wpp = 4, 8, and 16 overlap]
Emulation vs emulxen
Emul-based approaches are impacted by transaction size
Emulxen shows a 4x speedup
[Figure: time in seconds (0 to 45) versus transaction size (16 to 128) for emul and emulxen at wpp = 4, 8, and 16]
Transaction size:  16  32  48  64  80  96 112 128
ppt (wpp=16):       1   2   3   4   5   6   7   8
ppt (wpp=8):        2   4   6   8  10  12  14  16
ppt (wpp=4):        4   8  12  16  20  24  28  32
PT Call-Flow
[Figure: call flows across the User, OS, and Hypervisor layers, for every dirty page]
Pure user-level: mprotect(), page fault, signal, mprotect(); two TLB flushes
Xen-assisted: declare(), page fault; one TLB flush
Evaluation
Benchmark source code is from the book “Data Structures and Algorithm Analysis in C (Second Edition)” by Mark Allen Weiss
Data structures, OPS_PER_T=1
                                               writes                 pages
Data structure                         Op      avg       min  max     avg     min  max
aa (AA-trees)                          insert  21.9836   5    63      4.9481  1    7
aa (AA-trees)                          delete  20.4053   2    63      6.0642  1    9
avl (AVL trees)                        insert  30.5609   6    39      5.1021  1    9
bin (Binomial queues)                  insert  27.9985   25   64      2.0735  1    10
dsl (Deterministic skip list)          insert  10.4176   7    23      3.1421  1    5
hashquad (Quadratic probing hash)      insert  11.3983   2    47023   1.0146  1    68
hashsepchain (Separate chaining hash)  insert  4         4    4       1.9696  1    3
leftheap (Leftist heap)                insert  23.5673   5    31      3.0665  1    6
leftheap (Leftist heap)                delete  34.0132   0    59      9.2518  0    15
heap (Binary heaps)                    insert  2.8693    2    14      2.4009  1    5
heap (Binary heaps)                    delete  12.5523   2    15      2.7349  1    5
list (Linked list)                     insert  4         4    4       1.0029  1    2
list (Linked list)                     delete  1         1    1       1       1    1
queue (Queues)                         insert  3         3    3       1.8984  1    2
queue (Queues)                         delete  2         2    2       1       1    1
rb (Red black tree)                    insert  13.7011   10   28      4.6102  1    9
splay (Splay trees)                    insert  20.0851   4    5262    4.7745  1    34
splay (Splay trees)                    delete  7.7604    3    15001   3.0258  1    40
tree (Binary search tree)              insert  720.7852  4    1436    5.4576  1    10
tree (Binary search tree)              delete  1.7139    0    3       1.7139  0    3
Evaluation Results 1
[Figure: four bar charts of time in seconds for mprotect (PT), emul, emulxen, and mprotxen (PTxen). Panels: queue insert/delete and list insert/delete; hashquad insert and hashsepchain insert; dsl insert; bin insert.]
Evaluation Results 2
[Figure: four bar charts of time in seconds for mprotect (PT), emul, emulxen, and mprotxen (PTxen). Panels: splay insert/delete; aa insert/delete; tree insert/delete; leftheap insert/delete.]
Evaluation Results 3
[Figure: two bar charts of time in seconds for mprotect (PT), emul, emulxen, and mprotxen (PTxen). Panels: heap insert/delete; rb insert and avl insert.]
Scanxen shows an almost constant 2.5 sec across all benchmarks
Evaluation Summary
Emulxen has up to 4x speedup compared to emulation
PTxen has up to 13x speedup compared to PT
[Figure: speedup (1 = 100%, axis 0 to 16) of emulxen and of mprotxen (PTxen) for each benchmark: queue-insert, queue-delete, list-insert, list-delete, hashquad-insert, hashsepchain-delete, dsl-insert, bin-insert, splay-insert, splay-delete, aa-insert, aa-delete, tree-insert, tree-delete, leftheap-insert, leftheap-delete, heap-insert, heap-delete, rb-insert, avl-insert]
Transaction Aggregation
OPT=1
– A single operation (e.g. an insert or a delete)
OPT=5
– Multiple operations merged into one transaction
– # of writes increases linearly
– # of unique pages touched remains the same in most cases
It should benefit PT-based approaches
– Because of their heavy dependence on ppt
– Details in the paper
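The effect of merging can be illustrated by counting distinct pages in a list of write addresses. The helper below is a hypothetical sketch that assumes 4 KiB pages; it is not taken from the paper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12   /* assumes 4 KiB pages */

/* Count the distinct pages touched by n write addresses.
 * Quadratic in n, which is acceptable for small transactions. */
static size_t unique_pages(const uintptr_t *addrs, size_t n) {
    size_t unique = 0;
    for (size_t i = 0; i < n; i++) {
        int seen = 0;
        for (size_t j = 0; j < i; j++) {
            if ((addrs[j] >> PAGE_SHIFT) == (addrs[i] >> PAGE_SHIFT)) {
                seen = 1;
                break;
            }
        }
        if (!seen)
            unique++;
    }
    assert(unique <= n);
    return unique;
}
```

If one operation makes 4 writes to 2 pages and five such operations land on the same 2 pages, the merged transaction has 20 writes but still 2 unique pages, so a page-granular tracker does the same number of faults for five times the work.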
Conclusion
A family of application checkpointing techniques was introduced
Emulation-based techniques
– Useful for small transactions (fewer writes)
Hypervisor-assisted application checkpointing
– 4x to 13x faster than the user-space implementation
Thank you!
Extra Slides
Emulation vs PT
Emul-based is good for small transactions
– The break-even points are roughly wpp=5 and wpp=1.3
[Figure: time in seconds (0 to 35) versus writes per page (1 to 12) for emul and mprotect at ppt 4 and ppt 8]
[Figure: time in seconds (0 to 3.5) versus wpp (1 to 5) for emulxen and mprotxen at ppt 4 and ppt 8]
Note the scale difference
Scanxen vs PT
For a small buffer and large ppt, scanxen might be better
– Not the case in our experiments
[Figure: two charts of time in seconds versus pages per transaction (1 to 8), with axes of 0 to 14 and 0 to 1.4. Series: PT, PTxen, and scanxen at heap sizes of 1MB, 2MB, 3MB, 4MB, 5MB, 40KB, 80KB, and 120KB.]
Note the scale difference
Scanxen vs emulation
Scanxen might be better than emulation
– For big transactions
[Figure: time in seconds (0 to 45) versus transaction size (16 to 128) at wpp 4 for scanxen, emul, and emulxen]
[Figure: time in seconds (0 to 0.25), No-HA versus mprotxen, for queue-insert, list-insert, hashquad-insert, dsl-insert, splay-insert, aa-insert, tree-insert, leftheap-insert, heap-insert, and rb-insert]
[Figure: two bar charts, one for PTxen (axis 0 to 0.06) and one for scanxen (axis 0 to 3), comparing NoTLBFlush, TLBFlush, and AreaFlush on queue and list insert and delete]
Operations per transaction
– OPT=5, merging transactions
  • No impact on the emulation-based approaches
  • Some slowdown for scanxen
– Merging transactions
  • Total # of pages written goes down effectively
  • PT and PTxen become much better than emul/emulxen
  • Still a 13x improvement from PT to PTxen
Evaluation
[Figure: two bar charts of time in seconds (0 to 1.2) for mprotect, emul, emulxen, and mprotxen on rb insert and avl insert. Panels: OPT=5 with 2000 transactions; OPT=1 with 10000 transactions.]
Bandwidth : Amount
[Figure: amount sent in KB for queue-insert, list-insert, hashquad-insert, dsl-insert, splay-insert, aa-insert, tree-insert, leftheap-insert, heap-insert, and rb-insert. Panels: mprotect-based (axis 0 to 400000); emul-based (axis 0 to 6000).]
Note that tree-insert in the emul-based chart is 56311.34375, which is out of scale.
Emul-based is mostly less than 2 MB
– No ‘diff’ process for emul-based
Bandwidth : Time
[Figure: time in seconds for queue-insert, list-insert, hashquad-insert, dsl-insert, splay-insert, aa-insert, tree-insert, leftheap-insert, heap-insert, and rb-insert. Panels: mprotxen, mprotect, and scanxen (axis 0 to 0.09); emulxen and emul (axis -0.005 to 0.02).]
Emul-based is mostly less than 5 ms
Bandwidth : Percentage
[Figure: percentage for queue-insert, list-insert, hashquad-insert, dsl-insert, splay-insert, aa-insert, tree-insert, leftheap-insert, heap-insert, and rb-insert. Panels: mprotxen (axis 0 to 70); emulxen, emul, mprotect, and scanxen (axis -2 to 12).]
A relatively small fraction
– Except for PTxen, due to its minimal runtime
Microbenchmark
[Figure: microbenchmark chart comparing scanxen, PT, and PTxen]
[Figure: microbenchmark chart comparing emulxen, emul, and scanxen]
[Figure: time in seconds (0 to 60) versus transaction size (Tsize, 16 to 128) at wpp 4 for mprotect (PT), mprotxen (PTxen), scanxen, emul, and emulxen]
Microbenchmark
[Figure: writes dirty several pages in the transactional heap]
PT: between Tstart() and Tend(), three separate mprotect() calls
PTxen: between Tstart() and Tend(), a single PTxen() call
[Figure: the main process hands each dirty page to a diff process, which sends the diff over the network to the backup process]