An Efficient Process Live Migration Mechanism for Load Balanced Distributed Virtual Environments
Balazs Gerofi, Hajime Fujita, Yutaka Ishikawa
Yutaka Ishikawa Laboratory
The University of Tokyo
IEEE Cluster2010
Outline
• Motivation
• Cluster Server Architecture
• DVE Software Components
• Process Live Migration
• Multiple Socket Migration Optimizations
• Dynamic Load Balancing
• Evaluation
• Conclusion
Motivation
• Distributed Virtual Environments (DVE):
– Massively Multi-player Online Games (MMOG)
– Networked Virtual Environments (NVE)
– Distributed simulations such as the High-Level Architecture (HLA)
• 10,000 to 100,000 clients may be involved
• A cluster of servers is used to provide services at large scale
– Zoning (i.e., partitioning the virtual space among servers)
• Main limitations of application-level load balancing:
– Client migrations are heavy: server state needs to be transferred, clients reconnect, etc.
– A physical machine is limited to neighboring zones
• Is operating-system-level load balancing feasible?
– Server processes are highly interactive
– They maintain a massive number of network connections (clients)
– They maintain connections with other in-cluster components
– How can such processes be migrated?
Cluster Server Architecture
• Each DVE server node is equipped with a public and a private network interface; the same IP address is assigned to all public interfaces
• The router broadcasts incoming packets to all DVE server nodes
– Migrating zone server processes does not require any work on the router!
• Zone server processes are distinguished by separate port numbers (as opposed to separate IP addresses)
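The port-based addressing above can be illustrated with a small user-space model: a minimal Python sketch, assuming a router that broadcasts every packet to every node and one port per zone server. `Packet`, `ServerNode`, `broadcast`, and `migrate_zone_server` are hypothetical names; the actual system handles incoming traffic inside the kernel (the cap_trans_mod module), not like this.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Packet:
    dst_ip: str
    dst_port: int
    payload: bytes


@dataclass
class ServerNode:
    """Hypothetical model of a DVE server node sharing the public IP."""
    name: str
    public_ip: str
    zone_ports: dict[int, str] = field(default_factory=dict)  # port -> zone server

    def receive(self, pkt: Packet) -> str | None:
        # Every node sees every broadcast packet; only the node currently
        # hosting the destination port hands it to a zone server.
        if pkt.dst_ip != self.public_ip:
            return None
        return self.zone_ports.get(pkt.dst_port)


def broadcast(nodes: list[ServerNode], pkt: Packet) -> list[tuple[str, str]]:
    """Router behaviour: send pkt to all nodes, report which one accepted it."""
    accepted = []
    for node in nodes:
        zone_server = node.receive(pkt)
        if zone_server is not None:
            accepted.append((node.name, zone_server))
    return accepted


def migrate_zone_server(port: int, src: ServerNode, dst: ServerNode) -> None:
    """Moving a zone server only changes which node owns the port."""
    dst.zone_ports[port] = src.zone_ports.pop(port)


if __name__ == "__main__":
    ip = "203.0.113.10"                      # documentation-range address
    node1, node3 = ServerNode("node1", ip), ServerNode("node3", ip)
    node1.zone_ports[30001] = "zone_serv1"
    pkt = Packet(ip, 30001, b"state update")
    print(broadcast([node1, node3], pkt))    # [('node1', 'zone_serv1')]
    migrate_zone_server(30001, node1, node3)
    print(broadcast([node1, node3], pkt))    # [('node3', 'zone_serv1')]
```

The router configuration never changes in this model; only the port-to-node ownership moves, which mirrors the claim that migration needs no work on the router.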
Server Node Software Components
• mig_mod: migration module with live migration and socket support (extension of the Berkeley Lab Checkpoint/Restart (BLCR) module)
• cap_trans_mod: packet capturing and address translation kernel module (details in the paper)
• transd: translation daemon
• migd: migration daemon
• cond: load monitor and load balancer
• zone_serv: zone server processes
[Figure: software stack of a server node: Linux kernel with mig_mod and cap_trans_mod; zone_serv1 … zone_servn, migd, transd, cond]
Process Live Migration
[Figure: source host and destination host, each holding a process image and its network sockets, connected by the network; dirty memory pages are highlighted]
• Transfer the whole process image in the background without stopping execution
• Track dirty pages for a certain period while the process keeps executing
• Stop the process (freeze phase), transfer the dirty memory, export the network connections, and transfer their data to the destination
• Apply the changes and resume execution on the destination host
Note: the main goal is a short process freeze time!
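The pre-copy loop described in these steps can be sketched in Python as follows. This is a minimal model, assuming memory is a dict of page number to bytes and that a callback `run_round` lets the process run for one tracking period and returns its dirty log; `live_migrate`, `shrinking_workload`, `PAGE_LIMIT`, and `MAX_ROUNDS` are hypothetical names and values, not the mig_mod interface. Sockets are left out here.

```python
from __future__ import annotations

import random
from typing import Callable, Dict, Set, Tuple

# Hypothetical tuning knobs: freeze once fewer pages are dirty, or after N rounds.
PAGE_LIMIT = 8
MAX_ROUNDS = 10


def live_migrate(memory: Dict[int, bytes],
                 run_round: Callable[[Dict[int, bytes]], Set[int]],
                 page_limit: int = PAGE_LIMIT,
                 max_rounds: int = MAX_ROUNDS) -> Tuple[Dict[int, bytes], int]:
    """Pre-copy live migration sketch.

    run_round lets the process execute for one dirty-tracking period and
    returns the pages it wrote. Returns the destination image and the
    number of pages copied while the process was frozen.
    """
    dest = dict(memory)          # copy the whole image; process keeps running
    dirty = run_round(memory)    # pages written during that background copy
    rounds = 0
    while len(dirty) >= page_limit and rounds < max_rounds:
        for page in dirty:       # resend dirty pages while still executing
            dest[page] = memory[page]
        dirty = run_round(memory)
        rounds += 1
    # Freeze phase: the process is stopped, so only the last dirty set moves.
    for page in dirty:
        dest[page] = memory[page]
    return dest, len(dirty)


def shrinking_workload(seed: int = 0, start: int = 64):
    """Hypothetical workload whose write set halves every round."""
    rng = random.Random(seed)
    n = start

    def run_round(memory: Dict[int, bytes]) -> Set[int]:
        nonlocal n
        pages = set(rng.sample(sorted(memory), n))
        for page in pages:
            memory[page] = rng.getrandbits(32).to_bytes(4, "little")
        n = max(1, n // 2)
        return pages

    return run_round


if __name__ == "__main__":
    memory = {page: bytes(4) for page in range(256)}
    dest, frozen_pages = live_migrate(memory, shrinking_workload())
    print(dest == memory, frozen_pages)   # True 4
```

With a write set that halves each round (64, 32, 16, 8, 4 pages), only 4 pages remain to be copied while frozen; this is why the freeze time can be much shorter than the time needed to copy the whole image.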
Iterative socket migration (during process freeze phase)
Goal: prevent the loss of incoming packets. Each socket is migrated in turn:
• Extract the remote IP address and port number, set up a filter on the destination node to capture incoming packets, and disable the socket
• Migrate the socket data to the destination node
• Inject any packets captured on the destination node and attach the socket to the process
• Repeat for each remaining socket
Note: this requires several synchronization steps, with short writes following each other!
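The per-socket steps above can be modeled as follows: a minimal Python sketch, assuming a destination node with per-connection capture filters keyed by (remote IP, remote port). `Sock`, `Destination`, and `migrate_iteratively` are hypothetical, and counting one source/destination exchange per step is an assumption of this sketch, used only to show how the synchronization cost grows with the number of sockets.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Remote = Tuple[str, int]                  # (remote IP, remote port)


@dataclass
class Sock:
    remote: Remote
    state: bytes                          # serialized protocol state
    rx_queue: List[bytes] = field(default_factory=list)
    enabled: bool = True


@dataclass
class Destination:
    """Hypothetical destination node with per-connection capture filters."""
    filters: Dict[Remote, List[bytes]] = field(default_factory=dict)
    attached: Dict[Remote, Sock] = field(default_factory=dict)

    def deliver(self, remote: Remote, payload: bytes) -> None:
        # Broadcast traffic reaches every node; keep it only if a socket is
        # attached here or a capture filter matches the remote endpoint.
        if remote in self.attached:
            self.attached[remote].rx_queue.append(payload)
        elif remote in self.filters:
            self.filters[remote].append(payload)


def migrate_iteratively(sockets: List[Sock], dest: Destination,
                        in_flight: Callable[[Remote], Iterable[bytes]]) -> int:
    """Migrate sockets one at a time during the freeze phase.

    in_flight(remote) yields packets that arrive while that socket moves.
    Returns the number of source/destination exchanges (one per step).
    """
    exchanges = 0
    for sock in sockets:
        dest.filters[sock.remote] = []      # 1. capture filter on destination
        sock.enabled = False                #    and disable the source socket
        exchanges += 1
        for payload in in_flight(sock.remote):
            dest.deliver(sock.remote, payload)   # captured, not lost
        moved = Sock(sock.remote, sock.state, list(sock.rx_queue))
        exchanges += 1                      # 2. transfer socket data
        moved.rx_queue.extend(dest.filters.pop(sock.remote))
        dest.attached[sock.remote] = moved  # 3. inject packets, attach socket
        exchanges += 1
    return exchanges
```

Packets that arrive between disabling a socket and attaching it on the destination end up in the capture buffer rather than being dropped, but the exchange count grows linearly with the number of connections.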
Collective socket migration (during process freeze phase)
• Extract the remote IP address and port number of every socket, set up filters to capture incoming packets, and disable the sockets
• Extract the socket data into one unified buffer and transfer everything in one go
• Attach the sockets and inject the captured packets
Note: the amount of socket data transferred can still be large!
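The "one unified buffer" step can be sketched with Python's `struct` module. The wire format here is a hypothetical choice made for this sketch (IPv4 only: a socket count, then per socket the remote address, remote port, state length, and opaque state bytes); it is not the format mig_mod actually uses.

```python
from __future__ import annotations

import socket
import struct
from typing import List, Tuple

SockRecord = Tuple[str, int, bytes]        # (remote IPv4, remote port, state)
COUNT = struct.Struct("!I")                # number of sockets in the buffer
HEADER = struct.Struct("!4sHI")            # IPv4 address, port, state length


def pack_sockets(records: List[SockRecord]) -> bytes:
    """Serialize every socket into one unified buffer for a single transfer."""
    parts = [COUNT.pack(len(records))]
    for ip, port, state in records:
        parts.append(HEADER.pack(socket.inet_aton(ip), port, len(state)))
        parts.append(state)
    return b"".join(parts)


def unpack_sockets(buf: bytes) -> List[SockRecord]:
    """Inverse of pack_sockets, run on the destination node."""
    (count,) = COUNT.unpack_from(buf, 0)
    offset = COUNT.size
    records = []
    for _ in range(count):
        raw_ip, port, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        records.append((socket.inet_ntoa(raw_ip), port, buf[offset:offset + length]))
        offset += length
    return records


if __name__ == "__main__":
    records = [("198.51.100.7", 51000 + i, b"tcp-state-%d" % i) for i in range(1024)]
    buf = pack_sockets(records)
    assert unpack_sockets(buf) == records
    print(len(buf))   # one buffer, one transfer
```

Batching replaces the many short per-socket writes of the iterative scheme with a single transfer, but the buffer still grows with the total socket state, which is the limitation the note above points out.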
Incremental collective socket migration (during dirty-log phase)
• All socket data are transferred asynchronously, and a tracking structure is allocated for each connection
• Some pages are dirtied, and state changes are detected on some sockets
• Dirty pages are transferred, the state of modified sockets is updated, and the tracking loop timeout is decreased
• When the number of dirty pages or the tracking timeout drops below a predefined limit, enter the process freeze phase
• Transfer the remaining dirty pages and set up the packet capture filters
• Update the sockets that changed in the last iteration and disable the sockets on the source machine
Note: the socket data transferred in the freeze phase is much smaller than the overall socket representation!
• Inject the captured packets and re-enable the sockets on the destination machine
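The incremental scheme extends the pre-copy loop to sockets. The following Python sketch is a minimal model, assuming pages and socket states are dicts keyed by id and that a callback `run_round(timeout)` returns the dirty pages and changed sockets seen during one tracking period. `incremental_migrate` and the default limits (`page_limit`, `min_timeout`, `shrink`) are hypothetical values chosen for illustration.

```python
from __future__ import annotations

from typing import Callable, Dict, Set, Tuple

RoundResult = Tuple[Set[int], Set[int]]    # (dirty page ids, changed socket ids)


def incremental_migrate(pages: Dict[int, bytes], sockets: Dict[int, bytes],
                        run_round: Callable[[float], RoundResult],
                        page_limit: int = 4, timeout: float = 1.0,
                        min_timeout: float = 0.1, shrink: float = 0.5):
    """Incremental collective migration sketch.

    Returns (destination pages, destination sockets, number of socket
    updates sent during the freeze phase).
    """
    dest_pages = dict(pages)       # full image sent in the background
    dest_socks = dict(sockets)     # all socket data sent asynchronously
    dirty, changed = run_round(timeout)
    # Dirty-log phase: keep going until few pages are dirty or the
    # tracking timeout has shrunk below its limit.
    while len(dirty) >= page_limit and timeout >= min_timeout:
        for p in dirty:
            dest_pages[p] = pages[p]
        for s in changed:          # update only sockets whose state changed
            dest_socks[s] = sockets[s]
        timeout *= shrink          # tracking loop timeout is decreased
        dirty, changed = run_round(timeout)
    # Freeze phase: only the last round's dirty pages and changed sockets move.
    for p in dirty:
        dest_pages[p] = pages[p]
    for s in changed:
        dest_socks[s] = sockets[s]
    return dest_pages, dest_socks, len(changed)
```

With 1,000 sockets whose last tracking round touched only 5 of them, only those 5 socket states are sent while the process is frozen, which is the effect the note above describes.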
Dynamic Load Balancing
• Decentralized middleware
• Load balancing is sender-initiated and performs a handshake with the receiver
• Transfer policy:
– Threshold-driven (triggered when the load exceeds a certain value)
• Location policy:
– Based on knowledge of the load on the other nodes, preferring a node on the opposite side of the cluster load average
• Selection policy:
– Prefers a process whose CPU consumption is as close as possible to the difference between the given node's load and the cluster load average
• Information policy:
– Periodic: nodes broadcast their load
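The four policies can be combined into a single decision function on the sending node. This Python sketch assumes loads are CPU utilizations in the range 0 to 1 collected from the periodic broadcasts; `plan_migration` and `LOAD_THRESHOLD` are hypothetical, picking the least loaded below-average node is this sketch's own reading of the location policy, and the sender/receiver handshake is not modeled.

```python
from __future__ import annotations

from statistics import mean
from typing import Dict, Optional, Tuple

LOAD_THRESHOLD = 0.8   # hypothetical trigger value (CPU utilization, 0..1)


def plan_migration(loads: Dict[str, float], me: str, procs: Dict[str, float],
                   threshold: float = LOAD_THRESHOLD) -> Optional[Tuple[str, str]]:
    """Decide whether node `me` should push a zone server elsewhere.

    loads: latest load of every node, from the periodic broadcasts
           (information policy)
    procs: CPU share of each zone server process running on `me`
    Returns (process, target node), or None if nothing should move.
    """
    if loads[me] <= threshold:               # transfer policy: threshold
        return None
    average = mean(loads.values())
    excess = loads[me] - average
    if excess <= 0 or not procs:
        return None
    # Location policy: a node on the other side of the cluster average.
    # Choosing the least loaded such node is this sketch's assumption.
    underloaded = {n: l for n, l in loads.items() if l < average}
    if not underloaded:
        return None
    target = min(underloaded, key=underloaded.get)
    # Selection policy: the process whose CPU use best matches the excess.
    process = min(procs, key=lambda p: abs(procs[p] - excess))
    return process, target
```

For example, with loads of 0.95, 0.9, 0.3, 0.35, and 0.5 the cluster average is 0.6, so node1 has an excess of about 0.35 and would move a zone server using about 0.3 CPU to the least loaded node. A real sender would then run the handshake, and the receiver could still decline.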
Evaluation: experimental framework
• Dedicated single-IP-address cluster
• 5 DVE server nodes + a MySQL server
• 2.4 GHz Dual-Core AMD Opteron
• 2 GB RAM
• Gigabit Ethernet for both the in-cluster and the public network
Evaluation: OpenArena server
• OpenArena is an open-source multi-player online game based on the Quake III engine [1]
• Uses UDP for client-server communication
• ~20 messages (updates) per second
• Live migrated while 24 clients were participating in a session
• Based on tcpdump results on the client machines, migration caused ~25 ms of service downtime
[1] http://openarena.ws/smfnews.php
Evaluation: DVE simulation
• DVE simulation with communication characteristics resembling real-world MMOGs, using TCP connections
• Client state update: 20 msgs/sec, 256~512 byte message size [2]
• DVE server processes maintain MySQL connections to the local DB server
• CPU consumption grows proportionally with the number of clients in a given zone; 10,000 clients are involved
• The virtual space consists of 10x10 zones; each DVE server node is initially assigned 20 zones
• 15-minute simulation during which clients are instructed to move toward the upper-left and bottom-right corners of the virtual space
• Files are assumed to be available on each node
[2] Traffic characteristics of a massively multi-player online role playing game, NetGames'05
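A back-of-the-envelope check of the simulation parameters follows. It assumes that the 20 messages/sec rate is per client, counts only client-to-server payload bytes (no TCP/IP headers), and assumes traffic is spread evenly over the 5 server nodes; none of these assumptions is stated on the slide.

```latex
\begin{aligned}
\text{zones} &= 10 \times 10 = 100 = 5\ \text{nodes} \times 20\ \text{zones/node} \\
\text{update rate} &= 10{,}000\ \text{clients} \times 20\ \text{msg/s} = 2 \times 10^{5}\ \text{msg/s} \\
\text{payload rate} &= 2 \times 10^{5}\ \text{msg/s} \times (256 \ldots 512)\ \text{B}
  = 51.2 \ldots 102.4\ \text{MB/s} \approx 410 \ldots 819\ \text{Mbit/s} \\
\text{per node} &\approx \tfrac{1}{5}\,(410 \ldots 819)\ \text{Mbit/s} \approx 82 \ldots 164\ \text{Mbit/s}
\end{aligned}
```

Under these assumptions, the aggregate client traffic alone uses a large share of a single Gigabit link but stays within each node's public interface.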
Live migration process downtime
[Figure: process downtime (ms, 0 to 200) versus number of TCP connections (16 to 1024) for iterative, collective, and incremental collective socket migration]
Socket data transferred during process freeze phase
[Figure: socket data transferred (bytes, 0 to 4000) versus number of TCP connections (16 to 1024) for iterative/collective (one shared series) and incremental collective socket migration]
Load distribution during simulation without load balancing
• node1, node2 and node5 become overloaded when clients move to the zones maintained by these nodes
Load distribution during simulation with load balancing
• Load stays balanced throughout the simulation
Number of zone server processes on each node during the simulation
• Lighter processes are migrated to node3 and node4 to balance the overall load of the system
Conclusion
• Process live migration
– Optimizations for migrating a massive number of network connections
– No modifications to the TCP protocol or to the client-side network stack
• Dynamic load balancing engine exploiting process live migration
• DVE simulation demonstrating the load balancer and live migration
• Other possible scenarios:
– Fault tolerance (IEEE NCA2010)
– Power management
Thank you for your attention! Questions?
Related Work
• Connection migration:
– NEC's distributed Web server architecture: each session has its own virtual IP address
– SockMi, Tcpcp: TCP migration with IP-layer forwarding; they don't decouple the process from the source machine
– TCP Migrate option: extension to the TCP protocol
• Process migration and incremental checkpointing:
– V-System, Amoeba, Mach, Sprite, MOSIX: limited connection migration support
– BLCR: no support for connections or incremental checkpointing
– Zap's VNAT: requires support on the client side as well
• Load balancing DVEs:
– Several studies address application-level solutions
– MOSIX: the home-node approach leaves residual dependencies