enhanced live migration for intensive memory loads in the cloud

22
© 2015 SAMSUNG Electronics Co. Mario Smarduch Senior Virtualization Architect Open Source Group Samsung Research America (Silicon Valley) [email protected]Smarduch Senior Virtualization Architect Open Source Group Samsung Research America (Silicon Valley) [email protected] Enhanced Live Migration For Intensive Memory Loads

Upload: nguyendan

Post on 12-Feb-2017

217 views

Category:

Documents


1 download

TRANSCRIPT

© 2015 SAMSUNG Electronics Co.

Mario SmarduchSenior Virtualization ArchitectOpen Source GroupSamsung Research America (Silicon Valley)[email protected]

Senior Virtualization ArchitectOpen Source GroupSamsung Research America (Silicon Valley)[email protected]

Enhanced Live Migration ForIntensive Memory Loads

2 © 2015 SAMSUNG Electronics Co.

Agenda

• Cover Enhanced Live migration on ARMv8,7• General Walkthrough live migration• Enhancement to handle higher dirty page rate• Validating Destination

3 © 2015 SAMSUNG Electronics Co.

Machine Model & Migration

Guest

QEMU/KVM vCPU – ARMv7/v8 vCPUGPRs/SPRs vCPU – ARMv7/v8GPRs/SPRsLocal Interrupt Ctrlr. Local Interrupt Ctrlr.IO-Bridge RAMFlash UART USB Keyboard/Mouse RTC WDT Timer/Clocksource virtio-net LCDControllerExternalInterruptController ……….

Local Interrupt BusBSP/HAL

Drivers Interrupts Timers Clocks

Kernel

TLB Caches Exceptions Scheduler Synchronization Timers Sockets TCP/IP VFS Memory Management

User Space

• Machine Model & State to Migrate� Extremely complex software

4 © 2015 SAMSUNG Electronics Co.

Type of Devices to Migrate• Type of devices to migrate

� MMIO

� Virtio

Guest OSQEMU

Kernel

pcnet

driver

NIC QEMU

Backend

/dev/net/tun

tap0

192.168.2.1

192.168.2.100

ssh

clientssh

server

KVM

MMIO TCP/IP

Guest OS

QEMU

Kernel

Virti-net driver

QEMU

Backend

/dev/net/tun

tap0192.168.2.1

192.168.2.100

ssh

client

ssh

server Tx-VirtQ

Rx-VirtQ

Desc

Avail

Used Desc

Avail

Used

Desc

Avail

Used

Desc

Avail

Used

• NIC – conifuguration• F.E.

� – cmd regs, fifo_used, len� - irq level

• Vring num, last idx available• Promisc, guest offloads• Status, macs, GPA of ring buffers

5 © 2015 SAMSUNG Electronics Co.

• CPU State� GPRs – x0-x30, vfp/simd� SPRs – SCTLR, Memory Attr., CPU Affinity, MMU PGD pointer, …

• Interrupt Controller� Num of IRQs – IO Interrupt contoller� Levels, CPU targets� Local Interrupt Ctrl. Reg, Prio. Mask Reg,

• Timers� Virtual counter offset� Current counter� Pending timer events

Type of Devices to Migrate

6 © 2015 SAMSUNG Electronics Co.

Who writes to memory?

• Guest – via shadow page tables• QEMU

� Virtio devices• Host KVM – some features

� Async PF – injects page faults – writes to guesto Guest mis – overlap

� PV-EOI –o Writes to Guest – don’t do EOI

• Mark all accesses� From all source

7 © 2015 SAMSUNG Electronics Co.

Migration State

QEMU/KVM vCPU vCPUvCPUIO-Bridge RAMFlash ……….Local Interrupt BusBSP/HALKernelUser Space QEMU/KVM vCPU vCPUvCPUIO-Bridge RAMFlash ……….Local Interrupt Bus

BSP/HALKernelUser SpaceSource VM Dest VM

• Dest QEMU � Boots Guest to identical default state – see ‘-incoming’ & QEMU_OPTION_incoming� In Virgin state� Must migrate – source state changes to dest (will see how next)

• All State needs migration – abstracted differently� MMIO device defines VMStateDescrption� Live state GIC, Virtio – stream state to destination � Memory – complex

o Several stages – setup, iteration, pending, complete

8 © 2015 SAMSUNG Electronics Co.

Migration State Declaration• How do we abstract device migration state?

� Simple Timer Example - staticstatic const VMStateDescription vmstate_sp804 {

.name = …; version_id = ..;

.fields = ….

. VMSTATE_INT32_ARRAY(_f=level, _s=SP804State, _n=2) � put_int32()

� Virtio-net example – dynamic tx & rx dev migration statevirtio_net_save(QEMUFile *f, …)qemu_put_8s(f, &vdev->status);qemu_put_8s(f, &vdev->status);….

� RAM Example – special caseo Handlers.save_live_setup = …..save_live_iterate = …..save_live_complete

9 © 2015 SAMSUNG Electronics Co.

RAM Structures• RAMBlock

� All mmap() regions

0 0x200000000x20000000 0x100000x20010000 0x40000000x24010000 0x40000000x28010000 0x20000000x2a010000 0x800000

r/w ramOffset Size

r/w ram

r/w ramr/w ram

• MemoryRegion� All regions – mmio, ram, flahsGPA ram_addr size

0x80000000 0x0 0x200000000x8000000 0x20010000 0x40000000xC000000 0x24010000 0x40000000x14000000 0x28010000 0x20000000x18000000 0x2a010000 0x8000000x2e000000 0x20000000 0x10000

• Create migration_bitmap – using RAMBlock last offset• MemoryRegion – ram_addr to index

• Also used for GVA � GPA and GPA � GVA• For migration – iterate RAMBlock, use MR – ram_addr to index dirty bit map

10 © 2015 SAMSUNG Electronics Co.

Live Migration High Level• Devices to Migrate

� Flow - (1) WP (2) fault (3) mark dirty log (4) scan (5) WP again� ram, cpu, int ctlr., audio, sd-card, timers, …..� Start with memory first – after completed do devices� High Level Migration –

o Setup Stage• Connect to peer, exchange versions• Determine size pages to migrate, dirty bitmap• Enable KVM dirty page logging

o Iteration Stage• Establish bandwidth, downtime – user configurable• Iterate – walk dirty bitmap – copy to peer• Cover RAMBlock –

� if remaining RAM < bandwidth * downtime – move to completeo Completion Stage

• Copy remaining ram• Go through remaining vmstate_handlers – sync state

o Peer ready to run� Migration challenge

o Dirty page rate – primarily Guest memory access

11 © 2015 SAMSUNG Electronics Co.

Live Migration More DetailSetup:- Connect to Peer, Validate compativility- Walk RAMBlock List find last page

- Create ‘migration_bitmap’ – 696,384 bits for 1k page- Assume all memory dirty ‘migration_dirty_pages’ – 696,384 - Enable KVM Dirty Page Logging – ARM 4k/64k

migration_bitmap_sync()- for each writable memory region – get dirty bitmap (kvm_state)- update to ‘migration_bitmap’ – convert page sizes- increment ‘migration_dirty_page’ – but not in init

Iterate:- xfer rate < bandwidth- migrate_bitmap_sync()- if remaining dirty memory < bandwidth * downtime- move to completion

- Iterate over RAMBlock list - migration_bitmap_find_and reset_dirty(block->mr, offset)- migration_dirty_bytes--;

- send page to peer- throttle

Completion:- ram_save_complete() – copy remaing ram- Iterate savevm_handlers – use ‘ops’ – stream, ‘vmsd’ – static - transfer remaining state

12 © 2015 SAMSUNG Electronics Co.

Live Migration Configurables• xfer_rate < bandwidth

� Bandwidth – default 3.2 Mbytes/s can – update QEMU� Governs rate of dirty memory scanning

• Downtime – bandwidth * downtime� Governs – convergence � greater downtime – greater service unavailability� Can update in QEMU (for better or worse)

• Expected Downtime� Based on dirty memory rate / bandwidth� As dirty memory rate increases so does downtime

• To speed up migration• Increase network bandwidth• Increste downtime• Optimize Memory migration - dirty page tracking, compress

13 © 2015 SAMSUNG Electronics Co.

Huge Pages and performance• Huge Pages

� Guest Page faults very expensive – 6.5x penalty

� Limit 2nd stage faults� Shorten page table walks� A must in virtualized env. � Depending on IPTW cache – 7% - 32% degradation

• Also depending on workload

Root Pointer PGD PUD PMD PTE Tbl Offset

45 5 55

Guest Virtual Address

14 © 2015 SAMSUNG Electronics Co.

Relationship Between Page Tables

Host/QEMU

Huge Pages

2nd stage

Page tables

1st stage

Page Tables

• Guest, Nested, Host page tables involved� THP, hugetlbfs, …� KVM maps in huge 2nd stage entries� ARMv8 4kb pg – 2MB; 64kb page – 512MB� One huge page TLB – covers 512 TLB entries

15 © 2015 SAMSUNG Electronics Co.

Migration & Huge Pages

Host/QEMU

Huge Pages

2nd stage

Page tables

1st stage

Page Tables

• Huge page granularity to large

• Write to 4k/64k page – assume huge page dirty� We WP – only know about 1st write� Near idle guest – dirtying pages – not migrateble� But we want the performance of huge pages

16 © 2015 SAMSUNG Electronics Co.

Run-time dissolve huge pages• Dissolve 2nd stage huge pages during migration

• 2nd stage handler – lazy dissolve� For most part ‘arch/arm/kvm/mmu.c – 4.x kernel

o arm64/arm – code shared� Discover fault in huge page backed by Host� Shoot down mapping� Future page faults force small pages� Track 4k page granules

Host/QEMU

Huge Pages

2nd stage

Page tables

1st stage

Page Tables

Huge Page2MB2MB ���� 512 4KB pages

17 © 2015 SAMSUNG Electronics Co.

Run-time dissolve huge pages• Once migrated – go back to huge pages

� Get the performance back

Host/QEMU

Huge Pages

2nd stage

Page tables

1st stage

Page Tables

Huge Page 2MB2MB huge page• Huge Page dissolve – degrades performance …

� But we want to migrate and quick� We don’t dissolve Read-Only pages

18 © 2015 SAMSUNG Electronics Co.

Performance Numbers• 256MB memory mark huge page dirty

256 MB

.

.

.

2MB HugePage• With huge page won’t auto converge – 1GigE

� 128 * 2MB = 2Gbps� xfer_rate < downtime * bandwidth� Can dirty one byte or whole huge page

• With huge page dissolve � ~20K - 4k pages/second� System loaded lightly – dirty memory periodically� See – https://github.com/mjsmar/arm-dirtylog-tests� ARMv8 – QEMU support added, ARMv7 for quite a while

19 © 2015 SAMSUNG Electronics Co.

Accuracy Testing• Should trust image on destination

� Several sources write to guest memory� Validate destination identical to source

.

.

....

Walk ALL RAMBlocks

….• Destination = initial iteration + delats• Delta – Guest, QEMU, Host – several sources• After convergence

� Checksum source & destination – must match� Generic approach – associate checksum with each page

o For repeated pages update checksum� Discover latent memory errors

20 © 2015 SAMSUNG Electronics Co.

Other • Libvirt/Openstack

� Enabled by default – nothing to be done• NFV in mind – lots of memory data bases• … But any memory intensive workload• Configurations supported

� ARMv8 - 4kb, 64kb Guest kernel � 4kb, 64kb hosto Huge page 2MB, 512MB

� ARMv7 – 4kb Guest � 4kb hosto Huge page 2MB

© 2015 SAMSUNG Electronics Co.

Q & A

Enhanced Live Migration ForIntensive Memory Loads

Thank you.Mario Smarduch

Senior Virtualization Architect

Open Source Group

Samsung Research America (Silicon Valley)

[email protected]