KVM in OpenStack (files.meetup.com/10602292/kvm features.pdf)
Architecture Overview
[Diagram: nova-api receives REST API requests; nova-scheduler, nova-conductor, and nova-compute communicate over RPC calls and the DB. nova-compute drives libvirt through its libvirt driver; libvirt's qemu driver controls Qemu/KVM over the QMP monitor. Storage attaches through Cinder disk drivers (file/tap/block); networking goes through Neutron to the router/switch.]
Cgroup
● Weight
  – quota:cpu_shares
  – No hard limit
● Bandwidth Control
  – quota:cpu_period
  – quota:cpu_quota
  – Can't run for more than 'quota' µs in each period (flavor sketch below)
[Diagram: a cpu.shares hierarchy. Root (100%) splits into Gold (3072 shares, 60%) and Silver (2048 shares, 40%); Gold holds A and B (1024 shares each, 30% of the machine each), Silver holds C and D (1024 shares each, 20% each).]
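These quotas are set as flavor extra specs; a minimal sketch, with the flavor name and values illustrative:
  # relative weight: twice the default 1024 shares
  nova flavor-key m1.gold set quota:cpu_shares=2048
  # hard cap: at most 50000 µs of CPU per 100000 µs period per vCPU
  nova flavor-key m1.gold set quota:cpu_period=100000 quota:cpu_quota=50000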
CPU topology
[Diagram: two sockets, each forming a NUMA node with its own local memory. Socket 0 (NUMA node 0) has Cores 0-3 with two hardware threads each (cpu0-cpu7); every core has private L1 I/D and L2 caches, and all cores share an L3. Socket 1 (NUMA node 1) mirrors this with cpu8-cpu15. Accesses to a node's own memory are local; accesses to the other node's memory are remote.]
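This layout can be inspected directly on the host; a quick sketch (output shape depends on the machine):
  # sockets, cores, threads per core, cache sizes
  lscpu
  # NUMA nodes, the CPUs in each, and local memory sizes
  numactl --hardware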
vCPU topology
● Benefit
  – Remove licensing restrictions
  – Improve performance by working with vCPU pinning
● Implemented in Juno
  * hw:cpu_sockets=NN - preferred number of sockets to expose to the guest
  * hw:cpu_cores=NN - preferred number of cores to expose to the guest
  * hw:cpu_threads=NN - preferred number of threads to expose to the guest
  * hw:cpu_max_sockets=NN - maximum number of sockets to expose to the guest
  * hw:cpu_max_cores=NN - maximum number of cores to expose to the guest
  * hw:cpu_max_threads=NN - maximum number of threads to expose to the guest
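These are ordinary flavor extra specs; a minimal sketch, with the flavor name illustrative:
  # prefer a 2-socket x 2-core topology for a 4-vCPU guest
  nova flavor-key m1.xlarge set hw:cpu_sockets=2 hw:cpu_cores=2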
vNUMA
● Benefit
  – Increase the effective utilization of compute resources
● Implemented in Juno
  – virt-driver-numa-placement.rst
  * hw:numa_nodes=NN - number of NUMA nodes to expose to the guest
  * hw:numa_mempolicy=preferred|strict - memory allocation policy
  * hw:numa_cpus.0=<cpu-list> - mapping of vCPUs N-M to NUMA node 0
  * hw:numa_cpus.1=<cpu-list> - mapping of vCPUs N-M to NUMA node 1
  * hw:numa_mem.0=<ram-size> - mapping <ram-size> MB of RAM to NUMA node 0
  * hw:numa_mem.1=<ram-size> - mapping <ram-size> MB of RAM to NUMA node 1
● Qemu and libvirt dependencies
  -object memory-backend-ram,size=1024M,policy=bind,host-nodes=0,id=ram-node0 \
  -numa node,nodeid=0,cpus=0,memdev=ram-node0
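A sketch of a two-node guest topology via flavor extra specs (flavor name, CPU lists, and sizes illustrative; memory per node in MB):
  nova flavor-key m1.numa set hw:numa_nodes=2 \
      hw:numa_cpus.0=0,1 hw:numa_cpus.1=2,3 \
      hw:numa_mem.0=2048 hw:numa_mem.1=2048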
Other Features
● vCPU Pinning
  – Approved in Kilo: virt-driver-cpu-pinning.rst
  – Dedicated CPU
  – Forbid overcommit of CPU
● vCPU hotplug
  – 'live-resize' proposed, but not approved yet
  – virsh command
    ● virsh setvcpus <domain> <count> --live
  – Auto online new vCPUs in the guest (sketch below)
    ● udev rule
    ● Guest agent
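A minimal end-to-end sketch, assuming a running domain named demo (the name is illustrative):
  # host: grow the domain to 4 vCPUs while it runs
  virsh setvcpus demo 4 --live
  # guest: online the new CPU by hand if no udev rule or agent does it
  echo 1 > /sys/devices/system/cpu/cpu3/online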
Physical memory virtualization
● Guest physical memory is mapped into Qemu's virtual address space
● The mapping is maintained in memory slots
● Qemu uses malloc or mmap to allocate guest memory
● Reuses kernel memory features
– Overcommit
– Hugepage
– KSM
Memory Hugepage
● Approved in Kilo: virt-driver-large-pages.rst
● Benefit
  – Increase TLB hit ratio
  – Less page table footprint
● Why not THP?
  – No hard guarantees
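A sketch of both layers (flavor name illustrative; hw:mem_page_size is the extra spec from virt-driver-large-pages.rst):
  # Nova: request 2 MB pages for the flavor
  nova flavor-key m1.hugepages set hw:mem_page_size=2048
  # Qemu side: back guest RAM with hugetlbfs
  -object memory-backend-file,id=ram-node0,size=1024M,mem-path=/dev/hugepages,share=on \
  -numa node,nodeid=0,memdev=ram-node0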
Memory Balloon (1)
● Balloon device is added by default
● Missing an "Overcommit Manager"
● Memory Overcommit
[Diagram: three guests, each with a Qemu balloon; inflating a balloon reclaims memory from its guest, deflating returns it, letting the host overcommit memory.]
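The balloon is driven from the host; a minimal sketch with virsh (domain name illustrative, size in KiB):
  # shrink the guest's usable memory to 1 GiB through the balloon
  virsh setmem demo 1048576 --live
  # read the current balloon size and guest stats
  virsh dommemstat demo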
Memory Balloon (2)
● Guest Memory Stats Query
  – More detailed and accurate
  – Retrieved by polling instead of asynchronously
  – Not real time
  – Nova support available in Juno
    ● CONF.libvirt.mem_stats_period_seconds (nova.conf sketch below)
  – Ceilometer support available in Kilo
[Diagram: a balloon thread inside Qemu polls the guest for memory stats; clients fetch the last update synchronously.]
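A sketch of enabling the polling in nova.conf (the interval value is illustrative):
  [libvirt]
  # poll the balloon driver for guest memory stats every 10 s; 0 disables it
  mem_stats_period_seconds = 10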
Memory Hotplug
● Added in qemu 2.1
● Libvirt support is under development
● Qemu commands
  (qemu) object_add memory-backend-ram,id=ram1,size=1G
  (qemu) device_add pc-dimm,id=d1,memdev=ram1
● Auto online via udev
  SUBSYSTEM=="cpu", ACTION=="add", TEST=="online", ATTR{online}=="0", ATTR{online}="1"
  SUBSYSTEM=="memory", ACTION=="add", TEST=="state", ATTR{state}=="offline", ATTR{state}="online"
Storage Architecture
● Frontend
  – IDE, SCSI, Virtio
● Image format
  – Raw, Qcow2, QED, VMDK
● Backend
  – File, host, ceph, glusterfs, sheepdog, iscsi
Cache Mode
  Cache Mode     Host Page Cache   Guest Disk Cache            Semantics
  none           No                Yes                         direct
  directsync     No                No                          direct + flush
  writeback      Yes               Yes                         writeback
  writethrough   Yes               No                          writeback + flush
  unsafe         Yes               Yes, but flush is ignored   writeback "- flush"
  direct = O_DIRECT; flush = fdatasync or fsync
● Configuration
  – disk_cachemodes="file=directsync,block=none" (nova.conf sketch below)
● Is writeback safe?
  – Data lost on power failure
  – Data corruption
    ● Guest FS barrier
    ● Live migration
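The configuration goes in nova.conf per backend type; a sketch matching the values above:
  [libvirt]
  # O_DIRECT + flush for file-backed disks, plain O_DIRECT for block devices
  disk_cachemodes = file=directsync,block=none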
I/O throttling
● Why not cgroup?
● Exposed by Cinder qos specs
● Currently missing online update support
● Newer qemu re-implements throttling based on a leaky bucket
  – Supports burst
● Missing cluster-level I/O throttling
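A minimal sketch of a front-end qos spec with the legacy cinder CLI (the name and limits are illustrative):
  cinder qos-create disk-limits consumer=front-end read_iops_sec=2000 write_iops_sec=1000
  cinder qos-associate <qos-id> <volume-type-id>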
Discard
● Return freed blocks to the storage
● Two underlying specifications
  – ATA TRIM command
  – SCSI UNMAP command
● Nova configuration
  – hw_disk_discard=unmap
  – Image metadata hw_scsi_model=virtio-scsi
● Issued from the guest
  – fstrim
  – mount option '-o discard'
● Supported in file, qcow2, rbd, glusterfs, sheepdog, iscsi
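A sketch of the two ends:
  # nova.conf on the compute node
  [libvirt]
  hw_disk_discard = unmap
  # inside the guest: trim freed blocks once, verbosely
  fstrim -v /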
Virtio SCSI
● vHBA
● Improve scalability
● Enable advanced SCSI features
● Recognized as 'sda', not 'vda'
● vhost-scsi
  – Better performance
  – No format driver support
  – Disallow live migration
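Selecting virtio-scsi is done through image properties; a sketch using the legacy glance CLI (image ID placeholder left unfilled):
  glance image-update --property hw_scsi_model=virtio-scsi \
      --property hw_disk_bus=scsi <image-id>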
Other features
● Snapshot
  – quiesced-image-snapshots-with-qemu-guest-agent.rst
● drive-mirror
  – Storage live migration
● Multi-queue virtio-disk
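Under the hood, storage live migration rests on the drive-mirror block job; a QMP sketch (device name and target path illustrative):
  { "execute": "drive-mirror",
    "arguments": { "device": "drive-virtio-disk0",
                   "target": "/mnt/new-storage/disk.img",
                   "sync": "full" } }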
Network
● Vhost-net
  – Fewer context switches
  – Zero-copy transmit
● Vhost-net + macvtap + SR-IOV
  – Live migration
● Multi-queue virtio NIC
  – Scales performance as the vCPU count grows (sketch below)
● Vhost-user
  – Approved in Kilo
  – Userspace equivalent of vhost-net
  – Used with a userspace switch
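A sketch of a 4-queue virtio NIC on the qemu command line, plus enabling the queues in the guest (interface name illustrative; vectors = 2*queues + 2):
  -netdev tap,id=net0,vhost=on,queues=4 \
  -device virtio-net-pci,netdev=net0,mq=on,vectors=10
  # in the guest:
  ethtool -L eth0 combined 4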
Reference
● http://www.slideshare.net/meituan/kvmopt-osforce-27669119
● http://www.linux-kvm.org/wiki/images/7/7b/Kvm-forum-2013-openstack.pdf
● http://www.linux-kvm.org/wiki/images/f/f6/01x07a-Vhost.pdf
● http://www.virtualizemydc.ca/2014/01/26/understanding-vnuma-virtual-non-uniform-memory-access/
● http://www.searchtb.com/2012/12/%E7%8E%A9%E8%BD%ACcpu-topology.html
● http://www.virtualopensystems.com/en/solutions/guides/snabbswitch-qemu/
● http://log.amitshah.net/wp-content/uploads/2014/11/virt-6-7-centos-dojo.pdf