The new VVorld

Robert Watson and Bjoern Zeeb (and thanks to Marko Zec)

The FreeBSD Project

UKUUG Spring Conference 2010

OS research and development performed by FreeBSD Project, University of Zagreb, FreeBSD Foundation, NLNet, and other contributors over a decade

Still a work-in-progress, but exciting technology coming soon

Introduction

• About virtualization

• FreeBSD Jails

• Virtualizing a kernel

• A virtualized network stack

• A few application ideas

What is virtualization?

• Illusion of multiple virtual X on one real X

• Virtual memory address spaces

• VLANs, VPNs, and overlay networks

• Storage volume management

• Virtual machines, OS instances

• ... you can solve any problem with another level of indirection ...

“... a level of indirection...”

Why virtualize?

• Sharing with the illusion of exclusive use

• Consolidation, managed overcommit

• Flexibility in implementation

• Security and robustness

• Administrative delegation

Virtualization spectrum

• Tradeoffs: scheduler integration, efficient sharing, overcommit opportunities, functionality, security/isolation, resource management, administrative delegation, ...

[Figure: the virtualization spectrum, in the order shown on the slide]

• OS access control: UNIX users, SELinux, ...

• OS virtualization: FreeBSD Jail, Solaris Zones, ...

• Hypervisors: VMware, Xen, ...

• Physical separation: racks and racks and racks of machines

Example: with OS virtualization you get full scheduler integration, but migration is very hard. With hypervisors you get really bad scheduling, but migration is relatively easy.

OS virtualization

• Single OS kernel instance, many userspaces

• Safe root delegation, various constraints

• Efficient resource sharing with overcommit

• No hypervisor/virtual device overhead

• ISP virtual hosting, server consolidation, ...

History of FreeBSD Jail

• 1999: Jails merged to FreeBSD 4.x

• 2002: Virtualized network stack prototype

• 2006: NLNet/FreeBSD Foundation fund multi-year VIMAGE development project

• 2007: Jail-friendly ZFS merged from OpenSolaris

• 2008: Multi-IPv4/v6/no-IP patches; VNET integration starts

• 2009: Hierarchical jail support; FreeBSD 8.0 shipped with highly experimental “options VIMAGE”

Earliest open source OS virtualization we’re aware of.

Virtualization work has a long timeline.

Why change Jail?

• Jail is fast, efficient, secure, useful

• But, Jail “subsets” rather than “virtualizes”

• For example: employs chroot() internally

• Some resources subset poorly

• E.g., System V IPC, loopback interface, ...

• Virtualization is a functional improvement

Virtualizing OS services

• New abstraction: the virtual instance

• Replicate global objects per-instance

• Multiplex or replicate threads, timers

• Tag subjects with virtual instances

• Consider administrative interfaces

• Examine privileges carefully

• Plan inter-instance plumbing

• How to start and stop instances?

Virtual kernel infrastructure

• Sounds complicated, but some new tools

• Virtualized global variables

• Virtualized startup/shutdown

• Virtualized sysctl MIB entries

• Virtualization-enhanced debugging

• Multiplex virtualization onto netisrs

Virtualized heap

[Figure: two memory layouts side by side. “Regular kernel”: Kernel, Heap, Stack1. “Virtualized heap”: Kernel, Heap1, Heap2, Stack1, Stack2, Stack3.]

Goal: make it easy for us to take one of something and make many.

Notice that the same layout is used for virtual instances as for the original.

Virtual global variables

• Tag selected globals as virtual in source

• Placed in different ELF section when linked

• Each VNET instance gets a copy of section

• Thread context carries VNET reference

• Globals mapped to VNET when accessed

• Can compile to regular globals if desired

Can compile out during development but still in tree. In fact, default today.

Also valuable for embedded.
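
The bullets above can be imitated in user space. This is a minimal sketch, not the FreeBSD implementation: an ordinary struct stands in for the special ELF section, a plain pointer stands in for the VNET reference carried in thread context, and the names `struct vnet_data`, `VNET_SIM`, `vnet_create`, `count_packet`, `count_in`, and the two example globals are assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the special ELF section: virtualized globals live here. */
struct vnet_data {
	int	pkt_count;	/* hypothetical virtualized global */
	int	ttl_default;	/* hypothetical virtualized global */
};

/* Template with the initial values, like the section in the kernel image. */
static const struct vnet_data vnet_template = { 0, 64 };

/* One virtual instance: it owns a private copy of the template. */
struct vnet {
	struct vnet_data data;
};

/* Stand-in for the VNET reference carried in thread context. */
static struct vnet *curvnet;

/* Map a "global" to the current instance's copy when it is accessed. */
#define	VNET_SIM(name)	(curvnet->data.name)
#define	V_pkt_count	VNET_SIM(pkt_count)
#define	V_ttl_default	VNET_SIM(ttl_default)

/* Create an instance by copying the template; returns NULL on failure. */
static struct vnet *
vnet_create(void)
{
	struct vnet *vnet = malloc(sizeof(*vnet));

	if (vnet != NULL)
		memcpy(&vnet->data, &vnet_template, sizeof(vnet->data));
	return (vnet);
}

/* Code written against V_pkt_count needs no knowledge of instances. */
static void
count_packet(void)
{
	assert(curvnet != NULL);
	V_pkt_count++;
}

/* Call count_packet() n times in the context of the given instance. */
static int
count_in(struct vnet *vnet, int n)
{
	struct vnet *saved = curvnet;
	int total;

	curvnet = vnet;
	while (n-- > 0)
		count_packet();
	total = V_pkt_count;
	curvnet = saved;
	return (total);
}
```

With two instances created by `vnet_create()`, counting three packets in one and one packet in the other leaves each copy of `pkt_count` independent, while both start from the template's `ttl_default`.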

VNET_DEFINE(struct inpcbhead, ripcb);
VNET_DEFINE(struct inpcbinfo, ripcbinfo);

#define V_ripcb         VNET(ripcb)
#define V_ripcbinfo     VNET(ripcbinfo)

...

void
rip_init(void)
{

        INP_INFO_LOCK_INIT(&V_ripcbinfo, "rip");
        LIST_INIT(&V_ripcb);

Goal: make virtualized programming natural

Virtualized boot

[Figure: timeline labelled SYSINIT, VNET 0 SYSINIT, and VNET 1 SYSINIT, with the events kernel boot, create jail with VNET, and load kernel module with virtualized components]

Portions of previously serialized kernel and module startup are now per-VNET

Tag bits of boot process that now need to be per-VNET.

Module case is interesting, and tricky.

Virtual kernel startup

• Kernel, module startup uses SYSINIT()

• Functions tagged with special ELF section

• Sorted and executed “in order”

• Used for 99% of FreeBSD kernel init

• Some events now need to be virtualized

• Add a new event set, run once per VNET
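
The ordering scheme in the bullets above can be sketched in user space. This is an illustration under stated assumptions, not kernel code: plain arrays stand in for the ELF section, and `struct init_entry`, `run_set`, `init_clock`, `init_routes`, `init_igmp`, and the run log are hypothetical names. The global set runs once; the per-VNET set runs once for each instance.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define	MAX_LOG	16

/* Records which init function ran, and for which instance, in order. */
static int	run_log[MAX_LOG];
static int	run_count;

static void
log_run(int code)
{
	assert(run_count < MAX_LOG);
	run_log[run_count++] = code;
}

/* One startup entry; the kernel gathers these from an ELF section. */
struct init_entry {
	int	subsystem;		/* coarse ordering key */
	int	order;			/* fine ordering key */
	void	(*func)(int vnet_id);	/* function to run */
};

/* Hypothetical init functions; each logs a code identifying itself. */
static void init_clock(int v)  { (void)v; log_run(1); }
static void init_routes(int v) { log_run(100 * v + 2); }
static void init_igmp(int v)   { log_run(100 * v + 3); }

/* Run once at boot. */
static struct init_entry global_inits[] = {
	{ 1, 0, init_clock },
};

/* The new event set: run once per VNET. Deliberately listed out of order. */
static struct init_entry vnet_inits[] = {
	{ 5, 2, init_igmp },
	{ 5, 1, init_routes },
};

/* Compare by subsystem first, then by order within the subsystem. */
static int
init_cmp(const void *a, const void *b)
{
	const struct init_entry *x = a, *y = b;

	if (x->subsystem != y->subsystem)
		return (x->subsystem < y->subsystem ? -1 : 1);
	if (x->order != y->order)
		return (x->order < y->order ? -1 : 1);
	return (0);
}

/* Sort a set, then execute it in order for the given instance. */
static void
run_set(struct init_entry *tab, size_t n, int vnet_id)
{
	size_t i;

	qsort(tab, n, sizeof(*tab), init_cmp);
	for (i = 0; i < n; i++)
		tab[i].func(vnet_id);
}
```

Booting corresponds to `run_set(global_inits, 1, 0)` followed by `run_set(vnet_inits, 2, 0)`; creating a further instance repeats only the second call with a new instance number.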

static void
vnet_igmp_init(const void *unused __unused)
{

        CTR1(KTR_IGMPV3, "%s: initializing", __func__);
        LIST_INIT(&V_igi_head);
}
VNET_SYSINIT(vnet_igmp_init, SI_SUB_PSEUDO, SI_ORDER_ANY, vnet_igmp_init,
    NULL);

static void
vnet_igmp_uninit(const void *unused __unused)
{

        CTR1(KTR_IGMPV3, "%s: tearing down", __func__);
        KASSERT(LIST_EMPTY(&V_igi_head),
            ("igi list not empty; detached?"));
}
VNET_SYSUNINIT(vnet_igmp_uninit, SI_SUB_PSEUDO, SI_ORDER_ANY,
    vnet_igmp_uninit, NULL);

Again: goal to make it natural.

Five-character change to each to say “do it virtualized”.

What to virtualize?

• Start with a virtual network stack

• Immediate demand due to Jail limitations

• Zec 2002 prototype

• Validate performance of approach

• Can parallelize over many net modules

• In the future: VIPC, ...

Virtual network stack

• Jails can have their own network stacks

• TCP/IP socket bindings, routing table, firewall, IPsec, ...

• Real/virtual interfaces belong to one stack, but may be assigned to child stacks

• Packets float between stacks as needed

• Arbitrary virtual network topologies OK

Avoid constructs that require additional copying, context switching, etc.
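
The ownership rule for interfaces can be modelled in a few lines. This is a toy model, not the kernel's data structures: `struct stack`, `struct iface`, `is_child`, and `iface_assign` are hypothetical names, and the check that the target is a direct child of the current owner is an assumption drawn from the bullet wording. Assignment changes only an owner pointer, in keeping with the note about avoiding extra copying.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a network stack instance with a parent link. */
struct stack {
	struct stack	*parent;	/* NULL for the base stack */
};

/* Hypothetical model of an interface: owned by exactly one stack. */
struct iface {
	const char	*name;
	struct stack	*owner;
};

/* Return 1 if child is a direct child of parent, else 0. */
static int
is_child(const struct stack *parent, const struct stack *child)
{
	return (child != NULL && child->parent == parent);
}

/*
 * Hand an interface to a child of its current owner. Only the owner
 * pointer changes; nothing is copied. Returns 0 on success, or -1 if
 * the target is not a child of the current owner.
 */
static int
iface_assign(struct iface *ifp, struct stack *target)
{
	assert(ifp != NULL);
	if (!is_child(ifp->owner, target))
		return (-1);
	ifp->owner = target;
	return (0);
}
```

In this model, an interface owned by the base stack can be handed to either of two sibling jails, but once handed to the first it cannot be moved sideways to the second, because the second is not a child of the first.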

[Figure: Jail 0 (VNET 0) and Jail 1 (VNET 1), each with its own TCP/IP stack and applications; interfaces igb0, bridge0, epair0a, epair0b, igb1]

Applications pinned to virtual stacks.

Simple case: assign an ifnet to a jail.

Complex case: virtual interfaces, bridging, firewalls, ...

Really hard problem: shutting down cleanly

• We’ve been doing that for years, right?

• Actually, no -- we’ve been booting for years.

• But we’ve never shut down the network.

• We just power off and it goes away. :-)

• Now we need destructors.
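
A minimal sketch of what "needing destructors" means in practice, assuming a toy stack made of two components. The names `struct stack`, `struct component`, `stack_start`, `stack_stop`, and the route and PCB helpers are hypothetical. Constructors run in order, destructors run in reverse, and a live-object counter exposes anything left behind at shutdown.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical per-instance state for a toy network stack. */
struct stack {
	int	*route_table;
	int	*pcb_list;
	int	 live;		/* objects currently allocated */
};

static int
routes_init(struct stack *s)
{
	s->route_table = calloc(16, sizeof(int));
	if (s->route_table == NULL)
		return (-1);
	s->live++;
	return (0);
}

static void
routes_uninit(struct stack *s)
{
	assert(s->live > 0);
	free(s->route_table);
	s->route_table = NULL;
	s->live--;
}

static int
pcb_init(struct stack *s)
{
	s->pcb_list = calloc(8, sizeof(int));
	if (s->pcb_list == NULL)
		return (-1);
	s->live++;
	return (0);
}

static void
pcb_uninit(struct stack *s)
{
	assert(s->live > 0);
	free(s->pcb_list);
	s->pcb_list = NULL;
	s->live--;
}

/* Each component pairs a constructor with its destructor. */
struct component {
	int	(*init)(struct stack *);
	void	(*uninit)(struct stack *);
};

static const struct component components[] = {
	{ routes_init, routes_uninit },
	{ pcb_init, pcb_uninit },
};
#define	NCOMP	(sizeof(components) / sizeof(components[0]))

/* Run constructors in order; on failure, unwind those that already ran. */
static int
stack_start(struct stack *s)
{
	size_t i;

	s->route_table = NULL;
	s->pcb_list = NULL;
	s->live = 0;
	for (i = 0; i < NCOMP; i++) {
		if (components[i].init(s) != 0) {
			while (i-- > 0)
				components[i].uninit(s);
			return (-1);
		}
	}
	return (0);
}

/* Run destructors in reverse order; return the number of leaked objects. */
static int
stack_stop(struct stack *s)
{
	size_t i = NCOMP;

	while (i-- > 0)
		components[i].uninit(s);
	return (s->live);
}
```

A non-zero return from `stack_stop()` would be the toy equivalent of a leak on stack shutdown.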

Status

• FreeBSD 8.0 VIMAGE “highly experimental”

• Known memory leaks on stack shutdown

• Several known crash conditions

• Many subsystems not fully virtualized

• Foundation will shortly announce new funding for productionization work

• Goal: production-quality VIMAGE in 9.1/9.2

How to give it a spin

• Update to 8-STABLE or 9-CURRENT

• Compile kernel with “options VIMAGE”

• Simple example:

jail -ci vnet path=/jail command=/bin/csh
ifconfig vlan100 vnet <id>

• http://wiki.freebsd.org/Image

• WARNING: EXPERIMENTAL

A few applications

• Network routing research

• Parallel overlay networks

• Large-scale hosting

Routing simulation

[Figure: a single host containing many jails, each with its own VNET]

Trivially simulate thousands of nodes with arbitrary topologies and fully functional, independent network stacks

Virtualized overlay infrastructure

[Figure: five hosts (Host 1 to Host 5), each with VNET 0, VNET 1, and VNET 2]

Multiple network stacks allow router/bridge/VAP nodes to implement complex policies using minimal hardware

Large-scale hosting

[Figure: a host with VNET 0 and interface igb0. Jail 1 to Jail 4 (and more) each have their own VNET (VNET 1 to VNET 4), VLAN (VLAN 101 to VLAN 104), TCP/IP, FW, and IPSEC.]

Jails each have their own fully delegated connection tables, routing tables, firewalls, IPsec, ...

Nested jails, with and without VNETs

Some other ideas

• Efficient server consolidation

• 500K memory overhead vs. 256M+ VM

• Virtualized appliances

• Multi-instance appliances, such as file stores, firewalls, filters, ...

• Neat: Debian/kFreeBSD on VNETs

2MB with ZFS or nullfs providing efficient storage, before applications

Conclusion

• Virtual kernel features, such as a virtual network stack, finally becoming a reality

• Prototype operates with increasing stability and little performance overhead

• Adds to virtualization menu; can be combined with other techniques like Xen

• Coming soon(ish)...

Cleverly, we are able to take advantage of many virtualization-centric hardware optimizations.

For example: MAC address filtering with RSS.