toro, a dedicated kernel for microservices · continuous integration: 1 sec to re-deploy a...

49
Toro, a Dedicated Kernel for Microservices Matias E. Vara Larsen Silicon-Gears Cesar Bernardini Barracuda Networks

Upload: others

Post on 12-Jun-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Toro, a Dedicated Kernel for MicroservicesMatias E. Vara Larsen

Silicon-GearsCesar Bernardini

Barracuda Networks

Page 2: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Microservice #0

Logging

Microservice #1

Order

Microservice #2

Catalog

What is a microservice?

Logging

Order

Catalog

1. Monolithic

2. Microservices

Page 3: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Bare-metal host

Hypervisor

VM Context #0 VM Context #1

Microservice #0

Service Instance per Virtual Machine (VM)

Microservice #1

Page 4: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Bare-metal host

Hypervisor

VM Context #0 VM Context #1

General Purpose Guest OS

Microservice #1

Scheduler FileSystem

DriversNetworking

General Purpose Guest OS

Scheduler FileSystem

DriversNetworking

Each VM requiresits own OS

Microservice #0

Page 5: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Bare-metal host

Hypervisor

General Purpose Guest OS

Microservice #1

Scheduler FileSystem

DriversNetworking

General Purpose Guest OS

Scheduler FileSystem

DriversNetworking

VM Context #0 VM Context #1

Microservice #0

The use of VMs to host microservices allows

to isolate different services

Page 6: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Bare-metal host

Hypervisor

VM Context #0 VM Context #1

Microservice #0

General Purpose Guest OS

Microservice #1

Scheduler FileSystem

DriversNetworking

General Purpose Guest OS

Scheduler FileSystem

DriversNetworking

VMs take long timeto get up and run

The creation andstorage of VMsare not simple

Too much complexityfor a single purpose

usage

Limited numberof VM instances

Page 7: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Bare-metal host

Hypervisor

VM Context #0 VM Context #1

Microservice #0

General Purpose Guest OS

Microservice #1

Scheduler FileSystem

DriversNetworking

General Purpose Guest OS

Scheduler FileSystem

DriversNetworking

VMs take long timeto get up and run

The creation andstorage of VMsare not simple

Too much complexityfor a single purpose

usage

Limited numberof VM instances

Toro is simple kernel that allows microservices to run efficiently in VMs thus levering the strong isolation VMs provide.

Page 8: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Ingredients for ToroKernel● User application within the kernel● Cooperative Scheduler● Dedicated Resources● Single Thread Event Loop Networking

Page 9: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

User application within the Kernel● In general purpose OS, the user application executes as a process in the

less privileged mode, e.g., ring3 in x86.

● The communication from the user application to the kernel relies on syscalls

● Context switch needs to switch from kernel mode to user mode, e.g., interruption and scheduling

Page 10: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

User application within the Kernel● In general purpose OS, the user application executes as a process in the

less privileged mode, e.g., ring3 in x86.

● The communication from the user application to the kernel relies on syscalls

● Context switch needs to switch from kernel mode to user mode, e.g., interruption and scheduling

This is too expensive!Can we do it better

for a single purpose kernel?

Page 11: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

User application within the KernelProposal

1. Run kernel and user application in the most privileged level.

2. Rely on the hypervisor to isolate the context of each VM

3. Use a flat memory model that is shared by the kernel and the user application

4. Use only threads

5. Provide a simple kernel API dedicated to microservices

Page 12: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Toro Kernel

Devices

Microservice

Uses

Filesystem Memory

Networking

Toro.elf

Only the neededcomponents are

included

Page 13: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Toro Kernel

Devices

Microservice

Uses

Image

Filesystem Memory

Networking

Builder

The generated image is Immutable, i.e., the generated image can be used across different hypervisors without the need of recompile it.

Toro.elf

Page 14: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Toro Kernel

Devices

Microservice

Uses

Image

Filesystem Memory

Networking

Builder

CloudIt

Uses

Launches

VM

User appKernel Free mem

Single address memory image

Toro.elf

Page 15: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Ingredients for ToroKernel● User application within the kernel● Cooperative Scheduler● Dedicated Resources● Single Thread Event Loop Networking

Page 16: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Cooperative Scheduler● In a General Purpose OS, the scheduler is in charge to distribute the CPU

time for each process

● When a task has consumed its time, the scheduler switches to the next ready task

● The scheduler relies on a timer

Page 17: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Cooperative Scheduler● In a General Purpose OS, the scheduler is in charge to distribute the CPU

time for each process

● When a task has consumed its time, the scheduler switches to the next ready task

● The scheduler relies on a timer

Can we do it betterfor a single purpose kernel?

Page 18: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Cooperative SchedulerProposal

1. Cooperative scheduler, i.e., each thread decides when to yield the CPU

2. Simple scheduler, i.e., the scheduler chooses the first thread in ready state

2. One scheduler per core

3. Remote creation of threads by relying on a lock-free algorithm

Page 19: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Order

DataBase Microservice

Threads

Page 20: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Order

DataBase

Core 1 Core 2

Thread 1 Thread 2

dedicates

Microservice

BeginThread(DataBase, Thread1, Core1)

BeginThread(Microservice, Thread2, Core2)

Threads

Page 21: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Order

DataBase

Core 1 Core 2

Thread 1 Thread 2

Microservice

One coreone task!

Better cache performance!

Page 22: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Ingredients for ToroKernel● User application within the kernel● Cooperative Scheduler● Dedicated Resources● Single Thread Event Loop Networking

Page 23: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Dedicated Resources● In a multicore system, the problematic resource

is the shared memory. The use of shared memory causes:– Overhead in the memory bus– Overhead in the cache to keep it coherent– Overhead to guaranty mutual exclusion, e.g., use of

spin-locks

Page 24: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Dedicated ResourcesProposal

● Toro improves memory access by keeping the resources locals:– The memory is dedicated per core– The kernel data structures are dedicated per core– The access to kernel data structures is lock free

Page 25: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Core 1 Core 2

Memory Region 1 Memory Region 2

TORO Memory allocator

Memory space in Toro

Dedicated Resources

ToroGetMem()ToroGetMem()

Thread 1 Thread 2

Access to this regioncan be improved

by usingHypertransport

Or Intel Quick Path

Page 26: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Core 1 Core 2

Dedicated Resources

Memory 1 Memory 2

Disk1 Network1

DedicateBlockDriver(Disk1, Core1)DedicateNetworkDriver(Network1, Core2)

dedicates

Page 27: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Core 1 Core 2

Dedicated Resources

Memory 1 Memory 2

Disk1 Network1

DedicateBlockDriver(Disk1, Core1)DedicateNetworkDriver(Network1, Core2)

dedicates

e.g.,ATA Disks

e.g,.E1000, VirtIO

Page 28: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Core 1 Core 2

Dedicated Resources

Memory 1 Memory 2

Disk 1 Network Card

Ext3 Network Stack

Page 29: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Dedicated Resources

Disk 1

Memory 1 Memory 2

Network Card

Core 1 Core 2

Ext3 Network Stack

Thread 1 Thread 2

Data Base Microservice

Messages, shared memory or any mechanism tocommunicate between threads

Page 30: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Dedicated Resources

Disk 1

Memory 1 Memory 2

Core 1 Core 2

Ext3

Thread 1

Data Base

Messages, shared memory or any mechanism tocommunicate between threads

Network Card

Network Stack

Thread 2

Microservice

In Toro, microservices are first-class objects

Page 31: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Ingredients for ToroKernel● User application within the kernel● Cooperative Scheduler● Dedicated Resources● Single Thread Event Loop Networking

Page 32: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Single Thread Event Loop● Toro networking is based on the single thread event

loop model [1], i.e., one thread per microservice● The kernel provides a dedicated API to create

microservices● The kernel implements the microservice and

improves the CPU usage[2]

[1] Node JS Architecture – Single Threaded Event Loop[2] Reducing CPU usage of a Toro Appliance, FOSDEM’18

Page 33: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Microservice

Accept() Receive() Close()

Single Thread Event Loop

The user defineshow to react

to events

Page 34: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Kernel

Microservice

Accept() Receive() Close() SysServiceCreate()

Single Thread Event Loop

The userregisters a

set of callbacks

Page 35: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Kernel

Service Thread “A”

Socket #0 Socket #1 Socket #N...Socket Scheduler

Microservice

Accept() Receive() Close() SysServiceCreate()

The thread schedules each socket

Single Thread Event LoopEach microservice is

implemented by using one thread.

Page 36: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Kernel

Service Thread “A”

Socket #0 Socket #1 Socket #N...Socket Scheduler

Microservice

Accept() Receive() Close() SysServiceCreate()

Single Thread Event LoopThe Event Loop haltsthe core if none connection

arrives

Page 37: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Kernel

Service Thread “A”

Socket #0 Socket #1 Socket #N...Socket Scheduler

Microservice

Accept() Receive() Close() SysServiceCreate()

Single Thread Event Loop

“It’s all talk until the code runs.” - Ward Cunningham

Page 38: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

HelloWorld Microservice Example● We implement a simple microservice that responds “Hello World”● We implement it by using three approaches: Docker, Ubuntu guest

(KVM) and Toro guest (KVM)● We compare these approaches in term of:

– Deploying Time – Bootstrap Time – Image Size – CPU usage – Time per Request

Microservice footprint

Page 39: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

HelloWorld Microservice Examplesetup

Docker General OS/KVM Toro/KVM

● Cpu limit● Memory limit

● NGINX● UWSGI (4 processes)● Flask

● Cpu limit● Memory limit

● NGINX● UWSGI (4 processes)● Flask

● CPU limit● Memory limit

● Toro WWW server

Page 40: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Deploying Time● Time required to build an image within the

microservice

Page 41: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Bootstrap Time● Time to boot and to answer the first request

Page 42: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Image Size● Size of the image that contains the

microservice and its dependencies.

Page 43: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

CPU Usage

Page 44: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

End-User Delay● Benchmarking with ab and measuring the Time

per Request (mean) [ms]

Number of Concurrent Request

Approach CPUs 200 500 1000

Docker 4 139.980 ms 333.937 ms 801.422

Ubuntu/KVM 4 94.507 ms 238.149 ms 560.513 ms

TORO/KVM 1 120.065 ms 301.736 ms 596.792 ms

Page 45: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Take away lessons● Minimal image size (< 4MB)

– NGINX docker image is 15-times the size of a Toro image

● Continuous Integration: 1 sec to re-deploy a microservice– Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

● Time per Requests– Comparable level with cutting edge technology (NGINX)

● CPU Usage– Comparable with Docker– Toro is 100% isolated from the host OS, Docker is not.

Page 46: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Summary● Toro is a kernel dedicated to run microservices● Toro provides a dedicated API to specify microservices ● Toro design is improved in four main points:

– Booting time and building time– communication to kernel– memory access – networking

Page 47: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Future Work● Ease tooling to develop, test and debug microservices● Investigate new use-cases● Investigate the porting of applications● Investigate new ideas to improve the network stack for

microservices, e.g., improve socket scheduling for http, resource allocation algorithm

Page 48: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

QA● http://www.torokernel.io● [email protected]● Twitter @torokernel● Torokernel wiki at github

– My first Three examples with Toro

● Test Toro in 5 minutes (or less...)– torokernel-docker-qemu-webservices at

Github

Page 49: Toro, a Dedicated Kernel for Microservices · Continuous Integration: 1 sec to re-deploy a microservice – Deploy an OS w/similar configuration takes 300 sec, with docker ~50 sec

Thanks!