Alternatives to layer-based image distribution: using a CERN filesystem for distributing container images

George Lestaris (@glestaris)


Post on 11-Feb-2017


TRANSCRIPT

Alternatives to layer-based image distribution

Using a CERN filesystem for distributing container images

George Lestaris (@glestaris)

About me

• Software engineer at Pivotal

• Working on Cloud Foundry GrootFS

• github.com/cloudfoundry/grootfs

• ex-CERNois

Container image

docker run elasticsearch:2.3.5

Container image format

• Container images are formalized in: Docker, AppC (ACI) and OCI Image spec

• Generally, an image is the combination of:

• a set of layers

• metadata

Building an image

FROM python:3.5
ADD . /myapp
RUN pip install \
    -r /myapp/requirements.txt
ENTRYPOINT python /myapp/manage.py \
    runserver 0.0.0.0:8000

Container images are composed of layers

A layer is a set of files and directories

FROM python:3.5

Layers help us to inherit images

FROM python:3.5
ADD . /myapp
RUN pip install …

Container image distribution

• Different image formats use different distribution mechanisms

• Docker: download layers over HTTP connections from a registry

• Helps reuse the layers of base images

• Fetches container images efficiently by parallelizing the downloads
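The two properties above (reusing cached base layers and downloading layers in parallel) can be sketched in a few lines. This is a minimal illustration, not Docker's implementation: the in-memory `REGISTRY` dict, the digests and the `pull` helper are all hypothetical stand-ins for real HTTP requests to a registry.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory "registry": layer digest -> layer bytes.
REGISTRY = {
    "sha256:base": b"python:3.5 base layer",
    "sha256:app": b"ADD . /myapp",
    "sha256:deps": b"RUN pip install ...",
}


def fetch_layer(digest):
    """Stand-in for an HTTP GET of a single layer blob."""
    return digest, REGISTRY[digest]


def pull(manifest, cache, max_workers=4):
    """Download only the layers of `manifest` that are not cached yet.

    Missing layers are fetched in parallel. Returns the digests that
    were actually downloaded.
    """
    missing = [d for d in manifest if d not in cache]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for digest, blob in pool.map(fetch_layer, missing):
            cache[digest] = blob
    return missing


cache = {}
first = pull(["sha256:base", "sha256:app"], cache)
second = pull(["sha256:base", "sha256:app", "sha256:deps"], cache)
print(first)   # both layers are downloaded on the first pull
print(second)  # only the new layer is downloaded on the second pull
```

The second pull touches the network only for the layer that is not already in the local cache, which is the reuse that layer-based distribution provides.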

[Diagram: one registry serving four clients]

New image

From cached base

Update dependencies

Distributing software in HEP

Data


Frequent releases

Simulation engine

Analysis framework

Experiment geometry

Experiment software

Dependencies


WLCG (Worldwide LHC Computing Grid)

170 computing centres in 42 countries

CernVM-FS

• Network file system

• No packages or layers -> files and directories

• FUSE

• Lazily downloads only the files that are used

• Deduplication: downloaded files are cached in content-addressable storage

using a network filesystem

[Diagram: reading a file through CernVM-FS]

User application -> VFS -> FUSE kernel module -> CernVM-FS (FUSE) -> CernVM-FS service

open /dir/file

GET catalog

Catalog: /dir/file -> sha256:…

Cache: stat sha256:…

GET /blob/sha256:…
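The access path in the diagram (look the path up in a catalog, check the local cache by content hash, download the blob only on a miss) can be sketched as follows. This is a simplified model under stated assumptions, not CernVM-FS code: `LazyFS`, `digest_of` and the in-memory `blobs` store are hypothetical names, and the network fetch is replaced by a dictionary lookup.

```python
import hashlib


def digest_of(data):
    """Content address of a blob, in the sha256:… style used on the slide."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


class LazyFS:
    """Toy model of a lazily fetched, content-addressed file system."""

    def __init__(self, catalog, fetch_blob):
        self.catalog = catalog        # path -> content hash
        self.fetch_blob = fetch_blob  # content hash -> bytes (the "network")
        self.cache = {}               # local content-addressable cache
        self.downloads = 0

    def read(self, path):
        digest = self.catalog[path]          # catalog lookup
        if digest not in self.cache:         # cache check by hash
            self.cache[digest] = self.fetch_blob(digest)  # fetch on miss
            self.downloads += 1
        return self.cache[digest]


# Hypothetical "server" content: two paths share identical bytes.
blobs, catalog = {}, {}
for path, data in [("/bin/a", b"same"), ("/bin/b", b"same"), ("/lib/c", b"other")]:
    blobs[digest_of(data)] = data
    catalog[path] = digest_of(data)

fs = LazyFS(catalog, blobs.__getitem__)
fs.read("/bin/a")
fs.read("/bin/b")
print(fs.downloads)  # one download: identical content is fetched once
```

`/lib/c` is never read, so it is never downloaded, and the two identical files cost a single transfer. Those are the lazy-download and deduplication properties listed for CernVM-FS.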

Similarities between HEP software and container images

• Most images are based on a Linux distribution

Applications use a small fragment of the image

• redis 3.2.3

• Image size: 190 MB (compressed: 74 MB)

• Used to boot: 11 MB (5.7 %)

• node 6.5.0: 5.4 %

• nginx 1.11: 3.1 %

Small changes between versions

• nginx 1.10 to 1.11:

• Real changes: 4.02 MB

• Layer changes: 58 MB (two of the three layers)

• 14.4 times the size of the diff

• nginx 1.9 to 1.10: 4.8 times the size of the diff
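The gap between "layer changes" and "real changes" can be illustrated with a toy model. The sketch below assumes a hypothetical representation in which each layer has an `id` and a map of path to `(content hash, size)`. The sizes are made-up units, not the nginx measurements above.

```python
def layer_cost(old, new):
    """Size fetched when every changed layer is downloaded in full."""
    old_ids = {layer["id"] for layer in old}
    return sum(
        size
        for layer in new
        if layer["id"] not in old_ids
        for _, size in layer["files"].values()
    )


def file_cost(old, new):
    """Size fetched when only file contents not seen before are downloaded."""
    have = {h for layer in old for h, _ in layer["files"].values()}
    need = {}
    for layer in new:
        for h, size in layer["files"].values():
            if h not in have:
                need[h] = size  # keyed by hash, so duplicates count once
    return sum(need.values())


# Hypothetical images: one small file changes in each layer.
old_image = [
    {"id": "base-v1", "files": {"/lib/libc": ("h1", 50), "/bin/sh": ("h2", 8)}},
    {"id": "app-v1", "files": {"/usr/sbin/nginx": ("h3", 4)}},
]
new_image = [
    {"id": "base-v2", "files": {"/lib/libc": ("h1", 50), "/bin/sh": ("h2b", 8)}},
    {"id": "app-v2", "files": {"/usr/sbin/nginx": ("h3b", 4)}},
]

print(layer_cost(old_image, new_image))  # 62: both layers are re-downloaded
print(file_cost(old_image, new_image))   # 12: only the two changed files
```

Because a layer is identified as a whole, a small change inside it invalidates the entire layer, while a file-level scheme transfers only the changed contents.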

Demo: CernVM-FS and runC

runC

• Small tool to create containers

• Low-level interface, not supposed to be a container runtime

• Used internally by container runtimes (Docker, Garden)

Performance comparison

Experiment setup

• http://github.com/glestaris/container-camp

• Used iCE (see PyCon UK 2015)

• 20 AWS VMs in eu-west (m4.large)

• 1 CernVM-FS server on an AWS VM (m4.large) in eu-central

• Docker Hub

Scenario

• All VMs create a redis:3.2.3 container in parallel

• Comparing runC, Docker and Docker with a warm cache

• Run the server and ping it (wait for the server to come up)

redis-server --daemonize yes
while ! redis-cli ping; do
  echo 'retrying'
done
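The same "start, then poll until it answers" pattern can also be timed, which is what a startup comparison needs. The sketch below is an assumption-laden stand-in and not the benchmark code from the talk: `wait_until_up` is a hypothetical helper, and it only checks that a TCP port accepts connections rather than sending a Redis PING.

```python
import socket
import time


def wait_until_up(host, port, interval=0.1, timeout=30.0):
    """Poll a TCP port until it accepts a connection.

    Returns the number of seconds waited; raises TimeoutError if the
    port is still closed after `timeout` seconds.
    """
    start = time.monotonic()
    while True:
        try:
            # A successful connect is treated as "the server is up".
            with socket.create_connection((host, port), timeout=interval):
                return time.monotonic() - start
        except OSError:
            if time.monotonic() - start > timeout:
                raise TimeoutError(f"{host}:{port} did not come up")
            time.sleep(interval)  # equivalent of the 'retrying' loop body
```

For example, calling `wait_until_up("127.0.0.1", 6379)` right after starting the container would give the time until the default Redis port is reachable.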

Other approaches

• IPFS: InterPlanetary File System

• Deduplication: content-addressed storage for objects

• History: versioned objects

• Decentralized: P2P transfers

• Objects are files, directories or changes (commits)

Use cases

• CI server

• Large clusters that fetch images in parallel

• Network contention

• Maintaining a private registry

• Serverless (?)