
Page 1: Toward 10,000 Containers on OpenStack

Toward 10,000 Containers on OpenStack

Ricardo Rocha, Spyros Trigazis (CERN)
Ton Ngo, Winnie Tsang (IBM)

Page 2: Toward 10,000 Containers on OpenStack

Talk outline
1. Introduction
2. Benchmarks
3. CERN Cloud result
4. CNCF Cloud result
5. Conclusion

• Acknowledgements:
  • CERN cloud team
  • CNCF Lab
  • IBM team: Douglas Davis, Simeon Monov
  • Rackspace team: Adrian Otto, Chris Hultin, Drago Rosson
  • Many thanks to the Magnum team for all the progress

Page 3: Toward 10,000 Containers on OpenStack

About OpenStack Magnum

• Mission: management service for container infrastructure
• Creates / configures nodes (VM / baremetal), networking, storage
• Deep integration with OpenStack services
• Lifecycle operations on clusters
• Native container API

• Current support:
  • Kubernetes
  • Swarm
  • Mesos
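As a concrete illustration, a Kubernetes cluster template and a cluster could be created with the magnum CLI roughly as follows; the image, keypair, flavor and network names are assumptions for this sketch (they mirror the Rally task files shown later in the deck):

$ magnum cluster-template-create --name k8s-template \
      --image fedora-atomic-latest --keypair mykey \
      --external-network public --dns-nameserver 8.8.8.8 \
      --flavor m1.small --docker-volume-size 5 \
      --network-driver flannel --coe kubernetes
$ magnum cluster-create --name myk8scluster \
      --cluster-template k8s-template --node-count 4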

Page 4: Toward 10,000 Containers on OpenStack

Newton and Upcoming Release

• Newton features:
  • Cluster and driver refactoring
  • Documentation: user guide, installation guide
  • Baremetal: Kubernetes clusters
  • Storage: Cinder volumes, Docker storage
  • Networking: decoupled LBaaS, floating IPs, Flannel overlay network
  • Distro: openSUSE
  • Internal: asynchronous operations, certificate DB storage, notifications, rollback

• Upcoming release:
  • Heterogeneous clusters
  • Cluster upgrades
  • Advanced container networking
  • Additional drivers: DC/OS, further baremetal support

Page 5: Toward 10,000 Containers on OpenStack

Benchmarks

Page 6: Toward 10,000 Containers on OpenStack

Rally: an OpenStack benchmark test tool
• Easily extended by plugins
• Test results in HTML reports
• Used by many projects
• Context: sets up the environment
• Scenario: runs the benchmark
• Recommended for a production service, to verify that the service behaves as expected at all times

[Diagram: Rally drives a Kubernetes cluster, creating pods and containers, and produces an HTML report]

Page 7: Toward 10,000 Containers on OpenStack
Page 8: Toward 10,000 Containers on OpenStack

Rally Plugin for Magnum

Scenarios for clusters:
• Create and list clusters (supports k8s, swarm and mesos)
• Create and list cluster templates

Scenarios for containers:
• Create and list pods (k8s)
• Create and list replication controllers (k8s)
• Create and list containers (swarm)
• Create and list apps (mesos)

Page 9: Toward 10,000 Containers on OpenStack

Sample Rally input task files

---
MagnumClusters.create_and_list_clusters:
  -
    args:
      node_count: 4
    runner:
      type: "constant"
      times: 10
      concurrency: 2
    context:
      users:
        tenants: 1
        users_per_tenant: 1
      cluster_templates:
        image_id: "fedora-atomic-latest"
        external_network_id: "public"
        dns_nameserver: "8.8.8.8"
        flavor_id: "m1.small"
        docker_volume_size: 5
        network_driver: "flannel"
        coe: "kubernetes"

---
K8sPods.create_and_list_pods:
  -
    args:
      manifest: "artifacts/nginx.yaml.k8s"
    runner:
      type: "constant"
      times: 20
      concurrency: 2
    context:
      users:
        tenants: 1
        users_per_tenant: 1
      cluster_templates:
        image_id: "fedora-atomic-latest"
        external_network_id: "public"
        dns_nameserver: "8.8.8.8"
        flavor_id: "m1.small"
        docker_volume_size: 5
        network_driver: "flannel"
        coe: "kubernetes"
      clusters:
        node_count: 2
      ca_certs:
        directory: "/home/stack"
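Task files like these are run through Rally's standard workflow. A minimal sketch, assuming credentials for the target cloud are already sourced in the environment (the deployment and file names are placeholders):

$ rally deployment create --fromenv --name existing
$ rally task start magnum-clusters.yaml
$ rally task report --out report.html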

Page 10: Toward 10,000 Containers on OpenStack


Google/Kubernetes benchmark: steady-state performance in a large Kubernetes cluster
• Create a Kubernetes cluster with 800 vcpus (e.g. 200 nodes x 4 cpus)

• Requires a DNS service: SkyDNS for k8s <= 1.2, embedded in newer releases

• Launch nginx pods serving millions of HTTP requests per second

• The load bots and the service pods can be scaled as needed (see the kubectl sketch below)

• Google has published the configuration and result data, so we can compare with their results

[Diagram: a load driver drives nginx pods in a Kubernetes cluster at millions of requests/sec]
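Scaling both sides of the benchmark amounts to resizing two replication controllers. A minimal kubectl sketch, assuming the load generators and the web servers run as rcs named "loadbots" and "nginx" (the names are illustrative, not taken from the published configuration):

$ kubectl scale rc loadbots --replicas=100   # load-generating pods
$ kubectl scale rc nginx --replicas=300      # nginx service pods
$ kubectl get rc                             # verify the new replica counts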

Page 11: Toward 10,000 Containers on OpenStack

CERN Cloud result

Page 12: Toward 10,000 Containers on OpenStack

CERN OpenStack Infrastructure: in production since 2013

• ~190,000 cores
• ~4 million VMs created
• ~200 VMs created / hour

Page 13: Toward 10,000 Containers on OpenStack

CERN Container Use Cases
• Batch processing
• End user analysis / Jupyter Notebooks
• Machine Learning / TensorFlow / Keras
• Infrastructure services
  • Data movement, web servers, PaaS, ...
• Continuous integration / deployment
• And many others...

Page 14: Toward 10,000 Containers on OpenStack

CERN Magnum Deployment
• Integrate containers into the CERN cloud
  • Shared identity, networking integration, storage access, ...
• Agnostic to container orchestration engines
  • Docker Swarm, Kubernetes, Mesos
• Fast, easy to use

[Timeline: container investigations and Magnum tests, pilot service deployed (11/2015 - 02/2016), production service with Mesos support (10/2016); CERN/HEP service integration (networking, CVMFS, EOS) and upstream development throughout]

Page 15: Toward 10,000 Containers on OpenStack

CERN Magnum Deployment
• Clusters are described by cluster templates
• Shared/public templates for most common setups, customizable by users

$ magnum cluster-template-list
+------+---------------+
| uuid | name          |
+------+---------------+
| .... | swarm         |
| .... | swarm-ha      |
| .... | kubernetes    |
| .... | kubernetes-ha |
| .... | mesos         |
| .... | mesos-ha      |
+------+---------------+

Page 16: Toward 10,000 Containers on OpenStack

CERN Magnum Deployment
• Clusters are described by cluster templates
• Shared/public templates for most common setups, customizable by users

$ magnum cluster-create --name myswarmcluster --cluster-template swarm --node-count 100

$ magnum cluster-list
+------+----------------+------------+--------------+-----------------+
| uuid | name           | node_count | master_count | status          |
+------+----------------+------------+--------------+-----------------+
| .... | myswarmcluster | 100        | 1            | CREATE_COMPLETE |
+------+----------------+------------+--------------+-----------------+

$ $(magnum cluster-config myswarmcluster --dir magnum/myswarmcluster)
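The $( ) wrapper around cluster-config matters: the command writes the cluster's TLS bundle to --dir and prints shell export statements, which the wrapper then evaluates. For a Swarm cluster the exports look roughly like this (the master address below is a placeholder):

export DOCKER_HOST=tcp://<master-ip>:2376
export DOCKER_CERT_PATH=magnum/myswarmcluster
export DOCKER_TLS_VERIFY=True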

$ docker info / ps / ...
$ docker run --volume-driver cvmfs -v atlas.cern.ch:/cvmfs/atlas -it centos /bin/bash
[root@32f4cf39128d /]#

Page 17: Toward 10,000 Containers on OpenStack

CERN Benchmark Setup
• Setup in one dedicated cell: 240 hypervisors
  • Each with 32 cores, 64 GB RAM, 10Gb links
• Container images stored in Cinder volumes, in our Ceph cluster
  • The default today in Magnum
• Deployed / configured using Puppet (as is all of our production setup)
• Magnum / Heat setup:
  • Dedicated controller(s), in VMs
  • Dedicated RabbitMQ, clustered, in VMs
• Dropped explicit Neutron resource creation
  • Floating IPs, ports, private networks, LBaaS

Page 18: Toward 10,000 Containers on OpenStack

CERN Results
• Several iterations before arriving at a reliable setup
• First run: 2 million requests / s
  • Bay of 200 nodes (400 cores, 800 GB RAM)

[Timeline: first tests with ~100/200 node bays, then large tests with up to 1000 node bays]

Page 19: Toward 10,000 Containers on OpenStack

CERN Results
• Services coped with the request increase
  • 4x in Nova, 8x in Cinder, unchanged in Keystone
• Almost business as usual... though:
  • Keystone stores a revocation tree (in memcache)
  • Populated on every project/user/trustee creation
  • And checked on every token validation
  • -> Network traffic concentrates on one cache node (shard)
  • -> >12 s average request time vs the usual average of 3 ms


Page 20: Toward 10,000 Containers on OpenStack

CERN Results
• Second run: Rally and 7 million requests / sec
• Lots of iterations! Examples:

  • Scale the Magnum conductor
  • Deploy Barbican

Page 21: Toward 10,000 Containers on OpenStack

CERN Results
• Second run: Rally and 7 million requests / sec
  • Kubernetes: 7 million requests / sec
  • 1000 node clusters (4000 cores, 8000 GB RAM)

Cluster Size (Nodes)   Concurrency   Deployment Time (min)
2                      50            2.5
16                     10            4
32                     10            4
128                    5             5.5
512                    1             14
1000                   1             23

Page 22: Toward 10,000 Containers on OpenStack

CERN Tuning
• Heat
  • Timeouts when contacting RabbitMQ
  • Large stack deletion sometimes needs multiple tries
  • Settings changed (see the crudini sketch below):
    • max_stacks_per_tenant: 10000 (was 100)
    • max_template_size: 5242880 (10x the previous value)
    • max_nested_stack_depth: 10 (was 5)
    • engine_life_check_timeout: 10 (was 2)
    • rpc_poll_timeout: 600 (was 1)
    • rpc_response_timeout: 600 (was 60)
    • rpc_queue_expiration: 600 (was 60)
• Magnum
  • 'Too many files opened': ulimit -n 4096
  • 503s: scale the conductor
  • RabbitMQ instabilities: downgrade to 3.3.5
  • Certificates: deployed Barbican
  • Flannel network config:
    --labels flannel_network_cidr=10.0.0.0/8,\
             flannel_network_subnetlen=22,\
             flannel_backend=vxlan
• Keystone
  • Revocation tree can cause some scalability issues: disabled memcache
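A hedged sketch of applying the Heat-side settings above with crudini; the config path, the assumption that these options live in [DEFAULT], and the service unit names are deployment-specific, so verify them against your own heat.conf:

$ crudini --set /etc/heat/heat.conf DEFAULT max_stacks_per_tenant 10000
$ crudini --set /etc/heat/heat.conf DEFAULT max_template_size 5242880
$ crudini --set /etc/heat/heat.conf DEFAULT max_nested_stack_depth 10
$ crudini --set /etc/heat/heat.conf DEFAULT engine_life_check_timeout 10
$ crudini --set /etc/heat/heat.conf DEFAULT rpc_response_timeout 600
$ # restart heat services to pick up the changes (unit names vary by distro)
$ systemctl restart openstack-heat-engine openstack-heat-api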

Page 23: Toward 10,000 Containers on OpenStack

CERN Tuning (continued)

• Cinder
  • Slow deletion triggering Heat stack deletion timeouts
  • Heat engine issues (too many retries, timeouts)
  • Make Cinder optional? Lots of traffic with high load apps!

• Heat stack deployment scales linearly
  • For large stacks (> 128 nodes)
  • Summary of a 1000 node cluster: 1003 stacks, 22000 resources, 47000 events
  • That's ~70000 records in the Heat DB for one stack
  • See also the talk "Heat: Performance/Scalability Improvements" - Thu 27th, 11:50 am

• Flannel backend tests (a reproduction sketch follows)
  • udp: ~450 Mbit/s, vxlan: ~920 Mbit/s, host-gw: ~950 Mbit/s
  • Change the default? We set vxlan at CERN right now
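Numbers like these can be reproduced with a container-to-container iperf3 run across two cluster nodes. A sketch, assuming the public networkstatic/iperf3 image; this is not necessarily the exact method used for the tests above:

$ # on node A: start an iperf3 server container
$ docker run -d --name iperf3-server networkstatic/iperf3 -s
$ # on node B: run the client against the server's flannel/overlay address
$ docker run --rm networkstatic/iperf3 -c <server-overlay-ip>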

Page 24: Toward 10,000 Containers on OpenStack

CNCF  Cloud  Result

Page 25: Toward 10,000 Containers on OpenStack


CNCF Benchmark Setup
• Granted access 1 month ago; built with OpenStack-Ansible on the Newton release
• Ongoing scalability study for Magnum, Heat and the COEs

• Hardware configuration:
  • 2x Intel E5-2680v3 12-core
  • 128 GB RAM
  • 2x Intel S3610 400GB SSD
  • 10x Intel 2TB NL-SAS HDD
  • 1x QP Intel X710

• Cinder configured with the LVM driver, disabled later

• Neutron configured with Linux bridge

[Diagram: ha-proxy in front of 5 controllers, 3 neutron controllers and 90 computes]

Page 26: Toward 10,000 Containers on OpenStack

CNCF Results
Two rounds of tests:
• 35 node cluster with one master; 24 cores and 120 GB of RAM per node (840 cores)
• 80 node cluster with one master; 24 cores and 120 GB of RAM per node (1920 cores)

Flannel backend configuration: host-gw or udp here, vs. vxlan at CERN

Nodes   Containers   Reqs/sec   Latency   Flannel backend
35      1100         1M         83.2 ms   udp
80      1100         1M         1.33 ms   host-gw
80      3100         3M         26.1 ms   host-gw

Page 27: Toward 10,000 Containers on OpenStack

Rally data at CNCF

Cluster creation

Cluster Size (Nodes)   Concurrency   Number of Clusters   Deployment Time (min)
2                      10            100                  3.02
2                      10            1000                 Able to create 219 clusters
32                     5             100                  Able to create 28 clusters
512                    1             1                    *
4000                   1             1                    *

Container creation

COE     Cluster Size (Nodes)   Concurrency   Number of Containers   Deployment Time (sec)
K8S     2                      4             8                      2.3
Swarm   2                      4             8                      6.2
Mesos   2                      4             8                      122.0

Page 28: Toward 10,000 Containers on OpenStack

Tuning at CNCF
• Apply the same improvements discovered at CERN:
  • Heat tuning
  • Cinder decoupling

• Disabled floating IPs to create many large clusters concurrently
  • But floating IPs are still needed for the master node or the load balancer

• Still working on tuning RabbitMQ, adding separate clusters for each service (as at CERN)
  • Consider this option in OpenStack-Ansible for large deployments

• Using the database for certificates didn't impact overall performance:
  • A reasonable alternative to Barbican (see the config sketch below)
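For reference, the database-backed certificate store is selected in magnum.conf; a sketch, assuming the x509keypair driver is what "database for certificates" refers to (paths and service names vary by deployment):

$ crudini --set /etc/magnum/magnum.conf certificates cert_manager_type x509keypair
$ # restart magnum services to pick up the change (unit names vary by distro)
$ systemctl restart magnum-api magnum-conductor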

Page 29: Toward 10,000 Containers on OpenStack

Conclusion

Page 30: Toward 10,000 Containers on OpenStack

Conclusions
• Scalability tested:
  • Deploying clusters
  • Deploying containers
  • Steady state: the application
• Good:
  • Nova and Neutron were solid
  • Once the infrastructure is in place, we can match the performance published by Google
  • Magnum itself is not a bottleneck: many tuning knobs for building complex clusters
• Needs work:
  • Really an OpenStack scaling and stability problem
  • Linear scaling in Heat and Keystone (when creating a large number of clusters with UUID tokens, token validation in Keystone becomes too slow)
• Did we hit 10,000 containers? YES

Page 31: Toward 10,000 Containers on OpenStack

Best practices: how to avoid the bottlenecks for now
• Tune your OpenStack:
  • RabbitMQ, Heat
• Consider trade-offs when deploying a cluster:
  • Local storage or Cinder volumes
  • Fewer larger nodes or more smaller nodes
  • Floating IP per node or not
  • Load balancer
  • Networking: udp, host-gw

Page 32: Toward 10,000 Containers on OpenStack

Next Steps
• Rerun tests focusing on cluster lifecycle operations
  • Rolling upgrades, node retirement / replacement, ...
• Summarize best practices in the Magnum documentation
• Run similar application scaling tests for other COEs
  • Swarm 3K; Mesos 50,000 containers in real time
• Decouple Cinder for container storage
• Bugs:
  • Floating IP handling, client, state synchronization with Heat
• Long term issue:
  • Developers use devstack
  • How can we discover bottlenecks and scaling problems in a systematic way?

Page 33: Toward 10,000 Containers on OpenStack

Thank You

Ricardo Rocha - [email protected]
Spyros Trigazis - [email protected] - @strigazi
Ton Ngo - [email protected] - @tango245
Winnie Tsang - [email protected]