Pets, Cattle, and Herding Dogs


Upload: nuage-networks

Post on 13-Jul-2015


TRANSCRIPT

Page 1: Pets, Cattle, and Herding Dogs

Copyright 2013 Alcatel-Lucent. All rights reserved. CONFIDENTIAL - SOLELY FOR AUTHORIZED PERSONS HAVING A NEED TO KNOW

PROPRIETARY - USE PURSUANT TO COMPANY INSTRUCTION
Nuage Networks

Dimitri Stiliadis @dstiliadis

Pets, Cattle and Herding Dogs

Page 2: Pets, Cattle, and Herding Dogs

Pets & Cattle

Page 3: Pets, Cattle, and Herding Dogs

Don't forget the herding dogs

The herding dogs keep the cattle safe
The control plane matters

Page 4: Pets, Cattle, and Herding Dogs

Adventures with the Neutron Herd

• Goal: Push Neutron to its limits
  • Maximize port activation rate
  • Check stability under heavy load
  • Interactions with other components (Nova, Keystone)

• Create a new Neutron benchmark

Does Neutron scale, and is it production ready?

Page 5: Pets, Cattle, and Herding Dogs

Focus: Neutron + Nuage VSP

• Neutron consists of two components
  • Core Neutron server
  • Plugins

• Reference OVS/ML2 plugin used in most previous tests
• These tests use only the Nuage VSP plugin

Page 6: Pets, Cattle, and Herding Dogs

Background (Canonical Tests)

At around 170 instances per compute server, we hit our next bottleneck; the Neutron agent status on compute nodes started to flap, with agents being marked down as instances were being created.

we took the decision to turn Neutron security groups off in the deployment and run without any VIF level iptables security.

however with Neutron in the design, we could not realistically get past 5-6 chassis of servers, so we took the decision to remove Neutron from the cloud design and run with just Nova networking.

with the revised configuration, we were able to create instances in batches of 100 at a respectable throughput of initially 4.5/sec

http://javacruft.wordpress.com/2014/06/18/168k-instances/

Page 7: Pets, Cattle, and Herding Dogs
Page 8: Pets, Cattle, and Herding Dogs

Nuage Networks Virtualized Services Platform (VSP)

Cloud Service Management Plane: Virtualized Services Directory (VSD)
• Network Policy Engine - abstracts complexity
• Service templates and analytics

Datacenter Control Plane: Virtualized Services Controller (VSC)
• SDN Controller, programs the network
• Rich routing feature set

Datacenter Data Plane: Virtual Routing & Switching (VRS)
• Distributed switch / router - L2-4 rules
• Integration of bare metal assets

[Diagram labels: Brooklyn Datacenter - Zone 1, WAN Router, MP-BGP, IP Fabric, six hypervisors, Hardware GW for Bare Metal]

Page 9: Pets, Cattle, and Herding Dogs

Differences from core implementation

Agent-less architecture
No L3 agent or DHCP agent

No network node
Distributed L2, L3, L4
Single multi-tenant bridge in compute nodes
Configuration of high level policies at compute nodes rather than ACLs
Scale-out architecture of controllers

Page 10: Pets, Cattle, and Herding Dogs

Our Setup

• Control plane only testing in AWS
• Compute nodes use libvirt-lxc to avoid VM boot performance bottlenecks

[Diagram: Nova Ctrl (MySQL/RabbitMQ), Neutron Server, Nuage VSD, Nuage VSC, and 41 compute nodes running libvirt-LXC. Instance type labels, in the order they appear: c3.8xlarge (64 cores / 60G), c3.8xlarge, c3.8xlarge, c3.2xlarge (8 cores), c3.xlarge]

Page 11: Pets, Cattle, and Herding Dogs

Test

Create 5K networks
Activate instances randomly in the network using batch instance creation
Start 50 instances at a time, wait until they are done and continue

Where does it break?
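The batch loop described on this slide can be sketched roughly as follows. This is only an illustration under assumptions: `activate_in_batches` and `fake_boot` are hypothetical names, the thread pool stands in for whatever batch mechanism the test harness actually used, and no real Nova or Neutron API is called.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def activate_in_batches(
    networks: Sequence[str],
    total_instances: int,
    boot_instance: Callable[[str], bool],
    batch_size: int = 50,
    seed: int = 0,
) -> List[int]:
    """Boot instances on randomly chosen networks, one batch at a time.

    Returns the number of successful boots in each batch.
    """
    rng = random.Random(seed)
    successes_per_batch = []
    remaining = total_instances
    with ThreadPoolExecutor(max_workers=batch_size) as pool:
        while remaining > 0:
            size = min(batch_size, remaining)
            chosen = [rng.choice(networks) for _ in range(size)]
            # list(pool.map(...)) blocks until every boot in the batch has
            # returned, modelling "wait until they are done and continue".
            results = list(pool.map(boot_instance, chosen))
            successes_per_batch.append(sum(1 for ok in results if ok))
            remaining -= size
    return successes_per_batch


def fake_boot(network_id: str) -> bool:
    """Hypothetical stand-in for a real boot-and-wait call."""
    return True


networks = [f"net-{i}" for i in range(5000)]  # 5K networks, as on the slide
batches = activate_in_batches(networks, total_instances=120, boot_instance=fake_boot)
print(batches)  # [50, 50, 20]
```

Replacing `fake_boot` with a call that boots one instance and waits for it to become active would turn the sketch into a simple load driver; timing each batch then gives the instances/second figure.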

Page 12: Pets, Cattle, and Herding Dogs

First attempt

1 instance/second
Timeouts all over the place

Page 13: Pets, Cattle, and Herding Dogs

First Steps

Adjust nova and neutron workers

Tune Keystone (multiple workers)

MySQL connections

Page 14: Pets, Cattle, and Herding Dogs

Improvement

Activated 4K instances in about 10 minutes (about 6.8 instances/second)

Can we do better?

Where are the bottlenecks?

Page 15: Pets, Cattle, and Herding Dogs

Nova and Neutron Server Utilization

[Chart: utilization over time for two series, Neutron Server and Nova Server; x-axis 3:46:03 to 3:56:04, y-axis 0 to 80]

Page 16: Pets, Cattle, and Herding Dogs

nova-scheduler

[Chart: Nova Scheduler utilization over time; x-axis 3:46:03 to 3:56:04, y-axis 0 to 90]

Page 17: Pets, Cattle, and Herding Dogs

mysqld

[Chart: mysql series over time; x-axis 3:46:03 to 3:56:04, y-axis 0 to 800]

Page 18: Pets, Cattle, and Herding Dogs

Query stats

Outliers (AWS EBS)

Queries take longer

Page 19: Pets, Cattle, and Herding Dogs

First suspect for MySQL problems

mysqldumpslow -a -s r -t 5 /var/log/mysql/mysql-slow.log

Count: 20000  Time=0.06s (1142s)  Lock=0.00s (2s)  Rows=1.0 (20000), root[root]@ip-10-0-1-23.us-west-2.compute.internal
  SELECT count(*) AS count_1
  FROM (SELECT ports.tenant_id AS ports_tenant_id, ports.id AS ports_id, ports.name AS ports_name,
      ports.network_id AS ports_network_id, ports.mac_address AS ports_mac_address,
      ports.admin_state_up AS ports_admin_state_up, ports.status AS ports_status,
      ports.device_id AS ports_device_id, ports.device_owner AS ports_device_owner
    FROM ports
    WHERE ports.tenant_id IN ('S')) AS anon_1

Quota check gets a count of all ports for a tenant

We used just one tenant for all our ports
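To rank entries like the one above by cumulative cost, the summary line can be parsed mechanically. The sketch below assumes the usual mysqldumpslow layout, where each field shows an average followed by a total in parentheses; `parse_summary` and `SUMMARY_RE` are hypothetical helper names, not part of any MySQL tooling.

```python
import re

# Layout assumed: Count, then average (total) for Time, Lock and Rows.
SUMMARY_RE = re.compile(
    r"Count:\s*(?P<count>\d+)\s+"
    r"Time=(?P<avg_time>[\d.]+)s\s+\((?P<total_time>\d+)s\)\s+"
    r"Lock=(?P<avg_lock>[\d.]+)s\s+\((?P<total_lock>\d+)s\)\s+"
    r"Rows=(?P<avg_rows>[\d.]+)\s+\((?P<total_rows>\d+)\)"
)


def parse_summary(line: str) -> dict:
    """Extract call count and totals from one mysqldumpslow summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError("not a mysqldumpslow summary line")
    return {
        "count": int(match["count"]),
        "total_time_s": int(match["total_time"]),
        "total_lock_s": int(match["total_lock"]),
        "total_rows": int(match["total_rows"]),
    }


line = (
    "Count: 20000  Time=0.06s (1142s)  Lock=0.00s (2s)  Rows=1.0 (20000), "
    "root[root]@ip-10-0-1-23.us-west-2.compute.internal"
)
stats = parse_summary(line)

# Recompute the average from the totals; the printed 0.06s is rounded.
avg_time_s = stats["total_time_s"] / stats["count"]
print(stats["count"], stats["total_time_s"], round(avg_time_s, 4))
```

Read this way, the quota count is cheap per call but adds up: 20,000 calls account for 1142 seconds of cumulative query time in this log.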

Page 20: Pets, Cattle, and Herding Dogs

Corresponding Code

def get_ports_count(self, context, filters=None):
    return self._get_ports_query(context, filters).count()

That's the wrong way to get a count in SQLAlchemy

Page 21: Pets, Cattle, and Herding Dogs

Fixing the Query
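The fix itself is not captured in this transcript. As a hedged illustration of the idea only, the sketch below contrasts the two SQL shapes: the count wrapped around a full column-list subquery, which is what the slow log shows, and a count issued directly against the table. It uses Python's standard-library sqlite3 with a made-up, reduced `ports` table, not Neutron's schema or MySQL, so it demonstrates that the results match and says nothing about timings. In SQLAlchemy the direct form is typically expressed with `func.count()`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reduced, illustrative table; the real ports table has more columns.
conn.execute(
    "CREATE TABLE ports ("
    "id INTEGER PRIMARY KEY, tenant_id TEXT, name TEXT, network_id TEXT)"
)
conn.execute("CREATE INDEX ix_ports_tenant_id ON ports (tenant_id)")
rows = [
    (i, "tenant-a" if i % 4 else "tenant-b", f"port-{i}", f"net-{i % 10}")
    for i in range(1000)
]
conn.executemany("INSERT INTO ports VALUES (?, ?, ?, ?)", rows)

# Shape seen in the slow log: select every column in a subquery,
# then count the rows of that derived table.
wrapped = conn.execute(
    "SELECT count(*) AS count_1 FROM ("
    "SELECT tenant_id, id, name, network_id FROM ports WHERE tenant_id IN (?)"
    ") AS anon_1",
    ("tenant-a",),
).fetchone()[0]

# Direct shape: count against the table itself, no derived table.
direct = conn.execute(
    "SELECT count(*) FROM ports WHERE tenant_id = ?",
    ("tenant-a",),
).fetchone()[0]

print(wrapped, direct)  # 750 750
```

Both statements return the same number; the difference is in how much work the database may do to materialize the inner select before counting, which depends on the engine and its optimizer.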

Page 22: Pets, Cattle, and Herding Dogs

VSD Utilization

[Chart: VSD utilization over time; x-axis 3:46:03 to 3:56:04, y-axis 0 to 90]

Page 23: Pets, Cattle, and Herding Dogs

VSD MySQL Utilization

[Chart: VSD mysql utilization over time; x-axis 3:46:03 to 3:56:04, y-axis 0 to 500]

Page 24: Pets, Cattle, and Herding Dogs

Modified Test

Create 5K networks
Activate instances with 5 vPorts per instance
Start 50 instances at a time, wait until they are done and continue
Avoid nova-scheduler bottleneck
Push neutron-server to its limits

Page 25: Pets, Cattle, and Herding Dogs

Improvement

Activated 4K instances with 20K vports in 10 minutes
  * 500 vports on every server fully configured with DHCP served
  * 34 ports/second (an order of magnitude better than Canonical)
  * number of instances per second limited by Nova
  * Neutron was much faster than Nova in completing the required work
  * Nuage VSP was by no means the bottleneck in any of the above - lots of free capacity
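The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only numbers quoted in the deck; the one assumption is that all 41 compute nodes from the setup slide took part in this run.

```python
instances = 4_000
vports_per_instance = 5
port_rate = 34        # ports/second, as reported on the slide
compute_nodes = 41    # from the setup slide; assumed unchanged for this run

total_vports = instances * vports_per_instance            # 20K vports
implied_seconds = total_vports / port_rate                # run length implied by 34 ports/s
implied_instance_rate = port_rate / vports_per_instance   # instances/second
vports_per_node = total_vports / compute_nodes            # average load per compute node

print(f"total vports:          {total_vports}")
print(f"implied run length:    {implied_seconds:.0f} s")
print(f"implied instance rate: {implied_instance_rate:.1f} /s")
print(f"vports per node:       {vports_per_node:.0f}")
```

The implied run length is just under ten minutes, the instance rate matches the 6.8 instances/second of the earlier run, and the per-node average is close to the quoted 500 vports per server.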

Page 26: Pets, Cattle, and Herding Dogs

Nova Control Node utilization

[Chart: Nova control node utilization over time; x-axis 12:26:20 to 12:36:02, y-axis 0 to 70]

Page 27: Pets, Cattle, and Herding Dogs

Neutron Server Utilization

[Chart: neutron utilization over time; x-axis 12:26:20 to 12:36:02, y-axis 0 to 60]

Page 28: Pets, Cattle, and Herding Dogs

Nova Scheduler

[Chart: nova-scheduler utilization over time; x-axis 12:26:20 to 12:36:02, y-axis 0 to 90]

Page 29: Pets, Cattle, and Herding Dogs

MySQL

[Chart: nova mysql utilization over time; x-axis 12:26:20 to 12:35:52, y-axis 0 to 400]

Increased utilization still there, although maximum numbers are 40% better

Page 30: Pets, Cattle, and Herding Dogs

New Log Analysis

Count: 320  Time=0.14s (45s)  Lock=0.00s (0s)  Rows=13749.8 (4399926), root[root]@ip-10-0-1-23.us-west-2.compute.internal
  SELECT ports.tenant_id AS ports_tenant_id, ports.id AS ports_id, ports.name AS ports_name,
    ports.network_id AS ports_network_id, ports.mac_address AS ports_mac_address,
    ports.admin_state_up AS ports_admin_state_up, ports.status AS ports_status,
    ports.device_id AS ports_device_id, ports.device_owner AS ports_device_owner,
    ipallocations_1.port_id AS ipallocations_1_port_id,
    ipallocations_1.ip_address AS ipallocations_1_ip_address,
    ipallocations_1.subnet_id AS ipallocations_1_subnet_id,
    ipallocations_1.network_id AS ipallocations_1_network_id,
    securitygroupportbindings_1.port_id AS securitygroupportbindings_1_port_id,
    securitygroupportbindings_1.security_group_id AS securitygroupportbindings_1_security_group_id
  FROM ports
    LEFT OUTER JOIN ipallocations AS ipallocations_1 ON ports.id = ipallocations_1.port_id
    LEFT OUTER JOIN securitygroupportbindings AS securitygroupportbindings_1 ON ports.id = securitygroupportbindings_1.port_id
  WHERE ports.tenant_id IN ('10bee9ce4661476993a5c75ff7fcf016')

Long query related to IP address allocation

Page 31: Pets, Cattle, and Herding Dogs

Time for More Iterations

Well... not exactly, since we ran out of time before the Summit

Stay tuned...

Page 32: Pets, Cattle, and Herding Dogs

Conclusions

Neutron is far from being the bottleneck in a high performance OpenStack installation, as long as the right SDN system is used

Significant effort is needed to optimize OpenStack end-to-end

Art vs science

Pay attention to SQLAlchemy statements

Call to action:
    End-to-end profiling of the code base across all OpenStack projects
    OSProfiler looks promising