Faster, Higher, Stronger - OPNFV
![Page 1: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/1.jpg)
Faster, Higher, Stronger
Accelerating Fault Management to the Next Level
![Page 2: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/2.jpg)
Speakers
Carlos Gonçalves
Software Specialist on the 5G Networks team at NEC Laboratories Europe in Heidelberg, Germany.
He works in the areas of Network Functions Virtualization and Carrier-Cloud Operation & Management.
Yujun Zhang
NFV System Engineer at ZTE Corporation.
He is the current PTL of QTIP in OPNFV and the creator of MitmStack in OpenStack.
His main interests are performance testing, analysis and tuning.
![Page 3: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/3.jpg)
Doctor project introduction
Doctor is a fault management and maintenance project that develops and realizes the corresponding implementation for the OPNFV reference platform.
● Goals
  ○ build fault management and maintenance framework
    ■ high availability of Network Services
    ■ immediate notification of unavailability
  ○ requirement survey
  ○ development of missing features
● Scope: NFVI, VIM
![Page 4: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/4.jpg)
Role of QTIP in the collaboration
QTIP is the project for "Platform Performance Benchmarking"
● Reveal details behind a simple indicator
● Benchmarking across various test environments and conditions
![Page 5: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/5.jpg)
Expected to learn
● How to enable fast fault mitigation from a rich set of monitoring data sources
● How to speed up delivery of NFVI failure events to the user
● How to leverage a performance profiler to find the bottleneck
![Page 6: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/6.jpg)
A “Strong” unbreakable mobile call
![Page 7: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/7.jpg)
Faster! How?
![Page 8: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/8.jpg)
Notification strategies: conservative
![Page 9: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/9.jpg)
Notification strategies: shortcut
![Page 10: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/10.jpg)
Notification strategies: pros and cons
Conservative
+ Cloud resource states are always up-to-date
- Takes longer to report the alarm out to consumers
Shortcut
+ Faster notification to consumer
- Cloud resource states could still be out-of-sync by the time consumer processes the alarm notification
Consumer: User-side Manager; consumer of the interfaces produced by the VIM; VNFM, NFV-O or Orchestrator in ETSI NFV terminology
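The difference between the two strategies comes down to ordering. The sketch below is a toy model, not Doctor code: `Cloud`, `mark_host_down` and `notify_consumer` are hypothetical names, and the shortcut path is modeled sequentially for simplicity even though the state update would normally proceed in parallel. It records what the consumer would find in the VIM at the moment the alarm arrives.

```python
from dataclasses import dataclass, field


@dataclass
class Cloud:
    """Hypothetical stand-in for the VIM's view of resource state."""
    host_state: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def mark_host_down(self, host):
        # Bring the cloud resource state in line with the detected fault.
        self.host_state[host] = "down"
        self.log.append(f"state-updated:{host}")

    def notify_consumer(self, host):
        # Record the state the consumer would read from the VIM right now.
        seen = self.host_state.get(host, "up")
        self.log.append(f"notified:{host}:vim-says-{seen}")


def conservative(cloud, host):
    # Update resource state first, then send the alarm.
    cloud.mark_host_down(host)
    cloud.notify_consumer(host)


def shortcut(cloud, host):
    # Send the alarm first; the state update catches up afterwards.
    cloud.notify_consumer(host)
    cloud.mark_host_down(host)
```

With the conservative ordering the consumer sees the host as down when notified; with the shortcut ordering it is notified earlier but may still read the stale "up" state.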
![Page 11: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/11.jpg)
Notification times comparison (1/3)
OpenStack Ocata (DevStack out of the box); 1x Controller, 1x Compute
![Page 12: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/12.jpg)
Notification times comparison (2/3)
Same deployment + Congress w/ notification capabilities (draft) & parallel execution driver support (cherry-picked from master)
![Page 13: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/13.jpg)
Notification times comparison (3/3)
- Sample outperforms Congress out of the box
- Congress is much richer in features, supporting dynamic user-defined policies and execution of actions on most OpenStack cloud resources.
![Page 14: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/14.jpg)
Issues and challenges
● Passed on Pod-A, but poor result on Pod-B
  ○ Why such a difference?
● Performance degradation when scaling up to more servers
  ○ What is the bottleneck?
● Distributed services
  ○ How to collect data from different nodes?
![Page 15: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/15.jpg)
Performance Profiler
Get more details behind Pass/Failure
![Page 16: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/16.jpg)
Simple diagnostic
![Page 17: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/17.jpg)
Want more details?
● check log files
● check debugging messages
● ...
![Page 18: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/18.jpg)
High-tech equipment helps doctors in the real world
![Page 19: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/19.jpg)
Example in Chrome DevTools: Measuring Resource Loading Times
What does a profiler do?
![Page 20: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/20.jpg)
Craft a PoC
![Page 21: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/21.jpg)
Profiler PoC
![Page 22: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/22.jpg)
Still not enough
Inspiration from OpenStack Summit presentation
![Page 23: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/23.jpg)
Now we know why
What’s behind `nova reset-state`
![Page 24: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/24.jpg)
How osprofiler works
The implementation is quite simple. The profiler has one stack that contains the ids of all trace points. E.g.:
from osprofiler import profiler

profiler.init("SECRET_KEY")     # HMAC key; without init, start/stop do nothing

profiler.start("parent_point")  # trace_stack.push(<new_uuid>)
                                # send to collector -> trace_stack[-2:]
profiler.start("parent_point")  # nested point: trace_stack.push(<new_uuid>)
                                # send to collector -> trace_stack[-2:]
profiler.stop()                 # send to collector -> trace_stack[-2:]
                                # trace_stack.pop()
profiler.stop()                 # send to collector -> trace_stack[-2:]
                                # trace_stack.pop()
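To make the role of `trace_stack[-2:]` concrete, here is a minimal toy model of the stack described above. It is not osprofiler's code: `ToyProfiler`, its `collected` list, and the choice to seed the stack with the base id twice are assumptions made for illustration. The point is that the last two entries of the stack are always (parent id, current trace id), which is what lets a collector rebuild the call tree.

```python
import uuid


class ToyProfiler:
    """Toy model of a trace-point stack (not the osprofiler implementation)."""

    def __init__(self, base_id=None):
        base_id = base_id or str(uuid.uuid4())
        # Seeded with two entries so trace_stack[-2:] always yields a pair.
        self.trace_stack = [base_id, base_id]
        self._names = []
        self.collected = []  # stands in for the collector backend

    def start(self, name):
        self._names.append(name)
        self.trace_stack.append(str(uuid.uuid4()))
        self._send(f"{name}-start")

    def stop(self):
        self._send(f"{self._names.pop()}-stop")
        self.trace_stack.pop()

    def _send(self, event):
        # The last two ids are (parent, current trace point).
        parent_id, trace_id = self.trace_stack[-2:]
        self.collected.append((event, parent_id, trace_id))
```

After `start("parent")`, `start("child")`, `stop()`, `stop()`, the child's recorded parent id equals the parent's trace id, and the stack is back to its initial two entries.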
![Page 25: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/25.jpg)
Supported vs Needed
osprofiler doctor
CINDER
HEAT
KEYSTONE
NOVA
NEUTRON
GLANCE
TROVE
SENLIN
MAGNUM
CEILOMETER
VITRAGE
CONGRESS
AODH
![Page 26: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/26.jpg)
Recommended to track by default
All HTTP calls - helps to get information about which HTTP requests were made, the duration of calls (latency of the service), and the projects involved in the request.
All RPC calls - helps to understand the duration of the parts of a request handled by different services in one project. This information is essential to understand which service produces the bottleneck.
All DB API calls - in some cases a slow DB query can produce the bottleneck, so it is quite useful to track how much time a request spends in the DB layer.
All driver calls - in the case of nova, cinder and others we have vendor drivers. Duration
All SQL requests (turned off by default, because it produces a lot of traffic)
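Once durations are tracked per layer, finding the bottleneck is a matter of summing time by layer. The sketch below assumes a hypothetical flat list of `(layer, started_ms, stopped_ms)` spans; this is not the osprofiler report format, and nested spans would need their children's time subtracted first.

```python
from collections import defaultdict


def time_per_layer(spans):
    """Sum elapsed milliseconds for each layer (e.g. http, rpc, db)."""
    totals = defaultdict(int)
    for layer, started_ms, stopped_ms in spans:
        totals[layer] += stopped_ms - started_ms
    return dict(totals)


def bottleneck(spans):
    """Return the layer that accounts for the most elapsed time."""
    totals = time_per_layer(spans)
    return max(totals, key=totals.get)
```

For example, given made-up spans `[("http", 0, 5), ("rpc", 5, 30), ("db", 30, 40), ("rpc", 40, 50)]`, `bottleneck` picks `"rpc"` because its two spans add up to the largest share.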
![Page 27: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/27.jpg)
Challenges in doctor use case
Doctor use case
● Composed of several consecutive steps
● Relies on events for fast notification
● Starts in the monitor and ends in the consumer
● Multi-threaded in the inspector
ASYNCHRONOUS
OSProfiler limitation
● Designed for profiling ONE request
● Event notification not tracked
● Must start and end in same thread
● Multi-threading is not supported
SYNCHRONOUS
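One way to see why a trace must start and end in the same thread: if the profiler keeps its trace stack in thread-local storage (an assumption about the mechanism, used here only for illustration), a worker thread spawned by the inspector starts with no trace context at all. The names `_ctx`, `init_trace`, `current_stack` and `run_in_worker` below are hypothetical.

```python
import threading

# Hypothetical per-thread slot holding the active trace stack.
_ctx = threading.local()


def init_trace(name):
    """Start a trace in the calling thread only."""
    _ctx.stack = [name]


def current_stack():
    """Return the calling thread's trace stack, or None if it has none."""
    return getattr(_ctx, "stack", None)


def run_in_worker():
    """Ask a new thread what trace context it can see."""
    result = []
    worker = threading.Thread(target=lambda: result.append(current_stack()))
    worker.start()
    worker.join()
    return result[0]
```

After `init_trace("monitor-event")` in the main thread, `current_stack()` returns the stack there, while `run_in_worker()` returns `None`: the trace context would have to be handed over explicitly for a multi-threaded, event-driven flow to be profiled end to end.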
![Page 28: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/28.jpg)
Gaps identified in upstream
osprofiler feature
[ ] multi-thread support
No support for osprofiler in OpenStack services
[ ] alarming: aodh
[ ] inspector: vitrage
[ ] inspector: congress
![Page 29: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/29.jpg)
Next in Euphrates
![Page 30: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/30.jpg)
Roadmap: Doctor-QTIP collaboration
● [doctor] Integration of osprofiler in CI jobs
● [doctor] Propose changes to upstream to fill gaps
  ○ osprofiler enhancement
  ○ Aodh support
  ○ Congress support
  ○ Vitrage support
● [qtip] Benchmarking of notification performance
  ○ Collector backend for profiler data
  ○ Dashboard for performance profile of last build
![Page 31: Faster, Higher, Stronger - OPNFV](https://reader030.vdocuments.site/reader030/viewer/2022012409/616a4c9e11a7b741a350fc6b/html5/thumbnails/31.jpg)
Questions?
https://wiki.opnfv.org/display/doctor
https://wiki.opnfv.org/display/qtip