Centreon Engine & Broker Benchmarks R&D dept., Merethis – May 2012
CENTREON ENGINE & CENTREON BROKER - TECHNICAL BENCHMARK -
2012, May
Why this study?
From Nagios ©/NDO to Centreon Engine/Broker
For several years, the core of Nagios© has been maintained by a single developer and, as a consequence, has evolved at a slower pace. The Nagios© community attempted several times to broaden the developer base. Many community members, such as the Centreon development team, proposed improvements and patches, but with little success. As effective community commitment gradually deflated, long-standing community supporters decided to fork Nagios© (among them the people behind Icinga, for example). So did we!
To improve performance and quality, Centreon Engine and Centreon Broker were created as an alternative to Nagios©.
A study…
Thanks to many strategic partners, Merethis has tested Centreon Engine at every development step. Centreon Engine is now stable and already runs in production on small to medium IT networks. However, even if Centreon Engine and Centreon Broker are stable enough to be deployed in production, what about performance? This study is based on a simple and common production use case that compares a Nagios©-based monitoring system to a Centreon Engine one. The aim of this study is not to push Centreon Engine to its limits but to compare it with Nagios© in a standard use case and show the performance it brings. To make the results easier to understand, we have added explanations of the biggest changes made by the Centreon developers. We plan to publish full performance benchmarks in the coming months. If you want to give us feedback on this document, the results, or the explanations, feel free to comment on the blog post at http://blog.centreon.com, where we published this report and all the data.
Study case: Performance comparison between Nagios©/NDO and Centreon Engine/Broker in production mode
Use case: 30,000 services, with checks lasting between 1 second and over a minute
Author: R&D dept., Merethis
Date of benchmarks: March 2012
Published on: May 2012
Keywords: Centreon Engine, Nagios©, benchmarks, performance, study
Find out more on http://blog.centreon.com
Testing Methodology and Configuration
Monitoring server
All the benchmarks were conducted on a monitoring server with this configuration:
Processor: 2 x Intel Xeon E5640 (2.66 GHz, 4 cores, 12 MB cache)
RAM: 8 GB 1333 MHz (4 x 2 GB)
Operating system: CES (CentOS release 5.7), kernel 2.6.18-274.17.1.el5 x86_64
Services and checks
The services and checks were customized depending on the test case being conducted: full active, full passive, or mixed (half passive, half active).
Full active services
Active hosts: 1, check_ping (5 min)
Active services: 30,000, check_dummy (5 min)
Passive services: 0
Mix active and passive services
Active hosts: 500, check_ping (5 min)
Active services: 10,000, check_sleep (5 min)
Passive services: 10,000 (2,000 checks/min)
Full passive services
Active hosts: 1, check_ping (5 min)
Active services: 0
Passive services: 30,000 (6,000 checks/min)
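The passive check rates quoted in the test cases are consistent with each passive service submitting one result every 5 minutes, spread evenly over time. The report does not state the passive interval explicitly, so the 5-minute figure below is an assumption, and the function name is hypothetical. A minimal sketch of the arithmetic:

```python
def checks_per_minute(service_count: int, interval_minutes: float) -> float:
    """Average check results per minute when checks are evenly spread.

    Assumption: every service reports once per interval.
    """
    return service_count / interval_minutes


# Mix test: 10,000 passive services; full passive test: 30,000 passive services.
mix_rate = checks_per_minute(10_000, 5)
full_passive_rate = checks_per_minute(30_000, 5)
print(mix_rate, full_passive_rate)  # 2000.0 6000.0
```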
Product versions
For these benchmarks, we installed Nagios© 3.2.3, Centreon Engine 1.1.1 and Centreon Broker 2.1.0.
Improved start/ready time
Problem
When starting, NDO needs to check and update its database. It takes a long time to start and to become ready, which wastes administrators' time. While it is starting, monitoring is down and your monitoring system is blind.
Results
The tests were conducted while monitoring a network with the configuration below:
Active hosts: 500, check_ping (5 min)
Active services: 10,000, check_sleep (5 min)
Passive services: 10,000 (2,000 checks/min)
The first test was conducted with Nagios©, using NDOutils and Centreon Broker; the second test was conducted with Centreon Engine, using NDOutils and Centreon Broker.
Centreon Broker starts and gives control back to the user 15 times faster than NDO does.

Nagios 3.2.3
             NDO    Broker
Start (s)    334    2
Ready (s)    359    21

Engine 1.1.1
             NDO    Broker
Start (s)    3      3
Ready (s)    359    21
Centreon Engine manages passive checks better
From our point of view, passive check management in Nagios© is rather poor. The Merethis R&D team set out to reduce the time it wastes, and succeeded.
Current issue
Nagios passive checks management
When passive check results are sent to Nagios©, they are stored in an external command file (the Nagios cmd pipe). To process these results, Nagios© reads the file, stops the main thread (and the active checks) and fills the database with the results. Once finished, Nagios© resumes managing active checks. But while Nagios© looks for the results in the external command file, it wastes a lot of time and does not manage the active checks efficiently. Thus, when Nagios© receives a lot of passive checks, it fails to process all of them, and the buffer keeps growing until it reaches the system limits and Nagios© goes down.
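For readers unfamiliar with the external command file, a passive service result is submitted as one text line using the Nagios `PROCESS_SERVICE_CHECK_RESULT` external command. The sketch below only builds that line. The helper names, the host and service names, and the way the line is written to the file are illustrative assumptions, and the location of the command file depends on the installation.

```python
import time


def passive_service_result(host, service, return_code, output, timestamp=None):
    """Build one external command line for a passive service check result.

    Format: [time] PROCESS_SERVICE_CHECK_RESULT;host;service;return_code;output
    """
    if timestamp is None:
        timestamp = int(time.time())
    return (
        f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
        f"{host};{service};{return_code};{output}\n"
    )


def submit(command_file_path, line):
    """Append the line to the command pipe (path is installation-specific)."""
    with open(command_file_path, "w") as pipe:
        pipe.write(line)


# Hypothetical host and service names, fixed timestamp for readability.
line = passive_service_result("web01", "HTTP", 0, "OK - 200", timestamp=1331000000)
print(line, end="")
# [1331000000] PROCESS_SERVICE_CHECK_RESULT;web01;HTTP;0;OK - 200
```

With 30,000 passive services, thousands of such lines arrive every minute, which is why the way they are read and matched to services matters.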
Centreon Engine alternative solution
Nagios© wastes a lot of time looping over the checks in a linked list. By introducing a hash table, Centreon Engine improves performance and deals with passive checks faster. The buffer fills more slowly, and the monitoring system does not have to slow down or stop.
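The difference between the two lookups can be illustrated with a small sketch. This is not the Centreon Engine code: the class names, the (host, service) key and the comparison counter are assumptions chosen for illustration. A sequential scan stands in for the linked list, and a Python dict stands in for the hash table.

```python
class LinkedListIndex:
    """Sequential scan: lookup cost grows with the number of services."""

    def __init__(self):
        self._items = []
        self.comparisons = 0

    def add(self, host, service, obj):
        self._items.append(((host, service), obj))

    def find(self, host, service):
        for key, obj in self._items:
            self.comparisons += 1
            if key == (host, service):
                return obj
        return None


class HashIndex:
    """Hash table: lookup cost is roughly constant on average."""

    def __init__(self):
        self._items = {}

    def add(self, host, service, obj):
        self._items[(host, service)] = obj

    def find(self, host, service):
        return self._items.get((host, service))


linked, hashed = LinkedListIndex(), HashIndex()
for i in range(30_000):
    linked.add("host1", f"svc{i}", i)
    hashed.add("host1", f"svc{i}", i)

# Matching one passive result to the last service in the list:
found_linked = linked.find("host1", "svc29999")
found_hashed = hashed.find("host1", "svc29999")
print(found_linked, found_hashed, linked.comparisons)  # 29999 29999 30000
```

In the worst case the scan touches all 30,000 entries for a single result, while the hash lookup goes more or less directly to the entry.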
The results
Even if Centreon Engine does not fix all of the Nagios© passive check performance issues, it makes it possible to manage larger monitored IT systems.
In our study, the most relevant example is the full passive test (30,000 passive services). Nagios© fails and stops monitoring because its buffer is full. Centreon Engine keeps going and succeeds in managing all the checks.
[Figure: Nagios© passive check handling. Labels: Nagios, Nagios CMD pipe, passive check results, active checks]
Centreon Engine manages active checks better
Current issue
Thanks to profiling, in-depth work has been done on check forks. Nagios© forks waste resources and time, so Nagios© has difficulty managing the checks of large IT networks.
Nagios© fork overview
Nagios© needs 3 forks (or 4, depending on your configuration) to execute one check. It therefore needs a lot of CPU and memory. Moreover, by writing the results to a file, it does a lot of IO on the file system.
Centreon Engine alternative solution
Centreon Engine brings a simple solution by limiting the number of forks needed. Instead of forking 3 or 4 times, it forks only once to execute the check command. The first fork has been removed by adding a new thread that waits for the fork results.
[Figure: Nagios© fork overview. Labels: Fork #1, Fork #2, Fork #3, popen, executes command, write results, results (file system), thread, read results]
To decrease IO and make result analysis faster, the result of the executed command is stored in memory instead of in a file; the main thread reads it and then writes the results to the database.
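The general idea (one child process per check, a thread waiting for it, results kept in memory) can be sketched as follows. This is an illustrative sketch, not the Centreon Engine implementation: the function names are hypothetical, the in-memory buffer is a plain queue, and small Python one-liners stand in for real check plugins.

```python
import queue
import subprocess
import sys
import threading


def run_check(name, argv, results):
    """Start one child process for the check and wait for it in this thread.

    The exit code and output go into an in-memory queue, not a result file.
    """
    completed = subprocess.run(argv, capture_output=True, text=True)
    results.put((name, completed.returncode, completed.stdout.strip()))


def run_checks(checks):
    """Run all checks, then let the calling (main) thread collect the results."""
    results = queue.Queue()
    threads = [
        threading.Thread(target=run_check, args=(name, argv, results))
        for name, argv in checks
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    collected = {}
    while not results.empty():
        name, code, output = results.get()
        collected[name] = (code, output)  # a real engine would store this
    return collected


# Stand-ins for check plugins: print a status line and exit with a code.
demo_checks = [
    ("dummy_ok", [sys.executable, "-c", "print('OK')"]),
    ("dummy_crit", [sys.executable, "-c",
                    "import sys; print('CRITICAL'); sys.exit(2)"]),
]
print(run_checks(demo_checks))
```

Keeping the results in a queue avoids the file system round trip described for Nagios©, at the cost of holding pending results in the monitoring process's memory.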
The Engine fork overview
[Figure: Centreon Engine fork overview. Labels: Thread #1, Thread #2, fork, executes command, write results, results in memory, read]
Test results
Better CPU usage
With this simpler fork system, Centreon Engine has a much better CPU balance. The child processes need less CPU, which allows forking faster. The main process needs more CPU because it also manages more checks.
[Figure: CPU in full active, mix and full passive tests, Nagios© vs. Centreon Engine; series: children, main]
Lower service check latency
Using Centreon Engine instead of Nagios© decreases latency by a factor of 10. Monitoring data is more up to date, which avoids, for example, learning about a server crash 5 minutes after it happened. Quite useful.
[Figure: Services latency, Centreon Engine vs. Nagios©; bars: average and max latency; axis: time (s), 0 to 350]
Less IO
Centreon Engine decreases IO by 20%. This limits system slowdowns and processes waiting on IO, which can cause serious problems on virtual machines.
IO (bytes written)
                   Cancelled write bytes (kB)   Write bytes (kB)
Nagios             99                           617
Centreon Engine    51                           491
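The 20% figure can be checked against the write bytes reported in the table; the same calculation applied to the cancelled write bytes gives a larger reduction. A short sketch of the arithmetic (the function name is illustrative):

```python
def percent_decrease(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


write_bytes = percent_decrease(617, 491)      # Nagios vs. Centreon Engine
cancelled_bytes = percent_decrease(99, 51)    # Nagios vs. Centreon Engine
print(f"{write_bytes:.1f}% {cancelled_bytes:.1f}%")  # 20.4% 48.5%
```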
Checks are processed faster
As a result of the previous improvements, all checks complete faster. In our mix test, Centreon Engine finishes the 10,000 checks in 226 s, 1.4 times faster than Nagios© does!
[Figure: Checks processed (0% to 100%) over time (s), Centreon Engine vs. Nagios©, for 1,000 checks and 10,000 checks]
Key Comparison Findings
Centreon Engine and Centreon Broker start and become ready much faster than Nagios© working with NDO. This gives control back to the administrator sooner and allows restarts with less loss of monitoring data. Centreon Engine requires less CPU and less memory, and decreases the number of IO operations. Checks are done faster, with less latency, and many more checks can be done on the same server. Thanks to these improvements, Centreon needs less powerful monitoring servers. Centreon Engine and Centreon Broker can also run more easily on virtual machines, making monitoring in the cloud simpler.
Next steps
More detailed benchmarks are already planned to measure the full performance of Centreon Engine working with Centreon Broker. New improvements are already under development in Centreon Engine and Centreon Broker. Some of them, for example in Centreon Engine 1.2, should decrease CPU activity even further.
Want to know more?
Stay tuned for future releases by following us on Twitter (@centreon) or by visiting our websites at http://centreon.com and http://blog.centreon.com