
Centreon Engine & Broker Benchmarks R&D dept., Merethis – May 2012

CENTREON ENGINE & CENTREON BROKER - TECHNICAL BENCHMARK -

2012, May


Why this study?

From Nagios ©/NDO to Centreon Engine/Broker

For several years, the core of Nagios© has been maintained by a single developer and, as a consequence, has evolved at a slower pace. The Nagios© community attempted several times to broaden the developer base. Many community members, such as the Centreon development team, proposed improvements and patches, but with little success. Effective community commitment gradually deflated, and long-standing community supporters decided to fork Nagios© (among them the people behind Icinga, for example). So did we!

To improve performance and quality, Centreon Engine and Centreon Broker have been created as a Nagios© alternative.

A study…

Thanks to many strategic partners, Merethis has tested Centreon Engine at every development step. Centreon Engine is now stable and already runs in production on small to medium IT networks. However, even if Centreon Engine and Centreon Broker are stable enough to be deployed in production, what about performance?

This study is based on a simple and common production use case that compares a Nagios©-based monitoring system to a Centreon Engine one. The aim of this study is not to push Centreon Engine to its performance limits, but to compare it with Nagios© in a standard use case and show the performance it brings. We have added explanations of the biggest changes made by the Centreon developers to make the results easier to understand.

We plan to publish full performance benchmarks in the coming months. If you want to give us feedback on this document, the results, or the explanations, feel free to comment on the blog post at http://blog.centreon.com, where we published this report and all the data.

Study case: Performance comparison between Nagios©/NDO and Centreon Engine/Broker in production mode
Use case: 30.000 services, with checks lasting between 1 second and over a minute
Author: R&D dept., Merethis
Date of benchmarks: March 2012
Published on: May 2012
Keywords: Centreon Engine, Nagios©, benchmarks, performance, study
Find out more on http://blog.centreon.com


Testing Methodology and Configuration

Monitoring server

All the benchmarks were conducted on a monitoring server with this configuration:

Processor: 2 x Intel Xeon E5640 (2.66 GHz, 4 cores, 12 MB cache)
RAM: 8 GB 1333 MHz (4 x 2 GB)
Operating system: CES (CentOS release 5.7), kernel 2.6.18-274.17.1.el5 x86_64

Services and checks

The services and checks were customized depending on the test case being conducted: full active services, full passive, and half passive/half active.

Full active services

Active host       1        check_ping (5min)
Active services   30.000   check_dummy (5min)
Passive service   0

Mix active and passive services

Active host       500      check_ping (5min)
Active services   10.000   check_sleep (5min)
Passive service   10.000   2000 checks/min

Full passive services

Active host       1        check_ping (5min)
Active services   0
Passive service   30.000   6000 checks/min
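The checks/min figures in the mix and full passive profiles are consistent with dividing the number of passive services by a 5-minute interval. The short Python sketch below reproduces that arithmetic. The function name is illustrative, and the 5-minute interval for passive results is an assumption inferred from the published rates, not something the report states explicitly.

```python
def checks_per_minute(services: int, interval_minutes: float) -> float:
    """Average check rate when every service reports once per interval."""
    return services / interval_minutes


# Service counts come from the test profiles above; the 5-minute interval
# for passive results is an assumption inferred from the published rates.
mix_rate = checks_per_minute(10_000, 5)
full_passive_rate = checks_per_minute(30_000, 5)

print(mix_rate)           # 2000.0
print(full_passive_rate)  # 6000.0
```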

Products versions

For these benchmarks, we have installed Nagios © 3.2.3, Centreon Engine 1.1.1 and Centreon Broker 2.1.0.


Improved start/ready time

Problem

When starting, NDO needs to check and update its database. It takes a long time to start and to become ready, which wastes administrators' time. While it is starting, monitoring is down and your monitoring system is blind.

Results

The tests were conducted for monitoring a network with the configuration below:

Active host       500      check_ping (5min)
Active services   10.000   check_sleep (5min)
Passive service   10.000   2000 checks/min

The first test was conducted with Nagios© 3.2.3, using NDOutils and Centreon Broker; the second test was conducted with Centreon Engine 1.1.1, using NDOutils and Centreon Broker.

Centreon Broker starts and gives control back to the user 15 times faster than NDO does.

Nagios 3.2.3

            NDO   Broker
Start (s)   334   2
Ready (s)   359   21

Engine 1.1.1

        NDO (s)   Broker (s)
Start   3         3
Ready   359       21


Centreon Engine manages passive checks better

From our point of view, passive check management in Nagios© is rather poor. The Merethis R&D team tried, and managed, to reduce the waste of time it generates.

Current issue

Nagios passive checks management

When passive check results are sent to Nagios©, they are stored in an external command file (the Nagios cmd pipe). To process these results, Nagios© reads the file, stops the main thread (and the active checks) and fills the database with the results. Once finished, Nagios© resumes managing active checks. But when Nagios© looks for results in the external command file, it wastes a lot of time and does not manage the active checks efficiently. Thus, when Nagios© receives a large number of passive checks, it fails to process all of them, and the buffer keeps growing until it reaches the system limits and Nagios© goes down.
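For readers unfamiliar with the external command file, the sketch below builds one passive service result in the standard Nagios external command format (PROCESS_SERVICE_CHECK_RESULT). The host name, service name and timestamp are hypothetical values chosen for illustration, and the helper function is not part of Nagios or Centreon. In a real setup the line would be written to the command pipe, whose path depends on the installation.

```python
import time
from typing import Optional


def passive_result_line(host: str, service: str, return_code: int,
                        output: str, timestamp: Optional[int] = None) -> str:
    """Format one passive service check result as an external command line."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return (f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}\n")


# Hypothetical host, service and timestamp, for illustration only.
line = passive_result_line("web01", "HTTP", 0, "OK - 200", timestamp=1330560000)
print(line, end="")
# [1330560000] PROCESS_SERVICE_CHECK_RESULT;web01;HTTP;0;OK - 200
```

Each passive result is one such line, so 30.000 passive services translate into 30.000 lines that the daemon has to read and match against its configured services.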

Centreon Engine alternative solution

Nagios© wastes a lot of time looping over the checks in a linked list. By introducing a hash table, Centreon Engine improves performance and deals with passive checks faster. The buffer fills more slowly and the monitoring system does not have to slow down or stop.
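The difference between the two lookups can be illustrated with a small Python sketch. It is not the actual Nagios© or Centreon Engine data structure. The `ServiceNode` class, the key layout (host, description) and the service names are assumptions made for the example. The point is that a linked-list search visits every node in the worst case, while a hash table finds the entry directly.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(eq=False)
class ServiceNode:
    """Hypothetical service entry chained in a singly linked list."""
    host: str
    description: str
    next: Optional["ServiceNode"] = None


def build_list(count: int) -> Optional[ServiceNode]:
    """Build the list svc0 -> svc1 -> ... -> svc(count-1)."""
    head = None
    for i in reversed(range(count)):
        head = ServiceNode("host", f"svc{i}", head)
    return head


def find_in_list(head, host, description):
    """Linear search; returns the node and the number of nodes visited."""
    steps = 0
    node = head
    while node is not None:
        steps += 1
        if node.host == host and node.description == description:
            return node, steps
        node = node.next
    return None, steps


def build_index(head) -> Dict[Tuple[str, str], ServiceNode]:
    """Hash table keyed by (host, description), built in one pass."""
    index = {}
    node = head
    while node is not None:
        index[(node.host, node.description)] = node
        node = node.next
    return index


head = build_list(30_000)
node, steps = find_in_list(head, "host", "svc29999")
index = build_index(head)
same = index[("host", "svc29999")]  # single hashed lookup, no traversal

print(steps)         # 30000
print(same is node)  # True
```

With 30.000 services, the worst-case list lookup visits 30.000 nodes for each incoming result, whereas the hashed lookup cost does not grow with the number of services.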

The results

Even if Centreon Engine does not fix all the Nagios© passive check performance issues, it allows larger monitored IT systems to be managed.

In our study, the most relevant example is the full passive test (30.000 passive services). Nagios© fails and stops monitoring because its buffer is full. Centreon Engine keeps going and succeeds in managing all the checks.

[Figure: Nagios© passive check flow. Labels: Nagios, Nagios CMD pipe, passive check results, active checks.]


Centreon Engine manages active checks better

Current issue

Thanks to profiling, in-depth work has been done on check forks. Nagios© forks waste resources and time, so Nagios© has difficulty managing the checks of large IT networks.

Nagios© Fork overview

Nagios© needs 3 forks (or 4, depending on your configuration) to execute one check, so it needs a lot of CPU and memory. Moreover, by writing the results to a file, it generates a lot of IO on the file system.

Centreon Engine alternative solution

Centreon Engine brings a simple solution by limiting the number of forks needed. Instead of forking 3 or 4 times, it forks only once to execute the check command. The first fork has been removed by adding a new thread that waits for the fork results.
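The idea of one child process per check, with a thread waiting for its result, can be sketched in Python as follows. This is an illustration of the principle only, not Centreon Engine's implementation (which is not written in Python). The check command is a hypothetical stand-in for a real plugin, and `run_check` is a name invented for this example.

```python
import subprocess
import sys
import threading


def run_check(command, results, key):
    """Run one check in a single child process and record its result.

    subprocess.run starts the command directly, without an intermediate
    shell, and blocks only this waiter thread while the check runs.
    """
    done = subprocess.run(command, capture_output=True, text=True)
    results[key] = (done.returncode, done.stdout.strip())


results = {}
# Hypothetical check command: a Python one-liner standing in for a plugin.
command = [sys.executable, "-c", "print('OK - dummy check')"]

waiter = threading.Thread(target=run_check, args=(command, results, "dummy"))
waiter.start()
# The main thread stays free here to schedule other checks.
waiter.join()

print(results["dummy"])  # (0, 'OK - dummy check')
```

The waiting is done by a thread inside the same process, so no extra fork is needed just to supervise the check.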

[Figure: Nagios© fork diagram. Labels: Fork #1, Fork #2, Fork #3, Popen, Executes Command, Write results, Results (file system), Read results, Thread.]


To decrease IO and make result analysis faster, the result of the executed command is stored in memory instead of in a file. The main thread reads it and then writes the results to the database.
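A minimal Python sketch of this hand-off, under stated assumptions: worker threads put check results on an in-memory queue, and the main thread drains it. The `stored` dictionary stands in for the database, and the worker produces a fabricated "OK" result rather than running a real plugin. None of these names come from Centreon Engine.

```python
import queue
import threading


def worker(check_id: int, results: "queue.Queue") -> None:
    """Simulate a finished check and hand its result over in memory."""
    results.put((check_id, 0, f"OK - check {check_id}"))


results = queue.Queue()  # thread-safe in-memory buffer, no file involved
threads = [threading.Thread(target=worker, args=(i, results)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Main thread: read every result from memory and "write" it to storage.
stored = {}  # stand-in for the database
while not results.empty():
    check_id, code, output = results.get()
    stored[check_id] = (code, output)

print(sorted(stored))  # [0, 1, 2]
```

Compared with writing each result to a file and reading it back, nothing in this path touches the file system until the results reach their final storage.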

The Engine Fork overview

[Figure: Centreon Engine fork diagram. Labels: Fork, Executes Command, Results in memory, Thread #1, Thread #2, Write results, Read.]


Test results

Better CPU usage

CPU in full active, mix and full passive tests (Nagios© and Centreon Engine)

Centreon Engine has a much better CPU balance with this simpler fork system. The child processes use less CPU, which allows faster forking. The main process needs more CPU because it also manages more checks.

[Figure: CPU share of the children and main processes for Nagios© and Centreon Engine in the full active, mix and full passive tests. Data labels: 50%, 84%, 33%, 85%, 66%, 35%, 50%, 87%, 50%, 50%, 100%.]


Lower service check latency

Using Centreon Engine instead of Nagios© divides latency by 10. Monitoring data is more up to date, which avoids, for example, learning about a server crash 5 minutes after it happened.

[Figure: Services latency, Centreon Engine vs. Nagios. Axis: time (sec), 0 to 350. Series: Average av. (s), Max av. (s).]


Less IO

Centreon Engine decreases IO by 20%. This limits system slowdowns and processes stuck waiting on IO, which can cause serious problems on virtual machines.

IO (bytes written)

                  Cancelled write bytes (kb)   Write bytes (kb)
Nagios            99                           617
Centreon Engine   51                           491
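The 20% figure can be checked against the write-bytes values reported for Nagios (617 kb) and Centreon Engine (491 kb). The Python sketch below computes the relative reduction for both measured columns. The helper name is illustrative.

```python
def reduction_percent(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Values in kb, taken from the IO measurements reported above.
write_reduction = reduction_percent(617, 491)
cancelled_reduction = reduction_percent(99, 51)

print(round(write_reduction, 1))      # 20.4
print(round(cancelled_reduction, 1))  # 48.5
```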


Checks are processed faster

As a result of the previous improvements, all the checks are done faster. In our mix test, Centreon Engine completes the 10.000 checks in 226 sec, 1.4 times faster than Nagios© does!

[Figure: Checks processed (0% to 100%) over time (sec), Centreon Engine vs. Nagios, with markers at 1000 checks and 10.000 checks.]


Key Comparison Findings

Centreon Engine and Centreon Broker start and become ready much faster than Nagios© working with NDO. This gives control back to the administrator sooner and allows restarts with less loss of monitoring data.

Centreon Engine requires less CPU and less memory, and decreases the amount of IO. Checks are done faster with less latency, and many more checks can be done with the same server.

Thanks to these improvements, Centreon needs less powerful monitoring servers. They can run more easily on virtual machines, making monitoring in the cloud simpler.

Next steps

More detailed benchmarks are already planned to measure the full performance of Centreon Engine working with Centreon Broker. New improvements are already under development in Centreon Engine and Centreon Broker. Some of them, for example in Centreon Engine 1.2, should decrease CPU activity even further.

Want to know more?

Stay tuned for future releases by following us on Twitter (@centreon) or by visiting our websites at http://centreon.com and http://blog.centreon.com