Patch submission for comments : CGI speed improvement (XNG)

François Laupretre francois.laupretre-prestataire at calyon.com
Tue Jun 7 16:32:11 CEST 2005


Hi all,

Here is a patch I am submitting for your comments. Its purpose is to address
the CGI performance problem encountered on configurations with several
thousand hosts/services.

History :

I currently have a configuration with about 2000 hosts, 3400 services, and
their dependencies (more than 6000), and this configuration is expected to
grow by a factor of 3 in the near future. On the machine I am using now
(quite slow, a Sun E250 with only 1 processor), a request for
'status.cgi?host=all' takes about 5 seconds of CPU time. With the daemon and
the checks also running, this CGI execution takes about 25 s of elapsed time.
Two new machines have been ordered, with 4 processors each, but as more and
more things are monitored, more and more people want to access the web
interface, and whatever processor I have on my server, I will quickly run
into the same performance problem. And I don't consider adding CPU and memory
to be the solution to every performance problem :-)

This is why I started thinking about a way to improve the response time of
CGIs, and especially, to lower the impact of an increase in the number of
concurrent web accesses.

My first step was to look at nagios-db. I finally decided not to use it,
mostly because :

	- I want to use the original nagios UI (especially with the nuvola
skin ;-) ).
	- Regarding postgres materialized views, I don't want to choose
between data freshness and CGI response time. I need a solution where I see
'realtime' data within an acceptable response time.

The system I am submitting today was designed with these goals in mind :

	- No modification to the CGI code.
	- Less than 1 sec of CPU time on my server for a
'status.cgi?host=all' request.
	- A minimal number of changes in the nagios code (except xdata).
	- Full compatibility with the current communication system. The
objects.cache and status.dat files remain the same, in order to keep
compatibility with all the add-ons that read their information from these
files.

Profiling some of the CGIs confirmed that the two main performance problems
were the reading of configuration and status data (88 % of CPU time for a
full status.cgi, and more than 95 % for extinfo.cgi). That's why I designed a
new system to store and retrieve the data. I kept the system of flat files
because I don't see any interesting alternative. We could do it with shared
memory, but I don't expect much improvement in terms of performance and it
brings a new problem : you cannot know in advance how much space you will
need (for status data).

The new communication system uses two files, like the current one, but the
format of these files is made to be read very fast by the CGIs. It includes
the objects in binary struct form, the hashtables (which don't have to be
recomputed), and everything to restore the object and status environment in
the fastest possible way. I am not explaining the format further today
because I would like to know whether anybody is interested before writing
real documentation.
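
As an illustration only (this is not the format used by the patch, which is not described in this message), the general idea of storing objects in binary struct form together with a ready-made hashtable can be sketched as follows. Every name here (demo_host, demo_header, demo_dump, demo_load, demo_find) is hypothetical, and the sketch assumes fixed-size records with no pointers, written and read on the same machine by programs built the same way.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DEMO_NAME_LEN 64
#define DEMO_BUCKETS  8            /* tiny on purpose, illustration only */
#define DEMO_NONE     UINT32_MAX   /* "no record" marker in hash chains */

/* Hypothetical fixed-size record: it holds no pointers, so it can be
 * written to disk and read back unchanged. */
typedef struct {
    char     name[DEMO_NAME_LEN];
    int32_t  state;
    uint32_t next;                 /* next record in the same bucket */
} demo_host;

/* File header: record count plus the precomputed bucket heads. */
typedef struct {
    uint32_t count;
    uint32_t buckets[DEMO_BUCKETS];
} demo_header;

static uint32_t demo_hash(const char *s)
{
    uint32_t h = 5381;
    while (*s)
        h = h * 33u + (unsigned char)*s++;
    return h % DEMO_BUCKETS;
}

/* Writer side: build the hash chains once, then dump header + records. */
int demo_dump(const char *path, demo_host *hosts, uint32_t count)
{
    demo_header hdr;
    uint32_t i;
    FILE *fp;

    hdr.count = count;
    for (i = 0; i < DEMO_BUCKETS; i++)
        hdr.buckets[i] = DEMO_NONE;
    for (i = 0; i < count; i++) {
        uint32_t b = demo_hash(hosts[i].name);
        hosts[i].next = hdr.buckets[b];
        hdr.buckets[b] = i;
    }
    fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    if (fwrite(&hdr, sizeof hdr, 1, fp) != 1 ||
        (count > 0 && fwrite(hosts, sizeof *hosts, count, fp) != count)) {
        fclose(fp);
        return -1;
    }
    return fclose(fp) == 0 ? 0 : -1;
}

/* Reader side: two reads, no text parsing, no rehashing. */
demo_host *demo_load(const char *path, demo_header *hdr)
{
    demo_host *hosts;
    FILE *fp = fopen(path, "rb");

    if (fp == NULL)
        return NULL;
    if (fread(hdr, sizeof *hdr, 1, fp) != 1) {
        fclose(fp);
        return NULL;
    }
    hosts = malloc((hdr->count ? hdr->count : 1) * sizeof *hosts);
    if (hosts != NULL && hdr->count > 0 &&
        fread(hosts, sizeof *hosts, hdr->count, fp) != hdr->count) {
        free(hosts);
        hosts = NULL;
    }
    fclose(fp);
    return hosts;
}

/* Lookup walks the chain that was stored in the file. */
const demo_host *demo_find(const demo_header *hdr, const demo_host *hosts,
                           const char *name)
{
    uint32_t i;

    assert(hdr != NULL && hosts != NULL && name != NULL);
    i = hdr->buckets[demo_hash(name)];
    while (i != DEMO_NONE && i < hdr->count) {
        if (strcmp(hosts[i].name, name) == 0)
            return &hosts[i];
        i = hosts[i].next;
    }
    return NULL;
}
```

In this sketch the reader does no per-line parsing and no hash computation over the whole data set, which is the kind of saving the paragraph above describes. Such a file is only valid for readers compiled with the same struct layout.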

Here are some performance facts :

The request I use is 'status.cgi?host=all'.

                                Original (ms)   New (ms)   Improvement

Reading object configuration :      2450           80         30 x
Reading status data          :      1640           50         33 x
Rest of code                 :       530          480        -10 %

Total CPU time               :      4620          610        7.6 x

For this request, the overall improvement (7.6x) is relatively low because a
lot of computation is still needed to display the page. But for something
like extinfo.cgi on one host, there is so little computation that the overall
improvement is nearly 30x.
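
The arithmetic behind these figures can be checked with a small sketch. The helper names (demo_speedup, demo_total, demo_speedup_ceiling) are made up for this illustration; the numbers fed to them come from the measurements above.

```c
#include <assert.h>
#include <stddef.h>

/* Speedup factor for one phase: old time divided by new time. */
double demo_speedup(double old_ms, double new_ms)
{
    assert(new_ms > 0.0);
    return old_ms / new_ms;
}

/* Sum of n phase timings, in milliseconds. */
double demo_total(const double *ms, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += ms[i];
    return sum;
}

/* Best possible overall speedup if the accelerated phases cost nothing
 * and the remaining phase still takes rest_new_ms. */
double demo_speedup_ceiling(double total_old_ms, double rest_new_ms)
{
    assert(rest_new_ms > 0.0);
    return total_old_ms / rest_new_ms;
}
```

With the measured phases {2450, 1640, 530} and {80, 50, 480}, demo_total gives 4620 and 610 ms, and demo_speedup(4620, 610) is about 7.6. demo_speedup_ceiling(4620, 480) is about 9.6: as long as the rest of the code takes 480 ms, this particular request cannot gain much more, however fast the reading becomes.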

Now, the next step is to see if you find it interesting enough to include it
in a future version of nagios. If you test it, please let me know how much
improvement it brings in your case.

Installing :

The reference version for this patch is 2.0b3 (I will port it to the CVS
version if needed). The file names it uses are not yet read from the
configuration file. They must be set manually in xdata/xsdng.c and
xdata/xodng.c, as XSDNG_DUMP_FILE and XODNG_DUMP_FILE. In a future version,
the names could be derived from the 'status_file' and 'object_cache_file'
config vars, or there could be two new config vars for them, TBD.
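
One way the derivation from 'status_file' or 'object_cache_file' might look is sketched below. This is only an assumption about a possible future version: the helper demo_derive_dump_name and the ".xng" suffix are hypothetical and do not come from the patch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Builds "<base><suffix>" into out.
 * Returns 0 on success, -1 if the result does not fit in outlen bytes. */
int demo_derive_dump_name(const char *base, const char *suffix,
                          char *out, size_t outlen)
{
    assert(base != NULL && suffix != NULL && out != NULL);

    /* Refuse to truncate: a cut-off path would point at the wrong file. */
    if (strlen(base) + strlen(suffix) + 1 > outlen)
        return -1;
    snprintf(out, outlen, "%s%s", base, suffix);
    return 0;
}
```

For example, calling it with a status_file value of "/var/nagios/status.dat" and the hypothetical suffix ".xng" fills the buffer with "/var/nagios/status.dat.xng", so no extra config variable would be needed.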

Regards

François






-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios_xng_patch_1.0.gz
Type: application/octet-stream
Size: 12129 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20050607/f392cdcf/attachment.obj>