[Nagios-users] Re: problems with performance of cgi's

Marcus Hildenbrand Marcus.Hildenbrand at sap.com
Tue Apr 27 15:35:50 CEST 2004


Hi,

While running various performance tests with Nagios 2.0, I found that this
version performs better on a P4 box than Nagios 1.2. But the web
interface is still very slow on our large configuration. So I tried
to find out where most of the time is spent and found the following two
sections:

common/objects.c:

line 2803:
/* add a new service to the list in memory */
service *add_service(....
...
...
...
line 3347:
        /* add new service to service list, sorted by host name then service description */
        last_service=service_list;
        for(temp_service=service_list;temp_service!=NULL;temp_service=temp_service->next){

                if(strcmp(new_service->host_name,temp_service->host_name)<0){
                        new_service->next=temp_service;
                        if(temp_service==service_list)
                                service_list=new_service;
                        else
                                last_service->next=new_service;
                        break;
                        }

                else if(strcmp(new_service->host_name,temp_service->host_name)==0 && strcmp(new_service->description,temp_service->description)<0){
                        new_service->next=temp_service;
                        if(temp_service==service_list)
                                service_list=new_service;
                        else
                                last_service->next=new_service;
                        break;
                        }

                else
                        last_service=temp_service;
                }




and in common/statusdata.c

line 376
/* adds a service status entry to the list in memory */
int add_service_status(servicestatus *new_svcstatus){
..
..
..
line 430
        /* add new service status to list, sorted by host name then description */
        last_svcstatus=servicestatus_list;
        for(temp_svcstatus=servicestatus_list;temp_svcstatus!=NULL;temp_svcstatus=temp_svcstatus->next){

                if(strcmp(new_svcstatus->host_name,temp_svcstatus->host_name)<0){
                        new_svcstatus->next=temp_svcstatus;
                        if(temp_svcstatus==servicestatus_list)
                                servicestatus_list=new_svcstatus;
                        else
                                last_svcstatus->next=new_svcstatus;
                        break;
                        }

                else if(strcmp(new_svcstatus->host_name,temp_svcstatus->host_name)==0 && strcmp(new_svcstatus->description,temp_svcstatus->description)<0){
                        new_svcstatus->next=temp_svcstatus;
                        if(temp_svcstatus==servicestatus_list)
                                servicestatus_list=new_svcstatus;
                        else
                                last_svcstatus->next=new_svcstatus;
                        break;
                        }

                else
                        last_svcstatus=temp_svcstatus;
                }


When I comment out these loops, the execution time of the cgis drops
from 40 seconds to 4 seconds. Ok, the output then shows wrong data
:-). The first loop, where the services are read and sorted, needs about
20 seconds. The second loop, where the service states are read and sorted,
consumes about 16 seconds.

As I understand the code, each of these two loops adds a service or
service state to a list sorted by host name and description. If the
service or state to be added belongs at the end of the list, the whole
list has to be walked first. This seems to be the case most of the time
and is therefore very time consuming on large configurations. One
workaround seems to be the order of service entries in the config file.
In our configuration all service entries are normally in sorted order,
so every new service belongs at the end of the list and the whole list
is walked for each one. After changing to reverse sorted order, the part
where the services are read needs only 2 seconds instead of 20. Maybe
something similar could be done for the second part, where the service
states are read.
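
As an alternative to reordering the config file, the worst case described
above could be avoided by remembering the last list element and checking it
first. The following is only a minimal sketch of that idea, not code from
Nagios: the struct svc, the svc_list wrapper with its tail pointer, and the
function names are all hypothetical simplifications.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the Nagios service struct. */
typedef struct svc {
        const char *host_name;
        const char *description;
        struct svc *next;
} svc;

/* Hypothetical list state: head plus a cached tail pointer. */
typedef struct {
        svc *head;
        svc *tail;
} svc_list;

/* Order by host name first, then by service description. */
static int svc_cmp(const svc *a, const svc *b)
{
        int r = strcmp(a->host_name, b->host_name);
        return r != 0 ? r : strcmp(a->description, b->description);
}

/* Sorted insert that is O(1) when the new entry belongs at the end. */
static void svc_insert_sorted(svc_list *list, svc *n)
{
        svc *prev = NULL, *cur;

        assert(list != NULL && n != NULL);
        n->next = NULL;

        /* Empty list: the new entry is both head and tail. */
        if (list->head == NULL) {
                list->head = list->tail = n;
                return;
        }

        /* Fast path: entry sorts at or after the tail, so append. */
        if (svc_cmp(n, list->tail) >= 0) {
                list->tail->next = n;
                list->tail = n;
                return;
        }

        /* Slow path: linear scan for the first entry greater than n. */
        for (cur = list->head; cur != NULL && svc_cmp(n, cur) >= 0; cur = cur->next)
                prev = cur;

        n->next = cur;
        if (prev == NULL)
                list->head = n;
        else
                prev->next = n;
}
```

With input that is already sorted, every call takes the fast path, so
loading n entries costs n comparisons instead of roughly n*n/2. Unsorted
input still falls back to the linear scan.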

Every cgi seems to call these two functions until it has read all the
services and their states. I think it should speed up the cgi's
considerably if the list is sorted only once, after all services and
their states have been read in.
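
The "sort once after reading" idea might look roughly like the sketch
below. It is an illustration under assumptions, not a patch against
Nagios: the struct entry and the functions entry_prepend and
entry_sort_list are hypothetical names. Entries are linked in O(1) while
loading, then the pointers are copied into an array, sorted with the
standard qsort, and relinked.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a service or status entry. */
typedef struct entry {
        const char *host_name;
        const char *description;
        struct entry *next;
} entry;

/* qsort comparator over an array of entry pointers:
   host name first, then description. */
static int entry_cmp(const void *pa, const void *pb)
{
        const entry *a = *(const entry *const *)pa;
        const entry *b = *(const entry *const *)pb;
        int r = strcmp(a->host_name, b->host_name);
        return r != 0 ? r : strcmp(a->description, b->description);
}

/* Load phase: link the new entry at the front, no searching. */
static entry *entry_prepend(entry *head, entry *n)
{
        assert(n != NULL);
        n->next = head;
        return n;
}

/* Sort the whole list once. Returns the new head. If the temporary
   array cannot be allocated, the list is returned unchanged. */
static entry *entry_sort_list(entry *head)
{
        size_t count = 0, i;
        entry *cur, **vec;

        for (cur = head; cur != NULL; cur = cur->next)
                count++;
        if (count < 2)
                return head;

        vec = malloc(count * sizeof *vec);
        if (vec == NULL)
                return head;

        /* Collect the node pointers, sort them, then relink in order. */
        for (i = 0, cur = head; cur != NULL; cur = cur->next)
                vec[i++] = cur;
        qsort(vec, count, sizeof *vec, entry_cmp);

        for (i = 0; i + 1 < count; i++)
                vec[i]->next = vec[i + 1];
        vec[count - 1]->next = NULL;

        head = vec[0];
        free(vec);
        return head;
}
```

Note that qsort is not guaranteed to be stable, so entries with identical
host name and description may end up in a different relative order than
with the insertion loops.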

One thing I don't understand is why the services are read and sorted
again in the cgi's. One of the new features in Nagios 2.0 is the cached
object definition file, which should hold all the object configuration
data. If the data inside that file is already sorted, then there is no
need to sort it again. I checked this by putting an #ifdef NSCORE
around the loop in the function add_service where the services are
sorted. The output of the cgi's seems to be ok, and this makes the
cgi's about 20 seconds faster.
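
The shape of that experiment is roughly as follows. This is a simplified
sketch, not the actual change: struct item, add_item and the two list
pointers are hypothetical, and only the NSCORE macro name comes from the
Nagios build. It also assumes that the cached object file really is
written in sorted order, which I have not verified.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the Nagios service struct. */
typedef struct item {
        const char *host_name;
        const char *description;
        struct item *next;
} item;

static item *item_list = NULL;       /* head of the list */
static item *item_list_tail = NULL;  /* last element, for O(1) append */

static void add_item(item *n)
{
        assert(n != NULL);
        n->next = NULL;

#ifdef NSCORE
        /* Daemon build: keep the sorted insertion, as before. */
        {
                item *prev = NULL, *cur;

                for (cur = item_list; cur != NULL; prev = cur, cur = cur->next) {
                        int r = strcmp(n->host_name, cur->host_name);
                        if (r < 0 || (r == 0 && strcmp(n->description, cur->description) < 0))
                                break;
                }
                n->next = cur;
                if (prev == NULL)
                        item_list = n;
                else
                        prev->next = n;
                if (cur == NULL)
                        item_list_tail = n;
        }
#else
        /* CGI build: assume the input is already sorted and just append. */
        if (item_list == NULL)
                item_list = n;
        else
                item_list_tail->next = n;
        item_list_tail = n;
#endif
}
```

If the input is not sorted after all, the cgi build silently keeps the
file order, so this is only safe as long as that assumption holds.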

Unfortunately I'm not a C programmer and don't know how to modify the
code that way. Hopefully this is not a big modification and it speeds
up Nagios as I expect.

Another useful setting for large installations is to define
USE_MEMORY_PERFORMANCE_TWEAKS in include/config.h.in before running
configure. Without that switch the scheduler gets overloaded and the
check latency grows dramatically. We have been running Nagios with that
switch enabled for a long time without problems. Maybe this could be
added to the documentation or offered as a configure option.

Thanks and Best Regards
Marcus Hildenbrand


Stanley Hopcroft wrote:
> Dear Sir,
> 
> I am writing to thank you for your letter and say,
> 
> On Fri, Feb 13, 2004 at 10:26:02AM +0100, Marcus Hildenbrand wrote:
>  > Hi,
>  >
>  > we are currently monitoring 2100 Hosts with 9900 active service checks
>  > with Nagios 1.2. The main problem of that large number of monitored
>  > hosts are the cgi's. Most of the cgi's need more than 30 seconds to
>  > load. The current installation is running on a server with 4x700 MHz
>  > Pentium 3 CPU's with 4 GB RAM running SuSE SLES 7. The check latency is
>  > normally under 2 seconds and the cpu idle time is about 33%. So the
>  > scheduling of the active service checks and the overall CPU performance
>  > seems to be no problem.
> 
> One stupid suggestion is that if you have hacking/coding resources you
> might want to have the CGIs deliver gzipped output; this may be doable
> by Apache or an Apache module.
> 
> The ntop project does this (not with Apache) and the performance is very
> crisp, even on very underpowered ntop hosts.
> 
>  > Will the cgi's be faster in Nagios 2.0 for large configurations?
>  >
> 
> This change in 2.0 is aimed (IIRC) at boosting performance
> 
> '3. Daniel Drown's chained hash patch for object search functions'
> 
> - replacing linked list searches for objects with hash lookups.
> 
> If I understand correctly, this is already in the 2.0 alpha so you might
> give it a pop on your P4 box.
> 
>   .. snip ..
>  
>  > Any hints how to solve this problems,
> 
> Apart from the tuning info in the docs
> (http://you/nagios/docs/tuning.html).
> 
> There have been a few letters about the performance of enormous Nag
> installations (mainly about check latency IIRC); you may find that the
> archives have something to offer.
> 
>  >
>  > Many thanks
>  > Marcus
>  >
> 
> -- 
> ------------------------------------------------------------------------
> Stanley Hopcroft
> ------------------------------------------------------------------------
> 
> '...No man is an island, entire of itself; every man is a piece of the
> continent, a part of the main. If a clod be washed away by the sea,
> Europe is the less, as well as if a promontory were, as well as if a
> manor of thy friend's or of thine own were. Any man's death diminishes
> me, because I am involved in mankind; and therefore never send to know
> for whom the bell tolls; it tolls for thee...'
> 
> from Meditation 17, J Donne.
> 

