[Nagios-users] Re: problems with performance of cgi's

Marcus Hildenbrand Marcus.Hildenbrand at sap.com
Wed Apr 28 14:31:17 CEST 2004


Hi,

I found another workaround for the performance problems in
add_service_status when reading the service status data. I wrote a short
Perl script that re-sorts the service entries in the status.dat file into
reverse order. Normally all service status data in this file is in sorted
order. Using reverse sorted order for the service entries speeds up the
CGIs in our configuration by about 15 seconds. This may not be supported
and could cause trouble in some CGIs, but the ones I checked (tac.cgi,
extinfo.cgi) seem to work. I did this in a running Nagios environment,
replacing the normal status.dat with a new one created by the Perl script
mentioned above.

So, with reverse sorted order for both the service objects and the
service status entries, the CGIs now need less than 5 seconds instead of
more than 40 seconds.

Maybe the sort order in the status logfile could be changed.

This was all done with the current Nagios 2.0 CVS code.

Thanks and Best Regards
Marcus Hildenbrand


Hildenbrand, Marcus wrote:
> Hi,
> 
> While running various performance tests with Nagios 2.0, I found that
> this version performs better on a P4 box than Nagios 1.2. But the web
> interface is still very slow on our large configuration, so I tried to
> find out where most of the time is spent and found the following two
> sections:
> 
> common/objects.c:
> 
> line 2803:
> /* add a new service to the list in memory */
> service *add_service(....
> ...
> ...
> ...
> line 3347:
>          /* add new service to service list, sorted by host name then service description */
>          last_service=service_list;
>          for(temp_service=service_list;temp_service!=NULL;temp_service=temp_service->next){
>                  if(strcmp(new_service->host_name,temp_service->host_name)<0){
>                          new_service->next=temp_service;
>                          if(temp_service==service_list)
>                                  service_list=new_service;
>                          else
>                                  last_service->next=new_service;
>                          break;
>                          }
>                  else if(strcmp(new_service->host_name,temp_service->host_name)==0 &&
>                          strcmp(new_service->description,temp_service->description)<0){
>                          new_service->next=temp_service;
>                          if(temp_service==service_list)
>                                  service_list=new_service;
>                          else
>                                  last_service->next=new_service;
>                          break;
>                          }
>                  else
>                          last_service=temp_service;
>                  }
>
> and in common/statusdata.c
> 
> line 376
> /* adds a service status entry to the list in memory */
> int add_service_status(servicestatus *new_svcstatus){
> ..
> ..
> ..
> line 430
>          /* add new service status to list, sorted by host name then description */
>          last_svcstatus=servicestatus_list;
>          for(temp_svcstatus=servicestatus_list;temp_svcstatus!=NULL;temp_svcstatus=temp_svcstatus->next){
>                  if(strcmp(new_svcstatus->host_name,temp_svcstatus->host_name)<0){
>                          new_svcstatus->next=temp_svcstatus;
>                          if(temp_svcstatus==servicestatus_list)
>                                  servicestatus_list=new_svcstatus;
>                          else
>                                  last_svcstatus->next=new_svcstatus;
>                          break;
>                          }
>                  else if(strcmp(new_svcstatus->host_name,temp_svcstatus->host_name)==0 &&
>                          strcmp(new_svcstatus->description,temp_svcstatus->description)<0){
>                          new_svcstatus->next=temp_svcstatus;
>                          if(temp_svcstatus==servicestatus_list)
>                                  servicestatus_list=new_svcstatus;
>                          else
>                                  last_svcstatus->next=new_svcstatus;
>                          break;
>                          }
>                  else
>                          last_svcstatus=temp_svcstatus;
>                  }
>
> When I comment out these loops, the execution time of the CGIs drops
> from 40 seconds to 4 seconds. OK, the output shows wrong data :-). The
> first loop, where the services are read and sorted, takes about 20
> seconds. The second loop, where the service states are read and sorted,
> takes about 16 seconds.
> 
> As I understand the code, these two loops add a service or service
> state to a list kept sorted by host name and description. If the service
> or state being added belongs at the end of the list, the whole list has
> to be traversed first. This seems to be the case most of the time, so
> reading n entries costs roughly n*n/2 string comparisons, which is very
> time-consuming on large configurations. One workaround is the order of
> the service entries in the config file. In our configuration all service
> entries are normally in sorted order, so every new service belongs at
> the end of the list and the whole list has to be traversed first. After
> changing to reverse sorted order, the part where the services are read
> needs only 2 instead of 20 seconds. Maybe something similar could be
> done for the second part, where the service states are read.
> 
> Every CGI seems to call these two functions once per service and per
> service state until all of them have been read. I think it would speed
> up the CGIs considerably if the list were sorted only once, after all
> services and their states have been read in.
> 
> One thing I don't understand is why the services are read and sorted
> again in the CGIs. One of the new features in Nagios 2.0 is the cached
> object definition file, which should hold all the object configuration
> data. If the data inside that file is already sorted, there is no need
> to sort it again. I checked this by putting an #ifdef NSCORE around the
> loop in add_service where the services are read and sorted. The output
> of the CGIs seems to be OK, and this speeds up the CGIs by 20 seconds.
> 
> Unfortunately I'm not a C programmer and don't know how to modify the
> code that way. Hopefully this is not a big modification and it speeds
> up Nagios as much as I expect.
> 
> Another useful setting for large installations is defining
> USE_MEMORY_PERFORMANCE_TWEAKS in include/config.h.in before running
> configure. Without that switch the scheduler gets overloaded and the
> check latency grows dramatically. We have been running Nagios with that
> switch enabled for a long time without problems. Maybe this could be
> added to the documentation or offered as a configure option.
> 
> Thanks and Best Regards
> Marcus Hildenbrand
> 


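For the USE_MEMORY_PERFORMANCE_TWEAKS tip in the quoted message, the change amounts to defining the macro in include/config.h.in before running configure. A minimal fragment, assuming the source only tests whether the macro is defined and not its value:

```c
/* include/config.h.in: add before running ./configure
 * (assumption: the code only checks #ifdef USE_MEMORY_PERFORMANCE_TWEAKS) */
#define USE_MEMORY_PERFORMANCE_TWEAKS
```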