Test Please: Buffer Slots Variable CVS Code

Ton Voon ton.voon at altinity.com
Fri Dec 22 13:30:37 CET 2006


On 22 Dec 2006, at 01:50, Ethan Galstad wrote:

> Based on the recent thread about hanging Nagios processes, I have
> moved the COMMAND_BUFFER_SLOTS and SERVICE_BUFFER_SLOTS definitions
> out to config file variables:
>
> 	external_command_buffer_slots=4096
> 	check_result_buffer_slots=4096
>
> I have also updated nagiostats to report the avail/used number of slots
> for graphing in MRTG.  Could folks try out the latest 2.x CVS code and
> give it some testing?
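
For illustration only, here is a minimal C sketch of how a nagios.cfg line such as the two quoted above could be parsed into an int with basic range checking. The helper name parse_buffer_slots is hypothetical. It is not the function Nagios actually uses to read its config file.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: parse a line such as
 * "external_command_buffer_slots=4096" into an int.
 * Returns 0 on success, or -1 if the key does not match or the value
 * is not a positive number that fits in an int. */
static int parse_buffer_slots(const char *line, const char *key, int *out)
{
    size_t keylen;
    const char *val;
    char *end;
    long n;

    assert(line != NULL && key != NULL && out != NULL);

    keylen = strlen(key);
    if (strncmp(line, key, keylen) != 0 || line[keylen] != '=')
        return -1;

    val = line + keylen + 1;
    errno = 0;
    n = strtol(val, &end, 10);

    /* Allow trailing whitespace (e.g. the newline from fgets). */
    while (isspace((unsigned char)*end))
        end++;

    /* Reject empty values, trailing junk, overflow and non-positive sizes. */
    if (end == val || *end != '\0' || errno == ERANGE || n <= 0 || n > INT_MAX)
        return -1;

    *out = (int)n;
    return 0;
}
```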

Ethan,

Thanks for applying this to CVS. Several comments:

- external_command_buffer_slots and check_result_buffer_slots only
need to be ints, as the circular_buffer struct only uses an int for
items.

- In xsddefault.c, I think printing out external_command_buffer.items
is not thread-safe. My knowledge of threads is fairly limited, so
please correct me if I am wrong. The main nagios process writes the
status data via xsddefault_save_status_data, which needs to read the
external_command_buffer variable. However, this variable is written
to by the command_file_worker_thread. So I think
xsddefault_save_status_data needs to take the lock on
external_command_buffer before it reads the items count; otherwise
there is the potential for corrupt data. Note that there is a cost to
this, especially if the status data is being written with
aggregate_status_updates=0.

- Your output to status.dat differs from mine. You are outputting
max_external_command_buffer_slots (the value defined in nagios.cfg)
and used_external_command_buffer_slots (the current number of items
in the buffer). In my patch, I used a different definition:
max_command_buffer_items meant the "maximum number of items that have
been in the buffer".
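
Here is a minimal C sketch of the locking suggested in the second point. It assumes a simplified stand-in for the circular_buffer struct that has an int items field and a pthread_mutex_t buffer_lock. The real struct may differ, and the helper name buffer_items_snapshot is hypothetical. The status writer copies items while holding the lock and then formats the copy outside it.

```c
#include <assert.h>
#include <pthread.h>

/* Assumed, simplified stand-in for the circular_buffer struct:
 * only the fields needed for this example. */
typedef struct circular_buffer_struct {
    int items;                   /* current number of items in the buffer */
    pthread_mutex_t buffer_lock; /* protects items (and the buffer itself) */
} circular_buffer;

/* Take a consistent copy of the item count while holding the lock, so
 * the status writer never reads the field while the worker thread is
 * updating it. */
static int buffer_items_snapshot(circular_buffer *cb)
{
    int items;

    assert(cb != NULL);
    pthread_mutex_lock(&cb->buffer_lock);
    items = cb->items;
    pthread_mutex_unlock(&cb->buffer_lock);
    return items;
}
```

In xsddefault_save_status_data, the printed value would then come from buffer_items_snapshot(&external_command_buffer) rather than from external_command_buffer.items directly. The lock is held only long enough to copy one int, which keeps the cost small, but it is still taken on every status write.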

(I would prefer used_external_command_buffer_slots be changed to  
current_external_command_buffer_slots because it more accurately  
describes "this is the number I have now".)

From now on, I'll call it high_external_command_buffer_items, as it
is the "high water mark" of the number of items in the buffer. This
is a useful statistic, as it tells you what
max_external_command_buffer_slots should be set to in order to avoid
holdups.

Also, it probably makes sense to put the high water mark within the  
circular_buffer struct.
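
Here is a minimal C sketch of keeping the high water mark inside the buffer struct. It assumes a simple mutex-protected ring buffer. The type hw_buffer and its functions are hypothetical and do not reproduce the attached patch. The mark is updated on the producer side under the same lock that protects items, so it needs no separate synchronisation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Illustrative ring buffer; field names are assumptions, not Nagios code. */
typedef struct {
    void **buffer;               /* slot array, sized from *_buffer_slots */
    int size;                    /* total number of slots */
    int head;                    /* next slot to write */
    int tail;                    /* next slot to read */
    int items;                   /* current number of items */
    int high;                    /* most items ever held at once */
    pthread_mutex_t buffer_lock; /* protects all fields above */
} hw_buffer;

static int hw_buffer_init(hw_buffer *cb, int size)
{
    assert(cb != NULL && size > 0);
    cb->buffer = calloc((size_t)size, sizeof(void *));
    if (cb->buffer == NULL)
        return -1;
    cb->size = size;
    cb->head = cb->tail = cb->items = cb->high = 0;
    if (pthread_mutex_init(&cb->buffer_lock, NULL) != 0) {
        free(cb->buffer);
        cb->buffer = NULL;
        return -1;
    }
    return 0;
}

/* Producer side (e.g. a worker thread). Returns -1 if the buffer is full. */
static int hw_buffer_push(hw_buffer *cb, void *item)
{
    int rc = -1;

    assert(cb != NULL);
    pthread_mutex_lock(&cb->buffer_lock);
    if (cb->items < cb->size) {
        cb->buffer[cb->head] = item;
        cb->head = (cb->head + 1) % cb->size;
        cb->items++;
        if (cb->items > cb->high)   /* raise the high water mark */
            cb->high = cb->items;
        rc = 0;
    }
    pthread_mutex_unlock(&cb->buffer_lock);
    return rc;
}

/* Consumer side (e.g. the main process). Returns NULL if empty. */
static void *hw_buffer_pop(hw_buffer *cb)
{
    void *item = NULL;

    assert(cb != NULL);
    pthread_mutex_lock(&cb->buffer_lock);
    if (cb->items > 0) {
        item = cb->buffer[cb->tail];
        cb->tail = (cb->tail + 1) % cb->size;
        cb->items--;
    }
    pthread_mutex_unlock(&cb->buffer_lock);
    return item;
}
```

Since high only ever rises (this sketch never resets it), it records the largest backlog seen so far. That is the figure to compare against the configured max_external_command_buffer_slots.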

Please find a patch attached with these changes.

On my small test system, used_check_result_buffer_slots is usually 0.
When I introduce 1 fake slave (128 results per 10 seconds),
used_check_result_buffer_slots fluctuates between 0 and the 20s to
30s. Introducing a 2nd fake slave moves the high mark up into the
100s. A 3rd slave moves the high mark to 192.

If I introduce NDO into the system, I get a large iowait time (in the
80% range), presumably due to database writes. The status file is not
updated as regularly (one instance of 60 seconds between writes), but
when it is, the high_* values jump up into the 200s to 300s. This is a
poorly configured database, so I'm guessing that there are delays
while the main nagios process passes data to the broker module.

At the moment, with 2 slaves sending 128 packets per 10 seconds, I'm
getting high values of 983 for external commands and 1405 for check
results.

I think these recent changes help show whether there are bottlenecks
in reading the command pipe, but there are possibly other slowdowns
further down the chain (which Nagios 3 may help with).

Ton

http://www.altinity.com
T: +44 (0)870 787 9243
F: +44 (0)845 280 1725
Skype: tonvoon



-------------- next part --------------
A non-text attachment was scrubbed...
Name: nagios_show_max_threadsafe.patch
Type: application/octet-stream
Size: 11475 bytes
Desc: not available
URL: <https://www.monitoring-lists.org/archive/developers/attachments/20061222/01918425/attachment.obj>
-------------- next part --------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel

