WEB-Interface performance

Marcel Mitsuto Fucatu Sugano msugano at uolinc.com
Thu Oct 6 01:32:37 CEST 2005


Hi nagios-user list,

I'm not sure how to frame this question, because I can't tell how heavily
the people who read this list use the Nagios web interface. Here we use
Nagios to actively check around 10k services on up to 2300 hosts. We
recently upgraded our pool of monitoring machines and set up a distributed
framework that aggregates all warnings on a single web server. So far the
new framework is doing its job, but at times around 15 people are connected
to the Nagios web interface and status.cgi takes too long to load. So
here is my question:
"Are there any ./configure options, or any set of CFLAGS, that improve
the performance of the CGIs?" Here's a snippet from top:

Tasks: 135 total,  18 running, 117 sleeping,   0 stopped,   0 zombie
Cpu(s): 86.8% us, 12.7% sy,  0.0% ni,  0.2% id,  0.0% wa,  0.2% hi,  0.2% si
Mem:   2074356k total,  1450956k used,   623400k free,   170980k buffers
Swap:  2104472k total,        0k used,  2104472k free,  1041400k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8509 nagios    19   0  7244 6096  424 R 34.0  0.3   0:01.17 status.cgi
 8508 nagios    19   0 14868  12m 8508 R 24.5  0.6   0:01.09 status.cgi
 8687 nagios    18   0 12756 7104 4600 R 17.5  0.3   0:00.53 status.cgi
 8690 nagios    18   0 12756 7016 4544 R 17.2  0.3   0:00.52 status.cgi
 8506 nagios    19   0 14472  11m 7772 R 16.2  0.6   0:01.04 status.cgi
 8027 nagios    24   0 22952  20m  11m R 12.2  1.0   0:02.93 status.cgi
 8115 nagios    21   0 22956  15m 6816 R 10.6  0.8   0:02.21 status.cgi
 8078 nagios    22   0 10412 9348  540 R 10.2  0.5   0:03.30 status.cgi
 8103 nagios    22   0 10412 9336  528 R 10.2  0.5   0:03.27 status.cgi
 8046 nagios    21   0 10416 9340  524 R  7.6  0.5   0:03.06 status.cgi
 7995 nagios    22   0 22956  17m 9420 R  1.3  0.9   0:02.52 status.cgi
15374 nagios    15   0 39780  21m  908 S  1.0  1.0   1:48.06 nagios
15382 nagios    16   0  1672  648  540 S  1.0  0.0   0:10.55 nsca
 8072 nagios    20   0 22948  13m 4844 R  1.0  0.7   0:01.91 status.cgi
23767 nagios    20   0  223m 8516 2172 S  0.7  0.4   0:00.52 httpd
23769 nagios    20   0  224m 8272 2172 S  0.3  0.4   0:00.52 httpd
 8151 msugano   16   0  2040 1136  828 R  0.3  0.1   0:00.05 top

As you can see, many instances of the CGIs are running, consuming about
90% of CPU time. The problem is that we monitor the Nagios service itself
by checking a regexp against tac.cgi, and the thresholds are tight:
6 seconds for warning, 8 seconds for critical, and 10 seconds for timeout.
We had never seen this check reach critical levels before, but since we
made this interface aggregate all alarms, and with 15~20 people hanging on
the Nagios interface to see what's happening with the services they
operate, we are seeing high response times from the CGIs.
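
To make the thresholds concrete, here is a rough Python sketch of a check of
this kind. It is only an illustration, not the plugin actually in use:
the names classify and check_page are hypothetical, the URL and pattern are
supplied by the caller, and HTTP authentication (which the Nagios CGIs
normally require) is left out. The exit codes follow the usual Nagios plugin
convention (0 = OK, 1 = WARNING, 2 = CRITICAL).

```python
import re
import time
import urllib.request

# Thresholds quoted in the message (seconds).
WARN_SECS = 6.0
CRIT_SECS = 8.0
TIMEOUT_SECS = 10.0

# Conventional Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def classify(elapsed, matched):
    """Map a response time and a regexp result to a Nagios exit code."""
    if not matched or elapsed >= CRIT_SECS:
        return CRITICAL
    if elapsed >= WARN_SECS:
        return WARNING
    return OK


def check_page(url, pattern):
    """Fetch url, time the request, and classify it (hypothetical helper)."""
    start = time.monotonic()
    try:
        # Note: this timeout applies to each blocking socket operation,
        # not to the request as a whole.
        with urllib.request.urlopen(url, timeout=TIMEOUT_SECS) as resp:
            body = resp.read().decode("utf-8", errors="replace")
    except OSError:
        # Timeouts and connection/HTTP errors are assumed to be CRITICAL.
        return CRITICAL
    elapsed = time.monotonic() - start
    return classify(elapsed, re.search(pattern, body) is not None)
```

With these numbers, a page that matches the regexp but takes 6.5 seconds is
a WARNING, and anything at 8 seconds or more is CRITICAL, so a few extra
seconds of CGI load are enough to trip the check.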

Finally, the machine that serves the interface receives passive messages
from the active monitoring agents. It has a Pentium 4 HT-SMP processor,
2GB of memory, and a SATA HDD, and runs SuSE 9.3 with kernel
2.6.11-8-SMP.
-- 
Marcel Mitsuto