nagios latency

Marcello Russo markel at tin.it
Thu Jul 28 09:49:51 CEST 2005


Hi,
we work for a service provider in Italy, and we use Nagios to
monitor many platforms in our data centers (CEDs).
Last week we had a problem: an unexpected blackout during which (!)
the backup generator failed to start, leaving 600 servers down!
Nagios performance degraded drastically during the blackout: by the
end of the day, check latency had reached 4 hours!
This week we looked at the code and saw that service checks and host
checks, even though they sit in the same lists (low and high
priority), are run by separate methods:
services can be checked in parallel, but hosts cannot.
In the file checks.c, the function run_service_check allows multiple
scheduled service checks to execute at once, which is not
implemented in run_host_check.
Why do you use a serial method for host checks?
If it is possible, and if you would like, we can work together on a
solution.
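
To make the difference concrete, here is a small toy model of serial versus parallel execution. It is only a sketch, not the Nagios scheduler: the function name simulate_latency, the 1 second per check and the limit of 50 concurrent checks are our assumptions for illustration; only the 600 hosts come from the incident above.

```python
import heapq

def simulate_latency(num_checks, check_duration, workers):
    """Worst start delay (seconds) when num_checks checks are all due at
    t=0, each takes check_duration seconds, and at most `workers` checks
    may run at the same time. Toy model only, not Nagios code."""
    # Each heap entry is the time at which one worker becomes free.
    free_at = [0.0] * workers
    heapq.heapify(free_at)
    worst_delay = 0.0
    for _ in range(num_checks):
        start = heapq.heappop(free_at)           # earliest free worker
        worst_delay = max(worst_delay, start)    # delay of this check
        heapq.heappush(free_at, start + check_duration)
    return worst_delay

# 600 hosts (from the incident); 1 s per check and 50 workers are assumed.
serial = simulate_latency(600, 1.0, workers=1)
parallel = simulate_latency(600, 1.0, workers=50)
print(f"serial: last check starts after {serial:.0f} s")
print(f"parallel (50 at once): last check starts after {parallel:.0f} s")
```

In this model the delay of the last check grows with the number of down hosts when checks run one at a time, and shrinks roughly in proportion to the number of checks allowed to run together.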

P.S.
To maximize performance (in case of problems...), we now use
check_fping instead of check_ping (we even tried these settings:
warning 0.400 s, critical 0.800 s, timeout 1, 2 packets, with poor
performance). But this isn't a definitive solution, because when all
the servers are down, the latency quickly reaches 40 minutes!
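
A back-of-the-envelope estimate shows why a short timeout alone does not help much when host checks run one after another. This is only a sketch under stated assumptions: we read "timeout 1" as 1 second, the 3 attempts per host is a hypothetical retry count that is not taken from our configuration, and serial_sweep_seconds is an illustrative name.

```python
def serial_sweep_seconds(hosts, seconds_per_attempt, attempts_per_host=1):
    """Time for one serial pass over all hosts when every host is down
    and every attempt blocks until its timeout expires."""
    return hosts * seconds_per_attempt * attempts_per_host

# 600 hosts is from the incident; the 1 s timeout is our assumed reading.
one_pass = serial_sweep_seconds(600, 1.0)
# Hypothetical: 3 attempts per host before the host is declared down.
with_retries = serial_sweep_seconds(600, 1.0, attempts_per_host=3)

print(f"one attempt per host: {one_pass / 60:.0f} minutes")
print(f"three attempts per host: {with_retries / 60:.0f} minutes")
```

Even with a 1 second timeout, a single serial pass over 600 unreachable hosts takes minutes in this estimate, and any retries multiply it.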

Thanks

Marcello, Andrea, Lorenzo.





