Nagios-devel digest, Vol 1 #1058 - 2 msgs

Vegard Hanssen Vegard.Hanssen at xait.no
Fri Apr 21 13:03:18 CEST 2006


Andreas Ericsson wrote:

> Please refrain from top-posting. It makes following the discussion a 
> lot harder.
>
> Vegard Hanssen wrote:
>
>> There is a problem with defining check-host-alive to check_nrpe.
>>
>> A common problem in my setup:
>>
>> One host gets too high a load. This times out the NRPE checks, 
>> giving me 5-10 SMS for the timeout, and then (2 minutes later) the 
>> same SMS for OK. The host isn't down, it's just in a very busy state, 
>> and I know this since I don't get a host down message. If I change 
>> check-host-alive to NRPE I will then get a host down message, 
>> which can mean anything from high load to the host being unreachable 
>> or actually down. Host down = I have to drop everything and get to 
>> work, High Load = let's give it a minute to cool down first.
>>
>> I could do the same as Øysten Bleie suggests, but I'm not sure that's 
>> good either. Actually I'm not sure what's best to do, so for the 
>> moment I've stuck with all the SMS.
>>
>
> You could bite the bullet and set up the service dependencies. You 
> want Nagios to automagically understand that there's a relationship 
> between the nrpe-based services and the nrpe-service itself, but since 
> Nagios has no understanding of what checks you're running this desire 
> is quite complex to implement. Nagios lets you do the part requiring 
> accuracy (namely the "what needs what" part) and then handles the rest 
> for you.
>
> With some clever scripting and a generic naming convention I'm sure 
> you'll be able to set everything up in half an hour, saving you quite 
> a few notifications on failures.
>
I know this is complex; that's why I haven't figured out a good way to 
do it yet. Setting up service dependencies isn't that easy either. 
Creating a script which can generate the files is easy, but how do you 
set it up? Which of the services will fail first? You never know. So I 
would need to make every service a dependency of every other service, 
which isn't very smart. Or does Nagios always check the master service 
before sending out a notification for a failure on a dependent service?

And if I make the master e.g. load: when I get a message, does that 
mean the load is so high that it's a problem for the other services, or 
is it just load? Which brings me back to finding a solution. So far I'm 
stuck with all the messages if I want to be able to pinpoint the real 
problem. You can't always get a computer to do a human's work, yet.
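
For reference, a minimal sketch of the script-generated setup discussed 
above. It assumes one master service per host (here a hypothetical 
service called "NRPE" that only verifies the NRPE daemon answers) with 
every other NRPE-based service depending on it, rather than every service 
depending on every other. The host and service names are made up, and 
"c,u" is just one possible choice for notification_failure_criteria: it 
suppresses notifications for the dependent services while the master is 
CRITICAL or UNKNOWN. It does not settle the question of what to use as 
the master.

```python
def render_dependencies(host, master, dependents, criteria="c,u"):
    """Return Nagios servicedependency definitions as text.

    Each service in `dependents` is made to depend on the single
    `master` service on the same host (a star, not a full mesh).
    All names passed in are the caller's own; nothing is looked up
    in a running Nagios.
    """
    blocks = []
    for service in dependents:
        if service == master:
            continue  # the master must not depend on itself
        directives = [
            ("host_name", host),
            ("service_description", master),
            ("dependent_host_name", host),
            ("dependent_service_description", service),
            # Master states in which notifications for the dependent
            # service are suppressed (c = critical, u = unknown).
            ("notification_failure_criteria", criteria),
        ]
        body = "".join(f"    {key:<34}{value}\n" for key, value in directives)
        blocks.append("define servicedependency{\n" + body + "    }\n")
    return "\n".join(blocks)


if __name__ == "__main__":
    # Hypothetical host and service names, for illustration only.
    print(render_dependencies("web1", "NRPE", ["Load", "Disk", "NRPE"]))
```

With a generic naming convention, the list of dependents could be built 
from the existing service definitions, so that only the master service 
has to be named by hand.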

-- 
Mvh / Kind regards


Vegard Hanssen
Xait A/S

vegard.hanssen at xait.no

For general enquiries please call Xait support at: +47 51 95 02 13



