(Service Check Timed Out) returns critical

Scott lists.scott at themagicbox.net
Mon Nov 18 14:38:03 CET 2002


On your hosts, always set a parent. That way, when a host becomes
unreachable, Nagios walks up the parent tree to see where the network has
actually failed. This is basically a dependency between hosts, and it makes
for a lot fewer pages/emails when something closer to Nagios fails.

Example:

define host{
        host_name               some.host
        alias                   some.host.alias
        address                 some.hosts.ip.address
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   40
        notification_period     24x7
        notification_options    d,r
        parents                 some.switch.on.my.network
        }

This means that when check-host-alive fails for some.hosts.ip.address, Nagios
checks some.switch.on.my.network to find out whether it is really the host
that has failed or whether the switch has failed. If it is the switch, it only
pages for that and flags it as a blocking outage on the web page.. pretty nifty :)
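
For the parent lookup to work, the parent itself has to be defined as a host.
A rough sketch of what that might look like, assuming the switch sits on the
same network segment as the Nagios server. The alias and address are
placeholders, and the other values are simply copied from the example above:

```
# Hypothetical definition of the parent switch (names are placeholders).
define host{
        host_name               some.switch.on.my.network
        alias                   some.switch.alias
        address                 some.switch.ip.address
        check_command           check-host-alive
        max_check_attempts      10
        notification_interval   40
        notification_period     24x7
        notification_options    d,r
        }
```

There is no parents line here, so Nagios treats the switch as being on the
local segment. If the switch hangs off another device, it can name that device
in its own parents line, so the tree can be several levels deep. Also note that
with notification_options set to d,r, hosts behind a failed parent go
unreachable without paging; add u to the list if you do want those
notifications.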

Scott


Michael Markstaller said:

> Hi,
>
> I'm using Nagios to check approx. 100 hosts and 350 services, which is
> working fine so far.
> I'm wondering whether it's possible to tell Nagios to report "unknown"
> instead of critical if a service check times out. I tried setting
> "service_check_timeout" in nagios.cfg to 30 to have Nagios kill
> non-responsive service checks more quickly under high load caused by many
> unreachable hosts (see below), but this resulted in dozens of
> critical alerts due to (Service Check Timed Out) with check_snmp.
> I'd prefer to get "unknown" for any plugin timeout that does not
> result in a retrieved value. Or maybe this problem is located
> within check_snmp?
>
> The hosts are mostly routers and quite distributed, so I have set up
> dependencies for all hosts in order to get a notification only for the
> host that failed, but this doesn't work as well as I think it should. If,
> for instance, the first router, on which all others depend, fails,
> Nagios gets quite messed up, with a few hundred processes for pending
> checks, and gives me many false alerts instead of the one that caused the
> problem. Does anybody have a general guideline for getting useful alerts
> when something "core" fails (like the switch the Nagios server is attached
> to, or DNS, etc.)?
>
> Thanks,
>
> Michael Markstaller
>
> Elaborated Networks GmbH
> www.elabnet.de
> Lise-Meitner-Str. 1, D-85662 Hohenbrunn, Germany
> fon: +49-8102-8951-60, fax: +49-8102-8951-80
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by: To learn the basics of securing
> your web site with SSL, click here to get a FREE TRIAL of a Thawte
> Server Certificate: http://www.gothawte.com/rd524.html
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
>
