"expire" a passive check result.

Paul Weaver paul.weaver at bbc.co.uk
Wed Aug 27 15:42:14 CEST 2008


> From: nagios-users-bounces at lists.sourceforge.net 
> [mailto:nagios-users-bounces at lists.sourceforge.net] On Behalf 
> Of Rui Miguel Silva Seabra
> 
> Hello,
> 
> When using passive checks, you *should* do the following: 
> define service {
...
>         check_freshness                 1       
>         freshness_threshold             660
...
> Helpful, though sparse, hints? :D
> 
> Rui

I've been playing with passive checks recently. We have some trap-style
alerts. Most of these traps we can back up with a polling check, e.g.:

1) A disk fails: we get a notification instantly, but we can then
SNMP-check the machine to confirm the disk has failed, and note when it's
fixed.

2) A server reboots: we get a trap, but an uptime check would confirm
the situation and allow the service status page to show an error on the
server until it has been running for 15 minutes (we have servers that
get into reboot loops, so it's nice to know how long the server is "out"
for).

In those instances, I believe the best solution is to have a service set
up for normal polling every x minutes that also accepts external
commands to force it to critical/warning (or just to force a check).
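Both of those actions go through the Nagios external command file. A minimal Python sketch, assuming the command pipe is at /usr/local/nagios/var/rw/nagios.cmd (check command_file in your nagios.cfg) and that the service accepts passive results; the helper names are made up for illustration:

```python
import time
from typing import Optional

# Assumption: a common source-install path; use the command_file value from nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def passive_result(host: str, service: str, code: int, output: str,
                   now: Optional[int] = None) -> str:
    """Build a PROCESS_SERVICE_CHECK_RESULT line that sets the service state directly."""
    ts = int(time.time()) if now is None else now
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{output}\n"


def forced_check(host: str, service: str, now: Optional[int] = None) -> str:
    """Build a SCHEDULE_FORCED_SVC_CHECK line that asks for an immediate active check."""
    ts = int(time.time()) if now is None else now
    return f"[{ts}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{ts}\n"


def submit(line: str, command_file: str = COMMAND_FILE) -> None:
    """Write one command to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

A trap handler could then call submit(passive_result("web01", "Disk Status", CRITICAL, "disk failure trap")) (host and service names hypothetical). The next scheduled poll overwrites that state, so the service clears by itself once the SNMP check sees the disk fixed.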

Some errors, though, we want to show on the service problems page
for x minutes (they can lead to issues, and it's handy to have a red
blob pointing you in a possible direction), but we have no way of
knowing when the fault is fixed. Some of our servers don't accept any
kind of polling for disk/fan/etc. states, but do send traps.

I know that the alert history can show these problems, and we could set
the service to be volatile (?), but that's a different page, which
involves support people taking time out of their YouTubing to look for
errors.

For now, I log these to a database and have a service that looks at
the results in the table for the last n minutes, which isn't ideal. What
I'd like is a "semi-volatile" option: a passive service that, when
triggered, remains warning/critical for m minutes before returning to
OK.
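The database-backed check boils down to one rule: report CRITICAL while any trap is younger than the hold window. A sketch of that rule in Python, assuming the trap timestamps (epoch seconds) have already been fetched from the table; the query and the function name are hypothetical:

```python
from typing import Iterable, Tuple

OK, CRITICAL = 0, 2


def semi_volatile_state(trap_times: Iterable[float], now: float,
                        hold_seconds: float) -> Tuple[int, str]:
    """Return (state, output): CRITICAL while any trap is younger than hold_seconds, else OK."""
    recent = [t for t in trap_times if 0 <= now - t < hold_seconds]
    if recent:
        age = int(now - max(recent))
        return CRITICAL, f"CRITICAL - {len(recent)} trap(s), newest {age}s ago"
    return OK, f"OK - no traps in the last {int(hold_seconds)}s"
```

Wrapped in a plugin, the output string would be printed and the state used as the exit code (0 = OK, 2 = CRITICAL).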

I was thinking of firing an event handler that runs "sleep 600;
set_service_to_ok", with some form of locking. Would that be the right
solution?
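One way the "sleep, then set OK" idea could be made safe is sketched below in Python. Instead of a lock, each trap writes a fresh token to a per-service stamp file; after sleeping, the handler only submits OK if its token is still the newest, so a later trap is never cleared early. The sleeping part is detached, since Nagios limits how long an event handler may run (event_handler_timeout). The paths, the argument order ($SERVICESTATE$ $HOSTNAME$ "$SERVICEDESC$") and all names are assumptions:

```python
import os
import sys
import time
import uuid

# Hypothetical paths: adjust for your installation (command_file in nagios.cfg).
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"
STAMP_DIR = "/tmp/nagios-semivolatile"
HOLD_SECONDS = 600  # the "m minutes" a trap keeps the service non-OK


def stamp_path(host: str, service: str, stamp_dir: str = STAMP_DIR) -> str:
    """One stamp file per host/service pair."""
    name = f"{host}__{service}".replace("/", "_").replace(" ", "_")
    return os.path.join(stamp_dir, name)


def record_trap(path: str) -> str:
    """Store a fresh token for this trap, replacing any older one, and return it."""
    token = uuid.uuid4().hex
    os.makedirs(os.path.dirname(path), exist_ok=True)
    tmp = f"{path}.{os.getpid()}.tmp"
    with open(tmp, "w") as f:
        f.write(token)
    os.replace(tmp, path)  # atomic rename: readers never see a half-written token
    return token


def still_latest(path: str, token: str) -> bool:
    """True if no newer trap has overwritten our token while we slept."""
    try:
        with open(path) as f:
            return f.read() == token
    except FileNotFoundError:
        return False


def delayed_reset(host: str, service: str, hold_seconds: float,
                  command_file: str = COMMAND_FILE, stamp_dir: str = STAMP_DIR) -> None:
    """Sleep, then submit an OK result unless another trap arrived meanwhile."""
    path = stamp_path(host, service, stamp_dir)
    token = record_trap(path)
    time.sleep(hold_seconds)
    if still_latest(path, token):
        ts = int(time.time())
        with open(command_file, "w") as pipe:
            pipe.write(f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};0;"
                       f"OK - no new trap for {hold_seconds}s\n")


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Assumed handler arguments: $SERVICESTATE$ $HOSTNAME$ "$SERVICEDESC$"
    state, host, service = sys.argv[1:4]
    if state != "OK" and os.fork() == 0:
        # Child: start a new session and drop the inherited stdio so Nagios
        # neither waits on us nor times us out while we sleep.
        os.setsid()
        devnull = os.open(os.devnull, os.O_RDWR)
        for fd in (0, 1, 2):
            os.dup2(devnull, fd)
        delayed_reset(host, service, HOLD_SECONDS)
```

For repeat traps to restart the window, the handler has to run on every non-OK result rather than only on state changes, which is what is_volatile 1 gives you; the OK it submits itself is ignored by the state check.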
