Simulating downtime in nagios

Andy Shellam andy-lists at networkmail.eu
Mon Oct 6 22:06:28 CEST 2008


Hi Kelly,

When I've done this in the past for network services (e.g. HTTP/SMTP 
checks), I've blocked outbound traffic to the target port on the Nagios 
server itself (e.g. for HTTP checks, block the Nagios server's outbound 
port 80). This gives a more realistic simulation of the service being 
down.

This works for us because, as well as the router firewalls, each server 
runs a local software firewall, so it's easy to block outbound packets 
to a particular port on the Nagios server without affecting the service 
itself. The checks then fail just as they would in a real 
network/service failure.
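
As a rough illustration of the firewall approach, here is a minimal Python sketch. It assumes the local software firewall is iptables (the setup above is not specific about this) and that the script runs as root on the Nagios server. The helper names and the address 192.0.2.10 are made up for the example. By default it only prints the commands instead of running them.

```python
import subprocess


def outbound_block_rule(action, host, port):
    """Build an iptables command that rejects outbound TCP to host:port.

    action is "-I" to insert the rule (start the drill) or "-D" to
    delete the same rule again (end the drill).
    """
    if action not in ("-I", "-D"):
        raise ValueError("action must be '-I' or '-D'")
    # REJECT makes the check fail quickly with "connection refused";
    # using DROP instead would make the check time out.
    return [
        "iptables", action, "OUTPUT",
        "-p", "tcp",
        "-d", host,
        "--dport", str(port),
        "-j", "REJECT",
    ]


def apply_rule(argv, dry_run=True):
    """Print the command (dry run) or execute it (needs root)."""
    if dry_run:
        print(" ".join(argv))
    else:
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Hypothetical target: the web server that Nagios checks over HTTP.
    apply_rule(outbound_block_rule("-I", "192.0.2.10", 80))   # start drill
    apply_rule(outbound_block_rule("-D", "192.0.2.10", 80))   # end drill
```

Two cron entries, one inserting the rule and one deleting it later, would give a timed "outage" without touching the Nagios configuration.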

However, when it comes to checks such as disk space, it can be a bit 
trickier! One thing I've done is change the thresholds for a failure 
(e.g. if disk space is currently 15% capacity, I set my warning alert to 
be 20%, restart Nagios and wait for the alerts to come, do the same for 
critical, then reset back to 90% when complete). I have also done as 
you suggested before: change the service's check and retry intervals in 
Nagios to something lengthy (e.g. an hour), then submit a passive 
'failure' check result and wait until Nagios re-checks the service. 
This method also tests how Nagios alerts you when the service returns 
to OK.
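
The passive-result approach can be sketched in Python as below. It uses the standard Nagios external commands PROCESS_SERVICE_CHECK_RESULT and SCHEDULE_SVC_CHECK. The command file path, host name and service description are assumptions for illustration; check `command_file` in your nagios.cfg. It also assumes external commands and passive checks are enabled for the service.

```python
import time

# Assumption: default source-install location of the external command pipe.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_result(host, service, code, output, now=None):
    """Format a passive service check result (code 2 = CRITICAL)."""
    now = int(time.time()) if now is None else int(now)
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{code};{output}\n")


def schedule_check(host, service, check_time, now=None):
    """Format a command that moves the next active check to check_time."""
    now = int(time.time()) if now is None else int(now)
    return (f"[{now}] SCHEDULE_SVC_CHECK;"
            f"{host};{service};{int(check_time)}\n")


def submit(lines, command_file=COMMAND_FILE):
    """Write command lines to the Nagios named pipe."""
    with open(command_file, "w") as pipe:
        pipe.writelines(lines)


if __name__ == "__main__":
    start = int(time.time())
    # Hypothetical host/service; the service stays "down" for one hour.
    commands = [
        passive_result("web01", "Disk Space", 2, "Fire drill", now=start),
        schedule_check("web01", "Disk Space", start + 3600, now=start),
    ]
    print("".join(commands), end="")
    # submit(commands)  # uncomment on the Nagios server itself
```

When the rescheduled active check finally runs and succeeds, Nagios should send the recovery notification, which is the part of the drill that tests the return to OK.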

Hope this helps, it'd be interesting to hear how/if others do it!

Andy

Kelly Jones wrote:
> What's the best way to simulate (not schedule) downtime in nagios?
>
> I want to "pretend" a service is down for a certain amount of time to
> see what alerts nagios sends, etc.
>
> I've come up w/ two bad ways to do this:
>
>  % Edit the config file to change the test to "check_dummy". I want to
>  run these "fire drills" via cron, and editing a file and restarting
>  nagios seems a little ugly.
>
>  % Submit a passive check saying the service is down, and reschedule
>  the next check 4 hours later, so the service is 'down' for 4
>  hours. This can be done using the nagios named pipe, so it's easy to
>  cron. Problem: doing things this way suppresses the alerts (when you
>  don't test a service, it doesn't send an alert).
>
> Thoughts?
>
>   

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users