Simulating downtime in nagios

Tom Throckmorton throck at gmail.com
Wed Oct 8 16:29:09 CEST 2008


On Oct 07 20:30, Kelly Jones wrote:
> On 10/6/08, Tom Throckmorton <throck at gmail.com> wrote:
> > On Oct 06 18:57, Kelly Jones wrote:
> >> Thanks, Tom.
> >>
> >> Yes, I'm trying to simulate a host/service outage, not scheduled downtime.
> >>
> >> The problem w/ submitting a passive check is that the next ACTIVE check will
> >> invalidate it. Example: you tell nagios that machine foo is down. That's soft
> >> alert 1, not enough to generate any emails. Nagios then active checks foo and
> >> sees that it's up. Of course, you can submit another passive check, but
> >> you'll ping-pong (flap) between up and down states.
> >
> > OK, so it sounds like you want to be able to have Nagios temporarily stop
> > managing the service check scheduling for this service, long enough for you to
> > inject some bogus results.  Seems like rescheduling the next active check
> > (SCHEDULE_FORCED_SVC_CHECK) would do the right thing as far as pushing the next
> > scheduled check into the future.  Or maybe you want to disable active checks
> > for the service (DISABLE_SVC_CHECK), run your simulation, and then re-enable
> > them...?
> 
> I may've done it wrong, but SCHEDULE_FORCED_SVC_CHECK means that
> nagios won't send any alerts at all. 

This command manipulates the check scheduling queue for _active_ checks; it has
no direct impact on alerts:

http://www.nagios.org/developerinfo/externalcommands/commandinfo.php?command_id=129

...so forcing a service check to some time in the future will delay the active
checks, but you can still submit passive checks (and generate alerts, assuming
the result you're submitting differs from the current state of the service).
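
To make the rescheduling step concrete, here is a minimal sketch of how the
external command line could be built.  The helper name, the host "foo" (from
the example earlier in the thread), the service description "HTTP" and the
fixed timestamp are illustrative assumptions; only the command name and its
semicolon-separated fields come from the external command documentation.

```python
import time


def schedule_forced_svc_check(host, service, delay_seconds, now=None):
    """Build a SCHEDULE_FORCED_SVC_CHECK external command line.

    Hypothetical helper: it only formats the line, it does not talk to Nagios.
    Format: [entry_time] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
    """
    now = int(time.time()) if now is None else int(now)
    check_time = now + int(delay_seconds)  # when the next active check should run
    return f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{check_time}"


if __name__ == "__main__":
    # Push the next active check one hour into the future (example values).
    print(schedule_forced_svc_check("foo", "HTTP", 3600, now=1223476149))
```

The resulting line would then be written to the Nagios external command file
(the command_file setting in nagios.cfg).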

> Basically, messing with nagios' check schedule also screws up its
> notification schedule.
> 
> And, since I'm testing notifications, that's not useful.

I must be missing something here.  If, for example, I do the following for a
given service which is currently OK, and for which active checks are normally
accepted:

- delay the next active check via SCHEDULE_FORCED_SVC_CHECK, with a check time
  of now + 1 hour
- submit a passive result with a state of CRITICAL
  (PROCESS_SERVICE_CHECK_RESULT) x $max_check_attempts

As expected, I see:

- an alert for each result I've submitted
- the status changes to SOFT/CRITICAL after the first result, and HARD/CRITICAL
  after $max_check_attempts has been reached
- a notification about the problem

The next scheduled check remains at now + 1 hour.  If I submit an OK
result, the status changes from CRITICAL to OK, and I get a recovery
notification.  And I can repeat this as often as I like within the time before
the next scheduled active check.
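
The passive half of that sequence can be sketched roughly as follows.  This is
an illustration under assumptions, not a tested recipe: the helper names, the
plugin output strings, the example host/service and the command file path are
all made up for the example (the real path is whatever command_file is set to
in nagios.cfg).  The return codes 0 and 2 are the usual plugin codes for OK and
CRITICAL.

```python
import time

# Assumed location; check the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

STATE_OK = 0        # plugin return code for OK
STATE_CRITICAL = 2  # plugin return code for CRITICAL


def passive_result(host, service, return_code, output, now=None):
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    now = int(time.time()) if now is None else int(now)
    return (f"[{now}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{output}")


def simulate_outage(host, service, max_check_attempts, now=None):
    """One CRITICAL result per check attempt, enough to reach a HARD state."""
    return [
        passive_result(host, service, STATE_CRITICAL,
                       f"simulated outage ({n}/{max_check_attempts})", now)
        for n in range(1, max_check_attempts + 1)
    ]


def simulate_recovery(host, service, now=None):
    """A single OK result, which should trigger the recovery notification."""
    return passive_result(host, service, STATE_OK, "simulated recovery", now)


def submit(lines, command_file=COMMAND_FILE):
    """Write command lines to the external command file, one per line."""
    with open(command_file, "w") as cmd:
        for line in lines:
            cmd.write(line + "\n")


if __name__ == "__main__":
    # Print rather than submit, so this runs without a Nagios install.
    for line in simulate_outage("foo", "HTTP", 3):
        print(line)
    print(simulate_recovery("foo", "HTTP"))
```

Calling submit(simulate_outage(...)) and later submit([simulate_recovery(...)])
would inject the results, provided the service accepts passive checks and the
user running the script can write to the command file.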

How is this different from what you're trying to achieve?


-tt

> I've written several nagios tests myself, and they're all in one Perl
> program (each subroutine = one test). For these, simulating downtime
> is easy. The script reads downtime from a file and automatically
> exits w/ 1 or 2 during downtime instead of running the subroutine.
> 
> I'm tempted to run ALL nagios tests in a wrapper, but that seems so
> ugly for such a simple? problem.
> 
> -- 
> We're just a Bunch Of Regular Guys, a collective group that's trying
> to understand and assimilate technology. We feel that resistance to
> new ideas and technology is unwise and ultimately futile.


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null