Scheduled vs. Unscheduled downtime in Availability Reports

Andreas Ericsson ae at op5.se
Wed May 21 11:37:24 CEST 2008


Paulus, Jake wrote:
> Good afternoon!
>
> We are starting to use availability reports as a metric for our IT
> Operations groups' performance and we are running into a minor snag -
> the overall availability report for a hostgroup doesn't distinguish
> between scheduled and unscheduled downtime. I am sure this data is
> available in the nagios logs as this information is presented in the
> availability report for a specific host.

It is, but not in a very good state. In short, there are no entries
stating "This object has entered a state of scheduled downtime, which
will end at exactly <timestamp>". Instead, one has to work out which
objects are in scheduled downtime by parsing the external commands
that scheduled it and linking them to the point where the downtime
actually starts.
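
To give an idea of what that parsing involves, here is a rough Python
sketch. It assumes the log contains lines shaped like
"[epoch] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;host;start;end;fixed;trigger_id;duration;author;comment"
and "[epoch] HOST DOWNTIME ALERT: host;STARTED; comment"; check that
against your own nagios.log before trusting it. The function name and
the returned structures are made up for illustration, and services are
left out.

```python
import re

# Assumed line shape: "[epoch] KIND: payload". Verify against your own log.
LINE_RE = re.compile(r"^\[(\d+)\] ([A-Z ]+): (.*)$")


def parse_downtime_events(lines):
    """Split log lines into scheduling requests and downtime alerts.

    Returns (scheduled, alerts):
      scheduled: list of dicts describing what was *requested*
      alerts:    list of (timestamp, host, event) for what actually *happened*
    """
    scheduled = []
    alerts = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue
        ts, kind, payload = int(m.group(1)), m.group(2), m.group(3)
        fields = payload.split(";")
        if (kind == "EXTERNAL COMMAND"
                and fields[0] == "SCHEDULE_HOST_DOWNTIME"
                and len(fields) >= 7):
            scheduled.append({
                "logged_at": ts,
                "host": fields[1],
                "start": int(fields[2]),
                "end": int(fields[3]),
                "fixed": fields[4] == "1",   # "0" means flexible downtime
                "duration": int(fields[6]),
            })
        elif kind == "HOST DOWNTIME ALERT" and len(fields) >= 2:
            # fields[1] is expected to be STARTED, STOPPED or CANCELLED
            alerts.append((ts, fields[0], fields[1].strip()))
    return scheduled, alerts
```

The two lists still have to be matched up per host, which is where the
trouble starts: a request for flexible downtime may never produce a
STARTED alert at all.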

> I have found a thread on this
> list talking about just what I am asking about here
> http://thread.gmane.org/gmane.network.nagios.devel/3638/focus=3651 but
> the patches offered in that thread go quite a bit further than I am
> ready to since it looks like they are calculating SLA data for each host
> and also doing special stuff with nagios logs to make the reports work
> better.
> 

I agree. Those patches are way too invasive. IIRC, they were also nearly
impossible to review, since the ratio of whitespace changes to real
changes was 50:1 or so.

> Has anyone here already encountered this need and produced a patch that
> I haven't found yet?

We had a patch for it, but it turned out to break on certain events
(overlapping downtimes, flexible downtime that never triggers, Nagios
restarts while objects are in downtime, and so on).

To say the least, it's by no means easy to discern where downtime starts
for a particular object just by looking at the logs, and you need a
fallback for the case where the object never leaves downtime.
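
A minimal Python sketch of one way to deal with overlapping downtimes
and the never-leaves-downtime case follows. It is not what our patch
did. It keeps a depth counter per object, similar in spirit to the
scheduled_downtime_depth value Nagios tracks internally, and closes any
interval that is still open at the scheduled end time or at the end of
the report period. The function name, the event tuples and the
parameters are all assumptions made up for the example.

```python
def downtime_intervals(alerts, report_end, scheduled_end=None):
    """Merge downtime alerts for ONE object into (start, end) intervals.

    alerts:        time-ordered (timestamp, event) tuples, where event is
                   "STARTED", "STOPPED" or "CANCELLED"
    report_end:    end of the reporting period (epoch seconds)
    scheduled_end: latest scheduled end time, if known (hypothetical input)
    """
    intervals = []
    depth = 0          # number of downtimes currently in effect
    opened_at = None   # start of the interval currently being built
    for ts, event in alerts:
        if event == "STARTED":
            if depth == 0:
                opened_at = ts
            depth += 1
        elif event in ("STOPPED", "CANCELLED") and depth > 0:
            depth -= 1
            if depth == 0:
                intervals.append((opened_at, ts))
                opened_at = None
    if depth > 0:
        # Fallback: the log never shows the object leaving downtime,
        # so close the interval at the best end time we know of.
        end = report_end if scheduled_end is None else min(scheduled_end, report_end)
        intervals.append((opened_at, max(end, opened_at)))
    return intervals
```

Two overlapping downtimes collapse into one interval this way, and a
STOPPED alert with no matching STARTED is simply ignored. A restart in
the middle of a downtime would still need separate handling.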

We ended up writing a NEB module (to be GPL'd and released next week)
which logs all state changes to a database, and then rewriting the
entire reports page with our own PHP implementation, which has gotten
it right 100% of the time (so far, knock on wood).

The database schema is a lot simpler than that of ndbneb, and the
module logs only what is needed to get availability reports, so even
if we end up not releasing the GUI parts under a GPL-like license,
I'm sure someone else will start hacking on one, so the community
benefits anyway.
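
For anyone wondering how little is needed, here is a Python sketch of
the idea using sqlite3. The state_log table, its columns and the
availability() function are hypothetical and are not the schema or code
of our module. The assumption is that one row is written whenever an
object's state or its downtime flag changes; the report then only has
to walk those rows in order.

```python
import sqlite3

# Hypothetical minimal table: one row per change of state or downtime flag.
SCHEMA = """
CREATE TABLE IF NOT EXISTS state_log (
    ts          INTEGER NOT NULL,  -- epoch seconds of the change
    host        TEXT    NOT NULL,
    state       INTEGER NOT NULL,  -- 0 = up, anything else = not up
    in_downtime INTEGER NOT NULL   -- 1 if scheduled downtime is in effect
)
"""


def _bucket(state, in_downtime):
    if state is None:
        return "unknown"           # no data logged yet
    if state == 0:
        return "up"
    return "scheduled_down" if in_downtime else "unscheduled_down"


def availability(conn, host, start, end):
    """Seconds per bucket for one host within [start, end)."""
    rows = conn.execute(
        "SELECT ts, state, in_downtime FROM state_log "
        "WHERE host = ? AND ts < ? ORDER BY ts",
        (host, end),
    ).fetchall()
    totals = {"up": 0, "scheduled_down": 0, "unscheduled_down": 0, "unknown": 0}
    state, in_dt = None, 0
    cursor = start
    for ts, new_state, new_dt in rows:
        if ts > cursor:
            # Credit the elapsed time to the state that was in effect.
            totals[_bucket(state, in_dt)] += ts - cursor
            cursor = ts
        state, in_dt = new_state, new_dt
    totals[_bucket(state, in_dt)] += end - cursor
    return totals
```

Rows logged before the report period only set the initial state, and
any time before the first known row is counted as "unknown" rather than
guessed at.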

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
