Feature Request

Adam Augustine augustineas at gmail.com
Fri Apr 6 19:53:22 CEST 2007


On 4/4/07, Andreas Ericsson <ae at op5.se> wrote:
> My apologies. I'll rephrase:
> As the scheduled downtime only affects notifications, all state changes get
> logged as usual. Hence, they do not affect availability reports in a negative
> way. However, the start and end of scheduled downtime gets logged as well,
> and those numbers get displayed on the availability reports too. For detailed
> explanations on how to read availability reports, refer to the docs.
>
> > I'll continue to research it.
> >
>
> Do so by reading the docs. You won't find a better explanation else-where.
> www.nagios.org. Click your way from there.
>
> --
> Andreas Ericsson                   andreas.ericsson at op5.se

We use the availability reports extensively, and over the last couple
of years we have fixed a few bugs and added some functionality. As I
recall, the bugs had to do with improper calculation of the "Time
Undetermined" field in certain circumstances, and with the correct
assignment of the values in parentheses, based on which state you tell
avail.cgi to count undetermined time towards.

We had to patch the Nagios core to handle a new variable ("SLA
Target") for our modified version of the Availability report, and we
patched some of the Scheduled Downtime logging to make sure the
downtime states got logged in a way that would properly report the
state across log file rotations and reloads. A side effect of this is
that you no longer need to backtrack through all the logs to the
start of scheduled downtime for it to show up properly.

We added a new "SERVICEGROUP SUMMARY" report page that takes all the
service groups and summarizes their availability numbers. We sort
first on whether they are in or out of SLA, and then on "Actual SLA",
meaning the "uptime" they actually achieved. We also changed the
number in parentheses to represent the "scheduled downtime" spent in
that state.
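
To make the two-level sort concrete, here is a rough Python sketch. It is
not the patch itself (that lives in the C code of avail.cgi). The names
GroupSummary, sort_key and sort_summaries are made up for illustration,
and the direction of the sort (out-of-SLA groups first, then lowest
"Actual SLA" first) is an assumption, as is treating a group as "in SLA"
when its Actual SLA meets or exceeds its SLA Target.

```python
from typing import List, NamedTuple, Tuple


class GroupSummary(NamedTuple):
    """Hypothetical per-servicegroup row on the summary page."""
    name: str
    actual_sla: float   # percent "uptime" actually achieved
    sla_target: float   # percent required by the SLA Target variable


def sort_key(group: GroupSummary) -> Tuple[bool, float]:
    # First key: in SLA or out. False sorts before True, so groups that
    # missed their target come first (assumed direction).
    in_sla = group.actual_sla >= group.sla_target
    # Second key: Actual SLA, lowest first (assumed direction).
    return (in_sla, group.actual_sla)


def sort_summaries(groups: List[GroupSummary]) -> List[GroupSummary]:
    return sorted(groups, key=sort_key)


if __name__ == "__main__":
    rows = [
        GroupSummary("web", 99.9, 99.5),
        GroupSummary("db", 98.0, 99.0),
        GroupSummary("mail", 99.6, 99.5),
        GroupSummary("dns", 97.0, 99.0),
    ]
    for row in sort_summaries(rows):
        print(row.name, row.actual_sla, row.sla_target)
```

Reversing either key is a one-line change if the opposite order is wanted.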

So time for a particular service gets divided up into 9 "buckets":

1) OK - Not scheduled down (called "Unscheduled" in the availability
detail reports)
2) OK - Scheduled Down
3) WARN - Not scheduled down
4) WARN - Scheduled Down
5) CRIT - Not scheduled down
6) CRIT - Scheduled Down
7) UNKN - Not scheduled down
8) UNKN - Scheduled Down
9) Undetermined

For purposes of our reporting, we decided that only #5 (unscheduled
CRITICAL time) should count against the SLA, so basically the "Actual
SLA" column is 100% minus the percentage of time spent in unscheduled
CRIT.
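
As a rough illustration of that arithmetic, here is a small Python sketch.
It is not taken from the patch set. The Bucket enum and the actual_sla
function are hypothetical names, and using the total of all nine buckets
(including Undetermined) as the denominator is an assumption.

```python
from enum import Enum
from typing import Dict, Optional


class Bucket(Enum):
    """The nine time buckets, numbered as in the list above."""
    OK_UNSCHEDULED = 1
    OK_SCHEDULED = 2
    WARN_UNSCHEDULED = 3
    WARN_SCHEDULED = 4
    CRIT_UNSCHEDULED = 5
    CRIT_SCHEDULED = 6
    UNKN_UNSCHEDULED = 7
    UNKN_SCHEDULED = 8
    UNDETERMINED = 9


def actual_sla(seconds: Dict[Bucket, float]) -> Optional[float]:
    """Return 100% minus the share of time in unscheduled CRIT.

    Assumption: the denominator is the sum of every bucket present,
    Undetermined included. Returns None if no time was recorded.
    """
    total = sum(seconds.values())
    if total == 0:
        return None
    crit = seconds.get(Bucket.CRIT_UNSCHEDULED, 0.0)
    return 100.0 * (1.0 - crit / total)


if __name__ == "__main__":
    week = {
        Bucket.OK_UNSCHEDULED: 590_000.0,
        Bucket.OK_SCHEDULED: 3_600.0,
        Bucket.CRIT_UNSCHEDULED: 6_048.0,
        Bucket.CRIT_SCHEDULED: 5_152.0,
    }
    print(actual_sla(week))
```

In this example the scheduled CRIT time does not lower the result; only
the unscheduled CRIT seconds do.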

We also made a change so that you could select different "views" from
a drop down (which basically changed who you were authenticated as,
from avail.cgi's perspective). This probably isn't useful outside our
company.

I had attached some pictures showing how the summary page looks, and
the view of one of the service group links (which looks almost exactly
like the original service group detail availability report), but the
mailing list thought they were too big :-(. So I threw them on flickr
(http://www.flickr.com/photos/7665289@N05/).

There may be other things we changed that I am not recalling at the
moment, but those are the highlights.

We have never posted the patches because (as I understood it) no more
work was going into updating the CGIs because the new interface was
coming out. Since that isn't going to happen until well after 3.0,
maybe there is interest in the patch set.

If there is interest, I think we have them mostly broken out enough
that I could post them somewhere or to the list.
