Reports only show data from a specific time period?

BOLLENGIER Eric ebollengier at sigma.fr
Fri Feb 6 14:23:19 CET 2004


Hi,

At this time (with Nagios 2a), trend.cgi doesn't handle this very well.
Let me explain more precisely what I mean:
my check_period is 8-19_5x7, but the trend report computes statistics on
24-24_7x7. The thing is, how can Nagios give statistics on periods it is
not even checking? The result, for now, is that if my service goes
CRITICAL right before the end of the check_period, Nagios assumes it
stays CRITICAL until the next check_period. That is obviously not what I
want: my statistics are wrong. Or am I missing something somewhere?
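
To make the distortion concrete, here is a small Python sketch (not
Nagios code). It assumes "8-19_5x7" means 08:00-19:00, Monday to Friday,
and uses a made-up outage: the last check on Friday evening fails and the
first check on Monday morning recovers. The function name and the
timestamps are hypothetical.

```python
from datetime import datetime, timedelta

# Assumption: "8-19_5x7" means 08:00-19:00, Monday to Friday.
CHECK_START_HOUR = 8
CHECK_END_HOUR = 19


def checked_seconds(start, end):
    """Seconds of [start, end) that fall inside the assumed check period."""
    total = 0.0
    day = start.replace(hour=0, minute=0, second=0, microsecond=0)
    while day < end:
        if day.weekday() < 5:  # Monday=0 .. Friday=4
            win_start = day + timedelta(hours=CHECK_START_HOUR)
            win_end = day + timedelta(hours=CHECK_END_HOUR)
            lo = max(start, win_start)
            hi = min(end, win_end)
            if hi > lo:
                total += (hi - lo).total_seconds()
        day += timedelta(days=1)
    return total


# Hypothetical outage: fails on the last Friday check, recovers Monday 08:00.
went_critical = datetime(2004, 2, 6, 18, 55)  # Friday
recovered = datetime(2004, 2, 9, 8, 0)        # Monday

wall_clock = (recovered - went_critical).total_seconds()
in_period = checked_seconds(went_critical, recovered)

print(f"counted by a 24x7 report: {wall_clock / 3600:.2f} h")
print(f"inside the check period:  {in_period / 60:.0f} min")
```

With these made-up numbers, a 24x7 report charges a bit over 61 hours of
CRITICAL, while only 5 minutes of it were actually observed inside the
check period.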

Wouldn't it be better to have the state set to "UNDETERMINED" outside of
the check_period?

As a workaround, we could restart the Nagios process every hour, so that
an undetermined state appears...

Or else, if we could use a "mask" timeperiod when computing the
statistics, it would probably solve all our problems: we could report
only what we want, to whom we want:
- working hours for our managers (or clients, for that matter)
- all day for us, as we do need to know if there is a potential problem
on our systems.
- etc.
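
Here is a rough Python sketch of what I mean by a mask. It is not how
the Nagios CGIs work; the mask dictionaries, function names and state
history are all hypothetical, made up to show that the same history
gives two different availability figures depending on the mask applied.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical masks: weekday (Monday=0) -> list of (start_hour, end_hour).
WORKING_HOURS = {d: [(8, 19)] for d in range(5)}  # managers / clients
ALL_DAY = {d: [(0, 24)] for d in range(7)}        # sysadmins


def masked_totals(intervals, mask):
    """Sum seconds per state, counting only time that falls inside the mask."""
    totals = defaultdict(float)
    for start, end, state in intervals:
        day = start.replace(hour=0, minute=0, second=0, microsecond=0)
        while day < end:
            for h0, h1 in mask.get(day.weekday(), []):
                lo = max(start, day + timedelta(hours=h0))
                hi = min(end, day + timedelta(hours=h1))
                if hi > lo:
                    totals[state] += (hi - lo).total_seconds()
            day += timedelta(days=1)
    return dict(totals)


def availability(intervals, mask):
    """Percentage of masked time spent in OK, or None if the mask covers nothing."""
    totals = masked_totals(intervals, mask)
    covered = sum(totals.values())
    if not covered:
        return None
    return 100.0 * totals.get("OK", 0.0) / covered


# Made-up state history for one week (Monday 2004-02-02 to Monday 2004-02-09).
history = [
    (datetime(2004, 2, 2, 0, 0), datetime(2004, 2, 6, 18, 55), "OK"),
    (datetime(2004, 2, 6, 18, 55), datetime(2004, 2, 9, 0, 0), "CRITICAL"),
]

print(f"working hours: {availability(history, WORKING_HOURS):.2f} %")
print(f"24x7:          {availability(history, ALL_DAY):.2f} %")
```

For this invented week the working-hours view comes out at about 99.85 %
and the 24x7 view at about 68.40 %, from the same data. Nothing is thrown
away; each audience just gets the view that matches its mask.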

Thanks in advance,
Regards,

Eric

On Fri, 2004-02-06 at 01:54, Paul L. Allen wrote:

> Hi Andre 
> 
> Andre Bergei writes: 
> 
> > Yes, that is the idea. The reason the managers want this is to prove
> > to the customer that they had uptime during the service hours. The whole
> > point is that they don't care what happens at night; that is our problem,
> > the sysadmins. If there is downtime during service hours, there will be
> > economic penalties if total downtime doesn't meet the demands of the
> > SLA.
> 
> I understand the reasoning, I just dispute the logic behind it. 
> 
> I know that ADSL lines in the UK are unreliable - I know it for a fact
> because Nagios proves it to me.  Most of the problems occur out of
> working hours (because we consider 0800-1830 to be working hours).  Our
> clients want to know about them because they can claim compensation from
> their ADSL suppliers even though the outages didn't affect actual
> operation. 
> 
> I know that if I have a host or services which are unreliable out of
> hours, when the workload is minimal, and which are not caused by power
> failures or ADSL outages that there are serious problems. One of our
> clients had a very bad power feed that caused eventual physical disk
> corruption, and without statistics showing that many of the problems
> occurred out of working hours would have had a harder time claiming from
> the power company. 
> 
> But, in the end, our clients want to know that we are being honest with
> them.  That we are not hiding problems that occur out of working hours
> and pretending everything is perfect because they have not *yet* had
> problems during working hours.  They want to know what is happening out
> of working hours to prove that we're not hiding anything from them
> (they would want such proof from whoever provided their infrastructure). 
> 
> It seems to me that what you are arguing for is that the statistics
> CGIs should take into account working hours for the contacts who view
> them.  So that if you view the stats you see the whole picture and if
> some client whose working hours are 7am-7pm views them he or she sees
> only the problems that occurred in that interval.  That way you could
> define user x-working-hours who only saw the limited information and
> x-overall who saw everything (which would allow customer X to see both
> views and know that you met your SLA but that there are problems outside
> of working hours which might eventually impact regular operations). 
> 
> > Why would we want to suppress information?
> > This is solved by having techy reports for the techies, and
> > boring availability reports for the managers. The right information to
> > the right people, a good thing!
> 
> Throwing away information is ALWAYS a bad idea.  Providing two different
> views onto the information ("this is the availability when you actually
> needed it" and "this is the overall availability whether you needed it
> or not") is a good thing. 
> 
> > Like it or not, in the "wonderful" world of outsourcing, things like
> > service level agreements become more and more common; in fact, customers
> > _demand_ them.
> 
> As I just explained, our customers not only want SLAs during periods when
> the service is critical to them, they also want to know about problems
> outside of those hours for various reasons.  If nothing else, giving
> them the 24x7 info shows them that we're being honest with them.  If
> every host they have goes down within roughly the same time interval
> we can point out that it must have been an ADSL failure or a power
> failure; if one host alone has problems they know it is probably our
> fault.  Whether it happens out of working hours or not, they use that
> information to evaluate our service level against that of their Internet
> feed and power feed.  If I can't be honest with our customers, I don't
> want to work here... 

-- 
Eric BOLLENGIER, Administrateur Système - Poste 1325
SIGMA Informatique http://www.sigma.fr
3 rue Newton, BP 4127, 44241 La Chapelle sur Erdre Cedex
tel : 02.40.37.14.00