More on notifications and reboot monitoring

Andreas Ericsson ae at op5.se
Mon Jan 10 12:21:28 CET 2005


Carson Gaspar wrote:
> 
> 
> --On Monday, January 10, 2005 12:36 AM -0600 Ethan Galstad 
> <nagios at nagios.org> wrote:
> 
>> On 7 Jan 2005 at 14:37, Andreas Ericsson wrote:
>>
>>
>> This isn't really a bug that you want to fix, as it will cause a lot
>> of not-so-great side effects.  When you schedule downtime for a host,
>> anything that happens during that time is fair game and is ignored
>> for purposes of notification (that's why it's in downtime).  When
>> downtime ends, Nagios will not notify about a problem that happened
>> during that downtime - that's what downtime was for.  If the problem
>> continues after downtime (i.e. an active check returns a problem),
>> then a notification can occur.
> 
> 
> But there will _be_ no checks (other than ping) after the downtime if 
> anything went wrong, because the host will still be down.
> 
>> I would use active checks as Andreas suggested for checking host
>> availability.  Passive-only checks might be troublesome to implement
>> reliably.
> 
> 
> I can't. They just don't scale to the number of hosts I need to monitor, 
> in Nagios's current incarnations (including 2.0 beta).
> 
> Ah well, I have a solution that works.

I'd be happy to hear the theory.

> I'm not thrilled with it, but it 
> handles every corner case I can think of. And will scale to a very large 
> number of hosts. Is anyone else here using Nagios to monitor >1000 
> hosts?

Yes.

> My target (right now) is 2k hosts per monitoring server, and a 
> total of about 12k hosts monitored.
> 

Just out of curiosity, where on earth (or off) is there an environment 
with 12,000 servers that need to be monitored?

> Once I've finished rolling this out, and have better performance data, 
> I'll be writing it up and will post a link here. With luck, my corporate 
> lords and masters will allow me to release the source code to my client 
> side monitor agent and my server side queueing server.

They would be stupid not to. The first rush of developers to a new 
project is usually enough to add those features you thought were nifty 
but just didn't care about, and to iron out those bugs that only happen 
when there are non-a-z characters in hostnames.

> Our config 
> management code has too many proprietary DB hooks to be useful anywhere 
> else.
> 

If the downtime scheduling thing plugs in to the proprietary DB you 
could simply add a wrapper so that others can hook up a more generalized 
solution later.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users