More on notifications and reboot monitoring

Greg Vickers g.vickers at qut.edu.au
Tue Jan 11 00:19:49 CET 2005


Carson Gaspar wrote:
> 
> But there will _be_ no checks (other than ping) after the downtime if 
> anything went wrong, because the host will still be down.

If a host is still down after its downtime is over, a host notification 
will tell you so, provided you are using active checks: set one up for 
the DOWN and UNREACHABLE states. When you receive that notification you 
will know that none of the services on that host are available.
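
As a rough sketch of what I mean, assuming Nagios 2.0-style object 
definitions (the host name, address and contact group below are made up 
for illustration, and check-host-alive and 24x7 are just the names used 
in the sample configs):

```cfg
define host{
        host_name               web01           ; hypothetical host
        alias                   Example web server
        address                 192.0.2.10      ; documentation address
        check_command           check-host-alive
        max_check_attempts      3
        notifications_enabled   1
        notification_interval   30
        notification_period     24x7
        notification_options    d,u,r           ; DOWN, UNREACHABLE, recovery
        contact_groups          admins          ; hypothetical contact group
        }
```

In 1.x the contact groups for a host come from its hostgroup rather 
than the host definition, so adjust to suit your version.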

>> I would use active checks as Andreas suggested for checking host
>> availability.  Passive-only checks might be troublesome to implement
>> reliably.
> 
> I can't. They just don't scale to the number of hosts I need to monitor, 
> in Nagios's current incarnations (including 2.0 beta).

We are monitoring 7k services and 1.5k hosts using Nagios 1.1. Sure, it's 
slow as a dog (>60 sec to bring up the CGI web pages on a dual 2GHz Xeon 
with 1GB RAM and SCSI RAID), but we're about to upgrade to 2.0 and I'm 
expecting a fairly massive decrease in response time for the CGI pages. 
Nagios has scaled successfully for us: the monitoring process has low 
latency, most checks get performed within 10 sec of when they were 
scheduled, and notifications go out lickety-split. The only slow part is 
bringing up the web page, and who wants to slog through 7k services in a 
web page? (Yes, we use active checks on all our hosts.)

> Ah well, I have a solution that works. I'm not thrilled with it, but it 
> handles every corner case I can think of. And will scale to a very large 
> number of hosts. Is anyone else here using Nagios to monitor >1000 
> hosts? My target (right now) is 2k hosts per monitoring server, and a 
> total of about 12k hosts monitored.

I take it you are in a data centre or some such business area. Demarcate 
areas (by client or whatever) and set up distributed Nagios boxes to 
monitor sub-areas.
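
For what it's worth, the distributed monitoring setup described in the 
Nagios documentation comes down to a handful of main config options. A 
rough sketch, assuming each sub-area box forwards its results to a 
central server as passive checks (submit_check_result is a command you 
define yourself, typically a wrapper around send_nsca; this may or may 
not suit your layout):

```cfg
# nagios.cfg on each distributed (sub-area) server
# Leave notifications to the central server
enable_notifications=0
# Run ocsp_command after every service check
obsess_over_services=1
# Your own command definition that forwards the result
ocsp_command=submit_check_result

# nagios.cfg on the central server
# Required so submitted results are read from the command file
check_external_commands=1
accept_passive_service_checks=1
```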

-- 
Greg Vickers
Security Engineer
Network Services
Information Technology Services
Queensland University of Technology

email: g.vickers at qut.edu.au
phone: (07) 3864 9536

CIROS code: 00213J

