Large scale network monitoring limits with nagios

Noah Leaman noah at mac.com
Thu Mar 11 13:52:00 CET 2004
Hope it's OK to cross-post to both groups on this matter...

Using the concept of one service per up/down trap for each network
interface, I ran a quick test by creating a very simple set of Nagios
configs with about 8000 PASSIVE service checks and no active service
checks. Of course there were no scheduling problems, but the CGIs all
slowed to a snail's pace. In my setup (Nagios 1.2, dual G4 first-gen
Xserve) it takes about 30 secs to display the Status Summary page.
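
For illustration, the one-service-per-interface layout could be
generated with something like the sketch below. The interface
inventory, the "trap-passive" template name and the "TRAP_LINK_" naming
scheme are hypothetical, and the directive names should be checked
against the documentation for the Nagios version in use. Required
directives such as contacts and check periods are assumed to come from
the template.

```python
# Sketch: emit one passive-only Nagios service definition per interface.
# The inventory below is hypothetical; in practice it would come from
# whatever discovers which interfaces are up/up.
INTERFACES = [
    ("core-sw1", "Gi0/1"),
    ("core-sw1", "Gi0/2"),
    ("edge-rtr1", "Se0/0"),
]

# "trap-passive" is an assumed service template that supplies contacts,
# periods and the other required directives.
SERVICE_TEMPLATE = """define service{{
    use                     trap-passive
    host_name               {host}
    service_description     {description}
    check_command           check_dummy
    active_checks_enabled   0
    passive_checks_enabled  1
    }}
"""


def service_description(interface):
    """Build a service name from an interface name (hypothetical scheme)."""
    return "TRAP_LINK_" + interface.replace("/", "_")


def render_services(interfaces):
    """Return the config text for all (host, interface) pairs."""
    return "\n".join(
        SERVICE_TEMPLATE.format(host=host, description=service_description(ifname))
        for host, ifname in interfaces
    )


if __name__ == "__main__":
    print(render_services(INTERFACES))
```

Regenerating the file from the current inventory keeps the service
count tied to the interfaces actually in use, but it does nothing about
the CGI slowdown described above.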

Of course that config setup isn't the actual production plan...

I enabled the closer to real-world configs:

552 check_traffic (2 snmpgets running every 10 minutes per service 
check storing to an RRD)
295 check_ping (number of locally monitored hosts)
8389 check_dummy (mostly the up/down trap services; about 100 are
passive services fed by 2 other distributed Nagios servers doing pings
and check_traffic checks)

... So 9236 services altogether, but this is really just a small
subset of what I would like to be able to do. The plan is to throw
hardware at it to spread out the real work being done (i.e. the active
checks).
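
As a quick check of the numbers above, the totals and the implied SNMP
load can be worked out as follows. This assumes each check_traffic
service runs exactly once per 10-minute interval.

```python
# Service counts taken from the list above.
services = {"check_traffic": 552, "check_ping": 295, "check_dummy": 8389}

total = sum(services.values())

# Assumption: every check_traffic service runs once per 10 minutes and
# issues 2 snmpgets each time.
SNMPGETS_PER_CHECK = 2
INTERVAL_SECONDS = 10 * 60

snmpgets_per_interval = services["check_traffic"] * SNMPGETS_PER_CHECK
snmpgets_per_second = snmpgets_per_interval / INTERVAL_SECONDS

# Fraction of all services that are check_dummy (passive) services.
passive_share = services["check_dummy"] / total

print("total services:", total)
print("snmpgets per 10 minutes:", snmpgets_per_interval)
print("snmpgets per second: %.2f" % snmpgets_per_second)
print("passive share: %.0f%%" % (passive_share * 100))
```

Under that assumption the active load is under 2 snmpgets per second,
and roughly nine out of ten services are passive.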

But with just this setup, a single CGI takes up an entire CPU, often
for a few minutes at a time... and the plan was to have a good
handful of GUI users (5 or so at a time)... it's just about unusable
with one GUI user.

How do I monitor traps for hundreds of network hosts and tens of
thousands of different interfaces, each of which could generate up/down
traps along with other traps? I tried setting up a single "catch-all"
trap service per host, but notification would need to occur when going
from an OK to another OK (with a different output). Shouldn't this
work with is_volatile on and stalking_options set to o,w,u,c? (Every
test I've done to get this working from OK to OK has failed... but
maybe I missed something.)
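
For reference, a trap handler feeding a "catch-all" service like this
would typically submit each trap as a passive result through the Nagios
external command file. The sketch below only shows that submission
step; the command file path, host name and service name are
assumptions, and it does not by itself settle the OK-to-OK notification
question.

```python
import time

# Assumed path; the real location is set by command_file in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"

# Standard plugin return codes.
STATE_CODES = {"OK": 0, "WARNING": 1, "CRITICAL": 2, "UNKNOWN": 3}


def format_passive_result(host, service, state, output, timestamp=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if timestamp is None:
        timestamp = int(time.time())
    # Each command must fit on one line, so flatten any newlines.
    output = output.replace("\n", " ")
    return "[%d] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%d;%s\n" % (
        timestamp, host, service, STATE_CODES[state], output)


def submit(line, command_file=COMMAND_FILE):
    """Write a command line to the Nagios command file (a named pipe)."""
    with open(command_file, "w") as fifo:
        fifo.write(line)


if __name__ == "__main__":
    # Hypothetical host and service names; printed rather than submitted.
    print(format_passive_result("core-sw1", "TRAP", "OK",
                                "linkUp on Gi0/1"), end="")
```

Two traps on different interfaces of the same host would both arrive
as OK results on the one "TRAP" service, differing only in the output
text, which is exactly the OK-to-OK case described above.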

So the higher-level question here is: am I in over my head in what I
am trying to do with Nagios, or in how I am trying to do it? After
tackling the network monitoring needs, the plan was to then start the
server monitoring (around 1000 servers of many platforms).

Any helpful guidance?

-- 
Noah


On Wednesday, March 10, 2004, at 06:51  PM, Noah Leaman wrote:

> I have over 70,000 interfaces/ports (just the up/up ones) for which I
> could receive linkDown and linkUp traps. And this is just a
> sampling of hosts on our network to pilot nagios to see if it can do 
> what we want. Doesn't it seem a little crazy to have to deal with that 
> many services even if they are passive? And this is just linkDown and 
> linkUp. What about all other possible traps that could be received?
>
> -- 
> Noah
>
>
> On Friday, March 5, 2004, at 01:15  AM, Jim Mozley wrote:
>
>> Noah Leaman wrote:
>>
>>> How do you all address the issue of trap monitoring when you want 
>>> notifications for them?
>>
>> I have done something similar with interfaces, the only way I know is 
>> to define each interface as a service. I realise this is potentially 
>> a lot of services. We do this on core network device interfaces, but 
>> only define services for interfaces that are in use. This is an 
>> automated process so as interfaces are activated/deactivated they are 
>> added or removed from the Nagios configuration files. As the only 
>> alerts are passive ones for these services, it isn't as though one is 
>> introducing something like a vast increase in active checks.
>>
>> HTH,
>>
>> Jim Mozley
>>
>>
>> -------------------------------------------------------
>> This SF.Net email is sponsored by: IBM Linux Tutorials
>> Free Linux tutorial presented by Daniel Robbins, President and CEO of
>> GenToo technologies. Learn everything from fundamentals to system
>> administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
>> _______________________________________________
>> Nagios-users mailing list
>> Nagios-users at lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>> ::: Please include Nagios version, plugin version (-v) and OS when 
>> reporting any issue. ::: Messages without supporting info will risk 
>> being sent to /dev/null
>>
>
>
>
>
  
  


