How many hosts and services are you monitoring with Nagios?

RichTea mail at catsnest.co.uk
Fri May 18 11:56:08 CEST 2012


Hi All,

We have 5393 hosts and 69452 services across 14 servers.

The monitoring is spread unevenly across the servers because most of them
sit in specific customer environments.
We use Puppet (formerly rsync) to distribute a standard set of "global"
config (host/service templates, etc.) to all servers, and each server
has its own local config (hosts/services).
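With a Puppet-managed "global" template set and per-server local objects, one thing worth catching before a reload is a local host or service that `use`s a template the global set doesn't ship. A minimal sketch, assuming both trees can be read as plain text and rely on standard Nagios object syntax (templates declared with `name`, referenced with `use`). The regex approach and the `missing_templates` name are illustrative, and `nagios -v` against the real main config file remains the authoritative check.

```python
import re
from typing import Set

# Standard Nagios object syntax: templates are declared with "name" and
# referenced with "use" (which may list several templates, comma-separated).
NAME_RE = re.compile(r"^[ \t]*name[ \t]+(\S+)", re.MULTILINE)
USE_RE = re.compile(r"^[ \t]*use[ \t]+(\S+)", re.MULTILINE)


def missing_templates(global_cfg: str, local_cfg: str) -> Set[str]:
    """Return templates referenced by local objects but defined nowhere."""
    defined = set(NAME_RE.findall(global_cfg)) | set(NAME_RE.findall(local_cfg))
    used: Set[str] = set()
    for value in USE_RE.findall(local_cfg):
        value = value.split(";", 1)[0]  # drop a trailing ";comment" glued on
        used.update(t for t in value.split(",") if t)
    return used - defined


# Hypothetical example: the local object relies on a template that the
# global set does not (yet) provide.
GLOBAL_CFG = """
define service {
    name                generic-service
    register            0
}
"""
LOCAL_CFG = """
define service {
    use                 generic-service,site-http
    host_name           web01
    service_description HTTP
}
"""
print(missing_templates(GLOBAL_CFG, LOCAL_CFG))
```

Running it as part of the Puppet-driven deploy on each server would flag the mismatch before Nagios itself refuses to start.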

We use SNMPTT passive checks for the networking kit, NRPE for most *nix
hosts (some use SNMP checks) and NSClient++ on Microsoft servers.
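The SNMPTT passive checks ultimately reach Nagios as a passive service result written to its external command file. A sketch of that last step, assuming the standard `PROCESS_SERVICE_CHECK_RESULT` external command format. The command-file path is the usual default for a Nagios Core source install and may differ on your system, and the host/service names below are made up. This is not SNMPTT's own code, which typically calls a wrapper script from an `EXEC` line in its config.

```python
import time
from typing import Optional

# Return codes from the Nagios plugin API.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumption: default command file location for a Nagios Core source install.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def passive_service_result(host: str, service: str, code: int, output: str,
                           timestamp: Optional[int] = None) -> str:
    """Format one PROCESS_SERVICE_CHECK_RESULT external command line."""
    if code not in (OK, WARNING, CRITICAL, UNKNOWN):
        raise ValueError(f"invalid return code: {code}")
    ts = int(time.time()) if timestamp is None else timestamp
    # Nagios reads commands line by line, so flatten multi-line output.
    flat = " ".join(output.splitlines())
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{code};{flat}\n"


def submit(line: str, command_file: str = COMMAND_FILE) -> None:
    """Write one command line to Nagios' external command pipe (a FIFO)."""
    # On a named pipe, "w" does not truncate anything; it just opens for writing.
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

For example, `submit(passive_service_result("core-sw1", "Interface Gi0/1", CRITICAL, "link down"))` would report a hypothetical switch port as CRITICAL on the local Nagios instance.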

It is all based on standard Nagios Core and tied together with a horrid
POC (proof-of-concept) integration software / MQ.

I did some testing with Merlin, but it was far from stable at that point,
and we have been waiting for Nagios 3.4.x to be released. So possibly some
time soon we will add Merlin.


--
Ritchie
<--Time flies like an arrow; fruit flies like a banana.  -->


On Fri, May 18, 2012 at 8:33 AM, Simone Felici <s.felici at mclink.eu> wrote:

>
> Impressive :)
> We're monitoring ~2000 hosts and ~10000 services, every 5 minutes.
> Architecture used: Opsview Community Edition, the last free version before
> the distributed version became commercial :/
> Two central servers (active/standby, DRBD) act as the single point for
> management and collect all passive checks executed by the slave servers.
> Performance data is saved into RRD files as well as on an external BIG
> database server. The configuration resides on a clustered MySQL
> installation (DRBD).
> 4 slave "datacenter installations", with 2 servers per "datacenter" in
> active/active load balancing.
> Trap handling is supported on all servers with rule logic.
> Pros:
> - Open Source: at least up to version 3, which is what our setup uses. A
>   simple single instance with fewer functions is still available in
>   version 4.
> - Easy to manage: the purpose was to build the monitoring system and then
>   hand its management over to people with fewer technical skills
> - Distributed setup
> - RBAC
> Disadvantages:
> - No longer Open Source: see above
> - The central server suffers on CPU because of the GUI implementation and
>   other background jobs
> - Not all Nagios parameters are editable as we would like, e.g. you cannot
>   give the same check different intervals without re-creating it. Think of
>   an HTTP service on servers with different loads and the need to extend
>   the retries on heavily loaded servers: there is no way except creating
>   separate "HTTP" and "HTTP High Load" services.
> Maybe there are more pros (and disadvantages), but this isn't the right
> place for them.
> BTW, I look forward to this solution; it seems interesting!
>
> Simon
>
> On 17/05/2012 16:43, Max Schubert wrote:
> > Hi,
> >
> > I like it when people periodically post numbers and architecture
> > summaries. I am guessing that, with the distributed frameworks now
> > available for Nagios, this thread might see bigger numbers than past
> > threads have.
> >
> > With our custom-built distributed Nagios-based monitoring system, we
> > are currently monitoring 18000+ hosts and 100k+ active services (plus
> > plenty of passive services), all every 5 minutes. We also collect
> > performance data from every check and pass it on to a highly
> > distributed and scalable time-series data warehouse that another team
> > in our organization has built (which is why we have the 5-minute
> > interval requirement).
> >
> > We also do trap ingest using SNMPTT with a few custom mods, but I'm not
> > going to include those numbers, as trap handling has never required the
> > optimizations that polling has.
> >
> > This isn't a monolithic instance: we have 6 projects using instances of
> > our distributed Nagios-based software, called Racon (soon my manager
> > will let our team package it as open source, or so I hear at least).
> > We built it on core Nagios with a custom database layer based on a
> > very, very early version of Merlin's database abstraction layer (thank
> > you Andreas!). We have a custom client/server network-based
> > notification framework in use (we will release that as well) along
> > with a custom NEB/Perl-based client/server framework (also releasable,
> > just need time scheduled) for sending and processing performance data.
> > The performance and notification frameworks are both horizontally
> > scalable and network fault tolerant.
> >
> > What kinds of numbers of hosts and services are you all monitoring?
> > Which add-ons / distributed frameworks are you using?
>
>
> ------------------------------------------------------------------------------
> Live Security Virtual Conference
> Exclusive live event will cover all the ways today's security and
> threat landscape has changed and how IT managers can respond. Discussions
> will include endpoint security, mobile security and the latest in malware
> threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when
> reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
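To compare the setups in this thread, it helps to turn "every 5 minutes" into a sustained check rate. A back-of-the-envelope sketch, assuming checks are spread evenly over the interval and ignoring retries, passive results and trap traffic. Max's figures are lower bounds ("18000+", "100k+"), Simone's are approximate, and Ritchie's message doesn't state an interval, so his numbers are left out.

```python
def checks_per_second(objects: int, interval_minutes: float) -> float:
    """Average scheduled check rate if each object is checked once per interval."""
    return objects / (interval_minutes * 60)


# Figures quoted in this thread.
max_total = checks_per_second(18_000, 5) + checks_per_second(100_000, 5)
simone_total = checks_per_second(2_000 + 10_000, 5)

print(f"Max: ~{max_total:.0f} checks/s")        # Max: ~393 checks/s
print(f"Simone: ~{simone_total:.0f} checks/s")  # Simone: ~40 checks/s
```

Roughly a tenfold difference in sustained scheduling load, before counting performance-data handling on every one of those results.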
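In plain Nagios Core object configuration, the "HTTP" vs "HTTP High Load" case Simone describes can be handled without inventing a second service name: several `define service` blocks may share the same `service_description` as long as each host ends up with only one of them. A sketch that generates such blocks per load class. The host names, class names, the `generic-service` template and the `check_http` command (both from the Nagios sample configuration) are assumptions, and this says nothing about what Opsview's UI permits.

```python
from typing import Dict, List, Tuple

# Hypothetical load classes: class name -> (hosts, max_check_attempts).
LOAD_CLASSES: Dict[str, Tuple[List[str], int]] = {
    "normal": (["web01", "web02"], 3),
    "high-load": (["web03"], 10),
}


def http_service_blocks(classes: Dict[str, Tuple[List[str], int]]) -> str:
    """Emit one Nagios 'define service' block per load class.

    Every block keeps the same service_description, so each host still has
    a single service called "HTTP"; only the retry behaviour differs.
    """
    blocks = []
    for name, (hosts, attempts) in classes.items():
        blocks.append("\n".join([
            f"# HTTP, {name} hosts",
            "define service {",
            "    use                 generic-service",
            f"    host_name           {','.join(hosts)}",
            "    service_description HTTP",
            "    check_command       check_http",
            f"    max_check_attempts  {attempts}",
            "}",
        ]))
    return "\n\n".join(blocks)


print(http_service_blocks(LOAD_CLASSES))
```

Moving a server between classes is then a one-line change in the input rather than a new service with a different name.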
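Both Simone's RRD/database setup and Max's performance-data framework start from the performance-data string a plugin returns after each check. A minimal parser, assuming the format from the Nagios plugin development guidelines (`'label'=value[UOM];[warn];[crit];[min];[max]`, space-separated). Thresholds are kept as raw strings because they may be ranges such as `10:20` or `@10:20`. The `PerfValue` and `parse_perfdata` names are illustrative; this is not the Racon or Opsview code.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

PERF_RE = re.compile(
    r"(?:'(?P<qlabel>[^']+)'|(?P<label>[^\s=']+))"  # label, optionally quoted
    r"=(?P<value>-?\d+(?:\.\d+)?)(?P<uom>[a-zA-Z%]*)"  # value and unit
    r"(?P<rest>(?:;[^;\s]*){0,4})"  # ;warn;crit;min;max, all optional
)


@dataclass
class PerfValue:
    label: str
    value: float
    uom: str
    warn: Optional[str] = None
    crit: Optional[str] = None
    minimum: Optional[str] = None
    maximum: Optional[str] = None


def parse_perfdata(perfdata: str) -> List[PerfValue]:
    """Split a plugin's performance-data string into labelled values."""
    results = []
    for m in PERF_RE.finditer(perfdata):
        # Drop the empty field before the first ';', then pad to 4 fields;
        # empty fields (e.g. ";;90") become None.
        fields = m.group("rest").split(";")[1:]
        padded = [f or None for f in fields] + [None] * (4 - len(fields))
        results.append(PerfValue(
            label=m.group("qlabel") or m.group("label"),
            value=float(m.group("value")),
            uom=m.group("uom"),
            warn=padded[0], crit=padded[1],
            minimum=padded[2], maximum=padded[3],
        ))
    return results


print(parse_perfdata("time=0.023s;;;0.000000 size=3456B;;;0"))
```

Each `PerfValue` maps naturally onto one RRD data source or one time-series point, whichever backend receives it.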