spending time with nagios to finetune

Marc Powell marc at ena.com
Sat Feb 17 00:04:14 CET 2007



> -----Original Message-----
> From: nagios-users-bounces at lists.sourceforge.net [mailto:nagios-users-
> bounces at lists.sourceforge.net] On Behalf Of Sjaak Nabuurs
> Sent: Thursday, February 15, 2007 5:02 PM
> To: nagios-users at lists.sourceforge.net
> Subject: [Nagios-users] spending time with nagios to finetune
> 
> Hello nagios users.
> 
> We run about 50 hosts with approx 400 services to check (webservice
> apache/mail/mysql/mysql_replica/ftp/load/procs/disks/mailq/temp/backups).
> Spending a lot of time to fine tune nagios, every day nearly 15-60 min,
> still we have 5 ~ 20 notifications a day.

That really sounds excessive. We monitor hundreds of hosts and thousands
of routers and have only a handful of alerts a week. Are you taking
the time to correct the error conditions so that they don't happen
again, or happen only rarely? That's the biggest step you can take in
reducing the number of alerts you receive. The second is to be sure that
you're only monitoring things that really are important.

 
> I have some questions, how you run nagios.
> How much time do you spend a day or week to run nagios ?

Measured in minutes or less, presuming you mean tuning and similar
upkeep. I rarely have to touch the system at all. 95% of our config
generation is automated out of our customer databases, so there's little
that I need to do there. There are a handful of special-case hosts whose
monitoring I have to modify every once in a while, but that's very rare.

> How many notifications a day do you have ?

A handful a week, mostly due to Telco circuit issues that we have little
control over (we monitor >5000 circuits).

> Must every notification be a real alert, or do you also have 80%
> notifications just for an annoying mobile telephone ring.

I'm not sure I follow this one, but it sounds like it would depend on
the criticality of the service that you're monitoring. For our circuit
notifications, we don't send email alerts if there is at least one
circuit up at a multi-circuit site. Same with our load-balanced servers.
If they're all down, or there's no redundancy at all, then that's
usually alertable.
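
The rule above (alert only when every redundant member is down) can be
sketched as a small aggregation function. This is an illustration in
Python, not the actual check in use here: the function name and the
choice to report a partially degraded site as WARNING are assumptions.
The numeric states are the standard Nagios plugin exit codes.

```python
# Hypothetical sketch: collapse several member states into one alertable state.
# Exit codes follow the standard Nagios plugin convention.

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def site_state(circuit_up):
    """Map a list of per-circuit booleans (True = up) to a Nagios state."""
    if not circuit_up:
        return UNKNOWN          # nothing to evaluate
    up = sum(1 for is_up in circuit_up if is_up)
    if up == 0:
        return CRITICAL         # every circuit down, or the only one is down
    if up < len(circuit_up):
        return WARNING          # degraded, but the site is still reachable
    return OK


if __name__ == "__main__":
    print(site_state([True, False]))   # one of two circuits up
    print(site_state([False]))         # single circuit, no redundancy
```

With a service built on a check like this, setting notification_options
to c,r on the service would keep WARNING states visible in the web
interface without sending mail for them.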

> Nagios helped me a lot with uptime on my servers.
> No diskfull errors, no unknown downtime anymore and more happy customers.
> It's better to spend time on nagios than fixing problems.

Nagios can help a great deal in determining where you aren't being
proactive enough in preventing problems. As you work through those cases
you'll find that your systems are more reliable and you'll have fewer
and fewer notifications.

--
Marc
