How to handle variable periods of relevance for passively monitored services?

Kevin Keane subscription at kkeane.com
Sun Sep 6 09:00:44 CEST 2009


I have a similar situation. To address it, I simply turned off freshness 
checking on the workstation service checks. So when no check 
results come in, Nagios simply continues displaying the last result from 
when the monitored workstation was turned on. It's not 100% foolproof 
because sometimes my checks will fail due to the shutdown process. But 
for me, it's good enough.
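
As a minimal sketch of what turning off freshness checking could look like
for one passively monitored service: the host name and service description
below are hypothetical placeholders, and the generic-service template and
check-host-alive command are assumed to exist as in the Nagios sample
configuration.

```
# Hypothetical names: ws-operator-01 and "Client Software Instances"
# are placeholders for illustration only.
define service {
    use                     generic-service            ; assumed template
    host_name               ws-operator-01
    service_description     Client Software Instances
    active_checks_enabled   0    ; results are pushed in passively
    passive_checks_enabled  1
    check_freshness         0    ; keep the last result instead of flagging it stale
    check_command           check-host-alive           ; required placeholder, never run here
}
```

With check_freshness set to 0, Nagios does not act on a missing result, so
the service keeps whatever state was last submitted.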

You can also use a host check to determine if the machine is turned on. 
If the host check reports the host as DOWN, Nagios suppresses 
notifications for that host's services.
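
A host check along those lines might be sketched as follows. The host name,
address, and the workstation-template name are hypothetical; the template is
assumed to supply the remaining required host directives, and
check-host-alive is assumed to be defined as in the sample configuration.

```
# Hypothetical host definition for illustration only.
define host {
    use                     workstation-template   ; hypothetical template
    host_name               ws-operator-01
    alias                   Operator workstation 01
    address                 192.0.2.10             ; documentation-range address
    check_command           check-host-alive       ; active ping-style check
    max_check_attempts      3
}
```

This only works if the central server can actually reach the workstation.
Where a firewall prevents that, the host state would have to be submitted
passively as well.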

Anthony wrote:
> Hi all,
>
> At work we have a Nagios setup with about 40 hosts and 150 services.
>
> Of those 40 hosts, about 15 are workstations which operators use 
> dependent on their shift and whilst an operator is working on a given 
> machine performing particular tasks, specific software needs to be 
> running (Third party client software etc. that feeds data into the 
> systems we use). I monitor things like how many instances are running 
> and if the particular piece of software is generating the expected 
> output, whether expected services are running, if there's enough free 
> disk space and CPU utilisation, etc.
>
> If an operator accidentally starts multiple copies of some of the 
> software, or a phantom copy is running in the background (occasionally 
> GUIs crash leaving background processes running causing all sorts of 
> gremlins), it's handy to know that they're running outside of normal 
> bounds, and it allows me to help diagnose any problems. That or if they're 
> about to run out of disk space due to some rogue logging process.
>
> On the days where a given operator is not working, their particular 
> system may be switched off or if it's on, certain services may not 
> need to be running.
>
> To overcome firewall issues (the systems are spread across several 
> states) they all tend to push passive test results back to the central 
> Nagios server.
>
> This means, on any one day, it's likely that a particular host is 
> either switched off or not running all the services that it would be 
> running on an active day, as its operator is not rostered on that day... 
> and I get a sea of red in Nagios which leads to Chernobyl issues (the 
> important alarms not standing out above the ones that are "ok to be 
> critical").
>
> Now, service check time periods only apply to active service checks, 
> not passive service checks.
>
> How does one get around this situation of variable periods of 
> relevance for passively monitored services?
>
> My thoughts were that perhaps I needed to create an additional web 
> interface for operators to say when they were using a particular 
> machine and what for, and behind the scenes this would send the 
> relevant external commands to Nagios to do things like setting an OK 
> state and disabling further passive checks across the host.. or doing 
> this to individual services... but I wondered if there was a cleaner 
> way to do this?
>
> That or perhaps creating a service controlled by users 
> which indicates whether they are active or not, and then dependent on 
> the state of this service, not caring about the state of "dependent 
> services".
>
> I know generally Nagios is geared towards monitoring the traditional 
> concept of a server and service - always on 24x7 or at otherwise 
> fixed, inflexible intervals.. but unfortunately the environment I work 
> in is presently a lot more dynamic than that.


-- 
Kevin Keane
Owner
The NetTech
Find the Uncommon: Expert Solutions for a Network You Never Have to Think About

Office: 866-642-7116
http://www.4nettech.com


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null