Passive service checks via NSCA and required check command definition

Ralph.Grothe at itdz-berlin.de Ralph.Grothe at itdz-berlin.de
Tue Aug 9 10:04:15 CEST 2005


Hello Paul (et alii),

> Of course, you can use check_procs to monitor that crond is running.  But
> there are probably even more exotic scenarios where crond is running but
> you're not getting check results from the plugin (such as an error in
> the plugin which means it always returns OK, to name but one).


Though it may not be perfect or a catch-all,
I think running check_procs on important daemons is still better
than nothing.
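
As a minimal sketch of that idea, assuming the monitored host already runs NRPE and is reachable through the check-nrpe command used for the service further down: the NRPE command name check_crond, the service description crond-running and the plugin path are hypothetical, and the daemon's process name may differ by platform (e.g. cron rather than crond).

```cfg
# On the monitored host, in nrpe.cfg (the plugin path is an assumption):
command[check_crond]=/usr/local/nagios/libexec/check_procs -c 1: -C crond

# On the Nagios server (check_crond and crond-running are hypothetical names):
define service {
    use                         generic-service
    service_description         crond-running
    host_name                   samos
    check_command               check-nrpe!check_crond
}
```

The range "1:" makes check_procs return CRITICAL when fewer than one process with the command name crond is found.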


> 
> > Ah, firewalls, another opponent of my monitoring intentions.
> 
> That's the main reason NSCA exists - so you can install nagios on the
> firewall or on one of the machines behind the firewall instead of
> trying to persuade the firewall admin to open up various ports.
> Check_by_ssh is an alternative, if you can ssh onto the firewall or one
> of the machines behind the firewall.


That is what I did before discovering NSCA,
because the least we get is an open 22/tcp port.
I have no access to the firewalls, and they don't respond to ICMP
echo requests either,
so I have no way to check the firewalls' state.
The same is sadly true for most of our routers,
which hurts us badly because it keeps us from reliably
detecting and locating outages of LAN segments.
Traceroutes almost never work because most routers/packet filters
ignore ICMP, so the sender never receives a TIME_EXCEEDED or
PORT_UNREACHABLE.
I would really like to set up my Nagios with node dependencies
(sorry, I can't recall the term the docs use for describing LAN
hierarchies),
but with most routers being black holes I can't figure out how.
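
For reference, Nagios expresses the network hierarchy through the parents directive of a host definition; a host whose parents are all down is reported as UNREACHABLE rather than DOWN. A minimal sketch follows, in which the router name router-a, the generic-host template and the address are placeholders. It does not by itself help with routers that answer no checks at all, since the parent host still needs some check that works.

```cfg
# router-a, generic-host and the address are hypothetical placeholders.
define host {
    use                         generic-host
    host_name                   samos
    alias                       samos
    address                     192.0.2.10
    parents                     router-a
}
```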

> Dropping all ICMP may cause problems with MTU path discovery, but that's
> their problem.

Very true, but unfortunately it's not their problem alone,
because the users of our servers' services come and complain to us,
although the services are usually set up according to best
practices and running well.
The same goes for the servers' TCP/IP stack settings.
The firewall and black hole router folks, on the other hand, usually act like
black magicians
concealed in the background, often reluctant to track down the
cause or pass on sufficient information.
(You see I'm getting a bit carried away because I've had some bad
experiences with unsupported PMTU discovery.)


> Dropping pings means you can't check the firewall or
> the hosts behind it to see if they're up.  I believe nagios 2.x supports
> passive host checks but 1.x doesn't.  Well, one bit of the docs says it's
> impossible but another bit says it's possible but too complicated to
> explain.

Yes, as a user of the recent 2.0b version I also read in the docs
that passive host checks were possible.

> Feel free to write a "HOWTO" on setting up passive monitoring that gathers
> it all in one place.  Even if you can't persuade Ethan to add it to the
> docs I imagine nagiosexchange would accept it.

As a Nagios newbie I haven't yet thought about writing
supplementary documentation.
But I will be doing so for my colleagues at my workplace.
If it is of any use to them, maybe I can submit it to Nagios
Exchange later.


But now back to the (seemingly solved) problem I started this
thread for.

Yesterday evening I had another read of the documentation.

There, well hidden within the chapter on Distributed Monitoring,
I discovered what you mentioned last time as staleness
checking,
i.e. freshness checking.
http://nagios.sourceforge.net/docs/2_0/freshness.html

So I reconfigured my service, this time adding the two freshness
attributes.
Since I wrote my check script (in wise foresight?) so that it can
act as both
an NRPE and an NSCA check, I didn't even need to think much
about the check_command.
My service definition now reads like this:

define service {
    use                         generic-service
    service_description         samos-hpva-state
    host_name                   samos
    passive_checks_enabled      1
    active_checks_enabled       0
    check_freshness             1
    freshness_threshold         4000
    check_command               check-nrpe!check_hpva_INWO2
    contact_groups              samosadmin
}


Because my remote host's cron job runs the passive check hourly,
I decided to treat a check result as stale after 4000 secs (i.e.
freshness_threshold), which leaves 400 secs of slack over the
3600-sec interval.
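
When a result is older than freshness_threshold, Nagios runs the service's check_command as an active check, even though active checks are disabled for the service. With the definition above that means an NRPE call. An alternative, sketched here under assumptions, is a command that simply reports the result as missing. The command name service-is-stale is hypothetical, and $USER1$ is assumed to point at the plugin directory containing check_dummy.

```cfg
# service-is-stale is a hypothetical name; check_dummy returns the given state (2 = CRITICAL).
define command {
    command_name    service-is-stale
    command_line    $USER1$/check_dummy 2 "No passive result received within freshness_threshold"
}
```

Using service-is-stale as the service's check_command would turn a missing hourly result into a CRITICAL state instead of triggering a fresh check via NRPE; which behaviour is preferable depends on whether the NRPE port is reachable at all.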

Now I think I'm all set with this service.
Or do you still see any questionable settings?


Regards
--
Ralph

-------------------------------------------------------
SF.Net email is Sponsored by the Better Software Conference & EXPO
September 19-22, 2005 * San Francisco, CA * Development Lifecycle Practices
Agile & Plan-Driven Development * Managing Projects & Teams * Testing & QA
Security * Process Improvement & Measurement * http://www.sqe.com/bsce5sf
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




