Why separate hosts and services

Chris Wilson chris at netservers.co.uk
Thu Apr 15 16:42:43 CEST 2004


Hi Andreas,

> I can think of at least two good reasons.
>
> 1) Problem localisation. When a service fails, someone has to fix it. If
> they don't know what machine it's on, the purpose of a monitoring system
> is soundly defeated.
>
> Of course, you could type in the host_address and host_alias in every
> service description, but keeping things the way they are really saves a
> lot of typing compared to that.

OK, that's a good point, but it could also be handled by having each
dependent service inherit the hostname of the service it depends on, unless
the dependent service overrides it.
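A minimal sketch of that inheritance rule, assuming a hypothetical `Service` record with an optional explicit host and an optional parent service (this is not Nagios's real object model). A service uses its own host if it has one; otherwise it walks up to the nearest ancestor that does:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Service:
    """Hypothetical service record, not Nagios's real object model."""
    name: str
    host: Optional[str] = None          # explicit host, if configured
    parent: Optional["Service"] = None  # service this one depends on


def resolve_host(service: Service) -> Optional[str]:
    """Return the service's own host, else the nearest ancestor's host."""
    current: Optional[Service] = service
    while current is not None:
        if current.host is not None:
            return current.host  # an explicit host overrides inheritance
        current = current.parent
    return None  # no host anywhere up the chain


ping = Service("PING", host="web01")
http = Service("HTTP", parent=ping)                  # inherits web01
https = Service("HTTPS", host="web02", parent=ping)  # overrides it
print(resolve_host(http), resolve_host(https))  # web01 web02
```

The host is only typed once, on the top service of each machine, which keeps most of the typing savings of the current host/service split.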

Another way would be to report the "path" through the "service tree" to
the failed service in the notification message. This might actually help
fault diagnosis. For example, if you received separate notifications that
four machines had gone down at the same time, and every one of them showed
a path through the same router, you might reasonably suspect the router.

With the current notification architecture, I don't think the
notifications carry enough information to do that. You have to look at the
status CGIs, or remember that the hosts are all behind the same router
(which doesn't scale well :-)

> 2) Notification suppression. If a service fails, Nagios immediately
> checks if the host is down. If it is, no more service checks will be
> scheduled until the host pops back up.

But we already do the same thing for dependent services, don't we? I don't
understand why the logic is different, and why the two mechanisms can't be
combined into a single, simple if-down-then-check-parent-service algorithm.
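A minimal sketch of such a combined algorithm, assuming hosts and services are both just nodes in one hypothetical parent map, and that `is_down` stands in for running a check on demand. On a failure, keep checking the parent while it is also down; only the highest node in that chain of failures is notified about:

```python
from typing import Callable, Dict, Optional

Parents = Dict[str, Optional[str]]  # hypothetical: node -> parent (None = root)


def root_cause(parents: Parents, name: str,
               is_down: Callable[[str], bool]) -> str:
    """From a failed node, keep checking the parent while it is also down.

    Returns the highest node in the unbroken chain of failures. Hosts and
    services are treated alike. Assumes the dependency graph has no cycles.
    """
    current = name
    parent = parents.get(current)
    while parent is not None and is_down(parent):  # check parent on demand
        current = parent
        parent = parents.get(current)
    return current


def should_notify(parents: Parents, name: str,
                  is_down: Callable[[str], bool]) -> bool:
    """Notify only if no parent failure explains this failure."""
    return root_cause(parents, name, is_down) == name


parents: Parents = {"router1": None, "web01": "router1", "HTTP": "web01"}
down = {"router1", "web01", "HTTP"}
print(root_cause(parents, "HTTP", down.__contains__))     # router1
print(should_notify(parents, "HTTP", down.__contains__))  # False
```

In this model, the existing "service fails, so check the host" rule is just the first step of the loop, with the host as the parent.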

> Check out (host- and service-) dependencies. It's all properly documented.

To my mind, service dependencies are not the same thing as meta-services
(which is what I'm talking about).

For example, let's assume we have three services, A, B and C. A is a
meta-service, and B and C "depend" on it. A does not have any check of its
own; its state is determined entirely by the states of its dependent
services. A is considered to have failed if and only if B and C have both
failed.
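A small sketch of how A's state could be derived from its children, using the standard Nagios plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). It assumes a child counts as "failed" only when it is CRITICAL, and that a meta-service with no children is UNKNOWN; both choices are assumptions for illustration:

```python
from enum import IntEnum
from typing import Iterable


class State(IntEnum):
    # Standard Nagios plugin return codes.
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


def meta_state(children: Iterable[State]) -> State:
    """A meta-service fails only if every child has failed."""
    states = list(children)
    if not states:
        return State.UNKNOWN  # nothing to aggregate
    if all(s == State.CRITICAL for s in states):  # assumption: failed == CRITICAL
        return State.CRITICAL
    return State.OK


print(meta_state([State.CRITICAL, State.CRITICAL]).name)  # CRITICAL
print(meta_state([State.CRITICAL, State.OK]).name)        # OK
```

Because `State` is an `IntEnum`, `int(meta_state(...))` is directly usable as a plugin exit code.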

This is not the same as B and C simply depending on A: if B and C both
fail, how does one make A fail automatically in Nagios? I don't think it's
possible, do you? I guess it might involve writing a plugin that checks the
status of all of A's children, but I don't know whether Nagios updates
status.sav quickly enough for the parent check to determine this reliably.
Do you know if it does?

Besides, such a plugin would have to parse both the configuration files
and status.sav to determine this, and neither is easy to do.

Cheers, Chris.
-- 
_  __ __     _
 / __/ / ,__(_)_  | Chris Wilson -- UNIX Firewall Lead Developer |
/ (_  ,\/ _/ /_ \ | NetServers.co.uk http://www.netservers.co.uk |
\__/_/_/_//_/___/ | 21 Signet Court, Cambridge, UK. 01223 576516 |







More information about the Developers mailing list