Services on down hosts

Andreas Ericsson ae at op5.se
Thu Jun 12 00:24:39 CEST 2008


Bringing this back on-list. I'd appreciate it if you could use
"reply-to-all" instead of just "reply", as some of this discussion
is probably of interest to the rest of the community as well.
Thanks.

Jay R. Ashworth wrote:
> On Wed, Jun 11, 2008 at 05:54:48PM +0200, Andreas Ericsson wrote:
>>> Well, since (to take an example), CRITICAL load means "a load average
>>> over 8" (on my 8-core Opteron), and we don't *know* the load average if
>>> the machine isn't reachable to return a value...  then the nrpe checker
>>> on the console in fact *is* getting an IO error when trying to, ok,
>>> read from a network socket.
>> I was more thinking along the lines of errno being set to EIO when
>> attempting to read(2) from an already connected network socket, although
>> there are two schools of thought about that too (some want all failures
>> to always alert, while some want a lot of things to be in UNKNOWN state).
>>
>> Not being able to connect clearly signals there is something wrong
>> with the service though, while an EIO signals that there's something
>> wrong with the Nagios host's kernel or hardware.
> 
> My problem with that is that not all of what Nagios monitors is
> "services", in the meaning we usually give to that term.  Much of it is
> "attributes" -- load average and diskspace on a machine being great
> examples.
> 

True that, but the service of storing a file on disk (or, for some
poorly designed filesystems, reading one from a disk) requires there
to be a minimum of free space available. It's what makes up the platform
on which the *real* services rest. Hence servicegroups (which together
make up what a service-provider would call a service).
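
To make the connect-versus-EIO distinction quoted above concrete, here
is a minimal Python sketch of how a check could map the two failures to
plugin exit states. The names classify_failure and probe are made up
for illustration and are not part of NRPE or any shipped plugin, and
mapping EIO to UNKNOWN is only one of the two schools mentioned, not a
rule.

```python
import errno
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify_failure(exc):
    """Map an exception raised by a network check to a Nagios state."""
    # EIO points at the monitoring host's own kernel or hardware, so
    # this sketch reports UNKNOWN instead of blaming the service.
    if isinstance(exc, OSError) and exc.errno == errno.EIO:
        return UNKNOWN
    # Refused, unreachable, or timed-out connections mean the service
    # cannot be reached, which this sketch treats as CRITICAL.
    return CRITICAL


def probe(host, port, request=b"", timeout=5.0):
    """Connect, optionally send a request and read a reply (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if request:
                sock.sendall(request)
                sock.recv(1024)
    except OSError as exc:
        return classify_failure(exc)
    return OK
```

A real plugin would also print a one-line status message before
exiting with the returned code.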

> IMHO, anything you're trying to monitor that's actually a "service" --
> i.e. a public-facing website -- shouldn't be directly attached to a host,
> anyway...
> 
> What if you're Google?  Which host do you attach "http://www.google.com" to?
> 

All the query distributors. Google works by having several front-end
servers distribute the incoming queries to quite a large army of query
responders, which use GFS (the Google File System) for the actual
lookups. Since a monitoring tool is only worth something if it tells
you *where* things break rather than only that things are broken,
attaching the check to each of those hosts makes perfect sense for a
monitoring system, even if that's not how the service provider or its
salespeople see the service.
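
As a rough illustration of the "tell me *where*" point, here is a small
Python sketch that takes one check result per front-end host and names
the hosts that failed. The function name summarize and the example.com
host names are hypothetical, and going CRITICAL as soon as any single
front end fails is an assumption of the sketch, not a recommendation.

```python
def summarize(results):
    """Turn per-host results into a Nagios exit code and status line.

    results maps a host name to True if that host's check passed.
    """
    failed = sorted(host for host, ok in results.items() if not ok)
    if not failed:
        return 0, "OK - all %d front ends responding" % len(results)
    # Naming the failing hosts is the whole point: the operator learns
    # where the breakage is, not merely that the site is down.
    return 2, "CRITICAL - failing front ends: " + ", ".join(failed)


if __name__ == "__main__":
    # Hypothetical front-end hosts, for illustration only.
    code, message = summarize({
        "fe1.example.com": True,
        "fe2.example.com": False,
    })
    print(message)
```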

> 
>>> I think if I'm going to invest a lot of work into code, I'll spend it
>>> reskinning the clunky looking CGIs instead.  :-)
>> That could well be a wasted effort. Several UIs already exist, and more
>> are brewing. I'd suggest having a look at op5.org within a week or
>> so instead, and check nagios.org and nagios-community.org for news about
>> GUIs (op5.org will only have a reports GUI though, while nagios.org
>> will primarily take care of the equivalent of status.cgi et al).
> 
> By UI, I presume you do *not* mean what someone else (IMHO) incorrectly
> used that term to mean earlier today -- a configuration front-end tool.
> 

No, I do not. I mean an interface displaying current and historical
host and service status.

> I see that op5 is "Coming Soon".  
> 

Indeed it is. Content is scheduled to be added this Friday, but what
shape said content will be in is anyone's guess (I've got a shrewd
idea it won't be 100% complete and super-easy to use from day one, as
there's a lot of work to be done).

> Are you suggesting that *Ethan* is reworking the status.cgi?  Cause I
> see no leaders about that on nagios.org.

Yes, Ethan has been working on a new web-based user interface for Nagios
for the past eight or so weeks. According to his speech at the Nordic
Nagios Meet it's possible it will be a commercial venture. That is,
companies capitalizing on Nagios in one way or another may have to
buy it, while non-profit organizations and home users will probably
get to download it for free. He was a bit hazy on the details and he
refused to give a release date, so "wait and see" is the best I can
say, I'm afraid.


>  And nagios-community.org doesn't seem to exist...
> 

nagios-community.org doesn't exist, but nagioscommunity.org does.
Sorry for the confusion.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
