Services on down hosts

Jay R. Ashworth jra at baylink.com
Thu Jun 12 16:05:22 CEST 2008


On Thu, Jun 12, 2008 at 12:24:39AM +0200, Andreas Ericsson wrote:
> Bringing this back on-list. I'd appreciate if you could use
> "reply-to-all" instead of just "reply", as some of this discussion
> is probably of interest to the rest of the community as well.
> Thanks.

Sorry; my version of mutt doesn't handle large numbers of 'lists'
directives well; this list wasn't being recognized for List reply, and
I hadn't fixed it and was using Group reply instead.  Fixed now.

> Jay R. Ashworth wrote:
> >On Wed, Jun 11, 2008 at 05:54:48PM +0200, Andreas Ericsson wrote:
> >>>Well, since (to take an example), CRITICAL load means "a loadaverage
> >>>over 8" (on my 8-core Opteron), and we don't *know* the load average if
> >>>the machine isn't reachable to return a value...  then the nrpe checker
> >>>on the console in fact *is* getting an IO error when trying to, ok,
> >>>read from a network socket.
> >>I was more thinking along the lines of errno being set to EIO when
> >>attempting to read(2) from an already connected network socket, although
> >>there are two schools of thought about that too (some want all failures
> >>to always alert, while some want a lot of things to be in UNKNOWN state).
> >>
> >>Not being able to connect clearly signals there is something wrong
> >>with the service though, while an EIO signals that there's something
> >>wrong with the Nagios host's kernel or hardware.
> >
> >My problem with that is that not all of what Nagios monitors is
> >"services", in the meaning we usually give to that term.  Much of it is
> >"attributes" -- load average and diskspace on a machine being great
> >examples.
> 
> True that, but the service of storing a file on disk (or, for some
> retarded filesystems, reading one from a disk) requires there to be a
> minimum of free space available. It's what makes up the platform on
> which the *real* services rest. Hence servicegroups (which together
> make up what a service-provider would call a service).

Could you expand on that?  Do you mean to imply that a good use for a
servicegroup is "all the physical services upon which my public website
rests", as I think I read in your reply there?
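
To make the connect-failure versus EIO distinction quoted earlier a bit
more concrete, here is a minimal Python sketch.  The only hard fact in
it is the standard plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL,
3 UNKNOWN).  The names classify_failure, check_remote and
UNREACHABLE_ERRNOS are made up for illustration, and the policy is an
assumption, not what check_nrpe actually does.  The unreachable_state
parameter is there because the two schools disagree on exactly that
point.

```python
import errno
import socket

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# errno values treated here (by assumption) as "could not reach the agent".
UNREACHABLE_ERRNOS = {errno.ECONNREFUSED, errno.EHOSTUNREACH,
                      errno.ENETUNREACH, errno.ETIMEDOUT}


def classify_failure(exc, unreachable_state=CRITICAL):
    """Map an exception from a remote check to a Nagios state.

    Hypothetical policy for illustration only.
    """
    if isinstance(exc, (socket.timeout, socket.gaierror)):
        return unreachable_state
    if isinstance(exc, OSError) and exc.errno in UNREACHABLE_ERRNOS:
        return unreachable_state
    # EIO on read(2), or anything unexpected, says more about the
    # monitoring host than about the thing being monitored.
    return UNKNOWN


def check_remote(host, port, timeout=5.0):
    """Connect to host:port and attempt one read; return a Nagios state.

    A real check would parse the reply and compare it to thresholds.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.recv(1)
        return OK
    except OSError as exc:
        return classify_failure(exc)
```

Someone in the "we don't *know* the load" camp would call
classify_failure(exc, unreachable_state=UNKNOWN) instead.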

> >IMHO, anything you're trying to monitor that's actually a "service" --
> >i.e., a public-facing website -- shouldn't be directly attached to a host,
> >anyway...
> >
> >What if you're Google?  Which host do you attach "http://www.google.com" 
> >to?
> 
> All the query distributors (Google works by having several front-end servers
> distributing the incoming queries to quite a large army of query responders,
> which have access to the gdfs (Google distributed filesystem) for doing the
> actual lookups). Since a monitoring tool is only worth something if it tells
> you *where* things break rather than only that things are broken, that
> makes perfect sense for a monitoring system even if that's not the case for
> the service provider or its sales people.

Ok, Google was a poor choice.

Conversely, though, there may be cases where... or maybe there aren't.
Let me muse on this some more.

> >>>I think if I'm going to invest a lot of work into code, I'll spend it
> >>>reskinning the clunky looking cgi's instead.  :-)

> >>That could well be a wasted effort. Several UIs already exist, and more
> >>are brewing. I'd suggest having a look at op5.org within a week or
> >>so instead, and check nagios.org and nagios-community.org for news about
> >>GUI's (op5.org will only have a reports gui though, while nagios.org
> >>will primarily take care of the equivalent of status.cgi et al).
> >
> >By UI, I presume you do *not* mean what someone else (IMHO) incorrectly
> >used that term to mean earlier today -- a configuration front-end tool.
> 
> No, I do not. I mean an interface displaying current and historical
> host and service status.

Ok.  I'll sit back and wait for a bit.

> >I see that op5 is "Coming Soon".  
> 
> Indeed it is. Content is scheduled to be added this Friday, although in
> what shape said content will be is anyone's guess (although I've got a
> shrewd idea it won't be 100% completed and super-easy to use from day
> one, as there's a lot of work to be done).

Understandable.

> >Are you suggesting that *Ethan* is reworking the status.cgi?  Cause I
> >see no leaders about that on nagios.org.
> 
> Yes, Ethan has been working on a new web-based user interface for Nagios
> for the past eight or so weeks. According to his speech at the Nordic
> Nagios Meet it's possible it will be a commercial venture. That is,
> companies capitalizing on Nagios in one way or another may have to
> buy it, while non-profit organizations and home-users will probably
> get to download it for free. He was a bit hazy on the details and he
> refused to give a release date, so "wait and see" is the best I can
> say, I'm afraid.

Got it.

> > And nagios-community.org doesn't seem to exist...
> 
> nagios-community.org doesn't exist, but nagioscommunity.org does.
> Sorry for the confusion.

I should have known better; I've been there, but it was a long day...

Cheers,
-- jra
-- 
Jay R. Ashworth                   Baylink                      jra at baylink.com
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com                     '87 e24
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274

	     Those who cast the vote decide nothing.
	     Those who count the vote decide everything.
	       -- (Joseph Stalin)

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null




