AW: AW: Cascading Services/Service hierarchy

Andreas Ericsson ae at op5.se
Mon Sep 27 19:02:27 CEST 2004


Mohr James wrote:
>>If you stop to think about it, you'll notice that all services provided
>>in a network require code to be run on one device or another. Think of
>>a device as a host and you'll be just fine.
> 
> 
> I still have problems with the term "device". Being able to access a web
> shop is neither a "device" nor a "host". However, it is a service.

*sigh* From a marketing point of view: yes. From a computer point of 
view, it's a chain of processes co-operating to deliver a certain output 
to a variety of sources using a single source as input. Any idiot can 
tell you that the webshop is down. Nagios lets you know WHY it is down, 
by checking each service (as a computer would see it, meaning 
database-server, web-server, loadbalancer and what-not).

If all you want is a system that tells you "Hey, there's something wrong 
in your network", then by all means, configure Nagios to tell you just 
that. It's really easy. You'd be using about 5% of Nagios' 
capability, and your netadmins still wouldn't be a bit wiser in their 
trouble-shooting. That's not really my problem though, so just go ahead 
with it if you like.

> In
> most context "device" refers to something **physical**, a web shop is
> not physical, but still a service.

Only in a marketing context. See above for explanation.

> This is the service we need to guarantee for our customers.
> It consists of the web server service and database services,

Then monitor those services, and group them together in a servicegroup 
named "webshop-services". It's really quite simple, you know.
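As a rough illustration of what the grouping buys you (a sketch only, not 
how Nagios computes anything internally): if each member of 
"webshop-services" reports the usual plugin state code (0 OK, 1 WARNING, 
2 CRITICAL, 3 UNKNOWN), a single status for the webshop can be taken as 
the worst member state. The service names and the severity ordering below 
are assumptions made up for the example.

```python
# Hypothetical roll-up of a servicegroup to one status: worst member wins.
# State codes follow the plugin convention: 0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN.
STATE_NAMES = {0: "OK", 1: "WARNING", 2: "CRITICAL", 3: "UNKNOWN"}

# Assumed severity ranking: UNKNOWN is treated as worse than WARNING
# but better than CRITICAL. Adjust to taste.
SEVERITY = {0: 0, 1: 1, 3: 2, 2: 3}


def group_state(member_states):
    """Return the worst state code among the members (OK if the group is empty)."""
    return max(member_states.values(), key=SEVERITY.__getitem__, default=0)


# Made-up members of a "webshop-services" group.
webshop_services = {
    "web-server/HTTP": 0,
    "db-master/MySQL": 2,
    "loadbalancer/CPU Load": 1,
}

if __name__ == "__main__":
    worst = group_state(webshop_services)
    print("webshop-services:", STATE_NAMES[worst])
```

The individual member states are still there for the admins; the roll-up 
is only the one-line answer for everybody else.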

> which then consist of other services and eventually
> *physical* hosts and *physical* devices. 
> 

Your admins will be happier if they know which physical host is causing 
the ruckus, so let them know that by monitoring the details. This will 
also let you keep availability high on the webshop.

> 
>>> If this is simply the way it is, I can deal with it.
>>>I am just curious if that is the way it is.
>>
>>It's the way it works in the physical world. Nagios (2.0) lets you
>>abstract this by configuring service groups to help you get a quick
>>overview of all the separate services included in a service-flow (proxy,
>>web-server, loadbalancer, database replication servers, database master
>>server to mention a very common example).
> 
> 
> As far as I see, 2.0 is only available from the CVS repository and as
> alpha/unstable from a couple of web sites. Is that correct?

It's considered alpha but I'd gladly call it stable so long as you don't 
enable the embedded perl (which has been causing trouble all the time).

> It's nice to
> know what is coming up. Particularly in our case where we are still in
> the testing and planning stages. I just want to know is it something
> that I can implement now. 
> 
> If I understand you correctly, with 2.0, you will be able to have a
> hierarchy with multiple levels. Will the stati propagate, as well?
> 

Multiple levels of what?
Define 'stati'.

> Although a simple list of all of the services would provide an
> "overview", it is not necessarily the best representation of the
> services.

For an admin it is, since it's vital for finding the cause of the 
problem. Are you an admin yourself, or are you a project-leader 
implementing Nagios (or are you possibly from management)?

> Being able to represent/depict the services as a hierarchy is
> more accurate than a simple list of what is "included in a
> service-flow". 
> 

It's a matter of what you're used to, I suppose.

> 
> 
>>For an admin, it's utterly unhelpful to get a notification saying "The
>>customer support FAQ doesn't work". This doesn't bring the admin any
>>closer to a solution and the first customer that calls in will tell him
>>the exact same thing. If, on the other hand, he received a notification
>>saying "CPU Load on the customer support load balancer is 100%", he'd
>>immediately know where the problem resides and thus be a good step
>>closer to fixing it.
> 
> 
> True, but for reporting it *is* important to know when the "customer
> support FAQ" doesn't work.

So get your availability reports of the 'customer-faq' servicegroup and 
just print the totals.
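
If you want one number for the whole group rather than one per service, 
something along these lines would do. This is only a sketch under my own 
assumptions, not what the Nagios availability report does: it counts the 
group as down whenever at least one member is down, and the outage 
intervals and service names are invented.

```python
# Sketch: availability of a group over a reporting period, counting the group
# as down whenever at least one member service is down. The outage data is
# made up; times are seconds from the start of the period.

def merge_intervals(intervals):
    """Merge overlapping (start, end) intervals into a sorted, disjoint list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def availability(outages, period):
    """Percentage of the period not covered by any outage interval."""
    down = sum(end - start for start, end in merge_intervals(outages))
    return 100.0 * (period - down) / period


PERIOD = 30 * 24 * 3600  # a 30-day month, in seconds

# Hypothetical outages per service in a "customer-faq" group.
outages_by_service = {
    "web-server/HTTP": [(1000, 4600)],
    "db-master/MySQL": [(4000, 7600), (90000, 91800)],
}

if __name__ == "__main__":
    for name, outages in outages_by_service.items():
        print(f"{name}: {availability(outages, PERIOD):.3f}%")
    all_outages = [iv for ivs in outages_by_service.values() for iv in ivs]
    print(f"group total: {availability(all_outages, PERIOD):.3f}%")
```

Merging the intervals first matters: two services that were down at the 
same time should not be counted as twice the downtime for the group.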

> As you say, as an admin, I am not interested
> in the fact that the web shop is not accessible. Instead, I want to know
> that it is the port on the switch connecting the web server to the DB
> server. However, neither my customer nor my boss care that it was a
> switch port. They want to know whether we have reached our service
> levels or not. 
> 

Naturally. Show them the availability report of the servicegroup and 
explain to them that "this database-server can't really cope with the 
load" (for example) if you don't meet the demands.

> If we have to sit down at the end of each month and manually add the
> outages to a spreadsheet, then TCO goes waaaaaay up for us. So having a
> mechanism that allows service stati to automatically propagate up a
> hierarchy is extremely beneficial to service providers. In VPO, I know
> the top-level is red, thus I know that I am not fulfilling my service
> obligation. I can then quickly drill down and see which services
> are affected by the outage and what the root cause is. 
> 

I suggest you stick to VPO then. You don't have to use Nagios if you 
don't want to.

> 
>>>Is there any way to create a multi-level hierarchy? Looking at the
>>>status maps on the Netway site, it does look like there are multiple
>>>levels. Is this just the representation of the dependencies or can you
>>>actually create multiple levels?
>>>
>>
>>Every network with more than one switch IS a multi-level hierarchy in
>>the physical world. If you've configured things correctly (parents etc)
>>these are the levels you'll see.
> 
> 
> I am aware of that. However, I am talking about representing it in Nagios
> not what the real world is. As far as I see looking at the "Service
> Detail" you only have two levels. The host and then all of the services
> associated with that host. Looking at the configuration files and doing
> a few tests, I do not see a way of creating this hierarchy. A service
> belongs to a host. I do not see a way of saying that service A contains
> service B, which contains Service C:
> 
> Web Shop -> Web Server -> Physical machine
> 

See documentation on dependencies, and write your own GUI-hack to 
present the data you want.
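
To give an idea of how little such a hack needs: the sketch below builds 
a multi-level view from plain "depends on" relations and drills down to 
the deepest failing item. Everything in it (item names, relations, 
states) is invented for the example; it does not read any Nagios 
configuration or status file, so a real tool would have to fill those 
tables from your own setup.

```python
# Sketch: a multi-level view built from "depends on" relations.
# The items, relations and states are invented for the example; a real
# tool would read them from your monitoring configuration and status data.

DEPENDS_ON = {
    "Web Shop": ["Web Server", "Database"],
    "Web Server": ["web01 (host)"],
    "Database": ["db01 (host)"],
    "web01 (host)": [],
    "db01 (host)": [],
}

# True means the item's own check currently fails.
FAILING = {"Web Shop": True, "Web Server": False, "Database": True,
           "web01 (host)": False, "db01 (host)": True}


def root_causes(item):
    """Return the deepest failing items underneath (or equal to) item."""
    if not FAILING.get(item, False):
        return []
    deeper = [c for dep in DEPENDS_ON.get(item, []) for c in root_causes(dep)]
    return deeper or [item]


def print_tree(item, depth=0):
    """Print the hierarchy with a status marker per level."""
    marker = "DOWN" if FAILING.get(item, False) else "ok"
    print("  " * depth + f"{item} [{marker}]")
    for dep in DEPENDS_ON.get(item, []):
        print_tree(dep, depth + 1)


if __name__ == "__main__":
    print_tree("Web Shop")
    print("root cause:", ", ".join(root_causes("Web Shop")))
```

The top level tells management whether the webshop is up, and the same 
data tells the admin which box to look at.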

> From what I see, displaying this kind of hierarchy is not possible, as
> you can only display the two levels: host and service. If this is so, we
> have to do things differently than before. If it is possible, I cannot
> see how.
> 

I can recommend "Programming PHP" from O'Reilly & Associates.

> Please define "correctly" in this context.

man traceroute

> 
> 
>>>By the way, what are people using for the 3d status map? I downloaded
>>>the Cortona player listed on the nagios.org site and the maps don't
>>>look very good. Any recommendations?
>>>
>>
>>Google for it, or fix the statusvrml.cgi program to create prettier vrml.
> 
> 
> Google for what???? I tried "nagios vrml" The first entry points back to
> the Nagios site:

What else did you try? Surely you didn't give up after just one search?

> 
> Cortona (Parallel Graphics) - Looks terrible
> Cosmo Player (Computer Associates) - bad link, search by CA found
> nothing
> FreeWRL -  no Windows binary
> OpenVRML - last updated 2001, doesn't look hopeful.
> 
> Of the 400+ entries most are not even related to getting the 3D map to
> work, they just happen to have vrml and nagios on the same page.  Which do
> you use?
> 

I don't. We've written our own statusmap which I use instead.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Lead Developer

