Improving the host <parents> logic

Andreas Ericsson ae at op5.se
Thu Dec 15 13:31:53 CET 2005


Shane Stixrud wrote:
> On Wed, 14 Dec 2005, Andreas Ericsson wrote:
> 
>>
>> Layer 2 devices can be parents just as well as layer 3 devices, with 
>> any level of redundancy anywhere you want.
> 
> 
> Sure you can define layer 2 devices as parents of a child device, but so 
> what??  If you are using the parent definition for layer 2 it cannot 
> also be used for layer 3 parents (for that same device) without 
> unexpected behavior as far as I can tell.
> 

Yes it can, although the layer 3 node becomes the parent of the layer 2 
node which is the Right Thing to do.
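
A minimal sketch of what that looks like in the object config. The host
names, addresses and the generic-host template below are made up for
illustration; only the parents directive matters here:

    # Layer 3 device closest to the nagios server: no parents needed.
    define host{
        use         generic-host    ; assumed template holding check settings
        host_name   router1         ; hypothetical name
        address     192.0.2.1       ; documentation-range address
        }

    # Layer 2 device sitting behind the router.
    define host{
        use         generic-host
        host_name   switch1         ; hypothetical name
        address     192.0.2.2
        parents     router1
        }

    # Server plugged into the switch.
    define host{
        use         generic-host
        host_name   server1         ; hypothetical name
        address     192.0.2.10
        parents     switch1
        }

With a chain like this, server1 should be flagged as unreachable rather
than down when switch1 or router1 stops answering.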


>>> The existing parent logic should be able to remain pretty much as is, 
>>> merely renaming the directive to "l3parents" to distinguish this 
>>> should only be used for layer 3 parents.
>>>
>>
>> But it shouldn't. Each electron that makes up a part of a packet has 
>> to traverse a chain of physical nodes to reach its destination. Each 
>> of those nodes is a parent to whatever node they send the electrons on 
>> to next.
> 
> 
> Sure, there is a physical chain, however nagios does not monitor each 
> electron as it transfers over that chain, rather it contacts the IP 
> addresses of each device along that chain to determine which link in 
> that chain may be broken.
> 

Oh, now I understand what you're talking about. You're still wrong though.

The only difference (packet-wise) between a layer 2 device and a layer 3 
device is that the layer 3 device modifies the TTL of the packets it 
sends on. It has to do this whenever it sends a packet to a new subnet 
(read RFC 791 for more info). Thus, when home users access their 
internal network over their multi-port router the TTL won't be 
decremented. When they access the outside world it will be.

Nagios is supremely indifferent to this distinction. This is good 
because it can support any type of network equipment.


>>
>>
>>> Duplicating the existing parents logic and assigning it a new name 
>>> called l2parents.  Nagios would then need to be modified to first check
>>> l2parents before proceeding to the l3parents when a device goes into 
>>> a NON-OK state.  If all l2 parents or l3 parents are down nagios 
>>> would follow the l2 or l3 inherited parents just as it does today.
>>>
>>
>> Gaining what? Here's how we set it up. Scale it to any level and 
>> depth you like. I've never seen nagios do anything but The Right Thing 
>> with config like this.
>>
> [snip]
> 
> Sure this works fine when the switch is isolated to one layer 3 network 
> (i.e. no vlans).  Care to share the magic tricks you use to tie vlan 
> assigned switch ports to the correct layer3 devices/interfaces??


Sure. Give the switch an IP in each network and add it as one host per 
VLAN (switch2-vlan23 and so on) or, if the route goes through the same 
physical devices no matter the VLAN, use whatever IP you already have on 
it. Obviously, this doesn't work if you can't set an IP on the switch and 
it can't respond to ICMP or some such (in which case you almost certainly 
won't have VLANs on it).
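
Roughly like this, assuming two VLANs (23 and 42), made-up addresses, a
hypothetical generic-host template, and hypothetical router1-vlan23 and
router1-vlan42 hosts defined the same way for the router's interface in
each VLAN:

    # The same physical switch, seen from VLAN 23.
    define host{
        use         generic-host    ; assumed template holding check settings
        host_name   switch2-vlan23
        address     192.0.2.2       ; the switch's IP in VLAN 23 (made up)
        parents     router1-vlan23  ; hypothetical host, not shown
        }

    # The same physical switch, seen from VLAN 42.
    define host{
        use         generic-host
        host_name   switch2-vlan42
        address     198.51.100.2    ; the switch's IP in VLAN 42 (made up)
        parents     router1-vlan42  ; hypothetical host, not shown
        }

    # A server living in VLAN 23 hangs off the VLAN 23 view only.
    define host{
        use         generic-host
        host_name   server1         ; hypothetical name
        address     192.0.2.10
        parents     switch2-vlan23
        }

A server in VLAN 23 then depends only on the VLAN 23 view of the switch,
so a problem in VLAN 42 doesn't enter into its reachability at all.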


>  As far 
> as I can tell if my switch hosts 20 vlans I can either set its parent to 
> one of those vlans or all of them.  If I do 1 then I have a 1 in 20 
> chance of suppressing notifications, if I do all of them then every vlan 
> would have to fail for it to suppress notifications.   I am willing to be
> wrong here, please show me the error in my thinking.
> 


You're thinking "one device, one host" when you should be thinking "one 
IP, one host" (this is btw also what I was referring to in the acidity 
concluding my last email on this topic). You can add the same physical 
device as a separate host a million times if you want to. If you have 
VLANs this is almost always the right thing to do because you will then 
notice when one VLAN is incorrectly configured.


> 
>>
>>> IMO this change would be the least intrusive, adds layer2 parent 
>>> support and allows for redundancy detection for both layer2 and 
>>> layer3 devices with little added complexity.
>>>
>>
>> If you're going to add all layer 2 devices in their own parenting link 
>> you need to know when layer 3 devices pop in between them
> 
> 
> Each host definition would/could have either or both defined.
> 


So each host would have parents, l2parents and l3parents??


>> , which means either keeping the current scheme alongside it or go 
>> looking for the object that has our current node as an immediate 
>> downstream child. Either way is just plain dumb.
>>
> 
> It is not a replacement it is an addition, right now the current scheme 
> is better suited to layer3 devices, using it for layer2 devices is a 
> hack that works under very restricted conditions as far as I can tell.
> 


It works fairly well for all conditions I've encountered during the 
roughly 120 installations I've seen, ranging from large international 
corporations (monitoring 5000+ nodes) to ISPs (monitoring very 
non-catenet-ish equipment, some of which doesn't even have an IP) to 
small but mission-critical manufacturing process controller networks.


>>
>>> Side note: The 3d map should show the layer2 parents as being 
>>> directly connected to the child device.  The l3parents should only 
>>> be connected to devices where their layer2 and layer3 parents are the 
>>> same NAME/IP.  In this way you would see a server connected to a 
>>> switch that is in turn connected to another switch which then 
>>> connects to the layer3 device, which so happens is how the physical 
>>> connectivity IS setup in reality.
>>>
>>
>> Use my example from above and this is exactly what you get.
> 
> 
> If you use your switches as a smart hub sure, what about multivlan 
> switches?
> 

"One IP, one host", although you can check things that have no IP as well.

>>
>> Usually when you post to a developer forum for the first time it's a 
>> good idea to browse the list archives for both the developer and the 
>> user forum. Had you done so, with just a few well-chosen searches, you 
>> would have spared yourself the indignity of letting this particular 
>> piece of digital pollution hit the internet.
> 
> 
> I have browsed the list and I did note a number of other people comment 
> on this same topic.  If I am missing something obvious then of course 
> you have my humble apology for wasting everyone's time, but as far as I can 
> tell you are just missing the point.
> 


Perhaps I am. Would you care to, with examples, clarify the benefits a 
bit for me? The layout of your own network would probably be a good 
starting point, seeing as you probably had some real problem that you 
thought this up as a solution to.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

