Improving the host <parents> logic

Eli Stair estair at ilm.com
Wed Dec 14 23:47:49 CET 2005


Sounds great.  Having that level of granularity would directly solve a 
topology problem I need to find a workaround for right now.

Cheers,

/eli

Shane Stixrud wrote:
> Nagios's host parent logic is good but it could be a whole lot better 
> for today's switched networks.  There have been a couple of 
> recommendations in the past on how to improve this.
> 
> 1) Allow nagios admins to change parent logic failure detection in cases 
> where one parent is up but others are down.  By default nagios treats 
> multiple parents as redundant paths and thus does not suppress 
> notification in situations where at least one parent is OK.
> 
> The main disadvantage to this proposal is that nagios rightly treats 
> parents as directly connected hops on the path back to nagios.  This 
> workaround would treat switches and routers as peers when they are not, 
> removing the possibility of redundancy detection and of easily 
> determining which device is at fault.
> 
> 2) Allow the nagios admins to assign a weighted priority to each host 
> and have a system that allows the admin to tune these values to suppress 
> notification where appropriate.
> 
> This type of solution is, IMO, far more complex than is required.  The 
> best part of the current solution is that it is simple to manage and 
> obvious to deploy.
> 
> The main problem with the existing solution is that modern switched 
> networks often have A LOT of managed nodes connected to one or more 
> layer2 switches in the same layer3 network.  Ideally nagios would allow 
> admins to suppress notification for devices behind both layer2 devices 
> and layer3 interfaces.  With that in mind I believe there is a 
> relatively easy solution that stays true to nagios's current parent 
> model while still meeting this challenge.
> 
> The existing parent logic should be able to remain pretty much as is, 
> merely renaming the directive to "l3parents" to make clear that it 
> should only be used for layer 3 parents.
> 
> The existing parents logic would be duplicated under a new directive 
> named "l2parents".  Nagios would then need to be modified to first check 
> l2parents before proceeding to the l3parents when a device goes into a 
> NON-OK state.  If all l2 parents or l3 parents are down, nagios would 
> follow the l2 or l3 inherited parents just as it does today.
> 
> IMO this change would be the least intrusive: it adds layer2 parent 
> support and allows for redundancy detection for both layer2 and layer3 
> devices with little added complexity.
> 
> Side note: The 3d map should show the layer2 parents as being directly 
> connected to the child device.  The l3parents should only be connected 
> to devices where their layer2 and layer3 parents are the same NAME/IP.  
> In this way you would see a server connected to a switch that is in turn 
> connected to another switch which then connects to the layer3 device, 
> which happens to be how the physical connectivity IS set up in reality.
> 
> Cheers,
> Shane
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log 
> files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  DOWNLOAD SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
> 
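
For illustration, the l2parents/l3parents check order proposed in the
quoted message could be sketched roughly as follows.  This is a minimal
Python model, not Nagios code: l2parents and l3parents are the directive
names proposed above and do not exist in Nagios, and Host, check_ok,
host_state and the example topology are hypothetical names invented for
the sketch.  It assumes a host whose own check fails is reported
UNREACHABLE when every parent at either layer is itself not UP, and DOWN
otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class Host:
    name: str
    check_ok: bool = True                            # result of the host's own check
    l2parents: list = field(default_factory=list)    # proposed directive (hypothetical)
    l3parents: list = field(default_factory=list)    # proposed rename of "parents"


def host_state(name, hosts):
    """Classify a host as UP, DOWN or UNREACHABLE.

    Assumes the parent graph has no cycles.  Parents are only consulted
    when the host's own check fails, layer 2 first, then layer 3.
    """
    host = hosts[name]
    if host.check_ok:
        return "UP"
    for layer in (host.l2parents, host.l3parents):
        # Every parent at this layer is down or unreachable: the fault
        # lies upstream, so this host is only UNREACHABLE.
        if layer and all(host_state(p, hosts) != "UP" for p in layer):
            return "UNREACHABLE"
    # At least one parent at each configured layer is UP, so the host
    # itself is the likely culprit.
    return "DOWN"


def build_example():
    """server -> switch2 -> switch1 -> router (the layer 3 device)."""
    hosts = [
        Host("router"),
        Host("switch1", l2parents=["router"], l3parents=["router"]),
        Host("switch2", l2parents=["switch1"], l3parents=["router"]),
        Host("server", l2parents=["switch2"], l3parents=["router"]),
    ]
    return {h.name: h for h in hosts}


if __name__ == "__main__":
    hosts = build_example()
    # switch1 fails, so everything behind it fails its check as well.
    for failed in ("switch1", "switch2", "server"):
        hosts[failed].check_ok = False
    for name in hosts:
        print(name, host_state(name, hosts))
```

In this model, with switch1 failing and everything behind it failing
its check as a result, only switch1 comes out as DOWN.  switch2 and
server come out as UNREACHABLE, which is the case where notifications
would be suppressed, even though their layer 3 parent (router) is still
up.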






