failover setup help

Ton Voon tonvoon at gmail.com
Fri Jul 16 00:32:49 CEST 2010


On 9 Jul 2010, at 18:16, Shadhin Rahman wrote:

> All,
>    I have a failover setup with nagios.  I also have ndoUtils setup  
> for collecting historical data.   My setup is described below.
>
> master server - running nagios and ndoutils collecting data.  The
> master is also sending host state and service state data using nsca
> to the failover server.  The master server has active host checks,
> service checks and notifications enabled.
>
> slave server - running nagios and a partial ndoutils setup.  No
> active checks are being done on the slave server.  The slave server
> also has notifications disabled.  The nsca daemon is listening for
> service and host status from the master server.
>
> The problem:  acknowledgements and comments are not up to date on
> the slave server.  I can possibly transfer the retention file to the
> slave server and fix the comments part of the problem.  However, I do
> not know how to get all the acknowledgements transferred to the slave
> server.
>
> It would be great if someone could point me in the right direction
> on how to solve the acknowledgement problem I describe above.
> Thanks in advance.

You probably want to analyse what we've done in Opsview
(http://opsview.com).

There are two parts to this problem:
    1) How to send commands to multiple Nagios instances
    2) How to keep them in sync

In this screencast (registration required),
http://www.opsview.com/learn/demos-tutorials/how-opsview-uses-nagios
at about 04:30, I mention a broker module we developed called
"altinity_distributed_commands". It hooks into Nagios and, for certain
commands, writes them out so that a separate process can pick them up
and push them out to the slave systems. This is how we solve (1).
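
To illustrate the idea behind (1), here is a minimal Python sketch of
the "separate process" side: it takes external command lines, keeps
only the ones worth replicating, and works out what to send to each
slave. This is not the altinity_distributed_commands code (that is a
Nagios broker module). The set of replicated commands, the function
names and the slave names are assumptions for illustration; the only
thing taken from Nagios itself is the external command line format,
"[timestamp] COMMAND;args". The actual transport to the slave is left
out.

```python
# Assumption: which commands get replicated. The real module's list is
# not given in this thread; these are standard Nagios external commands.
REPLICATED_COMMANDS = {
    "ACKNOWLEDGE_HOST_PROBLEM",
    "ACKNOWLEDGE_SVC_PROBLEM",
    "REMOVE_HOST_ACKNOWLEDGEMENT",
    "REMOVE_SVC_ACKNOWLEDGEMENT",
    "ADD_HOST_COMMENT",
    "ADD_SVC_COMMENT",
}


def parse_command(line):
    """Split '[timestamp] NAME;args' into (timestamp, name, raw_args)."""
    line = line.strip()
    if not line.startswith("[") or "] " not in line:
        raise ValueError(f"not an external command line: {line!r}")
    stamp, _, rest = line[1:].partition("] ")
    # Only the command name is split off; the arguments stay as one
    # string because free-text fields may contain ';' themselves.
    name, _, raw_args = rest.partition(";")
    return int(stamp), name, raw_args


def fan_out(lines, slaves):
    """Return {slave: [command lines to forward]} (hypothetical helper)."""
    selected = [
        line.strip()
        for line in lines
        if parse_command(line)[1] in REPLICATED_COMMANDS
    ]
    return {slave: list(selected) for slave in slaves}


if __name__ == "__main__":
    commands = [
        "[1279233169] ACKNOWLEDGE_SVC_PROBLEM;web1;HTTP;1;1;1;admin;looking into it",
        "[1279233170] SCHEDULE_FORCED_SVC_CHECK;web1;HTTP;1279233200",
    ]
    for slave, queue in fan_out(commands, ["slave1", "slave2"]).items():
        for command in queue:
            print(slave, command)
```

Each forwarded line would then be written to the external command file
on the slave by whatever transport you prefer, so that the slave
processes the acknowledgement as if it had been submitted locally.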

For (2), we added some new code to Nagios (it's in our development
repo: http://github.com/tonvoon/opsview-nagios). It adds a synchronise
ability, triggered at reload time or via an external command called
"SYNC_STATE_INFORMATION", which reads a retention.dat-like file and
changes certain characteristics of a host/service, such as whether it
is acknowledged or not. This allows the Opsview master to be the
single source of state information for all its slave systems.
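
As a rough idea of what reading such a file involves, here is a small
Python sketch that parses retention.dat-style blocks and lists the
objects marked as acknowledged. It is not the code in the repo above.
It assumes the sync file uses the same layout as a stock retention.dat
("host {" and "service {" blocks with key=value lines, and a
problem_has_been_acknowledged field); the function names are made up
for the example.

```python
def parse_blocks(text):
    """Parse 'type { key=value ... }' blocks into a list of dicts."""
    blocks = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith("{"):
            # Start of a block; remember its type under a private key.
            current = {"_type": line[:-1].strip()}
        elif line == "}":
            if current is not None:
                blocks.append(current)
            current = None
        elif current is not None and "=" in line:
            key, _, value = line.partition("=")
            current[key] = value
    return blocks


def acknowledged_objects(blocks):
    """Return (host_name, service_description) for acknowledged objects.

    service_description is None for host blocks.
    """
    acked = []
    for block in blocks:
        if block["_type"] not in ("host", "service"):
            continue
        if block.get("problem_has_been_acknowledged") == "1":
            acked.append(
                (block.get("host_name"), block.get("service_description"))
            )
    return acked


if __name__ == "__main__":
    sample = """
host {
host_name=web1
problem_has_been_acknowledged=0
}
service {
host_name=web1
service_description=HTTP
problem_has_been_acknowledged=1
}
"""
    for host, service in acknowledged_objects(parse_blocks(sample)):
        print(host, service)
```

A slave applying the sync would walk the same blocks and set the
acknowledgement flag on the matching host or service, rather than just
printing it.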

Ton


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null
