Fault tolerant OCSP's

Bruce bruce at webfarm.co.nz
Thu Sep 2 08:48:44 CEST 2004


Hi,

I don't know how well this option will work elsewhere, but it works on our network.

Basically, every service event is logged to a status file on the remote 
server, then once a minute the remote server sends the contents of the 
file back to the main server. If the send fails, it normally gets an error 
code back, so it assumes it failed and doesn't delete the status file, and 
the next attempt includes the results from the previous attempts as well.
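For anyone who wants a starting point, here is a minimal Python sketch of that spool-and-flush idea. It is not our script. The names log_event and flush_spool, the one-result-per-line file format, and the send callable are all assumptions made up for illustration; send stands in for whatever actually pushes the results to the main server (for example, something that pipes them to send_nsca) and must return True only when the transfer succeeded.

```python
import os

def log_event(spool_path, line):
    """Append one service result to the status (spool) file."""
    with open(spool_path, "a") as f:
        f.write(line.rstrip("\n") + "\n")

def flush_spool(spool_path, send):
    """Try to send everything in the spool; delete the file only on success.

    `send` is a hypothetical callable taking a list of lines and returning
    True on success. On failure the file is left alone, so the next run
    re-sends the old results together with any new ones.
    """
    if not os.path.exists(spool_path):
        return True  # nothing queued
    with open(spool_path) as f:
        lines = f.read().splitlines()
    if not lines:
        return True
    if send(lines):
        os.remove(spool_path)
        return True
    return False
```

Run flush_spool once a minute (from cron, say). Note that this sketch does no locking, so a result appended between the read and the delete could be lost; a real script would want to rename the file before sending or lock it.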

The only problem with this approach (and it's not designed for volatile 
alerts, so we don't worry that much about it) is that if the file grows to 
more than 100 service reports in one batch, the send always fails. To avoid 
this we just do a tail -n 99 on the status file and send that result to the 
client. If you were good at the coding side you could use head -n 99 and 
repeat the process until the status file is empty.
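The "head -n 99 and repeat" variant could look roughly like the Python sketch below. Again, this is an illustration and not what we run: flush_in_batches, the send callable (returns True on success) and the one-result-per-line file format are assumed names and conventions, and 99 is simply the batch size that stays under the limit we hit.

```python
import os

BATCH_SIZE = 99  # stay under the 100-report limit described above

def flush_in_batches(spool_path, send, batch_size=BATCH_SIZE):
    """Send the spool oldest-first in batches; return how many lines were sent.

    `send` is a hypothetical callable taking a list of lines and returning
    True on success. Sending stops at the first failed batch, and whatever
    has not been sent stays in the file for the next run.
    """
    if not os.path.exists(spool_path):
        return 0
    with open(spool_path) as f:
        pending = f.read().splitlines()
    sent = 0
    while pending:
        batch = pending[:batch_size]
        if not send(batch):
            break  # keep the rest queued
        pending = pending[batch_size:]
        sent += len(batch)
        # Record progress after every successful batch so that a crash
        # does not cause already-delivered results to be sent again.
        tmp_path = spool_path + ".tmp"
        with open(tmp_path, "w") as f:
            f.writelines(line + "\n" for line in pending)
        os.replace(tmp_path, spool_path)
    return sent
```

Unlike tail -n 99, this keeps the older results and sends them first. As written it assumes nothing else appends to the status file while it runs.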

Hope this helps in some way. (The script we use is rather messy and 
probably wouldn't be much use to anyone but us, so I haven't posted it.)

-- 
+------------------------------------------+      \|||/
| Bruce at WebFarm.co.nz       +64 06 7572881 |      (o o)
| Systems Technician                       +---ooO-(_)-Ooo---+
|                                                            |
| WebFarm                           http://www.webfarm.co.nz |
| FreeParking                   http://www.freeparking.co.nz |
+------------------------------------------------------------+

... FreeParking - NZ's best value Domain, WebHosting and email accounts - bar none 
... WebFarm - NZ's eCommerce specialists since 1997 




Jason Martin wrote:

>Has anyone implemented an OCSP that is fault tolerant? The
>failure scenario I am envisioning is a two-tier distributed
>model. The distributed server detects a volatile alert, say a
>logfile alert indicating that a disk has failed. It then calls
>the OCSP for that alert to report it to the central server, but
>a transient network failure causes send_nsca to fail.
>
>send_nsca has no way of queueing the alert to be sent at a later
>point up to the central server, and the nature of the alert is
>not one that will necessarily repeat. The alert gets lost, no
>notifications are sent from the central server and the
>machine eventually fails due to another disk failure since it
>isn't configured to handle a 2-way disk failure.
>
>Is there a simple way to maintain the distributed Nagios setup
>and also cover volatile alerts reliably?
>
>Thanks,
>-Jason Martin
>  
>



_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users