A question...cascading failures and failure to recover

Steven Schwartz sschwartz at gracenote.com
Sat Feb 17 00:06:08 CET 2007


I've noticed some odd behavior on two of my four Nagios servers
lately, and searching has turned up no answers. Has anyone experienced
symptoms similar to these:

 

1) On a given server, a plugin reports a "critical failure" for many
(sometimes all) of the systems checked with that particular plugin.

2) Running the same plugin by hand produces an "OK" result.

3) Nagios does not register the service as recovered until I force the
checks to be rescheduled; they then execute and return OK.
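
For readers trying to reproduce item 3, a forced reschedule can be issued through the Nagios external command file as well as through the web interface. The sketch below is only an illustration under stated assumptions: the helper names `forced_check_command` and `submit` are hypothetical, the host and service names are placeholders, and the command file path is an assumed default that should be checked against the `command_file` setting in nagios.cfg. It uses the `SCHEDULE_FORCED_SVC_CHECK` external command and requires external commands to be enabled (`check_external_commands=1`).

```python
import time


def forced_check_command(host, service, when=None, now=None):
    """Build one external-command line that forces a service check.

    Format: [timestamp] SCHEDULE_FORCED_SVC_CHECK;host;service;check_time
    `host` and `service` must match the names in the Nagios object config.
    """
    now = int(time.time()) if now is None else int(now)
    when = now if when is None else int(when)
    return f"[{now}] SCHEDULE_FORCED_SVC_CHECK;{host};{service};{when}\n"


def submit(command_line, command_file="/usr/local/nagios/var/rw/nagios.cmd"):
    """Write the command line to the Nagios command file (a named pipe).

    ASSUMPTION: the default path above is a common source-install location;
    the real path is whatever `command_file` is set to in nagios.cfg.
    """
    with open(command_file, "w") as fh:
        fh.write(command_line)


if __name__ == "__main__":
    # Placeholder host and service names; printed rather than submitted.
    print(forced_check_command("web01", "HTTP"), end="")
```

Printing the line first makes it easy to confirm the host and service names before calling `submit`, which needs write permission on the command file.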

 

Does this ring bells with anyone?

 

Thanks,

 

Steven Schwartz

Systems Administrator,

Gracenote, Inc.

 

 

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

