Passive checks number of attempts error

Stephen Gran steve at lobefin.net
Tue May 8 22:01:41 CEST 2012


On Tue, May 08, 2012 at 12:07:05PM -0300, Rodney Ramos said:
> Hi everybody,

Hi,

> I use Nagios, release 3.2.3, in a distributed environment, with a central
> server and several collector servers.
> 
> For a long time I've been seeing errors in the passive check mechanism on the
> central server, as we can see below.
> 
> Sometimes, on the central server, the states and number of attempts don't
> follow the correct order, going from SOFT2 to HARD4, for example. However,
> on the collector server everything is OK.
> 
> Log from Central Server:
> Host Up[2012-05-05 01:06:48] HOST ALERT: node;UP;HARD;1;TCP OK - 0.005
> second response time on port 135
> Host Down[2012-05-05 01:05:40] HOST ALERT: node;DOWN;HARD;4;CRITICAL -
> Socket timeout after 10 seconds
> Host Down[2012-05-05 01:04:16] HOST ALERT: node;DOWN;SOFT;2;CRITICAL -
> Socket timeout after 10 seconds
> Host Down[2012-05-05 01:02:55] HOST ALERT: node;DOWN;SOFT;1;CRITICAL -
> Socket timeout after 10 seconds
> 
> Log from Collector Server:
> Host Up[05-05-2012 01:06:31] HOST ALERT: node;UP;SOFT;4;TCP OK - 0.005
> second response time on port 135
> Host Down[05-05-2012 01:05:21] HOST ALERT: node;DOWN;SOFT;3;CRITICAL -
> Socket timeout after 10 seconds
> Host Down[05-05-2012 01:04:01] HOST ALERT: node;DOWN;SOFT;2;CRITICAL -
> Socket timeout after 10 seconds
> Host Down[05-05-2012 01:02:41] HOST ALERT: node;DOWN;SOFT;1;CRITICAL -
> Socket timeout after 10 seconds
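One way to spot the symptom described above is to scan an alert log for a host that stays in the same problem state while its attempt counter skips a value. The sketch below is a hypothetical helper (`parse_alert` and `find_attempt_gaps` are illustrative names, not part of Nagios). It assumes the `HOST ALERT: host;STATE;TYPE;attempt;output` layout seen in the excerpts above and that lines are supplied oldest first.

```python
import re
from typing import Dict, List, NamedTuple, Optional, Tuple

# Matches "HOST ALERT: host;STATE;SOFT|HARD;attempt;" as in the excerpts above.
ALERT_RE = re.compile(
    r"HOST ALERT: (?P<host>[^;]+);(?P<state>[A-Z]+);"
    r"(?P<type>SOFT|HARD);(?P<attempt>\d+);"
)


class Alert(NamedTuple):
    host: str
    state: str
    state_type: str
    attempt: int


def parse_alert(line: str) -> Optional[Alert]:
    """Extract host, state, state type and attempt number; None if no match."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    return Alert(m["host"], m["state"], m["type"], int(m["attempt"]))


def find_attempt_gaps(alerts: List[Alert]) -> List[Tuple[Alert, Alert]]:
    """Return (previous, current) pairs where a host stayed in the same
    non-UP state but its attempt counter did not advance by exactly one."""
    gaps = []
    last: Dict[str, Alert] = {}  # most recent alert seen for each host
    for cur in alerts:
        prev = last.get(cur.host)
        if (prev is not None and cur.state == prev.state
                and cur.state != "UP" and cur.attempt != prev.attempt + 1):
            gaps.append((prev, cur))
        last[cur.host] = cur
    return gaps


# Excerpts from the logs above, reordered oldest first.
CENTRAL_LOG = [
    "[2012-05-05 01:02:55] HOST ALERT: node;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds",
    "[2012-05-05 01:04:16] HOST ALERT: node;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds",
    "[2012-05-05 01:05:40] HOST ALERT: node;DOWN;HARD;4;CRITICAL - Socket timeout after 10 seconds",
    "[2012-05-05 01:06:48] HOST ALERT: node;UP;HARD;1;TCP OK - 0.005 second response time on port 135",
]
COLLECTOR_LOG = [
    "[05-05-2012 01:02:41] HOST ALERT: node;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds",
    "[05-05-2012 01:04:01] HOST ALERT: node;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds",
    "[05-05-2012 01:05:21] HOST ALERT: node;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds",
    "[05-05-2012 01:06:31] HOST ALERT: node;UP;SOFT;4;TCP OK - 0.005 second response time on port 135",
]

if __name__ == "__main__":
    for name, log in (("central", CENTRAL_LOG), ("collector", COLLECTOR_LOG)):
        alerts = [a for a in map(parse_alert, log) if a is not None]
        for prev, cur in find_attempt_gaps(alerts):
            print(f"{name}: {cur.host} {prev.state_type};{prev.attempt}"
                  f" -> {cur.state_type};{cur.attempt}")
```

Against these excerpts it flags one pair on the central server (SOFT;2 followed by HARD;4) and nothing on the collector. Tracking the last alert per host, rather than comparing adjacent lines, keeps the check meaningful when alerts for several hosts are interleaved.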

You're losing updates.  Given that it seems to be taking 15 or 20
seconds to get the update from your collector to your central server,
that's not hugely surprising.  You don't say what the replication
mechanism is, but it either needs to get better at shovelling updates or
grow a bigger buffer, at a guess.
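The 15 to 20 second figure comes from pairing each alert on the central server with the matching one on the collector and subtracting the timestamps. A minimal sketch of that arithmetic follows. It assumes the two servers' clocks are synchronized, that the n-th alert in one log matches the n-th in the other, and that the collector's `05-05-2012` date is day-first (either reading gives 5 May here). `lags_seconds` is a hypothetical helper.

```python
from datetime import datetime
from typing import List

# Timestamps copied from the two logs quoted above, oldest first.
CENTRAL = ["2012-05-05 01:02:55", "2012-05-05 01:04:16",
           "2012-05-05 01:05:40", "2012-05-05 01:06:48"]
# Assumed DD-MM-YYYY; for 05-05-2012 the day-first/month-first choice is moot.
COLLECTOR = ["05-05-2012 01:02:41", "05-05-2012 01:04:01",
             "05-05-2012 01:05:21", "05-05-2012 01:06:31"]


def lags_seconds(central: List[str], collector: List[str]) -> List[int]:
    """Pair alerts by position and return the central-minus-collector delay
    in whole seconds (assumes synchronized clocks on both servers)."""
    result = []
    for c, k in zip(central, collector):
        t_central = datetime.strptime(c, "%Y-%m-%d %H:%M:%S")
        t_collector = datetime.strptime(k, "%d-%m-%Y %H:%M:%S")
        result.append(int((t_central - t_collector).total_seconds()))
    return result


if __name__ == "__main__":
    lags = lags_seconds(CENTRAL, COLLECTOR)
    print(lags)  # [14, 15, 19, 17]
    print(f"delay ranges from {min(lags)} to {max(lags)} seconds")
```

With the timestamps quoted above, every result reaches the central server 14 to 19 seconds after the collector logged it.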

Cheers,
-- 
 --------------------------------------------------------------------------
|  Stephen Gran                  | Never eat anything bigger than your     |
|  steve at lobefin.net             | head.                                   |
|  http://www.lobefin.net/~steve |                                         |
 --------------------------------------------------------------------------
_______________________________________________
Nagios-devel mailing list
Nagios-devel at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-devel