Passive checks number of attempts error

Rodney Ramos rodneyra at gmail.com
Wed May 9 14:50:12 CEST 2012


Hi Stephen,

Thank you for your answer. However, I don't agree that I'm losing
updates. The alert times show that the collector sent a SOFT3 status at
01:05:21 and the central server recorded it as HARD4 at 01:05:40. To me,
this is a bug in the passive check process.
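
To make the comparison concrete, here is a minimal Python sketch (not part of Nagios; the `Alert` tuple and the function names are made up for illustration) that parses HOST ALERT lines in the format of the logs quoted below and flags any jump in the attempt number. The sample lines are copied from those logs with the wrapped lines joined and the field separators normalised; I am assuming the collector timestamps are day-month-year.

```python
import re
from datetime import datetime
from typing import NamedTuple

# Matches e.g. "[2012-05-05 01:05:40] HOST ALERT: node;DOWN;HARD;4;CRITICAL - ..."
ALERT_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<type>[^;]+);(?P<attempt>\d+);"
)

class Alert(NamedTuple):  # hypothetical record type, for illustration only
    ts: datetime
    host: str
    state: str
    state_type: str
    attempt: int

def parse_alerts(lines, ts_format):
    """Parse HOST ALERT lines and return them oldest first."""
    events = []
    for line in lines:
        m = ALERT_RE.search(line)
        if m is None:
            continue  # not a HOST ALERT line
        events.append(Alert(datetime.strptime(m.group("ts"), ts_format),
                            m.group("host"), m.group("state"),
                            m.group("type"), int(m.group("attempt"))))
    return sorted(events)

def attempt_gaps(events):
    """Return (previous, current) pairs of consecutive DOWN alerts
    whose attempt number increases by more than one."""
    return [(prev, cur) for prev, cur in zip(events, events[1:])
            if prev.state == cur.state == "DOWN"
            and cur.attempt - prev.attempt > 1]

central_log = [
    "[2012-05-05 01:06:48] HOST ALERT: node;UP;HARD;1;TCP OK - 0.005 second response time on port 135",
    "[2012-05-05 01:05:40] HOST ALERT: node;DOWN;HARD;4;CRITICAL - Socket timeout after 10 seconds",
    "[2012-05-05 01:04:16] HOST ALERT: node;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds",
    "[2012-05-05 01:02:55] HOST ALERT: node;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds",
]
collector_log = [
    "[05-05-2012 01:06:31] HOST ALERT: node;UP;SOFT;4;TCP OK - 0.005 second response time on port 135",
    "[05-05-2012 01:05:21] HOST ALERT: node;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds",
    "[05-05-2012 01:04:01] HOST ALERT: node;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds",
    "[05-05-2012 01:02:41] HOST ALERT: node;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds",
]

central_events = parse_alerts(central_log, "%Y-%m-%d %H:%M:%S")
collector_events = parse_alerts(collector_log, "%d-%m-%Y %H:%M:%S")  # assumed day-month-year

for prev, cur in attempt_gaps(central_events):
    print(f"central: attempt {prev.attempt} -> {cur.attempt} at {cur.ts}")
    # prints "central: attempt 2 -> 4 at 2012-05-05 01:05:40"

# Delay between the collector's third DOWN alert and the central server's third alert
lag = (central_events[2].ts - collector_events[2].ts).total_seconds()
print(f"lag: {lag:.0f} s")  # prints "lag: 19 s"
```

On these sample lines the collector sequence has no gap, while the central server goes from attempt 2 to attempt 4 about 19 seconds after the collector logged attempt 3.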

I'm using NSCA to replicate the status from the collectors to the central
server and, as you said, the central server takes about 20 seconds to
receive the status sent by the collectors, which I think is normal behavior.

Thank you very much,
Rodney

On Tue, May 8, 2012 at 5:01 PM, Stephen Gran <steve at lobefin.net> wrote:

> On Tue, May 08, 2012 at 12:07:05PM -0300, Rodney Ramos said:
> > Hi everybody,
>
> Hi,
>
> > I use Nagios, release 3.2.3, in a distributed environment, with a central
> > server and several collector servers.
> >
> > For a long time I have been seeing errors in the passive check mechanism
> > on the central server, as we can see below.
> >
> > Sometimes, on the central server, the states and number of attempts don't
> > follow the correct order, going from SOFT2 to HARD4, for example. However,
> > on the collector server everything is OK.
> >
> > Log from Central Server:
> > Host Up[2012-05-05 01:06:48] HOST ALERT: node;UP;HARD;1;TCP OK - 0.005 second response time on port 135
> > Host Down[2012-05-05 01:05:40] HOST ALERT: node;DOWN;HARD;4;CRITICAL - Socket timeout after 10 seconds
> > Host Down[2012-05-05 01:04:16] HOST ALERT: node;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds
> > Host Down[2012-05-05 01:02:55] HOST ALERT: node;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
> >
> > Log from Collector Server:
> > Host Up[05-05-2012 01:06:31] HOST ALERT: node;UP;SOFT;4;TCP OK - 0.005 second response time on port 135
> > Host Down[05-05-2012 01:05:21] HOST ALERT: node;DOWN;SOFT;3;CRITICAL - Socket timeout after 10 seconds
> > Host Down[05-05-2012 01:04:01] HOST ALERT: node;DOWN;SOFT;2;CRITICAL - Socket timeout after 10 seconds
> > Host Down[05-05-2012 01:02:41] HOST ALERT: node;DOWN;SOFT;1;CRITICAL - Socket timeout after 10 seconds
>
> You're losing updates.  Given that it seems to be taking 15 or 20
> seconds to get the update from your collector to your central server,
> that's not hugely surprising.  You don't say what the replication
> mechanism is, but it either needs to get better at shovelling updates or
> grow a bigger buffer, at a guess.
>
> Cheers,
> --
>  --------------------------------------------------------------------------
> |  Stephen Gran                  | Never eat anything bigger than your     |
> |  steve at lobefin.net          | head.                                   |
> |  http://www.lobefin.net/~steve |                                         |
>  --------------------------------------------------------------------------
>
> _______________________________________________
> Nagios-devel mailing list
> Nagios-devel at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-devel
>
>