critical soft state every 3 hours

Marki jm+nagios-users at roth.lu
Fri Apr 20 16:31:47 CEST 2012


Hi,

we have a problem where all services checked around 00:01, 03:01, 06:01,
..., i.e. every three hours at one minute past the hour, return a critical soft
state. Most of the time they go back to normal, but sometimes they also end
up in a hard state. You can imagine the rest...
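
For anyone wanting to confirm the three-hour pattern from the Nagios log, here is a minimal Python sketch. It assumes the standard `[epoch] SERVICE ALERT: host;service;STATE;TYPE;attempt;output` log line format; the function name and the `tz` parameter are made up for illustration.

```python
import re
from collections import Counter
from datetime import datetime, timezone

# Assumed log line format:
# [1334932307] SERVICE ALERT: host;service;CRITICAL;SOFT;1;plugin output
ALERT_RE = re.compile(r"^\[(\d+)\] SERVICE ALERT: [^;]*;[^;]*;CRITICAL;SOFT;")

def soft_critical_slots(lines, tz=None):
    """Count SOFT CRITICAL service alerts per (hour mod 3, minute) slot.

    tz=None means the local time zone of the machine running the script.
    """
    slots = Counter()
    for line in lines:
        m = ALERT_RE.match(line)
        if m is None:
            continue  # not a SOFT CRITICAL service alert
        ts = datetime.fromtimestamp(int(m.group(1)), tz)
        slots[(ts.hour % 3, ts.minute)] += 1
    return slots
```

Fed the lines of nagios.log (wherever it lives on your installation), a result dominated by the slot `(0, 1)` would match the 00:01, 03:01, 06:01, ... pattern described above.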

We are running Nagios in a virtualized environment (VMware), on a SLES10 VM with
3 GB of RAM and 4 vCPUs. The average load of the machine is about 5.

We did not succeed in reproducing any network trouble when running basic checks
around those times from and to other hosts. The VM running Nagios, however, does
experience some packet loss, even when it runs on completely different VMware
hosts:

Tue Apr 17 21:02:01 CEST 2012
5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
--
5000 packets transmitted, 4998 received, 0% packet loss, time 2979ms
5000 packets transmitted, 4994 received, 0% packet loss, time 6190ms
--
Wed Apr 18 09:02:01 CEST 2012
5000 packets transmitted, 4999 received, 0% packet loss, time 5230ms
--
5000 packets transmitted, 4999 received, 0% packet loss, time 3340ms
--
5000 packets transmitted, 4979 received, 0% packet loss, time 11298ms
--
Wed Apr 18 12:02:01 CEST 2012
5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms
--
Wed Apr 18 15:01:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 4037ms
--
Wed Apr 18 15:02:01 CEST 2012
5000 packets transmitted, 4987 received, 0% packet loss, time 9010ms
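
Note that ping prints "0% packet loss" on every line even though packets are missing, presumably because this ping build reports the percentage as a truncated integer. A small Python sketch to recompute the actual loss from the transmitted/received counts (the function name is made up for illustration):

```python
import re

PING_RE = re.compile(r"(\d+) packets transmitted, (\d+) received")

def loss_percent(summary_line):
    """Return the packet loss in percent, or None if the line is not a ping summary."""
    m = PING_RE.search(summary_line)
    if m is None:
        return None
    sent, received = int(m.group(1)), int(m.group(2))
    if sent == 0:
        return None  # nothing transmitted, loss is undefined
    return 100.0 * (sent - received) / sent
```

For the summaries above this gives 0.2% for 4990 of 5000 received and 0.44% for the worst case, 4978 of 5000.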

Do you think this is related to Nagios? What could that be?

Here are some Nagios metrics:

Services Actively Checked:
<= 1 minute:             0 (0.0%)
<= 5 minutes:         2096 (78.3%)
<= 15 minutes:        2626 (98.1%)
<= 1 hour:            2665 (99.5%)
Since program start:  2666 (99.6%)

Metric                   Min.        Max.         Average
Check Execution Time:    0.00 sec    52.15 sec    1.133 sec
Check Latency:           0.00 sec    3.03 sec     0.183 sec
Percent State Change:    0.00%       64.54%       1.16%

Check Stats:
Type                             Last 1 Min  Last 5 Min  Last 15 Min
Active Scheduled Host Checks             54         282          602
Active On-Demand Host Checks             25         123          405
Parallel Host Checks                     56         290          614
Serial Host Checks                        0           0            0
Cached Host Checks                       23         115          387
Passive Host Checks                       0           0            0
Active Scheduled Service Checks         987        4203        12647
Active On-Demand Service Checks           0           0            0
Cached Service Checks                     0           0            0
Passive Service Checks                    0           0            0
External Commands                         0           0            0



Thanks

marki

