critical soft state every 3 hours

Eduardo Silvestre eduardosilvestre at me.com
Sat Apr 21 16:28:33 CEST 2012


Do you have any task or cron job running every 30 minutes?

What is the I/O wait on that VM?
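
In case it helps to answer the I/O wait question, here is a rough sketch of one way to measure it. It assumes a Linux guest that exposes the aggregate "cpu" line in /proc/stat with the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The function names are made up for illustration and are not part of Nagios or any plugin.

```python
import time


def parse_cpu_line(line):
    """Return the first eight counters of the aggregate 'cpu' line as ints."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    # Assumed order: user, nice, system, idle, iowait, irq, softirq, steal
    return [int(value) for value in fields[1:9]]


def iowait_percent(before, after):
    """Share of CPU time spent in iowait between two samples, in percent."""
    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total  # index 4 is the iowait counter


def sample_iowait(interval=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `interval` seconds apart, and return iowait %."""
    def read():
        with open(path) as handle:
            return parse_cpu_line(handle.readline())

    before = read()
    time.sleep(interval)
    return iowait_percent(before, read())
```

Calling sample_iowait() from a loop that spans one of the three-hour marks and logging the result with a timestamp would show whether I/O wait spikes at the same moment the checks go critical.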

On 20/04/2012, at 15:31, Marki <jm+nagios-users at roth.lu> wrote:

> Hi,
> 
> we have a problem where all the services checked around 00:01, 03:01, 06:01,
> ..., i.e. every three hours at one minute past the hour, return a critical soft
> state. Most of the time they go back to normal, but sometimes they end up in
> a hard state. You can imagine the rest...
> 
> We are running Nagios in a virtualized environment (VMware), on a SLES10 VM with
> 3GB of RAM and 4 vCPUs. The average load of the machine is about 5.
> 
> We could not reproduce any network trouble when running basic checks around
> those times between other hosts. The VM running Nagios, however, does experience
> some packet loss, even when it runs on completely different VMware hosts:
> 
> Tue Apr 17 21:02:01 CEST 2012
> 5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms
> 5000 packets transmitted, 4998 received, 0% packet loss, time 2979ms
> 5000 packets transmitted, 4994 received, 0% packet loss, time 6190ms
> Wed Apr 18 09:02:01 CEST 2012
> 5000 packets transmitted, 4999 received, 0% packet loss, time 5230ms
> 5000 packets transmitted, 4999 received, 0% packet loss, time 3340ms
> 5000 packets transmitted, 4979 received, 0% packet loss, time 11298ms
> Wed Apr 18 12:02:01 CEST 2012
> 5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms
> Wed Apr 18 15:01:01 CEST 2012
> 5000 packets transmitted, 4987 received, 0% packet loss, time 4037ms
> Wed Apr 18 15:02:01 CEST 2012
> 5000 packets transmitted, 4987 received, 0% packet loss, time 9010ms
> 
> Do you think this is related to Nagios? What could that be?
> 
> Here are some Nagios metrics:
> 
> Services Actively Checked:
> <= 1 minute:             0 (0.0%)
> <= 5 minutes:         2096 (78.3%)
> <= 15 minutes:        2626 (98.1%)
> <= 1 hour:            2665 (99.5%)
> Since program start:  2666 (99.6%)
> 
> Metric                    Min.        Max.         Average
> Check Execution Time:     0.00 sec    52.15 sec    1.133 sec
> Check Latency:            0.00 sec    3.03 sec     0.183 sec
> Percent State Change:     0.00%       64.54%       1.16%
> 
> Check Stats:
> Type                               Last 1 Min   Last 5 Min   Last 15 Min
> Active Scheduled Host Checks               54          282           602
> Active On-Demand Host Checks               25          123           405
> Parallel Host Checks                       56          290           614
> Serial Host Checks                          0            0             0
> Cached Host Checks                         23          115           387
> Passive Host Checks                         0            0             0
> Active Scheduled Service Checks           987         4203         12647
> Active On-Demand Service Checks             0            0             0
> Cached Service Checks                       0            0             0
> Passive Service Checks                      0            0             0
> External Commands                           0            0             0
> 
> 
> 
> Thanks
> 
> marki
> 
> 
> ------------------------------------------------------------------------------
> For Developers, A Lot Can Happen In A Second.
> Boundary is the first to Know...and Tell You.
> Monitor Your Applications in Ultra-Fine Resolution. Try it FREE!
> http://p.sf.net/sfu/Boundary-d2dvs2
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
> ::: Messages without supporting info will risk being sent to /dev/null
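
A side note on the ping summaries quoted above: each one reports "0% packet loss" although fewer packets were received than sent, so the printed percentage looks like a whole number that hides a fractional loss. The small sketch below recomputes the loss from the transmitted and received counts. The function name and the regular expression are illustrative only, and only the two counts from each summary line are used.

```python
import re

# Matches the counts in a summary line such as
# "5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms"
SUMMARY = re.compile(r"(\d+) packets transmitted, (\d+) received")


def loss_percent(summary_line):
    """Recompute packet loss in percent from a ping summary line."""
    match = SUMMARY.search(summary_line)
    if match is None:
        raise ValueError("not a ping summary line")
    sent, received = int(match.group(1)), int(match.group(2))
    if sent == 0:
        raise ValueError("no packets were transmitted")
    return 100.0 * (sent - received) / sent


# Two of the summary lines from the quoted message.
samples = [
    "5000 packets transmitted, 4990 received, 0% packet loss, time 3840ms",
    "5000 packets transmitted, 4978 received, 0% packet loss, time 12764ms",
]
for line in samples:
    print(f"{loss_percent(line):.2f}% lost")  # prints 0.20% and 0.44%
```

So the quoted runs lost between 1 and 22 packets out of 5000, i.e. roughly 0.02% to 0.44%. Whether a loss of that size is enough to push checks into a critical soft state depends on the check timeouts and retry settings, which the message does not show.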
