Nagios sometimes shows wrong status

Thomas Guyot-Sionnest dermoth at aei.ca
Wed May 27 13:43:28 CEST 2009



On 27/05/09 04:52 AM, Michael Prochaska wrote:
> Hi!
> 
> I've seen strange behavior from Nagios with a very simple check script.
> 
> the relevant part of the script:
> #########################################################################
> MAINTCNT="`/usr/sbin/metastat |grep -i maint |wc -l`"
> RESYNCNT="`/usr/sbin/metastat |grep -i resync |wc -l`"
> 
> NOTOK=0
> status=$STATE_UNKNOWN
> 
> if [ $RESYNCNT -gt 0 ]; then
>         NOTOK=1
>         TEXT="WARNING - One or more disks are in resync state. "
>         status=$STATE_WARNING
> fi
> 
> if [ $MAINTCNT -gt 0 ]; then
>         NOTOK=1
>         TEXT="CRITICAL - One or more disks are in maintenance state."
>         status=$STATE_CRITICAL
> fi
> 
> 
> if [ $NOTOK -eq 1 ]; then
>         echo $TEXT
>         datum=`date`
>         echo $datum $status >> /tmp/svm.debug
>         exit $status
> fi
> 
> echo "OK - There is no maintenance necessary!"
> exit $STATE_OK
> 
> #########################################################################
> 
> When executing the script from the command line, the return code is always 2
> and the output is always "CRITICAL - One or more disks are in maintenance
> state." (because there is one dead disk) => that's OK
> 
> When Nagios executes the script, the output is always "CRITICAL - One or
> more disks are in maintenance state.", but the return code is sometimes 0
> and sometimes 2 => that's not good
> 
> snippet from nagios.log:
> [1243410051] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
> [1243410063] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410061
> [1243410071] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
> [1243410083] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410081
> [1243410091] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
> [1243410124] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243410122
> [1243410131] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
> [1243411031] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
> [1243411316] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
> [1243411323] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411320
> [1243411326] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
> [1243411363] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411361
> [1243411366] SERVICE ALERT: acgweb1;BASIC_SVM;OK;SOFT;2;CRITICAL - One or more disks are in maintenance state.
> [1243411370] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411368
> [1243411376] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;1;CRITICAL - One or more disks are in maintenance state.
> [1243411391] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411389
> [1243411396] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;2;CRITICAL - One or more disks are in maintenance state.
> [1243411398] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411396
> [1243411406] SERVICE ALERT: acgweb1;BASIC_SVM;CRITICAL;SOFT;3;CRITICAL - One or more disks are in maintenance state.
> [1243411407] EXTERNAL COMMAND: SCHEDULE_SVC_CHECK;acgweb1;BASIC_SVM;1243411405
> 
> 
> 
> /tmp/svm.debug confirms the command-line result:
>> cat /tmp/svm.debug
> Wed May 27 08:21:33 GMT 2009 2
> Wed May 27 08:22:28 GMT 2009 2
> Wed May 27 08:22:39 GMT 2009 2
> Wed May 27 08:22:46 GMT 2009 2
> Wed May 27 08:23:00 GMT 2009 2
> Wed May 27 08:23:11 GMT 2009 2
> Wed May 27 08:23:46 GMT 2009 2
> Wed May 27 08:24:01 GMT 2009 2
> Wed May 27 08:27:09 GMT 2009 2
> Wed May 27 08:27:19 GMT 2009 2
> Wed May 27 08:27:35 GMT 2009 2
> Wed May 27 08:27:50 GMT 2009 2
> Wed May 27 08:27:56 GMT 2009 2
> Wed May 27 08:29:01 GMT 2009 2
> Wed May 27 08:32:55 GMT 2009 2
> Wed May 27 08:34:01 GMT 2009 2
> Wed May 27 08:37:55 GMT 2009 2
> Wed May 27 08:39:01 GMT 2009 2
> Wed May 27 08:39:55 GMT 2009 2
> Wed May 27 08:44:01 GMT 2009 2
> Wed May 27 08:44:55 GMT 2009 2

The timestamps in your nagios.log run from Wed May 27 07:40:51 to 08:03:27
GMT 2009, but the first entry in /tmp/svm.debug is at 08:21:33 GMT, so the
two logs don't overlap. Could you send logs covering the same period?
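One way to line the two logs up (a sketch, not something from the original report) is to rewrite the epoch timestamp at the start of each nagios.log line into the same GMT format that `date` wrote to /tmp/svm.debug. The function name `nagios_ts_to_gmt` is hypothetical, and the sketch assumes GNU date (`date -u -d @SECONDS`); the stock Solaris /usr/bin/date does not accept that option, so it may need to run on a Linux box against a copy of the log.

```shell
#!/bin/sh
# Hypothetical helper: replace the leading "[epoch]" of each nagios.log line
# with a GMT date in the same layout as /tmp/svm.debug.
# Assumes GNU date (-d @SECONDS); lines without a leading "[digits]" pass through.
nagios_ts_to_gmt() {
    while IFS= read -r line || [ -n "$line" ]; do
        case $line in
            \[[0-9]*\]*)
                ts=${line%%]*}      # keep "[1243410051"
                ts=${ts#\[}         # strip "[" -> "1243410051"
                rest=${line#*] }    # text after "] "
                # LC_ALL=C keeps day/month names in English, like the debug file
                printf '%s %s\n' \
                    "$(LC_ALL=C date -u -d "@$ts" '+%a %b %d %H:%M:%S GMT %Y')" \
                    "$rest"
                ;;
            *)
                printf '%s\n' "$line"
                ;;
        esac
    done
}
```

For example, piping the log through `nagios_ts_to_gmt` turns `[1243410051] SERVICE ALERT: ...` into `Wed May 27 07:40:51 GMT 2009 SERVICE ALERT: ...`, which can then be compared line by line with the debug file.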

--
Thomas


More information about the Developers mailing list