Nagios sometimes shows wrong status

Michael Prochaska michael at prochas.net
Thu May 28 07:27:38 CEST 2009


Hi!
> Whilst I'm not sure it has anything to do with your issue, nagios
> executes scripts without an environment defined (usually), which means
> just calling "grep" will not find the path.  You should define full
> path to executables whenever possible.  Of course, you said "relevant
> part of the script" which could imply you've defined the path earlier
> on.
>
...
> Definitely an odd outcome.  What about dumping the content of the
> metastat calls, and the variables you've assigned?  That way you can
> see what nagios is actually seeing. ie:
>
> echo "====" >> /tmp/svm.debug
> datum=`date`
> META=`/usr/sbin/metastat`
> echo ${datum} ${META} >>  /tmp/svm.debug
> MAINTCNT="`/usr/sbin/metastat |grep -i maint |wc -l`"
> RESYNCNT="`/usr/sbin/metastat |grep -i resync |wc -l`"
> echo ${datum} ${MAINTCNT} >> /tmp/svm.debug
> echo ${datum} ${RESYNCNT} >> /tmp/svm.debug
>
> [.. rest of script ..]
>
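
To check the environment point by hand, one option is to run a command with an empty environment and see what it can still find. This is a minimal sketch, assuming /usr/bin/env and /bin/sh exist at those paths. run_bare is a hypothetical helper name, and Nagios itself may still set a few variables of its own:

```shell
#!/bin/sh
# Sketch only: run_bare is a hypothetical helper, not part of Nagios.
# env -i starts the command with an empty environment, so nothing
# exported by the login shell (PATH included) is inherited.
run_bare() {
    /usr/bin/env -i "$@"
}

# Example: a variable exported here is not visible inside the bare run.
DEMO_VAR=visible
export DEMO_VAR
run_bare /bin/sh -c 'echo "DEMO_VAR=${DEMO_VAR:-unset}"'
```

Calling run_bare with the full path of the check script and then looking at $? shows how the script behaves without the login shell's PATH.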

OK, I've changed the script. META and MAINTCNT now read a saved metastat dump; RESYNCNT still calls metastat directly:

echo "====" >> /tmp/svm.debug
datum=`/usr/bin/date`
#META=`/usr/sbin/metastat`
META=`/usr/bin/cat /export/home/nagios/metastat.out`
echo ${datum} ${META} >>  /tmp/svm.debug
MAINTCNT="`/usr/bin/cat /export/home/nagios/metastat.out | /usr/bin/grep -i maint | /usr/bin/wc -l`"
RESYNCNT="`/usr/sbin/metastat | /usr/bin/grep -i resync | /usr/bin/wc -l`"
echo ${datum} ${MAINTCNT} >> /tmp/svm.debug
echo ${datum} ${RESYNCNT} >> /tmp/svm.debug


NOTOK=0
status=$STATE_UNKNOWN

if [ $RESYNCNT -gt 0 ]; then
        NOTOK=1
        TEXT="WARNING - One or more disks are in resync state."
        status=$STATE_WARNING
fi

if [ $MAINTCNT -gt 0 ]; then
        NOTOK=1
        TEXT="CRITICAL - One or more disks are in maintenance state."
        status=$STATE_CRITICAL
fi


if [ $NOTOK -eq 1 ]; then
        echo $TEXT
        datum=`date`
        echo $datum $status >> /tmp/svm.debug
        exit $status
fi

echo "OK - There is no maintenance necessary!"
exit $STATE_OK
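
For comparison, here is a hedged sketch of the same decision logic written so that both counts come from one saved snapshot instead of two separate metastat calls. It is not the plugin itself: the function name check_svm_snapshot, the default STATE_* values (the usual Nagios return codes) and the explicit PATH are assumptions made for the example.

```shell
#!/bin/sh
# Sketch only: names and defaults below are assumptions, not the real plugin.
PATH=/usr/bin:/bin:/usr/sbin:/sbin
export PATH

STATE_OK=${STATE_OK:-0}
STATE_WARNING=${STATE_WARNING:-1}
STATE_CRITICAL=${STATE_CRITICAL:-2}
STATE_UNKNOWN=${STATE_UNKNOWN:-3}

# check_svm_snapshot FILE
# Classify one saved metastat snapshot; prints the status text and
# returns the matching state code.
check_svm_snapshot() {
    snapshot=$1
    if [ ! -r "$snapshot" ]; then
        echo "UNKNOWN - cannot read $snapshot"
        return $STATE_UNKNOWN
    fi
    # grep -c prints the number of matching lines (0 if none).
    maintcnt=`grep -i -c maint "$snapshot"`
    resynccnt=`grep -i -c resync "$snapshot"`
    # Maintenance is checked first so that it wins over resync,
    # as in the script above.
    if [ "$maintcnt" -gt 0 ]; then
        echo "CRITICAL - One or more disks are in maintenance state."
        return $STATE_CRITICAL
    fi
    if [ "$resynccnt" -gt 0 ]; then
        echo "WARNING - One or more disks are in resync state."
        return $STATE_WARNING
    fi
    echo "OK - There is no maintenance necessary!"
    return $STATE_OK
}
```

A caller would write the metastat output to a file once, pass that file to check_svm_snapshot, and exit with the function's return value, so the text shown and the exit code always describe the same snapshot.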

##############################################################

I dumped the metastat output to a file before the disk was replaced:

/export/home/nagios/metastat.out:
d31: Mirror
    Submirror 0: d41
      State: Okay
    Submirror 1: d51
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 58542696 blocks (27 GB)

d41: Submirror of d31
    State: Okay
    Size: 58542696 blocks (27 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s1          0     No            Okay   Yes


d51: Submirror of d31
    State: Okay
    Size: 58542696 blocks (27 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t3d0s1          0     No            Okay   Yes


d30: Mirror
    Submirror 0: d40
      State: Okay
    Submirror 1: d50
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12584484 blocks (6.0 GB)

d40: Submirror of d30
    State: Okay
    Size: 12584484 blocks (6.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No            Okay   Yes


d50: Submirror of d30
    State: Okay
    Size: 12584484 blocks (6.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t3d0s0          0     No            Okay   Yes


d7: Mirror
    Submirror 0: d17
      State: Needs maintenance
    Submirror 1: d27
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 41679603 blocks (19 GB)

d17: Submirror of d7
    State: Unavailable
    Size: 41679603 blocks (19 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s7          0     No               -   Yes


d27: Submirror of d7
    State: Okay
    Size: 41679603 blocks (19 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t2d0s7          0     No            Okay   Yes


d3: Mirror
    Submirror 0: d13
      State: Needs maintenance
    Submirror 1: d23
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 12584484 blocks (6.0 GB)

d13: Submirror of d3
    State: Unavailable
    Size: 12584484 blocks (6.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s3          0     No               -   Yes


d23: Submirror of d3
    State: Okay
    Size: 12584484 blocks (6.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t2d0s3          0     No            Okay   Yes


d1: Mirror
    Submirror 0: d11
      State: Needs maintenance
    Submirror 1: d21
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8389656 blocks (4.0 GB)

d11: Submirror of d1
    State: Unavailable
    Size: 8389656 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s1          0     No               -   Yes


d21: Submirror of d1
    State: Okay
    Size: 8389656 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t2d0s1          0     No            Okay   Yes


d0: Mirror
    Submirror 0: d10
      State: Needs maintenance
    Submirror 1: d20
      State: Okay
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8389656 blocks (4.0 GB)

d10: Submirror of d0
    State: Unavailable
    Size: 8389656 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No               -   Yes


d20: Submirror of d0
    State: Okay
    Size: 8389656 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t2d0s0          0     No            Okay   Yes


Device Relocation Information:
Device   Reloc  Device ID
c1t3d0   Yes    id1,sd@SSEAGATE_ST336607LSUN36G_3JA6D4V10000741764YM
c1t1d0   Yes    id1,sd@SSEAGATE_ST336607LSUN36G_3JA682YC00007413GYMP
c1t2d0   Yes    id1,sd@SSEAGATE_ST336607LSUN36G_3JA6D2D100007417657P
c1t0d0   Yes    id1,sd@SSEAGATE_ST336607LSUN36G_3JA67ZCV00007413GY6X



svm.debug:
====
Thu May 28 05:22:19 GMT 2009 d31: Mirror Submirror 0: d41 State: Okay
Submirror 1: d51 State: Okay Pass: 1 Read option: roundrobin (default)
Write option: parallel (default) Size: 58542696 blocks (27 GB) d41:
Submirror of d31 State: Okay Size: 58542696 blocks (27 GB) Stripe 0:
Device Start Block Dbase State Reloc Hot Spare c1t1d0s1 0 No Okay Yes d51:
Submirror of d31 State: Okay Size: 58542696 blocks (27 GB) Stripe 0:
Device Start Block Dbase State Reloc Hot Spare c1t3d0s1 0 No Okay Yes d30:
Mirror Submirror 0: d40 State: Okay Submirror 1: d50 State: Okay Pass: 1
Read option: roundrobin (default) Write option: parallel (default) Size:
12584484 blocks (6.0 GB) d40: Submirror of d30 State: Okay Size: 12584484
blocks (6.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Okay Yes d50: Submirror of d30 State: Okay Size: 12584484
blocks (6.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot Spare
c1t3d0s0 0 No Okay Yes d7: Mirror Submirror 0: d17 State: Needs
maintenance Submirror 1: d27 State: Okay Pass: 1 Read option: roundrobin
(default) Write option: parallel (default) Size: 41679603 blocks (19 GB)
d17: Submirror of d7 State: Unavailable Size: 41679603 blocks (19 GB)
Stripe 0: Device Start Block Dbase State Reloc Hot Spare c1t0d0s7 0 No -
Yes d27: Submirror of d7 State: Okay Size: 41679603 blocks (19 GB) Stripe
0: Device Start Block Dbase State Reloc Hot Spare c1t2d0s7 0 No Okay Yes
d3: Mirror Submirror 0: d13 State: Needs maintenance Submirror 1: d23
State: Okay Pass: 1 Read option: roundrobin (default) Write option:
parallel (default) Size: 12584484 blocks (6.0 GB) d13: Submirror of d3
State: Unavailable Size: 12584484 blocks (6.0 GB) Stripe 0: Device Start
Block Dbase State Reloc Hot Spare c1t0d0s3 0 No - Yes d23: Submirror of d3
State: Okay Size: 12584484 blocks (6.0 GB) Stripe 0: Device Start Block
Dbase State Reloc Hot Spare c1t2d0s3 0 No Okay Yes d1: Mirror Submirror 0:
d11 State: Needs maintenance Submirror 1: d21 State: Okay Pass: 1 Read
option: roundrobin (default) Write option: parallel (default) Size:
8389656 blocks (4.0 GB) d11: Submirror of d1 State: Unavailable Size:
8389656 blocks (4.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot
Spare c1t0d0s1 0 No - Yes d21: Submirror of d1 State: Okay Size: 8389656
blocks (4.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot Spare
c1t2d0s1 0 No Okay Yes d0: Mirror Submirror 0: d10 State: Needs
maintenance Submirror 1: d20 State: Okay Pass: 1 Read option: roundrobin
(default) Write option: parallel (default) Size: 8389656 blocks (4.0 GB)
d10: Submirror of d0 State: Unavailable Size: 8389656 blocks (4.0 GB)
Stripe 0: Device Start Block Dbase State Reloc Hot Spare c1t0d0s0 0 No -
Yes d20: Submirror of d0 State: Okay Size: 8389656 blocks (4.0 GB) Stripe
0: Device Start Block Dbase State Reloc Hot Spare c1t2d0s0 0 No Okay Yes
Device Relocation Information: Device Reloc Device ID c1t3d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA6D4V10000741764YM c1t1d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA682YC00007413GYMP c1t2d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA6D2D100007417657P c1t0d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA67ZCV00007413GY6X
Thu May 28 05:22:19 GMT 2009 4
Thu May 28 05:22:19 GMT 2009 0
Thu May 28 05:22:19 GMT 2009 2
====
Thu May 28 05:22:28 GMT 2009 d31: Mirror Submirror 0: d41 State: Okay
Submirror 1: d51 State: Okay Pass: 1 Read option: roundrobin (default)
Write option: parallel (default) Size: 58542696 blocks (27 GB) d41:
Submirror of d31 State: Okay Size: 58542696 blocks (27 GB) Stripe 0:
Device Start Block Dbase State Reloc Hot Spare c1t1d0s1 0 No Okay Yes d51:
Submirror of d31 State: Okay Size: 58542696 blocks (27 GB) Stripe 0:
Device Start Block Dbase State Reloc Hot Spare c1t3d0s1 0 No Okay Yes d30:
Mirror Submirror 0: d40 State: Okay Submirror 1: d50 State: Okay Pass: 1
Read option: roundrobin (default) Write option: parallel (default) Size:
12584484 blocks (6.0 GB) d40: Submirror of d30 State: Okay Size: 12584484
blocks (6.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot Spare
c1t1d0s0 0 No Okay Yes d50: Submirror of d30 State: Okay Size: 12584484
blocks (6.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot Spare
c1t3d0s0 0 No Okay Yes d7: Mirror Submirror 0: d17 State: Needs
maintenance Submirror 1: d27 State: Okay Pass: 1 Read option: roundrobin
(default) Write option: parallel (default) Size: 41679603 blocks (19 GB)
d17: Submirror of d7 State: Unavailable Size: 41679603 blocks (19 GB)
Stripe 0: Device Start Block Dbase State Reloc Hot Spare c1t0d0s7 0 No -
Yes d27: Submirror of d7 State: Okay Size: 41679603 blocks (19 GB) Stripe
0: Device Start Block Dbase State Reloc Hot Spare c1t2d0s7 0 No Okay Yes
d3: Mirror Submirror 0: d13 State: Needs maintenance Submirror 1: d23
State: Okay Pass: 1 Read option: roundrobin (default) Write option:
parallel (default) Size: 12584484 blocks (6.0 GB) d13: Submirror of d3
State: Unavailable Size: 12584484 blocks (6.0 GB) Stripe 0: Device Start
Block Dbase State Reloc Hot Spare c1t0d0s3 0 No - Yes d23: Submirror of d3
State: Okay Size: 12584484 blocks (6.0 GB) Stripe 0: Device Start Block
Dbase State Reloc Hot Spare c1t2d0s3 0 No Okay Yes d1: Mirror Submirror 0:
d11 State: Needs maintenance Submirror 1: d21 State: Okay Pass: 1 Read
option: roundrobin (default) Write option: parallel (default) Size:
8389656 blocks (4.0 GB) d11: Submirror of d1 State: Unavailable Size:
8389656 blocks (4.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot
Spare c1t0d0s1 0 No - Yes d21: Submirror of d1 State: Okay Size: 8389656
blocks (4.0 GB) Stripe 0: Device Start Block Dbase State Reloc Hot Spare
c1t2d0s1 0 No Okay Yes d0: Mirror Submirror 0: d10 State: Needs
maintenance Submirror 1: d20 State: Okay Pass: 1 Read option: roundrobin
(default) Write option: parallel (default) Size: 8389656 blocks (4.0 GB)
d10: Submirror of d0 State: Unavailable Size: 8389656 blocks (4.0 GB)
Stripe 0: Device Start Block Dbase State Reloc Hot Spare c1t0d0s0 0 No -
Yes d20: Submirror of d0 State: Okay Size: 8389656 blocks (4.0 GB) Stripe
0: Device Start Block Dbase State Reloc Hot Spare c1t2d0s0 0 No Okay Yes
Device Relocation Information: Device Reloc Device ID c1t3d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA6D4V10000741764YM c1t1d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA682YC00007413GYMP c1t2d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA6D2D100007417657P c1t0d0 Yes
id1,sd@SSEAGATE_ST336607LSUN36G_3JA67ZCV00007413GY6X
Thu May 28 05:22:28 GMT 2009 4
Thu May 28 05:22:28 GMT 2009 0
Thu May 28 05:22:28 GMT 2009 2
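
A note on reading the dump above: the three short lines at the end of each run are MAINTCNT, RESYNCNT and the exit status, in the order the script echoes them. The metastat text appears as one long run because ${META} is expanded without quotes, so the shell splits it on whitespace and echo joins the words with single spaces. A small sketch of the difference, using a made-up three-line sample in place of real metastat output:

```shell
#!/bin/sh
# Sketch: the sample text stands in for real metastat output.
META='d7: Mirror
    Submirror 0: d17
      State: Needs maintenance'

# Unquoted: the shell splits the value on whitespace and echo
# rejoins the words with single spaces, giving one line.
flat=`echo ${META}`

# Quoted: newlines and indentation are passed through unchanged.
kept=`echo "${META}"`

# Compare the line counts of the two forms.
echo "$flat" | wc -l
echo "$kept" | wc -l
```

Writing echo "${datum}" "${META}" in the debug lines would keep the original layout in /tmp/svm.debug, which makes it easier to compare against a direct metastat run.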


any ideas?

best regards,
michael





