reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1

Sean McKell mckell at us.ibm.com
Wed Jun 19 00:34:15 CEST 2013


> Do you have this in nagios.cfg?
> retain_state_information=1

Yes, I do have that set.




From:   nagios-users-request at lists.sourceforge.net
To:     nagios-users at lists.sourceforge.net, 
Date:   06/18/2013 01:56 PM
Subject:        Nagios-users Digest, Vol 85, Issue 6



Send Nagios-users mailing list submissions to
                 nagios-users at lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
                 https://lists.sourceforge.net/lists/listinfo/nagios-users
or, via email, send a message with subject or body 'help' to
                 nagios-users-request at lists.sourceforge.net

You can reach the person managing the list at
                 nagios-users-owner at lists.sourceforge.net

When replying, please edit your Subject line so it is more specific
than "Re: Contents of Nagios-users digest..."


Today's Topics:

   1. reload appears to cause force of DOWN; SOFT; x to DOWN;
      HARD; 1 (Sean McKell)
   2. Re: reload appears to cause force of DOWN; SOFT; x to DOWN;
      HARD; 1 (Travis Runyard)
   3. Re: Issues with NEB modules breaking after restart
      (Andrew Widdersheim)
   4. Functions to do Availibility in reporting (omar saddiki)
   5. Fwd: Functions to do Availibility in reporting (omar saddiki)
   6. Wmi (martin Rodriguez)
   7. Re: Wmi (Sunil Sankar)
   8. check_ntp_time offset unknown (Bennett, Jan)
   9. Re: check_ntp_time offset unknown (Holger Weiß)
  10. Re: check_ntp_time offset unknown (Giles Coochey)
  11. Problem with check_openmanage plugin and storage (Nic Bernstein)


----------------------------------------------------------------------

Message: 1
Date: Thu, 13 Jun 2013 17:31:44 -0600
From: Sean McKell <mckell at us.ibm.com>
Subject: [Nagios-users] reload appears to cause force of DOWN; SOFT;  x
                 to DOWN; HARD; 1
To: nagios-users at lists.sourceforge.net
Message-ID:
 <OF17CEA331.79DB0522-ON87257B89.0080C0E1-87257B89.0081405C at us.ibm.com>
Content-Type: text/plain; charset="us-ascii"

Running 3.4.1:
I see this strange anomaly, where a host check is in the middle of doing
retries before hitting max_attempts, but after a server reload occurs, the
next check is automatically forced to DOWN;HARD;1, as seen here:

[2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
(reload happens here)
[2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.

Why is it skipping the rest of the attempts and going straight to
DOWN;HARD after the reload?
Seems like a bug to me.
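
For anyone who wants to look for the same pattern in their own logs, here is a minimal Python sketch. It assumes only the `HOST ALERT: host;STATE;TYPE;attempt;output` layout visible in the excerpt above, with one entry per line. The function name `find_premature_hard` is made up for illustration and is not part of Nagios. It flags a HARD alert whose attempt number is not higher than the last SOFT attempt logged for the same host and state, which is what the 09:00:52 entry looks like.

```python
import re

# Matches entries such as:
# [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: ...
ALERT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<type>SOFT|HARD);(?P<attempt>\d+);"
)


def find_premature_hard(lines):
    """Yield (host, timestamp, last_soft_attempt, hard_attempt) for HARD
    alerts whose attempt counter did not advance past the previous SOFT
    alert for the same host and state."""
    last_soft = {}  # host -> (state, attempt) of the most recent SOFT alert
    for line in lines:
        m = ALERT_RE.match(line)
        if not m:
            continue  # not a host alert entry
        host, state, attempt = m["host"], m["state"], int(m["attempt"])
        if m["type"] == "SOFT":
            last_soft[host] = (state, attempt)
            continue
        prev = last_soft.pop(host, None)
        if prev is not None and prev[0] == state and attempt <= prev[1]:
            yield host, m["ts"], prev[1], attempt
```

Feeding it the four entries above reports the single HARD;1 alert that follows SOFT;3. It says nothing about why the counter was reset, only where it happened.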

------------------------------

Message: 2
Date: Thu, 13 Jun 2013 21:39:48 -0700
From: Travis Runyard <travisrunyard at gmail.com>
Subject: Re: [Nagios-users] reload appears to cause force of DOWN;
                 SOFT; x to DOWN; HARD; 1
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
 <CANCZ1yG6CYiE2GYL3j5W3Gj9WjrTz4SmGONnaZUxbL5piUB=zA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Do you have this in nagios.cfg?
retain_state_information=1
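
One quick way to double-check the setting is to read it straight from the main config file. The sketch below is Python and assumes only that nagios.cfg uses `key=value` lines with `#` comments. The helper name `read_directives` is hypothetical, and the path in the usage note is an assumption that depends on how Nagios was installed.

```python
def read_directives(cfg_text, wanted):
    """Return {directive: [values...]} for the directives named in `wanted`.

    Every occurrence is collected, so a directive that is set twice
    shows up with two values instead of being silently overwritten.
    """
    found = {name: [] for name in wanted}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        key = key.strip()
        if sep and key in found:
            found[key].append(value.strip())
    return found
```

Typical use would be something like `read_directives(open("/usr/local/nagios/etc/nagios.cfg").read(), ["retain_state_information", "retention_update_interval"])`. An empty list for a directive means it is not set in that file.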


On Thu, Jun 13, 2013 at 4:31 PM, Sean McKell <mckell at us.ibm.com> wrote:

> Running 3.4.1:
> I see this strange anomaly, where a host check is in the middle of doing
> retries before hitting max_attempts, but after a server reload occurs, the
> next check is automatically forced to DOWN;HARD;1, as seen here:
>
> [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
> [2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
> [2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
> (reload happens here)
> [2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
>
> Why is it skipping the rest of the attempts and going straight to
> DOWN;HARD after the reload?
> Seems like a bug to me.
>
>
> 

------------------------------

Message: 3
Date: Fri, 14 Jun 2013 13:03:56 -0400
From: Andrew Widdersheim <awiddersheim at hotmail.com>
Subject: Re: [Nagios-users] Issues with NEB modules breaking after
                 restart
To: "nagios-users at lists.sourceforge.net"
                 <nagios-users at lists.sourceforge.net>
Message-ID: <SNT143-W535DE68DF8CC060F587EF0DD800 at phx.gbl>
Content-Type: text/plain; charset="iso-8859-1"

To answer my own question... I'm pretty sure two nagios instances
were spawned at once. The nagios init script that comes with nagios-core
is the best at handling this situation.
  


------------------------------

Message: 4
Date: Mon, 17 Jun 2013 15:21:37 +0000
From: omar saddiki <omar.saddiki at gmail.com>
Subject: [Nagios-users] Functions to do Availibility in reporting
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
 <CAN5T1CHYs_w4t0=muvDosc+KsjsLf5yW305X3-K1ZrkVtPNGgQ at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Could someone please tell me which function Nagios uses in the Reporting
tab to compute availability between two times?

Regards
 SADDIKI

------------------------------

Message: 5
Date: Mon, 17 Jun 2013 15:42:17 +0000
From: omar saddiki <omar.saddiki at gmail.com>
Subject: [Nagios-users] Fwd: Functions to do Availibility in reporting
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
 <CAN5T1CHOYvGnu8Z8Q_bbrtJe8A7=phdCNErWmN9cAjX59eU8wA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Could someone please tell me which function Nagios uses in the Reporting
tab to compute availability between two times?

Regards
 SADDIKI

------------------------------

Message: 6
Date: Mon, 17 Jun 2013 15:14:24 -0300
From: martin Rodriguez <maestin at gmail.com>
Subject: [Nagios-users] Wmi
To: nagios-users at lists.sourceforge.net
Message-ID:
 <CACrJBAsbWM8wVuPasjJQp0VumJZw5aj_qN6DGS+OHeZTMfmEXg at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

Hi, I am installing Nagios 3.4.3 on Ubuntu and I cannot configure the
plugin check_wmi_plus.conf. Does anyone have experience with this?

------------------------------

Message: 7
Date: Tue, 18 Jun 2013 00:14:07 +0530
From: Sunil Sankar <sunil at sunil.cc>
Subject: Re: [Nagios-users] Wmi
To: Nagios Users List <nagios-users at lists.sourceforge.net>
Message-ID:
 <CAPqUM3W+mo5bRRoi6dxAwSdLPs87poqqQZHiJdQWVDh-7c5QhA at mail.gmail.com>
Content-Type: text/plain; charset="iso-8859-1"

What is the error you are getting?


On Mon, Jun 17, 2013 at 11:44 PM, martin Rodriguez <maestin at gmail.com> wrote:

> Hi I am installing Nagios 3.4.3 on ubuntu and I can not configure the
> plugin check_wmi_plus.conf someone had experience in this topic
>
>
> 



-- 
Regards
Sunil Sankar

------------------------------

Message: 8
Date: Fri, 14 Jun 2013 14:10:43 +0000
From: "Bennett, Jan" <JBennett at ntta.org>
Subject: [Nagios-users] check_ntp_time offset unknown
To: "'nagios-users at lists.sourceforge.net'"
                 <nagios-users at lists.sourceforge.net>
Message-ID:
 <E11B0F59D3334D469B36FCA07490BA8C186E67EF at NTTAEXMB01.ntta.local>
Content-Type: text/plain; charset="us-ascii"

We have implemented a NTP sync check in all of the NRDS checks that we are 
rolling out right now but I've run into a bit of a snag.

I am getting returns of 'Offset Unknown' on all clients.  It appears to
only happen for a short period of time (30 min or so) and then it will
clear itself up for a bit, but the issue always returns.

From the client that is reporting the unknown offset, I can run the
following:

# ./check_ntp_time -H localhost
NTP CRITICAL: Offset unknown|
# ./check_ntp_time -V
check_ntp_time v1.4.16 (nagios-plugins 1.4.16)
# ntpdc -p
     remote           local      st poll reach  delay   offset    disp
=======================================================================
=LOCAL(0)        127.0.0.1       10   64   17 0.00000  0.000000 0.96858
*timeserver1     xxx.xxx.xxx.xxx    2   64   17 0.00098  4.956048 0.00580
# /usr/local/nagios/libexec/check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -2.777669579e-07
sending request to peer 0
response from peer 0: offset -2.161832526e-07
sending request to peer 0
response from peer 0: offset -4.009343684e-07
sending request to peer 0
response from peer 0: offset -1.987209544e-07
discarding peer 0: stratum=0
overall average offset: 0
NTP CRITICAL: Offset unknown|

In my searches, I noticed a number of people reporting the same issue with 
the supposed solution being to update your Nagios plugins to 1.4.13.  I 
have done so and am now running 1.4.16 without any change in the service 
check.

Also, I am unable to check a remote NTP server from these clients as they 
do not have access to the outside world.

It has been suggested that the stratum=0 may be the culprit, but I'm not 
sure of my options here.
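
To keep track of how often a peer gets thrown out, the verbose output above can be scanned with a few lines of Python. This is a rough sketch that relies only on the two line formats shown in this message (`response from peer N: offset X` and `discarding peer N: stratum=S`). The function name `summarize_ntp_verbose` is hypothetical and not part of nagios-plugins.

```python
import re

OFFSET_RE = re.compile(r"^response from peer (\d+): offset (\S+)$")
DISCARD_RE = re.compile(r"^discarding peer (\d+): stratum=(\d+)$")


def summarize_ntp_verbose(text):
    """Parse `check_ntp_time -v` output.

    Returns (offsets, discarded) where
      offsets   maps peer index -> list of offsets in seconds
      discarded maps peer index -> stratum reported when it was discarded
    """
    offsets, discarded = {}, {}
    for raw in text.splitlines():
        line = raw.strip()
        m = OFFSET_RE.match(line)
        if m:
            offsets.setdefault(int(m.group(1)), []).append(float(m.group(2)))
            continue
        m = DISCARD_RE.match(line)
        if m:
            discarded[int(m.group(1))] = int(m.group(2))
    return offsets, discarded
```

For the run above this gives four sub-microsecond offsets for peer 0 and shows that the same peer was discarded with stratum 0, which appears to leave the plugin with nothing to average.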

Any help would be greatly appreciated.

Jan


------------------------------

Message: 9
Date: Tue, 18 Jun 2013 17:24:50 +0200
From: Holger Weiß <holger at cis.fu-berlin.de>
Subject: Re: [Nagios-users] check_ntp_time offset unknown
To: Nagios Users <nagios-users at lists.sourceforge.net>
Message-ID: <20130618152450.GA678632 at zedat.fu-berlin.de>
Content-Type: text/plain; charset=iso-8859-1

* Bennett, Jan <JBennett at ntta.org> [2013-06-14 14:10]:
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|

Could you please run "ntpq -c rv" when this happens and post the output?

> It has been suggested that the stratum=0 may be the culprit, but I'm not
> sure of my options here.

Yes, stratum=0 is the culprit.  An NTP server wouldn't usually report
such a stratum value.

Holger

-- 
Holger Weiß               | Freie Universität Berlin
holger at zedat.fu-berlin.de | Zentraleinrichtung für Datenverarbeitung (ZEDAT)
Telefon: +49 30 838-55949 | Fabeckstraße 32, 14195 Berlin (Germany)
Telefax: +49 30 838455949 | https://www.zedat.fu-berlin.de/



------------------------------

Message: 10
Date: Tue, 18 Jun 2013 16:35:03 +0100
From: Giles Coochey <giles at coochey.net>
Subject: Re: [Nagios-users] check_ntp_time offset unknown
To: nagios-users at lists.sourceforge.net
Message-ID: <51C07E27.7000400 at coochey.net>
Content-Type: text/plain; charset="iso-8859-1"

On 14/06/2013 15:10, Bennett, Jan wrote:
>
> We have implemented a NTP sync check in all of the NRDS checks that we 
> are rolling out right now but I've run into a bit of a snag.
>
> I am getting returns of 'Offset Unknown' on all clients.  It appears 
> to only happen for a short period of time (30 min or so) and then it 
> will clear itself up for a bit but the issue will always return.
>
> From the client that is reporting the unknown offset, I can run the 
> following:
>
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|
> # ./check_ntp_time -V
> check_ntp_time v1.4.16 (nagios-plugins 1.4.16)
> # ntpdc -p
>      remote           local     st poll reach  delay   offset    disp
> =======================================================================
> =LOCAL(0)        127.0.0.1    10   64   17 0.00000  0.000000 0.96858
> *timeserver1  xxx.xxx.xxx.xxx    2   64   17 0.00098  4.956048 0.00580
>
> # /usr/local/nagios/libexec/check_ntp_time -v -H localhost
> sending request to peer 0
> response from peer 0: offset -2.777669579e-07
> sending request to peer 0
> response from peer 0: offset -2.161832526e-07
> sending request to peer 0
> response from peer 0: offset -4.009343684e-07
> sending request to peer 0
> response from peer 0: offset -1.987209544e-07
> discarding peer 0: stratum=0
> overall average offset: 0
> NTP CRITICAL: Offset unknown|
>
> In my searches, I noticed a number of people reporting the same issue 
> with the supposed solution being to update your Nagios plugins to 
> 1.4.13.  I have done so and am now running 1.4.16 without any change 
> in the service check.
>
> Also, I am unable to check a remote NTP server from these clients as 
> they do not have access to the outside world.
>
> It has been suggested that the stratum=0 may be the culprit, but I'm 
> not sure of my options here.
>
> Any help would be greatly appreciated.
>
>
I get this shortly after an NTP client has booted up. Once NTP has been
running for a while it goes away.

-- 
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
giles at coochey.net


------------------------------

Message: 11
Date: Tue, 18 Jun 2013 11:03:32 -0500
From: Nic Bernstein <nic at onlight.com>
Subject: [Nagios-users] Problem with check_openmanage plugin and
                 storage
To: nagios-users at lists.sourceforge.net
Message-ID: <51C084D4.8020104 at onlight.com>
Content-Type: text/plain; charset="utf-8"

We've recently been experimenting with Trond Hasle Amundsen's
check_openmanage on a large network with about a hundred Dell servers of
various ages, capabilities, etc.  Mostly PE-2950, R210, R410 and R720. 
Much thanks to Trond for all his great work on Nagios plugins and other
projects, by the way.

We've hit a wall, however, with the storage monitoring aspects of this
plugin.

For example, here's a quite specific case.  This is a new PE R720, in
debug:

    onlight@monitor:~$ check_openmanage -H host -C secret -d
       System:      PowerEdge R720           OMSA version:    7.1.0
       ServiceTag:  #######                  Plugin version:  3.7.9
       BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
    -----------------------------------------------------------------------------
       Storage Components
    =============================================================================
      STATE  |    ID    |  MESSAGE TEXT
    ---------+----------+--------------------------------------------------------
          OK |        0 | Controller 0 [PERC H310 Mini] is Ready
     WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
     WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
          OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
          OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
          OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
          OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
    -----------------------------------------------------------------------------
       Chassis Components
    =============================================================================
      STATE  |  ID  |  MESSAGE TEXT
    ---------+------+------------------------------------------------------------
          OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
          OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
          OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
          OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
          OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM
          OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
          OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
          OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
          OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
          OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
          OK |    0 | Power Supply 0 [AC]: Presence detected
          OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
          OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
          OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
          OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
          OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
          OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
          OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
          OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
          OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
          OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
          OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
          OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
          OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
          OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
          OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
          OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
          OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
          OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
          OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
          OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
          OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
          OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
          OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
          OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
          OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
          OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
          OK |    0 | SD Card 0 [vFlash] is Absent
    -----------------------------------------------------------------------------
       Other messages
    =============================================================================
      STATE  |  MESSAGE TEXT
    ---------+-------------------------------------------------------------------
          OK | ESM log health is Ok (less than 80% full)
          OK | Chassis Service Tag is sane

This run exits with 1 (WARNING).

We're not sure we agree with the decision to make the fact that a disk
is not Dell Certified a Warning, but we can at least understand that. 
So, what if we exclude storage, with --no-storage?

    onlight@monitor:~$ check_openmanage -H host -C secret -d --no-storage
       System:      PowerEdge R720           OMSA version:    7.1.0
       ServiceTag:  #######                  Plugin version:  3.7.9
       BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
    -----------------------------------------------------------------------------
       Chassis Components
    =============================================================================
      STATE  |  ID  |  MESSAGE TEXT
    ---------+------+------------------------------------------------------------
          OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
          OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
          OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
          OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
          OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
          OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
          OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
          OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
          OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
          OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
          OK |    0 | Power Supply 0 [AC]: Presence detected
          OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
          OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
          OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
          OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
          OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
          OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
          OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
          OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
          OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
          OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
          OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
          OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
          OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
          OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
          OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
          OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
          OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
          OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
          OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
          OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
          OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
          OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 112 V
          OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
          OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
          OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
          OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
          OK |    0 | SD Card 0 [vFlash] is Absent
    -----------------------------------------------------------------------------
       Other messages
    =============================================================================
      STATE  |  MESSAGE TEXT
    ---------+-------------------------------------------------------------------
          OK | ESM log health is Ok (less than 80% full)
          OK | Chassis Service Tag is sane
    OOPS! Something is wrong with this server, but I don't know what. The global
    system health status is WARNING, but every component check is OK. This may
    be a bug in the Nagios plugin, please file a bug report.

This yields exit code 3 (UNKNOWN).

Now, just for argument's sake, let's say we obviate the check for
certified drives, by commenting out the "workaround for OMSA 7.1.0
bug" code (just a handy little short-cut).  Here's what we get then:

    onlight@monitor:~$ check_openmanage -H host -C secret -d
       System:      PowerEdge R720           OMSA version:    7.1.0
       ServiceTag:  #######                  Plugin version:  3.7.9
       BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
    -----------------------------------------------------------------------------
       Storage Components
    =============================================================================
      STATE  |    ID    |  MESSAGE TEXT
    ---------+----------+--------------------------------------------------------
          OK |        0 | Controller 0 [PERC H310 Mini] is Ready
     WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
     WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
          OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
          OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
          OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
          OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
    -----------------------------------------------------------------------------
       Chassis Components
    =============================================================================
      STATE  |  ID  |  MESSAGE TEXT
    ---------+------+------------------------------------------------------------
          OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
          OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
          OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
          OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
          OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
          OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1200 RPM
          OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
          OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
          OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
          OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1200 RPM
          OK |    0 | Power Supply 0 [AC]: Presence detected
          OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
          OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
          OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 48 C (min=8/3, max=83/88)
          OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
          OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
          OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
          OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
          OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
          OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
          OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
          OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
          OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
          OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
          OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
          OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
          OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
          OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
          OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
          OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
          OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
          OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
          OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
          OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
          OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
          OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
          OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
          OK |    0 | SD Card 0 [vFlash] is Absent
    -----------------------------------------------------------------------------
       Other messages
    =============================================================================
      STATE  |  MESSAGE TEXT
    ---------+-------------------------------------------------------------------
          OK | ESM log health is Ok (less than 80% full)
          OK | Chassis Service Tag is sane

Again, as with the original case, exit code is 1 (WARNING).

Is there any way around this?  Should I be disabling global health
checks?  Here's a run to test that, and it works:

    onlight@monitor:~$ check_openmanage -H host -C secret -b pdisk=all
    OK - System: 'PowerEdge R720', SN: '#######', 16 GB ram (4 dimms), 1 logical drives, 2 physical drives

Interestingly, when combining the blacklist with debug ("-d -b
pdisk=all"), the exit code is 3 (UNKNOWN), but with debug off, it's 0
(OK).

So, I guess what I'm wondering is why we need to blacklist the physical
disks (pdisk) instead of using --no-storage?  Shouldn't --no-storage
also cause globalstatus to be ignored?
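
The first three runs above are consistent with a simple decision rule, sketched here in Python. This is only a model inferred from the output in this message, not the plugin's actual code, and the function `overall_exit_code` and its arguments are invented for illustration. The numeric values are the standard Nagios plugin exit codes.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def overall_exit_code(component_states, global_state):
    """Combine per-component results with the global health status.

    component_states: OK/WARNING/CRITICAL codes of the components that
                      were actually checked (storage left out when it
                      is excluded from the run)
    global_state:     code for the global system health status
    """
    worst = max(component_states, default=OK)
    if worst != OK:
        return worst    # a checked component explains the problem
    if global_state != OK:
        return UNKNOWN  # global health is bad, but nothing checked says why
    return OK
```

With the two WARNING disks in the list and a global status of WARNING, this returns 1. With storage left out (every remaining component OK) and the global status still WARNING, it returns 3, matching the "OOPS!" case. It does not account for the `-b pdisk=all` run, where the result differed with and without `-d`.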

I can furnish SNMP walk output if that's useful.

Cheers,
    -nic

-- 
Nic Bernstein                             nic at onlight.com
Onlight, Inc.                             www.onlight.com
219 N. Milwaukee St., Suite 2a            v. 414.272.4477
Milwaukee, Wisconsin  53202


------------------------------


End of Nagios-users Digest, Vol 85, Issue 6
*******************************************


_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null

