<tt>> Do you have this in nagios.cfg?
> retain_state_information=1
</tt>
<tt>Yes, I do have that set.</tt>

From: nagios-users-request@lists.sourceforge.net
To: nagios-users@lists.sourceforge.net
Date: 06/18/2013 01:56 PM
Subject: Nagios-users Digest, Vol 85, Issue 6
<hr noshade>
<tt>Send Nagios-users mailing list submissions to
nagios-users@lists.sourceforge.net

To subscribe or unsubscribe via the World Wide Web, visit
</tt><a href="https://lists.sourceforge.net/lists/listinfo/nagios-users"><tt>https://lists.sourceforge.net/lists/listinfo/nagios-users</tt></a><tt>
or, via email, send a message with subject or body 'help' to
nagios-users-request@lists.sourceforge.net

You can reach the person managing the list at
nagios-users-owner@lists.sourceforge.net

When replying, please edit your Subject line so it is more specific than "Re: Contents of Nagios-users digest..."

Today's Topics:

1. reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 (Sean McKell)
2. Re: reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 (Travis Runyard)
3. Re: Issues with NEB modules breaking after restart (Andrew Widdersheim)
4. Functions to do Availibility in reporting (omar saddiki)
5. Fwd: Functions to do Availibility in reporting (omar saddiki)
6. Wmi (martin Rodriguez)
7. Re: Wmi (Sunil Sankar)
8. check_ntp_time offset unknown (Bennett, Jan)
9. Re: check_ntp_time offset unknown (Holger Weiß)
10. Re: check_ntp_time offset unknown (Giles Coochey)
11. 
Problem with check_openmanage plugin and storage (Nic Bernstein)

----------------------------------------------------------------------

Message: 1
Date: Thu, 13 Jun 2013 17:31:44 -0600
From: Sean McKell <mckell@us.ibm.com>
Subject: [Nagios-users] reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1
To: nagios-users@lists.sourceforge.net
Message-ID: <OF17CEA331.79DB0522-ON87257B89.0080C0E1-87257B89.0081405C@us.ibm.com>
Content-Type: text/plain; charset="us-ascii"

Running 3.4.1:
I see this strange anomaly, where a host check is in the middle of doing retries before hitting max_attempts, but after a server reload occurs, the next check is automatically forced to DOWN;HARD;1, as seen here:

[2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
[2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.
(reload happens here)
[2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''.

Why is it skipping the rest of the attempts and going straight to DOWN;HARD after the reload?
Seems like a bug to me.
-------------- next part --------------
An HTML attachment was scrubbed...
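To check whether this happens on other hosts as well, the log can be scanned for the same pattern. The following is only a minimal sketch, assuming alert lines shaped like the ones quoted in Message 1; the function name find_attempt_resets is made up for illustration and is not part of Nagios. The pattern accepts any bracketed timestamp, so it does not depend on how the timestamps are formatted.

```python
import re

# Shape of the HOST ALERT lines quoted in Message 1:
# [timestamp] HOST ALERT: host;state;SOFT|HARD;attempt;plugin output
ALERT_RE = re.compile(
    r"^\[([^\]]+)\] HOST ALERT: ([^;]+);([^;]+);(SOFT|HARD);(\d+);"
)


def find_attempt_resets(lines):
    """Return (host, soft_attempt, soft_time, hard_time) for every pair of
    consecutive alerts where a SOFT state is followed by the same state
    logged as HARD with attempt 1 (hypothetical helper, for illustration)."""
    previous = {}  # host -> (state, state_type, attempt, timestamp)
    resets = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match is None:
            continue  # not a host alert line
        timestamp, host, state, state_type, attempt = match.groups()
        attempt = int(attempt)
        last = previous.get(host)
        if (
            last is not None
            and last[0] == state
            and last[1] == "SOFT"
            and state_type == "HARD"
            and attempt == 1
        ):
            resets.append((host, last[2], last[3], timestamp))
        previous[host] = (state, state_type, attempt, timestamp)
    return resets
```

Feeding it the four lines from the message reports one reset for host 5gt4, between the SOFT;3 alert and the HARD;1 alert. Comparing those two timestamps with the time of the reload would show whether the reload falls in between.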
------------------------------ Message: 2 Date: Thu, 13 Jun 2013 21:39:48 -0700 From: Travis Runyard <travisrunyard@gmail.com> Subject: Re: [Nagios-users] reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 To: Nagios Users List <nagios-users@lists.sourceforge.net> Message-ID: <CANCZ1yG6CYiE2GYL3j5W3Gj9WjrTz4SmGONnaZUxbL5piUB=zA@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Do you have this in nagios.cfg? retain_state_information=1 On Thu, Jun 13, 2013 at 4:31 PM, Sean McKell <mckell@us.ibm.com> wrote: > Running 3.4.1: > I see this strange anomaly, where a host check is in the middle of doing > retries before hitting max_attempts, but after a server reload occurs, the > next check is automatically forced to DOWN;HARD;1, as seen here: > > [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > [2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > [2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > (reload happens here) > [2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > > Why is it skipping the rest of the attempts and going straight to > DOWN;HARD after the reload ? > Seems like a bug to me. > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by Windows: > > Build for Windows Store. 
> > </tt><a href="http://p.sf.net/sfu/windows-dev2dev"><tt>http://p.sf.net/sfu/windows-dev2dev</tt></a><tt> > _______________________________________________ > Nagios-users mailing list > Nagios-users@lists.sourceforge.net > </tt><a href="https://lists.sourceforge.net/lists/listinfo/nagios-users"><tt>https://lists.sourceforge.net/lists/listinfo/nagios-users</tt></a><tt> > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ Message: 3 Date: Fri, 14 Jun 2013 13:03:56 -0400 From: Andrew Widdersheim <awiddersheim@hotmail.com> Subject: Re: [Nagios-users] Issues with NEB modules breaking after restart To: "nagios-users@lists.sourceforge.net" <nagios-users@lists.sourceforge.net> Message-ID: <SNT143-W535DE68DF8CC060F587EF0DD800@phx.gbl> Content-Type: text/plain; charset="iso-8859-1" <div>To answer my own question... I'm pretty sure two nagios instances were spawned at once. The nagios init script that comes with nagios-core is the best at handling this situation.</div> ------------------------------ Message: 4 Date: Mon, 17 Jun 2013 15:21:37 +0000 From: omar saddiki <omar.saddiki@gmail.com> Subject: [Nagios-users] Functions to do Availibility in reporting To: Nagios Users List <nagios-users@lists.sourceforge.net> Message-ID: <CAN5T1CHYs_w4t0=muvDosc+KsjsLf5yW305X3-K1ZrkVtPNGgQ@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Hi, Please, someone can give me the function used by Nagios in reporting onglet to extract the availibility between two times. Regards SADDIKI -------------- next part -------------- An HTML attachment was scrubbed... 
------------------------------ Message: 5 Date: Mon, 17 Jun 2013 15:42:17 +0000 From: omar saddiki <omar.saddiki@gmail.com> Subject: [Nagios-users] Fwd: Functions to do Availibility in reporting To: Nagios Users List <nagios-users@lists.sourceforge.net> Message-ID: <CAN5T1CHOYvGnu8Z8Q_bbrtJe8A7=phdCNErWmN9cAjX59eU8wA@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Hi, Please, someone can give me the function used by Nagios in reporting onglet to extract the availibility between two times. Regards SADDIKI -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ Message: 6 Date: Mon, 17 Jun 2013 15:14:24 -0300 From: martin Rodriguez <maestin@gmail.com> Subject: [Nagios-users] Wmi To: nagios-users@lists.sourceforge.net Message-ID: <CACrJBAsbWM8wVuPasjJQp0VumJZw5aj_qN6DGS+OHeZTMfmEXg@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" Hi I am installing Nagios 3.4.3 on ubuntu and I can not configure the plugin check_wmi_plus.conf someone had expereince in this topic -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ Message: 7 Date: Tue, 18 Jun 2013 00:14:07 +0530 From: Sunil Sankar <sunil@sunil.cc> Subject: Re: [Nagios-users] Wmi To: Nagios Users List <nagios-users@lists.sourceforge.net> Message-ID: <CAPqUM3W+mo5bRRoi6dxAwSdLPs87poqqQZHiJdQWVDh-7c5QhA@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1" What is the error you are getting On Mon, Jun 17, 2013 at 11:44 PM, martin Rodriguez <maestin@gmail.com>wrote: > Hi I am installing Nagios 3.4.3 on ubuntu and I can not configure the > plugin check_wmi_plus.conf someone had expereince in this topic > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by Windows: > > Build for Windows Store. 
> > </tt><a href="http://p.sf.net/sfu/windows-dev2dev"><tt>http://p.sf.net/sfu/windows-dev2dev</tt></a><tt> > _______________________________________________ > Nagios-users mailing list > Nagios-users@lists.sourceforge.net > </tt><a href="https://lists.sourceforge.net/lists/listinfo/nagios-users"><tt>https://lists.sourceforge.net/lists/listinfo/nagios-users</tt></a><tt> > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -- Regards Sunil Sankar -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ Message: 8 Date: Fri, 14 Jun 2013 14:10:43 +0000 From: "Bennett, Jan" <JBennett@ntta.org> Subject: [Nagios-users] check_ntp_time offset unknown To: "'nagios-users@lists.sourceforge.net'" <nagios-users@lists.sourceforge.net> Message-ID: <E11B0F59D3334D469B36FCA07490BA8C186E67EF@NTTAEXMB01.ntta.local> Content-Type: text/plain; charset="us-ascii" We have implemented a NTP sync check in all of the NRDS checks that we are rolling out right now but I've run into a bit of a snag. I am getting returns of 'Offset Unknown' on all clients. It appears to only happen for a short period of time (30 min or so) and then it will clear its self up for a bit but the issue will always return. 
From the client that is reporting the unknown offset, I can run the following:

# ./check_ntp_time -H localhost
NTP CRITICAL: Offset unknown|
# ./check_ntp_time -V
check_ntp_time v1.4.16 (nagios-plugins 1.4.16)
# ntpdc -p
     remote           local       st poll reach  delay   offset    disp
=======================================================================
=LOCAL(0)        127.0.0.1        10   64   17 0.00000  0.000000 0.96858
*timeserver1     xxx.xxx.xxx.xxx   2   64   17 0.00098  4.956048 0.00580

# /usr/local/nagios/libexec/check_ntp_time -v -H localhost
sending request to peer 0
response from peer 0: offset -2.777669579e-07
sending request to peer 0
response from peer 0: offset -2.161832526e-07
sending request to peer 0
response from peer 0: offset -4.009343684e-07
sending request to peer 0
response from peer 0: offset -1.987209544e-07
discarding peer 0: stratum=0
overall average offset: 0
NTP CRITICAL: Offset unknown|

In my searches, I noticed a number of people reporting the same issue with the supposed solution being to update your Nagios plugins to 1.4.13. I have done so and am now running 1.4.16 without any change in the service check.

Also, I am unable to check a remote NTP server from these clients as they do not have access to the outside world.

It has been suggested that the stratum=0 may be the culprit, but I'm not sure of my options here.

Any help would be greatly appreciated.

Jan
-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------

Message: 9
Date: Tue, 18 Jun 2013 17:24:50 +0200
From: Holger Weiß <holger@cis.fu-berlin.de>
Subject: Re: [Nagios-users] check_ntp_time offset unknown
To: Nagios Users <nagios-users@lists.sourceforge.net>
Message-ID: <20130618152450.GA678632@zedat.fu-berlin.de>
Content-Type: text/plain; charset=iso-8859-1

* Bennett, Jan <JBennett@ntta.org> [2013-06-14 14:10]:
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|

Could you please run "ntpq -c rv" when this happens and post the output?

> It has been suggested that the stratum=0 may be the culprit, but I'm not sure of my options here.

Yes, stratum=0 is the culprit. An NTP server wouldn't usually report such a stratum value.

Holger

--
Holger Weiß               | Freie Universität Berlin
holger@zedat.fu-berlin.de | Zentraleinrichtung für Datenverarbeitung (ZEDAT)
Telefon: +49 30 838-55949 | Fabeckstraße 32, 14195 Berlin (Germany)
Telefax: +49 30 838455949 | </tt><a href="https://www.zedat.fu-berlin.de/"><tt>https://www.zedat.fu-berlin.de/</tt></a><tt>

------------------------------

Message: 10
Date: Tue, 18 Jun 2013 16:35:03 +0100
From: Giles Coochey <giles@coochey.net>
Subject: Re: [Nagios-users] check_ntp_time offset unknown
To: nagios-users@lists.sourceforge.net
Message-ID: <51C07E27.7000400@coochey.net>
Content-Type: text/plain; charset="iso-8859-1"

On 14/06/2013 15:10, Bennett, Jan wrote:
>
> We have implemented a NTP sync check in all of the NRDS checks that we
> are rolling out right now but I've run into a bit of a snag.
>
> I am getting returns of 'Offset Unknown' on all clients. It appears
> to only happen for a short period of time (30 min or so) and then it
> will clear its self up for a bit but the issue will always return.
> > From the client that is reporting the unknown offset, I can run the > following: > > # ./check_ntp_time -H localhost > NTP CRITICAL: Offset unknown| > # ./check_ntp_time -V > check_ntp_time v1.4.16 (nagios-plugins 1.4.16) > # ntpdc -p > remote local st poll reach delay offset disp > ======================================================================= > =LOCAL(0) 127.0.0.1 10 64 17 0.00000 0.000000 0.96858 > *timeserver1 xxx.xxx.xxx.xxx 2 64 17 0.00098 4.956048 0.00580 > > # /usr/local/nagios/libexec/check_ntp_time -v -H localhost > sending request to peer 0 > response from peer 0: offset -2.777669579e-07 > sending request to peer 0 > response from peer 0: offset -2.161832526e-07 > sending request to peer 0 > response from peer 0: offset -4.009343684e-07 > sending request to peer 0 > response from peer 0: offset -1.987209544e-07 > discarding peer 0: stratum=0 > overall average offset: 0 > NTP CRITICAL: Offset unknown| > > In my searches, I noticed a number of people reporting the same issue > with the supposed solution being to update your Nagios plugins to > 1.4.13. I have done so and am now running 1.4.16 without any change > in the service check. > > Also, I am unable to check a remote NTP server from these clients as > they do not have access to the outside world. > > It has been suggested that the stratum=0 may be the culprit, but I'm > not sure of my options here. > > Any help would be greatly appreciated. > > I get this shortly after a NTP client has booted up. Once NTP has been running for a while it goes away. -- Regards, Giles Coochey, CCNP, CCNA, CCNAS NetSecSpec Ltd +44 (0) 7983 877438 </tt><a href=http://www.coochey.net/><tt>http://www.coochey.net</tt></a><tt> </tt><a href=http://www.netsecspec.co.uk/><tt>http://www.netsecspec.co.uk</tt></a><tt> giles@coochey.net -------------- next part -------------- An HTML attachment was scrubbed... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4968 bytes Desc: S/MIME Cryptographic Signature ------------------------------ Message: 11 Date: Tue, 18 Jun 2013 11:03:32 -0500 From: Nic Bernstein <nic@onlight.com> Subject: [Nagios-users] Problem with check_openmanage plugin and storage To: nagios-users@lists.sourceforge.net Message-ID: <51C084D4.8020104@onlight.com> Content-Type: text/plain; charset="utf-8" We've recently been experimenting with Trond Hasle Amundsen's check_openmanage on a large network with about a hundred Dell servers of various ages, capabilities, etc. Mostly PE-2950, R210, R410 and R720. Much thanks to Trond for all his great work on Nagios plugins and other projects, by the way. We've hit a wall, however, with the storage monitoring aspects of this plugin. For example, here's a quite specific case. This is a new PE R720, in debug: onlight@monitor:~$ check_openmanage -H host -C secret -d System: PowerEdge R720 OMSA version: 7.1.0 ServiceTag: ####### Plugin version: 3.7.9 BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4 ----------------------------------------------------------------------------- Storage Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+----------+-------------------------------------------------------- OK | 0 | Controller 0 [PERC H310 Mini] is Ready WARNING | 0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified WARNING | 0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready OK | 0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready ----------------------------------------------------------------------------- Chassis Components 
============================================================================= STATE | ID | MESSAGE TEXT ---------+------+------------------------------------------------------------ OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM OK | 0 | Power Supply 0 [AC]: Presence detected OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88) OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] 
is Good OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) OK | 0 | SD Card 0 [vFlash] is Absent ----------------------------------------------------------------------------- Other messages ============================================================================= STATE | MESSAGE TEXT ---------+------------------------------------------------------------------- OK | ESM log health is Ok (less than 80% full) OK | Chassis Service Tag is sane This run exits with 1 (WARNING). We're not sure we agree with the decision to make the fact that a disk is not Dell Certified a Warning, but we can at least understand that. So, what if we exclude storage, with --no-storage? onlight@monitor:~$ check_openmanage -H host -C secret -d --no-storage System: PowerEdge R720 OMSA version: 7.1.0 ServiceTag: ####### Plugin version: 3.7.9 BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4 ----------------------------------------------------------------------------- Chassis Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+------+------------------------------------------------------------ OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 
RPM OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM OK | 0 | Power Supply 0 [AC]: Presence detected OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88) OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 112 V OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) OK | 0 | SD Card 0 [vFlash] is Absent ----------------------------------------------------------------------------- Other messages ============================================================================= STATE | MESSAGE TEXT ---------+------------------------------------------------------------------- OK | ESM log health is 
Ok (less than 80% full) OK | Chassis Service Tag is sane OOPS! Something is wrong with this server, but I don't know what. The global system health status is WARNING, but every component check is OK. This may be a bug in the Nagios plugin, please file a bug report. This yields exit code 3 (UNKNOWN). Now, just for argument's sake, let's say we obviate the check for certified drives, by commenting out the "workaround for OMSA 7.1.0 bug" code (just a handy little short-cut). Here's what we get then: onlight@monitor:~$ check_openmanage -H host -C secret -d System: PowerEdge R720 OMSA version: 7.1.0 ServiceTag: ####### Plugin version: 3.7.9 BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4 ----------------------------------------------------------------------------- Storage Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+----------+-------------------------------------------------------- OK | 0 | Controller 0 [PERC H310 Mini] is Ready WARNING | 0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online WARNING | 0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready OK | 0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready ----------------------------------------------------------------------------- Chassis Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+------+------------------------------------------------------------ OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM 
OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1200 RPM OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM</tt> <tt> OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1200 RPM OK | 0 | Power Supply 0 [AC]: Presence detected OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 48 C (min=8/3, max=83/88) OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) OK | 0 | SD Card 0 [vFlash] is Absent 
-----------------------------------------------------------------------------
   Other messages
=============================================================================
  STATE  |  MESSAGE TEXT
---------+-------------------------------------------------------------------
      OK | ESM log health is Ok (less than 80% full)
      OK | Chassis Service Tag is sane

Again, as with the original case, exit code is 1 (WARNING).

Is there any way around this? Should I be disabling global health checks? Here's a run to test that, and it works:

onlight@monitor:~$ check_openmanage -H host -C secret -b pdisk=all
OK - System: 'PowerEdge R720', SN: '#######', 16 GB ram (4 dimms), 1 logical drives, 2 physical drives

Interestingly, when combining the blacklist with debug ("-d -b pdisk=all"), the exit code is 3 (UNKNOWN), but with debug off, it's 0 (OK).

So, I guess what I'm wondering is why we need to blacklist the physical disks (pdisk) instead of using --no-storage? Shouldn't --no-storage also cause globalstatus to be ignored? I can furnish SNMP walk output if that's useful.

Cheers,
-nic

--
Nic Bernstein nic@onlight.com
Onlight, Inc. </tt><a href=www.onlight.com><tt>www.onlight.com</tt></a><tt>
219 N. Milwaukee St., Suite 2a v. 414.272.4477
Milwaukee, Wisconsin 53202
-------------- next part --------------
An HTML attachment was scrubbed...

------------------------------

------------------------------------------------------------------------------
This SF.net email is sponsored by Windows:

Build for Windows Store.

</tt><a href="http://p.sf.net/sfu/windows-dev2dev"><tt>http://p.sf.net/sfu/windows-dev2dev</tt></a><tt>

------------------------------

_______________________________________________
Nagios-users mailing list
Nagios-users@lists.sourceforge.net
</tt><a href="https://lists.sourceforge.net/lists/listinfo/nagios-users"><tt>https://lists.sourceforge.net/lists/listinfo/nagios-users</tt></a><tt>

End of Nagios-users Digest, Vol 85, Issue 6
*******************************************
</tt>
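The first two runs in Message 11 are consistent with a simple rule: report the worst component state, and fall back to UNKNOWN when the global health status is not OK but no checked component explains it. The sketch below is a toy model of that observed behaviour only. It is not taken from the check_openmanage source, the function name exit_code is invented for illustration, and it does not try to model the blacklist (-b) runs.

```python
# Standard Nagios plugin return codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def exit_code(component_states, global_state):
    """Toy model (an assumption, not the plugin's code): the worst
    component state wins; a non-OK global status with no non-OK
    component is reported as UNKNOWN."""
    if CRITICAL in component_states:
        return CRITICAL
    if WARNING in component_states:
        return WARNING
    if global_state != OK:
        # Something is wrong, but nothing that was checked says what.
        return UNKNOWN
    return OK
```

Under this model, the run with storage checked sees two WARNING disks and exits 1. The --no-storage run no longer sees the disks, but the global status is still WARNING, so it exits 3, which matches the "OOPS!" output quoted in the message.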