From marcosjr at dee.feis.unesp.br Mon Jun 3 15:28:50 2013
From: marcosjr at dee.feis.unesp.br (Marcos Renato da Silva Junior)
Date: Mon, 03 Jun 2013 10:28:50 -0300
Subject: Nagios nobreak SMS
Message-ID: <51AC9A12.1010107@dee.feis.unesp.br>

Hi,

Can you monitor a UPS with a USB connection, using the SMSPower View software (./powerview-c)?

Thanks,

Marcos.

_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null

From ae at op5.se Mon Jun 3 22:26:32 2013
From: ae at op5.se (Andreas Ericsson)
Date: Mon, 03 Jun 2013 22:26:32 +0200
Subject: [Nagios-users] Questions about Nagios quick search
In-Reply-To:
References: <512FD0E3.8080307@freesources.org> <51306147.1080509@op5.se>
Message-ID: <51ACFBF8.8080906@op5.se>

Would you care to forward the patch so it applies to Nagios 4 as well?

Otherwise it's likely to get dropped on the floor I'm afraid.

On 05/29/2013 06:17 PM, Jonas Meurer wrote:
> Hello,
>
> On 2013-03-01 09:05, Andreas Ericsson wrote:
>> On 02/28/2013 10:49 PM, Jonas Meurer wrote:
>>> On 20.02.2013 16:13, Jonas Meurer wrote:
>>>> Hello,
>>>
>>> Hey again,
>>>
>>>> we're using Nagios as monitoring system for several hundred systems.
>>>> While navigating through hosts and services, recently two questions
>>>> regarding the quick search (in navigation bar) raised:
>>>>
>>>> 1/ Why doesn't nagios search for host aliases as well? Is it possible
>>>> to enable alias searching? We're using rather short values for
>>>> host_name, and tend to add information like server position to the
>>>> alias. Thus searching for host_name and alias would be awesome for us.
>>>>
>>
>> Today it's not possible to enable alias searching. Patches welcome.
>> If you create one, please use some format that makes it possible to add
>> searching on other fields as well, such as "alias~" or some such.
>>
>>>> 2/ When searching for IP addresses, only the first match is returned.
>>>> In some cases (e.g. NRPE Port forwarding through firewall), several
>>>> hosts have the same IP address.
>>>> For these cases it's rather irritating,
>>>> that only the first matching host is returned.
>>>>
>>
>> Tru dat. Patches welcome. You'll want to find and remove the correct
>> "break" statement, I guess. Other than that it shouldn't be much trouble.
>
> I finally managed to prepare a patch that fixes both shortcomings. It adds
> two new configuration options to configure the behavior of the navigation
> bar search: search for hostname only, or also for addresses, or also for
> aliases.
>
> I reported the patch as feature request at
> http://tracker.nagios.org/view.php?id=459
>
> Kind regards,
> jonas

--
Andreas Ericsson andreas.ericsson at op5.se
OP5 AB www.op5.se
Tel: +46 8-230225 Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

From bruno at bmartins.eu Tue Jun 4 10:59:14 2013
From: bruno at bmartins.eu (Bruno Martins)
Date: Tue, 04 Jun 2013 09:59:14 +0100
Subject: Nagios statusmap.cgi question
Message-ID:

Hi list,

I have statusmap.cgi displayed on a TV screen, which works great with a scaling factor of 0.7. However, when statusmap.cgi refreshes itself (after a minute), it will go back to a scaling factor of 0.0. Is there any way to prevent this?

Best regards,

Bruno Martins

From jonas at freesources.org Tue Jun 4 15:43:25 2013
From: jonas at freesources.org (Jonas Meurer)
Date: Tue, 04 Jun 2013 15:43:25 +0200
Subject: Nagios navbar search enhancement (Was: Questions about Nagios quick search)
In-Reply-To: <51ACFBF8.8080906@op5.se>
References: <512FD0E3.8080307@freesources.org> <51306147.1080509@op5.se> <51ACFBF8.8080906@op5.se>
Message-ID:

Hey,

I just prepared a patch against git master (commit 758a64). I hope that it helps. Don't hesitate to ask if you have any questions. Also feel free to rename the config options if you don't like the names.

The patch is attached.

Kind regards,
jonas

On 2013-06-03 22:26, Andreas Ericsson wrote:
> Would you care to forward the patch so it applies to Nagios 4 as well?
>
> Otherwise it's likely to get dropped on the floor I'm afraid.
>
> On 05/29/2013 06:17 PM, Jonas Meurer wrote:
>> Hello,
>>
>> On 2013-03-01 09:05, Andreas Ericsson wrote:
>>> On 02/28/2013 10:49 PM, Jonas Meurer wrote:
>>>> On 20.02.2013 16:13, Jonas Meurer wrote:
>>>>> Hello,
>>>>
>>>> Hey again,
>>>>
>>>>> we're using Nagios as monitoring system for several hundred systems.
>>>>> While navigating through hosts and services, recently two questions
>>>>> regarding the quick search (in navigation bar) raised:
>>>>>
>>>>> 1/ Why doesn't nagios search for host aliases as well? Is it possible
>>>>> to enable alias searching? We're using rather short values for
>>>>> host_name, and tend to add information like server position to the
>>>>> alias. Thus searching for host_name and alias would be awesome for us.
>>>>>
>>>
>>> Today it's not possible to enable alias searching. Patches welcome.
>>> If you create one, please use some format that makes it possible to add
>>> searching on other fields as well, such as "alias~" or some such.
>>>
>>>>> 2/ When searching for IP addresses, only the first match is returned.
>>>>> In some cases (e.g. NRPE Port forwarding through firewall), several
>>>>> hosts have the same IP address. For these cases it's rather irritating,
>>>>> that only the first matching host is returned.
>>>>>
>>>
>>> Tru dat. Patches welcome. You'll want to find and remove the correct
>>> "break" statement, I guess. Other than that it shouldn't be much trouble.
>>
>> I finally managed to prepare a patch that fixes both shortcomings. It adds
>> two new configuration options to configure the behavior of the navigation
>> bar search: search for hostname only, or also for addresses, or also for
>> aliases.
>>
>> I reported the patch as feature request at
>> http://tracker.nagios.org/view.php?id=459
>>
>> Kind regards,
>> jonas

-------------- next part --------------
A non-text attachment was scrubbed...
Name: enhance_navbar_search.patch
Type: text/x-diff
Size: 4341 bytes
Desc: not available
URL:

From eriks at ssimicro.com Tue Jun 4 18:17:05 2013
From: eriks at ssimicro.com (Erik Sejr)
Date: Tue, 04 Jun 2013 12:17:05 -0400
Subject: Nagios 3.5 incorrectly shows checks as being stale?
In-Reply-To: <51AE117A.3060602@ssimicro.com>
References: <51AE117A.3060602@ssimicro.com>
Message-ID: <51AE1301.9090605@ssimicro.com>

So not minutes after I sent this I found the problem. The clock on the server sending the checks was off. I have corrected it and things appear to be working now.

Thanks.

On 06/04/2013 12:10 PM, Erik Sejr wrote:
> I'm hoping someone can help me out with a problem I've run into since
> upgrading from nagios 3.4.1 to nagios 3.5.0. The log pretty much speaks
> for itself:
>
> [1370361509] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host-ldap;pLEASES;0;OK: 181 Total 119 Active 65% Used
> [1370361511] PASSIVE SERVICE CHECK: host-ldap;pLEASES;0;OK: 181 Total 119 Active 65% Used
> [1370361511] SERVICE ALERT: host-ldap;pLEASES;OK;SOFT;2;OK: 181 Total 119 Active 65% Used
> [1370361521] Warning: The results of service 'pLEASES' on host 'host-ldap' are stale by 0d 0h 24m 4s (threshold=0d 0h 16m 0s). I'm forcing an immediate check of the service.
>
> At 1370361509 a passive check comes in saying the service for this host
> is good and everything is OK. 2 seconds later nagios sends alerts to
> that effect. 10 seconds after that nagios says the results of the check
> are stale by 24 minutes?!?! and does an active check.
>
> Something is very wrong here. Is this a bug or did something else change
> in 3.5.0 that I have overlooked?
>
> Thanks,
> Erik

From eriks at ssimicro.com Tue Jun 4 18:10:34 2013
From: eriks at ssimicro.com (Erik Sejr)
Date: Tue, 04 Jun 2013 12:10:34 -0400
Subject: Nagios 3.5 incorrectly shows checks as being stale?
Message-ID: <51AE117A.3060602@ssimicro.com>

I'm hoping someone can help me out with a problem I've run into since upgrading from nagios 3.4.1 to nagios 3.5.0. The log pretty much speaks for itself:

[1370361509] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;host-ldap;pLEASES;0;OK: 181 Total 119 Active 65% Used
[1370361511] PASSIVE SERVICE CHECK: host-ldap;pLEASES;0;OK: 181 Total 119 Active 65% Used
[1370361511] SERVICE ALERT: host-ldap;pLEASES;OK;SOFT;2;OK: 181 Total 119 Active 65% Used
[1370361521] Warning: The results of service 'pLEASES' on host 'host-ldap' are stale by 0d 0h 24m 4s (threshold=0d 0h 16m 0s). I'm forcing an immediate check of the service.

At 1370361509 a passive check comes in saying the service for this host is good and everything is OK. 2 seconds later nagios sends alerts to that effect. 10 seconds after that nagios says the results of the check are stale by 24 minutes?!?! and does an active check.

Something is very wrong here. Is this a bug or did something else change in 3.5.0 that I have overlooked?

Thanks,
Erik
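
Erik's follow-up in this thread traced the stale warning to a wrong clock on the server sending the passive results. The Python sketch below only illustrates that arithmetic: it assumes the receiver judges a result's age from the timestamp carried with it, which the thread does not confirm for Nagios 3.5.0. The function names and the 40-minute skew are hypothetical; the command line format is the one visible in the log (PROCESS_SERVICE_CHECK_RESULT;host;service;return code;output).

```python
import time


def passive_result_line(host, service, return_code, output, timestamp=None):
    """Build an external-command line for a passive service check result."""
    # The leading [timestamp] comes from the clock of whoever writes the line.
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"[{ts}] PROCESS_SERVICE_CHECK_RESULT;{host};{service};{return_code};{output}"


def apparent_age(result_timestamp, receiver_now):
    """Seconds the result appears to be old, measured on the receiver's clock."""
    return int(receiver_now) - int(result_timestamp)


def looks_stale(result_timestamp, receiver_now, threshold_seconds):
    """True if the result appears older than the freshness threshold."""
    return apparent_age(result_timestamp, receiver_now) > threshold_seconds


if __name__ == "__main__":
    receiver_now = 1370361521   # time of the "stale" warning in the log
    threshold = 16 * 60         # threshold=0d 0h 16m 0s from the log
    skew = 40 * 60              # hypothetical: sender clock 40 minutes slow

    good = 1370361509           # result stamped by a correct clock
    skewed = good - skew        # same result stamped by the slow clock

    print(passive_result_line("host-ldap", "pLEASES", 0,
                              "OK: 181 Total 119 Active 65% Used", good))
    print("correct clock looks stale:", looks_stale(good, receiver_now, threshold))
    print("slow clock looks stale:", looks_stale(skewed, receiver_now, threshold))
```

Under that assumption, a result stamped by a correct clock is only 12 seconds old at the time of the warning, while the same result stamped by a slow clock already exceeds the 16-minute threshold on arrival. Keeping the sending and receiving hosts synchronised (for example with NTP) removes the discrepancy.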

From s.shipway at auckland.ac.nz Wed Jun 5 04:04:16 2013
From: s.shipway at auckland.ac.nz (Steve Shipway)
Date: Wed, 5 Jun 2013 02:04:16 +0000
Subject: VMWare monitoring agent for Nagios and MRTG
Message-ID: <7294716191A1E142B80615ED2C633BCA6833411E@uxcn10-tdc02.UoA.auckland.ac.nz>

So, I'm working on an agent to link VMware to both Nagios and MRTG. This is intended to allow us to get alarms from VMware into Nagios, including alerts on things like disk latency and CPU ready time; also to get graphs in MRTG for the main VC metrics without having to load up the whole VC client. The graphs also show percentages of configured maxima and split, which is easier to understand than the VC Client graphs.

It is currently working in beta. The features are:

* Runs as a daemon, polling every 5min (configurable)
* Talks to the VirtualCentre to collect data on the entire datacentre
* Outputs to MRTG (via rrdcached) and Nagios (via livestatus) (can disable either). Services in Nagios are configured as passive, with a freshness check to alert if the agent dies; in MRTG, data is pushed directly into the RRD, so MRTG never runs, only the RRD frontend (eg, Routers2).
* Support for duplicate (DR) livestatus server to receive copies of status updates
* Configurable thresholds for all metrics
* Can create config files for both MRTG and Nagios dynamically, so as your farm structure changes, so does the monitoring. Config files are optimised for Routers2/RRD/MRTG setup.
* Can validate guest identities via hostname, DNS, livestatus, etc
* Logging to file, syslog, Nagios
* Collects stats on disk use (traffic and latency per vdisk), network use (traffic per interface), CPU (including split) and memory (including split)
* Collects info on Datastore usage, cluster balance, Virtualcentre alarms, guest up/down
* Config files generated by TT2 templates, so you can modify them as required
* Tested with VC holding 4 datacentres, 17 clusters, 51 hosts, 1300 guests with MRTG and Nagios, and it takes 5min to complete poll

Is anyone interested in giving it a try, given that there is very sparse documentation for it as yet? If so, please contact me direct.

This is the fully-rewritten child of the old check_vmware.pl plugin for Nagios/MRTG that I wrote, which was itself the child of check_esx3.

Steve

Routers2/MRTG/RRD graph for CPU
hosts-maillstappprd01.its.cfg-_lstappprd01.its.auckland.ac.nz-cpu-combined-d -x3.png

_____
Steve Shipway
ITS Unix Services Design Lead
University of Auckland, New Zealand
Floor 1, 58 Symonds Street, Auckland
Phone: +64 (0)9 3737599 ext 86487
DDI: +64 (0)9 923 6487
Mobile: +64 (0)21 753 189
Email: s.shipway at auckland.ac.nz
Please consider the environment before printing this e-mail

-------------- next part --------------
A non-text attachment was scrubbed...
Name: image001.png
Type: image/png
Size: 69886 bytes
Desc: not available
URL:

From stevejenkins at gmail.com Wed Jun 5 05:53:18 2013
From: stevejenkins at gmail.com (Steve Jenkins)
Date: Tue, 4 Jun 2013 20:53:18 -0700
Subject: Flapping SMTP timeout, started after reboot
Message-ID:

I did a kernel update on my Nagios box, and after the reboot it keeps reporting an SMTP timeout on one of our mail servers, then saying it's fine, then reporting a problem again.

The nagios log shows:

SERVICE ALERT: Titanium;SMTP;CRITICAL;SOFT;1;CRITICAL - Socket timeout after 10 seconds
SERVICE ALERT: Titanium;SMTP;CRITICAL;SOFT;2;CRITICAL - Socket timeout after 10 seconds
SERVICE ALERT: Titanium;SMTP;CRITICAL;HARD;3;CRITICAL - Socket timeout after 10 seconds

and then I get an alert. If I go into the web interface and force a check, it will eventually come back OK.

Also, any time I run "check_smtp -H hostname" from the command line, it works fine:

# /nagios/libexec/check_smtp -H titanium
SMTP OK - 0.060 sec. response time|time=0.060454s;;;0.000000

The machine it's checking isn't busy, the response time is always fast like above, and I'm not sure where to start looking for the issue.

Anyone got any ideas?

Thanks,

SteveJ

From khofmann1403 at gmail.com Wed Jun 5 10:28:17 2013
From: khofmann1403 at gmail.com (Karl Hofmann)
Date: Wed, 5 Jun 2013 10:28:17 +0200
Subject: how to avoid service dependency on host status
Message-ID:

Hello experts,

We want to receive notifications from a passive service, even if the host is down. The passive check is performed when the system is booting up. And it sends a message which we'd like to receive as notification.

This notification is not sent to us because it seems that there is a new feature in Nagios 3.0 that avoids sending notifications when the host is in status DOWN.

This feature makes sense to me, but in this case, I prefer not to avoid it. Do you know any way of avoiding it with parametrization as an exception?

Thanks in advance for your help.

Regards,
Felipe

From mcarey at ucar.edu Wed Jun 5 18:01:58 2013
From: mcarey at ucar.edu (Maxwell Carey)
Date: Wed, 05 Jun 2013 10:01:58 -0600
Subject: check_dhcp permissions issue even when run as root
Message-ID: <51AF60F6.4000407@ucar.edu>

I'm having a strange issue with the check_dhcp plugin v1.4.16 on SL6. Permissions on the file are 4750, owner is root, and group is nagios; the nagios user is a member of the nagios group. The partition was mounted with the suid option enabled. SELinux is disabled.

$ ls -l /usr/local/nagios/libexec/check_dhcp
-rwsr-x--- 1 root nagios 123851 Jun 3 12:08 /usr/local/nagios/libexec/check_dhcp

When I run check_dhcp as the nagios user, I get the following error:

$ /usr/local/nagios/libexec/check_dhcp
Error: Could not bind socket to interface eth0. Check your privileges...

Even when I run the plugin as the root user, I get the same error, so I suspect this is not really a file permissions issue.

I found a similar post on this list from 2008 (http://sourceforge.net/mailarchive/message.php?msg_id=20407079), but when that poster changed the permissions to 0775 he was able to run the plugin successfully as root. But when I chmod 0775 and run as root I still get the same error. The last post in the thread simply says "was a permission issue on the file" with no additional explanation, which is oh so helpful.

Does anyone have any idea what could cause this?

--Max Carey

From valiyev at unicc.org Thu Jun 6 09:15:40 2013
From: valiyev at unicc.org (VALIYEV Ruslan)
Date: Thu, 6 Jun 2013 07:15:40 +0000
Subject: rpmbuild of latest nagios-plugins fails
Message-ID: <49EB0E0B-C548-4E39-9144-8962F9F1CEEB@unicc.org>

Hi all,

I'm getting an error during rpmbuild when trying to build an RPM from the latest nagios-plugins-HEAD.tar.gz. It looks like this library is shipped with the plugins package(?). Any hints on how to get rid of the error?

+ /usr/lib/rpm/check-buildroot
+ /usr/lib/rpm/redhat/brp-compress
+ /usr/lib/rpm/redhat/brp-strip-static-archive /usr/bin/strip
+ /usr/lib/rpm/redhat/brp-strip-comment-note /usr/bin/strip /usr/bin/objdump
+ /usr/lib/rpm/brp-python-bytecompile
+ /usr/lib/rpm/redhat/brp-python-hardlink
+ /usr/lib/rpm/redhat/brp-java-repack-jars
Processing files: nagios-plugins-1.4.16-1.x86_64
error: File not found: /root/rpmbuild/BUILDROOT/nagios-plugins-1.4.16-1.x86_64/usr/lib/nagios/plugins/libnpcommon.a
Executing(%doc): /bin/sh -e /var/tmp/rpm-tmp.hZvPb5
+ umask 022
+ cd /root/rpmbuild/BUILD
+ cd nagios-plugins-1.4.16
+ DOCDIR=/root/rpmbuild/BUILDROOT/nagios-plugins-1.4.16-1.x86_64/usr/share/doc/nagios-plugins-1.4.16
+ export DOCDIR
+ rm -rf /root/rpmbuild/BUILDROOT/nagios-plugins-1.4.16-1.x86_64/usr/share/doc/nagios-plugins-1.4.16
+ /bin/mkdir -p /root/rpmbuild/BUILDROOT/nagios-plugins-1.4.16-1.x86_64/usr/share/doc/nagios-plugins-1.4.16
+ cp -pr CODING COPYING FAQ INSTALL LEGAL README REQUIREMENTS
SUPPORT THANKS /root/rpmbuild/BUILDROOT/nagios-plugins-1.4.16-1.x86_64/usr/share/doc/nagios-plugins-1.4.16
+ cp -pr ChangeLog command.cfg /root/rpmbuild/BUILDROOT/nagios-plugins-1.4.16-1.x86_64/usr/share/doc/nagios-plugins-1.4.16
+ exit 0

RPM build errors:
File not found: /root/rpmbuild/BUILDROOT/nagios-plugins-1.4.16-1.x86_64/usr/lib/nagios/plugins/libnpcommon.a

Thanks.
Ruslan

From kirill.bychkov at gmail.com Thu Jun 6 18:31:13 2013
From: kirill.bychkov at gmail.com (Kirill Bychkov)
Date: Thu, 6 Jun 2013 20:31:13 +0400
Subject: nagios backdoor
Message-ID:

Hello list,

I am a client of Hetzner Online (http://hetzner.de).
They sent me an email with the following text (part):
=
At the end of last week, Hetzner technicians discovered a "backdoor" in one of our internal monitoring systems (Nagios).

The malicious code used in the "backdoor" exclusively infects the RAM. First analysis suggests that the malicious code directly infiltrates running Apache and sshd processes. Here, the infection neither modifies the binaries of the service which has been compromised, nor does it restart the service which has been affected.
=
I wrote it just for information.

From mcarey at ucar.edu Thu Jun 6 19:03:52 2013
From: mcarey at ucar.edu (Maxwell Carey)
Date: Thu, 06 Jun 2013 11:03:52 -0600
Subject: check_dhcp permissions issue even when run as root
In-Reply-To: <51AF60F6.4000407@ucar.edu>
References: <51AF60F6.4000407@ucar.edu>
Message-ID: <51B0C0F8.3060907@ucar.edu>

On 06/05/2013 10:01 AM, Maxwell Carey wrote:
> When I run check_dhcp as the nagios user, I get the following error:
>
> $ /usr/local/nagios/libexec/check_dhcp
> Error: Could not bind socket to interface eth0. Check your privileges...
>
> Even when I run the plugin as the root user, I get the same error, so I
> suspect this is not really a file permissions issue.

As I suspected, this had nothing to do with file permissions. I was misled by the second part of the error message, when I should have paid closer attention to the "Could not bind socket to interface eth0" part. Our Nagios host doesn't have an interface named eth0. Doh! When I add the -i flag, I can run the plugin successfully from the command line as root and as nagios.

But as it turns out, I diagnosed the initial issue incorrectly.
Nagios was actually running check_dhcp with the correct interface name, but still spitting out an error about permissions. The cause? I recently upgraded Nagios by installing the new version in a new directory and copying over the config files from the old version (I did not do an in-place upgrade). When I turned on debugging I could see that all of the plugins being used were from the old install, where apparently SUID was not set correctly on check_dhcp. When I copied over the config files, I didn't update resource.cfg, which contains the definition of $USER1$ and was still pointing to the old directory.

Problem solved (and some good lessons learned to boot).

--Max Carey

From Sven.Nierlein at Consol.de Thu Jun 6 20:46:22 2013
From: Sven.Nierlein at Consol.de (Sven Nierlein)
Date: Thu, 06 Jun 2013 20:46:22 +0200
Subject: nagios backdoor
In-Reply-To:
References:
Message-ID: <51B0D8FE.80900@consol.de>

Hi,

Do you have any details? The German notice sounds like someone broke
into their nagios system, but not necessarily by a nagios backdoor.
Sven

On 6/6/13 18:31, Kirill Bychkov wrote:
> Hello list,
>
> I am client of Hetzner Online (http://hetzner.de)
> They are sent me email this following text (part):
> =
> At the end of last week, Hetzner technicians discovered a "backdoor" in one
> of our internal monitoring systems (Nagios).
>
> The malicious code used in the "backdoor" exclusively infects the RAM. First
> analysis suggests that the malicious code directly infiltrates running Apache
> and sshd processes. Here, the infection neither modifies the binaries of the
> service which has been compromised, nor does it restart the service which has
> been affected.
> =
> I wrote it just for information.

--
Sven Nierlein            Sven.Nierlein at consol.de
ConSol* GmbH             http://www.consol.de
Franziskanerstrasse 38   Tel.:089/45841-439
81669 Muenchen           Fax.:089/45841-111

From rainer at ultra-secure.de Thu Jun 6 21:10:10 2013
From: rainer at ultra-secure.de (Rainer Duffner)
Date: Thu, 6 Jun 2013 21:10:10 +0200
Subject: nagios backdoor
In-Reply-To: <51B0D8FE.80900@consol.de>
References: <51B0D8FE.80900@consol.de>
Message-ID: <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>

On 06.06.2013 at 20:46, Sven Nierlein wrote:

> Hi,
>
> Do you have any details? The German notice sounds like someone broke
> into their Nagios system, but not necessarily by a Nagios backdoor.
>
> Sven

There are not many details available, probably partly because they don't know them themselves (they've hired outside experts for the analysis). Also, what you read about such an incident will almost never be the "complete truth", but rather what the company wants you to believe to be the truth.

Judging from what can be learned from the (mostly reliable) heise news report

http://www.heise.de/newsticker/meldung/Hetzner-gehackt-Kundendaten-kopiert-1884180.html

it seems to be either a rather sophisticated APT-style attack, or the company (Hetzner) has learned little to nothing from previous security breaches and attackers found another way into their systems.

From dkokmadis at gmail.com Thu Jun 6 21:48:14 2013
From: dkokmadis at gmail.com (Κοκμάδης Δημήτριος)
Date: Thu, 6 Jun 2013 22:48:14 +0300
Subject: nagios backdoor
In-Reply-To: <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
References: <51B0D8FE.80900@consol.de> <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
Message-ID:

The full text:

Dear Client

At the end of last week, Hetzner technicians discovered a "backdoor" in one of our internal monitoring systems (Nagios).

An investigation was launched immediately and showed that the administration interface for dedicated root servers (Robot) had also been affected. Current findings would suggest that fragments of our client database had been copied externally.

As a result, we currently have to consider the client data stored in our Robot as compromised.

To our knowledge, the malicious program that we have discovered is as yet unknown and has never appeared before.

The malicious code used in the "backdoor" exclusively infects the RAM. First analysis suggests that the malicious code directly infiltrates running Apache and sshd processes. Here, the infection neither modifies the binaries of the service which has been compromised, nor does it restart the service which has been affected.

The standard techniques used for analysis such as the examination of checksum or tools such as "rkhunter" are therefore not able to track down the malicious code.

We have commissioned an external security company with a detailed analysis of the incident to support our in-house administrators. At this stage, analysis of the incident has not yet been completed.

The access passwords for your Robot client account are stored in our database as Hash (SHA256) with salt. As a precaution, we recommend that you change your client passwords in the Robot.

With credit cards, only the last three digits of the card number, the card type and the expiry date are saved in our systems. All other card data is saved solely by our payment service provider and referenced via a pseudo card number. Therefore, as far as we are aware, credit card data has not been compromised.

Hetzner technicians are permanently working on localising and preventing possible security vulnerabilities as well as ensuring that our systems and infrastructure are kept as safe as possible. Data security is a very high priority for us. To expedite clarification further, we have reported this incident to the data security authority concerned.

Furthermore, we are in contact with the Federal Criminal Police Office (BKA) in regard to this incident.

Naturally, we shall inform you of new developments immediately.

We very much regret this incident and thank you for your understanding and trust in us.

A special FAQs page has been set up at http://wiki.hetzner.de/index.php/Security_Issue/en to assist you with further enquiries.

From zcolgan at clearbearing.com Thu Jun 6 22:10:02 2013
From: zcolgan at clearbearing.com (Zack Colgan)
Date: Thu, 06 Jun 2013 16:10:02 -0400
Subject: nagios backdoor
In-Reply-To:
References: <51B0D8FE.80900@consol.de> <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
Message-ID: <51B0EC9A.1020705@clearbearing.com>

On 06/06/2013 03:48 PM, Κοκμάδης Δημήτριος wrote:
> The full text:
>
>
> Dear Client
>
> At the end of last week, Hetzner technicians discovered a "backdoor" in one
> of our internal monitoring systems (Nagios).
>
> An investigation was launched immediately and showed that the administration
> interface for dedicated root servers (Robot) had also been affected. Current
> findings would suggest that fragments of our client database had been copied
> externally.
>

Sounds more like there was a backdoor into the _server named_ Nagios, rather than Nagios the application, given the way they identify the system names there.

-Zack

From jc at info-systems.de Thu Jun 6 22:12:27 2013
From: jc at info-systems.de (Jakob Curdes)
Date: Thu, 06 Jun 2013 22:12:27 +0200
Subject: nagios backdoor
In-Reply-To: <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
References: <51B0D8FE.80900@consol.de> <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
Message-ID: <51B0ED2B.1060307@info-systems.de>

On 06.06.2013 21:10, Rainer Duffner wrote:
> Do you have any details? The German notice sounds like someone broke
> into their Nagios system, but not necessarily by a Nagios backdoor.
> Sven

We know very little, but from the Nagios architecture I would rather suspect a security flaw in a check script than in the Nagios core. The checks are the tools that contact other servers, not the Nagios core. And a check script can be anything, e.g. a self-written shell script using a root login and called from the Nagios core with a password in plain text.

I think we should wait until we know more about the attack vectors before speculating wildly.

Regards
jakob curdes

From william at leibzon.org Thu Jun 6 22:46:38 2013
From: william at leibzon.org (William Leibzon)
Date: Thu, 6 Jun 2013 13:46:38 -0700
Subject: nagios backdoor
In-Reply-To:
References: <51B0D8FE.80900@consol.de> <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
Message-ID:

Sounds like they got in through some sort of security hole in Apache and accessed the database on the server, probably as the apache/www user and not root. It is unclear from the information given whether this Apache backdoor had anything to do with the Nagios CGIs.

BTW, the description of how it happened is rather interesting. I remember that 6 or 7 years ago, when I was still following security more closely, people were talking about the possibility of this (hacking with only in-memory application replacement) on a certain forum that shall remain unnamed. I have never seen or heard of this being done at any company I consult for, though.

From swilkerson at nagios.com Thu Jun 6 23:27:14 2013
From: swilkerson at nagios.com (Scott Wilkerson)
Date: Thu, 06 Jun 2013 16:27:14 -0500
Subject: nagios backdoor
In-Reply-To:
References: <51B0D8FE.80900@consol.de> <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
Message-ID: <51B0FEB2.4020706@nagios.com>

At this time we are trying to reach out to Hetzner to see what exactly they are running, but I have a suspicion that they aren't even running Nagios, and are running Icinga, based on the screenshot below combined with the fact that Hetzner hosts at least one of their websites.

Scott Wilkerson
Information Technology Manager
___
Email: swilkerson at nagios.com
Web: www.nagios.com

-------------- next part --------------
A non-text attachment was scrubbed...
Name: gefddbgd.png
Type: image/png
Size: 69418 bytes
Desc: not available
URL:

From benny at bennyvision.com Fri Jun 7 15:28:26 2013
From: benny at bennyvision.com (C. Bensend)
Date: Fri, 7 Jun 2013 08:28:26 -0500
Subject: Misplaced advice in the Nagios preflight check?
Message-ID: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net>

Hey folks,

Still ironing out the wrinkles in my 3.5.0 distributed environment. Yesterday, I added a new contact, and on preflight check it seemed to think that what I did wasn't smart:

Jun 6 15:11:02 hostname nagios: Warning: Service recovery notification option for contact 'cbensend-unknown-only' doesn't make any sense - specify critical and/or warning options as well

Here's the contact I added that it seems to think is a dumb idea:

define contact {
    contact_name                    cbensend-unknown-only
    alias                           C. Bensend - unknown alerts only
    host_notification_options       n
    service_notification_options    u,r
    email                           me at myjob.com
    host_notification_period        24x7
    service_notification_period     24x7
    host_notification_commands      notify-host-by-email
    service_notification_commands   notify-service-by-email
}

Not real sure why Nagios doesn't think that's a valid config - I want a contact that will receive only UNKNOWN alerts for services. Perfectly valid idea to me; I have a number of services that I truly do not give a crap about, they trip many times a day and are critical for some developers, but I don't do anything about them. I *do*, however, want to know if there's a problem monitoring them, hence the need to see UNKNOWN alerts and recoveries.

Is there some reason Nagios would think that's not valid? Or should it not complain about that? Just curious... It loaded the config and the contact exists, just not entirely convinced it's a valid complaint. :)

Thanks much!

Benny

--
"The very existence of flamethrowers proves that sometime, somewhere, someone said to themselves, 'You know, I want to set those people over there on fire, but I'm just not close enough to get the job done.'" -- George Carlin

From sunil at sunil.cc Sat Jun 8 16:56:57 2013
From: sunil at sunil.cc (Sunil Sankar)
Date: Sat, 8 Jun 2013 20:26:57 +0530
Subject: Monitor windows mapped drive
Message-ID:

Guys,

Has anyone monitored a mapped drive in Windows using NSClient++? I am not able to do it and need your help.

I have mapped the Z: drive in Windows, and I have also started NSClient++ as the same user that mapped the drive.

When I execute the check I get the following output:

[root at nagios4 ~]# /opt/nagios/libexec/check_nrpe -H 192.168.204.130 -c CheckDriveSize -a ShowAll MaxWarn=80% MaxCrit=90% FilterType=REMOTE
OK: All drives within bounds.
[root at nagios4 ~]#
[root at nagios4 ~]# /opt/nagios/libexec/check_nt -H 192.168.204.130 -p 12489 -v CLIENTVERSION
NSClient++ 0,4,1,101 2013-05-18
[root at nagios4 ~]# /opt/nagios/libexec/check_nrpe -H 192.168.204.130 -p 5666
I (0,4,1,101 2013-05-18) seem to be doing fine...
[root at nagios4 ~]#
[root at nagios4 ~]# /opt/nagios/libexec/check_nrpe -H 192.168.204.130 -c CheckDriveSize -a ShowAll MaxWarn=80% MaxCrit=90%
OK: C:\: 10.1G|'C:\ %'=51%;80;90 'C:\'=10.056G;15.725;17.691;0;19.656
[root at nagios4 ~]#

--
Regards
Sunil Sankar

From cbeattie at geninfo.com Mon Jun 10 22:25:49 2013
From: cbeattie at geninfo.com (Chris Beattie)
Date: Mon, 10 Jun 2013 20:25:49 +0000
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net>
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net>
Message-ID:

On 6/7/2013 9:28 AM, C. Bensend wrote:
> Not real sure why Nagios doesn't think that's a valid config - I
> want a contact that will receive only UNKNOWN alerts for services.

Have you tried giving that contact the extra options Nagios wants, and then defining a service escalation for that contact with the escalation_options directive set to u?

--
-Chris

From dm572j at att.com Mon Jun 10 23:27:21 2013
From: dm572j at att.com (MAHONEY, DANIEL)
Date: Mon, 10 Jun 2013 21:27:21 +0000
Subject: Return code of 127 is out of bounds - plugin may be missing
Message-ID:

Send to nagios-users at lists.sourceforge.net

Greetings, all.

I've googled the subject above and evaluated the answers I've found, but haven't yet found info that pinpoints my issue.

I'm running Nagios Core 3.2.1 on RedHat 5.8. This installation has been running for a few years; I just inherited its care and maintenance recently.

On one of my monitored servers I wrote a script "checkRAID.sh" that calls another piece of code, looks at the results, and returns either a 0 or a 2 (the result will always be either good or critical, depending on whether the RAID controller is unhappy).

Nagios runs as user "nagios". The remote machine is configured to allow user "nagios" to log in without a password, using a key pair. This works.
In /usr/local/nagios/etc/checkcommands.cfg I have:

define command{
        command_name    check_raid
        command_line    /usr/local/nagios/libexec/check_by_ssh -H $HOSTNAME -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
        }

When I become nagios ("su - nagios") and run that script, I get:

[nagios at nagios ~]$ /usr/local/nagios/libexec/check_by_ssh -H -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
Check failed
[nagios at nagios ~]$ echo $?
2
[nagios at nagios ~]$

That "Check failed" line is what's written to stdout just before returning an exit code of 2. This shows me that the remote script is working fine, and that the local nagios user is able to execute it with no problems.

However, once I add an entry to services.cfg to tie this service check to my remote host and give it time to run the command, when I look at nagios' "Services" page it shows:

check_raid CRITICAL 06-10-2013 21:17:25 0d 6h 14m 29s 3/3 (Return code of 127 is out of bounds - plugin may be missing)

This has me baffled. The return code is quite clearly 2.

I recently set debug_level to -1 and restarted. I'm hoping that the debug log will

Daniel Mahoney
Dm572j at att.com

From jpratt at norwich.edu Mon Jun 10 23:37:39 2013
From: jpratt at norwich.edu (James Pratt)
Date: Mon, 10 Jun 2013 21:37:39 +0000
Subject: Return code of 127 is out of bounds - plugin may be missing
In-Reply-To:
References:
Message-ID: <591A34BAEC8DBF44AFAB2E127B4D8A8E131090DE@NUEXCH2.norwich.edu>

Hi,

I haven't researched this or anything, but is there a -v option to check_by_ssh to get the exact error thrown? I'm simply wondering if you have a bad/mismatched key in ~/.ssh/known_hosts or authorized_keys (sorry, I've been too busy to be much help on nagios lately guys)...

Cheers!
Jamie

From dm572j at att.com Mon Jun 10 23:42:28 2013
From: dm572j at att.com (MAHONEY, DANIEL)
Date: Mon, 10 Jun 2013 21:42:28 +0000
Subject: Return code of 127 is out of bounds - plugin may be missing
In-Reply-To: <591A34BAEC8DBF44AFAB2E127B4D8A8E131090DE@NUEXCH2.norwich.edu>
References: <591A34BAEC8DBF44AFAB2E127B4D8A8E131090DE@NUEXCH2.norwich.edu>
Message-ID:

No, I'm sure that the key is working. When I become the nagios user and run the exact same command from the command line, it gives me exactly the result I expect.

From justinp at norchemlab.com Mon Jun 10 23:46:59 2013
From: justinp at norchemlab.com (Justin T Pryzby)
Date: Mon, 10 Jun 2013 14:46:59 -0700
Subject: Return code of 127 is out of bounds - plugin may be missing
In-Reply-To:
References:
Message-ID: <20130610214659.GA4616@norchemlab.com>

On Mon, Jun 10, 2013 at 09:27:21PM +0000, MAHONEY, DANIEL wrote:
> check_raid CRITICAL 06-10-2013 21:17:25 0d 6h 14m 29s 3/3 (Return code of 127 is out of bounds - plugin may be missing)
>
> This has me baffled. The return code is quite clearly 2.
>
> I recently set debug_level to -1 and restarted. I'm hoping that the debug log will

Exit status 127 often means that "exec" failed: it wasn't able to find the program/script specified. That could be that "check_by_ssh" was missing, or that checkRAID.sh was missing, or that checkRAID.sh exited 127 because one of its commands was missing, perhaps because PATH wasn't set as intended, probably missing /usr/local/s?bin or such (I'm wagering it's that).

Your message was truncated, but if further debugging is needed, I'd recommend using strace or sh -x to see what command isn't being found. You could do something like:

/usr/local/nagios/libexec/check_by_ssh -H -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C 'sh -x /home/nagios/checkRAID.sh 2>nagios.err'

Or:

/usr/local/nagios/libexec/check_by_ssh -H -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C 'strace -e execve /home/nagios/checkRAID.sh 2>nagios.err'

BTW, using "su" to "become" a role account is typically unneeded, and (I find) ugly. You can almost always use sudo -H -u nagios ... That works even if the account is locked/disabled/noshell/etc.

Justin
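
[Editor's sketch, not part of the original thread.] The PATH explanation above can be reproduced locally without Nagios or SSH. This is a minimal sketch under stated assumptions: `raid_helper` and `check_demo.sh` are hypothetical stand-ins for the real helper program and for checkRAID.sh, and `env -i` only approximates the stripped-down environment a daemon might hand to a plugin. With the helper's directory on PATH the script should exit 2; with a minimal PATH the shell cannot find the helper and the script should exit 127.

```shell
#!/bin/sh
# Hypothetical stand-ins: raid_helper plays the RAID tool, check_demo.sh
# plays checkRAID.sh. Neither name comes from the original thread.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/sbin"

# The helper reports a failure and exits 2, like the real check does.
printf '#!/bin/sh\necho "Check failed"\nexit 2\n' > "$demo_dir/sbin/raid_helper"

# The wrapper calls the helper by bare name, so it depends on PATH.
printf '#!/bin/sh\nraid_helper\n' > "$demo_dir/check_demo.sh"
chmod +x "$demo_dir/sbin/raid_helper" "$demo_dir/check_demo.sh"

# Login-like environment: the helper's directory is on PATH.
PATH="$demo_dir/sbin:$PATH" "$demo_dir/check_demo.sh"
status_full=$?

# Stripped environment: empty except for a minimal PATH, so the helper
# is not found and the shell reports "command not found".
env -i PATH=/usr/bin:/bin "$demo_dir/check_demo.sh" 2>/dev/null
status_bare=$?

echo "helper on PATH: exit $status_full"
echo "minimal PATH:   exit $status_bare"
rm -rf "$demo_dir"
```

If the two exit codes differ in the same way for the real script, calling the helper by absolute path inside checkRAID.sh (or setting PATH at the top of the script) is one way to make the check independent of the caller's environment.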

From jpratt at norwich.edu Mon Jun 10 23:55:44 2013
From: jpratt at norwich.edu (James Pratt)
Date: Mon, 10 Jun 2013 21:55:44 +0000
Subject: Return code of 127 is out of bounds - plugin may be missing
In-Reply-To:
References: <591A34BAEC8DBF44AFAB2E127B4D8A8E131090DE@NUEXCH2.norwich.edu>
Message-ID: <591A34BAEC8DBF44AFAB2E127B4D8A8E13109153@NUEXCH2.norwich.edu>

Ok. Can you look on the remote host and perhaps set the debug level high(er) on the sshd server and restart/retest, then check /var/log/secure there (or wherever it logs) after a failure?

Sorry, just kinda grasping out there in hopes I can help.... That's a weird error, I'd like to know what is causing it since it's really not clear... :\

From william at leibzon.org Tue Jun 11 00:02:51 2013
From: william at leibzon.org (William Leibzon)
Date: Mon, 10 Jun 2013 15:02:51 -0700
Subject: Return code of 127 is out of bounds - plugin may be missing
In-Reply-To:
References: <591A34BAEC8DBF44AFAB2E127B4D8A8E131090DE@NUEXCH2.norwich.edu>
Message-ID:

That is not quite the same though. The nagios user gets its home environment set up from .bash_profile (or similar) in the user's home directory, as well as the correct path to .ssh in the nagios user's home directory. Nagios starts as root and does setuid to the nagios user but does not get the same path set.

I too recommend you use the "-v" option to find what is going on. Also try specifying the exact path to the key with -i (and those are best set in a specific directory with the key name coming from a host macro variable).

On Mon, Jun 10, 2013 at 2:42 PM, MAHONEY, DANIEL wrote:
> No, I'm sure that the key is working. When I become the nagios user and run
> the exact same command from the command line, it gives me exactly the result
> I expect.

From travisrunyard at gmail.com Tue Jun 11 02:43:03 2013
From: travisrunyard at gmail.com (Travis Runyard)
Date: Mon, 10 Jun 2013 17:43:03 -0700
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To:
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net>
Message-ID:

This is by design, and it is only a warning message. The config is valid and should work as you intended. It doesn't make sense to get a recovery notification for something you never knew was a problem. "Unknowns" are not considered problems in Nagios logic.

From benny at bennyvision.com Tue Jun 11 04:53:57 2013
From: benny at bennyvision.com (C. Bensend)
Date: Mon, 10 Jun 2013 21:53:57 -0500
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To:
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net>
Message-ID: <7dcea29043f1544d47f8d4990e798fbb.squirrel@webmail.stinkweasel.net>

> Have you tried giving that contact the extra options Nagios wants, and
> then defining a service escalation for that contact with the
> escalation_options directive set to u?

No, I haven't. It *seems* to be working as I intend. My question is more as to why Nagios seems to think it's a bad idea, when it's a perfectly legitimate configuration.

Are there unforeseen consequences that I'm not aware of? Or was it just not a configuration anyone thought would be useful/valid, so it is warned about?

--
"The very existence of flamethrowers proves that sometime, somewhere, someone said to themselves, 'You know, I want to set those people over there on fire, but I'm just not close enough to get the job done.'" -- George Carlin

From sunil at sunil.cc Tue Jun 11 15:10:43 2013
From: sunil at sunil.cc (Sunil Sankar)
Date: Tue, 11 Jun 2013 18:40:43 +0530
Subject: Monitor windows mapped drive
In-Reply-To:
References:
Message-ID:

Guys,

Any update on this?

Regards
Sunil

From benny at bennyvision.com Tue Jun 11 18:12:23 2013
From: benny at bennyvision.com (C. Bensend)
Date: Tue, 11 Jun 2013 11:12:23 -0500
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To:
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net>
Message-ID: <4908c92d03c2a1f82f963dc0f217016a.squirrel@webmail.stinkweasel.net>

I can't seem to parse "It doesn't make sense to get a recovery notification for something you never knew was a problem."

Are you saying that since Nagios doesn't consider an unknown a problem, it won't send a recovery? Because it does... And in this case, I certainly want to know when a service having a monitoring issue (unknown) recovers.

Not sure what you meant there.

Thanks!

Benny

--
"The very existence of flamethrowers proves that sometime, somewhere, someone said to themselves, 'You know, I want to set those people over there on fire, but I'm just not close enough to get the job done.'" -- George Carlin

From justinp at norchemlab.com Tue Jun 11 18:22:27 2013
From: justinp at norchemlab.com (Justin T Pryzby)
Date: Tue, 11 Jun 2013 09:22:27 -0700
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To: <4908c92d03c2a1f82f963dc0f217016a.squirrel@webmail.stinkweasel.net>
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net> <4908c92d03c2a1f82f963dc0f217016a.squirrel@webmail.stinkweasel.net>
Message-ID: <20130611162227.GA15937@norchemlab.com>

On Tue, Jun 11, 2013 at 11:12:23AM -0500, C. Bensend wrote:
> I can't seem to parse "It doesn't make sense to get a recovery
> notification for something you never knew was a problem."

see the original language here:

http://nagios.sourceforge.net/docs/3_0/notifications.html
Note: Notifications about host or service recoveries are only sent out if a notification was sent out for the original problem. It doesn't make sense to get a recovery notification for something you never knew was a problem.

And:
http://nagios.sourceforge.net/docs/3_0/escalations.html
If, after three problem notifications, a recovery notification is sent out for the service, who gets notified? The recovery is actually the fourth notification that gets sent out. However, the escalation code is smart enough to realize that only those people who were notified about the problem on the third notification should be notified about the recovery. In this case, the nt-admins and managers contact groups would be notified of the recovery.

(Although, I believe I've either misunderstood the implications of that statement, or run into misbehaviours in that area myself...)

From sunil at sunil.cc Tue Jun 11 19:00:40 2013
From: sunil at sunil.cc (Sunil Sankar)
Date: Tue, 11 Jun 2013 22:30:40 +0530
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To: <20130611162227.GA15937@norchemlab.com>
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net> <4908c92d03c2a1f82f963dc0f217016a.squirrel@webmail.stinkweasel.net> <20130611162227.GA15937@norchemlab.com>
Message-ID:

There is a workaround; this is how I fixed it in our environment: set use_large_installation_tweaks=1 in nagios.cfg. See whether this removes the warning for you.

Regards
Sunil

From benny at bennyvision.com Tue Jun 11 20:14:28 2013
From: benny at bennyvision.com (C. Bensend)
Date: Tue, 11 Jun 2013 13:14:28 -0500
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To:
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net> <4908c92d03c2a1f82f963dc0f217016a.squirrel@webmail.stinkweasel.net> <20130611162227.GA15937@norchemlab.com>
Message-ID: <230a9c34f3c57d9269db4c7cea0ff752.squirrel@webmail.stinkweasel.net>

Yep, I've had that one enabled for quite some time. :)

--
"The very existence of flamethrowers proves that sometime, somewhere, someone said to themselves, 'You know, I want to set those people over there on fire, but I'm just not close enough to get the job done.'" -- George Carlin

From benny at bennyvision.com Tue Jun 11 20:24:23 2013
From: benny at bennyvision.com (C. Bensend)
Date: Tue, 11 Jun 2013 13:24:23 -0500
Subject: Misplaced advice in the Nagios preflight check?
In-Reply-To: <20130611162227.GA15937@norchemlab.com>
References: <91ad2214611dd8bf27bc227d3087cc0e.squirrel@webmail.stinkweasel.net> <4908c92d03c2a1f82f963dc0f217016a.squirrel@webmail.stinkweasel.net> <20130611162227.GA15937@norchemlab.com>
Message-ID:

> see the original language here:
>
> http://nagios.sourceforge.net/docs/3_0/notifications.html
> Note: Notifications about host or service recoveries are only sent out
> if a notification was sent out for the original problem. It doesn't
> make sense to get a recovery notification for something you never knew
> was a problem.
>
> And:
> http://nagios.sourceforge.net/docs/3_0/escalations.html
> If, after three problem notifications, a recovery notification is sent
> out for the service, who gets notified? The recovery is actually the
> fourth notification that gets sent out. However, the escalation code
> is smart enough to realize that only those people who were notified
> about the problem on the third notification should be notified about
> the recovery. In this case, the nt-admins and managers contact
> groups would be notified of the recovery.
>
> (Although, I believe I've either misunderstood the implications of
> that statement, or run into misbehaviours in that area myself...)

Ah. Well, yes. :)

I believe those statements are referring to the filters that Nagios
uses to determine whether or not to send a notification at all.
*That's* not an issue here, the notification goes out, just like it
should.

*My* question is why the sanity check thinks that configuration
doesn't make sense. I think the answer is probably something to the
effect of: "I don't know why anyone would want that, so warn about it."
I don't want to put words in the mouth of any of the developers that
may have touched it, though, so I'm just guessing.
I just want to make sure this is a case of Nagios maybe not giving the
right advice in its sanity check, and *not* that there's something
behind the scenes that I'm not aware of that might actually cause a
problem.

If it's the former, maybe we can get it adjusted for the next release.
If it's the latter, I hope someone will step forth with the ClueBat
5000(tm) and give me a good thump. :)

Thanks, everyone!

Benny

--
"The very existence of flamethrowers proves that sometime, somewhere,
someone said to themselves, 'You know, I want to set those people
over there on fire, but I'm just not close enough to get the job
done.'"
                          -- George Carlin

From mark.frost1 at pepsico.com Tue Jun 11 21:08:07 2013
From: mark.frost1 at pepsico.com (Frost, Mark {BIS})
Date: Tue, 11 Jun 2013 19:08:07 +0000
Subject: Monitor windows mapped drive
In-Reply-To:
References:
Message-ID: <3FC36F2C7B96D444A1138066E952D2B604B46E@PEPWMC00171.corp.pep.pvt>

Sunil,

I've only ever found two ways to do this.

1) We were using NSClient++ in the same manner you were (running it
with a domain user id that had permissions to access that network
drive). I think, though, that we still did not attempt to access it by
drive letter. I'm pretty sure we used the UNC path to the disk. Unless
I'm mistaken, the drive letter is only mapped after a user logs in, and
the NSClient++ service does not replicate a login shell to trigger the
drive to map to a drive letter.
2) We are using check_disk_smb from the Nagios plugins package to mount
the UNC path to the disk locally on our Linux machine and check
available space.

Mark

From: Sunil Sankar [mailto:sunil at sunil.cc]
Sent: Tuesday, June 11, 2013 9:11 AM
To: Nagios Users List
Subject: Re: [Nagios-users] Monitor windows mapped drive

Guys,

Any update on this?

Regards
Sunil

On Sat, Jun 8, 2013 at 8:26 PM, Sunil Sankar wrote:

Guys,

Has anyone monitored a mapped drive in Windows using nsclient? I am not
able to do it and need your help.

I have mapped the Z drive in Windows, and I have started nsclient as
the same user that mapped the drive.

When I execute the check I get the following output:

[root@nagios4 ~]# /opt/nagios/libexec/check_nrpe -H 192.168.204.130 -c CheckDriveSize -a ShowAll MaxWarn=80% MaxCrit=90% FilterType=REMOTE
OK: All drives within bounds.
[root@nagios4 ~]#
[root@nagios4 ~]# /opt/nagios/libexec/check_nt -H 192.168.204.130 -p 12489 -v CLIENTVERSION
NSClient++ 0,4,1,101 2013-05-18
[root@nagios4 ~]# /opt/nagios/libexec/check_nrpe -H 192.168.204.130 -p 5666
I (0,4,1,101 2013-05-18) seem to be doing fine...
[root@nagios4 ~]#
[root@nagios4 ~]# /opt/nagios/libexec/check_nrpe -H 192.168.204.130 -c CheckDriveSize -a ShowAll MaxWarn=80% MaxCrit=90%
OK: C:\: 10.1G|'C:\ %'=51%;80;90 'C:\'=10.056G;15.725;17.691;0;19.656
[root@nagios4 ~]#

--
Regards
Sunil Sankar
From gantz.alex at gmail.com Tue Jun 11 23:27:27 2013
From: gantz.alex at gmail.com (Gantz Alex)
Date: Wed, 12 Jun 2013 01:27:27 +0400
Subject: Return code of 127 is out of bounds - plugin may be missing
In-Reply-To:
References:
Message-ID:

Hello!

Try to use this command_line:

command_line $USER1$/check_by_ssh -H $HOSTNAME -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh

where the $USER1$ variable is set in the resource.cfg file.

2013/6/11 MAHONEY, DANIEL

> Send to nagios-users at lists.sourceforge.net
>
> Greetings, all. I've googled the subject above and evaluated the answers
> I've found but haven't yet found info that pinpoints my issue.
>
> I'm running Nagios Core 3.2.1 on RedHat 5.8. This installation has been
> running for a few years; I just inherited its care and maintenance
> recently. On one of my monitored servers I wrote a script "checkRAID.sh"
> that calls another piece of code, looks at the results, and returns either
> a 0 or a 2 (the result will always be either good or critical, depending on
> whether the RAID controller is unhappy).
>
> Nagios runs as user "nagios". The remote machine is configured to allow
> user "nagios" to log in without a password, using a key pair.
> This works.
>
> In /usr/local/nagios/etc/checkcommands.cfg I have:
>
> define command{
>     command_name check_raid
>     command_line /usr/local/nagios/libexec/check_by_ssh -H $HOSTNAME -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
> }
>
> When I become nagios ("su - nagios") and run that script, I get:
>
> [nagios@nagios ~]$ /usr/local/nagios/libexec/check_by_ssh -H <server IP> -l nagios -i /home/nagios/.ssh/id_rsa -E -o StrictHostKeyChecking=no -C /home/nagios/checkRAID.sh
> Check failed
> [nagios@nagios ~]$ echo $?
> 2
> [nagios@nagios ~]$
>
> That "Check failed" line is what's written to stdout just before returning
> an exit code of 2. This shows me that the remote script is working fine,
> and that the local nagios user is able to execute it with no problems.
> However, once I add an entry to services.cfg to tie this service check to
> my remote host and give it time to run the command, when I look at the
> Nagios "Services" page it shows:
>
> check_raid  CRITICAL  06-10-2013 21:17:25  0d 6h 14m 29s  3/3  (Return code of 127 is out of bounds - plugin may be missing)
>
> This has me baffled. The return code is quite clearly 2.
>
> I recently set debug_level to -1 and restarted. I'm hoping that the debug
> log will
>
> Daniel Mahoney
> Dm572j at att.com
From ae at op5.se Wed Jun 12 16:43:31 2013
From: ae at op5.se (Andreas Ericsson)
Date: Wed, 12 Jun 2013 16:43:31 +0200
Subject: Nagios navbar search enhancement (Was: Questions about Nagios quick search)
In-Reply-To:
References: <512FD0E3.8080307@freesources.org> <51306147.1080509@op5.se> <51ACFBF8.8080906@op5.se>
Message-ID: <51B88913.2040403@op5.se>

On 2013-06-04 15:43, Jonas Meurer wrote:
> Hey,
>
> I just prepared a patch against git master (commit 758a64). I hope that
> it helps. Don't hesitate to ask if you've any questions.
>
> Also feel free to rename the config options if you don't like the names.
>
> The patch is attached.
>

Applied. Thanks. I made the new options default to ON in the code and
renamed them "navbar_search_addresses" and "navbar_search_aliases"
instead of the "navbar_search_for..." since we're actually searching
for hosts by searching the hosts' configured properties.

--
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
From ae at op5.se Wed Jun 12 18:50:55 2013
From: ae at op5.se (Andreas Ericsson)
Date: Wed, 12 Jun 2013 18:50:55 +0200
Subject: nagios backdoor
In-Reply-To:
References: <51B0D8FE.80900@consol.de> <134E28DD-EF6C-4EC4-AD65-75F5522C342C@ultra-secure.de>
Message-ID: <51B8A6EF.6020905@op5.se>

On 06/06/2013 10:46 PM, William Leibzon wrote:
> Sounds like they got through some sort of security hole in apache and
> accessed database on the server, probably as apache/www user and not
> root. Unsure from the information given if this apache backdoor would
> have had anything to do with nagios cgi or not.
>
> BTW the description of how it happened is rather interesting. I
> remember 6 or 7 years ago when I was still following security more
> closely people have been talking about possibility of this (hacking
> with only in-memory application replacement) on certain forum that
> shall remain unnamed. I have never seen or heard of this being done at
> any company I consult for though.
>

It's not particularly difficult. All exploits work by modifying
executable code in memory to make a program do what they want.
If one can get root access that way, it's possible to freeze a
process and replace it entirely.

--
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.
From er.abhinav.upadhyay at gmail.com Mon Jun 10 11:02:40 2013
From: er.abhinav.upadhyay at gmail.com (Abhinav Upadhyay)
Date: Mon, 10 Jun 2013 14:32:40 +0530
Subject: Nagios init script not working on Ubuntu 12.04
Message-ID:

Hi,

I just followed the instructions on
http://nagios.sourceforge.net/docs/3_0/quickstart-ubuntu.html to
install the latest stable release of Nagios (3.5) on a fresh Ubuntu
12.04 machine. Everything went fine, but when I try to start nagios
using /etc/init.d/nagios start, I get the following error:

/etc/init.d/nagios: 20: .: Can't open /etc/rc.d/init.d/functions

There is no file at /etc/rc.d/init.d/functions. It seems like the
Makefile could not put the functions file at /etc/rc.d/init.d. Is it
a bug, or did I miss something?

Regards
Abhinav

From allan.thorne at monash.edu Wed Jun 12 06:08:12 2013
From: allan.thorne at monash.edu (Mr Allan Thorne)
Date: Wed, 12 Jun 2013 14:08:12 +1000
Subject: check_by_ssh to Openvms system
Message-ID: <51B7F42C.8050500@its.monash.edu.au>

I am trying to use either command files or perl or c to return a status
and value to check_by_ssh.
When I test the command interactively using:

/usr/lib64/nagios/plugins/check_by_ssh -H -s c1 -C "" -i -l -S 1 -E

I get:

WARNING - check_by_ssh: Remote command returned status 1

Any help would be appreciated.

Allan Thorne
eSolutions
Monash University

From vilas at altechtechnologies.com Tue Jun 11 14:01:57 2013
From: vilas at altechtechnologies.com (Vilas Deshmukh)
Date: Tue, 11 Jun 2013 17:31:57 +0530
Subject: How to use Check_cluster
Message-ID:

Dear Team,

I have been searching for information about the check_cluster command
for the last couple of days. I found lots of sites and information.
However, I could not understand how to use it, because its output
always depends on my input. If I give input "OK" the output is "OK"; if
I give input "Critical" the output is also "Critical".
My "clustat" command output is as follows:

[root@centos libexec]# clustat
Cluster Status for hacluster @ Tue Jun 11 17:20:13 2013
Member Status: Quorate

 Member Name             ID   Status
 ------ ----             ---- ------
 node1.altech.local      1    Online, Local, rgmanager
 node2.altech.local      2    Online, rgmanager

 Service Name        Owner (Last)          State
 ------- ----        ----- ------          -----
 service:Apache1     node2.altech.local    started
 service:IP          node1.altech.local    started
[root@centos libexec]#

Now my check_cluster output is as follows:

[root@centos libexec]# ./check_cluster -s -d 0,0 -w @1: -c @2:
CLUSTER OK: Service cluster: 2 ok, 0 warning, 0 unknown, 0 critical
[root@centos libexec]# ./check_cluster -s -d 1,1 -w @1: -c @2:
CLUSTER CRITICAL: Service cluster: 0 ok, 2 warning, 0 unknown, 0 critical
[root@centos libexec]# ./check_cluster -s -d 0,1 -w @1: -c @2:
CLUSTER WARNING: Service cluster: 1 ok, 1 warning, 0 unknown, 0 critical
[root@centos libexec]# ./check_cluster -s -w @1: -c @2: -d $SERVICESTATEDID:node1.altech.local:IP,$SERVICESTATEDID:node2.altech.local:IP
CLUSTER OK: Service cluster: 2 ok, 0 warning, 0 unknown, 0 critical

How can I detect an error in the cluster? Please help me; I am confused
by this command.

Thanks,
Best Regards,
Vilas Deshmukh
From temail8 at tg.com Mon Jun 10 09:56:09 2013
From: temail8 at tg.com (temail8 at tg.com)
Date: Mon, 10 Jun 2013 15:56:09 +0800
Subject: Dynamic Balancing Machine
Message-ID:

An HTML attachment was scrubbed...

From Nikolas at Shlomo.co.il Tue Jun 11 14:49:33 2013
From: Nikolas at Shlomo.co.il (Nikolas Twito)
Date: Tue, 11 Jun 2013 12:49:33 +0000
Subject: check_nt
Message-ID:

Hello there

I am using an evaluation copy of Nagios XI.

I have been trying for several days to solve a problem, and I'd be
happy if you have the solution.

I'm trying to get information about disk read and write performance on
Windows Server 2008.

This is one of the commands I run:

check_xi_service_nsclient!Mypassword!COUNTER!-l "\\PhysicalDisk(_Total)\\Avg.Disk sec/Write","Avg.Disk sec/Write is %f" -w 1 -c 2!!!!!

But I keep getting this value every time:

Avg.Disk sec/Write is 0.000000

Greetings
Nikolas
From dwittenberg2008 at gmail.com Wed Jun 12 21:29:56 2013
From: dwittenberg2008 at gmail.com (Daniel Wittenberg)
Date: Wed, 12 Jun 2013 14:29:56 -0500
Subject: Nagios init script not working on Ubuntu 12.04
In-Reply-To:
References:
Message-ID:

Functions is a file on rhel-based systems and provides common start/stop
routines. I don't use Ubuntu myself but if you figure out how to make it
work let me know. If I get some time I'll see if I still have an ubuntu
test VM to look at.

Dan

On Jun 12, 2013 2:09 PM, "Abhinav Upadhyay" wrote:

> Hi,
>
> I just followed the instructions on
> http://nagios.sourceforge.net/docs/3_0/quickstart-ubuntu.html to
> install the latest stable release of Nagios (3.5) on a fresh Ubuntu
> 12.04 machine. Everything went fine, but when I try to start nagios
> using /etc/init.d/nagios start, I get following error:
>
> /etc/init.d/nagios: 20: .: Can't open /etc/rc.d/init.d/functions
>
> There is no file at /etc/rc.d/init.d/functions. It seems like the
> Makefile could not put the functions file at /etc/rc.d/init.d ? Is it
> a bug or I missed something?
>
> Regards
> Abhinav

From stanislas.leveau at ac-caen.fr Wed Jun 12 21:27:52 2013
From: stanislas.leveau at ac-caen.fr (Leveau Stanislas)
Date: Wed, 12 Jun 2013 21:27:52 +0200
Subject: check_nt
In-Reply-To:
References:
Message-ID:

Hi Nikolas,

What is your command definition in commands.cfg?
When you execute this command on the command line, is the result the
same?

Regards
Stan

On 12/06/13, Nikolas Twito wrote:
> Hello there
>
> I use software for evaluation XI Ngios
>
> I'm trying for several days to solve a problem - and I'd be happy if you have the solution.
>
> I'm trying to get information about the state of reading and writing disks windows server 2008
>
> This is one of the commands I run: check_xi_service_nsclient!Mypassword!COUNTER!-l "\\PhysicalDisk(_Total)\\Avg.Disk sec/Write","Avg.Disk sec/Write is %f" -w 1 -c 2!!!!!
>
> But keep getting all the time the value: Avg.Disk sec/Write is 0.000000
>
> Greetings
> Nikolas
From Gavin.Grieve at datacom.co.nz Wed Jun 12 22:58:06 2013
From: Gavin.Grieve at datacom.co.nz (Gavin Grieve [DATACOM])
Date: Thu, 13 Jun 2013 08:58:06 +1200
Subject: Nagios init script not working on Ubuntu 12.04
In-Reply-To:
References:
Message-ID: <9BAC443567932C429837071BCD1C709C07CC2D002B@dnzwgex2.datacom.co.nz>

You could try replacing the line that mentions functions with:

. /lib/lsb/init-functions

Note the full stop at the start is required. I believe this is the
Ubuntu equivalent.

--
Gavin Grieve
Systems Management Specialist | Datacom | Datacom House, 68 Jervois Quay, Wellington, 6011, New Zealand
www.datacom.co.nz | PO Box 6376, Marion Square, Wellington, New Zealand 6141

From: Daniel Wittenberg [mailto:dwittenberg2008 at gmail.com]
Sent: Thursday, 13 June 2013 7:30 a.m.
To: Nagios Users List
Subject: Re: [Nagios-users] Nagios init script not working on Ubuntu 12.04

Functions is a file on rhel-based systems and provides common start/stop
routines. I dont use Ubuntu myself but if you figure out how to make it
work let me know. If I get some time I'll see if I still have an ubuntu
test VM to look at.

Dan

On Jun 12, 2013 2:09 PM, "Abhinav Upadhyay" wrote:

Hi,

I just followed the instructions on
http://nagios.sourceforge.net/docs/3_0/quickstart-ubuntu.html to
install the latest stable release of Nagios (3.5) on a fresh Ubuntu
12.04 machine. Everything went fine, but when I try to start nagios
using /etc/init.d/nagios start, I get following error:

/etc/init.d/nagios: 20: .: Can't open /etc/rc.d/init.d/functions

There is no file at /etc/rc.d/init.d/functions. It seems like the
Makefile could not put the functions file at /etc/rc.d/init.d ? Is it
a bug or I missed something?

Regards
Abhinav
From divisback at gmail.com Thu Jun 13 08:31:06 2013
From: divisback at gmail.com (Divya Raj)
Date: Thu, 13 Jun 2013 14:31:06 +0800
Subject: Discussion: Nagios
Message-ID:

Hi Nagios-Users,

Currently, I am working with Nagios where I have integrated it with a
database platform (remote machines) to listen to the alerts and display
them in the Nagios Web Interface.

Nagios here runs on RHEL. The remote machine sends SNMP trap messages
(it's an external device and not a box, so no NRPE/SSH). I've set up
SNMPTRAPD on the machine, which captures the SNMP messages from the box
and calls a Nagios command to route them to Nagios. For this I have
also defined a trap service to manage the incoming traps from the
remote machine.

But the problem is that only the topmost alert is displayed in Nagios
(in the log as well as in the Nagios Web UI). Is it the case that until
the first one gets cleared, the other alerts for the same service don't
show up?
The thing is that I need all the alerts sent from the remote machine to
be sent under one service/host to Nagios.

Any pointers regarding this will be much appreciated.
Thank You.

Regards,
Divya.

From manikumar85 at gmail.com Thu Jun 13 11:09:12 2013
From: manikumar85 at gmail.com (Manish Kumar)
Date: Thu, 13 Jun 2013 09:09:12 +0000
Subject: Discussion: Nagios
In-Reply-To:
References:
Message-ID:

Hi,

In the past I have configured SNMP traps from network devices to
display in the Nagios UI. I defined only one trap service under a
network device, which captures all the traps sent for this device, so
it will always show you the latest submitted trap/message and send out
an alert depending on whether it's a Warning or Critical trap, as
defined by you in the snmptt config file or the integration script you
used. Since any critical/warning alert logs a ticket on a ticketing
system integrated with Nagios, we are not so concerned with seeing all
the alerts displayed in one service.

In your case, if you want all the incoming traps to be displayed
permanently, you may need to define multiple trap services under that
host, and in the integration script you have to map the different traps
to the different services which you defined.
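
A minimal sketch of that per-trap mapping, assuming the integration
script is a shell script that hands Nagios passive results through the
external command file. The service names (TRAP_CPU, TRAP_FAN,
TRAP_OTHER) and the trap categories are hypothetical, made up for
illustration; only the PROCESS_SERVICE_CHECK_RESULT line format is
standard Nagios:

```shell
#!/bin/sh
# Hypothetical mapping from a trap category to a Nagios service
# description. The names are placeholders, not taken from any real setup.
service_for_trap() {
    case "$1" in
        cpu) echo "TRAP_CPU" ;;
        fan) echo "TRAP_FAN" ;;
        *)   echo "TRAP_OTHER" ;;   # catch-all for unmapped trap categories
    esac
}

# Print one passive check result in external-command format.
# Arguments: timestamp host trap_category return_code plugin_output
passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$1" "$2" "$(service_for_trap "$3")" "$4" "$5"
}
```

The printed line would be appended to the Nagios external command file
(the path is set by command_file in nagios.cfg), and every service
named here would have to exist under the host with passive checks
enabled.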
But even in that case you might have defined and mapped a CPU trap
service and a fan problem trap service under a host, so the CPU and fan
traps will not display in the same service; but if a new fan trap comes
in, I guess it will again override the old trap shown in the Nagios UI.
In this situation there is also a chance of missing a trap, or an
unknown trap which you may not have mapped.

The other way, which we are using, is that I defined a single trap
service under a host and reset it to OK a few seconds or minutes after
the trap submission, so by default it's always OK; once a trap comes it
will display it, fire an alert and reset to OK again after a few
seconds.

Hope it helps you somewhat :)

On Thu, Jun 13, 2013 at 6:31 AM, Divya Raj wrote:

> Hi Nagios-Users,
>
> Currently, I am working with Nagios where I have integrated it with a
> database platform (remote machines) to listen to the alerts and display
> them in the Nagios Web Interface.
>
> Nagios here runs on RHEL. The remote mahine sends SNMP trap messages
> (its an external device and not a box so no NRPE/SSH). I've setup SNMPTRAPD
> in the machine which captures the snmp messages from the box and calls
> Nagios command to route them to Nagios. For this also, I have defined a
> trap service to manage the incoming traps from the remote machine.
>
> But, the problem is that only the topmost alert is displayed in the
> Nagios (in the log as well as in the Nagios Web UI). Is that like till the
> first one gets cleared the other alerts for the same service don't show up?
> The thing is that I need all the alerts sent from the remote machine to be
> sent under one service/host to Nagios.
>
> Any pointers regarding this will be much appreciated.
> Thank You.
>
> Regards,
> Divya.
> > http://p.sf.net/sfu/windows-dev2dev > _______________________________________________ > Nagios-users mailing list > Nagios-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nagios-users > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -- Thanks Manish Kumar www.manishkr.com http://in.linkedin.com/in/manishkumar85 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From palli at ok.is Thu Jun 13 12:26:56 2013 From: palli at ok.is (=?utf-8?Q?P=C3=A1ll_Gu=C3=B0j=C3=B3n_Sigur=C3=B0sson?=) Date: Thu, 13 Jun 2013 10:26:56 -0000 (GMT) Subject: Nagios init script not working on Ubuntu 12.04 In-Reply-To: <9BAC443567932C429837071BCD1C709C07CC2D002B@dnzwgex2.datacom.co.nz> References: <9BAC443567932C429837071BCD1C709C07CC2D002B@dnzwgex2.datacom.co.nz> Message-ID: Alternatively, do this to install nagios on ubuntu 12.04: # apt-get install nagios3 nagios-plugins The packages are quite good, and imho there is nothing quick about that quickstart guide. 
Kind regards, Pall Sigurdsson ----- Original Message ----- From: "Gavin Grieve [DATACOM]" To: "Nagios Users List" Sent: Wednesday, June 12, 2013 8:58:06 PM Subject: Re: [Nagios-users] Nagios init script not working on Ubuntu 12.04 You could try replacing the line that mentions functions with: . /lib/lsb/init-functions Note the full stop at the start is required. I believe this is the Ubuntu equivalent. -- Gavin Grieve Systems Management Specialist | Datacom | Datacom House, 68 Jervois Quay, Wellington, 6011, New Zealand www.datacom.co.nz | PO Box 6376, Marion Square, Wellington, New Zealand 6141 From: Daniel Wittenberg [mailto:dwittenberg2008 at gmail.com] Sent: Thursday, 13 June 2013 7:30 a.m. To: Nagios Users List Subject: Re: [Nagios-users] Nagios init script not working on Ubuntu 12.04 Functions is a file on rhel-based systems and provides common start/stop routines. I dont use Ubuntu myself but if you figure out how to make it work let me know. If I get some time I'll see if I still have an ubuntu test VM to look at. Dan On Jun 12, 2013 2:09 PM, "Abhinav Upadhyay" < er.abhinav.upadhyay at gmail.com > wrote: Hi, I just followed the instructions on http://nagios.sourceforge.net/docs/3_0/quickstart-ubuntu.html to install the latest stable release of Nagios (3.5) on a fresh Ubuntu 12.04 machine. Everything went fine, but when I try to start nagios using /etc/init.d/nagios start, I get following error: /etc/init.d/nagios: 20: .: Can't open /etc/rc.d/init.d/functions There is no file at /etc/rc.d/init.d/functions. It seems like the Makefile could not put the functions file at /etc/rc.d/init.d ? Is it a bug or I missed something? Regards Abhinav ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. 
http://p.sf.net/sfu/windows-dev2dev _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From bonnierush6 at gmail.com Thu Jun 13 15:08:14 2013 From: bonnierush6 at gmail.com (Bonnie Rush) Date: Thu, 13 Jun 2013 08:08:14 -0500 Subject: How do I check ALL mount points using check_disk? Message-ID: With check_disk normally you can specify something like "-p /tmp -p /var" which will check /tmp and /var. But if I want to check all of the mount points, should I not specify a partition (-p) variable at all? -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From work at paul.dubuc.org Thu Jun 13 16:24:23 2013 From: work at paul.dubuc.org (Paul Dubuc) Date: Thu, 13 Jun 2013 10:24:23 -0400 Subject: Nagios 3.5.0 segfaulting at midnight In-Reply-To: <51A31E67.3020708@consol.de> References: <51A31E67.3020708@consol.de> Message-ID: <51B9D617.4050405@paul.dubuc.org> Sven Nierlein wrote: > On 27.05.2013 09:50, Fournier, Wim wrote: >> Hi List, >> >> I've got 5 nagios installs, all on 3.5.0 and 3 they seem to segfault exactly >> at midnight. It's not all of them, but the busiest ones and not always. >> Has anyone else seen this? >> >> @ DEV what info would like if I file this as a bug? > > Hi Wim, > > Afaik there is a bug already. This is a known issue in combination with the > livestatus neb module. > You could wait for the next release or use the attached patch. > > Sven Anyone know if/when there will be a 3.5.1 release? Seems like there have been several good fixes made since 3.5.0 (including this one). ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null From sunil at sunil.cc Thu Jun 13 16:28:20 2013 From: sunil at sunil.cc (Sunil Sankar) Date: Thu, 13 Jun 2013 19:58:20 +0530 Subject: How do I check ALL mount points using check_disk? In-Reply-To: References: Message-ID: You can configure it in nrpe.conf: command[check_disk_all]=/usr/libexec/nagios/plugins/check_disk -X nfs -X nfs3 -X nfs4 -X cifs -X none -X tmpfs -w $ARG1$ -c $ARG2$ Here the nfs, cifs and tmpfs filesystems are excluded; apart from those, we get results for every mount point. On Thu, Jun 13, 2013 at 6:38 PM, Bonnie Rush wrote: > With check_disk normally you can specify something like "-p /tmp -p /var" > which will check /tmp and /var. But if I want to check all of the mount > points, should I not specify a partition (-p) variable at all? > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by Windows: > > Build for Windows Store. > > http://p.sf.net/sfu/windows-dev2dev > _______________________________________________ > Nagios-users mailing list > Nagios-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nagios-users > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -- Regards Sunil Sankar -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null From awiddersheim at hotmail.com Thu Jun 13 16:55:53 2013 From: awiddersheim at hotmail.com (Andrew Widdersheim) Date: Thu, 13 Jun 2013 10:55:53 -0400 Subject: Issues with NEB modules breaking after restart Message-ID: I recently just upgraded to the latest 3.5.0 release of nagios-core and just added livestatus into my environment. We are trying to replace NDO but currently have the two running at the same time along with NCPD for perfdata which as far as I know there shouldn't be an issues . The first issue I had was where Nagios would segfault every night during it's routine log rotation so I applied the?0007-fix_downtime_struct.dif patch which seems to have fixed that issue. I experienced a new issue this morning where when restarting Nagios none of the NEB modules uninitialized properly. Nagios was able to start and initialized all of the NEB modules but a few seconds later Nagios uninitialized them again. This isn't like anything I've seen before and none of the NEB modules worked after this occurred. Here is what the logs looked like. [Thu Jun 13 09:30:29 2013] Caught SIGTERM, shutting down... [Thu Jun 13 09:30:30 2013] Successfully shutdown... (PID=14098) [Thu Jun 13 09:30:31 2013] livestatus: Socket thread has terminated [Thu Jun 13 09:30:41 2013] Nagios 3.5.0 starting... (PID=481) [Thu Jun 13 09:30:41 2013] Local time is Thu Jun 13 09:30:41 EDT 2013 [Thu Jun 13 09:30:41 2013] LOG VERSION: 2.0 [Thu Jun 13 09:30:41 2013] livestatus: Livestatus 1.2.2p2 by Mathias Kettner. 
Socket: '/usr/local/nagios/var/rw/livestatus.sock' [Thu Jun 13 09:30:41 2013] livestatus: Please visit us at http://mathias-kettner.de/ [Thu Jun 13 09:30:41 2013] livestatus: Hint: please try out OMD - the Open Monitoring Distribution [Thu Jun 13 09:30:41 2013] livestatus: Please visit OMD at http://omdistro.org [Thu Jun 13 09:30:41 2013] livestatus: Removed old left over socket file /usr/local/nagios/var/rw/livestatus.sock [Thu Jun 13 09:30:41 2013] livestatus: archive path /drbd/r1/nagios/archives [Thu Jun 13 09:30:41 2013] livestatus: Finished initialization. Further log messages go to /drbd/r1/nagios/livestatus.log [Thu Jun 13 09:30:41 2013] Event broker module '/usr/local/mk-livestatus/livestatus.o' initialized successfully. [Thu Jun 13 09:30:41 2013] npcdmod: Copyright (c) 2008-2009 Hendrik Baecker (andurin at process-zero.de) - http://www.pnp4nagios.org [Thu Jun 13 09:30:41 2013] npcdmod: /usr/local/pnp4nagios/etc/npcd.cfg initialized [Thu Jun 13 09:30:41 2013] npcdmod: spool_dir = '/dev/shm/pnp4nagios/var/spool/'. [Thu Jun 13 09:30:41 2013] npcdmod: perfdata file '/dev/shm/pnp4nagios/var/perfdata.dump'. [Thu Jun 13 09:30:41 2013] npcdmod: Ready to run to have some fun! [Thu Jun 13 09:30:41 2013] livestatus: Timeperiod cache not updated, there are no timeperiods (yet) [Thu Jun 13 09:30:41 2013] Event broker module '/usr/local/pnp4nagios/lib64/npcdmod.o' initialized successfully. [Thu Jun 13 09:30:41 2013] ndomod: NDOMOD 1.5.2 (06-08-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors [Thu Jun 13 09:30:41 2013] ndomod: Successfully connected to data sink. ?0 queued items to flush. [Thu Jun 13 09:30:41 2013] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully. [Thu Jun 13 09:30:43 2013] Finished daemonizing... (New PID=482) [Thu Jun 13 09:30:44 2013] TIMEPERIOD TRANSITION: 24x7;-1;1 [Thu Jun 13 09:30:47 2013] Event broker module '/usr/local/mk-livestatus/livestatus.o' deinitialized successfully. 
[Thu Jun 13 09:30:47 2013] npcdmod: If you don't like me, I will go out! Bye. [Thu Jun 13 09:30:47 2013] Event broker module '/usr/local/pnp4nagios/lib64/npcdmod.o' deinitialized successfully. [Thu Jun 13 09:30:47 2013] ndomod: Shutdown complete. [Thu Jun 13 09:30:47 2013] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully. Here is the next restart after this where things happened as I would expect: [Thu Jun 13 09:52:25 2013] Successfully shutdown... (PID=482) [Thu Jun 13 09:52:26 2013] livestatus: Socket thread has terminated [Thu Jun 13 09:52:26 2013] Event broker module '/usr/local/mk-livestatus/livestatus.o' deinitialized successfully. [Thu Jun 13 09:52:26 2013] npcdmod: If you don't like me, I will go out! Bye. [Thu Jun 13 09:52:26 2013] Event broker module '/usr/local/pnp4nagios/lib64/npcdmod.o' deinitialized successfully. [Thu Jun 13 09:52:26 2013] ndomod: Shutdown complete. [Thu Jun 13 09:52:26 2013] Event broker module '/usr/local/nagios/bin/ndomod.o' deinitialized successfully. [Thu Jun 13 09:52:29 2013] Nagios 3.5.0 starting... (PID=20081) [Thu Jun 13 09:52:29 2013] Local time is Thu Jun 13 09:52:29 EDT 2013 [Thu Jun 13 09:52:29 2013] LOG VERSION: 2.0 [Thu Jun 13 09:52:29 2013] livestatus: Livestatus 1.2.2p2 by Mathias Kettner. Socket: '/usr/local/nagios/var/rw/livestatus.sock' [Thu Jun 13 09:52:29 2013] livestatus: Please visit us at http://mathias-kettner.de/ [Thu Jun 13 09:52:29 2013] livestatus: Hint: please try out OMD - the Open Monitoring Distribution [Thu Jun 13 09:52:29 2013] livestatus: Please visit OMD at http://omdistro.org [Thu Jun 13 09:52:29 2013] livestatus: archive path /drbd/r1/nagios/archives [Thu Jun 13 09:52:29 2013] livestatus: Finished initialization. Further log messages go to /drbd/r1/nagios/livestatus.log [Thu Jun 13 09:52:29 2013] Event broker module '/usr/local/mk-livestatus/livestatus.o' initialized successfully. 
[Thu Jun 13 09:52:29 2013] npcdmod: Copyright (c) 2008-2009 Hendrik Baecker (andurin at process-zero.de) - http://www.pnp4nagios.org [Thu Jun 13 09:52:29 2013] npcdmod: /usr/local/pnp4nagios/etc/npcd.cfg initialized [Thu Jun 13 09:52:29 2013] npcdmod: spool_dir = '/dev/shm/pnp4nagios/var/spool/'. [Thu Jun 13 09:52:29 2013] npcdmod: perfdata file '/dev/shm/pnp4nagios/var/perfdata.dump'. [Thu Jun 13 09:52:29 2013] npcdmod: Ready to run to have some fun! [Thu Jun 13 09:52:29 2013] livestatus: Timeperiod cache not updated, there are no timeperiods (yet) [Thu Jun 13 09:52:29 2013] Event broker module '/usr/local/pnp4nagios/lib64/npcdmod.o' initialized successfully. [Thu Jun 13 09:52:29 2013] ndomod: NDOMOD 1.5.2 (06-08-2012) Copyright (c) 2009 Nagios Core Development Team and Community Contributors [Thu Jun 13 09:52:29 2013] ndomod: Successfully connected to data sink. ?0 queued items to flush. [Thu Jun 13 09:52:29 2013] Event broker module '/usr/local/nagios/bin/ndomod.o' initialized successfully. [Thu Jun 13 09:52:30 2013] Finished daemonizing... (New PID=20136) You'll notice in the first snippet of the logs things clearly did not clean up because when livestatus started it had: Removed old left over socket file /usr/local/nagios/var/rw/livestatus.sock Anyone experience the same thing? Any idea what I can do to fix? ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null From Gavin.Grieve at datacom.co.nz Thu Jun 13 23:19:57 2013 From: Gavin.Grieve at datacom.co.nz (Gavin Grieve [DATACOM]) Date: Fri, 14 Jun 2013 09:19:57 +1200 Subject: Discussion: Nagios In-Reply-To: References: Message-ID: <9BAC443567932C429837071BCD1C709C07CC3C8E58@dnzwgex2.datacom.co.nz> We've found that for SNMP trap services, we set the following: is_volatile 1 # notify on every failure message, not just when going from OK to failure state max_check_attempts 1 # notify on the first failure every time stalking_options o,w,c # log *all* OK, Warning and Critical messages to the Nagios log even if the state hasn't changed None of these will show all the traps in the host/service display however it will log all the states you have defined to the log file every time the host/service check output is received. Just a warning though - too many hosts/services being stalked can make the log file very big. -- Gavin Grieve Systems Management Specialist | Datacom | Datacom House, 68 Jervois Quay, Wellington, 6011, New Zealand www.datacom.co.nz | PO Box 6376, Marion Square, Wellington, New Zealand 6141 From: Manish Kumar [mailto:manikumar85 at gmail.com] Sent: Thursday, 13 June 2013 9:09 p.m. To: Nagios Users List Subject: Re: [Nagios-users] Discussion: Nagios Hi, In past I have configured snmp traps from network devices to display in the nagios UI. I have defined only one trap service under a network device which captures all the traps sent for this device in the service, so it will always show you the latest submitted trap/message and send out an alert based on if it's a Warning/Critical trap as may be defined by you in the snmptt config file or the integration script you used. Since any critical/warning alert logs a ticket on a ticketing system integrated with nagios, we are not so concerned to see all the alerts displayed in one service. 
In your case if you always want to display all the incoming traps to be displayed permanently you may need to define multiple trap service under that host and in the integration script you have to map different traps to the different services which you defined. But even in that case you might have defined and mapped a cpu trap service and a fan problem trap service under a host, so the cpu and fan trap will not display in the same service but guess if the new fan trap comes it will again override the old trap and show you in the nagios UI. But in this situation there also a chance to miss a trap or unknow trap which you may not have mapped. The other way which we are using is I defined a single trap service under a host and I used to reset it to OK after few seconds or minutes of the trap submission so by default it's always OK and once a trap comes it will display it, fire an alert and again rest to OK after few seconds. Hope it helps you some what :) On Thu, Jun 13, 2013 at 6:31 AM, Divya Raj > wrote: Hi Nagios-Users, Currently, I am working with Nagios where I have integrated it with a database platform (remote machines) to listen to the alerts and display them in the Nagios Web Interface. Nagios here runs on RHEL. The remote mahine sends SNMP trap messages (its an external device and not a box so no NRPE/SSH). I've setup SNMPTRAPD in the machine which captures the snmp messages from the box and calls Nagios command to route them to Nagios. For this also, I have defined a trap service to manage the incoming traps from the remote machine. But, the problem is that only the topmost alert is displayed in the Nagios (in the log as well as in the Nagios Web UI). Is that like till the first one gets cleared the other alerts for the same service don't show up? The thing is that I need all the alerts sent from the remote machine to be sent under one service/host to Nagios. Any pointers regarding this will be much appreciated. Thank You. Regards, Divya. 
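The "reset to OK after a few seconds" approach quoted above could be sketched roughly as below, assuming the trap service accepts passive check results. PROCESS_SERVICE_CHECK_RESULT is a standard Nagios external command; the host name, service description, and command file path are hypothetical placeholders, and `date +%s` assumes a GNU or BSD date.

```shell
#!/bin/sh
# Build one external-command line carrying a passive service check result.
# Arguments: host name, service description,
#            return code (0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN), plugin output.
passive_result() {
    printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

# Hypothetical usage: reset an assumed "SNMP Traps" service on host "dbnode1".
# CMDFILE is an assumption; use the command_file value from your nagios.cfg.
# CMDFILE=/usr/local/nagios/var/rw/nagios.cmd
# passive_result dbnode1 "SNMP Traps" 0 "OK: no recent traps" > "$CMDFILE"
```

Running the reset from a short delayed job after each trap submission keeps the service at OK by default, so the next trap always produces a fresh state change.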
------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null -- Thanks Manish Kumar www.manishkr.com http://in.linkedin.com/in/manishkumar85 -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From mckell at us.ibm.com Fri Jun 14 01:31:44 2013 From: mckell at us.ibm.com (Sean McKell) Date: Thu, 13 Jun 2013 17:31:44 -0600 Subject: reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 Message-ID: Running 3.4.1: I see this strange anomaly, where a host check is in the middle of doing retries before hitting max_attempts, but after a server reload occurs, the next check is automatically forced to DOWN;HARD;1, as seen here: [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''. [2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). 
Expected prompt not found. Last output was ''. [2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''. (reload happens here) [2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''. Why is it skipping the rest of the attempts and going straight to DOWN;HARD after the reload ? Seems like a bug to me. -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From travisrunyard at gmail.com Fri Jun 14 06:39:48 2013 From: travisrunyard at gmail.com (Travis Runyard) Date: Thu, 13 Jun 2013 21:39:48 -0700 Subject: reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 In-Reply-To: References: Message-ID: Do you have this in nagios.cfg? retain_state_information=1 On Thu, Jun 13, 2013 at 4:31 PM, Sean McKell wrote: > Running 3.4.1: > I see this strange anomaly, where a host check is in the middle of doing > retries before hitting max_attempts, but after a server reload occurs, the > next check is automatically forced to DOWN;HARD;1, as seen here: > > [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. 
> [2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > [2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > (reload happens here) > [2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > > Why is it skipping the rest of the attempts and going straight to > DOWN;HARD after the reload ? > Seems like a bug to me. > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by Windows: > > Build for Windows Store. > > http://p.sf.net/sfu/windows-dev2dev > _______________________________________________ > Nagios-users mailing list > Nagios-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nagios-users > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null From awiddersheim at hotmail.com Fri Jun 14 19:03:56 2013 From: awiddersheim at hotmail.com (Andrew Widdersheim) Date: Fri, 14 Jun 2013 13:03:56 -0400 Subject: Issues with NEB modules breaking after restart In-Reply-To: References: Message-ID:
To answer my own question... I'm pretty sure two nagios instances were spawned at once. The nagios init script that comes with nagios-core isn't the best at handling this situation.
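One way a start script can guard against spawning a second instance is to refuse to start while the PID file still names a live process. This is a minimal sketch only; the function names and the paths in the usage note are assumptions and are not taken from the nagios-core init script.

```shell
#!/bin/sh
# Hypothetical guard against a double start; not the shipped init script.

# Return 0 if the PID file names a process that is still alive.
is_running() {
    pidfile=$1
    [ -s "$pidfile" ] || return 1          # missing or empty PID file
    pid=$(cat "$pidfile") || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;           # contents are not a plain number
    esac
    # Signal 0 delivers nothing; it only tests whether the process can be
    # signalled. Run as root so a process owned by another user still counts.
    kill -0 "$pid" 2>/dev/null
}

# Run the given command only when no live instance is recorded.
guarded_start() {
    pidfile=$1
    shift
    if is_running "$pidfile"; then
        echo "already running (PID $(cat "$pidfile")), not starting again"
        return 1
    fi
    "$@"
}
```

Assuming a default source install, a call might look like `guarded_start /usr/local/nagios/var/nagios.lock /usr/local/nagios/bin/nagios -d /usr/local/nagios/etc/nagios.cfg`; check the lock_file setting in nagios.cfg for the real PID file location.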
------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From omar.saddiki at gmail.com Mon Jun 17 17:21:37 2013 From: omar.saddiki at gmail.com (omar saddiki) Date: Mon, 17 Jun 2013 15:21:37 +0000 Subject: Functions to do Availibility in reporting Message-ID: Hi, Could someone please tell me which function Nagios uses in the reporting tab to compute availability between two points in time? Regards SADDIKI -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From omar.saddiki at gmail.com Mon Jun 17 17:42:17 2013 From: omar.saddiki at gmail.com (omar saddiki) Date: Mon, 17 Jun 2013 15:42:17 +0000 Subject: Fwd: Functions to do Availibility in reporting In-Reply-To: References: Message-ID: Hi, Could someone please tell me which function Nagios uses in the reporting tab to compute availability between two points in time? Regards SADDIKI -------------- next part -------------- An HTML attachment was scrubbed...
URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. ::: Messages without supporting info will risk being sent to /dev/null From maestin at gmail.com Mon Jun 17 20:14:24 2013 From: maestin at gmail.com (martin Rodriguez) Date: Mon, 17 Jun 2013 15:14:24 -0300 Subject: Wmi Message-ID: Hi, I am installing Nagios 3.4.3 on Ubuntu and I cannot configure the plugin check_wmi_plus.conf. Does anyone have experience with this? -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
::: Messages without supporting info will risk being sent to /dev/null From sunil at sunil.cc Mon Jun 17 20:44:07 2013 From: sunil at sunil.cc (Sunil Sankar) Date: Tue, 18 Jun 2013 00:14:07 +0530 Subject: Wmi In-Reply-To: References: Message-ID: What is the error you are getting On Mon, Jun 17, 2013 at 11:44 PM, martin Rodriguez wrote: > Hi I am installing Nagios 3.4.3 on ubuntu and I can not configure the > plugin check_wmi_plus.conf someone had expereince in this topic > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by Windows: > > Build for Windows Store. > > http://p.sf.net/sfu/windows-dev2dev > _______________________________________________ > Nagios-users mailing list > Nagios-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nagios-users > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -- Regards Sunil Sankar -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null From JBennett at ntta.org Fri Jun 14 16:10:43 2013 From: JBennett at ntta.org (Bennett, Jan) Date: Fri, 14 Jun 2013 14:10:43 +0000 Subject: check_ntp_time offset unknown Message-ID: We have implemented a NTP sync check in all of the NRDS checks that we are rolling out right now but I've run into a bit of a snag. I am getting returns of 'Offset Unknown' on all clients. It appears to only happen for a short period of time (30 min or so) and then it will clear its self up for a bit but the issue will always return. >From the client that is reporting the unknown offset, I can run the following: # ./check_ntp_time -H localhost NTP CRITICAL: Offset unknown| # ./check_ntp_time -V check_ntp_time v1.4.16 (nagios-plugins 1.4.16) # ntpdc -p remote local st poll reach delay offset disp ======================================================================= =LOCAL(0) 127.0.0.1 10 64 17 0.00000 0.000000 0.96858 *timeserver1 xxx.xxx.xxx.xxx 2 64 17 0.00098 4.956048 0.00580 # /usr/local/nagios/libexec/check_ntp_time -v -H localhost sending request to peer 0 response from peer 0: offset -2.777669579e-07 sending request to peer 0 response from peer 0: offset -2.161832526e-07 sending request to peer 0 response from peer 0: offset -4.009343684e-07 sending request to peer 0 response from peer 0: offset -1.987209544e-07 discarding peer 0: stratum=0 overall average offset: 0 NTP CRITICAL: Offset unknown| In my searches, I noticed a number of people reporting the same issue with the supposed solution being to update your Nagios plugins to 1.4.13. I have done so and am now running 1.4.16 without any change in the service check. Also, I am unable to check a remote NTP server from these clients as they do not have access to the outside world. It has been suggested that the stratum=0 may be the culprit, but I'm not sure of my options here. Any help would be greatly appreciated. 
Jan

From holger at cis.fu-berlin.de Tue Jun 18 17:24:50 2013
From: holger at cis.fu-berlin.de (Holger Weiß)
Date: Tue, 18 Jun 2013 17:24:50 +0200
Subject: check_ntp_time offset unknown
In-Reply-To:
References:
Message-ID: <20130618152450.GA678632@zedat.fu-berlin.de>

* Bennett, Jan [2013-06-14 14:10]:
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|

Could you please run "ntpq -c rv" when this happens and post the output?

> It has been suggested that the stratum=0 may be the culprit, but I'm not
> sure of my options here.

Yes, stratum=0 is the culprit. An NTP server wouldn't usually report such a
stratum value.

Holger

--
Holger Weiß                 | Freie Universität Berlin
holger at zedat.fu-berlin.de | Zentraleinrichtung für Datenverarbeitung (ZEDAT)
Telefon: +49 30 838-55949   | Fabeckstraße 32, 14195 Berlin (Germany)
Telefax: +49 30 838455949   | https://www.zedat.fu-berlin.de/
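
To make the "discarding peer 0: stratum=0" line in the verbose output easier to read, here is a minimal sketch of the idea. It is not the check_ntp_time source; the function names and the averaging model are assumptions for illustration. The only protocol fact it relies on is that the stratum is the second byte of the NTP packet header and that 0 means unspecified or invalid.

```python
from typing import Optional, Sequence, Tuple

NTP_HEADER_LEN = 48  # minimum size of an NTP packet header


def parse_stratum(packet: bytes) -> int:
    """Return the stratum field (second header byte) of a raw NTP packet."""
    if len(packet) < NTP_HEADER_LEN:
        raise ValueError("short NTP packet")
    return packet[1]


def average_offset(
    peers: Sequence[Tuple[int, Sequence[float]]]
) -> Optional[float]:
    """Average the offsets of usable peers; None stands for 'offset unknown'.

    Each peer is a (stratum, offsets) pair. Hypothetical model: a peer that
    reports stratum 0 is discarded, as in the verbose output quoted above.
    """
    usable = [offsets for stratum, offsets in peers if stratum != 0 and offsets]
    if not usable:
        return None  # every peer was discarded, so there is nothing to average
    per_peer = [sum(offsets) / len(offsets) for offsets in usable]
    return sum(per_peer) / len(per_peer)


if __name__ == "__main__":
    # The single local peer answered with stratum 0, so it is thrown away.
    only_peer = (0, [-2.777669579e-07, -2.161832526e-07])
    print(average_offset([only_peer]))
```

In this model the four measured offsets are perfectly fine; the result is still "unknown" because the only peer that supplied them was rejected on its stratum value.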

From giles at coochey.net Tue Jun 18 17:35:03 2013
From: giles at coochey.net (Giles Coochey)
Date: Tue, 18 Jun 2013 16:35:03 +0100
Subject: check_ntp_time offset unknown
In-Reply-To:
References:
Message-ID: <51C07E27.7000400@coochey.net>

On 14/06/2013 15:10, Bennett, Jan wrote:
>
> We have implemented an NTP sync check in all of the NRDS checks that we
> are rolling out right now, but I've run into a bit of a snag.
>
> I am getting returns of 'Offset Unknown' on all clients. It appears
> to only happen for a short period of time (30 min or so) and then it
> will clear itself up for a bit, but the issue will always return.
>
> From the client that is reporting the unknown offset, I can run the
> following:
>
> # ./check_ntp_time -H localhost
> NTP CRITICAL: Offset unknown|
> # ./check_ntp_time -V
> check_ntp_time v1.4.16 (nagios-plugins 1.4.16)
> # ntpdc -p
>      remote           local       st poll reach  delay    offset    disp
> =======================================================================
> =LOCAL(0)        127.0.0.1        10   64   17 0.00000  0.000000 0.96858
> *timeserver1     xxx.xxx.xxx.xxx   2   64   17 0.00098  4.956048 0.00580
>
> # /usr/local/nagios/libexec/check_ntp_time -v -H localhost
> sending request to peer 0
> response from peer 0: offset -2.777669579e-07
> sending request to peer 0
> response from peer 0: offset -2.161832526e-07
> sending request to peer 0
> response from peer 0: offset -4.009343684e-07
> sending request to peer 0
> response from peer 0: offset -1.987209544e-07
> discarding peer 0: stratum=0
> overall average offset: 0
> NTP CRITICAL: Offset unknown|
>
> In my searches, I noticed a number of people reporting the same issue,
> with the supposed solution being to update your Nagios plugins to
> 1.4.13. I have done so and am now running 1.4.16 without any change
> in the service check.
>
> Also, I am unable to check a remote NTP server from these clients, as
> they do not have access to the outside world.
>
> It has been suggested that the stratum=0 may be the culprit, but I'm
> not sure of my options here.
>
> Any help would be greatly appreciated.
>

I get this shortly after an NTP client has booted up. Once NTP has been
running for a while, it goes away.

--
Regards,

Giles Coochey, CCNP, CCNA, CCNAS
NetSecSpec Ltd
+44 (0) 7983 877438
http://www.coochey.net
http://www.netsecspec.co.uk
giles at coochey.net

From nic at onlight.com Tue Jun 18 18:03:32 2013
From: nic at onlight.com (Nic Bernstein)
Date: Tue, 18 Jun 2013 11:03:32 -0500
Subject: Problem with check_openmanage plugin and storage
Message-ID: <51C084D4.8020104@onlight.com>

We've recently been experimenting with Trond Hasle Amundsen's check_openmanage
on a large network with about a hundred Dell servers of various ages,
capabilities, etc. Mostly PE-2950, R210, R410 and R720.
Much thanks to Trond for all his great work on Nagios plugins and other
projects, by the way.

We've hit a wall, however, with the storage monitoring aspects of this plugin.
For example, here's a quite specific case. This is a new PE R720, in debug:

onlight@monitor:~$ check_openmanage -H host -C secret -d
   System:      PowerEdge R720           OMSA version:    7.1.0
   ServiceTag:  #######                  Plugin version:  3.7.9
   BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Storage Components
=============================================================================
  STATE  |    ID    |  MESSAGE TEXT
---------+----------+--------------------------------------------------------
      OK |        0 | Controller 0 [PERC H310 Mini] is Ready
 WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
 WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
      OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
      OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
      OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
      OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
      OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
      OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
      OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
      OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM
      OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
      OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
      OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
      OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
      OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
      OK |    0 | Power Supply 0 [AC]: Presence detected
      OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
      OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
      OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
      OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
      OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
      OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
      OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
      OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
      OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
      OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
      OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
      OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
      OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
      OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
      OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
      OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
      OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
      OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
      OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
      OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
      OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
      OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
      OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
      OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
      OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
      OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
      OK |    0 | SD Card 0 [vFlash] is Absent
-----------------------------------------------------------------------------
   Other messages
=============================================================================
  STATE  |  MESSAGE TEXT
---------+-------------------------------------------------------------------
      OK | ESM log health is Ok (less than 80% full)
      OK | Chassis Service Tag is sane

This run exits with 1 (WARNING). We're not sure we agree with the decision to
make the fact that a disk is not Dell Certified a Warning, but we can at least
understand that. So, what if we exclude storage, with --no-storage?

onlight@monitor:~$ check_openmanage -H host -C secret -d --no-storage
   System:      PowerEdge R720           OMSA version:    7.1.0
   ServiceTag:  #######                  Plugin version:  3.7.9
   BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
      OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
      OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
      OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
      OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
      OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
      OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
      OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
      OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
      OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
      OK |    0 | Power Supply 0 [AC]: Presence detected
      OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
      OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
      OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
      OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
      OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
      OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
      OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
      OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
      OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
      OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
      OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
      OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
      OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
      OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
      OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
      OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
      OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
      OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
      OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
      OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
      OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
      OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 112 V
      OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
      OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
      OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
      OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
      OK |    0 | SD Card 0 [vFlash] is Absent
-----------------------------------------------------------------------------
   Other messages
=============================================================================
  STATE  |  MESSAGE TEXT
---------+-------------------------------------------------------------------
      OK | ESM log health is Ok (less than 80% full)
      OK | Chassis Service Tag is sane

OOPS! Something is wrong with this server, but I don't know what. The global
system health status is WARNING, but every component check is OK. This may be
a bug in the Nagios plugin, please file a bug report.

This yields exit code 3 (UNKNOWN).

Now, just for argument's sake, let's say we obviate the check for certified
drives, by commenting out the "workaround for OMSA 7.1.0 bug" code (just a
handy little short-cut).
Here's what we get then:

onlight@monitor:~$ check_openmanage -H host -C secret -d
   System:      PowerEdge R720           OMSA version:    7.1.0
   ServiceTag:  #######                  Plugin version:  3.7.9
   BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
-----------------------------------------------------------------------------
   Storage Components
=============================================================================
  STATE  |    ID    |  MESSAGE TEXT
---------+----------+--------------------------------------------------------
      OK |        0 | Controller 0 [PERC H310 Mini] is Ready
 WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
 WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
      OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
      OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
      OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
      OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
-----------------------------------------------------------------------------
   Chassis Components
=============================================================================
  STATE  |  ID  |  MESSAGE TEXT
---------+------+------------------------------------------------------------
      OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
      OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
      OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
      OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
      OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
      OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1200 RPM
      OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
      OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
      OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
      OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1200 RPM
      OK |    0 | Power Supply 0 [AC]: Presence detected
      OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
      OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
      OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 48 C (min=8/3, max=83/88)
      OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
      OK |    0 | Voltage sensor 0 [CPU1 VCORE PG] is Good
      OK |    1 | Voltage sensor 1 [System Board 3.3V PG] is Good
      OK |    2 | Voltage sensor 2 [System Board 5V PG] is Good
      OK |    3 | Voltage sensor 3 [CPU1 PLL PG] is Good
      OK |    4 | Voltage sensor 4 [System Board 1.1V PG] is Good
      OK |    5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good
      OK |    6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good
      OK |    7 | Voltage sensor 7 [System Board FETDRV PG] is Good
      OK |    8 | Voltage sensor 8 [CPU1 VSA PG] is Good
      OK |    9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good
      OK |   10 | Voltage sensor 10 [System Board NDC PG] is Good
      OK |   11 | Voltage sensor 11 [CPU1 VTT PG] is Good
      OK |   12 | Voltage sensor 12 [System Board 1.5V PG] is Good
      OK |   13 | Voltage sensor 13 [PS2 PG Fail] is Good
      OK |   14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good
      OK |   15 | Voltage sensor 15 [System Board BP1 5V PG] is Good
      OK |   16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good
      OK |   17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V
      OK |    0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected
      OK |    0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A
      OK |    1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W
      OK |    0 | Chassis intrusion 0 detection: Ok (Not Breached)
      OK |    0 | SD Card 0 [vFlash] is Absent
-----------------------------------------------------------------------------
   Other messages
=============================================================================
  STATE  |  MESSAGE TEXT
---------+-------------------------------------------------------------------
      OK | ESM log health is Ok (less than 80% full)
      OK | Chassis Service Tag is sane

Again, as with the original case, exit code is 1 (WARNING).

Is there any way around this? Should I be disabling global health checks?
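
For readers following the exit codes, the behaviour seen in the first two runs can be modelled roughly as below. This is a hypothetical sketch, not the check_openmanage source (which is Perl): the function name, the `check_global` parameter and the severity ordering are assumptions. It only reproduces what the runs show, namely WARNING when a component is in WARNING, and UNKNOWN when every component is OK but the global health status is not.

```python
from enum import IntEnum
from typing import Iterable


class State(IntEnum):
    """Plugin exit codes as reported in the runs above."""
    OK = 0
    WARNING = 1
    CRITICAL = 2
    UNKNOWN = 3


# Assumed severity ranking for picking the "worst" component state.
_SEVERITY = {State.OK: 0, State.UNKNOWN: 1, State.WARNING: 2, State.CRITICAL: 3}


def overall_state(
    components: Iterable[State],
    global_health: State,
    check_global: bool = True,
) -> State:
    """Combine per-component states with the global health status."""
    worst = max(components, key=_SEVERITY.__getitem__, default=State.OK)
    if worst is State.OK and check_global and global_health is not State.OK:
        # No component explains the non-OK global status: the "OOPS!" case.
        return State.UNKNOWN
    return worst


if __name__ == "__main__":
    # Full check: two WARNING disks are visible, global health is WARNING.
    print(overall_state([State.OK, State.WARNING, State.WARNING], State.WARNING).name)
    # Storage hidden from the component list, global health still WARNING.
    print(overall_state([State.OK, State.OK], State.WARNING).name)
```

Under this model, hiding the storage components does not change the global status, which would explain why excluding storage turns a WARNING into an UNKNOWN rather than an OK.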
Here's a run that blacklists the physical disks (-b pdisk=all), and it works:

onlight@monitor:~$ check_openmanage -H host -C secret -b pdisk=all
OK - System: 'PowerEdge R720', SN: '#######', 16 GB ram (4 dimms), 1 logical drives, 2 physical drives

Interestingly, when combining the blacklist with debug ("-d -b pdisk=all"),
the exit code is 3 (UNKNOWN), but with debug off, it's 0 (OK).

So, I guess what I'm wondering is why we need to blacklist the physical disks
(pdisk) instead of using --no-storage. Shouldn't --no-storage also cause
globalstatus to be ignored? I can furnish SNMP walk output if that's useful.

Cheers,
    -nic

--
Nic Bernstein                             nic at onlight.com
Onlight, Inc.                             www.onlight.com
219 N. Milwaukee St., Suite 2a            v. 414.272.4477
Milwaukee, Wisconsin 53202

From mckell at us.ibm.com Wed Jun 19 00:34:15 2013
From: mckell at us.ibm.com (Sean McKell)
Date: Tue, 18 Jun 2013 16:34:15 -0600
Subject: reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1
In-Reply-To:
References:
Message-ID:

> Do you have this in nagios.cfg?
> retain_state_information=1 yes, i do have that set From: nagios-users-request at lists.sourceforge.net To: nagios-users at lists.sourceforge.net, Date: 06/18/2013 01:56 PM Subject: Nagios-users Digest, Vol 85, Issue 6 Send Nagios-users mailing list submissions to nagios-users at lists.sourceforge.net To subscribe or unsubscribe via the World Wide Web, visit https://lists.sourceforge.net/lists/listinfo/nagios-users or, via email, send a message with subject or body 'help' to nagios-users-request at lists.sourceforge.net You can reach the person managing the list at nagios-users-owner at lists.sourceforge.net When replying, please edit your Subject line so it is more specific than "Re: Contents of Nagios-users digest..." Today's Topics: 1. reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 (Sean McKell) 2. Re: reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 (Travis Runyard) 3. Re: Issues with NEB modules breaking after restart (Andrew Widdersheim) 4. Functions to do Availibility in reporting (omar saddiki) 5. Fwd: Functions to do Availibility in reporting (omar saddiki) 6. Wmi (martin Rodriguez) 7. Re: Wmi (Sunil Sankar) 8. check_ntp_time offset unknown (Bennett, Jan) 9. Re: check_ntp_time offset unknown (Holger Wei?) 10. Re: check_ntp_time offset unknown (Giles Coochey) 11. 
Problem with check_openmanage plugin and storage (Nic Bernstein) ---------------------------------------------------------------------- Message: 1 Date: Thu, 13 Jun 2013 17:31:44 -0600 From: Sean McKell Subject: [Nagios-users] reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 To: nagios-users at lists.sourceforge.net Message-ID: Content-Type: text/plain; charset="us-ascii" Running 3.4.1: I see this strange anomaly, where a host check is in the middle of doing retries before hitting max_attempts, but after a server reload occurs, the next check is automatically forced to DOWN;HARD;1, as seen here: [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''. [2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''. [2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''. (reload happens here) [2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. Last output was ''. Why is it skipping the rest of the attempts and going straight to DOWN;HARD after the reload ? Seems like a bug to me. -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ Message: 2 Date: Thu, 13 Jun 2013 21:39:48 -0700 From: Travis Runyard Subject: Re: [Nagios-users] reload appears to cause force of DOWN; SOFT; x to DOWN; HARD; 1 To: Nagios Users List Message-ID: Content-Type: text/plain; charset="iso-8859-1" Do you have this in nagios.cfg? 
retain_state_information=1 On Thu, Jun 13, 2013 at 4:31 PM, Sean McKell wrote: > Running 3.4.1: > I see this strange anomaly, where a host check is in the middle of doing > retries before hitting max_attempts, but after a server reload occurs, the > next check is automatically forced to DOWN;HARD;1, as seen here: > > [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > [2013-06-04 08:47:18] HOST ALERT: 5gt4;DOWN;SOFT;2;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > [2013-06-04 08:54:03] HOST ALERT: 5gt4;DOWN;SOFT;3;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > (reload happens here) > [2013-06-04 09:00:52] HOST ALERT: 5gt4;DOWN;HARD;1;CRITICAL: Connection > timed out to '' after 160 seconds (user 'chk'). Expected prompt not found. > Last output was ''. > > Why is it skipping the rest of the attempts and going straight to > DOWN;HARD after the reload ? > Seems like a bug to me. > > > ------------------------------------------------------------------------------ > This SF.net email is sponsored by Windows: > > Build for Windows Store. > > http://p.sf.net/sfu/windows-dev2dev > _______________________________________________ > Nagios-users mailing list > Nagios-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nagios-users > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -------------- next part -------------- An HTML attachment was scrubbed... 
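
As an aid for spotting this pattern in a log, here is a small sketch that parses HOST ALERT lines in the form quoted above and flags a HARD alert whose attempt number did not advance past the preceding SOFT attempt for the same host. The function names and the regular expression are assumptions for illustration, and the check says nothing about why Nagios behaved this way.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class HostAlert(NamedTuple):
    timestamp: str
    host: str
    state: str
    state_type: str
    attempt: int


# Matches lines such as:
# [2013-06-04 08:40:21] HOST ALERT: 5gt4;DOWN;SOFT;1;CRITICAL: ...
ALERT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\] HOST ALERT: "
    r"(?P<host>[^;]+);(?P<state>[^;]+);(?P<type>SOFT|HARD);(?P<attempt>\d+);"
)


def parse_alerts(lines: Iterable[str]) -> List[HostAlert]:
    """Extract HOST ALERT entries, skipping lines that do not match."""
    alerts = []
    for line in lines:
        match = ALERT_RE.match(line.strip())
        if match:
            alerts.append(HostAlert(
                match["ts"], match["host"], match["state"],
                match["type"], int(match["attempt"]),
            ))
    return alerts


def premature_hard(alerts: Iterable[HostAlert]) -> List[HostAlert]:
    """Return HARD alerts whose attempt counter went backwards or stalled."""
    last_soft: Dict[str, int] = {}
    flagged = []
    for alert in alerts:
        if alert.state_type == "SOFT":
            last_soft[alert.host] = alert.attempt
        else:
            previous = last_soft.pop(alert.host, None)
            if previous is not None and alert.attempt <= previous:
                flagged.append(alert)
    return flagged
```

Fed the four log lines from the report, `premature_hard(parse_alerts(lines))` returns the single 09:00:52 entry, since its attempt number (1) is lower than the last SOFT attempt (3).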
------------------------------

Message: 3
Date: Fri, 14 Jun 2013 13:03:56 -0400
From: Andrew Widdersheim
Subject: Re: [Nagios-users] Issues with NEB modules breaking after restart
To: "nagios-users at lists.sourceforge.net"
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

To answer my own question: I'm pretty sure two nagios instances were spawned at once. The nagios init script that comes with nagios-core isn't the best at handling this situation.
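
For illustration only, one common way for a start script or daemon to avoid launching a second instance is to take an exclusive, non-blocking lock on a lock file. The sketch below is an assumption-laden example and not what the nagios-core init script does; the function name and lock path are hypothetical, and it relies on `fcntl`, so it is Unix-only.

```python
import fcntl
import os
from typing import IO, Optional


def acquire_instance_lock(path: str) -> Optional[IO[str]]:
    """Try to become the only running instance.

    Returns the open lock file on success (keep it open for the lifetime of
    the process), or None if another holder already has the lock.
    """
    handle = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting.
        fcntl.flock(handle.fileno(), fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    # Record our PID so an operator can see who holds the lock.
    handle.seek(0)
    handle.truncate()
    handle.write(f"{os.getpid()}\n")
    handle.flush()
    return handle


if __name__ == "__main__":
    lock = acquire_instance_lock("/tmp/example-daemon.lock")  # hypothetical path
    if lock is None:
        raise SystemExit("another instance appears to be running")
    print("lock acquired, starting up")
```

The lock is released automatically when the process exits or the file is closed, which avoids the stale PID file problem that plain PID checks can run into.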

------------------------------

Message: 4
Date: Mon, 17 Jun 2013 15:21:37 +0000
From: omar saddiki
Subject: [Nagios-users] Functions to do Availibility in reporting
To: Nagios Users List
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Could someone please tell me which function Nagios uses in the reporting tab
to compute the availability between two points in time?

Regards
SADDIKI

------------------------------

Message: 5
Date: Mon, 17 Jun 2013 15:42:17 +0000
From: omar saddiki
Subject: [Nagios-users] Fwd: Functions to do Availibility in reporting
To: Nagios Users List
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

Hi,

Could someone please tell me which function Nagios uses in the reporting tab
to compute the availability between two points in time?

Regards
SADDIKI

------------------------------

Message: 6
Date: Mon, 17 Jun 2013 15:14:24 -0300
From: martin Rodriguez
Subject: [Nagios-users] Wmi
To: nagios-users at lists.sourceforge.net
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

Hi, I am installing Nagios 3.4.3 on Ubuntu and I cannot configure the plugin
check_wmi_plus.conf. Does anyone have experience with this topic?

------------------------------

Message: 7
Date: Tue, 18 Jun 2013 00:14:07 +0530
From: Sunil Sankar
Subject: Re: [Nagios-users] Wmi
To: Nagios Users List
Message-ID:
Content-Type: text/plain; charset="iso-8859-1"

What is the error you are getting?

On Mon, Jun 17, 2013 at 11:44 PM, martin Rodriguez wrote:
> Hi, I am installing Nagios 3.4.3 on Ubuntu and I cannot configure the
> plugin check_wmi_plus.conf. Does anyone have experience with this topic?
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
> > http://p.sf.net/sfu/windows-dev2dev > _______________________________________________ > Nagios-users mailing list > Nagios-users at lists.sourceforge.net > https://lists.sourceforge.net/lists/listinfo/nagios-users > ::: Please include Nagios version, plugin version (-v) and OS when > reporting any issue. > ::: Messages without supporting info will risk being sent to /dev/null > -- Regards Sunil Sankar -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ Message: 8 Date: Fri, 14 Jun 2013 14:10:43 +0000 From: "Bennett, Jan" Subject: [Nagios-users] check_ntp_time offset unknown To: "'nagios-users at lists.sourceforge.net'" Message-ID: Content-Type: text/plain; charset="us-ascii" We have implemented a NTP sync check in all of the NRDS checks that we are rolling out right now but I've run into a bit of a snag. I am getting returns of 'Offset Unknown' on all clients. It appears to only happen for a short period of time (30 min or so) and then it will clear its self up for a bit but the issue will always return. 
>From the client that is reporting the unknown offset, I can run the following: # ./check_ntp_time -H localhost NTP CRITICAL: Offset unknown| # ./check_ntp_time -V check_ntp_time v1.4.16 (nagios-plugins 1.4.16) # ntpdc -p remote local st poll reach delay offset disp ======================================================================= =LOCAL(0) 127.0.0.1 10 64 17 0.00000 0.000000 0.96858 *timeserver1 xxx.xxx.xxx.xxx 2 64 17 0.00098 4.956048 0.00580 # /usr/local/nagios/libexec/check_ntp_time -v -H localhost sending request to peer 0 response from peer 0: offset -2.777669579e-07 sending request to peer 0 response from peer 0: offset -2.161832526e-07 sending request to peer 0 response from peer 0: offset -4.009343684e-07 sending request to peer 0 response from peer 0: offset -1.987209544e-07 discarding peer 0: stratum=0 overall average offset: 0 NTP CRITICAL: Offset unknown| In my searches, I noticed a number of people reporting the same issue with the supposed solution being to update your Nagios plugins to 1.4.13. I have done so and am now running 1.4.16 without any change in the service check. Also, I am unable to check a remote NTP server from these clients as they do not have access to the outside world. It has been suggested that the stratum=0 may be the culprit, but I'm not sure of my options here. Any help would be greatly appreciated. Jan -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ Message: 9 Date: Tue, 18 Jun 2013 17:24:50 +0200 From: Holger Wei? Subject: Re: [Nagios-users] check_ntp_time offset unknown To: Nagios Users Message-ID: <20130618152450.GA678632 at zedat.fu-berlin.de> Content-Type: text/plain; charset=iso-8859-1 * Bennett, Jan [2013-06-14 14:10]: > # ./check_ntp_time -H localhost > NTP CRITICAL: Offset unknown| Could you please run "ntpq -c rv" when this happens and post the output? > It has been suggested that the stratum=0 may be the culprit, but I'm not sure of my options here. 
Yes, stratum=0 is the culprit. An NTP server wouldn't usually report such a stratum value. Holger -- Holger Wei? | Freie Universit?t Berlin holger at zedat.fu-berlin.de | Zentraleinrichtung f?r Datenverarbeitung (ZEDAT) Telefon: +49 30 838-55949 | Fabeckstra?e 32, 14195 Berlin (Germany) Telefax: +49 30 838455949 | https://www.zedat.fu-berlin.de/ ------------------------------ Message: 10 Date: Tue, 18 Jun 2013 16:35:03 +0100 From: Giles Coochey Subject: Re: [Nagios-users] check_ntp_time offset unknown To: nagios-users at lists.sourceforge.net Message-ID: <51C07E27.7000400 at coochey.net> Content-Type: text/plain; charset="iso-8859-1" On 14/06/2013 15:10, Bennett, Jan wrote: > > We have implemented a NTP sync check in all of the NRDS checks that we > are rolling out right now but I've run into a bit of a snag. > > I am getting returns of 'Offset Unknown' on all clients. It appears > to only happen for a short period of time (30 min or so) and then it > will clear its self up for a bit but the issue will always return. 
> > From the client that is reporting the unknown offset, I can run the > following: > > # ./check_ntp_time -H localhost > NTP CRITICAL: Offset unknown| > # ./check_ntp_time -V > check_ntp_time v1.4.16 (nagios-plugins 1.4.16) > # ntpdc -p > remote local st poll reach delay offset disp > ======================================================================= > =LOCAL(0) 127.0.0.1 10 64 17 0.00000 0.000000 0.96858 > *timeserver1 xxx.xxx.xxx.xxx 2 64 17 0.00098 4.956048 0.00580 > > # /usr/local/nagios/libexec/check_ntp_time -v -H localhost > sending request to peer 0 > response from peer 0: offset -2.777669579e-07 > sending request to peer 0 > response from peer 0: offset -2.161832526e-07 > sending request to peer 0 > response from peer 0: offset -4.009343684e-07 > sending request to peer 0 > response from peer 0: offset -1.987209544e-07 > discarding peer 0: stratum=0 > overall average offset: 0 > NTP CRITICAL: Offset unknown| > > In my searches, I noticed a number of people reporting the same issue > with the supposed solution being to update your Nagios plugins to > 1.4.13. I have done so and am now running 1.4.16 without any change > in the service check. > > Also, I am unable to check a remote NTP server from these clients as > they do not have access to the outside world. > > It has been suggested that the stratum=0 may be the culprit, but I'm > not sure of my options here. > > Any help would be greatly appreciated. > > I get this shortly after a NTP client has booted up. Once NTP has been running for a while it goes away. -- Regards, Giles Coochey, CCNP, CCNA, CCNAS NetSecSpec Ltd +44 (0) 7983 877438 http://www.coochey.net http://www.netsecspec.co.uk giles at coochey.net -------------- next part -------------- An HTML attachment was scrubbed... -------------- next part -------------- A non-text attachment was scrubbed... 
Name: smime.p7s Type: application/pkcs7-signature Size: 4968 bytes Desc: S/MIME Cryptographic Signature ------------------------------ Message: 11 Date: Tue, 18 Jun 2013 11:03:32 -0500 From: Nic Bernstein Subject: [Nagios-users] Problem with check_openmanage plugin and storage To: nagios-users at lists.sourceforge.net Message-ID: <51C084D4.8020104 at onlight.com> Content-Type: text/plain; charset="utf-8" We've recently been experimenting with Trond Hasle Amundsen's check_openmanage on a large network with about a hundred Dell servers of various ages, capabilities, etc. Mostly PE-2950, R210, R410 and R720. Much thanks to Trond for all his great work on Nagios plugins and other projects, by the way. We've hit a wall, however, with the storage monitoring aspects of this plugin. For example, here's a quite specific case. This is a new PE R720, in debug: onlight at monitor:~$ check_openmanage -H host -C secret -d System: PowerEdge R720 OMSA version: 7.1.0 ServiceTag: ####### Plugin version: 3.7.9 BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4 ----------------------------------------------------------------------------- Storage Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+----------+-------------------------------------------------------- OK | 0 | Controller 0 [PERC H310 Mini] is Ready WARNING | 0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified WARNING | 0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready OK | 0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready ----------------------------------------------------------------------------- Chassis Components 
============================================================================= STATE | ID | MESSAGE TEXT ---------+------+------------------------------------------------------------ OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM OK | 0 | Power Supply 0 [AC]: Presence detected OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88) OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] 
is Good OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) OK | 0 | SD Card 0 [vFlash] is Absent ----------------------------------------------------------------------------- Other messages ============================================================================= STATE | MESSAGE TEXT ---------+------------------------------------------------------------------- OK | ESM log health is Ok (less than 80% full) OK | Chassis Service Tag is sane This run exits with 1 (WARNING). We're not sure we agree with the decision to make the fact that a disk is not Dell Certified a Warning, but we can at least understand that. So, what if we exclude storage, with --no-storage? onlight at monitor:~$ check_openmanage -H host -C secret -d --no-storage System: PowerEdge R720 OMSA version: 7.1.0 ServiceTag: ####### Plugin version: 3.7.9 BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4 ----------------------------------------------------------------------------- Chassis Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+------+------------------------------------------------------------ OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 
1080 RPM OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM OK | 0 | Power Supply 0 [AC]: Presence detected OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88) OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 112 V OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) OK | 0 | SD Card 0 [vFlash] is Absent ----------------------------------------------------------------------------- Other messages ============================================================================= STATE | MESSAGE TEXT ---------+------------------------------------------------------------------- OK | ESM log health 
is Ok (less than 80% full) OK | Chassis Service Tag is sane OOPS! Something is wrong with this server, but I don't know what. The global system health status is WARNING, but every component check is OK. This may be a bug in the Nagios plugin, please file a bug report. This yields exit code 3 (UNKNOWN). Now, just for argument's sake, let's say we obviate the check for certified drives, by commenting out the "workaround for OMSA 7.1.0 bug" code (just a handy little short-cut). Here's what we get then: onlight at monitor:~$ check_openmanage -H host -C secret -d System: PowerEdge R720 OMSA version: 7.1.0 ServiceTag: ####### Plugin version: 3.7.9 BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4 ----------------------------------------------------------------------------- Storage Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+----------+-------------------------------------------------------- OK | 0 | Controller 0 [PERC H310 Mini] is Ready WARNING | 0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online WARNING | 0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready OK | 0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready ----------------------------------------------------------------------------- Chassis Components ============================================================================= STATE | ID | MESSAGE TEXT ---------+------+------------------------------------------------------------ OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 
1080 RPM OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1200 RPM OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1200 RPM OK | 0 | Power Supply 0 [AC]: Presence detected OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 48 C (min=8/3, max=83/88) OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) OK | 0 | SD Card 0 [vFlash] is Absent 
----------------------------------------------------------------------------- Other messages ============================================================================= STATE | MESSAGE TEXT ---------+------------------------------------------------------------------- OK | ESM log health is Ok (less than 80% full) OK | Chassis Service Tag is sane Again, as with the original case, exit code is 1 (WARNING). Is there any way around this? Should I be disabling global health checks? Here's a run to test that, and it works: onlight at monitor:~$ check_openmanage -H host -C secret -b pdisk=all OK - System: 'PowerEdge R720', SN: '#######', 16 GB ram (4 dimms), 1 logical drives, 2 physical drives Interestingly, when combining the blacklist with debug ("-d -b pdisk=all"), the exit code is 3 (UNKNOWN), but with debug off, it's 0 (OK). So, I guess what I'm wondering is why we need to blacklist the physical disks (pdisk) instead of using --no-storage? Shouldn't --no-storage also cause globalstatus to be ignored? I can furnish SNMP walk output if that's useful. Cheers, -nic -- Nic Bernstein nic at onlight.com Onlight, Inc. www.onlight.com 219 N. Milwaukee St., Suite 2a v. 414.272.4477 Milwaukee, Wisconsin 53202 -------------- next part -------------- An HTML attachment was scrubbed... ------------------------------ ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev ------------------------------ _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users End of Nagios-users Digest, Vol 85, Issue 6 ******************************************* -------------- next part -------------- An HTML attachment was scrubbed... 
From t.h.amundsen at usit.uio.no Wed Jun 19 02:41:02 2013
From: t.h.amundsen at usit.uio.no (Trond Hasle Amundsen)
Date: Wed, 19 Jun 2013 02:41:02 +0200
Subject: Problem with check_openmanage plugin and storage
In-Reply-To: <51C084D4.8020104@onlight.com> (Nic Bernstein's message of "Tue, 18 Jun 2013 11:03:32 -0500")
References: <51C084D4.8020104@onlight.com>
Message-ID: <15tk3lrrkyp.fsf@tux.uio.no>

Nic Bernstein writes:

> We've recently been experimenting with Trond Hasle Amundsen's check_openmanage
> on a large network with about a hundred Dell servers of various ages,
> capabilities, etc. Mostly PE-2950, R210, R410 and R720. Much thanks to Trond
> for all his great work on Nagios plugins and other projects, by the way.
>
> We've hit a wall, however, with the storage monitoring aspects of this plugin.
>
> For example, here's a quite specific case.
This is a new PE R720, in debug: > > onlight at monitor:~$ check_openmanage -H host -C secret -d > System: PowerEdge R720 OMSA version: 7.1.0 > ServiceTag: ####### Plugin version: 3.7.9 > BIOS/date: 1.2.6 05/10/2012 Checking mode: SNMPv2c UDP/IPv4 > ----------------------------------------------------------------------------- > Storage Components > ============================================================================= > STATE | ID | MESSAGE TEXT > ---------+----------+-------------------------------------------------------- > OK | 0 | Controller 0 [PERC H310 Mini] is Ready > WARNING | 0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified > WARNING | 0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified > OK | 0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready > OK | 0:0 | Connector 0 [SAS] on controller 0 is Ready > OK | 0:1 | Connector 1 [SAS] on controller 0 is Ready > OK | 0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready > ----------------------------------------------------------------------------- > Chassis Components > ============================================================================= > STATE | ID | MESSAGE TEXT > ---------+------+------------------------------------------------------------ > OK | 0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok > OK | 1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok > OK | 2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok > OK | 3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok > OK | 0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1200 RPM > OK | 1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM > OK | 2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM > OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM > OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM > OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM > OK | 0 | Power Supply 0 [AC]: Presence 
detected > OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) > OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) > OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88) > OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present > OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good > OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good > OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good > OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good > OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good > OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good > OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good > OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good > OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good > OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good > OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good > OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good > OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good > OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good > OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good > OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good > OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good > OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V > OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected > OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A > OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W > OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) > OK | 0 | SD Card 0 [vFlash] is Absent > ----------------------------------------------------------------------------- > Other messages > ============================================================================= > STATE | MESSAGE TEXT > ---------+------------------------------------------------------------------- > OK | ESM log health is Ok (less than 80% full) > OK | Chassis 
Service Tag is sane
>
> This run exits with 1 (WARNING).
>
> We're not sure we agree with the decision to make the fact that a disk is not
> Dell Certified a Warning, but we can at least understand that. So, what if we
> exclude storage, with --no-storage?

The decision to create a warning for non-certified disks belongs to
Dell. I've tried to let the plugin simply relay the warning level from
Openmanage, unless it's outright wrong (such as reporting disks in
predictive failure as OK).

> onlight at monitor:~$ check_openmanage -H host -C secret -d --no-storage
>    System:      PowerEdge R720           OMSA version:    7.1.0
>    ServiceTag:  #######                  Plugin version:  3.7.9
>    BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
> -----------------------------------------------------------------------------
>    Chassis Components
> =============================================================================
>   STATE  |  ID  |  MESSAGE TEXT
> ---------+------+------------------------------------------------------------
>       OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
>       OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
>       OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
>       OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
>       OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
>       OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1080 RPM
>       OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading: 1200 RPM
>       OK |    3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM
>       OK |    4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM
>       OK |    5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1080 RPM
>       OK |    0 | Power Supply 0 [AC]: Presence detected
>       OK |    0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47)
>       OK |    1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75)
>       OK |    2 | Temperature Probe 2 [CPU1 Temp] reads 49 C (min=8/3, max=83/88)
>       OK |    0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present
>       OK |    0 |
Voltage sensor 0 [CPU1 VCORE PG] is Good > OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good > OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good > OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good > OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good > OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good > OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good > OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good > OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good > OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good > OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good > OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good > OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good > OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good > OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good > OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good > OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good > OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 112 V > OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected > OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A > OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W > OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) > OK | 0 | SD Card 0 [vFlash] is Absent > ----------------------------------------------------------------------------- > Other messages > ============================================================================= > STATE | MESSAGE TEXT > ---------+------------------------------------------------------------------- > OK | ESM log health is Ok (less than 80% full) > OK | Chassis Service Tag is sane > OOPS! Something is wrong with this server, but I don't know what. The global > system health status is WARNING, but every component check is OK. This may > be a bug in the Nagios plugin, please file a bug report. > > This yields exit code 3 (UNKNOWN). This is a bug. 
Using blacklisting or check manipulation (such as --no-storage)
should disable the global health check.

> Now, just for argument's sake, let's say we obviate the check for certified
> drives, by commenting out the "workaround for OMSA 7.1.0 bug" code (just
> a handy little short-cut). Here's what we get then:
>
> onlight at monitor:~$ check_openmanage -H host -C secret -d
>    System:      PowerEdge R720           OMSA version:    7.1.0
>    ServiceTag:  #######                  Plugin version:  3.7.9
>    BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
> -----------------------------------------------------------------------------
>    Storage Components
> =============================================================================
>   STATE  |    ID    |  MESSAGE TEXT
> ---------+----------+--------------------------------------------------------
>       OK |        0 | Controller 0 [PERC H310 Mini] is Ready
>  WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
>  WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
>       OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
>       OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
>       OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
>       OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
> -----------------------------------------------------------------------------
>    Chassis Components
> =============================================================================
>   STATE  |  ID  |  MESSAGE TEXT
> ---------+------+------------------------------------------------------------
>       OK |    0 | Memory module 0 [DIMM_A1, 4096 MB] is Ok
>       OK |    1 | Memory module 1 [DIMM_A2, 4096 MB] is Ok
>       OK |    2 | Memory module 2 [DIMM_A3, 4096 MB] is Ok
>       OK |    3 | Memory module 3 [DIMM_A4, 4096 MB] is Ok
>       OK |    0 | Chassis fan 0 [System Board Fan1 RPM] reading: 1080 RPM
>       OK |    1 | Chassis fan 1 [System Board Fan2 RPM] reading: 1200 RPM
>       OK |    2 | Chassis fan 2 [System Board Fan3 RPM] reading:
1200 RPM > OK | 3 | Chassis fan 3 [System Board Fan4 RPM] reading: 1080 RPM > OK | 4 | Chassis fan 4 [System Board Fan5 RPM] reading: 1080 RPM > OK | 5 | Chassis fan 5 [System Board Fan6 RPM] reading: 1200 RPM > OK | 0 | Power Supply 0 [AC]: Presence detected > OK | 0 | Temperature Probe 0 [System Board Inlet Temp] reads 26 C (min=3/-7, max=42/47) > OK | 1 | Temperature Probe 1 [System Board Exhaust Temp] reads 33 C (min=8/3, max=70/75) > OK | 2 | Temperature Probe 2 [CPU1 Temp] reads 48 C (min=8/3, max=83/88) > OK | 0 | Processor 0 [Intel Xeon E5-2603 0 1.80GHz] is Present > OK | 0 | Voltage sensor 0 [CPU1 VCORE PG] is Good > OK | 1 | Voltage sensor 1 [System Board 3.3V PG] is Good > OK | 2 | Voltage sensor 2 [System Board 5V PG] is Good > OK | 3 | Voltage sensor 3 [CPU1 PLL PG] is Good > OK | 4 | Voltage sensor 4 [System Board 1.1V PG] is Good > OK | 5 | Voltage sensor 5 [CPU1 M23 VDDQ PG] is Good > OK | 6 | Voltage sensor 6 [CPU1 M23 VTT PG] is Good > OK | 7 | Voltage sensor 7 [System Board FETDRV PG] is Good > OK | 8 | Voltage sensor 8 [CPU1 VSA PG] is Good > OK | 9 | Voltage sensor 9 [CPU1 M01 VDDQ PG] is Good > OK | 10 | Voltage sensor 10 [System Board NDC PG] is Good > OK | 11 | Voltage sensor 11 [CPU1 VTT PG] is Good > OK | 12 | Voltage sensor 12 [System Board 1.5V PG] is Good > OK | 13 | Voltage sensor 13 [PS2 PG Fail] is Good > OK | 14 | Voltage sensor 14 [System Board PS1 PG Fail] is Good > OK | 15 | Voltage sensor 15 [System Board BP1 5V PG] is Good > OK | 16 | Voltage sensor 16 [CPU1 M01 VTT PG] is Good > OK | 17 | Voltage sensor 17 [PS1 Voltage 1] reads 114 V > OK | 0 | Battery probe 0 [System Board CMOS Battery] is Presence Detected > OK | 0 | Amperage probe 0 [PS1 Current 1] reads 0.6 A > OK | 1 | Amperage probe 1 [System Board Pwr Consumption] reads 56 W > OK | 0 | Chassis intrusion 0 detection: Ok (Not Breached) > OK | 0 | SD Card 0 [vFlash] is Absent > ----------------------------------------------------------------------------- > Other messages 
> =============================================================================
>   STATE  |  MESSAGE TEXT
> ---------+-------------------------------------------------------------------
>       OK | ESM log health is Ok (less than 80% full)
>       OK | Chassis Service Tag is sane
>
> Again, as with the original case, exit code is 1 (WARNING).
>
> Is there any way around this? Should I be disabling global health checks?

Openmanage contains a bug that flips the reported warning level wrt.
certified disks. Any certified disks are reported as non-certified and
vice versa. The output above is expected when you remove the workaround
in the code.

> Here's a run to test that, and it works:
>
> onlight at monitor:~$ check_openmanage -H host -C secret -b pdisk=all
> OK - System: 'PowerEdge R720', SN: '#######', 16 GB ram (4 dimms), 1 logical drives, 2 physical drives

Here, the physical disks aren't checked at all, and the global check is
correctly disabled, so this is an expected result.

> Interestingly, when combining the blacklist with debug ("-d -b pdisk=all"), the
> exit code is 3 (UNKNOWN), but with debug off, it's 0 (OK).

Sounds like a bug, perhaps related to the one discussed earlier.

> So, I guess what I'm wondering is why we need to blacklist the physical disks
> (pdisk) instead of using --no-storage? Shouldn't --no-storage also cause
> globalstatus to be ignored?

Yes it should, I'll look into that, thanks for the report :)

Regarding the non-certified disks problem... There is a special
blacklisting keyword to suppress the message about non-certified disks:

  check_openmanage -b pdisk_cert=all

Please try this and see if it resolves your issue. Using blacklisting
should also disable the global health check.

Regards,
-- 
Trond H. Amundsen
Center for Information Technology Services, University of Oslo

From Stephan.Lorenz at medizin.uni-leipzig.de Wed Jun 19 10:57:38 2013
From: Stephan.Lorenz at medizin.uni-leipzig.de (Lorenz, Stephan)
Date: Wed, 19 Jun 2013 08:57:38 +0000
Subject: Nagios openmanage ERROR: XML transformation failed
Message-ID: <06D1875C585C904E81F8DB7FA23D46A627EB6857@S050003234.medizin.uni-leipzig.de>

Dear users,

since installing libxml2, libxml2-devel and curl, the Nagios installation on our Dell R720xd server reports XML errors.

Problem running 'omreport storage controller': Error! XML Transformation failed
Problem running 'omreport chassis memory': Error! XML Transformation failed
Problem running 'omreport chassis fans': Error! XML Transformation failed
Problem running 'omreport chassis pwrsupplies': Error! XML Transformation failed
Problem running 'omreport chassis temps': Error! XML Transformation failed
Problem running 'omreport chassis processors': Error! XML Transformation failed
Problem running 'omreport chassis volts': Error! XML Transformation failed
Problem running 'omreport chassis batteries': Error! XML Transformation failed
Problem running 'omreport chassis pwrmonitoring': Error! XML Transformation failed
Problem running 'omreport chassis intrusion': Error! XML Transformation failed
Problem running 'omreport chassis removableflashmedia': Error! XML Transformation failed
Chassis Service Tag is bogus: 'N/A'

I am using Nagios 3.5.1, check_openmanage 3.7.9, Openmanage 7.2.0 on Centos 6.4 2.6.32-358.11.1.el6.centos.plus.x86_64.

When I run check_openmanage or omreport manually everything is fine. I tried to reinstall nagios-plugins-openmanage and php-xml for a start, but that did not help. I cannot remove libxml2 and the rest since it is needed elsewhere.

Does anyone have a suggestion of how to fix this error?

Thanks in advance
Stephan

--
Universität Leipzig KöR
Medizinische Fakultät
Klinik und Poliklinik für Endokrinologie und Nephrologie
Zentrales Forschungszentrum R 1053
Liebigstr. 21
04103 Leipzig

Dr. rer. nat. Stephan Lorenz
Diplom-Biochemiker
wissenschaftlicher Mitarbeiter
Fon: +49 341 97 12512
Fax: +49 341 97 13299
eMail: stephan.lorenz at medizin.uni-leipzig.de
Web: www.medizin.uni-leipzig.de

Rektorin der Universität Professor Dr. Beate Schücking
Ritterstraße 26, 04109 Leipzig
Zuständige Aufsichtsbehörde: Sächsisches Staatsministerium für Wissenschaft und Kunst, Wigardstraße 17, 01097 Dresden, www.smwk.de
UStNr gemäß § 27 a Umsatzsteuergesetz: DE 141510383

From nic at onlight.com Wed Jun 19 14:55:05 2013
From: nic at onlight.com (Nic Bernstein)
Date: Wed, 19 Jun 2013 07:55:05 -0500
Subject: Problem with check_openmanage plugin and storage
In-Reply-To:
References:
Message-ID: <51C1AA29.3080704@onlight.com>

On 06/18/2013 07:55 PM, nagios-users-request at lists.sourceforge.net wrote:
> Date: Wed, 19 Jun 2013 02:41:02 +0200
> From: Trond Hasle Amundsen
> Subject: Re: [Nagios-users] Problem with check_openmanage plugin and storage
> To: Nagios Users List
> Message-ID: <15tk3lrrkyp.fsf at tux.uio.no>
> Content-Type: text/plain; charset=utf-8
>
> Nic Bernstein writes:
>> > We've recently been experimenting with Trond Hasle Amundsen's check_openmanage on a large network with about a hundred Dell servers of various ages, capabilities, etc. Mostly PE-2950, R210, R410 and R720. Much thanks to Trond for all his great work on Nagios plugins and other projects, by the way.
>> >
>> > We've hit a wall, however, with the storage monitoring aspects of this plugin.
>> >
>> > For example, here's a quite specific case.
This is a new PE R720, in debug:
>> >
>> > onlight at monitor:~$ check_openmanage -H host -C secret -d
>> >    System:      PowerEdge R720           OMSA version:    7.1.0
>> >    ServiceTag:  #######                  Plugin version:  3.7.9
>> >    BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
>> > -----------------------------------------------------------------------------
>> >    Storage Components
>> > =============================================================================
>> >   STATE  |    ID    |  MESSAGE TEXT
>> > ---------+----------+--------------------------------------------------------
>> >       OK |        0 | Controller 0 [PERC H310 Mini] is Ready
>> >  WARNING |  0:0:1:0 | Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
>> >  WARNING |  0:0:1:1 | Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online, Not Certified
>> >       OK |      0:0 | Logical Drive '/dev/sda' [RAID-1, 1862.50 GB] is Ready
>> >       OK |      0:0 | Connector 0 [SAS] on controller 0 is Ready
>> >       OK |      0:1 | Connector 1 [SAS] on controller 0 is Ready
>> >       OK |    0:0:1 | Enclosure 0:0:1 [Backplane] on controller 0 is Ready
>> [...]
>> > This run exits with 1 (WARNING).
>> >
>> > We're not sure we agree with the decision to make the fact that a disk is not Dell Certified a Warning, but we can at least understand that. So, what if we exclude storage, with --no-storage?
> The decision to create a warning for non-certified disks belongs to Dell. I've tried to let the plugin simply relay the warning level from Openmanage, unless it's outright wrong (such as reporting disks in predictive failure as OK).

Yes, we completely understand that, and the use of the global status flag. I should have been clearer that we get that it wasn't your choice.
>> > onlight at monitor:~$ check_openmanage -H host -C secret -d --no-storage
>> >    System:      PowerEdge R720           OMSA version:    7.1.0
>> >    ServiceTag:  #######                  Plugin version:  3.7.9
>> >    BIOS/date:   1.2.6 05/10/2012         Checking mode:   SNMPv2c UDP/IPv4
>> > -----------------------------------------------------------------------------
>> > [...]
>> > OOPS! Something is wrong with this server, but I don't know what. The global
>> > system health status is WARNING, but every component check is OK. This may
>> > be a bug in the Nagios plugin, please file a bug report.
>> >
>> > This yields exit code 3 (UNKNOWN).
> This is a bug. Using blacklisting or check manipulation (such as --no-storage) should disable the global health check.

Okay, that's what we'd expect.

>> > Now, just for argument's sake, let's say we obviate the check for certified drives, by commenting out the "workaround for OMSA 7.1.0 bug" code (just a handy little short-cut). Here's what we get then:
>> >
>> [...]
>> > Again, as with the original case, exit code is 1 (WARNING).
>> >
>> > Is there any way around this? Should I be disabling global health checks?
> Openmanage contains a bug that flips the reported warning level wrt. certified disks. Any certified disks are reported as non-certified and vice versa. The output above is expected when you remove the workaround in the code.
>
>> > Here's a run to test that, and it works:
>> >
>> > onlight at monitor:~$ check_openmanage -H host -C secret -b pdisk=all
>> > OK - System: 'PowerEdge R720', SN: '#######', 16 GB ram (4 dimms), 1 logical drives, 2 physical drives
> Here, the physical disks aren't checked at all, and the global check is correctly disabled, so this is an expected result.
>
>> > Interestingly, when combining the blacklist with debug ("-d -b pdisk=all"), the exit code is 3 (UNKNOWN), but with debug off, it's 0 (OK).
> Sounds like a bug, perhaps related to the one discussed earlier.
>
>> > So, I guess what I'm wondering is why we need to blacklist the physical disks (pdisk) instead of using --no-storage? Shouldn't --no-storage also cause globalstatus to be ignored?
> Yes it should, I'll look into that, thanks for the report :)

Great, thanks!

> Regarding the non-certified disks problem... There is a special blacklisting keyword to suppress the message about non-certified disks:
>
>   check_openmanage -b pdisk_cert=all
>
> Please try this and see if it resolves your issue. Using blacklisting should also disable the global health check.

Ah, that's just what we need. Much appreciated...

No, that doesn't seem to be in my version (3.7.9, downloaded yesterday)

    onlight at monitor:~$ perl check_openmanage -H host -C secret -b pdisk_cert=all
    Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
    Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
    onlight at monitor:~$ echo $?
    1

I guess I'll wait for a patch.

Say Trond, I sent you some notes last week about enhancements we made to your check_linux_bonding plugin. Would you prefer I re-post those to the list instead?

Thanks again!
-nic

--
Nic Bernstein                             nic at onlight.com
Onlight, Inc.                             www.onlight.com
219 N. Milwaukee St., Suite 2a            v. 414.272.4477
Milwaukee, Wisconsin  53202

From maestin at gmail.com Wed Jun 19 16:55:35 2013
From: maestin at gmail.com (martin Rodriguez)
Date: Wed, 19 Jun 2013 11:55:35 -0300
Subject: install WMI
Message-ID:

Hi

I have a problem with the WMI plugin. Someone knows how I can fix it

# /usr/local/nagios/libexec/./check_wmi_plus.pl -H 10.4.134.25 -m checkcpu -u xxx -p xxxx
Scalar found where operator expected at /wmi/check_wmi_plus.conf line 58, near " $usage_db_file"
(Missing semicolon on previous line?)
Configuration File Error with /wmi/check_wmi_plus.conf (mostly likely a syntax error) at /usr/local/nagios/libexec/./check_wmi_plus.pl line 259.

root at Monitor:~# /usr/local/nagios/libexec/./check_wmi_plus.pl -H 10.4.134.25 -m checkcpu -u administrador -p Pa$$w0rd
Scalar found where operator expected at /wmi/check_wmi_plus.conf line 58, near "$usage_db_file"
(Missing semicolon on previous line?)
Configuration File Error with /wmi/check_wmi_plus.conf (mostly likely a syntax error) at /usr/local/nagios/libexec/./check_wmi_plus.pl line 259.

From sunil at sunil.cc Wed Jun 19 18:25:14 2013
From: sunil at sunil.cc (Sunil Sankar)
Date: Wed, 19 Jun 2013 21:55:14 +0530
Subject: install WMI
In-Reply-To:
References:
Message-ID:

paste the check_wmi_plus.conf lets see where the error is

On Wed, Jun 19, 2013 at 8:25 PM, martin Rodriguez wrote:
> Hi I have a problem with the WMI plugin. Someone knows how I can fix it
>
> # /usr/local/nagios/libexec/./check_wmi_plus.pl -H 10.4.134.25 -m checkcpu -u xxx -p xxxx
> Scalar found where operator expected at /wmi/check_wmi_plus.conf line 58, near " $usage_db_file"
> (Missing semicolon on previous line?)
> Configuration File Error with /wmi/check_wmi_plus.conf (mostly likely a syntax error) at /usr/local/nagios/libexec/./check_wmi_plus.pl line 259.
> root at Monitor:~# /usr/local/nagios/libexec/./check_wmi_plus.pl -H 10.4.134.25 -m checkcpu -u administrador -p Pa$$w0rd
> Scalar found where operator expected at /wmi/check_wmi_plus.conf line 58, near "$usage_db_file"
> (Missing semicolon on previous line?)
> Configuration File Error with /wmi/check_wmi_plus.conf (mostly likely a syntax error) at /usr/local/nagios/libexec/./check_wmi_plus.pl line 259.

--
Regards
Sunil Sankar

From t.h.amundsen at usit.uio.no Wed Jun 19 21:04:13 2013
From: t.h.amundsen at usit.uio.no (Trond Hasle Amundsen)
Date: Wed, 19 Jun 2013 21:04:13 +0200
Subject: Problem with check_openmanage plugin and storage
In-Reply-To: <51C1AA29.3080704@onlight.com> (Nic Bernstein's message of "Wed, 19 Jun 2013 07:55:05 -0500")
References: <51C1AA29.3080704@onlight.com>
Message-ID: <15thagtrkgi.fsf@tux.uio.no>

Nic Bernstein writes:

> Regarding the non-certified disks problem... There is a special blacklisting keyword to suppress the message about non-certified disks:
>
>   check_openmanage -b pdisk_cert=all
>
> Please try this and see if it resolves your issue. Using blacklisting should also disable the global health check.
>
> Ah, that's just what we need. Much appreciated...
>
> No, that doesn't seem to be in my version (3.7.9, downloaded yesterday)
>
> onlight at monitor:~$ perl check_openmanage -H host -C secret -b pdisk_cert=all
> Physical Disk 0:1:0 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
> Physical Disk 0:1:1 [Ata ST2000DM001-9YN164, 2.0TB] on ctrl 0 is Online
> onlight at monitor:~$ echo $?
> 1
>
> I guess I'll wait for a patch.

Are you sure you didn't test this with the 7.1.0 workaround manually removed?

> Say Trond, I sent you some notes last week about enhancements we made to your check_linux_bonding plugin. Would you prefer I re-post those to the list instead?
Sorry for being non-responsive of late. I've been swamped at work lately and have attained somewhat of an email backlog. No need to resend :)

Regards,
--
Trond H. Amundsen
Center for Information Technology Services, University of Oslo

From t.h.amundsen at usit.uio.no Wed Jun 19 21:10:24 2013
From: t.h.amundsen at usit.uio.no (Trond Hasle Amundsen)
Date: Wed, 19 Jun 2013 21:10:24 +0200
Subject: Nagios openmanage ERROR: XML transformation failed
In-Reply-To: <06D1875C585C904E81F8DB7FA23D46A627EB6857@S050003234.medizin.uni-leipzig.de> (Stephan Lorenz's message of "Wed, 19 Jun 2013 08:57:38 +0000")
References: <06D1875C585C904E81F8DB7FA23D46A627EB6857@S050003234.medizin.uni-leipzig.de>
Message-ID: <15td2rhrk67.fsf@tux.uio.no>

"Lorenz, Stephan" writes:

> since installing libxml2, libxml2-devel and curl, the Nagios installation on our Dell R720xd server reports XML errors.
>
> Problem running 'omreport storage controller': Error! XML Transformation failed
> Problem running 'omreport chassis memory': Error! XML Transformation failed
> Problem running 'omreport chassis fans': Error! XML Transformation failed
> Problem running 'omreport chassis pwrsupplies': Error! XML Transformation failed
> Problem running 'omreport chassis temps': Error! XML Transformation failed
> Problem running 'omreport chassis processors': Error! XML Transformation failed
> Problem running 'omreport chassis volts': Error! XML Transformation failed
> Problem running 'omreport chassis batteries': Error! XML Transformation failed
> Problem running 'omreport chassis pwrmonitoring': Error! XML Transformation failed
> Problem running 'omreport chassis intrusion': Error! XML Transformation failed
> Problem running 'omreport chassis removableflashmedia': Error! XML Transformation failed
> Chassis Service Tag is bogus: 'N/A'
>
> I am using Nagios 3.5.1, check_openmanage 3.7.9, Openmanage 7.2.0 on Centos 6.4 2.6.32-358.11.1.el6.centos.plus.x86_64.
>
> When I run check_openmanage or omreport manually everything is fine. I tried to reinstall nagios-plugins-openmanage and php-xml for a start, but that did not help. I cannot remove libxml2 and the rest since it is needed elsewhere.
>
> Does anyone have a suggestion of how to fix this error?

Given that it works when you run the commands manually I'm suspecting some sort of permission issue. Try running the commands as the NRPE user, and also try running it from Nagios with SELinux in permissive mode (needs to be run by the NRPE daemon with the correct SELinux domain). Check out this link about using check_openmanage with SELinux in enforcing mode:

http://folk.uio.no/trondham/software/check_openmanage.html#selinux-considerations

Regards,
--
Trond H. Amundsen
Center for Information Technology Services, University of Oslo

From alexmiroslav at gmail.com Thu Jun 20 09:13:29 2013
From: alexmiroslav at gmail.com (Aleksandr Miroslav)
Date: Thu, 20 Jun 2013 03:13:29 -0400
Subject: nrpe errors when too many hosts
Message-ID:

hi,

We have nagios server that is working for long time.

We note recently when we add new hosts, all NRPE checks fail with:

Return code of 127 is out of bounds - plugin may be missing

However, the plugin is there and we know it run fine from command line.
Only when some new hosts added this happens. When you take new hosts out, errors go away.

We don't have too many hosts in our file, only about 250.

This problem is identical to the one find here:

http://readlist.com/lists/lists.sourceforge.net/nagios-users/5/29272.html

Alex

From nic at onlight.com Thu Jun 20 14:46:39 2013
From: nic at onlight.com (Nic Bernstein)
Date: Thu, 20 Jun 2013 07:46:39 -0500
Subject: Issues with check_openmanage documentation URLs
Message-ID: <51C2F9AF.8000609@onlight.com>

Friends,

We love the embedded URL option on the check_openmanage plugin, but it doesn't properly format them for models with trailing revision numbers, like the "PowerEdge R210 II". The URL produced is of this form:

http://support.dell.com/support/edocs/systems/per210%20ii/

when it should be:

http://www.dell.com/support/Manuals/us/en/555/Product/poweredge-r210-2

I would just work up a patch, but it's obvious that this is in an entirely different branch of the Dell website, so before starting, I'm curious just where the original documentation URL schema was derived from?

Cheers,
-nic

--
Nic Bernstein                             nic at onlight.com
Onlight, Inc.                             www.onlight.com
219 N. Milwaukee St., Suite 2a            v. 414.272.4477
Milwaukee, Wisconsin  53202

From phil.randal at hoopleltd.co.uk Thu Jun 20 17:50:52 2013
From: phil.randal at hoopleltd.co.uk (Randal, Phil)
Date: Thu, 20 Jun 2013 15:50:52 +0000
Subject: Issues with check_openmanage documentation URLs
In-Reply-To: <51C2F9AF.8000609@onlight.com>
References: <51C2F9AF.8000609@onlight.com>
Message-ID: <7CA580B59C1ABD45B4614ED90D4C7B85419B71C4@HC-EXMBX03.herefordshire.gov.uk>

Hmm, Dell have changed the urls yet again...

Seems to be no way to automagically determine the url from the product type - uk doc page for that server is:

http://www.dell.com/support/Manuals/uk/en/ukdhs1/Product/poweredge-r210-2

Thanks very much, Dell :-(

This will also affect check_esxi_hardware.py, which uses the same logic as check_openmanage to produce documentation urls.

Phil

-----Original Message-----
From: Nic Bernstein [mailto:nic at onlight.com]
Sent: 20 June 2013 13:47
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Issues with check_openmanage documentation URLs

Friends,

We love the embedded URL option on the check_openmanage plugin, but it doesn't properly format them for models with trailing revision numbers, like the "PowerEdge R210 II".
The URL produced is of this form:

http://support.dell.com/support/edocs/systems/per210%20ii/

when it should be:

http://www.dell.com/support/Manuals/us/en/555/Product/poweredge-r210-2

I would just work up a patch, but it's obvious that this is in an entirely different branch of the Dell website, so before starting, I'm curious just where the original documentation URL schema was derived from?

Cheers,
-nic

--
Nic Bernstein                             nic at onlight.com
Onlight, Inc.                             www.onlight.com
219 N. Milwaukee St., Suite 2a            v. 414.272.4477
Milwaukee, Wisconsin  53202

Hoople Ltd, Registered in England and Wales No. 7556595
Registered office: Plough Lane, Hereford, HR4 0LE

"Any opinion expressed in this e-mail or any attached files are those of the individual and not necessarily those of Hoople Ltd. You should be aware that Hoople Ltd. monitors its email service. This e-mail and any attached files are confidential and intended solely for the use of the addressee. This communication may contain material protected by law from being passed on. If you are not the intended recipient and have received this e-mail in error, you are advised that any use, dissemination, forwarding, printing or copying of this e-mail is strictly prohibited. If you have received this e-mail in error please contact the sender immediately and destroy all copies of it."

From bennycrampton at gmail.com Thu Jun 20 18:51:15 2013
From: bennycrampton at gmail.com (Benny Crampton)
Date: Thu, 20 Jun 2013 12:51:15 -0400
Subject: nrpe errors when too many hosts
In-Reply-To:
References:
Message-ID:

The error itself just means that Nagios can't find the plugin that you've told it to use. Every time I've ever encountered this error it boils down to something very simple; normally a typo or a mis-pasted file path.

When you added the hosts did you also add the service, or were you using a service that has already been defined and used previously? Too, did you make any changes to your check definitions at all?

On Thu, Jun 20, 2013 at 3:13 AM, Aleksandr Miroslav wrote:
> hi,
>
> We have nagios server that is working for long time.
>
> We note recently when we add new hosts, all NRPE checks fail with:
>
> Return code of 127 is out of bounds - plugin may be missing
>
> However, the plugin is there and we know it run fine from command line. Only when some new hosts added this happens. When you take new hosts out, errors go away.
>
> We don't have too many hosts in our file, only about 250.
>
> This problem is identical to the one find here:
>
> http://readlist.com/lists/lists.sourceforge.net/nagios-users/5/29272.html
>
> Alex

From william at leibzon.org Thu Jun 20 19:26:46 2013
From: william at leibzon.org (William Leibzon)
Date: Thu, 20 Jun 2013 10:26:46 -0700
Subject: nrpe errors when too many hosts
In-Reply-To:
References:
Message-ID:

Do you have servicegroups defined? How many services does it show in the largest service group?

On Thu, Jun 20, 2013 at 12:13 AM, Aleksandr Miroslav wrote:
> hi,
>
> We have nagios server that is working for long time.
>
> We note recently when we add new hosts, all NRPE checks fail with:
>
> Return code of 127 is out of bounds - plugin may be missing
>
> However, the plugin is there and we know it run fine from command line. Only when some new hosts added this happens. When you take new hosts out, errors go away.
>
> We don't have too many hosts in our file, only about 250.
>
> This problem is identical to the one find here:
>
> http://readlist.com/lists/lists.sourceforge.net/nagios-users/5/29272.html
>
> Alex

From psk at psk.net Fri Jun 21 14:32:30 2013
From: psk at psk.net (Percy Kwong)
Date: Fri, 21 Jun 2013 08:32:30 -0400
Subject: Nagios Plugin for IPTABLES Monitoring
In-Reply-To: <9bfb3d7160274037a5ec1838584ac8db@DBXPR03MB062.eurprd03.prod.outlook.com>
References: <7AD7AD85066563428C24C1317A33EF8350B31BB7@genoa.ucstaff.win.canberra.edu.au> <9bfb3d7160274037a5ec1838584ac8db@DBXPR03MB062.eurprd03.prod.outlook.com>
Message-ID: <51C447DE.8070000@psk.net>

adjust your awk statement. I bet you the output is shifted one field to the left or right.

Cheers.
-Percy

On 5/14/2013 4:43 AM, Deborah Martin wrote:
>
> Hi,
>
> What is the wrong output being returned ? This might give us all a clue as to the cause of the problem.
>
> When you run the check manually, are you doing this as the same user that check_nrpe will use ?
>
> Regards,
>
> Deborah
>
> *From:*Thilakraj.Shanmugam [mailto:Thilakraj.Shanmugam at canberra.edu.au]
> *Sent:* 14 May 2013 08:43
> *To:* nagios-users at lists.sourceforge.net
> *Subject:* [Nagios-users] Nagios Plugin for IPTABLES Monitoring
>
> Greetings!
>
> Could someone send me nagios plugin which is tested and works well for monitoring IPTABLES in Linux.
>
> I have tested below script but it is not returning correct output to nagios server.
>
> If I execute script manually, it shows correct output...
>
> But if I execute via ./check_nrpe -H localhost -c check_iptables, it shows wrong output.
>
> Below is my plugin
>
> ------------------------------
>
> #!/bin/bash
> set -x
> IPT='/sbin/iptables'
> GREP='/bin/grep'
> AWK='/bin/awk'
> EXPR='/usr/bin/expr'
> WC='/usr/bin/wc'
> A='/usr/bin/sudo'
> E_SUCCESS="0"
> E_CRITICAL="2"
> E_UNKNOWN="3"
> CHAINS=`$A $IPT -nvL | $GREP 'Chain' | $AWK '{ print $2 }'| $GREP Cid | $WC -l`
> if [ $CHAINS -ne 0 ] ; then
>     echo "Firewall is running!"
>     exit ${E_SUCCESS}
> elif [ $CHAINS -eq 0 ] ; then
>     echo "Firewall is not running"
>     exit ${E_CRITICAL}
> fi
>
> This e-mail and any files transmitted with it are strictly confidential and intended solely for the use of the individual or entity to whom they are addressed. If you are not the intended recipient, please delete this e-mail immediately. Any unauthorised distribution or copying is strictly prohibited.
>
> Whilst Kognitio endeavours to prevent the transmission of viruses via e-mail, we cannot guarantee that any e-mail or attachment is free from computer viruses and you are strongly advised to undertake your own anti-virus precautions. Kognitio grants no warranties regarding performance, use or quality of any e-mail or attachment and undertakes no liability for loss or damage, howsoever caused.

From bluethundr at gmail.com Sun Jun 23 19:51:04 2013
From: bluethundr at gmail.com (Tim Dunphy)
Date: Sun, 23 Jun 2013 13:51:04 -0400
Subject: bacula plugin incorrect output
Message-ID:

Hey all,

I've tried adding a plugin from the exchange for bacula backups. The output is not quite right and I think my definitions may be off. I'd like to try for some advice.
This is the output: Check Bacula Last Backup UNKNOWN06-23-2013 13:45:490d 0h 4m 8s3/3check_bacula_lastbackup.pl 1.0 Nagios Plugin My command definition is this: # A command to check bacula last backup define command{ command_name check_bacula_last_backup command_line $USER1$/check_bacula_lastbackup.pl -H HOSTADDRESS -client $ARG1$ } # Define a service to check last bacula backup on the local machine. define service{ use generic-service host_name nagios service_description Check Bacula Last Backup contact_groups linux-admins check_command check_bacula_last_backup! cloud.mydomain.com notifications_enabled 1 } If I run the check locally the output appears correct: [root at cloud:~] #/usr/local/nagios/libexec/check_bacula_lastbackup.pl-client cloud.mydomain.com OK: Last backup for cloud.mydomain.com was 10:43 hours ago.mydomain And these are the ownership/permissions on the script: [root at cloud:~] #ls -l /usr/local/nagios/libexec/check_bacula_lastbackup.pl -rwxrwxr-x 1 nagios nagios 4335 Jun 23 12:32 /usr/local/nagios/libexec/ check_bacula_lastbackup.pl It'd be great if I could get an opinion on this and much appreciated. Thanks Tim -- GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- ------------------------------------------------------------------------------ This SF.net email is sponsored by Windows: Build for Windows Store. http://p.sf.net/sfu/windows-dev2dev -------------- next part -------------- _______________________________________________ Nagios-users mailing list Nagios-users at lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/nagios-users ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue. 
From saurabh.85 at gmail.com Mon Jun 24 07:52:32 2013
From: saurabh.85 at gmail.com (Saurabh kumar)
Date: Mon, 24 Jun 2013 11:22:32 +0530
Subject: check_mysql_query error [nagios plugin]
Message-ID:

Hi all,

I am using the check_mysql_query plugin and facing different errors with different instances of usage. Following are the configurations and the error thrown by each one of them:

1.)
define command{
    command_name    check_slow_query
    command_line    $USER1$/check_mysql_query -H $HOSTADDRESS$ -P 3306 -d INFORMATION_SCHEMA -u username -p passwd -w 1 -c 3 -q 'SELECT COUNT(*) FROM processlist WHERE COMMAND = "Query" AND time > 300'
}

define service{
    host_name               host1
    service_description     Slow running queries
    contact_groups          alerts
    max_check_attempts      2
    check_period            24x7
    check_command           check_slow_query
}

[1372051002] INITIAL SERVICE STATE: slave-cia-db00;Slow running queries;CRITICAL;HARD;2;QUERY CRITICAL: Error with query - *Unknown column '300$' in 'where clause'*

2.)
define command{
    command_name    check_daily_revenue
    command_line    $USER1$/check_mysql_query -H $HOSTADDRESS$ -P 3306 -d db_name -u user -p passwd -c 10: -q "SELECT SUM(revenue) FROM table1 WHERE date = (SELECT MAX(date) FROM table1)"
}

define service{
    host_name               host1
    service_description     Todays total revenue from table1
    contact_groups          alerts
    max_check_attempts      2
    check_period            24x7
    check_command           check_daily_revenue
}

[1372051002] INITIAL SERVICE STATE: slave-cia-db00;Todays total revenue from table1;CRITICAL;HARD;2;QUERY CRITICAL: Error with query - You have an error in your SQL syntax: check the manual that corresponds to your MySQL server version for the right syntax to use near '$' at line 1

3.)
define command{
    command_name    count_sources
    command_line    $USER1$/check_mysql_query -H $HOSTADDRESS$ -P 3306 -d db_name -u user -p passwd -q 'SELECT COUNT(*) FROM sources'
}

define service{
    host_name               host1
    service_description     count sources
    contact_groups          alerts
    max_check_attempts      2
    check_period            24x7
    check_command           count_sources
}

[1372051002] INITIAL SERVICE STATE: slave-cia-db00;Slow running Crawl Jobs;CRITICAL;HARD;2;QUERY CRITICAL: Error with query - Table '*cmp_pricing.sources$*' *doesn't exist*

Note: The password above contains a $ at the end, e.g. 'passwd$'

Does anyone know what the problem could be?

-Saurabh

From Troy.Lea at strategicgroup.net.au Sun Jun 23 23:36:48 2013
From: Troy.Lea at strategicgroup.net.au (Troy Lea)
Date: Mon, 24 Jun 2013 07:36:48 +1000
Subject: Broken link on check_mrtg Manpage
Message-ID: <61C1A69FC98C924884D332BC7C6B517901D4BB5775@VAULT10.VAULT.local>

Not sure if this is the right place to report this.

On the following page: http://nagiosplugins.org/man/check_mrtg

Under the Notes: section -
MRTG stands for the Multi Router Traffic Grapher. It can be downloaded from http://ee-staff.ethz.ch/~oetiker/webtools/mrtg/mrtg.html

The link does not work, I get a 404 Not Found error.

Troy Lea
Senior VAULT Infrastructure Engineer
VCP3,VCP4,VTSP,VSP,MCP
P: 1300 857 048
F: 02 4962 5102
T: @builtontrust
troy.lea at strategicgroup.net.au

From Anders.DaSilva at softronic.se Tue Jun 25 11:04:55 2013
From: Anders.DaSilva at softronic.se (Anders.DaSilva at softronic.se)
Date: Tue, 25 Jun 2013 09:04:55 +0000
Subject: No "performance data" when increasing "normal_check_interval" time
In-Reply-To:
References:
Message-ID:

Hi.

The .rrd files have a "heartbeat" set when they are generated. Is it nagiosgraph that you are using? In that case I think the solution is to regenerate the .rrd files according to the new check_interval...

Regards
Anders da Silva

________________________________
From: Pankaj Sain [pankaj.sain at netprophetsglobal.com]
Sent: 7 May 2013 08:36
To: Nagios Users List
Subject: [Nagios-users] No "performance data" when increasing "normal_check_interval" time

Hi!,

When I increase the "normal_check_interval" value from the default (10 min.) to 15 min. or greater for any service, Nagios stops receiving performance data in the .rrd files. I also tried increasing the "perfdata_timeout" value, but it didn't work.

Nagios version: Nagios Core 3.5.0 or 3.2.3
plugin version: nagios-plugins-1.4.16
Nagiosgraph version: nagiosgraph-1.4.4
OS: Redhat 5.8 64bit

Service configuration:

define service{
    use                     generic-service
    host_name               Server-10
    service_description     Disk-Boot Partition Free Space
    check_command           check_nrpe!check_disk_boot_partition
    normal_check_interval   30
}

PS: Services with the default "normal_check_interval" value give perfdata continually.

Any help would be appreciated.

Thanks,
Pankaj Sain

From ck at claudiokuenzler.com Tue Jun 25 21:53:02 2013
From: ck at claudiokuenzler.com (Claudio Kuenzler)
Date: Tue, 25 Jun 2013 21:53:02 +0200
Subject: No "performance data" when increasing "normal_check_interval" time
In-Reply-To:
References:
Message-ID:

> Is it nagiosgraph that you are using? in that case i think the solution is
> to regenerate .rrd files according to the new check_interval...
>

... and increase the heartbeat value in nagiosgraph.conf
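
The heartbeat point made in this thread can be illustrated with a small sketch. In RRDtool, a data source's heartbeat is the maximum number of seconds allowed between two updates before the value is treated as unknown, so a check interval longer than the heartbeat leaves gaps in the graph. The function names, the 600-second heartbeat, and the factor of two below are assumptions for illustration only; they are not taken from nagiosgraph or from the posters' configuration.

```python
def updates_stay_known(check_interval_s: int, heartbeat_s: int) -> bool:
    """Return True if updates arriving every check_interval_s seconds
    fall within the RRD heartbeat (otherwise RRDtool stores UNKNOWN)."""
    return check_interval_s <= heartbeat_s


def suggested_heartbeat(check_interval_s: int, factor: float = 2.0) -> int:
    """Rule-of-thumb heartbeat: a multiple of the check interval.
    The factor of 2 is an assumption, not a nagiosgraph default."""
    return int(check_interval_s * factor)


if __name__ == "__main__":
    assumed_heartbeat = 600  # hypothetical value, in seconds
    for minutes in (10, 15, 30):
        interval = minutes * 60
        print(minutes, "min:",
              "known" if updates_stay_known(interval, assumed_heartbeat) else "unknown",
              "| suggested heartbeat:", suggested_heartbeat(interval))
```

Under these assumptions a 10-minute interval still records data while 15 and 30 minutes do not, which matches the symptom described above. Existing .rrd files keep the heartbeat they were created with, which is why regenerating them was suggested as well.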

From peterwood.sd at gmail.com Tue Jun 25 22:34:17 2013
From: peterwood.sd at gmail.com (Peter Wood)
Date: Tue, 25 Jun 2013 13:34:17 -0700
Subject: bacula plugin incorrect output
In-Reply-To:
References:
Message-ID:

Have you tried:

command_line $USER1$/check_bacula_lastbackup.pl -client $ARG1$

The way it is now you are running:

check_bacula_lastbackup.pl -H HOSTADDRESS -client cloud.mydomain.com

On Sun, Jun 23, 2013 at 10:51 AM, Tim Dunphy wrote:

> Hey all,
>
> I've tried adding a plugin from the exchange for bacula backups. The
> output is not quite right and I think my definitions may be off. I'd like
> to try for some advice.
>
> This is the output:
>
> Check Bacula Last Backup  UNKNOWN  06-23-2013 13:45:49  0d 0h 4m 8s  3/3  check_bacula_lastbackup.pl 1.0 Nagios Plugin
>
> My command definition is this:
>
> # A command to check bacula last backup
> define command{
>     command_name    check_bacula_last_backup
>     command_line    $USER1$/check_bacula_lastbackup.pl -H HOSTADDRESS -client $ARG1$
> }
>
> # Define a service to check last bacula backup on the local machine.
> define service{
>     use                     generic-service
>     host_name               nagios
>     service_description     Check Bacula Last Backup
>     contact_groups          linux-admins
>     check_command           check_bacula_last_backup!cloud.mydomain.com
>     notifications_enabled   1
> }
>
>
> If I run the check locally the output appears correct:
>
> [root at cloud:~] #/usr/local/nagios/libexec/check_bacula_lastbackup.pl -client cloud.mydomain.com
>
> OK: Last backup for cloud.mydomain.com was 10:43 hours ago.mydomain
>
>
> And these are the ownership/permissions on the script:
>
>
> [root at cloud:~] #ls -l /usr/local/nagios/libexec/check_bacula_lastbackup.pl
>
> -rwxrwxr-x 1 nagios nagios 4335 Jun 23 12:32 /usr/local/nagios/libexec/check_bacula_lastbackup.pl
>
>
> It'd be great if I could get an opinion on this and much appreciated.
>
>
> Thanks
>
> Tim
>
> --
> GPG me!!
>
> gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B

From uce_mike at yahoo.com Wed Jun 26 17:20:13 2013
From: uce_mike at yahoo.com (Mike W)
Date: Wed, 26 Jun 2013 10:20:13 -0500
Subject: Host dependency configuration
Message-ID: <51CB06AD.6010906@yahoo.com>

I am attempting to configure two hosts to be dependent on each other, so that if one is down/unreachable an alert is NOT sent unless both are down/unreachable. I tested the following configuration during the proper time period and it did not seem to function. Am I missing something?

define hostdependency{
    host_name                       lab1.hostname.com
    dependent_host_name             lab2.hostname.com
    notification_failure_criteria   d,u
    dependency_period               soft_off_work_hours
}

define hostdependency{
    host_name                       lab2.hostname.com
    dependent_host_name             lab1.hostname.com
    notification_failure_criteria   d,u
    dependency_period               soft_off_work_hours
}

--
Mike Wilson

From mh+nagios-users at zugschlus.de Wed Jun 26 21:22:36 2013
From: mh+nagios-users at zugschlus.de (Marc Haber)
Date: Wed, 26 Jun 2013 21:22:36 +0200
Subject: check_ntp_peer parsing error
Message-ID: <20130626192236.GA1189@torres.zugschlus.de>

Hi,

I have a system running Debian oldstable with Nagios-Plugins 1.4.15. A few weeks ago, my check_ntp_peer checks have started acting up:

$ /usr/lib/nagios/plugins/check_ntp_peer --hostname=2001:1b18:f:4::2 --warning 3 --critical 5 --jwarn 10 --jcrit 20 --twarn 2: --tcrit 3: --swarn 2 --scrit 2 -v
3 candidate peers available
synchronization source found
Getting offset, jitter and stratum for peer e20a
parsing offset from peer e20a: error: unable to read server offset response.
parsing jitter from peer e20a: error: unable to read server jitter/dispersion response.
parsing stratum from peer e20a: error: unable to read server stratum response.
NTP CRITICAL: Offset unknown, jitter=-1,000000, stratum=-1, truechimers=6| jitter=-1,000000;10,000000;20,000000;0,000000 stratum=-1;2;2;0;16 truechimers=6;0;0;0

The server itself is reachable and gives plausible answers:

$ ntpq -c pe 2001:1b18:f:4::2
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+ptbtime1.ptb.de .PTB.            1 u   17   64  177   26.891    1.663   1.340
+ptbtime2.ptb.de .PTB.            1 u   14   64  177   27.015   -0.286   1.271
-ns1.customer-re 192.53.103.104   2 u   18   64  177    8.054    3.469   0.835
-130.149.220.2   130.133.1.10     2 u   20   64  177   21.243   -0.139   1.154
*ntp0.rrze.ipv6. .GPS.            1 u   19   64  177   21.170   -5.249   1.345
-stratum2-2.NTP. 129.70.130.70    2 u   15   64  177   21.216   -2.147   1.039
$ ntpq -c associations 2001:1b18:f:4::2
ind assid status  conf reach auth condition  last_event cnt
===========================================================
  1 57862  943a   yes   yes  none candidate    sys_peer  3
  2 57863  9424   yes   yes  none candidate   reachable  2
  3 57864  9324   yes   yes  none   outlyer   reachable  2
  4 57865  9324   yes   yes  none   outlyer   reachable  2
  5 57866  963a   yes   yes  none  sys.peer    sys_peer  3
  6 57867  9324   yes   yes  none   outlyer   reachable  2
$

This behavior does not happen with all of my check_ntp_peer checks. I have not yet found out under which circumstances this behavior happens.

For your reference, I have currently opened the ntp server on the IPv6 address listed above for in-depth queries from anywhere.

What is going wrong? Is this a bug with check_ntp_peer?

Greetings
Marc

--
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Email address in header
Mannheim, Germany  |  lose things."    Winona Ryder | Fon: *49 621 31958061
Nordisch by Nature |  How to make an American Quilt | Fax: *49 621 31958062

From rms at sibs.pt Thu Jun 27 10:23:03 2013
From: rms at sibs.pt (Rui Miguel Silva Seabra)
Date: Thu, 27 Jun 2013 09:23:03 +0100
Subject: NDO: Partitioning nagios_servicechecks
Message-ID: <1372321383.2637.145.camel@greymalkin>

Hi,

We'd like to keep nagios_servicechecks for 14 months of current time but, of course, it can get quite big (37GB just for the table with about 6 months of data).

So I tried partitioning it:

ALTER TABLE nagios_servicechecks
PARTITION BY RANGE(TO_DAYS(end_time)) (
    partition p201301 VALUES LESS THAN (TO_DAYS('2013-02-01 00:00:00')),
    partition p201302 VALUES LESS THAN (TO_DAYS('2013-03-01 00:00:00')),
    partition p201303 VALUES LESS THAN (TO_DAYS('2013-04-01 00:00:00')),
    partition p201304 VALUES LESS THAN (TO_DAYS('2013-05-01 00:00:00')),
    partition p201305 VALUES LESS THAN (TO_DAYS('2013-06-01 00:00:00')),
    partition p201306 VALUES LESS THAN (TO_DAYS('2013-07-01 00:00:00')),
    partition p201307 VALUES LESS THAN (TO_DAYS('2013-08-01 00:00:00')),
    partition p201308 VALUES LESS THAN (TO_DAYS('2013-09-01 00:00:00')),
    partition p201309 VALUES LESS THAN (TO_DAYS('2013-10-01 00:00:00')),
    partition p201310 VALUES LESS THAN (TO_DAYS('2013-11-01 00:00:00')),
    partition p201311 VALUES LESS THAN (TO_DAYS('2013-12-01 00:00:00')),
    partition p201312 VALUES LESS THAN (TO_DAYS('2014-01-01 00:00:00')),
    partition p201xxx VALUES LESS THAN maxvalue);

However, MySQL complains:

ERROR 1503 (HY000): A PRIMARY KEY must include all columns in the table's partitioning function

As a non-expert in MySQL, this makes little sense to me, since end_time is not a key!

CREATE TABLE `nagios_servicechecks` (
  `servicecheck_id` int(11) NOT NULL AUTO_INCREMENT,
  `instance_id` smallint(6) NOT NULL DEFAULT '0',
  `service_object_id` int(11) NOT NULL DEFAULT '0',
  `check_type` smallint(6) NOT NULL DEFAULT '0',
  `current_check_attempt` smallint(6) NOT NULL DEFAULT '0',
  `max_check_attempts` smallint(6) NOT NULL DEFAULT '0',
  `state` smallint(6) NOT NULL DEFAULT '0',
  `state_type` smallint(6) NOT NULL DEFAULT '0',
  `start_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `start_time_usec` int(11) NOT NULL DEFAULT '0',
  `end_time` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `end_time_usec` int(11) NOT NULL DEFAULT '0',
  `command_object_id` int(11) NOT NULL DEFAULT '0',
  `command_args` varchar(255) NOT NULL DEFAULT '',
  `command_line` varchar(255) NOT NULL DEFAULT '',
  `timeout` smallint(6) NOT NULL DEFAULT '0',
  `early_timeout` smallint(6) NOT NULL DEFAULT '0',
  `execution_time` double NOT NULL DEFAULT '0',
  `latency` double NOT NULL DEFAULT '0',
  `return_code` smallint(6) NOT NULL DEFAULT '0',
  `output` varchar(255) NOT NULL DEFAULT '',
  `long_output` text NOT NULL,
  `perfdata` text NOT NULL,
  PRIMARY KEY (`servicecheck_id`),
  KEY `instance_id` (`instance_id`),
  KEY `service_object_id` (`service_object_id`),
  KEY `start_time` (`start_time`)
) ENGINE=MyISAM AUTO_INCREMENT=245571748 DEFAULT CHARSET=latin1 COMMENT='Historical service checks'

And there is no added index:

mysql> SHOW INDEX FROM nagios_servicechecks;
+----------------------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table                | Non_unique | Key_name          | Seq_in_index | Column_name       | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
| nagios_servicechecks |          0 | PRIMARY           |            1 | servicecheck_id   | A         |   245571747 |     NULL | NULL   |      | BTREE      |         |
| nagios_servicechecks |          1 | instance_id       |            1 | instance_id       | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| nagios_servicechecks |          1 | service_object_id |            1 | service_object_id | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
| nagios_servicechecks |          1 | start_time        |            1 | start_time        | A         |        NULL |     NULL | NULL   |      | BTREE      |         |
+----------------------+------------+-------------------+--------------+-------------------+-----------+-------------+----------+--------+------+------------+---------+
4 rows in set (0.01 sec)

Has anyone tried this successfully and would like to share some hints?

Best regards,
Rui

From mckell at us.ibm.com Thu Jun 27 22:03:30 2013
From: mckell at us.ibm.com (Sean McKell)
Date: Thu, 27 Jun 2013 14:03:30 -0600
Subject: 0000461: reload appears to cause skip of remaining attempts
Message-ID:

Hi, just wanting to get this problem ticket on the radar of the developers to possibly get fixed in some 3.4.x or 3.5.x release:

0000461: reload appears to cause skip of remaining attempts

If there is a better way to bring this to attention, please let me know, thank you.
_____________________
Sean

From holger at cis.fu-berlin.de Sat Jun 29 16:01:55 2013
From: holger at cis.fu-berlin.de (Holger Weiß)
Date: Sat, 29 Jun 2013 16:01:55 +0200
Subject: check_ntp_peer parsing error
In-Reply-To: <20130626192236.GA1189@torres.zugschlus.de>
References: <20130626192236.GA1189@torres.zugschlus.de>
Message-ID: <20130629140155.GI3148@zedat.fu-berlin.de>

JFTR:

* Marc Haber [2013-06-26 21:22]:
> I have a system running Debian oldstable with Nagios-Plugins 1.4.15. A
> few weeks ago, my check_ntp_peer checks have started acting up:
>
> $ /usr/lib/nagios/plugins/check_ntp_peer --hostname=2001:1b18:f:4::2 --warning 3 --critical 5 --jwarn 10 --jcrit 20 --twarn 2: --tcrit 3: --swarn 2 --scrit 2 -v
> 3 candidate peers available
> synchronization source found
> Getting offset, jitter and stratum for peer e20a
> parsing offset from peer e20a: error: unable to read server offset response.
> parsing jitter from peer e20a: error: unable to read server jitter/dispersion response.
> parsing stratum from peer e20a: error: unable to read server stratum response.
> NTP CRITICAL: Offset unknown, jitter=-1,000000, stratum=-1, truechimers=6| jitter=-1,000000;10,000000;20,000000;0,000000 stratum=-1;2;2;0;16 truechimers=6;0;0;0

This is probably caused by a bug in Force10 switches mentioned here:

http://news.ntppool.org/2013/06/ipv6-monitoring-problems-for-g.html

Due to that bug, the check_ntp_peer requests got duplicated on their way to the server, and the server therefore sent multiple responses per request. check_ntp_peer then stumbled over those duplicated responses.
That's a bug, I'll fix it later today.

Thanks to Marc for providing tcpdump output and for his help with tracking the issue down.

Holger
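
For readers following the diagnosis above, here is a minimal sketch of one way a client could tolerate duplicated NTP control (mode 6) responses: drop any datagram whose header repeats one already seen. This is not the fix that went into check_ntp_peer, which is written in C and is not shown in this thread. The function names are hypothetical; only the 12-byte header layout (flags, opcode, sequence, status, association ID, offset, count) follows the NTP control message format.

```python
import struct
from typing import Iterable, List, Tuple

# 12-byte NTP control message header: LI/VN/Mode, R/E/M/Opcode,
# sequence, status, association ID, offset, count (network byte order).
HEADER = struct.Struct("!BBHHHHH")


def header_key(packet: bytes) -> Tuple[int, int, int, int]:
    """Identify a response fragment by sequence, association ID, opcode, offset."""
    _flags, rem_op, sequence, _status, assoc_id, offset, _count = HEADER.unpack_from(packet)
    opcode = rem_op & 0x1F  # low five bits carry the opcode
    return (sequence, assoc_id, opcode, offset)


def drop_duplicates(packets: Iterable[bytes]) -> List[bytes]:
    """Keep the first datagram for each header key, in arrival order."""
    seen = set()
    unique = []
    for packet in packets:
        if len(packet) < HEADER.size:
            continue  # too short to be a control message
        key = header_key(packet)
        if key in seen:
            continue  # duplicated response, e.g. caused by a duplicated request
        seen.add(key)
        unique.append(packet)
    return unique


if __name__ == "__main__":
    # Two hypothetical responses for association 0xe20a, the first one duplicated.
    first = HEADER.pack(0x16, 0x82, 1, 0, 0xE20A, 0, 4) + b"off="
    second = HEADER.pack(0x16, 0x82, 2, 0, 0xE20A, 0, 4) + b"jit="
    print(len(drop_duplicates([first, first, second])))
```

Keying on the sequence number is the important part: a client that simply reads "the next datagram" after each request can end up pairing a stale duplicate with a later request, which would be consistent with the parse errors reported in this thread.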