Windows disk health monitoring with smartmontools/NSClient++?

Eric Pearce epearce at amberpoint.com
Thu Jan 22 23:26:59 CET 2009


I've hacked something together that seems to work using WSH and WMI (no smartmontools).
It displays the following for the "Service State Information":

    Current Status: OK   (for 0d 0h 14m 52s) 
    Status Information:SMART Status is OK
    Performance Data:WDC WD1500HLFS-01G6U0 139 GB

In "nsc.ini" on the client, I've made the following changes:

    uncommented NRPEListener.dll
    added to [External Scripts]
    check_smart_disk0=cscript.exe //T:30 //NoLogo "C:\Program Files\NSClient++\scripts\smart.vbs" 0
    check_smart_disk1=cscript.exe //T:30 //NoLogo "C:\Program Files\NSClient++\scripts\smart.vbs" 1

The file "smart.vbs" contains:

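    ' Usage: cscript.exe //NoLogo smart.vbs <disk index>
    Set args = WScript.Arguments
    If args.Count < 1 Then
      WScript.Echo "Usage: smart.vbs <disk index>"
      WScript.Quit(3)   ' Nagios UNKNOWN
    End If
    drive = CInt(args(0))

    strComputer = "."
    Set objWMIService = GetObject("winmgmts:" _
        & "{impersonationLevel=impersonate}!\\" & strComputer _
        & "\root\cimv2")

    Set diskset = objWMIService.ExecQuery _
        ("Select * from Win32_DiskDrive")

    For Each disk In diskset
      If disk.Index = drive Then
        Select Case disk.Status
          Case "OK"
            ' Text after the "|" shows up as Performance Data in Nagios
            WScript.Echo "SMART Status is OK| " & disk.Model & " " & Int(disk.Size/1073741824) & " GB"
            WScript.Quit(0)   ' Nagios OK
          Case Else
            WScript.Echo "SMART Status is " & disk.Status
            WScript.Quit(1)   ' Nagios WARNING
        End Select
      End If
    Next

    ' No disk with the requested index was found
    WScript.Echo "SMART Status unknown: no disk with index " & drive
    WScript.Quit(3)   ' Nagios UNKNOWN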

There are actually a bunch of different error states, but I figure I want to know about anything other than "OK". I kept the level as "WARNING", as I don't know how useful it's going to be until I get more experience with the real-life disk error messages. I'm aware that disks sometimes die with no warning from SMART. I've never used Visual Basic before, so feel free to improve on this. I just cobbled together little snippets of code I found via Google.
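One possible refinement of the "anything other than OK is a WARNING" rule, sketched in Python rather than VBScript so the logic can be tried on its own. The status strings are the Win32_DiskDrive values listed elsewhere in this thread. The split into WARNING and CRITICAL is purely an assumption (not based on observed disk failures), and the function name is hypothetical.

```python
# Nagios plugin exit codes (standard plugin convention).
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Assumed severity split; adjust once real failures have been observed.
CRITICAL_STATUSES = {"Pred Fail", "Error", "NonRecover", "Lost Comm", "No Contact"}
UNKNOWN_STATUSES = {"Unknown", ""}


def classify_status(status):
    """Map a Win32_DiskDrive Status string to (exit_code, message)."""
    status = (status or "").strip()
    if status == "OK":
        return OK, "SMART Status is OK"
    if status in UNKNOWN_STATUSES:
        return UNKNOWN, "SMART Status is Unknown"
    if status in CRITICAL_STATUSES:
        return CRITICAL, "SMART Status is " + status
    # Degraded, Stressed, Starting, Stopping, Service, or anything unrecognised.
    return WARNING, "SMART Status is " + status


for s in ("OK", "Pred Fail", "Degraded"):
    print(s, "->", classify_status(s))
```

The same table could be expressed as extra Case branches in smart.vbs; the catch-all still lands on WARNING, so an unexpected status string is never silently treated as OK.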

On the Nagios server side, the service and hostgroup definitions look like the following:

define service{
        use                     generic-service
        hostgroup_name          check_smart_disk0
        service_description     SMART Disk 0
        check_command           check_nrpe!check_smart_disk0
        check_interval          720
        }
define service{
        use                     generic-service
        hostgroup_name          check_smart_disk1
        service_description     SMART Disk 1
        check_command           check_nrpe!check_smart_disk1
        check_interval          720
        }

define hostgroup{
        hostgroup_name  check_smart_disk0
        alias           Windows SMART Disk0 status
        members         host1, host2, host3
        }
define hostgroup{
        hostgroup_name  check_smart_disk1
        alias           Windows SMART Disk1 status
        members         host2
        }

I do have to know ahead of time how many disks to check on each client. It seems to be working so far.
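One way around having to know the number of disks would be a single check that loops over every drive and reports the worst result. Below is a rough sketch of just that aggregation step, in Python. The list of tuples stands in for whatever the Win32_DiskDrive query returns; the function name and the second sample disk are made up for illustration.

```python
def summarize_disks(disks):
    """Return (exit_code, message) for a list of (index, model, status) tuples.

    Same rule as smart.vbs: "OK" is Nagios OK (0), anything else is WARNING (1).
    An empty list is reported as UNKNOWN (3).
    """
    if not disks:
        return 3, "SMART Status unknown: no disks found"
    bad = [(index, status) for index, _model, status in disks if status != "OK"]
    if not bad:
        return 0, "SMART Status is OK on %d disk(s)" % len(disks)
    details = ", ".join("disk %d: %s" % (index, status) for index, status in bad)
    return 1, "SMART Status problem - " + details


# Hypothetical sample data; a real check would fill this from Win32_DiskDrive.
sample = [(0, "WDC WD1500HLFS-01G6U0", "OK"), (1, "Example Disk", "Pred Fail")]
code, message = summarize_disks(sample)
print(code, message)
```

With a check like this, one service and one hostgroup would cover every host, at the cost of no longer having a separate Nagios service per disk.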
-e
  ----- Original Message ----- 
  From: Eric Pearce 
  To: Anthony Montibello 
  Cc: nagios-users at lists.sourceforge.net 
  Sent: Thursday, January 15, 2009 3:14 PM
  Subject: Re: [Nagios-users] Windows disk health monitoring with smartmontools/NSClient++?


  Thanks for the tip -  I think I'm making some progress, i.e. 

    C:\Program Files\NSClient++>"nsclient++.exe" CheckWMI Select Status from Win32_DiskDrive
    \NSClient++.cpp(370) Attempting to start NSCLient++ - 0.3.5.2 2008-09-24
    l \NSClient++.cpp(476) NSCLient++ - 0.3.5.2 2008-09-24 Started!
    l \CheckWMI.cpp(306) |--------+
    l \CheckWMI.cpp(307) | Status |
    l \CheckWMI.cpp(308) |--------+
    l \CheckWMI.cpp(317) | OK |
    l \CheckWMI.cpp(319) |--------+
    l \NSClient++.cpp(530) Attempting to stop NSCLient++ - 0.3.5.2 2008-09-24
    l \NSClient++.cpp(589) NSCLient++ - 0.3.5.2 2008-09-24 Stopped succcessfully

  But I don't see how to turn this output into something useful for Nagios, i.e. "OK", "WARNING", "CRITICAL". It appears that the possible return values for "Status" are one of the following: OK, Error, Degraded, Unknown, Pred Fail, Starting, Stopping, Service, Stressed, NonRecover, No Contact or Lost Comm. I would be happy with "OK" resulting in a Nagios "OK" and anything else being a "WARNING", ideally "WARNING" followed by the "Status" output from WMI. Is there a way to do this using the NSClient "filter" and Max/Min syntax?

  Bonus question: What do you do if you have multiple drives? I don't see any obvious way to specify a drive to check.

  Thanks 

  -e 

    ----- Original Message ----- 
    From: Anthony Montibello 
    To: Eric Pearce 
    Cc: nagios-users at lists.sourceforge.net 
    Sent: Wednesday, January 14, 2009 8:58 PM
    Subject: Re: [Nagios-users] Windows disk health monitoring with smartmontools/NSClient++?


    Use WMI.
    The path to the SMART data:
    root/Cimv2/Win32_DiskDrive/
    [Instance] --> Status


    Hope this helps
    Tony (Author of NC_Net)

    On Tue, Jan 13, 2009 at 10:49 PM, Eric Pearce <epearce at amberpoint.com> wrote:

      I'd like to get SMART disk health status for Windows machines.  It looks like smartctl would work fine on Windows - has someone got it working with NSClient++?
      I've found some people asking about this in the list archives, but haven't found any concrete examples.
      All I'm looking for is a basic "OK" or "something bad is going to happen soon" alert from Nagios.
      Thanks
      -e



      ------------------------------------------------------------------------------
      This SF.net email is sponsored by:
      SourcForge Community
      SourceForge wants to tell your story.
      http://p.sf.net/sfu/sf-spreadtheword
      _______________________________________________
      Nagios-users mailing list
      Nagios-users at lists.sourceforge.net
      https://lists.sourceforge.net/lists/listinfo/nagios-users
      ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
      ::: Messages without supporting info will risk being sent to /dev/null

