SMART hard-disk monitoring

Derek Olsen derek.olsen at qsent.com
Thu Aug 31 21:27:07 CEST 2006


  Andy.
  If there is a problem, the output will look like this.  The 
notification will only include the devices that are in a down state.

***** Nagios *****

Notification Type: PROBLEM

Service: DiskDrives
Host: the.name.of.host
Address: the.name.of.host
State: CRITICAL

Date/Time: Thu Aug 17 09:55:02 PDT 2006

Documentation: https://where.the.docs.be

Additional Info:

DOWN=(/dev/sdg)




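  I believe this plugin can only detect when a drive is down and won't do 
much to predict when a failure is about to happen.  It classes a drive 
as down when SMART cannot be enabled on it, or when smartctl -H does not 
report "SMART Health Status: OK".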
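
  As a rough illustration of that criterion, here is a minimal sketch of 
the health test on the text that smartctl -H prints.  The helper name 
classify_health is hypothetical and is not part of the attached script.  
The attached script only matches the SCSI wording; the ATA wording is 
added here as an assumption for drives read with "-d ata".  The sample 
strings are illustrative, not captured output.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not in the attached script): classify the text
# printed by `smartctl -H <device>` as 'OK' or 'FAILED'.
sub classify_health {
    my ($output) = @_;

    # Wording smartctl uses for SCSI devices; this is the line the
    # attached script looks for.
    return 'OK' if $output =~ /SMART Health Status:\s*OK/;

    # Wording smartctl uses for ATA devices (assumption: the drive was
    # queried as ATA, e.g. with "-d ata").
    if ( $output =~ /self-assessment test result:\s*(\w+)/ ) {
        return $1 eq 'PASSED' ? 'OK' : 'FAILED';
    }

    # No recognisable health line: treat the drive as down, which is
    # what the attached script does for any non-matching output.
    return 'FAILED';
}

# Illustrative sample strings only.
print classify_health("SMART Health Status: OK\n"), "\n";
print classify_health(
    "SMART overall-health self-assessment test result: PASSED\n"), "\n";
print classify_health(
    "SMART overall-health self-assessment test result: FAILED!\n"), "\n";
```

  Either way this is only a pass/fail test on the drive's own health 
verdict; it does not look at individual SMART attributes or thresholds.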

  Hope this helps.
  Deet.
 
> Hi Deet,
>
> Thanks very much for this script, had to do a minor touch of hacking, 
> but it also proves your script will work on SATA drives as well (at 
> least those SATA drives that Linux emulates as SCSI.)
>
> All I've touched is:
>    my $scsi_disks = `/usr/bin/sudo /sbin/sfdisk -s |/bin/grep -i sd[a-z] |/bin/cut -f1 -d:`;
>
> /usr/bin/grep and /usr/bin/cut are in /bin/grep and /bin/cut on my 
> system (Fedora 5.)
>
>    $val = `/usr/bin/sudo /usr/sbin/smartctl -d ata -s on $drive &> /dev/null || /bin/echo MISSING`;
>
> In the above line I had to add the "-d ata" argument to smartctl to 
> read the SATA drives as ATA drives, not SCSIs.
>
> The script outputs "UP=(/dev/sda /dev/sdb)".
>
> Can I just ask what the criteria are for the script to class a drive as 
> failed/failing according to SMART?
>
> Many thanks again for sharing, it's extremely helpful!
>
> Regards
>
> Andy.
>
> PS.  I couldn't reply to the list as I've got a problem with my DNS 
> server, and Sourceforge's server is bouncing any mail I send :(  If 
> you could post what I've done to get SATA drives working, it may come 
> in handy for somebody too.
>
> ---
>
> Derek Olsen wrote:
>>
>>  Andy.
>> I've attached the check_smart we use.  I think it's a barely modified 
>> version of the one that comes with the nagios plugins.  In the 
>> script we use the output of /sbin/sfdisk -s to find out which scsi 
>> disks are on the local box because we ran into problems using the 
>> output of scsiinfo.  So our sudoers file is configured to allow the 
>> nagios user to run /sbin/sfdisk -s and /usr/sbin/smartctl.
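>>
>> As a rough sketch only, a sudoers entry for this could look something 
>> like the line below.  The user name "nagios" and the NOPASSWD tag are 
>> assumptions (the check runs non-interactively, so it cannot answer a 
>> password prompt); adjust them to your own setup and edit the file 
>> with visudo.
>>
>>    nagios ALL=(root) NOPASSWD: /sbin/sfdisk -s, /usr/sbin/smartctl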
>>
>>  This works for us.  Hope it helps.
>>   Deet.
>>> Has anyone got a check plugin working for monitoring SMART hard disk 
>>> status thresholds?
>>>
>>> The only one I found on nagiosexchange (check_smartmon) needs to be 
>>> run as root to get permission to read the drive stats, and also 
>>> doesn't work - it causes the below Python trace-back:
>>>
>>> Traceback (most recent call last):
>>>   File "./check_smartmon", line 254, in ?
>>>     (healthStatus, temperature) = parseOutput(healthStatusOutput, 
>>> temperatureOutput)
>>>   File "./check_smartmon", line 163, in parseOutput
>>>     healthStatus = parts[-1]
>>> IndexError: list index out of range
>>>
>>>
>>> I've just run smartctl and it appears you do need to be root, so if 
>>> I can find a working plugin I can just sudo the nagios user.
>>>
>>> Any ideas?
>>>
>>> Thanks
>>>
>>> Andy.
>>>
>>> ------------------------------------------------------------------------- 
>>>
>>> Using Tomcat but need to do more? Need to support web services, 
>>> security?
>>> Get stuff done quickly with pre-integrated technology to make your 
>>> job easier
>>> Download IBM WebSphere Application Server v.1.0.1 based on Apache 
>>> Geronimo
>>> http://sel.as-us.falkag.net/sel?cmd=lnk&kid=120709&bid=263057&dat=121642 
>>>
>>> _______________________________________________
>>> Nagios-users mailing list
>>> Nagios-users at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/nagios-users
>>> ::: Please include Nagios version, plugin version (-v) and OS when 
>>> reporting any issue. ::: Messages without supporting info will risk 
>>> being sent to /dev/null
>>>   
>>
>>
>>
>> ------------------------------------------------------------------------
>>
>> #!/usr/bin/perl -w
>>
>> #
>> # This script checks the hard drives on a system for S.M.A.R.T. health
>> # indicators.  Only supports SCSI right now.
>> #
>> #
>> use strict;
>>
>> my $debug = 0;
>> my @disk_up;
>> my @disk_down;
>> my @disks;
>> my $scsi_disks = `/usr/bin/sudo /sbin/sfdisk -s |/usr/bin/grep -i sd[a-z] |/usr/bin/cut -f1 -d:`;
>>
>> push @disks, split(' ', $scsi_disks);
>>
>> unless ( scalar @disks ) {
>>     print "0 No disks to monitor\n";
>>     exit 0;
>> }
>>
>> print "Monitoring: @disks\n" if $debug;
>>
>> for ( @disks ) {
>>   my $drive = $_;
>>   if($drive =~ /\/dev\/sd/) {
>>     my $val;
>>
>>     $val = `/usr/bin/sudo /usr/sbin/smartctl -s on $drive &> /dev/null || /bin/echo MISSING`;
>>     if ( $val eq "MISSING\n" ) {
>>         push @disk_down, $drive;
>>         next;
>>     }
>>
>>     $val = `/usr/bin/sudo /usr/sbin/smartctl -H $drive`;
>>     if ( $val =~ /SMART Health Status\: OK/g ) {
>>         print "$_ is OK\n" if $debug;
>>         push @disk_up, $drive;
>>     } else {
>>         print "$_ is BAD\n" if $debug;
>>         push @disk_down, $drive;
>>     }
>>   }
>> }
>>
>> # Any failed or missing drive makes the check CRITICAL (exit code 2).
>> if ( scalar @disk_down ) {
>>     print "DOWN=(@disk_down)\n";
>>     exit 2;
>> }
>> # Past this point @disk_down is empty, so only the UP list is printed.
>> print "UP=(@disk_up) " if ( scalar @disk_up );
>> print "\n";
>>
>> exit 0;
>>
>>
>>   
>

