Disk failures

Dan Stromberg strombrg at dcs.nac.uci.edu
Sat Feb 5 01:34:02 CET 2005


On Fri, 2005-02-04 at 16:11 -0800, Jason Martin wrote:
> On Fri, Feb 04, 2005 at 04:07:08PM -0800, Edward Smith wrote:
> > Is it possible to setup nagios to detect disk failures?  How
> > about getting the load on a cpu?  Would something like MRTG be
> > better for this?  Thanks.
> If there is a logfile, snmp mib, or command that can be accessed
> to determine that a disk has failed then yes. If it is via some
> command then you might have to write a special plugin for it.
> 
> CPU load is monitorable by check_load, however if you want
> graphs over time then MRTG would be a good adjunct.  
> 
> -Jason Martin
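
Jason's "special plugin" route might look like the following in one specific case: Linux software RAID. There, /proc/mdstat shows each array's member map in brackets, and an underscore marks a missing or failed disk (for example [U_] instead of [UU]). This is a minimal Python 3 sketch that assumes Linux md arrays. Hardware RAID controllers or SMART data would need their own command. The names check_mdstat and main are hypothetical, not part of any existing plugin.

```python
import re

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Matches the member map on an array's status line, e.g.
# "      1048512 blocks [2/2] [UU]"; an underscore marks a failed member.
MEMBER_MAP = re.compile(r'\[(\d+)/(\d+)\]\s+\[([U_]+)\]')


def check_mdstat(text):
    """Return (exit_code, message) for the contents of /proc/mdstat."""
    degraded = []
    current = None
    for line in text.splitlines():
        if line.startswith('md'):
            # Array header, e.g. "md0 : active raid1 sdb1[1] sda1[0]"
            current = line.split()[0]
            continue
        m = MEMBER_MAP.search(line)
        if m and '_' in m.group(3):
            degraded.append('%s %s' % (current, m.group(3)))
    if degraded:
        return CRITICAL, 'RAID degraded: ' + ', '.join(degraded)
    return OK, 'all md arrays healthy'


def main(path='/proc/mdstat'):
    """Plugin entry point: print one status line, return the exit code."""
    try:
        with open(path) as f:
            code, msg = check_mdstat(f.read())
    except OSError as e:
        code, msg = UNKNOWN, 'cannot read %s: %s' % (path, e)
    print(msg)
    return code

# A plugin script would finish with: sys.exit(main())
```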

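check_load takes per-interval threshold triplets (-w WLOAD1,WLOAD5,WLOAD15 and -c CLOAD1,CLOAD5,CLOAD15). Below is a rough Python sketch of that kind of comparison for the local host. It is not check_load's source, and it does not reproduce that plugin's exact output format. check_local_load and the threshold values in main are illustrative assumptions.

```python
import os

# Nagios plugin exit codes.
OK, WARNING, CRITICAL = 0, 1, 2


def check_local_load(loads, warn, crit):
    """Compare a (1, 5, 15)-minute load tuple against per-interval
    warning and critical triplets; any interval over its limit counts."""
    text = 'load average: %.2f, %.2f, %.2f' % tuple(loads)
    if any(load > c for load, c in zip(loads, crit)):
        return CRITICAL, 'CRITICAL - ' + text
    if any(load > w for load, w in zip(loads, warn)):
        return WARNING, 'WARNING - ' + text
    return OK, 'OK - ' + text


def main():
    # os.getloadavg() is Unix-only and reports the local host's load.
    # The thresholds here are placeholders, not recommendations.
    code, msg = check_local_load(os.getloadavg(),
                                 warn=(15.0, 10.0, 5.0),
                                 crit=(30.0, 25.0, 20.0))
    print(msg)
    return code
```

To watch remote hosts this way you still need something like rstatd, which is where the plugin below comes in.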
Another option for load checking is to enable rpc.rstatd on the monitored
host and use the following plugin, which queries it with rup:

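#!/usr/bin/env python3
# Nagios plugin: check a remote host's load average as reported by rup
# (which queries rpc.rstatd). Exit codes follow the Nagios convention:
# 0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN.

import re
import subprocess
import sys

if len(sys.argv) != 2:
    print('usage: %s hostname' % sys.argv[0])
    sys.exit(3)
host = sys.argv[1]

# maxtime caps rup's run time (10 seconds here); stderr is merged into
# stdout like "2>&1". Passing a list avoids shell quoting of the host.
try:
    out = subprocess.run(['/usr/bin/maxtime', '10', '/dcs/etc/rup', host],
                         stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                         universal_newlines=True).stdout
except OSError:
    out = ''
line = out.split('\n')[0]

# Example rup output line:
# meter.eng                up 154 days,  4:40,    load average: 2.73 4.21 4.25
# (commas between the three values are also accepted)
r = re.compile(r'^.*load average: ([0-9.]+),? ([0-9.]+),? ([0-9.]+).*$')
m = r.match(line)
if not m:
    print('service unavailable')
    sys.exit(2)
one_min = float(m.group(1))
five_min = float(m.group(2))
loads = ' '.join(m.groups())

if one_min > 16.0 or five_min > 12.0:
    print('load critical:', loads)
    sys.exit(2)
if one_min > 12.0 or five_min > 8.0:
    print('load warning:', loads)
    sys.exit(1)
print('load:', loads)
sys.exit(0)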
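
Moving the parsing and the thresholds into functions makes them testable against a saved rup line, without a live rpc.rstatd. The sketch below uses the sample line from the plugin's comment. parse_rup_line and classify are hypothetical names. Accepting comma-separated values is an assumption about other rup variants; the original output did not show commas.

```python
import re

# Optional commas also accept "load average: 0.10, 0.20, 0.30".
LOAD_RE = re.compile(r'load average: ([0-9.]+),? ([0-9.]+),? ([0-9.]+)')


def parse_rup_line(line):
    """Return the (1, 5, 15)-minute loads as floats, or None if absent."""
    m = LOAD_RE.search(line)
    if not m:
        return None
    return tuple(float(g) for g in m.groups())


def classify(one_min, five_min):
    """Map loads to a Nagios exit code using the plugin's thresholds."""
    if one_min > 16.0 or five_min > 12.0:
        return 2  # CRITICAL
    if one_min > 12.0 or five_min > 8.0:
        return 1  # WARNING
    return 0  # OK


sample = 'meter.eng                up 154 days,  4:40,    load average: 2.73 4.21 4.25'
loads = parse_rup_line(sample)
print(loads, classify(loads[0], loads[1]))  # prints (2.73, 4.21, 4.25) 0
```

Only the 1- and 5-minute values feed the decision, as in the plugin; the 15-minute value is reported but not checked.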


maxtime is available from:

http://dcs.nac.uci.edu/~strombrg/maxtime.html
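
maxtime appears to run a command under a time limit (10 seconds in the plugin). Where it is not installed, Python 3.5+ can do something similar with subprocess.run(..., timeout=...), which kills the child when the limit expires and raises TimeoutExpired. This is a substitute technique, not maxtime itself, and run_with_timeout is a hypothetical helper.

```python
import subprocess


def run_with_timeout(argv, seconds):
    """Run argv and return its combined stdout/stderr as text, or None
    if it cannot be started or does not finish within `seconds`."""
    try:
        proc = subprocess.run(argv, stdout=subprocess.PIPE,
                              stderr=subprocess.STDOUT,
                              universal_newlines=True, timeout=seconds)
    except (subprocess.TimeoutExpired, OSError):
        # On timeout, subprocess.run has already killed the child.
        return None
    return proc.stdout
```

For example, run_with_timeout(['/dcs/etc/rup', host], 10) returns None on a hung query, which a plugin could report as "service unavailable". Only the direct child is killed; any processes it spawned may survive.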



More information about the Users mailing list