Additional states in Nagios

Parish, Brent bparish at cognex.com
Tue Jun 29 14:42:34 CEST 2010

Kevin gave a GREAT answer - succinct and yet informative.
It sounds like he answered the first part of the question - clearing up
the ambiguity of the states.

I interpreted the second part (the dream) as the desire to have Nagios
differentiate between informational messages and things that perhaps
require action (alarms).

I don't think there is any single 'catch all' solution to this; I
suppose it really depends on your environment, admin team, etc.  For
example, in your company, ALL alerts from S.M.A.R.T. disks might deserve
immediate attention.  In my world, we take the ostrich approach to those
(just kidding, and don't flay me for perpetuating the myth of ostrich
heads and sand).

I personally use a combination of things to tune the alerts.  For
example, with printers alerting on low toner, I set the frequency of
alerts to once every 24 hours, so as not to flood people with
non-critical messages.  For disk alerts that come in as 'unknown' state, I
have set the retry time high to avoid extra alarms getting sent just
because network latency is high (thus returning the unknown state).  I
have also modified the plugins to strip out messages/states that are (to
us here) strictly informational and not worth alarming on.
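
One way to strip informational results without editing each plugin is a small wrapper applied to the plugin's exit code and output. The following is only a sketch: the pattern list and the name filter_result are made up for illustration, and it assumes that only WARNING results are ever downgraded.

```python
import re

# Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3

# Hypothetical patterns for output that is "strictly informational" at one
# site; every environment would maintain its own list.
INFORMATIONAL_PATTERNS = [
    re.compile(r"firmware update available", re.IGNORECASE),
    re.compile(r"maintenance kit", re.IGNORECASE),
]


def filter_result(exit_code, output):
    """Downgrade a WARNING to OK when its output is purely informational.

    CRITICAL and UNKNOWN results pass through untouched (an assumption of
    this sketch), so a real problem is never hidden by a pattern match.
    """
    if exit_code == WARNING and any(p.search(output) for p in INFORMATIONAL_PATTERNS):
        return OK, "OK (informational) - " + output
    return exit_code, output
```

The original text is kept in the output, so the information still shows up in the web interface even though no alarm is raised.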

And for any alert that comes in, you can always just 'acknowledge' it
through the CGIs to hush it if it is strictly informational - it will
alarm again (depending on your setup) if/when it changes state again
(for better or worse).

Lastly, though it is a TON of work, you can rebuild the entire alerting
process.  I store user preferences in a MySQL database and let the
individual admins change those through a CGI.  Then I send ALL Nagios
alerts through that processor, which matches up the alert, time of day,
host, service, etc. against the user prefs to decide who gets alerted and
how.  When you do something like that, you can then define alternative
methods of alerting.  For example, I get alerted on disks at warning
level during business
hours, but not until critical level off hours.  In addition, I have the
alerts just going to email during business hours, but I also send via
instant messenger and to a home email address in off hours.  You could
use the same intelligence to split out what are strictly informational
messages vs. what are real alerts.
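
To make the idea concrete, here is a minimal sketch of the matching step. It is not the actual processor described above: the preference rows, field names, channel names, business hours and the severity ranking (including where UNKNOWN sits) are all assumptions, and an in-memory list stands in for the MySQL table.

```python
from datetime import time

# Assumed ranking; placing UNKNOWN alongside WARNING is a choice made for
# this sketch, not something Nagios defines.
SEVERITY = {"OK": 0, "WARNING": 1, "UNKNOWN": 1, "CRITICAL": 2}

# Hypothetical preference rows, as they might be loaded from a database.
PREFS = [
    {"user": "admin1", "service": "disk", "start": time(8, 0), "end": time(18, 0),
     "min_state": "WARNING", "channels": ["email"]},
    {"user": "admin1", "service": "disk", "start": time(18, 0), "end": time(8, 0),
     "min_state": "CRITICAL", "channels": ["email", "im", "home_email"]},
]


def in_window(now, start, end):
    """True if now falls in [start, end); handles windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def route(alert, prefs, now):
    """Return (user, channel) pairs for every preference row the alert matches."""
    deliveries = []
    for pref in prefs:
        if pref["service"] != alert["service"]:
            continue
        if not in_window(now, pref["start"], pref["end"]):
            continue
        if SEVERITY[alert["state"]] < SEVERITY[pref["min_state"]]:
            continue
        deliveries.extend((pref["user"], channel) for channel in pref["channels"])
    return deliveries
```

With these rows, a disk WARNING at 10:00 goes to email only, the same WARNING at 23:00 goes nowhere, and a disk CRITICAL at 23:00 goes to email, instant messenger and the home address.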

Oops, I said lastly, didn't I?  Another thought: maybe you could send
all alerts to an Exchange (group) mailbox, and use Exchange rules to
separate the informational messages from the real alerts and forward
only the real alerts to individuals.

Just my 2 cents.
- Brent



-----Original Message-----
From: Kevin Keane [mailto:subscription at kkeane.com] 
Sent: Monday, June 28, 2010 11:06 PM
To: Nagios Users List
Subject: Re: [Nagios-users] Additional states in Nagios

Actually, there are four states reported by plugins: OK, WARNING,
CRITICAL and UNKNOWN. Services will have the same four states.
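
A plugin reports its state through its exit code: 0 for OK, 1 for WARNING, 2 for CRITICAL and 3 for UNKNOWN. As a rough illustration, a disk-usage check might choose its state as below. The thresholds, the output wording and the function names are invented for this sketch.

```python
# Exit codes a Nagios plugin uses to report its state.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATE_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def classify(percent_used, warn=80.0, crit=90.0):
    """Map a disk-usage percentage to a plugin state.

    None means the measurement could not be taken, which is what UNKNOWN
    is for. The default thresholds are illustrative assumptions.
    """
    if percent_used is None:
        return UNKNOWN
    if percent_used >= crit:
        return CRITICAL
    if percent_used >= warn:
        return WARNING
    return OK


def plugin_result(percent_used):
    """Return (exit_code, status_line) the way a plugin would report them."""
    state = classify(percent_used)
    detail = "usage unavailable" if percent_used is None else f"{percent_used:.1f}% used"
    return state, f"DISK {STATE_NAMES[state]} - {detail}"

# A real plugin would print the status line and then call sys.exit(exit_code).
```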

There are also three states that hosts can have: UP, DOWN, UNREACHABLE.
Whether a host is UP, DOWN or UNREACHABLE depends on the state reported
by the plugin, as well as the state of its parents.
http://nagios.sourceforge.net/docs/3_0/hostchecks.html

HARD and SOFT states are separate from all of that. You can have a soft
warning or a hard warning, and a soft critical or a hard critical.
http://nagios.sourceforge.net/docs/3_0/statetypes.html
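
As a simplified model of how the state type relates to retries (this is not Nagios source code, and the class name is made up): a problem stays SOFT while it is being re-checked and becomes HARD once it has been seen max_check_attempts times in a row. The sketch ignores hosts, flapping and everything else the real logic takes into account.

```python
from dataclasses import dataclass


@dataclass
class ServiceTracker:
    """Toy model of SOFT/HARD state types for one service."""
    max_check_attempts: int = 3
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1

    def record(self, result):
        """Feed one plugin result (OK, WARNING, CRITICAL or UNKNOWN)."""
        if result == "OK":
            # Recovering from a SOFT problem is a soft recovery; any other
            # OK result is treated as HARD.
            soft_recovery = self.state != "OK" and self.state_type == "SOFT"
            self.state_type = "SOFT" if soft_recovery else "HARD"
            self.attempt = 1
        elif self.state == "OK":
            # First failure after OK: SOFT unless no retries are configured.
            self.attempt = 1
            self.state_type = "HARD" if self.max_check_attempts <= 1 else "SOFT"
        elif self.state_type == "SOFT":
            # Still retrying: count the attempt, go HARD at the limit.
            self.attempt += 1
            if self.attempt >= self.max_check_attempts:
                self.state_type = "HARD"
        # A problem that is already HARD stays HARD.
        self.state = result
        return self.state, self.state_type
```

With max_check_attempts set to 3, three CRITICAL results in a row give SOFT, SOFT, then HARD, so notifications tied to HARD states are not triggered by a single failed check.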

OK, WARNING, CRITICAL and UNKNOWN are the actual state of whatever you
are monitoring. The plugins decide which state it is. HARD, SOFT, as
well as UP or DOWN, are computed by Nagios based on the status reported
by the plugins. Exactly how Nagios does that is configurable.

-----Original Message-----
From: Jason W. [mailto:jwellband at gmail.com] 
Sent: Monday, June 28, 2010 7:18 PM
To: nagios-users at lists.sourceforge.net
Subject: [Nagios-users] Additional states in Nagios

(I've tried Googling for the answer, but there seems to be some
ambiguity in defining terms - even in the Nagios docs)

I've got Nagios monitoring a bunch of things on our servers and I also
have events being sent to Nagios via passive checks. This is all useful
information to us as sysadmins, but there is a difference in
criticality, e.g. is it down, is it about to go down, or is it purely
informational?

The latter is what I am writing about. Currently, there are two "states"
we use - WARNING and CRITICAL. This is the ambiguous part since the docs
refer to states as HARD or SOFT, but the plugin API docs refer to
WARNING and CRITICAL as states. I realize there is also UNKNOWN, but
with non-technical people occasionally looking at our Nagios, that may
lead them astray...

Is there a way to get more states, e.g. INFORMATION?  This would allow
one to sort by state in the web interface. Currently, we use WARNING for
most informational messages, so there is a mashup of "Service X is about
to die" and "Server Y did something you may want to know about".

I am guessing not without hacking the source, but I can dream ;)

Thoughts & comments appreciated - even if it's to say I'm Doing it
Wrong.

--
HTH, YMMV, HANW :)

Jason

The path to enlightenment is /usr/bin/enlightenment.

------------------------------------------------------------------------------
This SF.net email is sponsored by Sprint
What will you do first with EVO, the first 4G phone?
Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
_______________________________________________
Nagios-users mailing list
Nagios-users at lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nagios-users
::: Please include Nagios version, plugin version (-v) and OS when
reporting any issue. 
::: Messages without supporting info will risk being sent to /dev/null
