Passive freshness checks -> active checks

Jim Avery jim at jimavery.me.uk
Mon Aug 9 17:16:29 CEST 2010


On 6 August 2010 17:02, Charlie Reddington <charlie.reddington at gmail.com> wrote:
> Hi All,
>
> I'm having a bit of a problem with my nagios setup. I'm trying to move
> toward passive checks, with failover being an active check. For now, my
> failover check command is just a one liner that returns critical with
> a message.
>
> It's looking like the active check is being run, even when I see
> the corresponding passive check coming in. I suspect it may be in my
> configs somewhere, but I'm not sure what is wrong yet.
>
> The big kicker is that it's not all of my checks, only some of
> them. They all have different freshness thresholds, but that doesn't
> seem to be the common factor. Their configs are the same, but in a
> different order, and that doesn't seem like the problem either, as
> it's affecting some of one and not of the other.
>
> Any thoughts of what I may be doing wrong?
>
> Charlie
>
> ---


I can't see any problem with the config below.  If you have dozens of
checks set up this way and they are all set up in crontab to run at
*/15 then you will get a storm of checks at each 15-minute interval.
I normally make sure I stagger the checks in cron so that they are
reasonably evenly spaced.  If you have thousands it might also be
worth introducing a small random sleep to spread them out even more.
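
As a rough sketch of the spreading idea above: the helper below derives a
stable per-host delay from the host name, as a deterministic stand-in for
a random sleep. The function name stagger_delay and the 60-second window
are assumptions made for illustration, not part of nsca_wrapper or Nagios.

```shell
#!/bin/sh
# Hypothetical helper: turn a name into a delay between 0 and window-1
# seconds, so hosts sharing the same cron schedule submit at different times.
# Usage: stagger_delay <name> <window_seconds>
stagger_delay() {
    name=$1
    window=$2
    # cksum prints "<crc> <byte count>"; keep only the CRC.
    crc=$(printf '%s' "$name" | cksum | cut -d' ' -f1)
    echo $(( crc % window ))
}

# Intended use at the top of a wrapper script run from cron (not run here):
#   sleep "$(stagger_delay "$(hostname)" 60)"
```

Because the delay depends only on the name, a given host always lands in
the same slot, which keeps the interval between its results steady.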

I've not had any problems with it myself, but if you have a very busy
system, you might need to check that the command buffers aren't
filling (run /usr/local/nagios/bin/nagiostats to list the current
Nagios statistics).
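
A minimal sketch of turning those statistics into a single number,
assuming the MRTG-style output mode and the USEDCMDBUF and TOTCMDBUF
variable names are available in your build (check nagiostats --help to
confirm). The helper name cmdbuf_pct is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print external command buffer usage as a whole
# percentage, given the used and total slot counts.
cmdbuf_pct() {
    used=$1
    total=$2
    # Avoid dividing by zero if the total is reported as 0.
    [ "$total" -gt 0 ] || { echo 0; return; }
    echo $(( used * 100 / total ))
}

# Feeding it from nagiostats (assumed flags and variable names, not run here):
#   set -- $(/usr/local/nagios/bin/nagiostats --mrtg --data=USEDCMDBUF,TOTCMDBUF)
#   cmdbuf_pct "$1" "$2"
```

A value that stays close to 100 would suggest the buffers are filling.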

Check the logs from nsca too.  If I recall correctly you may need to
set debug=1 in nsca.cfg for a while to get enough information.  One
problem I sometimes see occurs when the clock on the sending server is
way out of sync with the clock on the Nagios server: nsca will
complain and not process the check.  See this section in the nsca.cfg
file:

  # MAX PACKET AGE OPTION
  # This option is used by the nsca daemon to determine when client
  # data is too old to be valid.  Keeping this value as small as
  # possible is recommended, as it helps prevent the possibility of
  # "replay" attacks.  This value needs to be at least as long as
  # the time it takes your clients to send their data to the server.
  # Values are in seconds.  The max packet age cannot exceed 15
  # minutes (900 seconds).  If this variable is set to zero (0), no
  # packets will be rejected based on their age.

  max_packet_age=30

If I recall, I increased this from some smaller value to make it more
forgiving of systems which are a bit out of sync.
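
To make the age rule in that comment concrete, here is a simplified
illustration. It is not nsca's actual code, and the function name
packet_accepted is hypothetical. It only compares a timestamp against
the current time and treats a maximum age of 0 as "never reject".

```shell
#!/bin/sh
# Simplified illustration only, not taken from the nsca source.
# Succeeds (exit 0) if a packet stamped at $1 and seen at $2 is within
# the maximum age $3 (all in seconds); fails (exit 1) otherwise.
packet_accepted() {
    stamped=$1
    now=$2
    max_age=$3
    # A maximum age of 0 disables the age check entirely.
    [ "$max_age" -eq 0 ] && return 0
    age=$(( now - stamped ))
    [ "$age" -le "$max_age" ]
}
```

With max_packet_age=30, a result that appears 45 seconds old would be
rejected under this rule, while one that appears 20 seconds old would pass.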


I hope that's pointed you in the right direction.

Cheers,

Jim

>
>
> Nagios Version: 3.2.0
>
> I have a service template definition that looks like this.
> define service{
>         name                            passive-service
>         check_freshness                 1
>         active_checks_enabled           0
>         passive_checks_enabled          1
>         parallelize_check               1
>         obsess_over_service             0
>         notifications_enabled           0
>         event_handler_enabled           0
>         flap_detection_enabled          0
>         failure_prediction_enabled      0
>         process_perf_data               1
>         retain_status_information       1
>         retain_nonstatus_information    1
>         is_volatile                     0
>         check_period                    24x7
>         max_check_attempts              1
>         contact_groups                  admins
>         notification_options            w,c,r
>         notification_interval           60
>         notification_period             24x7
>         register                        0
>         }
>
> And then I have services defined like so.
> # Free Memory Check
> define service{
>         use                     passive-service
>         service_description     Passive Memory Check
>         check_command           check_stale
>         hostgroups              passive
>         freshness_threshold     3600
>         }
>
> My active checks are defined with.
> # alert on stale
> define command{
>         command_name            check_stale
>         command_line            $USER1$/check_dummy 2 "Check is stale, please run manually"
>         }
>
> On my host, I use cron jobs to run things like this. I use
> nsca_wrapper to send my check results to the central nagios server.
> # Check Free Memory
> */15 * * * * root /usr/local/nagios/libexec/nsca_wrapper.sh -H server.name -S 'Passive Memory Check' -C '/usr/local/nagios/libexec/check_memory -w 10 -c 5'  >& /dev/null
>
>
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by
>
> Make an app they can't live without
> Enter the BlackBerry Developer Challenge
> http://p.sf.net/sfu/RIM-dev2dev
> _______________________________________________
> Nagios-users mailing list
> Nagios-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/nagios-users
> ::: Please include Nagios version, plugin version (-v) and OS when reporting any issue.
> ::: Messages without supporting info will risk being sent to /dev/null
>
