[Nagios-devel] new plugin interface for Nagios

Andreas Ericsson ae at op5.se
Fri May 7 16:10:02 CEST 2004


Deomid Ryabkov wrote:
> AE> Deomid Ryabkov wrote:
>>>
>>>well, basically I think it's just about time to add a new plugin interaction interface to Nagios.
>>>pretty bold, ha? ;)
>>>
>>>currently we have 248 hosts monitored with 755 active checks at a 60 seconds interval.
>>>(interval_length=10, normal_check_interval 6)
>>>
>>>being in charge of the monitoring, by now i have done all i could to optimize plugins,
>>>and in fact this has helped a lot to keep the system running at a decent pace.
>>>(for example, i have integrated disk checks into one plugin that uses shared snmplib
>>>instead of calling snmpget, effectively eliminating another fork)

I'd like to see some of these plugins, if you don't mind. New plugins 
are always interesting.

> 
> AE> That problem will still exist, unless you mean to make the code 
> AE> thread-safe, which would make nagios a memory-hog on large systems (a 
> AE> lot more hash buckets would be required for this to work). Besides, on 
> AE> linux-systems, fork() uses copy-on-write, so only the PTE needs be created.
> 
> well, now it takes fork() + exec() to complete a check. and my aim is that latter exec().
> that doesn't make nagios threaded.
> 
Do you mean to remove the fork()? If so, how do you expect nagios 
to run several checks at once? Your 755 checks would (at best) take 400 
seconds to complete without some sort of parallelization.
Or do you mean to remove the exec()? That can't be done without 
removing the fork() as well, and then we're back with the bloated memory 
hog nagios isn't today (at least not without database support and other 
bling-bling).
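
To make the fork() + exec() cycle concrete, here is a minimal sketch of 
running several checks at once. It is not nagios' actual code; the names 
start_check, reap_check and run_checks_in_parallel are made up for 
illustration, and the sketch assumes plugins follow the usual exit-code 
convention (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN).

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Start one check: fork(), then exec() the plugin in the child.
 * Returns the child's pid in the parent, or -1 if fork() failed. */
static pid_t start_check(char *const argv[])
{
    pid_t pid = fork();
    if (pid == 0) {
        execvp(argv[0], argv);
        _exit(3);               /* exec failed: report UNKNOWN */
    }
    return pid;
}

/* Wait for any outstanding check and return its exit status,
 * or -1 if there was nothing to wait for. */
static int reap_check(void)
{
    int status;
    if (wait(&status) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : 3;
}

/* Hypothetical helper: start all n commands first, then collect them.
 * Returns the highest exit status seen. */
int run_checks_in_parallel(char **cmds[], int n)
{
    int started = 0, worst = 0;

    assert(cmds != NULL && n >= 0);
    for (int i = 0; i < n; i++) {
        if (start_check(cmds[i]) > 0)
            started++;
        else
            worst = 3;          /* could not even fork */
    }
    for (; started > 0; started--) {
        int rc = reap_check();
        if (rc > worst)
            worst = rc;
    }
    return worst;
}
```

Because every child is started before any of them is waited for, the 
total wall-clock time is roughly that of the slowest check rather than 
the sum of all of them.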

> 
>>>so now i'm thinking of adding some kind of plugin invocation mechanism into Nagios
>>>that wouldn't require starting up another program.
>>>and what i am thinking of as my options are:
>>>
>>>1) shared library mechanism, like Apache modules. should be the fastest of all, but has its shortcomings.
>>>not very flexible.
> 
> 
> AE> Not a bad idea, but nagios would still have to fork() or 
> AE> pthread_create() to actually RUN the different checks (unless you want 
> AE> it to serialize checks, which is just plain dumb).
> 
> basically, i don't mind nagios to fork (yet), but instead of running an external plugin it should...
> well, that is to be decided ;)
> 
External plugins are the foremost strength of nagios. If everybody would 
have to write C modules (like for apache), only very few people would be 
competent enough to manage that, and we would take a wide step back in 
nagios' evolution before we managed to rewrite all the perl and sh 
scripts as modules to nagios (not to mention the code in nagios itself).

> as of now, for every check a separate process is launched. arguments are parsed, snmp session
> is created and initialized, host's filesystems are enumerated, their current state is recorded,
> warning threshold value is obtained (for unix hosts).
> then a match of fs data against thresholds is done with most severe condition becoming exitcode.
> summary is printed and there we go, check done.
> and we do this for more than 200 hosts, every minute (we are leaving the check interval out of our discussion for now).
> to me, it seems obvious that this could be optimized, if only we didn't have to start all over every time.

Even if you implemented the code for every check in nagios, all you'd 
save would be the exec() call. This at the price of stripping nagios of 
its most powerful feature: flexibility beyond belief.

> most of the data is the same all the time, so why not just cache it?

Because you need it fresh to be valuable. This would work very nicely 
for stateful tcp connections, but how does one go about checking 
web-pages? HTTP is a stateless protocol, and I don't expect the world 
to change that because someone wants to write a program that already 
exists and works with current standards.

> i could write a check_disk_snmpd, that'd create and initialize an snmp session, cache filesystem data
> and thresholds and only do a couple of get()'s upon a request arriving from nagios to freshen the data.

Parsing arguments and setting threshold values are done in a blink and 
require little if any CPU power. Obtaining a socket and requesting a 
connection is also very light on both CPU and memory. If you want to 
optimize something, work on things that need it.
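
For a sense of how little work the threshold step is, here is a small 
sketch of matching filesystem usage against warning and critical 
thresholds, with the most severe condition becoming the exit code. The 
struct and function names are invented for illustration, and the state 
values assume the usual plugin convention (0 OK, 1 WARNING, 2 CRITICAL).

```c
#include <assert.h>
#include <stddef.h>

enum { STATE_OK = 0, STATE_WARNING = 1, STATE_CRITICAL = 2 };

/* Hypothetical record for one filesystem on a monitored host. */
struct fs_usage {
    const char *mount;
    double used_pct;
};

/* Classify a single usage figure against the two thresholds. */
int classify(double used_pct, double warn_pct, double crit_pct)
{
    if (used_pct >= crit_pct)
        return STATE_CRITICAL;
    if (used_pct >= warn_pct)
        return STATE_WARNING;
    return STATE_OK;
}

/* Return the most severe state over all filesystems: a handful of
 * floating-point comparisons per filesystem, nothing more. */
int worst_state(const struct fs_usage *fs, size_t n,
                double warn_pct, double crit_pct)
{
    int worst = STATE_OK;

    assert(fs != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        int s = classify(fs[i].used_pct, warn_pct, crit_pct);
        if (s > worst)
            worst = s;
    }
    return worst;
}
```

The network round trips to the monitored host, not this comparison loop, 
are what a check spends its time on.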

> seems pretty obvious to me indeed.
> 
> so, what is to be done?
> basically, we have to teach nagios to open a socket (or should it be another IPC mechanism? maybe a message queue? I'm still unsure)
> send it a request packet and settle down waiting for a reply.

In a fork()ed process, I'm sure you mean. If it's just waiting, it can't 
do something else, which means 10 seconds of doing nothing while a 
socket times out because a webserver is down, and another 10 just to 
determine that the server actually IS down, and not just IIS fucking up 
again.
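
To illustrate the waiting involved, here is a sketch of a TCP connect 
that gives up after a caller-chosen timeout, using a non-blocking socket 
and poll(). The function name connect_with_timeout is made up for this 
example, and it handles IPv4 dotted-quad addresses only. Whoever calls 
it still sits idle for up to timeout_ms, which is why it belongs in a 
fork()ed child or a thread rather than in the scheduler itself.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: connect to ipv4:port, waiting at most timeout_ms.
 * Returns a connected (still non-blocking) socket, or -1 on failure. */
int connect_with_timeout(const char *ipv4, unsigned short port, int timeout_ms)
{
    struct sockaddr_in addr;
    struct pollfd pfd;
    int fd, flags, err = 0;
    socklen_t len = sizeof(err);

    assert(ipv4 != NULL);
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return -1;              /* not a valid dotted-quad address */

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        goto fail;

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        return fd;              /* connected immediately */
    if (errno != EINPROGRESS)
        goto fail;              /* refused, unreachable, ... */

    /* The handshake is in progress: wait until the socket becomes
     * writable or the timeout expires. */
    pfd.fd = fd;
    pfd.events = POLLOUT;
    pfd.revents = 0;
    if (poll(&pfd, 1, timeout_ms) <= 0)
        goto fail;              /* timed out, or poll() itself failed */

    /* Writable does not mean connected; ask for the final result. */
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0 || err != 0)
        goto fail;
    return fd;

fail:
    close(fd);
    return -1;
}
```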

> the daemon on the other side could be threaded (i think i'd write mine this way), but it doesn't in fact matter.

It makes a huge difference, actually. You can't make a program do two 
things at once without threading one way or another.

> with socket we could even go as far as running this daemon on remote machine,
> but the benefit of this is unclear to me.
> 
Three reasons, all highly regarded:
load balancing, redundancy, configuration propagation.

> that is it. what do you think?
> 
I think you should study C some more.

> --
>  Best regards,
> Deomid Ryabkov
> UNIX Systems Administrator
> RosBusinessConsulting | http://www.rbc.ru/
> E-mail: rojer at rbc.ru  | ICQ: 8025844

-- 
Mvh / Best Regards
Sourcerer / Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se

