new plugin interface for Nagios

Andreas Ericsson ae at op5.se
Fri May 7 13:55:31 CEST 2004


Deomid Ryabkov wrote:
> greetings, fellow Nagios users.
> 
> well, basically I think it's just about time to add a new plugin interaction interface to Nagios.
> pretty bold, ha? ;)
> now let me explain. it has been almost a year since we turned to Nagios for our monitoring needs
> (we were previously using BigBrother and oh my dear, was it awful! ;))
> so we are being almost happy now. however, as configuration continues to grow, the response time
> of the whole monitoring system increases.
> 
> currently we have 248 hosts monitored with 755 active checks at a 60 seconds interval.
> (interval_length=10, normal_check_interval 6)
> 
I think this is where some of your problems start. Running ALL checks 
with a 60 second interval is hardly useful. You should look into 
implementing different templates for them (we have critical-service (1 
minute interval), default-service (5 minute interval), 
noncritical-service (30 minutes interval)). This allows for excellent 
scalability.
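A rough way to see why tiered intervals scale: on average Nagios has to start (services / interval) checks per second for each template, summed over all templates. The 755 services in the original post, all at 60 seconds (interval_length=10 times normal_check_interval 6), come to about 12.6 check starts per second. The sketch assumes a made-up split of those services over the three templates; `struct check_tier` and `checks_per_second` are hypothetical names, not part of Nagios.

```c
#include <assert.h>
#include <stddef.h>

/* One group of services sharing a template's check interval
 * (hypothetical structure, for the arithmetic only). */
struct check_tier {
    const char *name;      /* template name, e.g. "critical-service" */
    int         services;  /* number of services using the template  */
    int         interval;  /* check interval in seconds               */
};

/* Long-run average number of checks (and therefore plugin forks)
 * started per second: sum of services / interval over all tiers. */
double checks_per_second(const struct check_tier *tiers, size_t n)
{
    double rate = 0.0;

    for (size_t i = 0; i < n; i++) {
        assert(tiers[i].interval > 0);
        rate += (double)tiers[i].services / tiers[i].interval;
    }
    return rate;
}
```

With a hypothetical split of 100 services on critical-service (60 s), 400 on default-service (300 s) and 255 on noncritical-service (1800 s), the average drops from about 12.6 to about 3.1 check starts per second. This is only the long-run average; it ignores how Nagios actually spreads checks out over time.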

> being in charge of the monitoring, by now i have done all i could to optimize plugins,
> and in fact this has helped a lot to keep the system running at a decent pace.
> (for example, i have integrated disk checks into one plugin that uses shared snmplib
> instead of calling snmpget, effectively eliminating another fork)

snmpget only causes a heavy load when it has to parse the MIBs. Use '-m: ' to 
load NO MIBs with snmpget. This will make it a whole lot faster.

> so the biggest problem at this time seems to be Nagios's need to launch a process for every check.
> 
That problem will still exist unless you intend to make the code 
thread-safe, which would turn nagios into a memory hog on large systems (a 
lot more hash buckets would be required for this to work). Besides, on 
Linux systems fork() uses copy-on-write, so only the page table entries 
(PTEs) need to be created.
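For comparison, this is roughly what every conventional active check costs: one fork(), one exec() of the plugin (here through /bin/sh), reading its output and reaping it with waitpid(). It is a simplified sketch assuming POSIX, not Nagios's own code; `run_plugin` is a hypothetical name. The 0 to 3 return values follow the usual plugin exit-code convention (OK, WARNING, CRITICAL, UNKNOWN).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run one plugin command line through /bin/sh and return its state using
 * the plugin exit-code convention (0=OK, 1=WARNING, 2=CRITICAL,
 * 3=UNKNOWN). The first line of the plugin's stdout is copied into buf.
 * Internal errors and out-of-range exit codes are reported as 3. */
int run_plugin(const char *cmdline, char *buf, size_t buflen)
{
    int fds[2];
    char scratch[256];
    size_t used = 0;
    int status;

    assert(buflen > 0);
    buf[0] = '\0';
    if (pipe(fds) == -1)
        return 3;

    pid_t pid = fork();          /* pages are shared copy-on-write */
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return 3;
    }
    if (pid == 0) {              /* child: stdout -> pipe, then exec */
        close(fds[0]);
        dup2(fds[1], STDOUT_FILENO);
        close(fds[1]);
        execl("/bin/sh", "sh", "-c", cmdline, (char *)NULL);
        _exit(3);                /* only reached if exec failed */
    }

    close(fds[1]);               /* parent: read until the plugin exits */
    for (;;) {
        int into_buf = used < buflen - 1;
        ssize_t n = read(fds[0], into_buf ? buf + used : scratch,
                         into_buf ? buflen - 1 - used : sizeof scratch);
        if (n <= 0)
            break;               /* EOF (or error; EINTR not retried here) */
        if (into_buf)
            used += (size_t)n;   /* excess output is drained, not stored */
    }
    close(fds[0]);
    buf[used] = '\0';
    buf[strcspn(buf, "\n")] = '\0';   /* keep only the first line */

    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status))
        return 3;
    return WEXITSTATUS(status) <= 3 ? WEXITSTATUS(status) : 3;
}
```

Draining excess output instead of simply stopping matters: a plugin that writes more than the pipe can hold would otherwise block forever while the parent sits in waitpid().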

> so now i'm thinking of adding some kind of plugin invocation mechanism into Nagios
> that wouldn't require starting up another program.
> and what i am thinking of as my options are:
> 
> 1) shared library mechanism, like Apache modules. should be the fastest of all, but has its shortcomings.
> not very flexible.

Not a bad idea, but nagios would still have to fork() or 
pthread_create() to actually RUN the different checks (unless you want 
it to serialize checks, which is just plain dumb).
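To illustrate the point about the module approach: even if checks are compiled into shared objects (loaded, say, with dlopen()/dlsym()), something still has to run them concurrently. This sketch assumes a hypothetical module interface (`check_fn`) and uses one POSIX thread per check; the two demo checks and their thresholds are made up. It also shows the catch with thread-safety: everything the module touches now runs inside the nagios process.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define MAX_JOBS 64

/* Hypothetical module interface: a module loaded into the nagios process
 * exports a function of this type returning a plugin state:
 * 0=OK, 1=WARNING, 2=CRITICAL, 3=UNKNOWN. */
typedef int (*check_fn)(const char *arg);

struct check_job {
    check_fn    fn;
    const char *arg;
    int         result;
};

/* Two stand-in checks so the sketch can run on its own. */
static int demo_check_ok(const char *arg)
{
    (void)arg;
    return 0;
}

/* arg is a disk usage percentage; thresholds are made up for the demo. */
static int demo_check_disk(const char *arg)
{
    int used = atoi(arg);

    if (used >= 90) return 2;
    if (used >= 80) return 1;
    return 0;
}

static void *job_thread(void *p)
{
    struct check_job *job = p;

    job->result = job->fn(job->arg);   /* runs inside the nagios process */
    return NULL;
}

/* Start every job in its own thread, then wait for all of them, so a slow
 * check does not hold up the others. Module code run this way must be
 * thread-safe. Returns 0 if every job was started and joined, else -1. */
int run_jobs_concurrently(struct check_job *jobs, size_t n)
{
    pthread_t tids[MAX_JOBS];
    size_t started = 0;

    assert(n <= MAX_JOBS);
    while (started < n &&
           pthread_create(&tids[started], NULL, job_thread, &jobs[started]) == 0)
        started++;
    for (size_t i = 0; i < started; i++)
        pthread_join(tids[i], NULL);
    return started == n ? 0 : -1;
}
```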

> 2) some kind of IPC. this would involve, i think, some check daemon process that'd start with nagios
> and respond to check requests from it. a pipe or message queue could be used for communication.

Now we're talking. See comments below.
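Option 2 can be prototyped with little more than two pipes and one long-lived child process. In this sketch nagios forks the check daemon once and then sends it one request line per check; the line protocol, the function names and the single `ping_self` check are hypothetical. The daemon answers requests one at a time, so on its own it would serialize checks; a real one would need a pool of workers (threads or pre-forked processes) behind the pipe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical line protocol: nagios writes "<check name>\n" and the
 * daemon answers "<state> <text>\n", state 0..3 as for plugins. */
struct check_daemon {
    pid_t pid;
    FILE *to_daemon;    /* requests  */
    FILE *from_daemon;  /* responses */
};

/* In-process checks known to the daemon; one stand-in for the demo. */
static int daemon_dispatch(const char *name, char *text, size_t len)
{
    if (strcmp(name, "ping_self") == 0) {
        snprintf(text, len, "OK - check daemon alive");
        return 0;
    }
    snprintf(text, len, "UNKNOWN - no such check: %s", name);
    return 3;
}

static void daemon_loop(FILE *in, FILE *out)
{
    char line[256], text[256];

    while (fgets(line, sizeof line, in)) {
        line[strcspn(line, "\n")] = '\0';
        int state = daemon_dispatch(line, text, sizeof text);
        fprintf(out, "%d %s\n", state, text);
        fflush(out);   /* pipes are fully buffered: answer now */
    }
}

/* Fork the daemon once at start-up; afterwards each check is a round
 * trip over two pipes instead of a fork()+exec(). 0 on success. */
int daemon_start(struct check_daemon *d)
{
    int req[2], resp[2];

    if (pipe(req) == -1)
        return -1;
    if (pipe(resp) == -1) {
        close(req[0]); close(req[1]);
        return -1;
    }
    d->pid = fork();
    if (d->pid == -1) {
        close(req[0]); close(req[1]); close(resp[0]); close(resp[1]);
        return -1;
    }
    if (d->pid == 0) {                    /* child: the check daemon */
        close(req[1]); close(resp[0]);
        FILE *in = fdopen(req[0], "r"), *out = fdopen(resp[1], "w");
        if (in && out)
            daemon_loop(in, out);
        _exit(0);
    }
    close(req[0]); close(resp[1]);        /* parent keeps the other ends */
    d->to_daemon = fdopen(req[1], "w");
    d->from_daemon = fdopen(resp[0], "r");
    return (d->to_daemon && d->from_daemon) ? 0 : -1;
}

/* One check request. A real nagios would ignore SIGPIPE so that a dead
 * daemon shows up as an error here instead of killing the process. */
int daemon_check(struct check_daemon *d, const char *name,
                 char *text, size_t len)
{
    char line[300];
    int state, off = 0;

    assert(d->to_daemon && d->from_daemon);
    fprintf(d->to_daemon, "%s\n", name);
    if (fflush(d->to_daemon) == EOF || !fgets(line, sizeof line, d->from_daemon))
        return 3;
    line[strcspn(line, "\n")] = '\0';
    if (sscanf(line, "%d %n", &state, &off) < 1 || state < 0 || state > 3)
        return 3;
    snprintf(text, len, "%s", line + off);
    return state;
}

void daemon_stop(struct check_daemon *d)
{
    fclose(d->to_daemon);     /* EOF makes daemon_loop return */
    fclose(d->from_daemon);
    waitpid(d->pid, NULL, 0);
}
```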

> 3) just forget about it.
> 
Not necessarily a bad thing.

> i think i'll do that one way or another. but i want to make it The Right Way (r) and this is
> where i turn to you and ask if you have any ideas/opinions/suggestions and in general, if it's worth
> implementing at all...
> 
I'd say that nagios should be split in two. One part would be 
responsible for reading, verifying and holding the configuration. This 
can of course also be used by CGIs to access configuration data very 
quickly. The other part should work as nagios does now, but should be 
started from the db process and read its configuration from it as a 
stripped-down list with the fields:
check id, actual command, timing(?)
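The stripped-down list could be as small as one record per check. This sketch assumes a hypothetical line format `id|interval|command` sent from the db process to the checker, with the command last so that it may itself contain `|`; `struct check_entry`, `parse_check_entry` and `next_due` are illustrative names only. `next_due` covers the case where timing is left to the checker daemon.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* One entry of the stripped-down list handed to a checker:
 * check id, the fully expanded command line, and timing. */
struct check_entry {
    unsigned long id;
    char          command[512];
    unsigned int  interval;    /* seconds between checks          */
    time_t        next_check;  /* maintained by the checker daemon */
};

/* Parse one line "id|interval|command". Returns 0 on success and
 * leaves *e untouched on failure. */
int parse_check_entry(const char *line, struct check_entry *e)
{
    char *end;
    unsigned long id = strtoul(line, &end, 10);
    if (end == line || *end != '|')
        return -1;

    const char *p = end + 1;
    unsigned long iv = strtoul(p, &end, 10);
    if (end == p || *end != '|' || iv == 0)
        return -1;

    const char *cmd = end + 1;
    size_t len = strcspn(cmd, "\n");
    if (len == 0 || len >= sizeof e->command)
        return -1;

    e->id = id;
    e->interval = (unsigned int)iv;
    memcpy(e->command, cmd, len);
    e->command[len] = '\0';
    e->next_check = 0;          /* due immediately */
    return 0;
}

/* Timing left to the checker: index of the entry that is due first. */
size_t next_due(const struct check_entry *list, size_t n)
{
    size_t best = 0;

    assert(n > 0);
    for (size_t i = 1; i < n; i++)
        if (list[i].next_check < list[best].next_check)
            best = i;
    return best;
}
```

After running entry `i`, the checker would set `next_check = now + interval` and call `next_due` again; a heap would replace the linear scan once the list gets large.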

Timing can be handled by the db program, or left to the checker daemon.
This approach has several advantages:
1. With a small coding effort, it allows for native redundancy and load 
balancing (any number of hosts can poll together, while the db program 
on each host maintains check data and the complete configuration).
2. Configuration changes can easily be propagated to other servers.
3. The web GUI will become MUCH faster, since the db app will keep the last 
known state of each service and host, as well as all the textual 
configuration, in a sorted array. This way the sorting algorithms don't 
need to be heavily optimized, as they will only incur a one-time penalty.
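Point 3 relies on a simple trade-off: sort once when the configuration is loaded, then answer every lookup with a binary search. A minimal sketch using the standard qsort() and bsearch(), keyed on host and service name; the struct layout and function names are hypothetical, and the sample host names in any usage are made up.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Last known state of one service, as the db process might keep it. */
struct svc_state {
    char host[64];
    char service[64];
    int  state;        /* 0=OK 1=WARNING 2=CRITICAL 3=UNKNOWN */
};

/* Order by host name, then by service description. */
static int svc_cmp(const void *a, const void *b)
{
    const struct svc_state *x = a, *y = b;
    int c = strcmp(x->host, y->host);

    return c ? c : strcmp(x->service, y->service);
}

/* Pay the sorting cost once, after the configuration is loaded. */
void svc_table_sort(struct svc_state *tab, size_t n)
{
    qsort(tab, n, sizeof *tab, svc_cmp);
}

/* Every later lookup (from a CGI or a result update) is O(log n).
 * Returns NULL if the host/service pair is unknown. */
struct svc_state *svc_lookup(struct svc_state *tab, size_t n,
                             const char *host, const char *service)
{
    struct svc_state key;

    assert(strlen(host) < sizeof key.host);
    assert(strlen(service) < sizeof key.service);
    snprintf(key.host, sizeof key.host, "%s", host);
    snprintf(key.service, sizeof key.service, "%s", service);
    return bsearch(&key, tab, n, sizeof *tab, svc_cmp);
}
```

Updating a state in place keeps the array sorted, since the sort key (host, service) never changes; only adding or removing services would require sorting again.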


> --
>  Best regards,
> Deomid Ryabkov
> UNIX Systems Administrator
> RosBusinessConsulting | http://www.rbc.ru/
> E-mail: rojer at rbc.ru  | ICQ: 8025844

-- 
Mvh
Andreas Ericsson
OP5 AB
+46 (0)733 709032
andreas.ericsson at op5.se

