Nagios external commands (2)

Andreas Ericsson ae at op5.se
Wed Sep 26 09:56:29 CEST 2007


Tobias Mucke wrote:
> Hi Andreas, hi list,
> 
> thanks for this good thread and your time.
> 
>>> Working with the External Commands documentation at nagios.org I got
>>> the impression that it is written manually and stored in some kind of
>>> content management system.
>> True. This must be so, or the docs might not match the reality in the
>> code.
> 
> Hopefully, the docs do not match the reality in the code! There are
> some ugly mistakes in them. Until today I thought these were just
> typos, because code and docs are maintained apart.
> My approach would be to maintain the XML file with two ideas behind it.
> 
> - it is an independent program understandable format describing the
> supported commands

Yes, but since Nagios can't use it (it needs to map the input to real
variables and memory addresses in-core), some documentation will still
need to be maintained.

> - it can be used to generate a more readable documentation
> 

A worthy goal.
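
As a rough illustration of the "description -> documentation" step, here is a
minimal Python sketch. The dictionary layout and the render_docs() helper are
assumptions made up for this example; Nagios ships no such description file.
The two command names are real external commands, but the summaries are only
paraphrased.

```python
# Hypothetical, hand-written description of two external commands.
# The dict layout is an assumption for this sketch, not a Nagios format.
COMMANDS = {
    "DISABLE_HOST_NOTIFICATIONS": {
        "args": ["host_name"],
        "summary": "Disable notifications for one host.",
    },
    "ENABLE_NOTIFICATIONS": {
        "args": [],
        "summary": "Enable notifications program-wide.",
    },
}


def render_docs(commands):
    """Return plain-text documentation, one entry per command, sorted by name."""
    entries = []
    for name in sorted(commands):
        spec = commands[name]
        # Documented syntax: COMMAND;<arg1>;<arg2>...
        syntax = ";".join([name] + ["<%s>" % arg for arg in spec["args"]])
        entries.append("%s\n    %s" % (syntax, spec["summary"]))
    return "\n\n".join(entries)


if __name__ == "__main__":
    print(render_docs(COMMANDS))
```

The same description could feed XSLT or any other renderer; the point is only
that the readable text is derived rather than typed by hand.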

>>> With an XML file describing the Nagios
>>> External Command Interface you could generate the documentation e.g.
>>> by XSLT.
>>> Why should you do that? Because it would be possible to send patches
>>> for the documentation in a program understandable diff format.
>>>
>> But it already is. The program creating such program understandable diffs
>> is called (surprise, surprise) "diff". The program that understands those
>> diffs is called "patch". ;-)
> 
> Surprise: I know these programs. ;-) And I would use them to send
> patches for the documentation. But I am not convinced that this will
> work fine. How to send patches for files generated by a content
> management system?
> Besides, this would only correct the documentation, not the API. I
> am convinced that documentation has to be generated from the program
> code, so that documentation always matches the API and every change to
> the API is reflected immediately in the documentation. Other programs
> are using the API. How can they handle changes without changing the
> program code? With a format-independent description of the API. These
> ideas result in: API -> description -> documentation.
> 
>> Seriously though, like I said above, the docs need to be maintained along
>> with the code, or the doc-to-code relation might get broken for a certain
>> revision of either.
> 
> Agreed, this is the right approach. But there is no description of the
> API yet, so I decided to start with it. In future the description
> should be generated automatically out of the API. That's why I think
> it is important to Nagios and this mailing list.
> 

Adding some auto-documentation feature about the API would have to be done
in the nagios core. Getting the nagios core to read XML files in order to
validate its input is totally pointless, because it still needs to use a
second and much stricter validation when it maps the input to the variables.
Getting the second, and stricter, validation to generate its documentation
is an exercise in futility, because it will be a lot harder to get right
than just updating the docs.

>>> Actually we are going to work on a concept for an even more
>>> distributed environment. Right now all distributed monitoring systems
>>> are located at one datacenter location. But we want to roll out Nagios
>>> to other datacenters at remote locations too. The idea is to give the
>>> remote datacenter as much autonomy as possible, e.g. keep
>>> notifications local at each location.
>> Sounds like a good idea. I'd use a NEB to propagate the commands if I
>> were you, or a separate program altogether.
> 
> What about NEBs, are they incorporated within the Nagios code base
> after some discussions and code quality reviews, e.g. like kernel
> modules?
> 

No. They're free-floating projects. Nagios has no method of linking any
modules into its core.


>>> Nevertheless we want just one central console to control the Nagios
>>> infrastructure, for example to enable / disable notifications. So you
>>> have to control Nagios by an interface you can send commands to from a
>>> central system. These commands should be checked if they are correct
>>> before they are passed into the pipe.
>>>
>> Why? They must still be checked for correctness inside Nagios. Adding
>> another point of failure might mean you get bitrot in the xml-doc, so
>> your commands (which nagios would understand perfectly) don't go through.
>> OTOH, if you update the xml-doc prematurely, you would get "OK" from the
>> XML validation thingie and then Nagios wouldn't grok it anyway.
> 
> For several reasons. First of all: all input should be checked
> before it is passed to a program, so just writing a program that can
> write anything you want to the pipe would be bad.

Yes, of course, but you should construct the command sent to nagios
based on what input you get, so you need to know how to construct commands
that nagios groks, which implies that you know the format nagios understands.

It may be easier to teach all the data-sending machines the XML format,
but it's still a format that a program needs to learn about and construct
properly.
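
To make "construct commands that nagios groks" concrete, here is a minimal
Python sketch of a sender that builds one line in the external command format
("[timestamp] COMMAND;arg1;arg2" followed by a newline). The build_command()
helper and its checks are assumptions for this example. It only guards the
line syntax and is deliberately strict about separators; it knows nothing
about which commands or argument counts Nagios actually accepts.

```python
import time


def build_command(name, args=(), now=None):
    """Format one external command line, terminated by a newline."""
    # Command names are upper-case words joined by underscores.
    if not name or not name.replace("_", "").isalnum() or name != name.upper():
        raise ValueError("bad command name: %r" % (name,))
    fields = [name]
    for arg in args:
        text = str(arg)
        # ';' separates fields and a newline ends the command, so this
        # sketch rejects both inside an argument.
        if ";" in text or "\n" in text or "\r" in text:
            raise ValueError("argument contains a separator: %r" % (text,))
        fields.append(text)
    # 'now' lets callers (and tests) pin the timestamp.
    stamp = int(time.time() if now is None else now)
    return "[%d] %s\n" % (stamp, ";".join(fields))


if __name__ == "__main__":
    # "web01" is a made-up host name.
    print(build_command("DISABLE_HOST_NOTIFICATIONS", ["web01"]), end="")
```

Whatever format the central console uses on the wire, something like this has
to run on the receiving side before the line is written to the pipe.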

>
> Second: A tool should support its users and give some explanations
> about its usage, in this case some information about the API.
> Third: Quality control and versioning. How is versioning of the API
> handled today? Just by changing the documentation? What about
> commands which are under development and should not be used in a
> production environment? How do you test the API after changes or
> before a new Nagios release?



> Fourth: you need a point of control, e.g. access control. The
> Nagios web interface works with a simple authentication scheme:
> contacts are allowed to do everything. This control does not exist
> anymore if you are at the command line.
> 

True. Unix-style filesystem permissions take over there. Hopefully the
admins are clever enough to not allow any old schmuck to send commands
to the nagios pipe. If you want to enable authentication and access-
control beyond the unix-style fs thing, you'll have to make your program
at least setgid, which, last time I checked, excluded any kind of
scripting language.
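
For the filesystem-permission part, a minimal Python sketch that reports who
may write to the command file, judged purely by its mode bits. The
pipe_access() helper is an assumption for this example, and the path is
whatever your installation uses for its command file; nothing here is taken
from the Nagios sources.

```python
import os
import stat


def pipe_access(path):
    """Summarise who may write to the command file, using only its mode bits."""
    info = os.stat(path)
    mode = info.st_mode
    return {
        "is_fifo": stat.S_ISFIFO(mode),
        "owner_write": bool(mode & stat.S_IWUSR),
        "group_write": bool(mode & stat.S_IWGRP),
        "world_write": bool(mode & stat.S_IWOTH),
        "uid": info.st_uid,
        "gid": info.st_gid,
    }


if __name__ == "__main__":
    import sys

    # Usage: python pipe_access.py /path/to/the/command/file
    access = pipe_access(sys.argv[1])
    if access["world_write"]:
        print("warning: any local user can send commands")
    print(access)
```

This only inspects classic mode bits; ACLs or other access mechanisms are not
taken into account.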

> 
>> However, assuming your implementation doesn't touch the nagios core, it won't
>> be a problem since you can then convert the XML-command into a format that
>> nagios groks, after having stripped the XML overhead.
> 
> I don't understand why you think that XML is overhead.

Because

	<?xml version="1.0" encoding="UTF-8" ?>
	<label variable="value">
		<item name="var1" value="val1"/>
		<item name="val2" value="val2"/>
	</label>

is miles longer than

	variable=value;var1=val1;val2=val2

and quite a lot harder for a program to parse, even if you use something
like ezxml.
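
The two snippets above can be compared directly. This is a minimal Python
sketch; parse_xml() and parse_flat() are names made up for the example. Both
recover the same key/value pairs, and the script prints the byte counts so the
size difference can be checked rather than guessed. In Python both parsers are
short because the XML library does the heavy lifting; in C that work is yours.

```python
import xml.etree.ElementTree as ET

# The two encodings from the example above, as bytes.
XML_DOC = (
    b'<?xml version="1.0" encoding="UTF-8" ?>\n'
    b'<label variable="value">\n'
    b'\t<item name="var1" value="val1"/>\n'
    b'\t<item name="val2" value="val2"/>\n'
    b'</label>\n'
)
FLAT_DOC = b"variable=value;var1=val1;val2=val2"


def parse_xml(data):
    """Collect the root attributes plus every <item name=... value=.../>."""
    root = ET.fromstring(data)
    pairs = dict(root.attrib)
    for item in root.findall("item"):
        pairs[item.get("name")] = item.get("value")
    return pairs


def parse_flat(data):
    """Split 'k=v;k=v' into a dict."""
    fields = data.decode("ascii").split(";")
    return dict(field.split("=", 1) for field in fields)


if __name__ == "__main__":
    assert parse_xml(XML_DOC) == parse_flat(FLAT_DOC)
    print("xml bytes: ", len(XML_DOC))
    print("flat bytes:", len(FLAT_DOC))
```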

> XML is just
> used to describe the API. Which format would you prefer? No, it won't
> touch the core at first. But I would like to see some rethinking
> concerning this approach: API -> description -> documentation.
> This would open a chance to improve the quality of the API and
> documentation and could play an important part in future developments.
> 

I still don't get this. Nagios is written in C. Handling XML in C is
just about as horrible as it gets. If you want the nagios core to grok
the XML description of its own API and also *use* that description
as some sort of command interpreter, you'll end up implementing a very
weak script language interpreter.

Using XML to describe the API might be a handy thing, but getting Nagios
to read XML through the FIFO just won't happen, for the very simple reason
that XML is roughly three times larger than the current format, and the
pipe is already a limiting factor in Nagios due to its small size.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231
