vanished bug numbers in the nagios mantis tracker

Andreas Ericsson ae at op5.se
Thu Aug 9 11:36:31 CEST 2012


On 08/08/2012 09:40 PM, rouilj+nagiosdev at cs.umb.edu wrote:
> 
> I tried to rework the attribuitions a bit since the
> levels of >>> were getting a bit much. I am not sure if
> I was successful or not in making in clearer who said
> what but.
> 

I'll just cut the extra. We know what we're talking about.
> 
> [rouilj 8/6]
> [rouilj 8/6] So add a field that is a pointer to a nagios maintained data
> [rouilj 8/6] store for each object. Where each object in the datastore is a
> [rouilj 8/6] key/value (blob pointer) pair. "adding items to the object"
> [rouilj 8/6] wasn't meant as add a new field. More along the lines of add a
> [rouilj 8/6] new item to a linked list in the object.
> [rouilj 8/6]
> [Andreas 8/7] Why on earth would you prefer O(n) complexity over O(1)?
> [Andreas 8/7] Do you realize that you're requesting something that's
> [Andreas 8/7] both slower and more complex than what I proposed?
> 
> Arrays are fixed length. What do you expect to use for the size
> of the array? I agree being able to do an address lookup at
> 
>     array_start + n*(sizeof pointer)
> 
> is a win, but it's also a good way to end up wasting a lot of
> space by oversizing the array at compile time to support an
> undeterminable number of brokers/plugins/modules.

Why on earth would you want to do that at compile time? You can
just

  ary = malloc(sched_info.services * sizeof(whatever));

at runtime and waste zero memory instead. We don't have to put
everything on the stack.

> After all every
> object (host, service) will need this array. An alternative is
> to take some time to traverse a linked list (or binary tree, hash
> table ....)  and reducing the amount of memory needed.
> 

Linked list require more memory per stored object since they also
must have pointers to the next item in the array.

> Now if you are taking about mallocing the array and allowing the
> number of elements in the array to be set in the config file
> that's something different, but I saw no indication of that.
> 

Not config file. I imagine 99.9999% of the usecases will be to
support arrays that are of exactly the same size as the number of
some type of objects.

> Ideally to save on memory you would want a configuation setting
> for each object type, but a single setting for all objects would
> probably work without too much waste.
> 

What on earth do you want to store??

> [rouilj 8/3] Custom variables help a little bit but introduce the need to
> [rouilj 8/3] parse the value on every event which is not acceptable. The
> [rouilj 8/3] value should be parsed once into a quickly accessible form
> [rouilj 8/3] (e.g. bitfield) and carried around with the object.
> [rouilj 8/3]
> [andreas 8/3] Well, no. You might want to add things other than flags in
> [andreas 8/3] such variables and you may well have several modules at once
> [andreas 8/3] competing for the dubious honor of putting extra junk into
> [andreas 8/3] objects.
> [rouilj 8/6]
> [rouilj 8/6] The object carries the data as a linked list, static
> [rouilj 8/6] array of pointers.... rather than as a named item in the
> [rouilj 8/6] struct. As long as the name for each item is unique there
> [rouilj 8/6] isn't a problem. (e.g. ec_ in my examples). Nagios could
> [rouilj 8/6] even provide a register function to allow modules to
> [rouilj 8/6] determine which prefixes are already registered so if
> [rouilj 8/6] Eric Charles' plugin tries to register ec_ but my event
> [rouilj 8/6] correlation plugin already has ec_ loaded, it gets an
> [rouilj 8/6] error message and gets to chose a new prefix. Heck nagios
> [rouilj 8/6] could even return a prefix code that is used by the
> [rouilj 8/6] module. There are many ways to handle the issue.
> [rouilj 8/6]
> [andreas 8/7] Not really, no, because user documentation has to work
> [andreas 8/7] too or users won't be able to use Eric Charles' module
> [andreas 8/7] and yours at the same time.
> 
> Yeah if you get a conflict like that, then the broker
> would have to be recompiled with a new prefix which
> could conceivably conflict again. However nagios
> wouldn't have to be recompiled. Then again I guess if
> you have to recompile the broker, you will need to have
> all the nagios code available anyway so...
> 
> [rouilj 8/3] Are there any plans to add something like:
> [rouilj 8/3]
> [rouilj 8/3]    add_new_custom_variable_property(SERVICE_OBJECT, "_ec_active_action",
>> \
> [rouilj 8/3]       func_parse_ec_active_action);
> [rouilj 8/3][...]
> [rouilj 8/3]    int add_property_value(*service_object, "_ec_active_action", void * v
>> alu
>> [rouilj 8/6] e)
> [rouilj 8/3]
> [rouilj 8/3]    void * get_property(*service_object, "_ec_active_action");
> [rouilj 8/3]
> [rouilj 8/3] to allow the brokers to register parsers for specific custom
> [rouilj 8/3] variables?
> [rouilj 8/3]
> [Andreas 8/3?] No, and in time for Nagios 4 I doubt we'll do that since it doesn't
> [Andreas 8/3?] really require a major version bump to be added and I just don't have
> [Andreas 8/3?] the time for it right now. Patches are ofcourse welcome, but please
> [Andreas 8/3?] make sure to base them on top of latest svn from sourceforge, or the
> [Andreas 8/3?] git repo at www.github.com/ageric/nagios
> [rouilj 8/6]
> [rouilj 8/6] Well changing the objects to provide a location in which to store this
> [rouilj 8/6] info does require a version bump since it changes the layout of all
> [rouilj 8/6] the object structures. But once that is done, they are more or less
> [rouilj 8/6] futureproof.
> [rouilj 8/6]
> [rouilj 8/6] So maybe for nagos 4 we can get a single pointer for
> [rouilj 8/6] broker_data. Since all pointers are the same size, defining the
> [rouilj 8/6] structure of the data and the api for accessing it can be done anytime
> [rouilj 8/6] in the 4.x series.
>>
>> No, but you can get an id in the object which you can then use to access
>> an indexed array in constant time for any kind(s) of data your module
>> wants to store *without* having to traverse a linked list.
> 
> Again, what size is that array?
> 

Whatever it has to be. The writer decides the datatype and the access
method. The reader has to know about that, ofcourse, but we can't get
away from that *anyway*.

>> In that case, looking up your extra data becomes as simple as
>>
>>    xd = extra_host_data[host->id];
>>
>> although you'll ofcourse have to set up the indexed array before
>> you can use it.
> 
> I would think the id should be something the broker is given by
> the nagios core and the broker stores so it's not dependent on
> the host object (assuming that's what host means in your
> example).
> 

It *is* dependant on the host object. The id is an id number *only*
for the host object.

> Something like:
> 
>    in broker init:
> 
>      static myid = get_this_brokers_id();
> 
> then myid can be used as:
> 
>     host_object->extra_data[myid]
> 
> which is a pointer to some struct/object/malloced memory region.
> 


Ah. Now I see what you're getting at. You're thinking backwards and
believing that each nagios object will have an array attached to it.
That's not the case, and that *would* have all the problems you're
thinking of. The arrays will be allocated by the writer modules and
be as large as necessary. The reader modules will access the data it
wants by issuing some (to-be-invented) calls to Nagios and then use
the object id's to look up the data in a few large arrays rather
than a module id to look it up in a lot of small ones. The reader
module is responsible for tracking pointers to data it wants or
needs.



> [rouilj 8/3] I envision the parsed properties (returned
> [rouilj 8/3] as a void *) are stored in a linked list
> [rouilj 8/3] or array in the object struct and simply
> [rouilj 8/3] store a propery name and a pointer to some
> [rouilj 8/3] blob returned by the
> [rouilj 8/3] func_parse... function. The blob is opaque
> [rouilj 8/3] to nagios and only has meaning to the
> [rouilj 8/3] event brokers.
> [rouilj 8/3]
> [rouilj 8/3] This way brokers (and other plugins too)
> [rouilj 8/3] could add new data to the internal objects
> [rouilj 8/3] without having to break abi compatibility.
> 
> [Andreas 8/3?] Well, no, they can't, because if several
> [Andreas 8/3?] modules use the same mechanism they'll
> [Andreas 8/3?] overwrite each others data or be unable
> [Andreas 8/3?] to find their own.  Unless you mean for
> [Andreas 8/3?] the objects to have some sort of marker
> [Andreas 8/3?] for each module, but that's just overly
> [Andreas 8/3?] complex.
>   
> [rouilj 8/6] Not a marker for each module, but a marker
> [rouilj 8/6] for each data bit. Consider (pseudo code):
> [rouilj 8/6]
> [rouilj 8/6]    struct extended_attribute {
> [rouilj 8/6]        string[10] identifier;
> [rouilj 8/6]        void * value;
> [rouilj 8/6]        extended_attribute * next;
> [rouilj 8/6]        extended_attribute * prev; /* if you want */
> [rouilj 8/6]    }
> [Andreas 8/7] Why a string? Do you really want to do X
> [Andreas 8/7] strcmp()'s for each object you're dealing
> [Andreas 8/7] with?
> 
> Well it's X strncmp's (or (*string == *lookupstring &&
> stringcmp(string,lookupstring)) for each attribute
> (tens to hundreds) not for each object (thousands to
> hundreds of thousands) but I see your point.
> 
> That string could easily be a uniquely assigned integer
> (id). See example below.
> 

Not sanely, since perfect hashing requires knowing in advance
what strings you will be hashing, or an inordinate amount of
runtime bruteforce-testing which is not guaranteed to come up
with a result.

> [rouilj 8/6] The extended_attributes would be a linked
> [rouilj 8/6] list made available in an internal nagios
> [rouilj 8/6] structure using:
> [rouilj 8/6]
> [rouilj 8/6]      service_object {
> [rouilj 8/6]         void * extended_attribute;
> [rouilj 8/6]         all other service object fields;
> [rouilj 8/6]      }
> [rouilj 8/6]
> [Andreas 8/7] That will break horribly. mod_gearman
> [Andreas 8/7] won't know (and sholdn't know) if the
> [Andreas 8/7] extended_attributes field is used by
> [Andreas 8/7] merlin already or if it's available, or
> [Andreas 8/7] where in the list of items the
> [Andreas 8/7] mod_gearman data resides, without looking
> [Andreas 8/7] at all of them (which can be ridiculously
> [Andreas 8/7] slow and is far more complex than it has
> [Andreas 8/7] to be).
> 
> Agreed, mod_gearman won't know or care. It will simply pass an
> opaque token to a nagios function that takes extended_attribute
> as an argument and returns the (struct extended_attribute).value
> to the caller. The opaque token could be the integer id assigned
> by nagios to the module for every piece of data that the module
> want's to store.
> 
> [rouilj 8/6] Nagios provides commands to add/remove/get pointer to items
> [rouilj 8/6] from the extended_attribute linked list. This keeps the data
> [rouilj 8/6] with the object easily available to all modules (all they need
> [rouilj 8/6] is the nagios strcture and the identifier and need to be able
> [rouilj 8/6] to understand the structure of the value). All the data is
> [rouilj 8/6] publically available in the service, host, .... structure.
> [rouilj 8/6]
> [rouilj 8/6] (The above could be augmented by a nagios supplied unique (for
> [rouilj 8/6] this running instance of nagios) number/handle that is used as
> [rouilj 8/6] part of the retreival process. I think I described such a
> [rouilj 8/6] mechanism when I was trying to get the external correlation
> [rouilj 8/6] mechanism into nagios 2 or 3.)
> 
> [rouilj 8/3] This also neatly (IMHO) works around the issues with
> [rouilj 8/3] serializing the object to disk as the custom variable
> [rouilj 8/3] mechanism can pass the info around to only those who are
> [rouilj 8/3] looking for it. In most other applications, the overhead of
> [rouilj 8/3] parsing the custom variable's value isn't a huge issue but
> [rouilj 8/3] in the event broker loop....
> [Andreas 8/3?]
> [Andreas 8/3?] In my mail to Janne Aho, I suggested adding an 'id' field to
> [Andreas 8/3?] all objects. That would allow modules to utilize indexed
> [Andreas 8/3?] arrays or large bitvectors to mark certain properties in
> [Andreas 8/3?] objects and look them up in O(1) time without having to
> [Andreas 8/3?] compute a hash and deal with collisions.
> [rouilj 8/6]
> [rouilj 8/6] But doesn't that require that the object be changed for
> [rouilj 8/6] each module so it has it's own array or vector?
> [rouilj 8/6]
> [Andreas 8/7] No. Each module gets to keep its own data and Nagios
> [Andreas 8/7] doesn't need to know anything at all about it (although
> [Andreas 8/7] it would be neat if it could issue a callback and get a
> [Andreas 8/7] key/value vector when printing status data).
> 
> In my original draft of the email that started this thread I had
> a serialize function that would do exactly that, but I left it
> out as it wasn't required t that part of the discussion.
> 
> But I agree nagios should not know about the data, merely provide a
> key/value store.
> 
> [Andreas 8/3?] If we couple that with a mechanism allowing one module to
> [Andreas 8/3?] obtain the pointers of another module, any module that wants
> [Andreas 8/3?] to know about another module (such as for example livestatus
> [Andreas 8/3?] wanting to know which mod_gearman worker executed a
> [Andreas 8/3?] particular check) can just grab the ancillary data of that
> [Andreas 8/3?] module and look it up in constant time.
> [Andreas 8/3?]
> [Andreas 8/3?] This means that either mod_gearman has to maintain a
> [Andreas 8/3?] key/value vector for livestatus to use (or some other common
> [Andreas 8/3?] form that all modules know about), or that livestatus has to
> [Andreas 8/3?] know about the other module's dataformat, but it's pretty
> [Andreas 8/3?] much the best we can do to allow constant time lookups of
> [Andreas 8/3?] ancillary data.
> [rouilj 8/6]
> [rouilj 8/6] Agreed if constant time lookup is required. I think that could
> [rouilj 8/6] work in my case. Would be ugly to write and annoying to
> [rouilj 8/6] troubleshoot/debug but it could be made to work and would
> [rouilj 8/6] provide the fastest lookup option.
> 
> [Andreas 8/7] How would it be annoying to troubleshoot
> [Andreas 8/7] something so moronically simple as
> [Andreas 8/7]
> [Andreas 8/7]   extra_data = some_pointer_array[hst->id];
> [Andreas 8/7]
> [Andreas 8/7] and why the hell would you *need* to
> [Andreas 8/7] debug something like that? Beyond
> [Andreas 8/7]
> [Andreas 8/7]   assert(hst->id >= 0 && hst->id <
> [Andreas 8/7]        some_pointer_array_size);
> [Andreas 8/7]
> [Andreas 8/7] there's absolutely *nothing* that can go
> [Andreas 8/7] wrong with code like that, if we assume
> [Andreas 8/7] the compiler is somewhat sane.
> 
> Say I have 10 things I need to represent including strings,
> integer values and floating point data. What does extra_data
> return to me? How do I tell it which of the 10 things I want?
> 

You get a struct from module X which has the fields you want.
You access whatever variables you want from those structs (yes,
you do need to know about that, or the modules have to make the
data available in such a way that reader modules doesn't have to
know about it at all).

> At that point I need more than just an arbitrary memory
> location.  I need extra_data be a (pointer to some)
> data structure that I create (say a linked list) so now
> the broker module writer needs to write a linked list
> library (or maybe the nagios core exposes one) or
> something else to store arbitrary key indexed data.
> 

Forget about linked list. They're muddling up your thinking and
have huge performance issues when dealing with any amount of
data.

> In my original scheme, nagios supplies a key/value
> store with a string key and arbitrary (pointer to)
> value. So the broker writer doesn't have to deal with
> the messyness of lookups, keeping a key/value store in
> his code etc. All that is placed in the realm of
> nagios. As long as they can create a simple type (or
> even a struct), they can store it and retrieve it from
> the host, service or whatever object.
> 

True that, but there will be *one* key lookup to get a compile-time
determined struct from the module.

> If nagios gets an upgrade in how it processes those
> things, every broker module gets the same upgrade and
> speed improvement.
> 

There can be no speed improvements to O(1) algorithms that require only
RAM accesses, unless you're talking about quantum computing with infinite
qubits. I hope you're not, because that's just silly.

> [rouilj 8/6] But I fail to see how that would keep the object abi
> [rouilj 8/6] static while allowing multiple modules to
> [rouilj 8/6] add/remove/change data.
> [Andreas 8/7]
> [Andreas 8/7] Because core nagios doesn't have to care one whit about
> [Andreas 8/7] what modules do with the id variable. It's *only* purpose
> [Andreas 8/7] is to allow modules to do constant-time lookups of
> [Andreas 8/7] whatever the module wants to track, without modules
> [Andreas 8/7] having to know *or care* about other modules' data.
> 
> Same thing happens with the linked list provided the
> keys are unique. However, having a generator in nagios
> that provides unique values for the keys would provide
> the same benefit. So in the broker init module I could
> do:
> 
>     static int ec_active_action = getid();
>     static int ec_passive_action = getid();
>     static int ec_active_prefix = getid();
>     static int ec_passive_prefix = getid();
>     static int ec_retry_count = getid();
> 
>     set(service_object, ec_active_action) = (int) parse_ec_active_action(get_custom_macro(_ec_active_action));
>     set(service_object, ec_active_prefix) = (char *) parse_ec_active_prefix(get_custom_macro(_ec_active_prefix));
> 

And why the hell would you *want* to?? It'll have horrible performance.

Seriously. There's just no talking to you when it comes to getting rid
of linked lists, so I'll just code two modules two different ways so
you can see how obvious it is that the way I propose to do things is
far superior.

If you still disagree with me, I urge you to study a computer science
book on algorithms. Steve Skiena's "The Algorithm Design Manual" is
really good.

So here we go... I'll start with your way and the code is unlikely to
compile since I'll be writing it in my email client. WR_module looks
at all host objects and determines if the id is prime, a power of two
or just "boring". RD_module uses that data to make sure all host
checks whose id is anything but "boring" never gets executed.

Your way:
--------------
in nagios:
------------
int register_ancillary_host_data(host *h, const char *id, void *ptr)
{
	ancillary_data *anc_ptr;
	anc_ptr = malloc(sizeof(*anc_ptr));
	anc_ptr->id = id;
	anc_ptr->data = ptr;
	anc_ptr->next = h->ancillary_data;
}

void *get_ancillary_host_data(host *h, const char *id)
{
	ancillary_data *ap;
	/* linear search. Stupid, but the best we can do since
	 * we expect quite few items anyway */
	for (ap = h->ancillary_data; ap; ap = ap->next)
		if (!strcmp(ap->id, id))
			return ap->data;
	return NULL;
}

------
in WR_mod:
------------------

#define ID_BORING 0
#define ID_POW2   1
#define ID_PRIME  2
unsigned int divisor(unsigned int val)
{
    unsigned int d;

    /* two is prime */
    if (val == 2)
        return 0;

    /* other even numbers are not */
    if ((val & 1) == 0)
        return 2;

    /* check every odd number up to sqrt(val) */
    for (d = 3; d * d < val; d += 2) {
        if (val % d == 0)
            return d;    /* evenly divisible */
    }

    return d * d == val ? d : 0;
}

int post_config_init(int callback, void *ds)
{
	host *h;
	int *ary;

	if (*(int *)ds != NEBCALLBACK_CREATE_ANCILLARY_DATA)
		return 0;
	ary = malloc(sizeof(int) * sched_info.total_hosts);
	for (h = host_list; h; h = h->next) {
		if (!divisor(h->id)) {
			register_host_data(h, "WR_mod", (void *)ID_PRIME);
		} else {
			/* 0 for boring. 1 for power of 2 */
			register_host_data(h, "WR_mod", (void *)!(h->id & h->id - 1));
		}
	}
}

int nebmodule_init(int flags, char *arg, nebmodule_handle *handle)
{
	neb_register_callback(NEBCALLBACK_PROCESS_DATA, handle, 0, post_config_init);
}


----
in RD_mod:
--------------
int block_primes_pow2(int callback, void *data)
{
	void *WR_mod_data;
	nebstruct_host_check_data *d = (nebstruct_host_check_data *)data;
	/* this causes a linear search, and one strcmp() per item, on scattered
	 * and fragmented memory, O(n) complexity, where n is registered items
	 * and a constant-cost of m, where m = strlen(id)
	 */
	WR_mod_data = get_ancillary_host_data(d->host_ptr, "WR_mod");
	if ((int)WR_mod_data) {
		return NEBERROR_CALLBACK_CANCEL;
	}
}

int nebmodule_init(int flags, char *arg, nebmodule_handle *handle)
{
	neb_register_callback(NEBCALLBACK_HOST_CHECK_DATA, handle, 0, block_primes_pow2);
}

-------------
my way:
-------------
in nagios:
-------------
void register_ancillary_data(nebmodule_handle *handle, void *data)
{
	handle->ancillary_data = data;
}

void *get_ancillary_data(const char *id)
{
	nebmodule_handle *mod;
	/* linear search, but happens once per config load */
	for (mod = nebmodule_list; mod; mod = mod->next) {
		if (!mod->id)
			continue;
		if (!strcmp(mod->id, id)
			return mod->data;
	}
	return NULL;
}

----
WR_mod:
----------------
static nebmodule_handle *g_handle;
int post_config_init(int callback, void *ds)
{
	host *h;
	int *ary;

	if (*(int *)ds != NEBCALLBACK_CREATE_ANCILLARY_DATA)
		return 0;
	ary = malloc(sizeof(int) * sched_info.total_hosts);
	for (h = host_list; h; h = h->next) {
		if (!divisor(h->id)) {
			ary[h->id] = ID_PRIME;
		} else {
			ary[h->id] = !(h->id & (h->id - 1));
		}
	}
	register_ancillary_data(handle, 
	return 0;
}

int nebmodule_init(int flags, char *arg, nebmodule_handle *handle)
{
	g_handle = handle;
	handle->id = "WR_mod"; /* probably set by macro at load-time, really */
	neb_register_callback(NEBCALLBACK_PROCESS_DATA, handle, 0, post_config_init);
}

------
in RD_mod:
-------
static int *WR_mod_ary;

int block_prime_pow2(int callback, void *data)
{
	nebstruct_host_check_data *d = (nebstruct_host_check_data *)data;
	/* lookup in constant time. */
	if (WR_mod_ary && WR_mod_ary[d->host_ptr->id])
		return NEBERROR_CALLBACK_CANCEL;
	return 0;
}

int post_config_init(void cb, void *ds)
{
	if (*(int *)ds != NEBCALLBACK_ANCILLARY_DATA_CREATED)
		return 0;
	/* linear search. Happens once in O(n) time, where n is number of
	 * loaded modules with data to dish out
	 */
	WR_mod_ary = get_ancillary_data("WR_mod");
	if (!WR_mod_ary)
		return 0;
	neb_register_callback(NEBCALLBACK_HOST_CHECK_DATA, handle, 0, block_prime_pow2);
	return 0;
}

int nebmodule_init(int flags, char *arg, nebmodule_handle *handle)
{
	neb_register_callback(NEBCALLBACK_PROCESS_DATA, handle, 1, post_config_init);
	return 0;
}

-----------

Now... if you want me to accept your way of doing things, you'll have
to code up your way and prove to me that it performs better than my
way or provides other benefits that my way does not. That's a pretty
tough challenge btw, since I'm very, very certain you can't succeed.

-- 
Andreas Ericsson                   andreas.ericsson at op5.se
OP5 AB                             www.op5.se
Tel: +46 8-230225                  Fax: +46 8-230231

Considering the successes of the wars on alcohol, poverty, drugs and
terror, I think we should give some serious thought to declaring war
on peace.

------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and 
threat landscape has changed and how IT managers can respond. Discussions 
will include endpoint security, mobile security and the latest in malware 
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/




More information about the Developers mailing list