[RFC] - NDOUtils duplicate lines problem

Michael Friedrich michael.friedrich at univie.ac.at
Mon Nov 16 22:56:50 CET 2009


Hi,

Hendrik Baecker wrote:
> a few days ago Øyvind Nordang, Julien Mathis and  Michael Friedrich,
> discussed about duplicate rows in the servicecheck table.
> Michael dropped me a few lines off-list showing up the problem exists
> in some more tables.
I wrote that here too but noone replied on that. I did a little research 
where the faulty patch was coming from but I could only see the initial 
commit by Ethan on that.
>
> The underlying problem:
> MySQL queries like
>     "insert into... on duplicate key update..."
> needs primary and/or unique keys to be hit by the query to trigger the
> UPDATE instead of the INSERT.
> Most of the relevant keys for those queries are actually primary nor
> unique keys, they are just KEYs.
Jep. Just as the MySQL manual points out.
>
> So we have(98%) / might have (2%) duplicate entries in the following
> tables:
>
> nagios_servicechecks
> nagios_systemcommands
> nagios_timedeventqueue
> nagios_timedevents
Those users who are using the 2 years old 1.4b7 are not affected - the 
commit was after that release. So if they read this, do not upgrade and 
wait for the next version (or do not apply the unique key changes).
>
> Now to the problem from the point of development: I see no good
> solution to handle this.
Well regarding the fact that NDOUtils doesn't seem to come out of beta, 
you might just change the db scheme using the unique keys again.
>
> One part of the solution is to change the keys to define the mandatory
> UNIQUE keys, but this can't be done cleanly on existing data since you
> can't define some UNIQUE keys on rows that would be duplicates.
>
> Solutions to be discussed:
>
> 1. Big-Bang-Solution
> Define the user have to truncate above tables to be able to set the
> right keys.
> This will result in data loss.
You might be able to write a query which extracts duplicated rows on the 
unique constraints you want to redefine (group by, order by) - and then 
redo the 2x on duplicate key insert or update statements. But this is 
only a solution for people who need exactly those tables for reporting. 
But if you want to learn procedures this would be the right place for 
starting correcting this *pardon* big fuckup.
>
> 2.
> It might be possible to write some nifty upgrade scripts to dump the
> data out of the database, set the unique keys and try to re-import the
> data with some fault tolerance.
> Anyone out there with some experience on this level?
> Contra:
> Depending on the amount of data, such a job might be: a long runner
> (time); complicated; not realy error resistant...
I would not recommend that on a production server since the table will 
be locked then and ndo2db blocks and Nagios becomes a dead daemon from 
the blocking neb module.
>
> What thinks this community about the cost-benefit ratio?
Regarding the blocking NEB module i would recommend data backup and 
truncate table.
>
> IMO:
> nagios_timedevent* table are not so important and data loss is irrelevant.
> nagios_systemcommands might be just more interesting for someone
> but nagios_servicechecks seems to me like a treasure of gold, data
> loss in here might be realy painful
systemcommands are issued e.g. if perfdata is enabled. servicechecks 
need to be re-organised in case. but using 2 backups - one to work on 
and the other as is.

>
> Any comments?
> May be someone is able to adress some voodoo data repair query/script
> to eliminate those duplicates before key changing?
Some kind of the following script (best practical would be a perl script)

0. create a new table after servicechecks scheme but apply the drop 
key/add unique key statement
1. query to get all the data: select * from icinga_servicechecks order 
by instance_id, service_object_id, start_time, start_time_usec;
2. step through and fetch odd and even results, check if constraints 
from above are the same. if not continue;
3. all data from the first row should be kept, only the ones to be 
updated are:
check_type, current_check_attempt, max_check_attempts, state, 
state_type, start_time, start_time_usec, end_time, end_time_usec, 
timeout='%d', early_timeout, execution_time, latency, return_code, 
output, long_output, perfdata
4. build a nice insert query with those values and insert into a 
temporay table
5. step further through all of the entries (will take very very long on 
bigger databases)
6. check the new table for duplicate entries issueing
7. drink a beer
8. truncate the servicechecks table, dump the new temp table and import that
9. apply drop key / add unique key, should not fail
10. drink more beer and start *do2db again
11. use Oracle instead of MySQL ;-)
>
>
> PS
> For the interested one, the following key changes should be done:
If you are on 1.4b8/9 they will fail with the ADD UNIQUE KEY statement 
each time :-)

Kind regards,
Michael


------------------------------------------------------------------------------
Let Crystal Reports handle the reporting - Free Crystal Reports 2008 30-Day 
trial. Simplify your report design, integration and deployment - and focus on 
what you do best, core application coding. Discover what's new with
Crystal Reports now.  http://p.sf.net/sfu/bobj-july




More information about the Developers mailing list