bischeck suddenly stops working

Anders Håål anders.haal at ingby.com
Wed Jul 26 17:52:00 CEST 2017


Thanks for the feedback.

When bischeck "stops working", it would be interesting to know whether 
anything gets logged after it "stops", and also what is logged when you 
restart. I suggest you do a stop first and check what is logged before 
starting it again.

I would suggest that you change the log level in logback.xml for all 
packages:

  <root level="INFO">
     <appender-ref ref="bischeck"/>
   </root>

To avoid duplicate log entries you should also add additivity="false" to 
the other loggers. The file below is based on the standard logback.xml. 
I have not tested it myself, so try it in your test environment first 
and, if it looks good, deploy it in production with your own 
customization of paths, etc.
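
As a minimal sketch of why additivity matters: the fragment below is 
illustration only. The package name com.example and the simplified 
FileAppender are assumptions made for the example and are not part of the 
standard bischeck logback.xml.

```xml
<configuration>
  <!-- Simplified appender, assumed for this sketch only -->
  <appender name="bischeck" class="ch.qos.logback.core.FileAppender">
    <file>/var/tmp/bischeck.log</file>
    <encoder>
      <pattern>%d ; %p ; %c ; %m%n</pattern>
    </encoder>
  </appender>

  <!-- additivity defaults to true: an INFO event from com.example
       (hypothetical package) is written by this appender-ref and then
       again by the root appender-ref, so it appears twice in the file -->
  <logger name="com.example" level="INFO">
    <appender-ref ref="bischeck"/>
  </logger>

  <!-- additivity="false": the event is not passed on to root,
       so it is written only once -->
  <logger name="com.ingby" level="INFO" additivity="false">
    <appender-ref ref="bischeck"/>
  </logger>

  <root level="WARN">
    <appender-ref ref="bischeck"/>
  </root>
</configuration>
```

The root level does not filter events that arrive from a child logger, 
which is why the first logger would produce duplicates even at INFO.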


logback.xml:

<?xml version="1.0" encoding="UTF-8"?>

<configuration>
   <jmxConfigurator />
   <appender name="bischeck" class="ch.qos.logback.core.rolling.RollingFileAppender">
     <!-- See also http://logback.qos.ch/manual/appenders.html#RollingFileAppender -->
     <File>/var/tmp/bischeck.log</File>
     <encoder>
       <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
     </encoder>

     <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
       <maxIndex>3</maxIndex>
       <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
     </rollingPolicy>

     <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
       <MaxFileSize>1000KB</MaxFileSize>
     </triggeringPolicy>

   </appender>

   <logger name="com.ingby" level="INFO" additivity="false">
     <appender-ref ref="bischeck"/>
   </logger>


   <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob" level="DEBUG" additivity="false">
     <appender-ref ref="bischeck"/>
   </logger>

   <logger name="com.ingby.socbox.bischeck.cache.provider.redis" level="DEBUG" additivity="false">
     <appender-ref ref="bischeck"/>
   </logger>


   <logger name="org.quartz" level="INFO" additivity="false">
     <appender-ref ref="bischeck"/>
   </logger>

   <root level="WARN">
     <appender-ref ref="bischeck"/>
   </root>

</configuration>


The root section ensures that anything logged at WARN or ERROR by any 
other Java package also goes to the bischeck appender.

Regards
Anders

On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:
>
> Hi Anders,
> thanks for your reply. Here are my answers to your various questions:
>
> (1) the java version is:
>
> openjdk version "1.8.0_91"
> OpenJDK Runtime Environment (build 1.8.0_91-b14)
> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>
> and has not been updated recently. In our test environment (where the 
> problem does not occur) the version is nearly the same (1.8.0_121).
> The OS has not been updated either (CentOS release 6.6).
>
> (2) Redis has not been updated recently (redis 2.8.23). At the moment 
> we have roughly 13,000 keys in use.
>
> (3) We add checks regularly, roughly weekly. The issue started to occur 
> some months ago, but everything can be fine for 2 or 3 weeks and then 
> we have several crashes in one week. I'm not inclined to blame the new 
> checks, not least because the test server is kept aligned with the 
> production one.
>
>
> (5) Yes, the restart is done via '/etc/init.d/bischeckd restart' and 
> it solves the issue. Physical memory on the server is always OK, so I 
> don't think the JVM is running out of memory.
>
> I didn't notice any errors in the Bischeck logs. However, at the next 
> crash I'll take a closer look at them.
> Are there any other logs I could look at?
>
> Thanks,
> Francesco
>
>
>
>
>
> On 24/07/2017 21:57, Anders Håål wrote:
>>
>> Hi Giuseppe,
>>
>> It sounds strange that it just stopped working after a long time of 
>> stability unless something has changed:
>>
>> - Has anything changed on the server you run bischeck on: OS, JDK 
>> version, ...?
>>
>> - Was Redis updated? Any change in its configuration?
>>
>> - Have you added any new bischeck checks or changed something in the 
>> configuration?
>>
>> - Anything else you can think of that may have changed?
>>
>> When you say restarting, is it the normal /etc/init.d/bischeckd 
>> restart that fixes the problem? The reason I ask is that the script 
>> just does a kill with the TERM signal. If the JVM were in an 
>> out-of-memory situation that might not be enough, but I guess you 
>> would have seen that in the log. Are you sure you do not have any 
>> ERROR or WARN entries in the log?
>>
>> /Anders
>>
>>
>>
>> On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:
>>>
>>> Hi,
>>> we are experiencing a critical problem with Bischeck. For a couple 
>>> of months now it has occasionally stopped working without warning: 
>>> the daemon /etc/init.d/bischeckd is running but no check results are 
>>> sent to Nagios. Restarting the bischeck daemon fixes the issue.
>>> Unfortunately we can't find any clue about the root cause in the 
>>> bischeck logs, not even with the DEBUG logging level enabled. The 
>>> Redis database seems to be working properly, and no increase in 
>>> memory or CPU usage is reported on the server hosting bischeck while 
>>> the issue occurs.
>>>
>>> Do you have any suggestions on how to investigate this in more depth?
>>>
>>> Regards,
>>> Francesco
>>>
>>> -- 
>>>
>>> Francesco Giuseppe Toffoli
>>> Monitoring Engineer
>>>
>>> GSE Department
>>>
>>> Tel: +39 01127387488
>>>
>>> Mobile: +39 349.800.60.35
>>> Email: ftoffoli at skylogic.it
>>>
>>> Skylogic S. p. A.
>>> Strada Pianezza, 289
>>> 10151 Torino, Italy
>>>
>>>
>>>
>>> This message contains confidential information and is intended only 
>>> for the individual named. If you are not the named addressee you 
>>> should not disseminate, distribute or copy this e-mail. Please 
>>> notify the sender immediately by e-mail if you have received this 
>>> e-mail by mistake and delete this e-mail from your system. E-mail 
>>> transmission cannot be guaranteed to be secure or error-free as 
>>> information could be intercepted, corrupted, lost, destroyed, arrive 
>>> late or incomplete, or contain viruses. The sender therefore does 
>>> not accept liability for any errors or omissions in the contents of 
>>> this message, which arise as a result of e-mail transmission. If 
>>> verification is required please request a hard-copy version. Please 
>>> note that any views or opinions presented in this email are solely 
>>> those of the author and do not necessarily represent those of the 
>>> Company.
>>> No employee or agent is authorized to conclude any binding agreement 
>>> on behalf of this Company nor, through this latter, any of the 
>>> Eutelsat Communication group with another party by email without 
>>> express written confirmation by a duly authorized officer of the 
>>> Company. The list of duly authorized officers and the scope of their 
>>> powers is published on the Trade Register according to the national 
>>> law of each affiliate.
>>
>> -- 
>>
>>
>> Ingby<http://www.ingby.com>
>>
>> bischeck - dynamic and adaptive monitoring for Nagios<http://www.bischeck.org>
>>
>> anders.haal at ingby.com<mailto:anders.haal at ingby.com>
>>
>> Mjukvara genom ingenjörsmässig kreativitet och kompetens
>>
>> Ingenjörsbyn
>> Box 531
>> 101 30 Stockholm
>> Sweden
>> www.ingby.com  <http://www.ingby.com/>
>> Mobil: +46 70 575 35 46
>> Tele: +46 75 75 75 090
>> Fax:  +46 75 75 75 091
>

