R: bischeck suddenly stops working

Anders Håål anders.haal at ingby.com
Wed Aug 9 17:15:51 CEST 2017


Okay and keep us updated with your findings.

On 08/09/2017 05:10 PM, Francesco Toffoli wrote:
> Hi Anders,
> I modified the log configuration as you suggested, but after stopping
> and starting the bischeckd daemon I didn't notice any particular
> warning or critical messages. So I decided to wait for a crash and
> then proceed with the log analysis. I'll keep you updated.
> Thanks
>
>
> Sent from a Samsung Galaxy smartphone.
>
> -------- Original message --------
> From: Anders Håål <anders.haal at ingby.com>
> Date: 09/08/17 08:16 (GMT+01:00)
> To: bischeck-users at monitoring-lists.org
> Subject: Re: bischeck suddenly stops working
>
> Francesco - any progress on the issue?
>
>
> On 07/26/2017 05:52 PM, Anders Håål wrote:
>>
>> Thanks for the feedback.
>>
>> When bischeck "stops working", it would be interesting to know whether
>> anything gets logged after it "stops", and also what is logged when
>> you do a restart. I suggest you do a stop first and see what is
>> logged before starting again.
>>
>> I would suggest that you change the log level in logback.xml for all
>> packages:
>>
>>  <root level="INFO">
>>     <appender-ref ref="bischeck"/>
>>   </root>
>>
>> To avoid duplicates you should also add additivity="false" to the
>> other loggers. The logback.xml below is based on the standard one. I
>> have not tested it myself, so try it in your test environment first
>> and, if it looks good, deploy it in production adapted to your
>> specific customization of paths, etc.
>>
>>
>> logback.xml:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>>
>> <configuration>
>>   <jmxConfigurator />
>>   <appender name="bischeck" class="ch.qos.logback.core.rolling.RollingFileAppender">
>>     <!-- See also http://logback.qos.ch/manual/appenders.html#RollingFileAppender -->
>>     <File>/var/tmp/bischeck.log</File>
>>     <encoder>
>>       <pattern>%d{yyyy-MM-dd HH:mm:ss.SSS,Europe/Stockholm} ; %p ; %t ; %c ; %m%ex%n</pattern>
>>     </encoder>
>>
>>     <rollingPolicy class="ch.qos.logback.core.rolling.FixedWindowRollingPolicy">
>>       <maxIndex>3</maxIndex>
>>       <FileNamePattern>/var/tmp/bischeck.log.%i</FileNamePattern>
>>     </rollingPolicy>
>>
>>     <triggeringPolicy class="ch.qos.logback.core.rolling.SizeBasedTriggeringPolicy">
>>       <MaxFileSize>1000KB</MaxFileSize>
>>     </triggeringPolicy>
>>
>>   </appender>
>>
>>   <logger name="com.ingby" level="INFO" additivity="false">
>>     <appender-ref ref="bischeck"/>
>>   </logger>
>>
>>
>>   <logger name="com.ingby.socbox.bischeck.configuration.CachePurgeJob" level="DEBUG" additivity="false">
>>     <appender-ref ref="bischeck"/>
>>   </logger>
>>
>>   <logger name="com.ingby.socbox.bischeck.cache.provider.redis" level="DEBUG" additivity="false">
>>     <appender-ref ref="bischeck"/>
>>   </logger>
>>
>>
>>   <logger name="org.quartz" level="INFO" additivity="false">
>>     <appender-ref ref="bischeck"/>
>>   </logger>
>>
>>   <root level="WARN">
>>     <appender-ref ref="bischeck"/>
>>   </root>
>>
>> </configuration>
>>
>>
>> The root section ensures that everything logged at WARN or ERROR by
>> any Java package goes to the bischeck appender.
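>>
>> As a minimal sketch of why additivity="false" matters (an
>> illustration only, not taken from the bischeck distribution; the
>> simplified appender and pattern are assumptions): logback appenders
>> are additive by default, so when the same appender is attached to
>> both a logger and root, each event from that logger is written twice.
>>
>> ```xml
>> <!-- Illustration: additivity left at its default (true) -->
>> <configuration>
>>   <appender name="bischeck" class="ch.qos.logback.core.FileAppender">
>>     <file>/var/tmp/bischeck.log</file>
>>     <encoder>
>>       <pattern>%d ; %p ; %c ; %m%n</pattern>
>>     </encoder>
>>   </appender>
>>
>>   <!-- Events from com.ingby are written to "bischeck" here ... -->
>>   <logger name="com.ingby" level="INFO">
>>     <appender-ref ref="bischeck"/>
>>   </logger>
>>
>>   <!-- ... and once more here, because they also propagate to root -->
>>   <root level="WARN">
>>     <appender-ref ref="bischeck"/>
>>   </root>
>> </configuration>
>> ```
>>
>> Note that the root level does not filter events that were already
>> accepted by the com.ingby logger, so even INFO lines would be
>> duplicated. Adding additivity="false" to the logger stops the
>> propagation and leaves one line per event.
>>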
>> Regards
>> Anders
>>
>> On 07/25/2017 09:55 AM, Francesco Giuseppe Toffoli wrote:
>>>
>>> Hi Anders,
>>> thanks for your reply. I'll answer your various questions:
>>>
>>> (1) the java version is:
>>>
>>> openjdk version "1.8.0_91"
>>> OpenJDK Runtime Environment (build 1.8.0_91-b14)
>>> OpenJDK 64-Bit Server VM (build 25.91-b14, mixed mode)
>>>
>>> and has not been updated recently. In our test environment (where
>>> the problem does not occur) the version is nearly the same (1.8.0_121).
>>> The OS (CentOS release 6.6) has not been updated.
>>>
>>> (2) Redis has not been updated recently (redis 2.8.23). At the
>>> moment we have about 13,000 keys in use.
>>>
>>> (3) We add checks regularly, roughly weekly. The issue started to
>>> occur some months ago, but it can happen that everything is OK for
>>> 2 or 3 weeks and then we have several crashes in one week. I'm not
>>> inclined to blame the new checks, also because the test server is
>>> aligned with the production one.
>>>
>>>
>>> (5) Yes, the restart is done via '/etc/init.d/bischeckd restart' and
>>> it solves the issue. Physical memory on the server is always OK, so I
>>> don't think the JVM is running out of memory.
>>>
>>> In the Bischeck logs I didn't notice any errors. However, at the next
>>> crash I'll try to have a deeper look at them.
>>> Are there any other logs I could look at?
>>>
>>> Thanks,
>>> Francesco
>>>
>>>
>>>
>>>
>>>
>>> On 24/07/2017 21:57, Anders Håål wrote:
>>>>
>>>> Hi Giuseppe,
>>>>
>>>> It sounds strange that it just stopped working after a long period
>>>> of stability unless something has changed:
>>>>
>>>> - Has anything changed on the server you run bischeck on: OS, JDK
>>>> version, ...?
>>>>
>>>> - Has the redis version been updated? Any change in its configuration?
>>>>
>>>> - Have you added any new bischeck checks or changed something in the
>>>> configuration?
>>>>
>>>> - Anything else you can think of that may have changed?
>>>>
>>>> When you say restarting, is it the normal /etc/init.d/bischeckd
>>>> restart that fixes the problem? The reason I ask is that the script
>>>> just does a kill with the TERM signal. If the JVM were in an
>>>> out-of-memory situation that may not be enough, but I guess you
>>>> would have seen that in the log. Are you sure you do not have any
>>>> ERROR or WARN entries in the log?
>>>>
>>>> /Anders
>>>>
>>>>
>>>>
>>>> On 07/24/2017 02:14 PM, Francesco Giuseppe Toffoli wrote:
>>>>>
>>>>> Hi,
>>>>> we are experiencing a critical problem with Bischeck. For a
>>>>> couple of months it has sometimes suddenly stopped working: the
>>>>> daemon /etc/init.d/bischeckd is running but no check results are
>>>>> sent to Nagios. Restarting the bischeck daemon fixes the issue.
>>>>> Unfortunately we can't find any clue about the root cause in the
>>>>> bischeck logs, not even with the DEBUG logging level enabled. The
>>>>> Redis database seems to be working properly, and no increase in
>>>>> memory or CPU usage is reported on the server hosting bischeck
>>>>> while the issue occurs.
>>>>>
>>>>> Do you have any suggestions on how to investigate this in more depth?
>>>>>
>>>>> Regards,
>>>>> Francesco
>>>>>
>>>>> -- 
>>>>>
>>>>> Francesco Giuseppe Toffoli
>>>>> Monitoring Engineer
>>>>>
>>>>> GSE Department
>>>>>
>>>>> Tel: +39 01127387488
>>>>>
>>>>> Mobile: +39 349.800.60.35
>>>>> Email: ftoffoli at skylogic.it
>>>>>
>>>>> Skylogic S.p.A.
>>>>> Strada Pianezza, 289
>>>>> 10151 Torino, Italy
>>>>>
>>>>>
>>>>>
>>>>> This message contains confidential information and is intended 
>>>>> only for the individual named. If you are not the named addressee 
>>>>> you should not disseminate, distribute or copy this e-mail. Please 
>>>>> notify the sender immediately by e-mail if you have received this 
>>>>> e-mail by mistake and delete this e-mail from your system. E-mail 
>>>>> transmission cannot be guaranteed to be secure or error-free as 
>>>>> information could be intercepted, corrupted, lost, destroyed, 
>>>>> arrive late or incomplete, or contain viruses. The sender 
>>>>> therefore does not accept liability for any errors or omissions in 
>>>>> the contents of this message, which arise as a result of e-mail 
>>>>> transmission. If verification is required please request a 
>>>>> hard-copy version. Please note that any views or opinions 
>>>>> presented in this email are solely those of the author and do not 
>>>>> necessarily represent those of the Company.
>>>>> No employee or agent is authorized to conclude any binding 
>>>>> agreement on behalf of this Company nor, through this latter, any 
>>>>> of the Eutelsat Communication group with another party by email 
>>>>> without express written confirmation by a duly authorized officer 
>>>>> of the Company. The list of duly authorized officers and the scope 
>>>>> of their powers is published on the Trade Register according to 
>>>>> the national law of each affiliate.
>>>>
>>>> -- 
>>>>
>>>>
>>>> Ingby<http://www.ingby.com>
>>>>
>>>> bischeck - dynamic and adaptive monitoring for Nagios<http://www.bischeck.org>
>>>>
>>>> anders.haal at ingby.com<mailto:anders.haal at ingby.com>
>>>>
>>>> Software through engineering creativity and competence
>>>>
>>>> Ingenjörsbyn
>>>> Box 531
>>>> 101 30 Stockholm
>>>> Sweden
>>>> www.ingby.com  <http://www.ingby.com/>
>>>> Mobile: +46 70 575 35 46
>>>> Phone: +46 75 75 75 090
>>>> Fax:  +46 75 75 75 091
>>>
>>
>

