Problem
You find log messages in your Checkmk log files (~/var/log/*.log)
and do not understand at first sight what they tell you.
Solution
The following is a listing of common messages in the above-mentioned log files and what can be learned from them.
Preface
All log entries marked with [log], like the following, come from the Python application.
2021-09-15 00:00:45 [4] [main] [RRD helper 2205] [log] Error creating RRD for cmc_single;$HOST;$SERVICE;;count;0: /opt/omd/sites/$SITE/var/check_mk/rrd/$HOST/_HOST_.rrd: illegal attempt to update using time 1631656844 when last update time is 1631656844
The main thread of the CMC sends the command to the RRD helper process, which answers with an error line. The error is therefore related to the RRD daemon. Keep this in mind when troubleshooting these log messages.
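To make the structure of such a line easier to see, the following minimal Python sketch splits an entry into its timestamp, its bracketed tags, and the message text. The layout is inferred only from the single example line above, and the function name `parse_cmc_line` and the returned keys are hypothetical choices for this illustration, not part of Checkmk.

```python
import re

# Assumed layout, inferred from the example line above:
# "<date> <time> [tag] [tag] ... <message>"
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<tags>(?:\[[^\]]*\] ?)+)"
    r"(?P<message>.*)$"
)
TAG_RE = re.compile(r"\[([^\]]*)\]")


def parse_cmc_line(line):
    """Split one log line into timestamp, bracketed tags, and message.

    Returns None if the line does not follow the assumed layout.
    """
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    tags = TAG_RE.findall(match.group("tags"))
    return {
        "timestamp": match.group("timestamp"),
        "tags": tags,
        # Entries carrying the [log] tag come from the Python application.
        "from_python": "log" in tags,
        "message": match.group("message"),
    }
```

Applied to the example entry, the tags list shows the path the message took (main thread, then the RRD helper), and the `from_python` flag reflects the presence of the [log] tag.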
To analyze the internal CMC threads, you can use pstree.
cmc.log
Message | Description |
---|---|
[client $NUMBER] Polling failed: Bad file descriptor | When using the Livestatus proxy, these messages are expected and can be disregarded. |
[client $NUMBER] Polling failed: Connection timed out<br>[client $NUMBER] error: client connection terminated: timeout | These messages occur when the connection to another Livestatus daemon or proxy times out. If they occur sporadically, you can disregard them. If they occur frequently, you will see errors in the web interface indicating that there is an issue with a certain site. |
[generic pool] [helper $NUMBER] killed by signal 1 | The generic helper did not finish in a timely manner or misbehaved in another way, so it was killed by the core. |
[helper $NUMBER] [log] Error in PIGGYBACK fetcher: MKTimeout('Fetcher for host "$HOST" timed out after 60 seconds') | The fetcher did not receive an answer from the monitored system and timed out. If this occurs repeatedly, the issue becomes visible in the web interface, because the host reports that it is not receiving agent data. |
[helper $NUMBER] [log] Error in SNMP fetcher: ValueError("invalid literal for int() with base 10: ''") | The monitored device probably returned invalid SNMP output. You should have received a crash report; please upload it. |
[generic pool] [helper $NUMBER] killed by signal 11 | See Werk 10130: https://checkmk.de/check_mk-werks.php?werk_id=10130 |
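
Whether a message occurs sporadically or frequently is easier to judge with counts. The following is a minimal Python sketch, not an official Checkmk tool, that counts the messages from the table per hour. The patterns are derived from the Message column with $NUMBER matched as digits; the labels, the function name `count_known_messages`, and the assumption that every line starts with a "YYYY-MM-DD HH:MM:SS" timestamp (as in the Preface example) are choices made for this illustration.

```python
import re
from collections import Counter

# Patterns derived from the Message column above; $NUMBER is matched as \d+.
# The dictionary keys are hypothetical labels chosen for this sketch.
KNOWN_MESSAGES = {
    "polling_bad_file_descriptor": re.compile(
        r"\[client \d+\] Polling failed: Bad file descriptor"),
    "polling_timed_out": re.compile(
        r"\[client \d+\] Polling failed: Connection timed out"),
    "client_connection_timeout": re.compile(
        r"\[client \d+\] error: client connection terminated: timeout"),
    # \b keeps "signal 1" from also matching "signal 11".
    "helper_killed_signal_1": re.compile(
        r"\[helper \d+\] killed by signal 1\b"),
    "helper_killed_signal_11": re.compile(
        r"\[helper \d+\] killed by signal 11\b"),
    "piggyback_fetcher_timeout": re.compile(
        r"Error in PIGGYBACK fetcher: MKTimeout"),
    "snmp_fetcher_value_error": re.compile(
        r"Error in SNMP fetcher: ValueError"),
}


def count_known_messages(lines):
    """Count known messages per hour.

    Returns a Counter keyed by (hour, label), where hour is the
    "YYYY-MM-DD HH" prefix of the line.
    """
    counts = Counter()
    for line in lines:
        hour = line[:13]  # assumes the timestamp layout shown in the Preface
        for label, pattern in KNOWN_MESSAGES.items():
            if pattern.search(line):
                counts[(hour, label)] += 1
                break  # each line is counted at most once
    return counts
```

The function accepts any iterable of lines, so as site user you could pass it an open file handle for ~/var/log/cmc.log. A handful of timeouts spread over a day would then stand out clearly from dozens within a single hour.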