Debugging the Checkmk Micro Core (CMC)

This manual shows a few tools for debugging the CMC when it crashes or hangs.

LAST TESTED ON CHECKMK 2.2.0P1

Before you delve into low-level debugging of a CMC that is running but not working (and produces no stack trace), please check the "Master Control" snap-in in the sidebar first.

If the Service Checks and Host Checks are disabled, that might be the reason for your problem.
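
The same two switches can also be read on the command line. The following is a minimal sketch, assuming the site's Livestatus client lq is available and that the status table exposes the columns execute_host_checks and execute_service_checks. The helper name report_check_switches is made up for illustration.

```shell
# Hypothetical helper: interpret one line of Livestatus output of the form
# "<execute_host_checks>;<execute_service_checks>", for example "1;0".
report_check_switches() {
    local host_checks service_checks
    host_checks=${1%%;*}      # text before the first semicolon
    service_checks=${1#*;}    # text after the first semicolon
    [ "$host_checks" = "1" ] || echo "host checks are disabled"
    [ "$service_checks" = "1" ] || echo "service checks are disabled"
    return 0
}

# Assumed usage as site user (not executed here):
#   report_check_switches "$(lq "GET status\nColumns: execute_host_checks execute_service_checks")"
```

If the helper prints nothing, both switches are on and the cause lies elsewhere.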

Analyze CMC core

strace

You can use strace to track the CMC process when you face any issue:

root@linux:~# strace -o cmc-strace.log -p $(cat ~mysite/tmp/run/cmc.pid)
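
Since the CMC runs several threads, a trace is often easier to read when strace follows all of them (-f) and prefixes each call with a timestamp (-tt). The sketch below first validates the PID file, so strace is not started with an empty or stale value. The helper name read_cmc_pid is hypothetical.

```shell
# Hypothetical helper: print the PID stored in a pid file, or fail if the
# file is missing or does not contain a plain number.
read_cmc_pid() {
    local pidfile="$1" pid
    [ -r "$pidfile" ] || { echo "cannot read $pidfile" >&2; return 1; }
    pid=$(tr -d '[:space:]' < "$pidfile")
    case "$pid" in
        ''|*[!0-9]*) echo "no valid PID in $pidfile" >&2; return 1 ;;
    esac
    echo "$pid"
}

# Assumed usage as root (not executed here):
#   pid=$(read_cmc_pid /omd/sites/mysite/tmp/run/cmc.pid) &&
#       strace -f -tt -o cmc-strace.log -p "$pid"
```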

valgrind

You can use valgrind to start the CMC in the debug mode. Here you will get a full stack trace. If valgrind is unavailable on your system, install it or run the CMC only with the -g option.

root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ valgrind --num-callers=30 cmc -g

or

root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ cmc -g
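
The choice between the two variants can be scripted. This is only a sketch: the helper name cmc_debug_command and the log file name in the usage comment are assumptions, not part of Checkmk.

```shell
# Hypothetical helper: print the command line to start the CMC in debug mode,
# preferring valgrind when it is installed.
cmc_debug_command() {
    if command -v valgrind >/dev/null 2>&1; then
        echo "valgrind --num-callers=30 cmc -g"
    else
        echo "cmc -g"
    fi
}

# Assumed usage as site user, after "omd stop cmc" (not executed here):
#   $(cmc_debug_command) 2>&1 | tee cmc-debug.log
```

Piping through tee keeps the stack trace on the terminal and in a file that can be attached to a ticket later.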

gdb

With gdb, you can analyze the core dump if Checkmk has created one. Note: Checkmk only creates core dumps if you enable them in the global settings.

With the r (run) command, you can re-run the CMC inside gdb to analyze it.

root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=<PATH/TO/COREDUMP>
(gdb) r

Frozen CMC

When the CMC seems to freeze and nothing happens, please run this command before restarting the CMC:

root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace'

Or to write that to a file:

root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace' |& tee /home/mylinuxuser/Downloads/cmccrash/gdb.txt


Another option to collect more traces is to run gdb in a loop (60 runs, 5 seconds apart, so about 5 minutes) and write the output to a file:

root@linux:~# for iter in {1..60}; do
printf "\nrun %i\n\n" $iter
gdb -p "$(cat "/omd/sites/mysite/tmp/run/cmc.pid")" --batch -ex 'set pagination off' -ex 'thread apply all backtrace' || true
sleep 5
done |& tee /home/mylinuxuser/Downloads/gdb.txt

Analyze coredump file

Core dump creation is disabled by default. You can enable it via Setup > Global settings > Monitoring core > Enable core dumps.

After a crash of the CMC, a core dump is written to ~/var/check_mk/core/.
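
When several dumps have piled up, the most recent one is usually the one of interest. A minimal sketch for picking it, assuming the dumps are stored as plain files directly in that directory; the helper name newest_core is hypothetical.

```shell
# Hypothetical helper: print the most recently modified file in a core dump
# directory, or fail if the directory is missing or empty.
newest_core() {
    local dir="$1" newest
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    echo "$dir/$newest"
}

# Assumed usage as site user (not executed here):
#   newest_core ~/var/check_mk/core
```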

gdb

With gdb, you can load the core dump together with the cmc binary and inspect the state of the process at the time of the crash.

root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=/home/mylinuxuser/Downloads/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /omd/sites/at/bin/cmc...

warning: core file may not match specified executable file.
[New LWP 804036]
Core was generated by `python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2b661be1fd in ?? ()
(gdb) where
#0  0x00007f2b661be1fd in ?? ()
#1  0x00007ffed8a75060 in ?? ()
#2  0x0000000000000000 in ?? ()


# Run it (if it's still crashing, you'll see it crash)
r 
# View the backtrace (call stack)
bt  
# Quit when done 
q
# Memory mappings
i proc m

# Listing all threads. This is really useful! 
thread apply all bt


Enable log within gdb

set logging file gdb_log.txt
set logging on
set trace-commands on
show logging     # prove logging is on
flush
set print pretty on
bt               # view the backtrace
set logging off  
show logging     # prove logging is back off
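
For unattended runs, the logging commands can be put into a gdb command file and executed in batch mode. The sketch below only generates such a file; the helper name write_gdb_script and the file names are assumptions chosen for illustration.

```shell
# Hypothetical helper: write a gdb command file that logs a backtrace of all
# threads to the given log file.
write_gdb_script() {
    local script="$1" logfile="$2"
    cat > "$script" <<EOF
set pagination off
set logging file $logfile
set logging on
thread apply all bt
set logging off
EOF
}

# Assumed usage (not executed here):
#   write_gdb_script bt.gdb gdb_log.txt
#   gdb --batch -x bt.gdb /omd/sites/mysite/bin/cmc --core=<PATH/TO/COREDUMP>
```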


objdump

With objdump, you can dump the full contents of the core file (the -s option prints all sections).

root@linux:~# objdump -s /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 >dump_sup8890.txt


file command

With the file command, you can identify the dump: it shows the file type and which command produced the core.

# Command:
file /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 

# Output:
/mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts', real uid: 989, effective uid: 989, real gid: 1000, effective gid: 1000, execfn: '/omd/sites/mysite/bin/python3', platform: 'x86_64'
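
The "from '...'" field is worth checking before loading a dump into gdb. In the sample output above, the core was written by python3 running cmk, not by the cmc, which matches the gdb warning that the core file may not match the executable. A minimal sketch for extracting that field; the helper name core_origin is hypothetical.

```shell
# Hypothetical helper: read "file" output on stdin and print the command line
# that produced the core dump (the text inside from '...').
core_origin() {
    sed -n "s/.*from '\([^']*\)'.*/\1/p"
}

# Assumed usage (not executed here):
#   file /mypath_tofile/core.python3.989... | core_origin
```

If the printed command does not start with the cmc binary, the dump belongs to a different process and a backtrace against the cmc executable will not resolve symbols.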


Open a support case

If your investigation is not successful, please open a ticket and send us the following data to help us reproduce the issue:

 * Log in as the site user: su - $MYSITE
 * Create an archive with the following command: tar czf ~/corefiles.tgz ~/var/check_mk/core/ ~/var/log/
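
The tar command fails if the core directory does not exist, for example when no dump was ever written. A minimal sketch that packs whatever is present; the helper name make_support_archive is hypothetical, and the paths are stored relative to the site home.

```shell
# Hypothetical helper: pack core dumps and logs of a site into one archive.
# Arguments: site home directory, path of the archive to create.
make_support_archive() {
    local site_home="$1" archive="$2"
    if [ ! -d "$site_home/var/log" ]; then
        echo "no var/log below $site_home" >&2
        return 1
    fi
    # Always include the logs; add the core directory only if it exists.
    set -- var/log
    if [ -d "$site_home/var/check_mk/core" ]; then
        set -- var/check_mk/core "$@"
    fi
    tar czf "$archive" -C "$site_home" "$@"
}

# Assumed usage as site user (not executed here):
#   make_support_archive ~ ~/corefiles.tgz
```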