This article shows a few tools for debugging the CMC (Checkmk Micro Core) when it crashes or freezes.
LAST TESTED ON CHECKMK 2.2.0P1
Before you delve into low-level debugging of why the CMC is running but not working (without a stack trace), please check the "Master Control" snap-in in the sidebar first!
If the Service Checks and Host Checks are disabled, that might be the reason for your problem.
Analyze CMC core
strace
You can use strace to track the CMC process when you face any issue:
root@linux:~# strace -o cmc-strace.log -p $(cat ~<MYSITE>/tmp/run/cmc.pid)
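Before attaching, it can help to confirm that the PID file points at a live process, since a stale PID file makes strace fail with a less obvious error. The following is a minimal sketch, not part of Checkmk: the function name cmc_pid is hypothetical, and the PID file path is the one used above.

```shell
# Hypothetical helper (not part of Checkmk): print the PID stored in a PID
# file, but only if a process with that PID is currently running.
cmc_pid() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "cannot read $pidfile" >&2; return 1; }
    pid=$(cat "$pidfile")
    # kill -0 sends no signal; it only checks that the process exists
    # and that we are allowed to signal it.
    kill -0 "$pid" 2>/dev/null || { echo "no running process with PID $pid" >&2; return 1; }
    echo "$pid"
}

# Example call, with mysite as the site name:
#   strace -o cmc-strace.log -p "$(cmc_pid /omd/sites/mysite/tmp/run/cmc.pid)"
```

If the helper prints an error instead of a PID, the CMC is most likely not running and there is nothing to attach to.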
valgrind
You can use valgrind to start the CMC in debug mode, which gives you a full stack trace. If valgrind is not available on your system, either install it or run the CMC with only the -g option.
root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ valgrind --num-callers=30 cmc -g
or
root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ cmc -g
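The choice between the two variants can also be made by a script. This is only a sketch: the function name cmc_debug_command is hypothetical, and it merely prints the command line so you can review it before running it as the site user.

```shell
# Hypothetical helper: print the command line to start the CMC in debug
# mode, under valgrind if it is installed, otherwise plain "cmc -g".
cmc_debug_command() {
    if command -v valgrind >/dev/null 2>&1; then
        echo "valgrind --num-callers=30 cmc -g"
    else
        echo "cmc -g"
    fi
}

# As the site user, after "omd stop cmc":
#   $(cmc_debug_command)
```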
gdb
With gdb, you can analyze a core dump if Checkmk has created one. Note: Checkmk only writes core dumps if you enable them in the global settings.
With the r (run) command inside gdb, you can re-run the CMC under the debugger.
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=<PATH/TO/COREDUMP>
(gdb) r
frozen CMC
When the CMC seems to freeze and nothing happens, please run this command before restarting the CMC:
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace'
Or to write that to a file:
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace' |& tee /home/mylinuxuser/Downloads/cmccrash/gdb.txt
Another option to collect more traces is to run gdb in a loop (60 runs with a 5-second pause, roughly 5 minutes) and write the output to a file:
root@linux:~# for iter in {1..60}; do
    printf "\nrun %i\n\n" $iter
    gdb -p "$(cat "/omd/sites/mysite/tmp/run/cmc.pid")" --batch -ex 'set pagination off' -ex 'thread apply all backtrace' || true
    sleep 5
done |& tee /home/mylinuxuser/Downloads/gdb.txt
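If you prefer one file per run, so that individual backtraces can be compared with diff, the loop can be wrapped in a function. This is a sketch under assumptions: collect_backtraces, its parameters, and the GDB_BIN override are hypothetical names introduced here, not Checkmk tooling.

```shell
# Hypothetical helper: take RUNS backtraces of the process named in PIDFILE,
# INTERVAL seconds apart, and store each one in its own file in OUTDIR.
# GDB_BIN is an assumed override for the debugger binary (default: gdb).
collect_backtraces() {
    pidfile=$1 outdir=$2 runs=${3:-60} interval=${4:-5}
    mkdir -p "$outdir" || return 1
    i=1
    while [ "$i" -le "$runs" ]; do
        # Re-read the PID on each run in case the CMC was restarted meanwhile.
        pid=$(cat "$pidfile") || return 1
        out=$(printf '%s/gdb-run-%03d.txt' "$outdir" "$i")
        "${GDB_BIN:-gdb}" -p "$pid" --batch \
            -ex 'set pagination off' \
            -ex 'thread apply all backtrace' >"$out" 2>&1 || true
        [ "$i" -lt "$runs" ] && sleep "$interval"
        i=$((i + 1))
    done
}

# Example call (60 runs, 5 seconds apart):
#   collect_backtraces /omd/sites/mysite/tmp/run/cmc.pid /tmp/cmc-gdb 60 5
```

Threads that show the same stack in every file are good candidates for where the CMC is stuck.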
Analyze coredump file
Core dump creation is disabled by default. You can enable it via Setup → Global settings → Monitoring core → Enable core dumps.
After a crash of the CMC, a core dump is written to ~/var/check_mk/core/.
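To pick up the most recent dump after a crash, something like the following can be used. It is a sketch that assumes the core files sit directly in that directory and that their names contain no newlines; latest_core is a hypothetical helper name.

```shell
# Hypothetical helper: print the most recently modified file in a core
# dump directory (default: the site's ~/var/check_mk/core/).
latest_core() {
    dir=${1:-$HOME/var/check_mk/core}
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || { echo "no core dump found in $dir" >&2; return 1; }
    echo "$dir/$newest"
}

# Example call as the site user:
#   latest_core
```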
gdb
Open the core dump in gdb together with the CMC binary, then inspect it with the commands shown after the example session:
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=/home/mylinuxuser/Downloads/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /omd/sites/at/bin/cmc...
warning: core file may not match specified executable file.
[New LWP 804036]
Core was generated by `python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2b661be1fd in ?? ()
(gdb) where
#0  0x00007f2b661be1fd in ?? ()
#1  0x00007ffed8a75060 in ?? ()
#2  0x0000000000000000 in ?? ()

# Run it (if it's still crashing, you'll see it crash)
r
# View the backtrace (call stack)
bt
# Quit when done
q
# Memory mappings
i proc m
# Listing all threads. This is really useful!
thread apply all bt
- Enable logging within gdb
set logging file gdb_log.txt
set logging on
set trace-commands on
show logging # prove logging is on
flush
set print pretty on
bt # view the backtrace
set logging off
show logging # prove logging is back off
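If you only need the backtraces in a file, gdb's batch mode can produce them without an interactive session, using the same --batch and -ex options as in the frozen CMC commands. The sketch below makes assumptions: core_backtrace and the GDB_BIN override are hypothetical names, and the paths in the example call are placeholders.

```shell
# Hypothetical helper: write the backtrace of every thread in a core dump
# to a text file without opening an interactive gdb session.
# GDB_BIN is an assumed override for the debugger binary (default: gdb).
core_backtrace() {
    binary=$1 core=$2 outfile=$3
    "${GDB_BIN:-gdb}" --batch \
        -ex 'set pagination off' \
        -ex 'set print pretty on' \
        -ex 'thread apply all bt' \
        "$binary" --core="$core" >"$outfile" 2>&1
}

# Example call:
#   core_backtrace /omd/sites/mysite/bin/cmc /path/to/core gdb_bt.txt
```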
objdump
With objdump -s, you can dump the full contents of the core file's sections into a text file.
root@linux:~# objdump -s /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 >dump_sup8890.txt
file command
With the file command, you can check what kind of file the dump is and which command generated it.
# Command:
file /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
# Output:
/mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts', real uid: 989, effective uid: 989, real gid: 1000, effective gid: 1000, execfn: '/omd/sites/mysite/bin/python3', platform: 'x86_64'
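The "from '...'" field is worth checking before you load a dump into gdb. In the example above the core was generated by python3 running cmk, not by the cmc binary, which is why gdb warned that the core file may not match the specified executable. A small sketch for extracting that field follows; core_origin is a hypothetical helper name, and it assumes the output format shown above.

```shell
# Hypothetical helper: read the output of "file <core>" on stdin and print
# the command line recorded in its "from '...'" field.
core_origin() {
    sed -n "s/.*, from '\([^']*\)'.*/\1/p"
}

# Example call:
#   file /path/to/core | core_origin
```

For the example output above, this prints python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts.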
Open a support case
If your investigation is not successful, please open a ticket and send us the following data to help us reproduce the issue:
* Log in as the site user with su - $MYSITE
* Create an archive with the following command: tar czf ~/corefiles.tgz ~/var/check_mk/core/ ~/var/log/
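Before uploading, you may want to confirm that the archive really contains at least one core file, since the core directory stays empty when core dumps were not enabled at the time of the crash. This is a minimal sketch; archive_has_cores is a hypothetical helper name.

```shell
# Hypothetical helper: succeed only if the archive contains at least one
# entry below var/check_mk/core/, i.e. at least one core file was packed.
archive_has_cores() {
    list=$(tar tzf "$1") || return 1
    # The trailing "." requires a file name after the directory itself.
    printf '%s\n' "$list" | grep 'var/check_mk/core/.' >/dev/null
}

# Example call as the site user:
#   archive_has_cores ~/corefiles.tgz && echo "core files included"
```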
Useful links
- https://gcc.gnu.org/onlinedocs/gcc/Debugging-Options.html
- https://stackoverflow.com/questions/8305866/how-do-i-analyze-a-programs-core-dump-file-with-gdb-when-it-has-command-line-pa
- https://valgrind.org/
- https://www.brendangregg.com/blog/2016-08-09/gdb-example-ncurses.html
- https://man7.org/linux/man-pages/man1/objdump.1.html