Info |
---|
This manual shows a few tools for debugging the CMC (Checkmk Micro Core) when it crashes or freezes. |
Status |
---|
Last tested on Checkmk 2.3.0p1 |
...
Warning |
---|
Before you start low-level debugging of a CMC that is running but not working (and leaves no stack trace), please check the "Master Control" snap-in in the sidebar first! If the Service Checks and Host Checks are disabled there, that might already be the reason for your problem.
|
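The same check can be sketched on the command line via Livestatus. This is a minimal sketch, assuming the `lq` helper that ships with OMD sites is available and that the core still answers Livestatus queries; the function name `report_check_execution` is hypothetical and only illustrates how the two status columns can be read.

```shell
# Interpret one "execute_host_checks;execute_service_checks" line from Livestatus.
# A value of 1 means the respective checks are enabled.
report_check_execution() {
    IFS=';' read -r host_checks service_checks
    [ "$host_checks" = "1" ] || echo "Host checks are disabled"
    [ "$service_checks" = "1" ] || echo "Service checks are disabled"
    if [ "$host_checks" = "1" ] && [ "$service_checks" = "1" ]; then
        echo "Host and service checks are enabled"
    fi
}

# Usage as site user (assumption: lq is in the PATH of the site):
# lq "GET status\nColumns: execute_host_checks execute_service_checks" | report_check_execution
```

If the query itself does not return, the core is probably not answering at all, and the gdb commands for a frozen CMC are the better starting point.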
Analyze CMC Core
strace
You can use strace to track the CMC process when you face any issue:
Code Block |
---|
|
root@linux:~# strace --output=cmc-strace.log --string-limit=9999 --absolute-timestamps=precision:us --follow-forks --attach="$(cat /omd/sites/mysite/tmp/run/cmc.pid)" |
...
Option | Explanation |
---|---|
--attach= | The process ID to attach to |
--output= | Writes the trace to the given file instead of stderr |
--string-limit= | Sets the maximum length of printed strings |
--absolute-timestamps= | Prefixes each line with the wall-clock time in the given format |
--follow-forks | Also traces child processes created by the traced process |
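Once a trace has been written, a quick overview of which system calls dominate can help to decide where to look first. The following is a minimal sketch, assuming the line layout that the options above produce (`<pid> <time> <syscall>(...`); the function name `summarize_strace` is hypothetical.

```shell
# Count system calls per name in an strace log written with --follow-forks and
# --absolute-timestamps. Lines such as "<... read resumed>" or "+++ exited +++"
# do not start with a syscall name in the third field and are skipped.
summarize_strace() {
    awk '$3 ~ /^[a-z_0-9]+\(/ { sub(/\(.*/, "", $3); count[$3]++ }
         END { for (name in count) printf "%d %s\n", count[name], name }' "$1" |
        sort -k1,1nr -k2,2
}

# Usage:
# summarize_strace cmc-strace.log | head
```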
valgrind
You can use valgrind to start the CMC in debug mode, which gives you a full stack trace. If valgrind is not available on your system, install it or run the CMC with only the -g option.
Code Block |
---|
|
root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ valgrind --num-callers=30 cmc -g
or
root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ cmc -g |
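The choice between the two variants can also be made by a small helper. This is only a sketch, assuming it is run as the site user after `omd stop cmc`; the function name `cmc_debug_command` is hypothetical, and it only prints the command so that you can review it before running it.

```shell
# Print the debug command to use: with valgrind if it is installed,
# otherwise the plain "cmc -g" fallback described above.
cmc_debug_command() {
    if command -v valgrind >/dev/null 2>&1; then
        echo "valgrind --num-callers=30 cmc -g"
    else
        echo "cmc -g"
    fi
}

# Usage:
# cmc_debug_command
```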
...
Code Block |
---|
|
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=<PATH/TO/COREDUMP>
(gdb) r |
Frozen CMC
When the CMC seems to freeze and nothing happens, please run this command before restarting the CMC:
Code Block |
---|
|
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace' |
Or to write that to a file:
Code Block |
---|
|
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace' |& tee /home/mylinuxuser/Downloads/cmccrash/gdb.txt |
Another option for collecting more traces is to run gdb in a loop (60 runs with 5 seconds of sleep in between, so about 5 minutes) and write the output to a file:
Code Block |
---|
|
root@linux:~# for iter in {1..60}; do
printf "\nrun %i\n\n" $iter
gdb -p "$(cat "/omd/sites/mysite/tmp/run/cmc.pid")" --batch -ex 'set pagination off' -ex 'thread apply all backtrace' || true
sleep 5
done |& tee /home/mylinuxuser/Downloads/gdb.txt |
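With 60 snapshots in one file, it helps to see which functions the threads are sitting in most often. The following is a minimal sketch, assuming the usual gdb backtrace layout (`#0  0x... in function (...)` or `#0  function (...)`); the function name `count_top_frames` is hypothetical.

```shell
# Count how often each function appears as the innermost frame (#0)
# across all threads and all runs in a collected gdb output file.
count_top_frames() {
    awk '$1 == "#0" { frame = ($3 == "in") ? $4 : $2; count[frame]++ }
         END { for (f in count) printf "%d %s\n", count[f], f }' "$1" |
        sort -k1,1nr -k2,2
}

# Usage:
# count_top_frames /home/mylinuxuser/Downloads/gdb.txt
```

A function that tops this list in nearly every run is a reasonable candidate for where the CMC is stuck, but the full backtraces are still needed to confirm that.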
Analyze coredump file
Note |
---|
By default, coredump creation is disabled. You can enable it via Setup → Global settings → Monitoring core → Enable core dumps. After a crash of the CMC, a coredump will then be written to ~/var/check_mk/core/. |
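To pick up the most recent dump after a crash, a small helper like the following can be used. This is a sketch, assuming it is run as the site user so that `$HOME/var/check_mk/core` is the directory named above; the function name `newest_core` is hypothetical.

```shell
# Print the path of the most recently modified file in the core directory.
# Returns a non-zero exit code if the directory is empty or missing.
newest_core() {
    core_dir=${1:-"$HOME/var/check_mk/core"}
    newest=$(ls -t -- "$core_dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s/%s\n' "$core_dir" "$newest"
}

# Usage:
# newest_core || echo "No coredump found"
```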
gdb
With gdb, you can analyze the coredump, provided Checkmk created one. Note: Checkmk only creates coredumps if you enable them in the global settings.
Code Block |
---|
|
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=/home/mylinuxuser/Downloads/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /omd/sites/mysite/bin/cmc...
warning: core file may not match specified executable file.
[New LWP 804036]
Core was generated by `python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f2b661be1fd in ?? ()
(gdb) where
#0 0x00007f2b661be1fd in ?? ()
#1 0x00007ffed8a75060 in ?? ()
#2 0x0000000000000000 in ?? ()
# Run it (if it's still crashing, you'll see it crash)
r
# View the backtrace (call stack)
bt
# Quit when done
q
# Memory mappings
i proc m
# Listing all threads. This is really useful!
thread apply all bt |
Enable log within
...
gdb
Code Block |
---|
|
set logging file gdb_log.txt
# On GDB 12 and later, "set logging on/off" is deprecated in favor of "set logging enabled on/off"
set logging on
set trace-commands on
show logging # prove logging is on
flush
set print pretty on
bt # view the backtrace
set logging off
show logging # prove logging is back off |
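As an alternative to logging inside an interactive session, the commands can be kept in a file and run in batch mode, with the output redirected by the shell. This is only a sketch under stated assumptions: the function name `write_gdb_script` and the file name `cmds.gdb` are hypothetical, and the command list is a reduced version of the session above.

```shell
# Write a small gdb command file that prints pretty-printed backtraces
# of all threads without paging.
write_gdb_script() {
    cat >"$1" <<'EOF'
set pagination off
set print pretty on
thread apply all bt
EOF
}

# Usage (the core path is a placeholder):
# write_gdb_script cmds.gdb
# gdb --batch -x cmds.gdb /omd/sites/mysite/bin/cmc --core=<PATH/TO/COREDUMP> > gdb_log.txt 2>&1
```

Redirecting in the shell avoids the differences in the logging commands between gdb versions.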
objdump
With objdump -s, you can write the full contents of all sections of the coredump to a text file.
Code Block |
---|
|
root@linux:~# objdump -s /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 >dump_sup8890.txt |
file command
With the file command, you can identify the dump: it shows which command produced the core file and under which user and group IDs that command ran.
Code Block |
---|
|
# Command:
file /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
# Output:
/mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts', real uid: 989, effective uid: 989, real gid: 1000, effective gid: 1000, execfn: '/omd/sites/mysite/bin/python3', platform: 'x86_64' |
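The `from '...'` part of this output tells you whether the dump really belongs to the CMC. In the example above it was produced by `python3 .../cmk --discover-marked-hosts`, not by the cmc binary, so loading it in gdb together with the cmc executable cannot give a useful backtrace. The following minimal sketch extracts that field; the function name `core_origin` is hypothetical, and the parsing assumes the output layout shown above.

```shell
# Read the output of "file <core>" from stdin and print the command line
# that produced the core file (the text between "from '" and the next quote).
core_origin() {
    sed -n "s/.*, from '\([^']*\)'.*/\1/p"
}

# Usage (the core path is a placeholder):
# file <PATH/TO/COREDUMP> | core_origin
```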
Open a support case
If your investigation is not successful, please open a ticket and provide us with the following data:
...