Info

This article shows a few tools for debugging the CMC (the Checkmk Micro Core) when it crashes or hangs.

Status: Last tested on Checkmk 2.2.0p1



Warning

Before you delve into low-level debugging of why the CMC is running but not working (without a stack trace), please check the "Master Control" snap-in in the sidebar first!

If the Service Checks and Host Checks are disabled, that might be the reason for your problem.

Analyze CMC core

strace

You can use strace to trace the system calls of the running CMC process when you face an issue:

Code Block
root@linux:~# strace -o cmc-strace.log -p $(cat ~mysite/tmp/run/cmc.pid)
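Before attaching, it can help to confirm that the PID file points at a running process, because a stale cmc.pid makes strace (or gdb) fail with a less obvious error. The following is a minimal sketch, not part of Checkmk; the function name `read_live_pid` is hypothetical.

```shell
# Hypothetical helper: print the PID stored in a PID file, but only if
# that PID is numeric and belongs to a process that currently exists.
read_live_pid() {
    pidfile=$1
    [ -r "$pidfile" ] || { echo "cannot read $pidfile" >&2; return 1; }
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo "no numeric PID in $pidfile" >&2; return 1 ;;
    esac
    # kill -0 sends no signal; it only checks that the process exists
    # and that the current user is allowed to signal it.
    kill -0 "$pid" 2>/dev/null || { echo "process $pid is not running" >&2; return 1; }
    printf '%s\n' "$pid"
}
```

A possible use as root would be: pid=$(read_live_pid ~mysite/tmp/run/cmc.pid) && strace -o cmc-strace.log -p "$pid"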

valgrind

You can use valgrind to start the CMC in debug mode, which gives you a full stack trace. If valgrind is not available on your system, install it, or run the CMC with only the -g option.

Code Block
root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ valgrind --num-callers=30 cmc -g

# or, without valgrind:

root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ cmc -g
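The choice between the two variants can also be scripted. The following minimal sketch prints the command to use, depending on whether valgrind is found in PATH. The function name is hypothetical; the commands it prints are the ones shown above.

```shell
# Hypothetical helper: print the debug command for the CMC, preferring
# valgrind when it is installed and falling back to plain "cmc -g".
cmc_debug_command() {
    if command -v valgrind >/dev/null 2>&1; then
        echo "valgrind --num-callers=30 cmc -g"
    else
        echo "cmc -g"
    fi
}
```

As the site user, after omd stop cmc, the printed command can be reviewed and then run by hand.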

gdb

With gdb, you can analyze the coredump if Checkmk created one. Note: Checkmk only creates a coredump if you enable this in the global settings.

With the r (run) command, you can re-run the CMC inside gdb for analysis.

Code Block
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=<PATH/TO/COREDUMP>
(gdb) r

frozen CMC

When the CMC seems to freeze and nothing happens, please run this command before restarting the CMC:

Code Block
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace'

Or to write that to a file:

Code Block
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace' |& tee /home/mylinuxuser/Downloads/cmccrash/gdb.txt


Another option to collect more traces is to run gdb in a loop (60 runs, 5 seconds apart, so roughly 5 minutes) and write the output to a file:

Code Block
root@linux:~# for iter in {1..60}; do
    printf "\nrun %i\n\n" "$iter"
    gdb -p "$(cat "/omd/sites/mysite/tmp/run/cmc.pid")" --batch -ex 'set pagination off' -ex 'thread apply all backtrace' || true
    sleep 5
done |& tee /home/mylinuxuser/Downloads/gdb.txt
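With 60 runs in one file, it helps to see which innermost frames keep recurring, since those are the places where threads are probably stuck. The following minimal sketch counts the #0 frames across all runs. It assumes the usual gdb backtrace layout, in which frame lines start with #0, and the function name is hypothetical.

```shell
# Hypothetical helper: count how often each innermost frame (#0) occurs
# in a file of collected backtraces, most frequent first.
top_frames() {
    grep '^#0 ' "$1" \
        | sed -e 's/^#0  *//' -e 's/^0x[0-9a-f]* in //' -e 's/ (.*//' \
        | sort | uniq -c | sort -rn
}
```

For the file written by the loop, this could be called as: top_frames /home/mylinuxuser/Downloads/gdb.txt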

Analyze coredump file

Note

By default, coredump creation is disabled. You can enable it via Setup > Global settings > Monitoring core > Enable core dumps.

After the CMC crashes, a coredump is written to ~/var/check_mk/core/.
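To pick the most recent core file for analysis, the following minimal sketch may be useful. It assumes the directory named above and file names without newlines; the function name is hypothetical.

```shell
# Hypothetical helper: print the path of the most recently modified
# file in a core dump directory (default: the site user's core directory).
newest_core() {
    dir=${1:-$HOME/var/check_mk/core}
    # ls -t sorts by modification time, newest first.
    newest=$(ls -t "$dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || { echo "no core files in $dir" >&2; return 1; }
    printf '%s/%s\n' "$dir" "$newest"
}
```

Run as the site user, newest_core without arguments looks in ~/var/check_mk/core; a different directory can be passed as the first argument.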

gdb

With gdb, you can open the coredump together with the binary that produced it and inspect the state at the time of the crash.

Code Block
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=/home/mylinuxuser/Downloads/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /omd/sites/at/bin/cmc...

warning: core file may not match specified executable file.
[New LWP 804036]
Core was generated by `python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2b661be1fd in ?? ()
(gdb) where
#0  0x00007f2b661be1fd in ?? ()
#1  0x00007ffed8a75060 in ?? ()
#2  0x0000000000000000 in ?? ()


# Run the program (if it is still crashing, you will see it crash)
r
# View the backtrace (call stack)
bt
# Show the memory mappings (short for "info proc mappings")
i proc m

# Show the backtrace of all threads. This is really useful!
thread apply all bt
# Quit when done
q


Enable logging within gdb

Code Block
set logging file gdb_log.txt
set logging on
set trace-commands on
show logging     # prove logging is on
flush
set print pretty on
bt               # view the backtrace
set logging off  
show logging     # prove logging is back off


objdump

With objdump -s, you can dump the full contents of all sections of the core file into a text file.

Code Block
root@linux:~# objdump -s /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 >dump_sup8890.txt


file command

With the file command, you can identify the core file: it shows the file type, the command that produced the core, and the user and group IDs it ran under.

Code Block
# Command:
file /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 

# Output:
/mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts', real uid: 989, effective uid: 989, real gid: 1000, effective gid: 1000, execfn: '/omd/sites/mysite/bin/python3', platform: 'x86_64'
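The from '...' field in this output tells you which command produced the core. In the example it is a python3 process running cmk rather than the CMC, which matches the gdb warning that the core file may not match the specified executable. The following minimal sketch extracts that field. The function name is hypothetical, and the parsing assumes the output layout shown above.

```shell
# Hypothetical helper: read `file` output on stdin and print the command
# line recorded in the "from '...'" field of a core file description.
core_origin() {
    sed -n "s/.*, from '\([^']*\)'.*/\1/p"
}
```

It could be used as: file /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 | core_origin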


Open a support case

If your investigation is not successful, please open a ticket and send us the following data to help us reproduce the issue:

Code Block
# Log in as the site user
root@linux:~# su - mysite
# Create an archive with the core files and the logs
OMD[mysite]:~$ tar czf ~/corefiles.tgz ~/var/check_mk/core/ ~/var/log/
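Before uploading, it can be worth checking that the archive actually contains the core files. The following is a minimal sketch to be run as the site user. The function name is hypothetical; the tar options are the ones from the command above plus tar tzf for listing.

```shell
# Hypothetical helper: create a support archive from the given paths,
# then list what ended up in it.
make_support_archive() {
    archive=$1; shift
    tar czf "$archive" "$@" || return 1
    # tar tzf lists the archive members without extracting them.
    tar tzf "$archive"
}
```

A possible call would be: make_support_archive ~/corefiles.tgz ~/var/check_mk/core/ ~/var/log/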

