Info

This article shows a few tools for debugging the CMC (Checkmk Micro Core) when it crashes or hangs.

Status: LAST TESTED ON CHECKMK 2.3.0p1

...

Warning

Before you delve into low-level debugging of why the CMC is running but not working (without a stack trace), please check the "Master Control" snap-in in the sidebar first!

If the Service Checks and Host Checks are disabled, that might be the reason for your problem.

Screenshot of the Master Control with both service checks and host checks disabled


Analyze CMC Core

strace

You can attach strace to the running CMC process to trace its system calls when you run into a problem:

Code Block
root@linux:~# strace --output=cmc-strace.log --string-limit=9999 --absolute-timestamps=precision:us --follow-forks --attach="$(cat /omd/sites/mysite/tmp/run/cmc.pid)"

...

Option                   Explanation
--attach                 The process ID to attach to
--output=                Sets the output file
--string-limit=          Sets the maximum length of printed strings
--absolute-timestamps=   Sets the format for timestamps
--follow-forks           Follows forks of the traced process
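
Once the trace has been written, a first pass over the log can help narrow things down. The following is a minimal sketch that counts failed system calls per call and errno. It assumes the log layout produced by the options above (PID, timestamp, then the call) and skips "resumed" lines. The function name summarize_strace_errors is a hypothetical helper, not part of strace or Checkmk.

```bash
# Hypothetical helper: summarize failed syscalls in an strace log.
# Assumes lines such as:
#   804036 12:00:00.000001 openat(AT_FDCWD, "/x", O_RDONLY) = -1 ENOENT (No such file or directory)
summarize_strace_errors() {
    awk '
    / = -1 E[A-Z0-9]+/ && !/resumed>/ {
        errno = $0
        sub(/^.* = -1 /, "", errno)      # drop everything up to the return value
        sub(/[^A-Z0-9].*$/, "", errno)   # keep only the errno name
        call = $0
        sub(/\(.*$/, "", call)           # cut at the first opening parenthesis
        sub(/^.*[ \t]/, "", call)        # drop PID and timestamp
        count[call " " errno]++
    }
    END { for (key in count) print count[key], key }
    ' "$1" | sort -k1,1nr -k2
}
```

Called as summarize_strace_errors cmc-strace.log, it prints one line per combination of call and errno, most frequent first. Many failed calls are harmless, so treat the result as a starting point rather than a diagnosis.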


valgrind

You can use valgrind to start the CMC in debug mode, which gives you a full stack trace. If valgrind is not available on your system, either install it or run the CMC with the -g option alone.

Code Block
root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ valgrind --num-callers=30 cmc -g

# or, without valgrind:
root@linux:~# su - mysite
OMD[mysite]:~$ omd stop cmc
OMD[mysite]:~$ cmc -g
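
The choice between the two variants can also be scripted. The following is a minimal sketch that prints the valgrind command when valgrind is found on the PATH and the plain cmc -g command otherwise. The function name pick_cmc_debug_cmd is a hypothetical helper; the commands it prints are the ones shown above.

```bash
# Hypothetical helper: print the debug command to run as the site user,
# preferring valgrind when it is available on the PATH.
pick_cmc_debug_cmd() {
    if command -v valgrind >/dev/null 2>&1; then
        echo "valgrind --num-callers=30 cmc -g"
    else
        echo "cmc -g"
    fi
}
```

The function only prints the command. Stop the core with omd stop cmc first, then run the printed command as the site user.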


gdb

With gdb, you can analyze the coredump, provided Checkmk has created one. Note: Checkmk only writes a coredump if you enable this in the global settings.

...

Code Block
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=<PATH/TO/COREDUMP>
(gdb) r


Frozen CMC

When the CMC seems to freeze and nothing happens, please run this command before restarting the CMC:

Code Block
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace'


Or to write that to a file:

Code Block
root@linux:~# gdb -p $(cat ~mysite/tmp/run/cmc.pid) --batch -ex 'set pagination off' -ex 'thread apply all backtrace' |& tee /home/mylinuxuser/Downloads/cmccrash/gdb.txt
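
Both commands rely on cmc.pid pointing at a running process. The following is a minimal sketch of a check that can be run before attaching gdb. The function name live_pid is a hypothetical helper. It assumes the pid file contains a single numeric PID, and that you run it as root or as the owner of the process, because kill -0 fails for processes of other users.

```bash
# Hypothetical helper: print the PID stored in a pid file,
# but only if a process with that PID currently exists.
live_pid() {
    pid_file=$1
    [ -r "$pid_file" ] || { echo "cannot read $pid_file" >&2; return 1; }
    pid=$(cat "$pid_file")
    case $pid in
        ''|*[!0-9]*) echo "unexpected content in $pid_file" >&2; return 1 ;;
    esac
    # kill -0 sends no signal; it only checks that the process can be signalled
    kill -0 "$pid" 2>/dev/null || { echo "no running process with PID $pid" >&2; return 1; }
    echo "$pid"
}
```

For example, gdb -p "$(live_pid /omd/sites/mysite/tmp/run/cmc.pid)" attaches only when the check passes. A stale pid file produces an error message instead of a confusing gdb failure.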


Another option for collecting more traces is to run gdb in a loop (60 iterations with a 5-second pause each, so roughly 5 minutes) and write the output to a file:

Code Block
root@linux:~# for iter in {1..60}; do
    printf "\nrun %i\n\n" "$iter"
    gdb -p "$(cat "/omd/sites/mysite/tmp/run/cmc.pid")" --batch -ex 'set pagination off' -ex 'thread apply all backtrace' || true
    sleep 5
done |& tee /home/mylinuxuser/Downloads/gdb.txt
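
The resulting file quickly becomes long. The following is a minimal sketch that counts how often each function appears as the innermost frame (#0) across all threads and runs, which can hint at where threads are stuck. The function name top_frames is a hypothetical helper. It assumes gdb's usual frame layout, either "#0  0xADDR in func (...)" or "#0  func (...)".

```bash
# Hypothetical helper: count the innermost frame (#0) per function
# in a file of gdb backtraces, most frequent first.
top_frames() {
    awk '
    $1 == "#0" {
        # with an address the function is field 4, otherwise field 2
        fn = ($2 ~ /^0x/ && $3 == "in") ? $4 : $2
        count[fn]++
    }
    END { for (fn in count) print count[fn], fn }
    ' "$1" | sort -k1,1nr -k2
}
```

Called as top_frames /home/mylinuxuser/Downloads/gdb.txt, it prints one line per function. A function that dominates the list across many runs is a reasonable place to start reading the full backtraces.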


Analyze coredump file

Note

Coredump creation is disabled by default. You can enable it via Setup > Global settings > Monitoring core > Enable core dumps.

After a crash of the CMC, a coredump is written to ~/var/check_mk/core/.

screenshot of Enable core dumps disabled
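
To pick up the most recent coredump for the steps that follow, a small helper can be used. The following is a minimal sketch. The function name latest_core is hypothetical, the default directory is the one named in the note above (as seen by the site user), and it assumes file names without newlines.

```bash
# Hypothetical helper: print the path of the newest file in the core directory.
# Defaults to the site user's ~/var/check_mk/core.
latest_core() {
    core_dir=${1:-"$HOME/var/check_mk/core"}
    # ls -t sorts by modification time, newest first
    newest=$(ls -t -- "$core_dir" 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s/%s\n' "$core_dir" "$newest"
}
```

As the site user, latest_core prints the newest file, or returns a non-zero status when the directory is empty or missing.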


gdb

With gdb, you can analyze the coredump once Checkmk has written one (see the note above on enabling core dumps).

Code Block
root@linux:~# gdb /omd/sites/mysite/bin/cmc --core=/home/mylinuxuser/Downloads/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /omd/sites/at/bin/cmc...

warning: core file may not match specified executable file.
[New LWP 804036]
Core was generated by `python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f2b661be1fd in ?? ()
(gdb) where
#0  0x00007f2b661be1fd in ?? ()
#1  0x00007ffed8a75060 in ?? ()
#2  0x0000000000000000 in ?? ()

# Run it (if it's still crashing, you'll see it crash)
r
# View the backtrace (call stack)
bt
# Quit when done
q
# Memory mappings
i proc m
# Listing all threads. This is really useful!
thread apply all bt


Enable logging within gdb

Code Block
set logging file gdb_log.txt
set logging on
set trace-commands on
show logging     # confirm logging is on
flush
set print pretty on
bt               # view the backtrace
set logging off
show logging     # confirm logging is off again
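
If you would rather capture the output without an interactive session, the commands can be placed in a gdb command file and run in batch mode. The following is a minimal sketch. The function name write_gdb_commands and the choice of commands are assumptions, built from the gdb commands already used on this page.

```bash
# Hypothetical helper: write a gdb command file for unattended core analysis.
write_gdb_commands() {
    cat > "$1" <<'EOF'
set pagination off
set print pretty on
info proc mappings
thread apply all backtrace
EOF
}
```

After write_gdb_commands /tmp/cmc.gdb, a call such as gdb --batch -x /tmp/cmc.gdb /omd/sites/mysite/bin/cmc <PATH/TO/COREDUMP> > gdb_log.txt 2>&1 runs the commands and stores the output in gdb_log.txt. The path /tmp/cmc.gdb is only an example.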


objdump

With objdump -s, you can write the full contents of all sections of the coredump to a text file.

Code Block
root@linux:~# objdump -s /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 >dump_sup8890.txt


file command

With the file command, you can identify the coredump: it shows the file type, the command that produced the dump, and the user and group IDs it ran under.

Code Block
# Command:
file /mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000 

# Output:
/mypath_tofile/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from 'python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts', real uid: 989, effective uid: 989, real gid: 1000, effective gid: 1000, execfn: '/omd/sites/mysite/bin/python3', platform: 'x86_64'
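
The originating command is the most useful part of that output, because it tells you whether the dump belongs to the CMC at all. In the example above it was written by a cmk call, which matches gdb's warning that the core file may not match the executable. The following is a minimal sketch that extracts this field. The function name core_origin is a hypothetical helper, and it assumes the "from '...'" layout shown in the output above.

```bash
# Hypothetical helper: read the output of the file command on stdin
# and print the command line recorded in the "from '...'" field.
core_origin() {
    sed -n "s/.*, from '\([^']*\)'.*/\1/p"
}
```

Used as file <PATH/TO/COREDUMP> | core_origin, it prints only the command line. It prints nothing if the output contains no such field.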


Open a support case

If your investigation is not successful, please open a ticket and provide us with the following data:

...