Info
This step-by-step guide explains how to deal with high CPU usage of the CMC.




LAST TESTED ON CHECKMK 2.3.0P1




Problem

You experience high CPU load and/or CPU utilization on your Checkmk server. Please be aware that there are no absolute "good" numbers: what counts as good utilization or load depends entirely on your infrastructure.



Context

A process monitor such as htop shows the CMC process using 100% of one CPU core. The command line of the process should look similar to the one below.

Code Block
/omd/sites/mysite/bin/cmc /omd/sites/mysite/var/check_mk/core/config.pb


Solution

Check your memory

It may sound odd, but low memory often leads to CPU stress because the system starts swapping. If your memory is running low, add more and check whether the CPU load decreases. See the Checkmk System Requirements as a frame of reference for sizing.
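To see whether the server is using swap at all, you can read the counters the kernel exposes. The following is a minimal sketch, assuming a Linux host that provides /proc/meminfo; the helper name swap_used_kb is made up for this example and is not part of Checkmk.

```bash
# Sketch: report how much swap is currently in use.
# swap_used_kb is a hypothetical helper name, not a Checkmk command.

swap_used_kb() {
    # $1: path to a meminfo-style file (defaults to /proc/meminfo)
    local file="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ {total = $2}
         /^SwapFree:/  {free = $2}
         END {print total - free}' "$file"
}

# Only run the report where /proc/meminfo is available.
if [ -r /proc/meminfo ]; then
    echo "Swap in use: $(swap_used_kb) kB"
fi
```

A non-zero value alone does not prove active swapping. Running vmstat 1 5 and watching the si and so columns shows whether pages are currently being swapped in and out.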

Check the number of CPU cores

It is entirely possible that your server simply needs a bit more power. Give it one or two more cores and see whether the CPU utilization decreases. See the Checkmk System Requirements as a frame of reference for sizing.
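One way to put the load into perspective is to divide it by the number of cores. The sketch below assumes a Linux host with /proc/loadavg and the nproc command; load_per_core is a hypothetical helper name. As stated above, there is no absolute "good" value, so treat the result as a trend indicator only.

```bash
# Sketch: relate the 1-minute load average to the number of CPU cores.
# load_per_core is a hypothetical helper name, not a Checkmk command.

load_per_core() {
    # $1: load average, $2: number of cores
    awk -v load="$1" -v cores="$2" 'BEGIN {printf "%.2f\n", load / cores}'
}

# Only run the report where /proc/loadavg is available.
if [ -r /proc/loadavg ]; then
    read -r load1 _ < /proc/loadavg
    echo "1-minute load per core: $(load_per_core "$load1" "$(nproc)")"
fi
```

If the value stays well above 1 for a long time, processes are regularly waiting for CPU time or I/O, and adding cores is worth a try.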

Check your VM's virtual CPU

It is not uncommon for CPU emulation to limit the CPU features presented to a VM. This can have various reasons, e.g., compatibility in a clustered environment composed of hosts with different CPUs.

Users have reported that changing the CPU emulation for their Checkmk server to one that enables more features (e.g., hardware support for AES) cut their load in half.
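Whether a feature such as hardware AES support is visible inside the VM can be checked from the guest itself. This is a minimal sketch, assuming an x86 Linux guest where /proc/cpuinfo contains a "flags" line; has_cpu_flag is a hypothetical helper name.

```bash
# Sketch: check whether a CPU feature flag is presented to this system.
# has_cpu_flag is a hypothetical helper name, not a Checkmk command.

has_cpu_flag() {
    # $1: flag name (e.g. aes), $2: cpuinfo-style file (defaults to /proc/cpuinfo)
    local flag="$1" file="${2:-/proc/cpuinfo}"
    # Look at the first "flags" line only and match the flag as a whole word.
    grep -m1 '^flags' "$file" | grep -qw -- "$flag"
}

# Only run the check where /proc/cpuinfo is available.
if [ -r /proc/cpuinfo ]; then
    if has_cpu_flag aes; then
        echo "The aes flag is visible to this system"
    else
        echo "The aes flag is NOT visible to this system"
    fi
fi
```

If the flag is missing in the guest but present on the hypervisor host, the configured CPU model of the VM is a likely candidate for the limitation.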

Are you using the old symmetric encryption?

As outlined in our official guide, it is not a good idea to run the symmetric encryption inside the TLS encryption of the agent that comes with Checkmk 2.1.0 and above.

Users with high CPU usage have reported that disabling the symmetric encryption (while keeping TLS encryption active, of course) cut their load roughly in half.

Step-by-step guide

  1. Verify that the CMC is consuming 100% of one or more CPU cores

    1. Install a process monitor like htop
    2. Run the process monitor as a site user
    3. Filter for the CMC: in htop, press F4 and type the string cmc into the filter

Screenshot of Htop with all four CPUs at 100 percent utilization.
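If no interactive process monitor is available, the same check can be approximated with ps. The sketch below assumes the example site name mysite used in this guide and the procps versions of pgrep and ps; busy_threads is a hypothetical helper name. Note that ps reports CPU usage averaged over the lifetime of each thread, not the current value that htop shows.

```bash
# Sketch: list the threads of the cmc process that use a lot of CPU.
# busy_threads is a hypothetical helper name, not a Checkmk command.

busy_threads() {
    # stdin: lines of "TID %CPU COMMAND"; $1: minimum %CPU to report
    awk -v min="$1" '$2 + 0 >= min + 0 {print $1, $2}'
}

site="mysite"   # assumption: the example site name used in this guide
pid=$(pgrep -u "$site" -x cmc 2>/dev/null | head -n 1)

if [ -n "$pid" ]; then
    # -L prints one line per thread; "=" suppresses the column headers.
    ps -L -p "$pid" -o tid=,pcpu=,comm= | busy_threads 50
else
    echo "No cmc process found for site $site"
fi
```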


Debugging

  1. Go to 'Master Control' within your sidebar.
  2. Disable both Host Checks and Service Checks, then restart the CMC
    Screenshot of the right-hand side Master control with services and host checks set to off.

    Code Block
    OMD[mysite]~# omd restart cmc


  3. Re-enable Host Checks and wait for at least 5 minutes
    Screenshot of the right-hand side Master control with services checks set to off.

    If the behavior reoccurs, disable Host Checks and restart CMC.

  4. Re-enable Service Checks and wait for at least 5 minutes
    Screenshot of the right-hand side Master control with host checks set to off.

    If the behavior reoccurs, disable Service Checks and restart CMC.

  5. Re-enable both Host Checks and Service Checks
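Instead of watching htop during each waiting period, the CPU usage of a process can be measured over a fixed interval from /proc. This is a minimal sketch, assuming a Linux host; cpu_ticks and cpu_percent are hypothetical helper names. For demonstration it measures the current shell for one second. For the CMC you would use its PID (for example from the site's tmp/run/cmc.pid file) and an interval of 300 seconds.

```bash
# Sketch: measure the CPU usage of one process over a fixed interval.
# cpu_ticks and cpu_percent are hypothetical helper names.

cpu_ticks() {
    # Print utime + stime (in clock ticks) for the PID given as $1.
    local stat
    stat=$(cat "/proc/$1/stat") || return 1
    # Strip "pid (comm) " so spaces in the process name cannot shift fields.
    stat=${stat##*) }
    # Split the rest into words; utime and stime are now words 12 and 13.
    set -- $stat
    echo $(( ${12} + ${13} ))
}

cpu_percent() {
    # $1: ticks before, $2: ticks after, $3: interval in seconds, $4: ticks per second
    awk -v a="$1" -v b="$2" -v s="$3" -v hz="$4" \
        'BEGIN {printf "%.1f\n", (b - a) / hz / s * 100}'
}

pid=$$        # demo only: this shell; replace with the PID of the cmc process
interval=1    # seconds; the steps above wait at least 5 minutes (300)

if [ -r "/proc/$pid/stat" ]; then
    hz=$(getconf CLK_TCK)
    before=$(cpu_ticks "$pid")
    sleep "$interval"
    after=$(cpu_ticks "$pid")
    echo "PID $pid used $(cpu_percent "$before" "$after" "$interval" "$hz")% of one core"
fi
```

The counters cover all threads of the process, so a result above 100% means that more than one core was busy.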

Now, we need to understand which hosts might be causing this behavior.

  1. Start with the top-level folder of the affected site in Setup > Hosts and set the "Criticality" of the folder to "Do not monitor this host."
    Screenshot of a host folder properties. Criticality is enabled and set to Do not monitor host.


    The subfolders will inherit this property.
    Screenshot of a host folder properties. Criticality is enabled and set to Do not monitor this host.


  2. Activate changes and run omd restart on that site as the site user.

    Code Block
    OMD[mysite]~# omd restart


  3. Now enable one of the subfolders and activate changes.
    Screenshot of a host folder properties. Criticality is enabled and set to Productive System.


  4. Run omd restart again and wait at least 5 minutes before checking htop

    Code Block
    OMD[mysite]~# omd restart




  5. If the CPU usage does not go back to 100%, repeat steps 3 and 4 with the next subfolder until it does. Make sure to wait at least 5 minutes after each omd restart. Once the CPU usage is back at 100%, you have found the culprit.

  6. Now we can investigate what is causing the issue. What kind of host is it: agent-based, SNMP, or special agent?

    • If it is an agent-based host:
      • Any local plugins?
      • Any special configuration?

  7. Run strace as root. With strace, you can trace the system calls of the cmc process whenever you run into an issue.

    Code Block
    root@mylinuxhost~# strace -o cmc-strace.log -p $(cat ~mysite/tmp/run/cmc.pid)


    Tip
    Further information can be found here: Debugging the Checkmk Micro Core (CMC) old, section "strace"


  8. With gdb, you can analyze a core dump if Checkmk creates one. Note: Checkmk only creates a core dump if you enable this in the global settings.

    Code Block
    gdb /omd/sites/mysite/bin/cmc --core=/home/mylinuxuser/Downloads/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
    GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
    Copyright (C) 2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
        <http://www.gnu.org/software/gdb/documentation/>.

    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from /omd/sites/mysite/bin/cmc...

    warning: core file may not match specified executable file.
    [New LWP 804036]
    Core was generated by `python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0  0x00007f2b661be1fd in ?? ()
    (gdb) where
    #0  0x00007f2b661be1fd in ?? ()
    #1  0x00007ffed8a75060 in ?? ()
    #2  0x0000000000000000 in ?? ()

    # Run it (if it's still crashing, you'll see it crash)
    r
    # View the backtrace (call stack)
    bt
    # Quit when done
    q
    # Memory mappings
    i proc m

    # Listing all threads. This is really useful!
    thread apply all bt


    Tip
    Further information can be found here: Debugging the Checkmk Micro Core (CMC) old, section "gdb"


  9. If your investigation is not successful, please open a ticket and provide us with the following data to help us reproduce the issue:

    • Log in as the site user with su - $MYSITE.
    • Create an archive with the following command: tar czf ~/corefiles.tgz ~/var/check_mk/core/ ~/var/log/
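The data collection for the ticket can also be wrapped in a small script. This is only a sketch under a few assumptions: it is run as the site user, the environment variable OMD_ROOT points to the site directory, and make_support_archive is a hypothetical helper name. Unlike the plain tar command above, it changes into the site directory first, so the archive contains relative paths.

```bash
# Sketch: collect the core and log directories of a site into one archive.
# make_support_archive is a hypothetical helper name, not a Checkmk command.

make_support_archive() {
    # $1: site directory, $2: path of the archive to create
    local home="$1" out="$2" p
    # Collect only the directories that actually exist.
    set --
    for p in var/check_mk/core var/log; do
        if [ -d "$home/$p" ]; then
            set -- "$@" "$p"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "Nothing to archive under $home" >&2
        return 1
    fi
    # -C switches to the site directory so the archive holds relative paths.
    tar czf "$out" -C "$home" "$@"
}

# Assumption: OMD_ROOT is set in the site user's shell; otherwise do nothing.
if [ -n "${OMD_ROOT:-}" ]; then
    make_support_archive "$OMD_ROOT" "$OMD_ROOT/corefiles.tgz" &&
        echo "Created $OMD_ROOT/corefiles.tgz"
fi
```

Before attaching the archive to the ticket, tar tzf ~/corefiles.tgz lists its contents so you can verify what will be sent.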

