Problem
You experience high CPU load and/or CPU utilization on your Checkmk server. Please be aware that there are no absolute "good" numbers: what counts as acceptable load or utilization depends entirely on your infrastructure.
Context
A process monitor such as htop shows the CMC process using 100% of one CPU core. The command line should look similar to the one below (here for a site named mysite):
    /omd/sites/mysite/bin/cmc /omd/sites/mysite/var/check_mk/core/config.pb
Solution
Check your memory
It may sound odd, but low memory often leads to CPU stress, because the system starts swapping. If your memory is running low, add more and check whether the CPU load decreases. See the Checkmk System Requirements as a frame of reference for sizing.
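To see whether the server is actually swapping, a quick check along these lines may help. This is a minimal sketch that assumes a Linux host with /proc/meminfo; swap_used_kb is a hypothetical helper name, not part of Checkmk.

```shell
# Print the amount of swap currently in use (in kB).
# $1 = meminfo-style file; defaults to /proc/meminfo (the argument exists mainly for testing).
swap_used_kb() {
    meminfo="${1:-/proc/meminfo}"
    awk '/^SwapTotal:/ {t = $2} /^SwapFree:/ {f = $2} END {print t - f}' "$meminfo"
}

if [ -r /proc/meminfo ]; then
    echo "Swap in use: $(swap_used_kb) kB"
fi

# To watch swapping over time, look at the si/so columns of vmstat
# (5-second interval, 3 samples); sustained non-zero values indicate active swapping:
# vmstat 5 3
```

A value that stays at or near zero suggests that memory is not the bottleneck.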
Check the number of CPU cores
It is entirely possible that your server simply needs more processing power. Assign it one or two more cores and see whether the CPU utilization decreases. See the Checkmk System Requirements as a frame of reference for sizing.
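A rough way to judge whether more cores would help is to put the load average in relation to the number of cores. The sketch below assumes a Linux host with /proc/loadavg and the nproc command; load_per_core is a hypothetical helper name.

```shell
# Divide a load average by the number of cores and print the result with two decimals.
# $1 = load average, $2 = number of CPU cores
load_per_core() {
    LC_ALL=C awk -v load="$1" -v cores="$2" 'BEGIN { printf "%.2f\n", load / cores }'
}

if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    # The first three fields of /proc/loadavg are the 1-, 5- and 15-minute load averages.
    read -r load1 load5 load15 rest < /proc/loadavg
    cores=$(nproc)
    echo "cores=$cores load1=$load1 load5=$load5 load15=$load15"
    echo "15-minute load per core: $(load_per_core "$load15" "$cores")"
fi
```

A load per core that stays well above 1 over a longer period is a hint, not proof, that the server has too few cores for its workload.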
Check your VM's virtual CPU
It is not uncommon for CPU emulation to limit the CPU features presented to a VM. This can have different reasons, e.g., compatibility in a clustered environment composed of hosts with different CPUs.
Some users have reported that after they changed the CPU emulation of their Checkmk server to one that exposes more features (e.g., hardware support for AES), their load was cut in half.
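To find out which features the virtual CPU exposes, you can inspect the flags line of /proc/cpuinfo inside the VM. This is a minimal sketch for x86 Linux guests (other architectures label the line differently); has_cpu_flag is a hypothetical helper name.

```shell
# Succeed if the given CPU flag is listed in the first "flags" line.
# $1 = flag name (e.g. aes), $2 = cpuinfo-style file (default: /proc/cpuinfo)
has_cpu_flag() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" 2>/dev/null | grep -qw -- "$1"
}

if has_cpu_flag aes; then
    echo "AES hardware support is visible to this system"
else
    echo "AES hardware support is NOT visible to this system"
fi
```

If the flag is present on the hypervisor host but missing inside the VM, the CPU emulation setting is the likely reason.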
Are you using the old symmetric encryption?
As outlined in our official guide, it is not a good idea to run the symmetric encryption inside the TLS encryption of the agent that comes with Checkmk 2.1.0 and above.
Users with high CPU usage have reported that disabling the symmetric encryption (while keeping TLS encryption active, of course) cut their load roughly in half.
Step-by-step guide
- Verify that the CMC is consuming 100% of one or more CPU cores
- Install a process monitor like htop
- Run the process monitor as a site user
- Filter for CMC (e.g., in htop, press F4 and type the string cmc into the filter)
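If you prefer a non-interactive check, something like the following can complement htop. It is a sketch that assumes the procps version of ps; max_cmc_cpu is a hypothetical helper name. Note that ps reports CPU time divided by the lifetime of the process, so the value is an average since the process started, not the momentary usage that htop shows.

```shell
# Read "pcpu comm" pairs on stdin and print the highest %CPU value
# among processes whose command name is exactly "cmc" (0 if there is none).
max_cmc_cpu() {
    awk '$2 == "cmc" && $1 > max { max = $1 } END { print max + 0 }'
}

# Empty headers (pcpu=, comm=) suppress the header line of ps.
if command -v ps >/dev/null 2>&1; then
    ps -e -o pcpu= -o comm= | max_cmc_cpu
fi
```

Because the value is averaged over the process lifetime, it is most meaningful a few minutes after a restart of the core.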
Debugging
- Go to 'Master Control' in your sidebar.
- Disable both Host Checks and Service Checks and restart the CMC:
    OMD[mysite]~# omd restart cmc
- Re-enable Host Checks and wait for at least 5 minutes.
  If the behavior reoccurs, disable Host Checks and restart the CMC.
- Re-enable Service Checks and wait for at least 5 minutes.
  If the behavior reoccurs, disable Service Checks and restart the CMC.
- Re-enable both Host Checks and Service Checks.
Now, we need to understand which hosts might be causing this behavior.
- Start with the top-level folder of the affected site in Setup → Hosts and set the "Criticality" of the folder to "Do not monitor this host."
  The subfolders will inherit this property. Activate changes and run omd restart on that site as the site user:
    OMD[mysite]~# omd restart
- Now enable one of the subfolders and activate changes.
  Run omd restart again and wait at least 5 minutes before checking htop:
    OMD[mysite]~# omd restart
- If the CPU usage does not go back to 100%, repeat the previous step with the next subfolder until it does. Make sure to wait at least 5 minutes after each omd restart. Once the CPU usage is back at 100%, you have found the culprit.
- Now, we can move forward to see what is causing the issue. What kind of host is it? Agent, SNMP, or Special Agent?
- If it is an agent-based host:
  - Any local plugins?
  - Any special configuration?
Run strace as root. You can use strace to trace the cmc process whenever you face an issue:
    root@mylinuxhost~# strace -o cmc-strace.log -p $(cat ~<mysite>/tmp/run/cmc.pid)
Tip: Further information can be found here: Debugging the Checkmk Micro Core (CMC) old#strace.
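The strace log can grow quickly, so a summary of the most frequent system calls is often a useful first look. The following is a minimal sketch that assumes the default line format written by strace -o without -f (each line starts with the syscall name followed by an opening parenthesis); top_syscalls is a hypothetical helper name.

```shell
# List the most frequent system calls in an strace log, most frequent first.
# $1 = log file written by "strace -o", $2 = number of entries to show (default: 10)
top_syscalls() {
    # Keep only lines that begin with a syscall name; signal and exit lines are skipped.
    grep -oE '^[a-z_0-9]+\(' "$1" | tr -d '(' | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Example, after collecting a log as described above:
# top_syscalls cmc-strace.log

# Alternatively, strace can count calls itself; stop it with Ctrl+C to print the summary
# (<PID> is the process ID of the cmc process):
# strace -c -p <PID>
```

A single syscall that dominates the list (for example a tight loop of poll or read calls) is a good starting point for a support ticket.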
With gdb, you can analyze the core dump if Checkmk creates one. Note: Checkmk only creates a core dump if you enable this in the global settings.
    gdb /omd/sites/mysite/bin/cmc --core=/home/mylinuxuser/Downloads/core.python3.989.4b7ee3adffd14e31a0188aac0c215161.804036.1640164046000000
    GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
    Copyright (C) 2020 Free Software Foundation, Inc.
    License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
    This is free software: you are free to change and redistribute it.
    There is NO WARRANTY, to the extent permitted by law.
    Type "show copying" and "show warranty" for details.
    This GDB was configured as "x86_64-linux-gnu".
    Type "show configuration" for configuration details.
    For bug reporting instructions, please see:
    <http://www.gnu.org/software/gdb/bugs/>.
    Find the GDB manual and other documentation resources online at:
        <http://www.gnu.org/software/gdb/documentation/>.

    For help, type "help".
    Type "apropos word" to search for commands related to "word"...
    Reading symbols from /omd/sites/mysite/bin/cmc...

    warning: core file may not match specified executable file.
    [New LWP 804036]
    Core was generated by `python3 /omd/sites/mysite/bin/cmk --discover-marked-hosts'.
    Program terminated with signal SIGSEGV, Segmentation fault.
    #0 0x00007f2b661be1fd in ?? ()
    (gdb) where
    #0 0x00007f2b661be1fd in ?? ()
    #1 0x00007ffed8a75060 in ?? ()
    #2 0x0000000000000000 in ?? ()

    # Run it (if it's still crashing, you'll see it crash)
    r
    # View the backtrace (call stack)
    bt
    # Quit when done
    q
    # Memory mappings
    i proc m
    # Listing all threads. This is really useful!
    thread apply all bt
Tip: Further information can be found here: Debugging the Checkmk Micro Core (CMC) old#gdb.
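In the sample session above, gdb warns that the core file may not match the executable: the "Core was generated by" line names a python3 process, not the cmc binary. Before you spend time on a backtrace, it can be worth checking which program produced the core. The sketch below assumes the output format shown above; core_generated_by and gdb_all_backtraces are hypothetical helper names, and all paths are placeholders.

```shell
# Read gdb output on stdin and print the command line recorded in the core file.
core_generated_by() {
    sed -n "s/^Core was generated by \`\(.*\)'\.\$/\1/p"
}

# Print the backtraces of all threads without starting an interactive session.
# $1 = executable, $2 = core file
gdb_all_backtraces() {
    gdb --batch -ex "thread apply all bt" "$1" --core="$2" 2>&1
}

# Example (adjust both paths to your system):
# gdb_all_backtraces /omd/sites/mysite/bin/cmc /path/to/corefile > cmc-backtrace.txt
# core_generated_by < cmc-backtrace.txt
```

If the printed command line does not start with the cmc binary, load the core with the executable that actually produced it; otherwise the frames stay unresolved, as in the session above.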
If your investigation is not successful, please open a ticket and send us the following data to help us reproduce the issue:
- Log in as the site user with su - $MYSITE.
- Create an archive with the following command:
    tar czf ~/corefiles.tgz ~/var/check_mk/core/ ~/var/log/