Problem
Windows Management Instrumentation (WMI) tends to time out regularly, regardless of which version is in use or how the machine is sized. Because WMI collects all performance data within Windows (the same data you can view with perfmon on the affected machine), these timeouts can cause Checkmk services such as "Memory & Pagefile" and "Processor Queue" to go stale.
The easiest fix is a reboot. If you reboot your systems regularly, e.g. once a month for patching, this problem should occur only rarely.
Another approach is to increase the WMI timeout in the Checkmk agent's yml configuration file, which has been available since v1.6 (formerly an ini file).
In C:\ProgramData\checkmk\agent\log\check_mk.log you should see an error like this:
2021-08-03 17:15:35.063 [Err ] Timeout [3] seconds broken when query WMI
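To see how often the timeout occurs, the log can be scanned for this message. The following is a minimal Python sketch, not part of Checkmk: the function name find_wmi_timeouts is hypothetical, and the pattern is derived only from the sample line above, so other agent versions may word the message differently.

```python
import re
from typing import Iterable, List, Tuple

# Pattern derived from the sample log line in this article (an assumption:
# other agent versions may format the message differently).
WMI_TIMEOUT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*"
    r"Timeout \[(?P<seconds>\d+)\] seconds broken when query WMI"
)


def find_wmi_timeouts(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (timestamp, timeout_seconds) for every WMI timeout line found."""
    hits = []
    for line in lines:
        match = WMI_TIMEOUT_RE.search(line)
        if match:
            hits.append((match.group("timestamp"), int(match.group("seconds"))))
    return hits


# Demo with the sample line from this article plus one unrelated, made-up line.
sample_log = [
    "2021-08-03 17:15:35.063 [Err ] Timeout [3] seconds broken when query WMI",
    "some other log line that should be ignored",
]
for timestamp, seconds in find_wmi_timeouts(sample_log):
    print(f"{timestamp}: WMI query exceeded {seconds} s")
```

To check a real agent, pass an open file handle for C:\ProgramData\checkmk\agent\log\check_mk.log instead of sample_log. Many hits spread over several days suggest that raising the timeout is worthwhile.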
Solution
Follow these steps to increase the timeout.
- Go to the agent directory C:\ProgramData\checkmk\agent
- Open the check_mk.user.yml file and search for the wmi_timeout entry. Remove the leading '#' to uncomment the line and set the value in seconds:
global:
  ...
  wmi_timeout: 7 # <- 7 sec, default is 3
  ...
You can modify this file even when you're using the Agent Bakery!
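After editing, it can be worth confirming that the wmi_timeout line is actually uncommented. The sketch below is one hedged way to do that in Python. The helper active_wmi_timeout is hypothetical, and it is a simple line-based check rather than a YAML parser, so it does not verify that the key sits under global.

```python
import re
from typing import Optional

# Matches an active "wmi_timeout: <integer>" line, optionally followed by a
# trailing comment. A line that starts with '#' does not match.
ACTIVE_WMI_TIMEOUT_RE = re.compile(r"^\s*wmi_timeout:\s*(\d+)\s*(?:#.*)?$")


def active_wmi_timeout(config_text: str) -> Optional[int]:
    """Return the uncommented wmi_timeout value in seconds, or None if unset."""
    for line in config_text.splitlines():
        match = ACTIVE_WMI_TIMEOUT_RE.match(line)
        if match:
            return int(match.group(1))
    return None


# Demo with an illustrative fragment: one commented line, one active line.
sample_config = """\
global:
  # wmi_timeout: 3
  wmi_timeout: 7 # <- 7 sec, default is 3
"""
print(active_wmi_timeout(sample_config))
```

To check a real agent, read the contents of C:\ProgramData\checkmk\agent\check_mk.user.yml and pass them to the function. A result of None means the line is still commented out or missing, so the default of 3 seconds applies.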
These steps apply to Checkmk 1.6 and 2.0.
As of Checkmk 2.1, you can also set this timeout through the Agent Bakery: https://checkmk.com/werk/12328