Troubleshooting long-running Windows agent
This article helps debug long-running Windows agents.
LAST TESTED ON CHECKMK 2.3.0P1
Problem
By default, a Windows agent run takes only a few seconds, which is well within the default timeout of 60 seconds. Agent extensions (e.g., plugins, local checks) or a misconfiguration can make the agent run longer, and this can lead to a timeout in the Checkmk UI.
Step-by-step Guide
Steps to figure out which section/plugin is affected
Open PowerShell and run the agent locally to measure how long a run takes:
Measure-Command { C:\ProgramData\checkmk\agent\bin\cmk-agent-ctl.exe -vv dump > agent_output.txt } > agent_output_time.txt
Tip: For easier troubleshooting in the next steps, the files were moved to a Linux server.
Go through the check_mk.log* files and use the following grep commands:
This command will give you a list of all the sections and how long each of them took. We'll also sort by the section name.
grep -roP "Section '\w+' took \[\d+\]" | sort -t '[' -k1,1n | uniq -c
This command will give you a list of all the sections and how long each of them took. We'll also sort by the time, so the long-running sections end up at the bottom of the list.
grep -roP "Section '\w+' took \[\d+\]" | sort -t '[' -k2,2n | uniq -c
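Because every section shows up once per agent run, it can help to reduce the log to the longest duration seen for each section. The following is a minimal sketch, not an official Checkmk tool. The function name slowest_sections is made up for this example, and it assumes the log lines match the same pattern as the grep commands above (Section '<name>' took [<number>]).

```shell
#!/bin/sh
# Hypothetical helper: print the longest logged duration per section,
# slowest section first. Pass the log files as arguments.
slowest_sections() {
    # -h drops file names so the same section from different files is merged;
    # -o prints only the matching part of each line.
    grep -hoE "Section '[A-Za-z0-9_]+' took \[[0-9]+\]" "$@" |
        awk -F"'" '
            {
                name = $2                 # text between the single quotes
                t = $3                    # e.g. " took [4500]"
                gsub(/[^0-9]/, "", t)     # keep only the digits
                t += 0                    # force numeric comparison
                if (!(name in max) || t > max[name]) max[name] = t
            }
            END { for (name in max) print max[name], name }
        ' |
        sort -k1,1nr -k2,2                # highest duration first, then by name
}
```

Called as slowest_sections check_mk.log* in the directory that holds the copied log files, it prints one line per section with the longest duration first. Unlike grep -r, it reads only the files given as arguments.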
Solution
Before tweaking or changing anything, we should first understand why that section/plugin takes so long.
Two possible solutions:
If the plugin needs more than 60 seconds to provide data, follow this guide to run it asynchronously:
Asynchronous execution of Windows plugins
If it's a section inside the agent, you would have to change the agent interval.