Troubleshooting long-running Windows agent
This article helps debug long-running Windows agents.
LAST TESTED ON CHECKMK 2.3.0P1
Problem
By default, a Windows agent run takes only a few seconds, which is well within the default timeout of 60 seconds. Agent extensions (e.g., plugins, local checks) or a misconfiguration can make the agent run longer, and this can lead to a timeout in the Checkmk UI.
Step-by-step Guide
Steps to figure out which section/plugin is affected
Open PowerShell and run the agent locally to measure how long a run takes:
Measure-Command { C:\ProgramData\checkmk\agent\bin\cmk-agent-ctl.exe -vv dump > agent_output.txt } > agent_output_time.txt
Go through the check_mk.log* files and use the following grep commands:
Disclaimer
For easier troubleshooting, the log files were copied to a Linux server, and the commands below were run there.
This command lists all sections and how long each of them took, sorted by section name:
grep -roP "Section '\w+' took \[\d+\]" | sort -t '[' -k1,1n | uniq -c
This command produces the same list, sorted by duration in descending order so that the long-running sections are listed first:
grep -roP "Section '\w+' took \[\d+\]" | sort -t '[' -k2,2nr | uniq -c
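When the logs cover many agent runs, a per-section summary can be easier to read than the raw list. The following is a minimal sketch, not part of the original procedure. It assumes GNU grep (for -P) and the same log line format that the grep commands above match. The function name summarize_sections is only illustrative.

```shell
# Sketch: summarize section runtimes from check_mk.log* files.
# Assumes log lines contain: Section '<name>' took [<number>]
# summarize_sections is a hypothetical helper name, not a Checkmk tool.
summarize_sections() {
    # -r: recurse, -h: drop file names, -o: print only the matching part
    grep -rhoP "Section '\w+' took \[\d+\]" "$@" |
        # Reduce each match to "<name> <number>"
        sed -E "s/^Section '(.*)' took \[([0-9]+)\]$/\1 \2/" |
        # Count the runs and track the sum and maximum per section
        awk '{ n[$1]++; sum[$1] += $2; if ($2 > max[$1]) max[$1] = $2 }
             END { for (s in n) printf "%s runs=%d avg=%d max=%d\n", s, n[s], sum[s] / n[s], max[s] }' |
        # Sort by the max value (4th "="-separated field), largest first
        sort -t= -k4,4nr
}
```

Call it with the directory that holds the copied log files, for example summarize_sections . for the current directory. Sections with a high max value are the first candidates to investigate.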
Solution
- Before tweaking or changing anything, first understand why the section/plugin takes so long.
- Two possible solutions:
  - If the plugin needs more than 60 seconds to provide data, follow this guide to run it asynchronously: Asynchronous execution of Windows plugins
  - If it's a section inside the agent, you would have to change the agent interval.
Related articles