Debugging Checkmk Special Agents (Combined)
This article explains how to manually execute and debug Checkmk special agents to identify configuration, connectivity, and permission issues.
LAST TESTED ON CHECKMK 2.3.0p1
Overview
Checkmk special agents are standalone programs used to collect monitoring data from systems that cannot be monitored with the standard Checkmk agent, such as virtualization environments, storage systems, cloud and container platforms.
When a special agent fails, the Checkmk GUI usually provides only limited error information. Proper troubleshooting requires identifying the exact agent command generated by Checkmk, running it manually as the site user, and enabling debug and trace options to capture detailed output.
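The workflow above can be sketched with standard shell tools. The sample output below is a hypothetical excerpt of `cmk -D <host>` (not a live query); the grep/cut extraction mirrors the one-liners used throughout this article:

```shell
# Illustrative excerpt of `cmk -D <host>` output (sample data, not a live query):
sample='Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_aws --hostname aws'

# Extract everything after the first colon of the "Program:" line -
# this is the exact command Checkmk runs for the host:
cmd=$(printf '%s\n' "$sample" | grep 'Program:' | cut -f2- -d':')
echo "$cmd"
```

Running the extracted command as the site user, optionally extended with debug options, reproduces exactly what Checkmk executes.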
AWS special agent
The Amazon Web Service (AWS) special agent collects metrics and inventory data from AWS using Application Programming Interface (API) credentials.
Checkmk Versions 2.2 and above
To execute the special agent from the command line, run the following command:
/omd/sites/mysite/share/check_mk/agents/special/agent_aws --access-key-id MYACCESSKEYID --secret-access-key MYSECRETACCESSKEY --regions MYAWSREGION --global-services ce cloudfront route53 --services cloudwatch_alarms dynamodb ebs ec2 ecs elasticache elb elbv2 glacier lambda rds s3 sns wafv2 --ec2-limits --ebs-limits --s3-limits --glacier-limits --elb-limits --elbv2-limits --rds-limits --cloudwatch_alarms-limits --dynamodb-limits --wafv2-limits --lambda-limits --sns-limits --ecs-limits --elasticache-limits --s3-requests --cloudwatch-alarms --wafv2-cloudfront --cloudfront-host-assignment aws_host --hostname aws --piggyback-naming-convention ip_region_instance
Viewing Effective Configuration
To view the effective AWS configuration, combine cmk -D aws with grep.
OMD[mysite]:~$ cmk -D aws |grep -A2 "Type of agent"
Type of agent:
Program: /omd/sites/mysite/share/check_mk/agents/special/agent_aws --access-key-id MYACCESSKEY --secret-access-key MYSECRETKEY --regions MYAWSREGION --global-services ce cloudfront route53 --services cloudwatch_alarms dynamodb ebs ec2 ecs elasticache elb elbv2 glacier lambda rds s3 sns wafv2 --ec2-limits --ebs-limits --s3-limits --glacier-limits --elb-limits --elbv2-limits --rds-limits --cloudwatch_alarms-limits --dynamodb-limits --wafv2-limits --lambda-limits --sns-limits --ecs-limits --elasticache-limits --s3-requests --cloudwatch-alarms --wafv2-cloudfront --cloudfront-host-assignment aws_host --hostname aws --piggyback-naming-convention ip_region_instance
Debugging
Enable debug, verbose output, and API tracing by running the agent with the --debug, --verbose, and --vcrtrace options.
Checkmk Versions 2.2.0p42 and Above
/omd/sites/mysite/share/check_mk/agents/special/agent_aws --access-key-id MYACCESSKEYID --secret-access-key MYSECRETACCESSKEY --regions MYAWSREGION --services ecs --ecs-limits --hostname MYHOSTNAME --piggyback-naming-convention ip_region_instance --debug --verbose --vcrtrace /omd/sites/mysite/tmp/debug &> vtrace.txt
Two files will be generated:
debug.log containing agent execution details
trace.txt containing recorded AWS API calls
Azure special agent
The Azure special agent retrieves metrics and inventory data via the Azure Resource Manager and Microsoft Graph APIs. Errors often stem from missing permissions or blocked outbound connectivity.
Azure endpoints
If you're experiencing connectivity issues or receiving unexpected data, please ensure that your endpoints are available from the Checkmk server.
Please review Microsoft's documentation for further information.
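As a minimal reachability sketch, you can test the public Azure endpoints named later in this article from the Checkmk server; the 5-second netcat timeout is an arbitrary choice:

```shell
# Endpoints the Azure special agent must be able to reach (public cloud):
endpoints="management.azure.com login.microsoftonline.com"

# Test TCP reachability on port 443 for each endpoint:
for host in $endpoints; do
  nc -zv -w 5 "$host" 443 || echo "WARNING: cannot reach $host:443"
done
```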
Checkmk Versions 2.2.0p42 and Above
/omd/sites/mysite/share/check_mk/agents/special/agent_azure '--subscription' 'MYSUBSCRIPTIONKEY' '--tenant' 'MYTENANTKEY' '--client' 'MYCLIENTKEY' '--secret' 'MYSECRET' '--authority' 'global' '--cache-id' 'azure' '--piggyback_vms' 'self' --services SERVICES --debug -vvv --vcrtrace /omd/sites/mysite/tmp/debug &> output.log
Service examples that can be used in the command above:
usage_details
Microsoft.Compute/virtualMachines
Microsoft.Network/virtualNetworkGateways
Microsoft.Sql/servers/databases
Microsoft.Storage/storageAccounts
Microsoft.Web/sites
Microsoft.DBforMySQL/servers
Microsoft.DBforMySQL/flexibleServers
Microsoft.DBforPostgreSQL/servers
Microsoft.DBforPostgreSQL/flexibleServers
Microsoft.Network/trafficmanagerprofiles
Microsoft.Network/loadBalancers
Microsoft.RecoveryServices/vaults
Microsoft.Network/applicationGateways
Viewing Effective Configuration
To view the effective Azure configuration, combine cmk -D azure with grep.
OMD[mysite]:~$ cmk -D azure |grep -A2 "Type of agent"
Type of agent:
Program: /omd/sites/mysite/share/check_mk/agents/special/agent_azure --tenant MYTENANTKEY --client MYCLIENTKEY --secret MYSECRETKEY --subscription MYSUBSCRIPTIONKEY --piggyback_vms self --services users_count ad_connect app_registrations Microsoft.Compute/virtualMachines Microsoft.Network/virtualNetworkGateways Microsoft.Sql/servers/databases Microsoft.Storage/storageAccounts Microsoft.Web/sites Microsoft.DBforMySQL/servers Microsoft.DBforPostgreSQL/servers Microsoft.Network/trafficmanagerprofiles Microsoft.Network/loadBalancers Microsoft.RecoveryServices/vaults Microsoft.Network/applicationGateways
Program: /omd/sites/mysite/share/check_mk/agents/special/agent_azure_status australiacentral australiacentral2 australiaeast australiasoutheast brazilsouth brazilsoutheast canadacentral germanynorth germanywestcentral koreacentral westeurope
Common Error: Insufficient Graph Privileges
Error message: Graph client: Insufficient privileges to complete the operation
Possible Solution:
Open the Azure Portal
Navigate to Azure Active Directory
Open App registrations on the left side
Select the Checkmk application
Open API permissions on the left side
Click Add a permission and add the required permissions for Microsoft Graph
Full list of access rights needed:
API & Use | Documentation |
|---|---|
Get Metric data | |
Get resources | |
Get resource groups | |
Consumption details | |
VM info | |
Active Directory top users | |
Active Directory organizations | |
These are the metrics we get via the Azure agents
Resource URI | Metric name |
|---|---|
Microsoft.Network/virtualNetworkGateways | AverageBandwidth,P2SBandwidth |
Microsoft.Sql/servers/databases | storage_percent,deadlock,cpu_percent,dtu_consumption_percent,connection_successful,connection_failed |
Microsoft.Storage/storageAccounts | UsedCapacity,Ingress,Egress,Transactions,SuccessServerLatency,SuccessE2ELatency,Availability |
Microsoft.Web/sites | |
Common Error: SSL "bad handshake"
When trying to monitor a Microsoft Azure environment, you see an SSL "bad handshake" error message in the Checkmk service Azure Agent Info.
Possible Solution
You need to make sure that your Checkmk server can connect to the following two addresses:
management.azure.com
login.microsoftonline.com
If your Checkmk server cannot establish a connection or the connection times out, Azure monitoring will not function. You can quickly verify connectivity by logging in as the Checkmk site user and testing the connection with either Telnet or Netcat.
OMD[mysite]:~$ nc -zv management.azure.com 443
OMD[mysite]:~$ telnet management.azure.com 443
OMD[mysite]:~$ nc -zv login.microsoftonline.com 443
OMD[mysite]:~$ telnet login.microsoftonline.com 443
The output of these commands should look like this:
OMD[mysite]:~$ nc -zv management.azure.com 443
Connection to management.azure.com 443 port [tcp/https] succeeded!
OMD[mysite]:~$ telnet management.azure.com 443
Trying 20.51.10.137...
Connected to management.azure.com.
Escape character is '^]'.
OMD[mysite]:~$ nc -zv login.microsoftonline.com 443
Connection to login.microsoftonline.com 443 port [tcp/https] succeeded!
OMD[mysite]:~$ telnet login.microsoftonline.com 443
Trying 20.190.128.18...
Connected to login.microsoftonline.com.
Escape character is '^]'.
Firewalls blocking these endpoints are a frequent cause of agent failure.
BI Special Agent
A BI Special Agent is a monitoring agent that collects health, status, and performance data directly from Business Intelligence (BI) systems using BI-specific APIs or interfaces, providing visibility beyond basic infrastructure monitoring. With Werk #6679, a new BI Special Agent was introduced that securely receives the required key directly via stdin, rather than through command-line arguments or configuration files.
If you use special agents installed from a Feature Pack, you can find the special agents in:
OMD[mysite]:~$ ls ~/local/share/check_mk/agents/special/
The special agents that are already shipped with Checkmk can be found here:
OMD[mysite]:~$ ls ~/share/check_mk/agents/special/
If you want to execute the special agent from the command line, use cmk -D bi, which can be piped (|) to head.
OMD[mysite]:~$ cmk -D bi |head -n15
bi
Addresses: No IP
Tags: [address_family:no-ip], [agent:cmk-agent], [checkmk-agent:checkmk-agent], [criticality:prod], [networking:lan], [piggyback:auto-piggyback], [site:bi], [snmp_ds:no-snmp], [tcp:tcp]
Labels: [cmk/site:mysite]
Host groups: check_mk
Contact groups: all
Agent mode: Normal Checkmk agent, or special agent if configured
Type of agent:
Program: /omd/sites/mysite/share/check_mk/agents/special/agent_bi
Program stdin:
[{'site': 'local', 'credentials': 'automation', 'filter': {'groups': ['Hosts']}, 'assignments': {'querying_host': 'querying_host'}}]
Process piggyback data from /omd/sites/bi/tmp/check_mk/piggyback/bi
Services:
checktype item params description groups
With this command, you will get the special agent call. Now you can continue the debugging.
How to execute the command manually:
echo "[{'site': 'local', 'credentials': 'automation', 'filter': {'groups': ['Hosts']}, 'assignments': {'querying_host': 'querying_host'}}]" | /omd/sites/mysite/share/check_mk/agents/special/agent_bi
Jenkins Special Agent
For the Jenkins Special Agent to function correctly, the Jenkins user account used for authentication must have sufficient permissions to query job, node, and build information. At a minimum, the user must be granted the following access rights within Jenkins:
General: Read
To allow basic access to the Jenkins instance.
Agent: Connect
To enable communication with build agents.
Element: Read & Workspace
To retrieve job details and workspace information.
Views: Read
To access and enumerate configured views.
Without these permissions, the agent may fail to collect monitoring data or return incomplete results.
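You can verify the account before wiring it into Checkmk by querying the Jenkins JSON API directly. This is a sketch with hypothetical values; JENKINS_URL, JENKINS_USER, and JENKINS_TOKEN are placeholders you must replace with your own instance and API token:

```shell
# Hypothetical values - replace with your Jenkins instance and API token:
JENKINS_URL="https://jenkins.example.com"
JENKINS_USER="checkmk"
JENKINS_TOKEN="MYAPITOKEN"

# A 200 status confirms the user can read the instance (General: Read);
# 401/403 indicates missing or insufficient permissions:
status=$(curl -s -o /dev/null -w '%{http_code}' \
  -u "$JENKINS_USER:$JENKINS_TOKEN" "$JENKINS_URL/api/json") || status="000"
echo "HTTP status: $status"
```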
Kubernetes Special Agent
Getting Started
Background information regarding this subject is available in our documentation:
Supported Kubernetes Versions
Support is limited to the three most recent Kubernetes versions. When a newly released Kubernetes version is in use and that release is less than one month old, a support ticket should be opened to verify whether compatibility and official support for that version are already in place.
Installation of Checkmk Cluster Collectors (a.k.a. Checkmk Kubernetes agent)
We strongly recommend installing the Checkmk Cluster Collectors with our Helm charts, unless you are well-experienced with Kubernetes and want to install the agent using the YAML manifests we provide.
Please remember that we cannot support you in installing the agent while using the manifests.
The Helm chart installs and configures all necessary components to run the agent and exposes several helpful configuration options that will help you automatically set up complex resources. The prerequisites have to be fulfilled before you proceed with the installation.
Below is an example of deploying the Helm chart using a LoadBalancer (this requires that the cluster can create a LoadBalancer):
$ helm repo add checkmk-chart https://checkmk.github.io/checkmk_kube_agent
$ helm repo update
$ helm upgrade --install --create-namespace -n checkmk-monitoring checkmk checkmk-chart/checkmk --set clusterCollector.service.type="LoadBalancer"
Release "checkmk" does not exist. Installing it now.
NAME: checkmk
LAST DEPLOYED: Tue May 17 22:01:07 2022
NAMESPACE: checkmk-monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
You can access the checkmk `cluster-collector` via:
LoadBalancer:
==========
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
You can watch the status of by running 'kubectl get --namespace checkmk-monitoring svc -w checkmk-cluster-collector'
export SERVICE_IP=$(kubectl get svc --namespace checkmk-monitoring checkmk-cluster-collector --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}");
echo http://$SERVICE_IP:8080
# Cluster-internal DNS of `cluster-collector`: checkmk-cluster-collector.checkmk-monitoring
=========================================================================================
With the token of the service account named `checkmk-checkmk` in the namespace `checkmk-monitoring` you can now issue queries against the `cluster-collector`.
Run the following to fetch its token and the ca-certificate of the cluster:
export TOKEN=$(kubectl get secret checkmk-checkmk -n checkmk-monitoring -o=jsonpath='{.data.token}' | base64 --decode);
export CA_CRT="$(kubectl get secret checkmk-checkmk -n checkmk-monitoring -o=jsonpath='{.data.ca\.crt}' | base64 --decode)";
# Note: Quote the variable when echo'ing to preserve proper line breaks: `echo "$CA_CRT"`
To test access you can run:
curl -H "Authorization: Bearer $TOKEN" http://$SERVICE_IP:8080/metadata | jq
You can also pass additional configuration options on the command line to the above helm command. Depending on your requirements, you can combine multiple values:
Description | Flags |
|---|---|
This sets the cluster collector service type to LoadBalancer. | --set clusterCollector.service.type="LoadBalancer" |
Here, you can specify a different service type and port again. | |
Specify a version constraint for the chart version to use. | --version |
We recommend using a values.yaml file to configure your Helm chart.
To apply it, run the following command:
$ helm upgrade --install --create-namespace -n checkmk-monitoring myrelease checkmk-chart/checkmk -f values.yaml
After the chart has been successfully deployed, you will be presented with a set of commands to access the cluster-collector from the command line. In case you want to see those commands, you can do the following:
helm status checkmk -n checkmk-monitoring
At the same time, you can also verify if all the essential resources in the namespace have been deployed successfully. The below command in the code-block lists some important resources:
$ kubectl get all -n checkmk-monitoring
NAME READY STATUS RESTARTS AGE
pod/checkmk-cluster-collector-57c7f5f54b-xgqvx 1/1 Running 0 19m
pod/checkmk-node-collector-container-metrics-lflhs 2/2 Running 0 20m
pod/checkmk-node-collector-container-metrics-s59lb 2/2 Running 0 20m
pod/checkmk-node-collector-container-metrics-tnccf 2/2 Running 0 20m
pod/checkmk-node-collector-machine-sections-9k441 1/1 Running 0 20m
pod/checkmk-node-collector-machine-sections-fc795 1/1 Running 0 19m
pod/checkmk-node-collector-machine-sections-lfv9l 1/1 Running 0 20m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORTS AGE
service/checkmk-cluster-collector LoadBalancer 10.20.10.165 34.107.19.22 8080:31168/TCP 20m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/checkmk-node-collector-container-metrics 3 3 3 3 3 <none> 20m
daemonset.apps/checkmk-node-collector-machine-sections 3 3 3 3 3 <none> 20m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/checkmk-cluster-collector 1/1 1 1 20m
NAME DESIRED CURRENT READY AGE
replicaset.apps/checkmk-cluster-collector-57c7f5f54b 1 1 1 20m
Exposing the Checkmk Cluster Collector
By default, the API of the Checkmk Cluster Collector is not exposed to the outside (not to be confused with the Kubernetes API itself). Exposing it is required so that Checkmk can gather usage metrics and enrich your monitoring.
Checkmk pulls data from this API, which can be exposed via the service checkmk-cluster-collector. To do so, you must run it with one of the following flags or set them in a values.yaml.
Description | Flags |
|---|---|
This sets the cluster collector service type to LoadBalancer. | --set clusterCollector.service.type="LoadBalancer" |
Here, you can specify a different service type and port again. | |
Debugging
To begin debugging, identify the full command used to execute the Kubernetes special agent.
The command is shown under Type of agent → Program and includes all parameters defined by the datasource program rule.
OMD[mysite]:~$ cmk -D k8s | more
k8s
Addresses: No IP
Tags: [address_family:no-ip], [agent:special-agents], [criticality:prod], [networking:lan],
[piggyback:auto-piggyback], [site:a21], [snmp_ds:no-snmp], [tcp:tcp]
Labels: [cmk/kubernetes/cluster:at], [cmk/kubernetes/object:cluster], [cmk/site:k8s]
Host groups: check_mk
Contact groups: all
Agent mode: No Checkmk agent, all configured special agents
Type of agent:
Program: /omd/sites/mysite/share/check_mk/agents/special/agent_kube '--cluster' 'k8s' '--token' 'xyz' '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' '--api-server-endpoint' 'https://<YOUR-IP>:6443' '--api-server-proxy' 'FROM_ENVIRONMENT' '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' '--cluster-collector-proxy' 'FROM_ENVIRONMENT'
Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/k8s
Services:
...
As a shortcut, the following command can be used to extract and execute the Kubernetes special agent command automatically:
/bin/sh -c "$(cmk -D k8s | grep -A1 '^Type of agent:' | grep 'Program:' | cut -f2- -d':')"
If multiple Type of agent and Program blocks exist in the output, this command may produce incorrect results.
The special agent provides the following options for debugging.
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_kube -h
...
--debug Debug mode: raise Python exceptions
-v / --verbose Verbose mode (for even more output use -vvv)
--vcrtrace FILENAME Enables VCR tracing for the API calls
You can now modify the Kubernetes special agent command as shown below.
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_kube \
'--cluster' 'at' \
'--token' 'xyz' \
'--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' \
'--api-server-endpoint' 'https://<YOUR-IP>:6443' \
'--api-server-proxy' 'FROM_ENVIRONMENT' \
'--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' \
'--cluster-collector-proxy' 'FROM_ENVIRONMENT' \
--debug -vvv --vcrtrace ~/tmp/vcrtrace.txt > ~/tmp/k8s_with_debug.txt 2>&1
Here, you can also reduce the number of '--monitored-objects' to a few resources to get less output.
Run the special agent without debug options to generate a clean agent output, or download the agent output from the cluster host via the Checkmk web interface.
/omd/sites/mysite/share/check_mk/agents/special/agent_kube '--cluster' 'at' '--token' 'xyz' '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' '--api-server-endpoint' 'https://<YOUR-IP>:6443' '--api-server-proxy' 'FROM_ENVIRONMENT' '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' '--cluster-collector-proxy' 'FROM_ENVIRONMENT' > ~/tmp/k8s_agent_output.txt 2>&1
Please upload the following files to the support ticket.
Agent output: ~/tmp/k8s_agent_output.txt
Debug output: ~/tmp/k8s_with_debug.txt
Trace file: ~/tmp/vcrtrace.txt
Common Error: Cluster Collector Errors
Cluster Collector connection issues are reported by the Cluster Collector service and do not cause the Kubernetes special agent to terminate. The Kubernetes special agent can collect data from multiple datasources, and connections to the Cluster Collector datasources are optional and controlled by the configured datasource rule.
In production mode, failures to connect to the Cluster Collector are not raised as agent errors. Detailed error information is available only when the agent is executed with the --debug flag.
Common Causes
Cluster Collector connection problems are typically caused by configuration or infrastructure issues. The Cluster Collector must be properly exposed so that it can be reached, either via a NodePort or an Ingress. If it is not exposed, the agent will be unable to connect.
Another frequent cause is unhealthy or unstable Kubernetes resources. Core components such as pods, deployments, daemon sets, and replica sets must be running correctly. If these resources are not running or are restarting frequently, the Cluster Collector may be unavailable.
Network restrictions can also prevent connectivity. Firewalls or security groups may block access to the Cluster Collector’s IP address or port, or the configured IP and port may simply be incorrect.
In environments where a proxy is required to reach the Cluster Collector, the agent will be unable to establish a connection if the proxy is not configured in the datasource rule.
Common Error: API Connection & Processing
The connection to the Kubernetes API server is mandatory for the Kubernetes special agent. If the agent fails to connect to the API, for example by returning a 401 Unauthorized error when querying api/v1/core/pods, the failure is reported by the corresponding Checkmk service and the agent terminates.
The debug output generated using the --debug flag is usually sufficient to diagnose Kubernetes API connection problems. In cases where the agent reports an API processing error similar to “value … was not set”, the debug output alone may not be sufficient. In these situations, the user should be asked to provide the vcrtrace.txt file to allow deeper analysis of API request and response handling.
Common Causes
Kubernetes API connection issues are most often related to authentication or connectivity problems. A common cause is an incorrectly configured service account, which may be missing the required roles or permissions to access the Kubernetes API.
Authentication failures can also occur if the token is wrong or has expired, preventing the agent from successfully querying the API.
TLS configuration problems are another frequent source of errors. If the --verify-cert-api option is enabled but the corresponding ca.crt file has not been uploaded under Global settings → Trusted certificate authorities for SSL, the API connection will fail due to certificate validation errors.
Connectivity issues should also be checked. An incorrect API server IP address or port will prevent the agent from reaching the Kubernetes API endpoint.
In environments where a proxy is required, the connection will fail if the proxy is missing or incorrectly configured in the datasource rule.
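The connectivity side of these causes can be checked directly from the site user's shell. This is a sketch with placeholder values; the API server address and token below are examples in the style of the agent command above and must be replaced with the values from your datasource rule:

```shell
# Placeholders - use the values from your datasource rule:
API_SERVER="https://192.0.2.10:6443"   # example address, replace with yours
TOKEN="xyz"

# /version answers without authentication on many clusters; a timeout here
# points at firewall/routing problems rather than credentials:
curl -k -s --max-time 5 "$API_SERVER/version" || echo "cannot reach $API_SERVER"

# An authenticated request against the native API; HTTP 401 indicates a
# wrong or expired service account token:
curl -k -s --max-time 5 -o /dev/null -w '%{http_code}\n' \
  -H "Authorization: Bearer $TOKEN" "$API_SERVER/api/v1/pods" || true
```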
Linux Agent Over SSH
When running the Checkmk Linux agent via SSH, you may encounter error messages if the SSH setup is incorrect. In most cases, the Check_MK service will report connection-related issues.
Below are common error messages, their typical causes, and steps to resolve them.
Common Error: Permission denied
Error Message
Agent exited with code 255: Permission denied, please try again.
Possible Cause
The public SSH key stored in the authorized_keys file on the monitored host is invalid or corrupted. This often happens when:
A line break is accidentally inserted into the key
One or more characters are missing
The key was incorrectly copied
Possible Solution
Carefully verify that the public key on the monitored host matches exactly the public key generated on your Checkmk server.
Ensure there are no extra spaces or line breaks
Compare the keys character by character if necessary
Even a single incorrect character will cause SSH authentication to fail.
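Instead of comparing keys character by character, comparing fingerprints with ssh-keygen is less error-prone. The sketch below generates a throwaway key pair purely for demonstration; in practice, run ssh-keygen -lf against your real public key on the Checkmk server and against the copy in authorized_keys on the monitored host:

```shell
# Demonstration only: create a temporary key pair to compare fingerprints.
tmpdir=$(mktemp -d)
ssh-keygen -t ed25519 -N '' -q -f "$tmpdir/id_ed25519"

# ssh-keygen -lf prints the fingerprint; identical output on both hosts
# means the key in authorized_keys matches the one on the Checkmk server:
fp_local=$(ssh-keygen -lf "$tmpdir/id_ed25519.pub" | awk '{print $2}')
fp_remote=$(ssh-keygen -lf "$tmpdir/id_ed25519" | awk '{print $2}')   # stands in for the remote copy
echo "$fp_local"
echo "$fp_remote"
rm -rf "$tmpdir"
```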
Common Error: Host key verification failed
Error Message
Agent exited with code 255: Host key verification failed.
CRIT - Got no information from host, execution time 0.0 sec
Possible Cause
The error message is explicit: the host key verification failed. This usually means that your Checkmk server and the host were never introduced to one another, so the host's key fingerprint is not yet stored in the file ~/.ssh/known_hosts on your Checkmk server.
Possible Solution
This issue can be resolved easily. Log in to your site and initiate an SSH connection to the target host. When prompted to confirm the host’s authenticity, type yes. This action will add the host’s key to the list of known hosts, allowing future connections to proceed without errors.
NetApp Special Agent
The NetApp special agent in Checkmk runs on the monitoring server and uses the REST API over HTTPS to collect health, storage, and performance data from a NetApp cluster. It retrieves status information as well as capacity and performance metrics, which Checkmk turns into services for alerting and graphing.
Debugging
Log in to the Checkmk server and become the site user.
root@linux:$ su mysite
OMD[mysite]:~$ cmk -D <netapp_host> | head -n 15
This should display the whole special agent query, including all arguments (similar to vSphere debugging).
Copy the whole output. Paste it and add the debug options like so:
/omd/sites/yoursitename/share/check_mk/agents/special/agent_netapp --hostname {yourhostname} --username {yourusername} --password {yourpassword} --vcrtrace /tmp/TRACEFILE '-no_counters' --debug --xml > /tmp/debug.txt 2>&1
Add the agent_netapp command line (password stripped) and the /tmp/debug.txt to your support ticket.
For reference, a suitable read-only REST role on the NetApp cluster looks like this:
c0::security login rest-role> show -role checkmk
           Role                                     Access
Vserver    Name       API                           Level
---------- ---------- ----------------------------- ------
c0         checkmk    /api/cluster/counter/tables   readonly
                      /api/cluster/nodes            readonly
                      /api/cluster/sensors          readonly
                      /api/network/ethernet/ports   readonly
                      /api/network/fc/ports         readonly
                      /api/network/ip/interfaces    readonly
                      /api/private/support/alerts   readonly
                      /api/snapmirror/relationships readonly
                      /api/storage/disks            readonly
                      /api/storage/luns             readonly
                      /api/storage/quota/reports    readonly
                      /api/storage/shelves          readonly
                      /api/storage/volumes          readonly
                      /api/svm/svms                 readonly
14 entries were displayed.
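Basic REST API reachability of the NetApp cluster can also be verified from the site user's shell before digging into the agent itself. This is a sketch with placeholder address and credentials; /api/cluster is an ONTAP REST API endpoint:

```shell
# Placeholders - replace with your NetApp management address and monitoring user:
NETAPP_HOST="netapp.example.com"
NETAPP_USER="checkmk"
NETAPP_PASS="MYPASSWORD"

# A 200 response with cluster JSON confirms connectivity and credentials;
# 401 points at the user/role, a timeout at network or firewall issues:
curl -k -s --max-time 10 -u "$NETAPP_USER:$NETAPP_PASS" \
  "https://$NETAPP_HOST/api/cluster" || echo "cannot reach $NETAPP_HOST"
```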
Special Agents Using stdin Parameters
Some Checkmk special agents receive their configuration exclusively via standard input (stdin) rather than command-line arguments or config files. This approach is used by agents such as the Prometheus and AWS special agents, allowing Checkmk to securely and centrally pass structured configuration data at runtime, simplify parameter handling, and avoid exposing sensitive information in process arguments.
Debugging
To debug the Prometheus special agent, first ensure that a corresponding rule is configured and pinned to the host myprometheushost. Next, log in to the host as the site user and execute the following command:
OMD[mysite]:~$ cmk -D myprometheushost
After running the command, you should see output similar to the following example:
myprometheushost
Addresses: 10.18.49.2
Tags: [address_family:ip-v4-only], [agent:cmk-agent], [checkmk-agent:checkmk-agent], [criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], [site:kube], [snmp_ds:no-snmp], [tcp:tcp]
Labels: [cmk/site:mysite]
Host groups: check_mk
Contact groups: all
Agent mode: Normal Checkmk agent, or special agent if configured
Type of agent:
Program: /omd/sites/mysite/local/share/check_mk/agents/special/agent_prometheus
Program stdin:
{'connection': ('ip_address', {'port': 31275}), 'verify-cert': False, 'protocol': 'http', 'exporter': [('kube_state', {'cluster_name': 'mypromcluster', 'prepend_namespaces': 'use_namespace', 'entities': ['cluster', 'nodes', 'services', 'pods', 'daemon_sets']})], 'promql_checks': [], 'host_address': '10.18.49.2', 'host_name': 'myprometheushost'}
Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/myprometheushost
Now copy the block that appears after Program stdin:, wrap it in double quotes, and prepend it with echo. Then add a pipe (|) followed by the path to the special agent shown on the line that begins with Program:. Combined, the command looks like this:
OMD[mysite]:~$ echo "{'connection': ('ip_address', {'port': 31275}), 'verify-cert': False, 'protocol': 'http', 'exporter': [('kube_state', {'cluster_name': 'mypromcluster', 'prepend_namespaces': 'use_namespace', 'entities': ['cluster', 'nodes', 'services', 'pods', 'daemon_sets']})], 'promql_checks': [], 'host_address': '10.18.49.2', 'host_name': 'myprometheushost'}" | /omd/sites/mysite/local/share/check_mk/agents/special/agent_prometheus
In most cases, special agents support verbose or debug output from Python. To enable this, simply append -vvv and/or --debug to the very end of the command shown above.
StoreOnce 4x Special Agent
The StoreOnce 4x Special Agent is a dedicated special agent in Checkmk designed to monitor HPE StoreOnce 4x backup systems via their REST API. Unlike classic Checkmk agents that run directly on the monitored host, this special agent is executed by the Checkmk site and communicates remotely with the StoreOnce appliance.
Common Error: Crash due to expired OAuth token
The StoreOnce 4x special agent crashes during execution with an OAuth-related error. The agent output shows that the stored access token has expired and the token refresh fails because no new access token is returned.
Error Message:
<<<storeonce4x_d2d_services:sep(0)>>>
Traceback (most recent call last):
File "/omd/sites/mysite/lib/python3/requests_oauthlib/oauth2_session.py", line 477, in request
url, headers, data = self._client.add_token(
File "/omd/sites/mysite/lib/python3/oauthlib/oauth2/rfc6749/clients/base.py", line 198, in add_token
raise TokenExpiredError()
oauthlib.oauth2.rfc6749.errors.TokenExpiredError: (token_expired)
During handling of the above exception, another exception occurred:
...
oauthlib.oauth2.rfc6749.errors.MissingTokenError: (missing_token) Missing access token parameter.
This indicates that the OAuth token stored locally is no longer valid and cannot be refreshed successfully.
Possible Solution
The issue can be diagnosed and resolved by enabling debugging on the special agent and resetting the stored OAuth token.
Determine the exact special agent command
First, find out how the StoreOnce 4x special agent is executed for the affected host. You can check this in the Type of agent section.
OMD[mysite]:~$ cmk -D hostname
An easier way to extract the command is:
/bin/sh -c "$(cmk -D hostname | grep -A1 "^Type of agent:" | grep "Program:" | cut -f2- -d':')"
Note: If the output contains multiple blocks with Type of agent: followed by Program:, the extracted command may be incorrect.
Check available debugging options
The StoreOnce 4x special agent provides several debugging and tracing options.
OMD[mysite]:~$ ~/share/check_mk/agents/special/agent_storeonce4x -h
Relevant options are:
--debug, -d            Enable debug mode (keep some exceptions unhandled)
--verbose, -v
--vcrtrace TRACEFILE, --tracefile TRACEFILE
                       If this flag is set to a TRACEFILE that does not exist yet, it will be created and all requests the program sends and their corresponding answers will be recorded in said file. If the file already exists, no requests are sent to the server, but the responses will be replayed from the tracefile.
Run the special agent with debugging enabled
Extend the original special agent command by adding the debugging options:
OMD[mysite]:~$ ~/share/check_mk/agents/special/agent_storeonce4x <OTHER ARGUMENTS> --debug -v --vcrtrace ~/tmp/vcrtrace.txt > ~/tmp/storeonce4x_with_debug.txt 2>&1
This will create a detailed debug log and a trace file containing the REST API communication.
Create a clean agent output (without debugging)
For reproducibility, also run the agent once without debug options and store the raw output:
OMD[mysite]:~$ ~/share/check_mk/agents/special/agent_storeonce4x <OTHER ARGUMENTS> > ~/tmp/storeonce4x_agent_output.txt
With this file, we can reproduce your issue.
Reset the stored OAuth token
The StoreOnce 4x special agent authenticates using username and password. After a successful login, an OAuth access token is stored locally and reused for future REST requests.
The token file is stored under:
~/tmp/check_mk/special_agents/agent_storeonce4x/<hostname>_oAuthToken.json
Rename the token file to force the agent to perform a fresh login:
OMD[mysite]:~$ mv \
  ~/tmp/check_mk/special_agents/agent_storeonce4x/<hostname>_oAuthToken.json \
  ~/tmp/check_mk/special_agents/agent_storeonce4x/<hostname>_oAuthToken.json.back
Run the special agent again
After renaming the token file, run the StoreOnce 4x special agent again. A new OAuth token will be requested and stored automatically, which usually resolves the crash caused by expired or invalid tokens.
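If you reset tokens regularly, the rename step above can be wrapped in a small helper that keeps a timestamped backup instead of overwriting the previous one. This is a sketch, not part of Checkmk; the function name and backup naming are our own choices:

```shell
# Move a stored OAuth token aside so the agent re-authenticates on its
# next run. Keeps a timestamped *.back copy of the old token.
rotate_token() {
    token="$1"
    if [ -f "$token" ]; then
        mv "$token" "$token.$(date +%Y%m%d%H%M%S).back"
    fi
}

# Usage with the path from this article (replace <hostname> accordingly):
# rotate_token ~/tmp/check_mk/special_agents/agent_storeonce4x/<hostname>_oAuthToken.json
```

The existence check makes the helper safe to call even when no token has been stored yet.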
For further details about authentication and available endpoints, refer to the StoreOnce REST API documentation.
VMware vSphere Special Agent
VMware vSphere remains a cornerstone of many on-premises and hybrid IT environments. While containers and Kubernetes have gained significant traction, classic virtualization continues to be essential wherever containerization is not practical or supported.
The following describes the vSphere extension to VMware ESXi monitoring in Checkmk, explains how datastore provisioning and piggyback monitoring work, and provides debugging procedures for troubleshooting the vSphere Special Agent.
Datastore provisioning in vSphere
When a vCenter Server is added to Checkmk, datastore provisioning is monitored automatically and with full accuracy. This allows you to detect and alert on excessive thin-provisioning, where virtual machines collectively claim more logical storage than the datastore can physically deliver.
If only an ESXi host is monitored, provisioning metrics may still appear in the filesystem checks, but they simply mirror the used filesystem values. In this setup, true over-provisioning cannot be identified, as only vCenter has complete knowledge of the actual logical provisioning across all virtual machines.
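As a worked example of what over-provisioning means here (all figures are hypothetical): three thin-provisioned VMs each claiming 1 TB of logical disk on a 2 TB datastore add up to 150% of the physical capacity.

```shell
# Hypothetical datastore: 2 TB physical capacity, three thin-provisioned
# VMs claiming 1 TB (1024 GB) of logical disk each.
capacity_gb=2048
provisioned_gb=$((3 * 1024))
ratio=$((provisioned_gb * 100 / capacity_gb))
echo "Provisioned: ${ratio}% of physical capacity"   # 150%
```

Only vCenter can compute this figure, because only it sees the logical provisioning of every VM on the datastore.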
Piggyback-only with ESXi hosts
While using a read-only account on both ESXi hosts and vCenter is considered best practice for continuous monitoring, direct access to the hosts is not always permitted.
With piggyback monitoring configured correctly, a read-only user on vCenter alone is sufficient. Using a local vSphere account, for example monitoring@vsphere.local, is recommended over Active Directory accounts, as AD authentication may time out during inventory queries. Access should be validated by logging in to the vSphere Web Client with this user.
When vCenter is added to Checkmk with all available data and the ESXi hosts are added afterward, piggyback automatically assigns vCenter-derived resources such as CPU, memory, and datastores to the corresponding ESXi hosts.
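Whether vCenter actually delivers piggyback data for a given ESXi host can be verified in the site's piggyback directory (the same path `cmk -D` reports under "Process piggyback data from"). Each subdirectory is named after a target host and contains one file per source host that delivered data for it. The sketch below simulates that layout in a temporary directory so it is runnable anywhere; on a real site simply inspect ~/tmp/check_mk/piggyback/ instead:

```shell
# Simulated piggyback directory layout:
#   <piggyback_dir>/<target host>/<source host>
piggyback_dir=$(mktemp -d)            # stands in for ~/tmp/check_mk/piggyback
mkdir -p "$piggyback_dir/esxi01"
touch "$piggyback_dir/esxi01/vcenter" # data from "vcenter" for host "esxi01"

targets=$(ls "$piggyback_dir")
echo "Hosts receiving piggyback data: $targets"
```

An empty directory for an ESXi host usually means the host name in Checkmk does not match the name vCenter reports.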
When virtual machines are added as hosts in Checkmk, additional checks are enabled automatically without further configuration. These checks typically start with “ESX” and focus on VM resource consumption. One example is ESX Snapshots, which monitors all snapshots assigned to a VM and can alert when snapshots exceed a defined age. This helps ensure that manually created snapshots are removed in a timely manner.
Limitation: If ESXi hosts are not monitored directly via the Special Agent or SNMP, local host partitions are not visible.
Debugging
Identify the Special Agent Command
Use the following command to inspect how Checkmk runs the vSphere Special Agent:
OMD[mysite]:~$ cmk -D <vcenter-host> | more
vcenter
Addresses:              x.x.x.x
Tags:                   [add_ip_addresses:add_ip_addresses_1], [address_family:ip-v4-only], [agent:special-agents], [criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], [site:nagnis_master], [snmp_ds:no-snmp], [tcp:tcp]
Labels:                 [cmk/vsphere_object:vm]
Host groups:            check_mk
Contact groups:         all
Agent mode:             No Checkmk agent, all configured special agents
Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_vsphere -u 'user' -s 'password' -i hostsystem,virtualmachine,datastore,counters,licenses -P --spaces cut --snapshot_display vCenter --no-cert-check 'x.x.x.x'
  Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/vcenter
Services:
  checktype  item  params

Look for the Program: line under Type of agent. This shows the full command Checkmk executes internally.
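When a host has several special agents configured, the output contains more than one Program: line, and a naive grep picks up all of them. A sketch that extracts only the first one, demonstrated on canned output so it runs anywhere; on a real site, replace the here-document with the output of cmk -D myhost:

```shell
# Canned stand-in for `cmk -D` output with two configured special agents:
cmk_output=$(cat <<'EOF'
Type of agent:
 Program: /omd/sites/mysite/share/check_mk/agents/special/agent_vsphere -u 'user' -s 'password' --no-cert-check 'x.x.x.x'
Type of agent:
 Program: /omd/sites/mysite/share/check_mk/agents/special/agent_aws --regions MYAWSREGION
EOF
)

# Take only the first Program: line and strip the label and indentation:
cmd=$(printf '%s\n' "$cmk_output" | grep -m1 'Program:' | sed 's/^[[:space:]]*Program: //')
echo "$cmd"
```

The extracted command can then be run manually as the site user, with debug options appended as shown in the earlier sections.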