Debugging Special Agents (Combined)
This article helps debug issues with various Checkmk special agents.
Last tested on Checkmk 2.0.0p1
AWS special agent
If you want to execute the special agent from the command line, please run the following commands.
Checkmk versions 1.6, 2.0, and 2.1
echo '{"access_key_id": "xxxxxxxxxxxxxxxx", "secret_access_key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"}' | ~/share/check_mk/agents/special/agent_aws '--regions' 'eu-central-1' '--services' 'cloudwatch_alarms' 'dynamodb' 'ebs' 'ec2' 'elb' 'elbv2' 'glacier' 'rds' 's3' 'wafv2' '--ec2-limits' '--ebs-limits' '--s3-limits' '--glacier-limits' '--elb-limits' '--elbv2-limits' '--rds-limits' '--cloudwatch_alarms-limits' '--dynamodb-limits' '--wafv2-limits' '--cloudwatch-alarms' '--wafv2-cloudfront' '--hostname' 'aws' --debug --verbose
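In Checkmk 1.6 to 2.1, the AWS credentials are passed to the agent as a small JSON object on stdin. If you type the secret directly into the echo command, it ends up in your shell history. The following is a minimal sketch of an alternative: read the secret interactively and generate the JSON with a helper. The function name aws_credentials_json is hypothetical. The sketch assumes that the keys contain no double quotes or backslashes, because they are inserted without JSON escaping.

```shell
# Hypothetical helper: print the JSON credentials object that agent_aws
# (Checkmk 1.6 to 2.1) reads on stdin. The values are inserted unescaped,
# so they must not contain double quotes or backslashes.
aws_credentials_json() {
    local key_id=$1 secret=$2
    printf '{"access_key_id": "%s", "secret_access_key": "%s"}\n' "$key_id" "$secret"
}

# Usage (shortened argument list): read the secret without echoing it,
# then pipe the JSON into the agent.
# read -r -p 'Access key ID: ' key_id
# read -r -s -p 'Secret access key: ' secret; echo
# aws_credentials_json "$key_id" "$secret" | ~/share/check_mk/agents/special/agent_aws \
#     '--regions' 'eu-central-1' '--services' 'ec2' '--hostname' 'aws' --debug
```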
Checkmk versions 2.2 and above
/omd/sites/mysite/share/check_mk/agents/special/agent_aws --access-key-id MYACCESSKEYID --secret-access-key MYSECRETACCESSKEY --regions MYAWSREGION --global-services ce cloudfront route53 --services cloudwatch_alarms dynamodb ebs ec2 ecs elasticache elb elbv2 glacier lambda rds s3 sns wafv2 --ec2-limits --ebs-limits --s3-limits --glacier-limits --elb-limits --elbv2-limits --rds-limits --cloudwatch_alarms-limits --dynamodb-limits --wafv2-limits --lambda-limits --sns-limits --ecs-limits --elasticache-limits --s3-requests --cloudwatch-alarms --wafv2-cloudfront --cloudfront-host-assignment aws_host --hostname aws --piggyback-naming-convention ip_region_instance
Viewing AWS Options
To view more AWS options, use cmk -D aws combined with grep:
OMD[mysite]:~$ cmk -D aws | grep -A2 "Type of agent"
Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_aws --access-key-id MYACCESSKEY --secret-access-key MYSECRETKEY --regions MYAWSREGION --global-services ce cloudfront route53 --services cloudwatch_alarms dynamodb ebs ec2 ecs elasticache elb elbv2 glacier lambda rds s3 sns wafv2 --ec2-limits --ebs-limits --s3-limits --glacier-limits --elb-limits --elbv2-limits --rds-limits --cloudwatch_alarms-limits --dynamodb-limits --wafv2-limits --lambda-limits --sns-limits --ecs-limits --elasticache-limits --s3-requests --cloudwatch-alarms --wafv2-cloudfront --cloudfront-host-assignment aws_host --hostname aws --piggyback-naming-convention ip_region_instance
Azure special agent
Azure endpoints
If you're experiencing connectivity issues or receiving unexpected data, please ensure that your endpoints are available from the Checkmk server.
Please review Microsoft's documentation for further information.
Checkmk versions 1.6 and below
If you want to execute the special agent from the command line, please run the following commands.
echo '{"secret": "xxxxxxxxx"}'|/omd/sites/mysite/share/check_mk/agents/special/agent_azure '--subscription' 'xxxxxxxxx' '--tenant' 'xxxxxxxxx' '--client' 'xxxxxxxxx' '--piggyback_vms' 'self' --debug -vvv --vcrtrace /tmp/TRACEFILE
- To enable debugging, add the parameter "--debug".
- To enable verbose output, add the parameter "-vvv".
Checkmk versions 2.0 and above
/omd/sites/mysite/share/check_mk/agents/special/agent_azure '--subscription' 'MYSUBSCRIPTIONKEY' '--tenant' 'MYTENANTKEY' '--client' 'MYCLIENTKEY' '--secret' 'MYSECRETKEY' '--piggyback_vms' 'self' --debug -vvv --vcrtrace /tmp/TRACEFILE
Display Azure Options
To view more Azure options, use cmk -D azure combined with grep:
OMD[mysite]:~$ cmk -D azure | grep -A2 "Type of agent"
Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_azure --tenant MYTENANTKEY --client MYCLIENTKEY --secret MYSECRETKEY --subscription MYSUBSCRIPTIONKEY --piggyback_vms self --services users_count ad_connect app_registrations Microsoft.Compute/virtualMachines Microsoft.Network/virtualNetworkGateways Microsoft.Sql/servers/databases Microsoft.Storage/storageAccounts Microsoft.Web/sites Microsoft.DBforMySQL/servers Microsoft.DBforPostgreSQL/servers Microsoft.Network/trafficmanagerprofiles Microsoft.Network/loadBalancers Microsoft.RecoveryServices/vaults Microsoft.Network/applicationGateways
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_azure_status australiacentral australiacentral2 australiaeast australiasoutheast brazilsouth brazilsoutheast canadacentral germanynorth germanywestcentral koreacentral westeurope
More information
Troubleshooting Microsoft Azure - "Graph client: Insufficient privileges to complete the operation" error
If you see the error message "Graph client: Insufficient privileges to complete the operation." when connecting to Azure, do the following:
- Open the Azure Portal
- Click Azure Active Directory
- Click App registrations in the left bar
- Click the app you registered for Checkmk
- Click API permissions in the left bar
- Click Add a permission and add the required permissions for Microsoft Graph
Full list of access rights needed:
API & Use | Documentation |
---|---|
Get Metric data | https://docs.microsoft.com/en-us/rest/api/monitor/metrics/list |
Get resources | https://docs.microsoft.com/en-us/rest/api/resources/operations%20(resources)/list |
Get resource groups | https://docs.microsoft.com/en-us/rest/api/resources/resource-groups/list |
Consumption details | https://docs.microsoft.com/en-us/rest/api/consumption/usage-details/list |
VM info | https://docs.microsoft.com/en-us/rest/api/compute/virtual-machines/instance-view |
Active Directory top users | https://docs.microsoft.com/en-us/graph/api/user-list?view=graph-rest-1.0&tabs=http |
Active Directory organizations | https://docs.microsoft.com/en-us/graph/api/intune-onboarding-organization-list?view=graph-rest-1.0 |
These are the metrics we fetch via the Azure special agent:
Resource URI | Metric name |
---|---|
Microsoft.Network/virtualNetworkGateways | AverageBandwidth,P2SBandwidth |
Microsoft.Sql/servers/databases | storage_percent,deadlock,cpu_percent,dtu_consumption_percent,connection_successful,connection_failed |
Microsoft.Storage/storageAccounts | UsedCapacity,Ingress,Egress,Transactions,SuccessServerLatency,SuccessE2ELatency,Availability |
Microsoft.Web/sites |
SSL error - bad handshake
Problem
When trying to monitor your Microsoft Azure environment, you see an SSL "bad handshake" error message in the Checkmk service Azure Agent Info.
Solution
You need to make sure that your Checkmk server can connect to the following two MS Azure addresses: management.azure.com and login.microsoftonline.com.
When a connection from your Checkmk server is impossible or times out, monitoring Azure will not be possible. You can quickly check this as the site user of your Checkmk site with either Telnet or Netcat:
OMD[mysite]:~$ nc -zv login.microsoftonline.com 443
OMD[mysite]:~$ nc -zv management.azure.com 443
The output of these commands should look like this:
OMD[mysite]:~$ nc -zv login.microsoftonline.com 443
Connection to login.microsoftonline.com 443 port [tcp/https] succeeded!
OMD[mysite]:~$ nc -zv management.azure.com 443
Connection to management.azure.com 443 port [tcp/https] succeeded!
If the output looks different, check the connection from your Checkmk server to Azure or contact your network team. In several cases, a firewall was blocking this connection.
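If neither Telnet nor Netcat is installed on the Checkmk server, bash can open the TCP connection itself via its /dev/tcp pseudo-device. The following is a minimal sketch. It assumes bash and the coreutils timeout command are available. The function names are hypothetical, and the 5-second timeout is an arbitrary choice.

```shell
# Try to open a TCP connection to HOST:PORT through bash's /dev/tcp
# pseudo-device and give up after 5 seconds (arbitrary timeout).
check_endpoint() {
    local host=$1 port=${2:-443}
    if timeout 5 bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null; then
        echo "Connection to $host $port succeeded"
    else
        echo "Connection to $host $port FAILED" >&2
        return 1
    fi
}

# Check both Azure endpoints named above; returns non-zero if any check fails.
check_azure_endpoints() {
    local host rc=0
    for host in login.microsoftonline.com management.azure.com; do
        check_endpoint "$host" 443 || rc=1
    done
    return "$rc"
}

# Usage (as the site user):
# check_azure_endpoints
```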
BI special agent
If you want to execute the special agent from the command line, please run the following command.
Step-by-step guide
If you use special agents installed from a Feature Pack, you can find them in:
OMD[mysite]:~$ ls ~/local/share/check_mk/agents/special/
The special agents shipped with Checkmk can be found here:
OMD[mysite]:~$ ls ~/share/check_mk/agents/special/
How to execute the special agent manually?
OMD[mysite]:~$ cmk -D bi | head -n15
bi
Addresses: No IP
Tags: [address_family:no-ip], [agent:cmk-agent], [checkmk-agent:checkmk-agent], [criticality:prod], [networking:lan], [piggyback:auto-piggyback], [site:bi], [snmp_ds:no-snmp], [tcp:tcp]
Labels: [cmk/site:mysite]
Host groups: check_mk
Contact groups: all
Agent mode: Normal Checkmk agent, or special agent if configured
Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_bi
  Program stdin: [{'site': 'local', 'credentials': 'automation', 'filter': {'groups': ['Hosts']}, 'assignments': {'querying_host': 'querying_host'}}]
Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/bi
Services:
  checktype item params description groups
This output shows the special agent call ("Program") and the parameters the agent receives via stdin ("Program stdin"). Now you can continue debugging.
Execute the command, piping the stdin parameters into the special agent:
echo "[{'site': 'local', 'credentials': 'automation', 'filter': {'groups': ['Hosts']}, 'assignments': {'querying_host': 'querying_host'}}]" | /omd/sites/mysite/share/check_mk/agents/special/agent_bi
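You do not have to copy the stdin parameters by hand. You can also cut them out of the cmk -D output. The following is a minimal sketch. extract_program_stdin is a hypothetical helper, and the sketch assumes that each "Program stdin:" payload sits on a single line, as in the output above.

```shell
# Print the "Program stdin:" payload from `cmk -D <host>` output (read on
# stdin), with the label and the whitespace in front of it removed.
extract_program_stdin() {
    sed -n 's/^[[:space:]]*Program stdin:[[:space:]]*//p'
}

# Usage:
# payload=$(cmk -D bi | extract_program_stdin)
# printf '%s\n' "$payload" | /omd/sites/mysite/share/check_mk/agents/special/agent_bi
```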
Jenkins special agent and access rights
For the Jenkins special agent to work, the Jenkins user it logs in with needs the following access rights:
- Overall: Read
- Agent: Connect
- Job: Read & Workspace
- View: Read
Kubernetes - k8s special agent
Getting Started
Background information regarding this subject is available in our official documentation.
Installation of Checkmk Cluster Collectors (a.k.a Checkmk Kubernetes agent)
We strongly recommend using our Helm charts to install the Checkmk Cluster Collectors, unless you are very experienced with Kubernetes and want to install the agent using the YAML manifests we provide.
The Helm chart installs and configures all necessary components to run the agent and exposes several helpful configuration options that will help you automatically set up complex resources. The prerequisites have to be fulfilled before you proceed with the installation.
Below is an example of deploying the Helm chart with a LoadBalancer service (this requires that your cluster can create a LoadBalancer):
$ helm repo add checkmk-chart https://checkmk.github.io/checkmk_kube_agent
$ helm repo update
$ helm upgrade --install --create-namespace -n checkmk-monitoring checkmk checkmk-chart/checkmk --set clusterCollector.service.type="LoadBalancer"
Release "checkmk" does not exist. Installing it now.
NAME: checkmk
LAST DEPLOYED: Tue May 17 22:01:07 2022
NAMESPACE: checkmk-monitoring
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
You can access the checkmk `cluster-collector` via:
LoadBalancer:
==========
  NOTE: It may take a few minutes for the LoadBalancer IP to be available.
        You can watch the status of by running 'kubectl get --namespace checkmk-monitoring svc -w checkmk-cluster-collector'
  export SERVICE_IP=$(kubectl get svc --namespace checkmk-monitoring checkmk-cluster-collector --template "{{ range (index .status.loadBalancer.ingress 0) }}{{.}}{{ end }}");
  echo http://$SERVICE_IP:8080
  # Cluster-internal DNS of `cluster-collector`: checkmk-cluster-collector.checkmk-monitoring
=========================================================================================
With the token of the service account named `checkmk-checkmk` in the namespace `checkmk-monitoring` you can now issue queries against the `cluster-collector`.
Run the following to fetch its token and the ca-certificate of the cluster:
  export TOKEN=$(kubectl get secret checkmk-checkmk -n checkmk-monitoring -o=jsonpath='{.data.token}' | base64 --decode);
  export CA_CRT="$(kubectl get secret checkmk-checkmk -n checkmk-monitoring -o=jsonpath='{.data.ca\.crt}' | base64 --decode)";
  # Note: Quote the variable when echo'ing to preserve proper line breaks: `echo "$CA_CRT"`
To test access you can run:
  curl -H "Authorization: Bearer $TOKEN" http://$SERVICE_IP:8080/metadata | jq
You can pass further configuration options to the helm command above on the command line. Some examples (depending on your requirements, you can combine several of them):
Flags | Description |
---|---|
--set clusterCollector.service.type="LoadBalancer" | This sets the cluster collector service type to LoadBalancer. |
--set clusterCollector.service.type="NodePort" --set clusterCollector.service.nodePort=30035 | This sets the cluster collector service type to NodePort and exposes it on port 30035. |
--version 1.0.0-beta.2 | Specifies a version constraint for the chart version to use. |
We recommend using a values.yaml file to configure your Helm chart.
To do so, run:
$ helm upgrade --install --create-namespace -n checkmk-monitoring checkmk checkmk-chart/checkmk -f values.yaml
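The following is a minimal sketch of a values.yaml that sets only the two keys the --set flags in the table above use (clusterCollector.service.type and clusterCollector.service.nodePort). All other chart values keep their defaults. The helper name write_values_yaml is hypothetical. Check the chart's own values.yaml for the full set of options.

```shell
# Hypothetical helper: write a minimal values.yaml that exposes the cluster
# collector via NodePort 30035, mirroring the --set flags shown above.
write_values_yaml() {
    local file=${1:-values.yaml}
    cat > "$file" <<'EOF'
clusterCollector:
  service:
    type: NodePort
    nodePort: 30035
EOF
}

# Usage:
# write_values_yaml values.yaml
# helm upgrade --install --create-namespace -n checkmk-monitoring checkmk checkmk-chart/checkmk -f values.yaml
```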
After the chart has been successfully deployed, you will be presented with a set of commands to access the cluster-collector from the command line. In case you want to see those commands, you can do the following:
helm status checkmk -n checkmk-monitoring
You can also verify whether all essential resources in the namespace have been deployed successfully. The following command lists the most important ones:
$ kubectl get all -n checkmk-monitoring
NAME   READY   STATUS   RESTARTS   AGE
pod/checkmk-cluster-collector-57c7f5f54b-xgqvx   1/1   Running   0   19m
pod/checkmk-node-collector-container-metrics-lflhs   2/2   Running   0   20m
pod/checkmk-node-collector-container-metrics-s59lb   2/2   Running   0   20m
pod/checkmk-node-collector-container-metrics-tnccf   2/2   Running   0   20m
pod/checkmk-node-collector-machine-sections-9k441   1/1   Running   0   20m
pod/checkmk-node-collector-machine-sections-fc795   1/1   Running   0   19m
pod/checkmk-node-collector-machine-sections-lfv9l   1/1   Running   0   20m

NAME   TYPE   CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
service/checkmk-cluster-collector   LoadBalancer   10.20.10.165   34.107.19.22   8080:31168/TCP   20m

NAME   DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.apps/checkmk-node-collector-container-metrics   3   3   3   3   3   <none>   20m
daemonset.apps/checkmk-node-collector-machine-sections   3   3   3   3   3   <none>   20m

NAME   READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/checkmk-cluster-collector   1/1   1   1   20m

NAME   DESIRED   CURRENT   READY   AGE
replicaset.apps/checkmk-cluster-collector-57c7f5f54b   1   1   1   20m
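To spot pods that are not running or keep restarting (one of the common causes listed under Common errors below), you can filter the pod list. The following is a minimal sketch. find_unhealthy_pods is a hypothetical helper, and the restart threshold of 5 is an arbitrary default. The sketch assumes the default column layout of kubectl get pods --no-headers (NAME READY STATUS RESTARTS AGE).

```shell
# Read `kubectl get pods --no-headers` output on stdin and print every pod
# that is not Running, has fewer ready containers than expected, or has
# restarted more often than the given threshold (default: 5).
find_unhealthy_pods() {
    local max_restarts=${1:-5}
    awk -v max="$max_restarts" '
        {
            split($2, ready, "/")   # READY column, e.g. "1/2"
            if ($3 != "Running" || ready[1] != ready[2] || $4 + 0 > max)
                print $1, $2, $3, $4
        }
    '
}

# Usage:
# kubectl get pods -n checkmk-monitoring --no-headers | find_unhealthy_pods 5
```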
Exposing the Checkmk Cluster Collector
By default, the API of the Checkmk Cluster Collector is not exposed outside the cluster (not to be confused with the Kubernetes API itself). Exposing it is required to gather usage metrics and enrich your monitoring.
Checkmk pulls data from this API, which can be exposed via the service checkmk-cluster-collector. To do so, run the helm command with one of the following flags or set the corresponding values in values.yaml:
Flags | Description |
---|---|
--set clusterCollector.service.type="LoadBalancer" | This sets the cluster collector service type to LoadBalancer. |
--set clusterCollector.service.type="NodePort" --set clusterCollector.service.nodePort=30035 | This sets the cluster collector service type to NodePort and exposes it on port 30035. |
Debugging K8s special agent
- The first step is to find the complete command of the Kubernetes special agent.
You can find it under "Type of agent >> Program". It consists of multiple parameters, depending on how the datasource program rule has been configured.
OMD[mysite]:~$ cmk -D k8s | more
k8s
Addresses: No IP
Tags: [address_family:no-ip], [agent:special-agents], [criticality:prod], [networking:lan], [piggyback:auto-piggyback], [site:a21], [snmp_ds:no-snmp], [tcp:tcp]
Labels: [cmk/kubernetes/cluster:at], [cmk/kubernetes/object:cluster], [cmk/site:k8s]
Host groups: check_mk
Contact groups: all
Agent mode: No Checkmk agent, all configured special agents
Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_kube '--cluster' 'k8s' '--token' 'xyz' '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' '--api-server-endpoint' 'https://<YOUR-IP>:6443' '--api-server-proxy' 'FROM_ENVIRONMENT' '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' '--cluster-collector-proxy' 'FROM_ENVIRONMENT'
Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/k8s
Services:
...
An easier way is this command: /bin/sh -c "$(cmk -D k8s | grep -A1 "^Type of agent:" | grep "Program:" | cut -f2- -d':')"
Please note that if a line matching "^Type of agent:" followed by a line matching "Program:" exists more than once, the output might be messed up.
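If you need the command more often, you can write the same extraction as a small awk filter. The following is a minimal sketch and not part of Checkmk. extract_program is a hypothetical helper. Unlike a cut -f2 -d':' variant, it keeps everything after the "Program:" label, so endpoint URLs such as https://<YOUR-IP>:6443 are not cut off at their colon. It also prints one line per Program entry; Azure hosts, for example, have two.

```shell
# Print every "Program:" entry of the "Type of agent:" block of
# `cmk -D <host>` output (read on stdin), without the "Program:" label.
# "Program stdin:" lines are skipped but do not end the block.
extract_program() {
    awk '
        /^Type of agent:/ { in_block = 1; next }
        in_block && /^[ \t]*Program stdin:/ { next }
        in_block && /^[ \t]*Program:/ {
            sub(/^[ \t]*Program:[ \t]*/, "")
            print
            next
        }
        { in_block = 0 }
    '
}

# Usage: run the first configured program directly.
# /bin/sh -c "$(cmk -D k8s | extract_program | head -n 1)"
```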
The special agent has the below options available for debugging purposes:
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_kube -h
...
  --debug               Debug mode: raise Python exceptions
  -v / --verbose        Verbose mode (for even more output use -vvv)
  --vcrtrace FILENAME   Enables VCR tracing for the API calls
...
Now you can extend the command of the Kubernetes special agent like this:
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_kube \
  '--cluster' 'at' \
  '--token' 'xyz' \
  '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' \
  '--api-server-endpoint' 'https://<YOUR-IP>:6443' \
  '--api-server-proxy' 'FROM_ENVIRONMENT' \
  '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' \
  '--cluster-collector-proxy' 'FROM_ENVIRONMENT' \
  --debug -vvv --vcrtrace ~/tmp/vcrtrace.txt > ~/tmp/k8s_with_debug.txt 2>&1
You can also reduce the number of '--monitored-objects' to a few resources to get less output.
Run the special agent without the debug options to create an agent output, or download the agent output of the cluster host via the Checkmk web interface:
/omd/sites/mysite/share/check_mk/agents/special/agent_kube '--cluster' 'at' '--token' 'xyz' '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' '--api-server-endpoint' 'https://<YOUR-IP>:6443' '--api-server-proxy' 'FROM_ENVIRONMENT' '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' '--cluster-collector-proxy' 'FROM_ENVIRONMENT' > ~/tmp/k8s_agent_output.txt 2>&1
Please upload the following files to the support ticket:
File | Content |
---|---|
~/tmp/vcrtrace.txt | Tracefile |
~/tmp/k8s_with_debug.txt | Debug output |
~/tmp/k8s_agent_output.txt | Agent output |
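You can produce the three files in one go with a small wrapper. The following is a minimal sketch. collect_kube_debug is a hypothetical helper, and the sketch assumes the agent accepts --debug, -vvv and --vcrtrace as shown in the agent_kube help above. It also assumes that agent_kube's --vcrtrace behaves like the one documented for agent_storeonce4x further below: an existing trace file is replayed instead of recorded. For that reason, the helper moves any old trace file aside first.

```shell
# Run a special agent twice and write the three files listed above:
# once with debug options (stdout and stderr captured) and once without.
collect_kube_debug() {
    local outdir=$1; shift
    local trace="$outdir/vcrtrace.txt"
    mkdir -p "$outdir"
    # An existing trace file would be replayed instead of recorded,
    # so move an old one aside before recording a fresh trace.
    if [ -e "$trace" ]; then
        mv -- "$trace" "$trace.old"
    fi
    "$@" --debug -vvv --vcrtrace "$trace" > "$outdir/k8s_with_debug.txt" 2>&1
    "$@" > "$outdir/k8s_agent_output.txt" 2>&1
}

# Usage (arguments as shown by cmk -D k8s):
# collect_kube_debug ~/tmp /omd/sites/mysite/share/check_mk/agents/special/agent_kube <OTHER ARGUMENTS>
```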
Common errors
- Context: the Kubernetes special agent is slightly unconventional compared with other special agents, as it handles up to three different data sources (the Kubernetes API, the cluster collector container metrics, and the cluster collector node metrics).
  - The connection to the Kubernetes API server is mandatory, while the connections to the other two are optional (and decided by the configured datasource rule).
  - Failure to connect to the Kubernetes API server is shown by the Checkmk service (as usual): the agent crashes.
  - Failure to connect to the cluster collector is highlighted in the Cluster Collector service: the agent does not raise the error in production.
    - The error is only raised when executing the agent with the --debug flag.
- Version: We only support the latest three Kubernetes versions (Kubernetes Release History).
  - If a customer runs the latest release and that release is quite new (less than one month old), ask one of the developers whether it is already supported.
- Kubernetes API connection error: If the agent fails to connect to the Kubernetes API (e.g., 401 Unauthorized when querying api/v1/core/pods), the output of the agent run with the --debug flag should be sufficient.
  - Common causes:
    - The service account was not configured correctly in the Kubernetes cluster.
    - A wrong token is configured.
    - The ca.crt was not uploaded in Global settings >> Trusted certificate authorities for SSL, but --verify-cert-api is enabled.
    - Wrong IP or port.
    - The proxy is not configured in the datasource rule.
- Checkmk Cluster Collector connection error:
  - Common causes:
    - The cluster collector is not exposed via either NodePort or Ingress.
    - Essential resources such as pods, deployments, daemon sets, replica sets, etc., are not running or are restarting frequently.
    - A firewall or a security group blocks the cluster collector IP.
    - Wrong port or IP.
    - The ca.crt was not uploaded in Global settings >> Trusted certificate authorities for SSL, but --verify-cert-api is enabled.
    - The proxy is not configured in the datasource rule.
- API processing error: If the agent reports a bug similar to "value ... was not set", ask the user for the vcrtrace file.
Linux agent over SSH
Problem
When executing the Checkmk agent for Linux via SSH, you might encounter error messages if something is not configured properly. Usually, the Check_MK service notifies you about any connection problems. Below, we list some of these error messages and give some pointers as to what might solve the problem.
Solution
Error Message 01
Agent exited with code 255: Permission denied, please try again.
Possible Cause
The public key in the file authorized_keys on the host might contain an error. This can easily happen when, for example, a line break is inserted into the key or a single character is omitted while copying the key to the host.
Possible Solution
Double- and triple-check that the public key on the host you are trying to monitor is exactly the same as the one on your Checkmk server.
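A single missing or extra character is easy to overlook by eye. The following minimal sketch compares the base64 key blob of a public key file field by field against an authorized_keys file. key_is_authorized is a hypothetical helper. Comparing only the blob ignores the key comment and any OpenSSH key options in front of the key, such as a forced command="..." entry.

```shell
# Succeed if the key blob (second field) of PUBKEY_FILE appears, character
# for character, as a whitespace-separated field in AUTHORIZED_KEYS.
key_is_authorized() {
    local pubkey_file=$1 authorized_keys=$2 blob
    blob=$(awk '{ print $2; exit }' "$pubkey_file")
    [ -n "$blob" ] || return 2   # unreadable or malformed public key file
    awk -v blob="$blob" '
        { for (i = 1; i <= NF; i++) if ($i == blob) found = 1 }
        END { exit !found }
    ' "$authorized_keys"
}

# Usage (on the monitored host, after copying the site user's public key
# there as checkmk.pub):
# key_is_authorized checkmk.pub ~/.ssh/authorized_keys && echo "key found"
```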
Error Message 02
Agent exited with code 255: Host key verification failed.CRIT, Got no information from the host, execution time 0.0 sec
Possible Cause
The error message itself is clear: the "host key verification failed". But what does this mean? It might simply mean that your Checkmk server has never connected to the host via SSH before, so the host's key fingerprint is not yet in the file ~/.ssh/known_hosts of the site user on your Checkmk server.
Possible Solution
This is easy to resolve: log in as the site user and open an SSH connection to the host. SSH asks whether you actually want to connect to this machine. Answer by typing 'yes'. This adds the host to the list of known hosts.
Netapp
Log in to the Checkmk server and switch to the site user:
root@linux:~# su - mysite
OMD[mysite]:~$ cmk -D <netapp_host> | head -n 15
This displays the whole special agent call, including all arguments (similar to vSphere debugging).
- Copy the special agent command from that output.
- Paste it and add the debug options like so:
/omd/sites/mysite/share/check_mk/agents/special/agent_netapp 'hostname' 'user' 'password' --vcrtrace /tmp/TRACEFILE '-no_counters' --debug --xml > /tmp/debug.txt 2>&1
- Add the agent_netapp command line (with the password removed) and /tmp/debug.txt to your support ticket.
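To remove the password from the command line before attaching it, you can filter the text through a small redaction helper. The following is a minimal sketch. redact_password is a hypothetical helper. It relies on bash pattern substitution with a quoted pattern, so characters such as *, ? or [ in the password are matched literally.

```shell
# Replace every literal occurrence of PASSWORD in stdin with ********.
redact_password() {
    local password=$1 line
    if [ -z "$password" ]; then
        cat   # nothing to redact
        return
    fi
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s\n' "${line//"$password"/********}"
    done
}

# Usage:
# cmk -D <netapp_host> | redact_password 'mysecret' > /tmp/netapp_cmdline.txt
```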
Special Agents with parameters via stdin
Step-by-step guide
A couple of our special agents get their parameters via stdin, for example the Prometheus special agent or the AWS special agent. You can see this in the output of the command cmk -D myhost: if the line for Program is followed by a line beginning with Program stdin, you have to pipe these parameters into the special agent with echo.
Let's say you want to debug the Prometheus special agent. You configured a rule and assigned it to the host myprometheushost. Log in as the site user and run the following command:
OMD[mysite]:~$ cmk -D myprometheushost
The output will look something like this:
myprometheushost
Addresses: 10.18.49.2
Tags: [address_family:ip-v4-only], [agent:cmk-agent], [checkmk-agent:checkmk-agent], [criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], [site:kube], [snmp_ds:no-snmp], [tcp:tcp]
Labels: [cmk/site:mysite]
Host groups: check_mk
Contact groups: all
Agent mode: Normal Checkmk agent, or special agent if configured
Type of agent:
  Program: /omd/sites/mysite/local/share/check_mk/agents/special/agent_prometheus
  Program stdin: {'connection': ('ip_address', {'port': 31275}), 'verify-cert': False, 'protocol': 'http', 'exporter': [('kube_state', {'cluster_name': 'mypromcluster', 'prepend_namespaces': 'use_namespace', 'entities': ['cluster', 'nodes', 'services', 'pods', 'daemon_sets']})], 'promql_checks': [], 'host_address': '10.18.49.2', 'host_name': 'myprometheushost'}
Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/myprometheushost
Now copy the block after Program stdin, wrap it in double quotes and prepend it with echo. Next, add a pipe and the path to the special agent from the line starting with Program. Together, it looks like this:
OMD[mysite]:~$ echo "{'connection': ('ip_address', {'port': 31275}), 'verify-cert': False, 'protocol': 'http', 'exporter': [('kube_state', {'cluster_name': 'mypromcluster', 'prepend_namespaces': 'use_namespace', 'entities': ['cluster', 'nodes', 'services', 'pods', 'daemon_sets']})], 'promql_checks': [], 'host_address': '10.18.49.2', 'host_name': 'myprometheushost'}" | /omd/sites/mysite/local/share/check_mk/agents/special/agent_prometheus
In most cases, the special agents offer the possibility to activate verbose output or Python debug output. Simply append -vvv and/or --debug at the very end of the command above.
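Wrapping the payload in double quotes works for the example above. However, inside double quotes the shell still expands $ and backticks, and some echo implementations also interpret backslash escapes. The following minimal sketch hands the payload over verbatim with printf instead and passes any extra arguments, such as --debug or -vvv, through to the agent. run_stdin_agent is a hypothetical helper.

```shell
# Feed PAYLOAD (the "Program stdin" block) to a special agent verbatim and
# pass all remaining arguments through as the agent command line.
run_stdin_agent() {
    local payload=$1; shift
    printf '%s\n' "$payload" | "$@"
}

# Usage (payload shortened; copy the real one from the cmk -D output):
# run_stdin_agent "{'protocol': 'http', 'host_name': 'myprometheushost'}" \
#     /omd/sites/mysite/local/share/check_mk/agents/special/agent_prometheus --debug -vvv
```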
StoreOnce 4x special agent
Problem
The StoreOnce 4x special agent crashes with the following message:
<<<storeonce4x_d2d_services:sep(0)>>>
Traceback (most recent call last):
  File "/omd/sites/mysite/lib/python3/requests_oauthlib/oauth2_session.py", line 477, in request
    url, headers, data = self._client.add_token(
  File "/omd/sites/mysite/lib/python3/oauthlib/oauth2/rfc6749/clients/base.py", line 198, in add_token
    raise TokenExpiredError()
oauthlib.oauth2.rfc6749.errors.TokenExpiredError: (token_expired)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "share/check_mk/agents/special/agent_storeonce4x", line 10, in <module>
    main()
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/agent_storeonce4x.py", line 260, in main
    special_agent_main(parse_arguments, agent_storeonce4x_main)
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/utils/agent_common.py", line 159, in special_agent_main
    _special_agent_main_core(parse_arguments, main_fn, sys.argv[1:])
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/utils/agent_common.py", line 141, in _special_agent_main_core
    main_fn(args)
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/agent_storeonce4x.py", line 251, in agent_storeonce4x_main
    writer.append_json(function(oauth_session))
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/utils/agent_common.py", line 51, in append_json
    for l in data:
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/agent_storeonce4x.py", line 154, in handler_simple
    yield from (requester.get(uri) for uri in uris)
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/agent_storeonce4x.py", line 154, in <genexpr>
    yield from (requester.get(uri) for uri in uris)
  File "/omd/sites/mysite/lib/python3/cmk/special_agents/agent_storeonce4x.py", line 142, in get
    resp = self._oauth_session.request(
  File "/omd/sites/mysite/lib/python3/requests_oauthlib/oauth2_session.py", line 496, in request
    token = self.refresh_token(
  File "/omd/sites/mysite/lib/python3/requests_oauthlib/oauth2_session.py", line 446, in refresh_token
    self.token = self._client.parse_request_body_response(r.text, scope=self.scope)
  File "/omd/sites/mysite/lib/python3/oauthlib/oauth2/rfc6749/clients/base.py", line 421, in parse_request_body_response
    self.token = parse_token_response(body, scope=scope)
  File "/omd/sites/mysite/lib/python3/oauthlib/oauth2/rfc6749/parameters.py", line 431, in parse_token_response
    validate_token_parameters(params)
  File "/omd/sites/mysite/lib/python3/oauthlib/oauth2/rfc6749/parameters.py", line 441, in validate_token_parameters
    raise MissingTokenError(description="Missing access token parameter.")
oauthlib.oauth2.rfc6749.errors.MissingTokenError: (missing_token) Missing access token parameter.
Solution
Example with Special Agent of storeonce4x
Find out the complete special agent command (see "Type of agent"):
OMD[mysite]:~$ cmk -D <hostname>
An easier way is this command: /bin/sh -c "$(cmk -D <hostname> | grep -A1 "^Type of agent:" | grep "Program:" | cut -f2- -d':')"
Please note that if a line matching "^Type of agent:" followed by a line matching "Program:" exists more than once, the output might be messed up.
Check which options are available for debugging:
OMD[mysite]:~$ ~/share/check_mk/agents/special/agent_storeonce4x -h
There are three options for debugging the request:
  --debug, -d           Enable debug mode (keep some exceptions unhandled)
  --verbose, -v
  --vcrtrace TRACEFILE, --tracefile TRACEFILE
                        If this flag is set to a TRACEFILE that does not exist yet, it will be created and all requests the program sends and their corresponding answers will be recorded in said file. If the file already exists, no requests are sent to the server, but the responses will be replayed from the tracefile.
Modify the special agent command by adding these three options:
OMD[mysite]:~$ ~/share/check_mk/agents/special/agent_storeonce4x <OTHER ARGUMENTS> --debug -v --vcrtrace ~/tmp/vcrtrace.txt > ~/tmp/storeonce4x_with_debug.txt 2>&1
Run the special agent without the debug options to create an agent output. With this file, we can reproduce your issue:
OMD[mysite]:~$ ~/share/check_mk/agents/special/agent_storeonce4x <OTHER ARGUMENTS> > ~/tmp/storeonce4x_agent_output.txt
Rename the token file
The storeonce4x special agent uses username and password for authentication. After a successful login, it obtains an access token, which is used for subsequent REST requests.
For details, see https://hewlettpackard.github.io/storeonce-rest/#Authentication. The token file is saved inside the site in:
~/tmp/check_mk/special_agents/agent_storeonce4x/<hostname>_oAuthToken.json
Rename the file to <hostname>_oAuthToken.json.back:
OMD[mysite]:~$ mv ~/tmp/check_mk/special_agents/agent_storeonce4x/<hostname>_oAuthToken.json ~/tmp/check_mk/special_agents/agent_storeonce4x/<hostname>_oAuthToken.json.back
Run the special agent again, so that it logs in and obtains a new access token.
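You can wrap the rename step in a small shell function that reports a missing token file instead of failing silently. The following is a minimal sketch. The function name reset_storeonce_token and the TOKEN_DIR override variable are hypothetical. The default directory and the file name are the ones given above.

```shell
# Move the cached StoreOnce 4x OAuth token of HOST aside so that the next
# run of the special agent has to log in again. TOKEN_DIR defaults to the
# directory given above and can be overridden (for example, for testing).
reset_storeonce_token() {
    local host=$1
    local dir=${TOKEN_DIR:-$HOME/tmp/check_mk/special_agents/agent_storeonce4x}
    local token="$dir/${host}_oAuthToken.json"
    if [ ! -e "$token" ]; then
        echo "No token file found: $token" >&2
        return 1
    fi
    mv -- "$token" "$token.back"
}

# Usage (as the site user), then run the special agent again:
# reset_storeonce_token <hostname>
```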
VMware vSphere
Although containers and their management with Kubernetes took the IT industry by storm, virtualization still has its "right to exist" in on-prem environments and wherever containerization does not fit.
This is an extension to Monitoring VMware ESXi.
Getting Started
Background information regarding this subject is available in our official documentation.
Datastore provisioning in vSphere
When you add the vCenter to Checkmk, you automatically get full insight into datastore provisioning, and you can be alerted if too many VMs are provisioned as "Thin", thus claiming more logical space than the datastore can physically provide.
If you only add the ESXi host, you will probably also see the "provisioning" value in your filesystems, but it is identical to the "Used filesystem" value. Only vCenter knows the real provisioned values.
Piggyback-only with ESXi hosts
We consider it best practice to use a read-only user on both the ESXi hosts themselves AND the vCenter to allow continuous monitoring. In some cases, however, direct access to the hosts might not be allowed.
If piggyback is configured correctly, you only need a read-only user to access the vCenter inventory (tip: use a local vSphere user, e.g., monitoring@vsphere.local, not an AD user, as AD authentication might time out during the query). Test access by logging in with this user in the vSphere console.
When adding the vCenter with all available data and then adding the ESXi hosts in Checkmk, piggyback will automatically assign all resources to the ESXi hosts as you see them in the vCenter (i.e., CPU, memory, data stores ....).
Disadvantage: if the ESXi hosts are not directly monitored via Special Agent (or SNMP, we've seen that, too), the local partitions on the hosts are not visible.
More piggyback! Snapshot monitoring
When you add the VMs as hosts to Checkmk, several more checks are added to them automatically, without any further configuration. Their names begin with "ESX", and they mainly display the VM's resource consumption.
One of them, "ESX Snapshots," allows you to monitor all given snapshots of the VM and alert you if they get too old. This is very useful to remind POs to delete their manually created snapshots in a timely fashion.
Basic debugging
- Example with the vSphere special agent
Find out the complete special agent command:
OMD[mysite]:~$ cmk -D <vcenter-host> | more
vcenter
Addresses: x.x.x.x
Tags: [add_ip_addresses:add_ip_addresses_1], [address_family:ip-v4-only], [agent:special-agents], [criticality:prod], [ip-v4:ip-v4], [networking:lan], [piggyback:auto-piggyback], [site:nagnis_master], [snmp_ds:no-snmp], [tcp:tcp]
Labels: [cmk/vsphere_object:vm]
Host groups: check_mk
Contact groups: all
Agent mode: No Checkmk agent, all configured special agents
Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_vsphere -u 'user' -s 'password' -i hostsystem,virtualmachine,datastore,counters,licenses -P --spaces cut --snapshot_display vCenter --no-cert-check 'x.x.x.x'
Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/vcenter
Services:
  checktype item params
An easier way is this command: /bin/sh -c "$(cmk -D vcenter | grep -A1 "^Type of agent:" | grep "Program:" | cut -f2- -d':')"
Please note that if a line matching "^Type of agent:" followed by a line matching "Program:" exists more than once, the output might be messed up.
Check if there are options for debugging.
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_vsphere -h
There are two options for debugging the request:
  --debug               Debug mode: let Python exceptions come through
  --tracefile FILENAME  Log all outgoing and incoming data into the given tracefile
Modify the special agent command by adding these two options:
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_vsphere -u 'user' -s 'password' --debug --tracefile $OMD_ROOT/tmp/vcenter.out -i hostsystem,virtualmachine,datastore,counters,licenses -P --spaces cut --no-cert-check 'x.x.x.x' > $OMD_ROOT/tmp/vcenter.debug 2>&1
In Checkmk 1.6.0, you might find the option "--snapshot_display vCenter" in your cmk -D output. If that's the case, include this parameter as well.
Run the special agent without the debug options to create an agent output. With this file, we can reproduce your issue:
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_vsphere -u 'user' -s 'password' -i hostsystem,virtualmachine,datastore,counters,licenses -P --spaces cut --no-cert-check 'x.x.x.x' > ~/tmp/agent.output
Please send us all three files so that we can investigate further:
File | Content |
---|---|
~/tmp/vcenter.debug | Debug output |
~/tmp/vcenter.out | Tracefile |
~/tmp/agent.output | Agent output |
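Before uploading, you can check that none of the three files is missing or empty and pack them into a single archive. The following is a minimal sketch. bundle_vsphere_debug and the archive name are hypothetical, and the sketch assumes tar with gzip support is available on the Checkmk server.

```shell
# Pack the given debug files into ARCHIVE (gzip-compressed tar). Refuses to
# create the archive if any file is missing or empty, which usually means
# one of the steps above did not run as intended.
bundle_vsphere_debug() {
    local archive=$1; shift
    local f
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "Missing or empty: $f" >&2
            return 1
        fi
    done
    tar -czf "$archive" "$@"
}

# Usage (as the site user):
# bundle_vsphere_debug ~/tmp/vsphere_debug.tar.gz ~/tmp/vcenter.debug ~/tmp/vcenter.out ~/tmp/agent.output
```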
Advanced Debugging Examples
Collect several agent outputs over a period of time:
export t=60; export s=0; while [ $s -le 600 ]; do echo $s; cmk -d $VSPHERE_HOST > /tmp/agent_vsphere_output.$s; let s=$s+$t; sleep $t; done
Collect several trace files over a period of time:
export t=60; export s=0; while [ $s -le 600 ]; do echo $s; ~/share/check_mk/agents/special/agent_vsphere --tracefile /tmp/agent_vsphere_trace.$s $OTHER_COMMAND_PARAMS; let s=$s+$t; sleep $t; done