Info |
---|
This document describes how to debug the new Kubernetes special agent introduced with Checkmk 2.1 (Werk #13810). |
...
- The first step is to find the complete command of the Kubernetes special agent.
The command is listed under "Type of agent >> Program" in the output of cmk -D for the cluster host. Its parameters depend on how the datasource program rule has been configured.
```bash
OMD[mysite]:~$ cmk -D k8s | more

k8s
Addresses:              No IP
Tags:                   [address_family:no-ip], [agent:special-agents], [criticality:prod], [networking:lan], [piggyback:auto-piggyback], [site:a21], [snmp_ds:no-snmp], [tcp:tcp]
Labels:                 [cmk/kubernetes/cluster:at], [cmk/kubernetes/object:cluster], [cmk/site:k8s]
Host groups:            check_mk
Contact groups:         all
Agent mode:             No Checkmk agent, all configured special agents
Type of agent:
  Program: /omd/sites/mysite/share/check_mk/agents/special/agent_kube '--cluster' 'k8s' '--token' 'xyz' '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' '--api-server-endpoint' 'https://<YOUR-IP>:6443' '--api-server-proxy' 'FROM_ENVIRONMENT' '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' '--cluster-collector-proxy' 'FROM_ENVIRONMENT'
  Process piggyback data from /omd/sites/mysite/tmp/check_mk/piggyback/k8s
Services:
  ...
```
Note: A quicker way is to extract the command from the cmk -D output and run it in one step: `/bin/sh -c "$(cmk -D k8s | grep -A1 "^Type of agent:" | grep "Program:" | cut -f2- -d':')"`
If a line matching "^Type of agent:" followed by a line matching "^ Program:" occurs more than once, the output may be garbled.
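If the pattern in the note above matches more than once, one option is to keep only the first "Program:" entry and inspect it before running it. The following is a minimal sketch, assuming the cmk -D output has the layout shown earlier; the function name `first_agent_program` is hypothetical and not part of Checkmk.

```bash
# Hypothetical helper (not part of Checkmk): read `cmk -D <host>` output from
# stdin and print only the first "Program:" command found under
# "Type of agent:".
first_agent_program() {
    awk '
        # Start of the block; drop the label so that a "Program:" entry on
        # the same line is handled as well.
        /^Type of agent:/ { in_block = 1; sub(/^Type of agent:/, "") }
        # First Program entry inside the block: strip the label, print, stop.
        in_block && /^[ \t]*Program:/ {
            sub(/^[ \t]*Program:[ \t]*/, "")
            print
            exit
        }
        # Any other unindented line ends the block.
        /^[^ \t]/ { in_block = 0 }
    '
}

# Usage (prints the command instead of executing it):
#   cmk -D k8s | first_agent_program
```

Printing the command first makes it easier to append the debug options by hand before executing it.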
The special agent provides the following options for debugging:
```bash
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_kube -h
...
  --debug               Debug mode: raise Python exceptions
  -v / --verbose        Verbose mode (for even more output use -vvv)
  --vcrtrace FILENAME   Enables VCR tracing for the API calls
...
```
Next, append the debug options to the Kubernetes special agent command and redirect the output to a file:
```bash
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_kube \
    '--cluster' 'at' \
    '--token' 'xyz' \
    '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' \
    '--api-server-endpoint' 'https://<YOUR-IP>:6443' \
    '--api-server-proxy' 'FROM_ENVIRONMENT' \
    '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' \
    '--cluster-collector-proxy' 'FROM_ENVIRONMENT' \
    --debug -vvv --vcrtrace ~/tmp/vcrtrace.txt > ~/tmp/k8s_with_debug.txt 2>&1
```
To get less output, you can also pass fewer resource types to '--monitored-objects'.
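Because --debug raises Python exceptions, a crash usually shows up in the debug output as a Python traceback, which starts with the line "Traceback (most recent call last):". A small sketch for locating such tracebacks in the redirected file follows; the function name `find_tracebacks` and the default of 20 context lines are assumptions chosen for illustration.

```bash
# Hypothetical helper: print every Python traceback header in a file together
# with its line number and the lines that follow it.
#   $1  file to search
#   $2  number of following lines to show (optional, defaults to 20)
find_tracebacks() {
    # -n prefixes each line with its line number, -A prints trailing context
    grep -n -A "${2:-20}" 'Traceback (most recent call last):' "$1"
}

# Usage:
#   find_tracebacks ~/tmp/k8s_with_debug.txt
```

The function returns a non-zero exit status when no traceback is found, which makes it usable in a simple `if` check.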
Run the special agent without the debug options to create the agent output. Alternatively, download the agent output from the cluster host via the Checkmk web interface.
```bash
OMD[mysite]:~$ /omd/sites/mysite/share/check_mk/agents/special/agent_kube \
    '--cluster' 'at' \
    '--token' 'xyz' \
    '--monitored-objects' 'deployments' 'daemonsets' 'statefulsets' 'nodes' 'pods' \
    '--api-server-endpoint' 'https://<YOUR-IP>:6443' \
    '--api-server-proxy' 'FROM_ENVIRONMENT' \
    '--cluster-collector-endpoint' 'https://<YOUR-ENDPOINT>:30035' \
    '--cluster-collector-proxy' 'FROM_ENVIRONMENT' \
    > ~/tmp/k8s_agent_output.txt 2>&1
```
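Before attaching the agent output, a quick sanity check can confirm that the file actually contains agent sections. Checkmk agent output marks sections with `<<<name>>>` headers and piggyback blocks with `<<<<hostname>>>>` headers. The sketch below counts both; the function names are hypothetical, and the section names used in any sample data are made up.

```bash
# Hypothetical helper: count how often each section header occurs.
# The first character after "<<<" must not be "<", which excludes the
# four-bracket piggyback headers.
list_agent_sections() {
    grep -o '^<<<[^<>][^>]*>>>' "$1" | sort | uniq -c
}

# Hypothetical helper: list the piggyback hosts found in the output.
# The empty header <<<<>>>> only ends a piggyback block and is skipped.
list_piggyback_hosts() {
    grep -o '^<<<<[^<>][^>]*>>>>' "$1" | sort -u
}

# Usage:
#   list_agent_sections ~/tmp/k8s_agent_output.txt
#   list_piggyback_hosts ~/tmp/k8s_agent_output.txt
```

An empty result from both functions suggests that the agent failed before producing any sections, in which case the debug run is the more useful file.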
Please upload the following files to the support ticket.
...
- Context: The Kubernetes special agent is slightly unconventional compared to other special agents because it handles up to three different data sources (the API, the cluster collector container metrics, and the cluster collector node metrics).
  - The connection to the Kubernetes API server is mandatory, while the connections to the other two are optional (and determined by the configured datasource rule).
    - Failure to connect to the Kubernetes API server is shown by the Checkmk service (as usual) → the agent crashes.
    - Failure to connect to the cluster collector is highlighted in the Cluster Collector service → the error is not raised by the agent in production.
      - The error is only raised when the agent is executed with the --debug flag.
- Version: We only support the latest three Kubernetes minor releases (https://kubernetes.io/releases/#:~:text=The%20Kubernetes%20project%20maintains%20release,9%20months%20of%20patch%20support.)
  - If a customer runs the latest release and that release is quite new (less than one month old), ask one of the developers whether it is already supported.
- Kubernetes API connection error: If the agent fails to connect to the Kubernetes API (e.g., 401 Unauthorized when querying api/v1/core/pods), the output produced with the --debug flag should be sufficient.
  - Common causes:
    - The service account was not configured correctly in the Kubernetes cluster.
    - The wrong token is configured.
    - --verify-cert-api is enabled, but the ca.crt was not uploaded under Global settings >> Trusted certificate authorities for SSL.
    - Wrong IP address or port.
    - The proxy is not configured in the datasource rule.
- Checkmk Cluster Collector connection error:
  - Common causes:
    - The cluster collector is not exposed via either NodePort or Ingress.
    - Essential resources such as pods, deployments, daemon-sets, replicas, etc., are not running or restart frequently.
    - A firewall or a security group blocks the cluster collector IP.
    - Wrong IP address or port.
    - --verify-cert-api is enabled, but the ca.crt was not uploaded under Global settings >> Trusted certificate authorities for SSL.
    - The proxy is not configured in the datasource rule.
- API processing error: If the agent reports a bug similar to "value ... was not set", ask the user for the vcrtrace file.
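To narrow down which of the common causes above applies, it can help to query the endpoint directly with curl and look at the HTTP status code. The following is only a sketch: `explain_status` is a hypothetical helper, the mapping from status code to cause is a rough heuristic, and `ENDPOINT`, `TOKEN`, and the ca.crt path are placeholders. The path `/api/v1/pods` is the standard Kubernetes API route for listing pods.

```bash
# Hypothetical helper: translate an HTTP status code, as printed by
# curl -w '%{http_code}', into the most likely cause. curl prints 000
# when it did not receive any HTTP response.
explain_status() {
    case "$1" in
        000) echo "no HTTP response: check IP, port, proxy, firewall or TLS trust" ;;
        401) echo "unauthorized: check the token" ;;
        403) echo "forbidden: check the service account permissions" ;;
        2??) echo "endpoint reachable and request accepted" ;;
        *)   echo "unexpected status $1: inspect the --debug output" ;;
    esac
}

# Usage sketch (placeholders: ENDPOINT, TOKEN, ca.crt):
#   code="$(curl -s -o /dev/null -w '%{http_code}' --cacert ca.crt \
#       -H "Authorization: Bearer $TOKEN" "$ENDPOINT/api/v1/pods?limit=1")"
#   explain_status "$code"
```

Running the check from the Checkmk server as the site user exercises the same network path and proxy environment as the special agent.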
Further information
More information regarding Kubernetes is available on our: