Checkmk Docker operations and troubleshooting
This article is a troubleshooting and operations guide for Checkmk in Docker.
LAST TESTED ON CHECKMK 2.4.0P1
Overview
This guide provides a single reference for common operational tasks and troubleshooting procedures when running Checkmk inside Docker. It includes backup and restore instructions and solutions for known container startup and performance issues.
Backup and Restore of Checkmk in Docker
Problem
omd backup works inside a container, but omd restore is not supported without manual intervention. Manual extraction is possible but requires several steps and can be error-prone.
Solution
Use Docker’s built-in tools to back up and restore the container image and the named volume. This provides a reliable and consistent method for full restoration.
Always stop the container before restoring the volume to avoid data corruption.
To back up the container image, run the following command:
docker save -o checkmk-image-backup.tar <checkmk-image-name>
Store the resulting tar file in your backup location.
To back up the volume, first identify the volume name and its mount path from the -v option used to start the container. Example:
monitoring:/omd/sites
Run the backup command:
docker run --rm \
  --mount source=<volume-name>,target=<target> \
  -v "$(pwd)":/backup \
  busybox \
  tar -czvf /backup/<backup-filename>.tar.gz <target>
Replace:
<volume-name> with the name of your volume
<target> with the mount path, usually /omd/sites
<backup-filename> with any archive name
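The image and volume backups can be wrapped in one small script so that both are always saved together. The following is a minimal sketch, not an official Checkmk tool: the function names backup_checkmk and run, the DRY_RUN switch, and the timestamped file names are assumptions made for illustration. The docker commands themselves are the ones described in this section.

```shell
#!/bin/sh
# Sketch: back up a Checkmk image and its named volume in one step.
# backup_checkmk, run, DRY_RUN and the file naming are illustrative assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: backup_checkmk <image-name> <volume-name> [target] [backup-dir]
backup_checkmk() {
    image=$1
    volume=$2
    target=${3:-/omd/sites}
    backup_dir=${4:-$(pwd)}
    stamp=$(date +%Y%m%d-%H%M%S)

    # Save the image to a tar file in the backup directory.
    run docker save -o "$backup_dir/checkmk-image-$stamp.tar" "$image" || return 1

    # Archive the volume contents through a throwaway busybox container.
    run docker run --rm \
        --mount "source=$volume,target=$target" \
        -v "$backup_dir":/backup \
        busybox \
        tar -czf "/backup/checkmk-volume-$stamp.tar.gz" "$target"
}
```

Calling the function with DRY_RUN=1 set, for example DRY_RUN=1 backup_checkmk <checkmk-image-name> monitoring, only prints the two docker commands, which is a cheap way to review them before running a real backup.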
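To restore, load the previously saved image:
docker load -i checkmk-image-backup.tar
Stop the Checkmk container.
Run:
docker run --rm \
  --mount source=<volume-name>,target=<target> \
  -v "$(pwd)":/backup \
  busybox \
  tar -xzvf /backup/<backup-filename>.tar.gz -C /
The archive stores its paths relative to / because tar strips the leading slash when it creates the archive. Extracting with -C / therefore puts the files back under <target>, whereas -C <target> would nest them one level too deep.
Start the Checkmk container.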
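Before extracting a volume archive, it can help to check how the paths inside it are stored. tar normally strips the leading / from member names when it creates an archive, and the directory passed to -C has to match that layout. The sketch below lists the first entries of an archive without extracting anything. The function name list_archive_head is an assumption for illustration and is not a Checkmk or Docker command.

```shell
#!/bin/sh
# Sketch: show the first entries of a volume archive without extracting it.
# list_archive_head is an illustrative helper name.

# Usage: list_archive_head <archive.tar.gz> [number-of-entries]
list_archive_head() {
    archive=$1
    count=${2:-10}
    # -t lists members, -z reads gzip, -f names the archive file.
    tar -tzf "$archive" | head -n "$count"
}
```

If the entries start with omd/sites/, the archive was created relative to / and should be extracted relative to / as well.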
Binding Service “Address Already in Use” Error
Problem
During container startup, Checkmk may fail with output similar to:
Binding service [live-tls] to localhost:/omd/sites/mysite/tmp/run/live-tls: Address already in use (98)
Binding service [live-tls] failed
The reason is that a leftover temporary socket or PID file is still present in the site’s ~/tmp directory. This usually happens after an unclean shutdown or forced container stop.
Solution
Remove stale temporary files inside the Checkmk site.
The site user must delete the reported file within the site's ~/tmp/ folder. The simplest way is to clear the folder. Run:
OMD[mysite]~$ rm -rf ~/tmp/*
Restart the container.
The container should now start without issue.
Removing ~/tmp/* does not affect persistent site data because it only contains temporary runtime files.
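If you prefer to see what is left over before deleting anything, the stale files can be listed first. This is a minimal sketch under the assumption that the leftovers are socket files or files ending in .pid. The helper name list_runtime_leftovers is made up for illustration and is not part of OMD.

```shell
#!/bin/sh
# Sketch: list leftover sockets and PID files below a site's tmp directory.
# list_runtime_leftovers is an illustrative name; adjust the patterns to your site.

# Usage (as site user): list_runtime_leftovers ~/tmp
list_runtime_leftovers() {
    dir=${1:-$HOME/tmp}
    # -type s matches UNIX domain sockets; *.pid matches typical PID files.
    find "$dir" \( -type s -o -name '*.pid' \) -print
}
```

The output should include the file named in the "Address already in use" message, which confirms that a stale file is the cause before you remove it.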
Activation of Changes Timing Out
Problem
When activating changes, the web interface may time out or freeze. This is frequently seen when Checkmk runs inside Docker without custom ulimit settings.
Checkmk needs an explicitly set limit for open file descriptors (nofile). Docker's default is often unsuitable, typically far higher than the value Checkmk works well with, unless it is explicitly overridden.
Solution
Set the nofile limit explicitly with --ulimit nofile=1024 when launching the container, for example:
docker container run -dit \
  -p 8080:5000 \
  --ulimit nofile=1024 \
  --tmpfs /opt/omd/sites/cmk/tmp:uid=1000,gid=1000 \
  -v monitoring:/omd/sites \
  --name monitoring \
  -v /etc/localtime:/etc/localtime:ro \
  --restart always \
  checkmk/check-mk-raw:1.6.0-latest
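After starting the container, you can check that the limit was actually applied. The sketch below assumes a running container (named monitoring in the example above) that provides sh. The helper names container_nofile and check_container_nofile are illustrative. It reads the soft nofile limit from a shell inside the container and compares it with the expected value.

```shell
#!/bin/sh
# Sketch: confirm the nofile limit that is active inside a running container.
# container_nofile and check_container_nofile are illustrative helper names.

# Print the soft open-files limit as seen by a shell inside the container.
container_nofile() {
    docker exec "$1" sh -c 'ulimit -n'
}

# Usage: check_container_nofile <container-name> [expected-limit]
check_container_nofile() {
    actual=$(container_nofile "$1") || return 1
    expected=${2:-1024}
    if [ "$actual" = "$expected" ]; then
        echo "nofile limit is $actual"
    else
        echo "nofile limit is $actual, expected $expected" >&2
        return 1
    fi
}
```

A call such as check_container_nofile monitoring returns a non-zero status when the container was started without the --ulimit option, so it can also serve as a check in a deployment script.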
This is also described in our official guide, whose articles on running Checkmk in Docker we strongly recommend reading in their entirety.
The ulimit can also be modified in Docker Swarm. See the GitHub issue "Add support for --ulimit...to swarm mode".