Checkmk Docker operations and troubleshooting

This article is a troubleshooting and operations guide for Checkmk in Docker.

Last tested on Checkmk 2.4.0p1

Overview

This guide provides a single reference for common operational tasks and troubleshooting procedures when running Checkmk inside Docker. It includes backup and restore instructions and solutions for known container startup and performance issues.

 

Backup and Restore of Checkmk in Docker

Problem

omd backup works inside a container, but omd restore is not supported without manual intervention. Manual extraction is possible but requires several steps and can be error-prone.

Solution

Use Docker’s built-in tools to back up and restore the container image and the named volume. This provides a reliable and consistent method for full restoration.

Always stop the container before restoring the volume to avoid data corruption.

 

  1. Save the container image to a tar archive:

    docker save -o checkmk-image-backup.tar <checkmk-image-name>

     

  2. Store the resulting tar file in your backup location.

  3. Identify the named volume and its mount point inside the container. Example:

    monitoring:/omd/sites

     

  4. Back up the volume contents to an archive in the current directory:

    docker run --rm \
      --mount source=<volume-name>,target=<target> \
      -v "$(pwd)":/backup \
      busybox \
      tar -czvf /backup/<backup-filename>.tar.gz <target>

    Replace:

    • <volume-name> with the name of your volume

    • <target> usually /omd/sites

    • <backup-filename> with any archive name

  5. Load the previously saved image:

    docker load -i checkmk-image-backup.tar

     

  6. Stop the Checkmk container.

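  7. Restore the volume contents from the archive. tar strips the leading slash when it creates the archive, so the stored paths are relative to / and must be extracted with -C /:

    docker run --rm \
      --mount source=<volume-name>,target=<target> \
      -v "$(pwd)":/backup \
      busybox \
      tar -xzvf /backup/<backup-filename>.tar.gz -C /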

     

  8. Start the Checkmk container.
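
Before restoring, it can help to check how paths are stored in the archive, because that determines where the files land relative to the -C directory. The following is a minimal sketch that uses a throwaway directory rather than a real Checkmk volume; the site name mysite and the file example.conf are illustrative assumptions.

```shell
# Demonstration only: works on a temporary directory, not on a real volume.
work=$(mktemp -d)
mkdir -p "$work/omd/sites/mysite/etc"
echo "demo" > "$work/omd/sites/mysite/etc/example.conf"

# Archive an absolute path, as the volume backup step does with <target>.
# tar warns on stderr that it removes the leading "/" from member names.
tar -czf "$work/backup.tar.gz" "$work/omd/sites" 2>/dev/null

# List the stored member names; none of them starts with "/".
members=$(tar -tzf "$work/backup.tar.gz")
echo "$members"

# Extract into a fresh directory: the stored paths are recreated
# relative to the directory given with -C.
mkdir "$work/restore"
tar -xzf "$work/backup.tar.gz" -C "$work/restore"
restored="$work/restore$work/omd/sites/mysite/etc/example.conf"
cat "$restored"
```

Running tar -tzf against a real backup archive in the same way shows whether its entries begin with omd/sites/ or with the site name, which indicates the directory the archive should be extracted into.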

 

Binding Service “Address Already in Use” Error

Problem

During container startup, Checkmk may fail with output similar to:

Binding service [live-tls] to localhost:/omd/sites/mysite/tmp/run/live-tls: Address already in use (98)
Binding service [live-tls] failed

The reason is that a leftover temporary socket or PID file is still present in the site’s ~/tmp directory. This usually happens after an unclean shutdown or forced container stop.


Solution

Remove stale temporary files inside the Checkmk site.

  1. As the site user, delete the stale files in the site's ~/tmp/ directory:

    OMD[mysite]~$ rm -rf ~/tmp/*

  2. Restart the container.

The container should now start without issue.

Removing ~/tmp/* does not affect persistent site data because it only contains temporary runtime files.
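
To see what is left over before deleting anything, the site user can list sockets and PID files first. This is a minimal sketch; list_stale_candidates is a hypothetical helper name and not part of Checkmk or OMD.

```shell
# Hypothetical helper, not part of Checkmk or OMD.
# Prints sockets and *.pid files below the given directory
# (defaults to the current user's ~/tmp).
list_stale_candidates() {
  dir="${1:-$HOME/tmp}"
  find "$dir" \( -type s -o -name '*.pid' \) -print
}
```

Called without arguments as the site user, it inspects ~/tmp. A path can be passed as the first argument to inspect a different directory.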

 

Activation of Changes Timing Out

Problem

When activating changes, the web interface may time out or freeze. This is frequently seen when Checkmk runs inside Docker without custom ulimit settings.

The cause is the limit for open file descriptors (nofile). Docker's default for this limit can be far higher than the value Checkmk works well with, so it must be set explicitly to a lower value.

Solution

Set the nofile limit explicitly with --ulimit nofile=1024 when launching the container, for example (adjust the image tag to the Checkmk version you run):

docker container run -dit \
  -p 8080:5000 \
  --ulimit nofile=1024 \
  --tmpfs /opt/omd/sites/cmk/tmp:uid=1000,gid=1000 \
  -v monitoring:/omd/sites \
  --name monitoring \
  -v /etc/localtime:/etc/localtime:ro \
  --restart always \
  checkmk/check-mk-raw:1.6.0-latest


This is also described in our official guide. We strongly recommend reading the official articles on running Checkmk in Docker in their entirety.

The ulimit can also be modified in Docker Swarm. Compare: Add support for --ulimit...to swarm mode, on GitHub.
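
To confirm that the limit is in effect, the value reported inside the running container can be compared with the expected one. The sketch below assumes the container is named monitoring, as in the run command above; check_nofile is a hypothetical helper, not a Checkmk or Docker command.

```shell
# Hypothetical helper: compares an observed nofile limit with the expected value.
check_nofile() {
  observed="$1"
  expected="${2:-1024}"
  if [ "$observed" = "$expected" ]; then
    echo "ok: nofile=$observed"
  else
    echo "warning: nofile=$observed, expected $expected"
    return 1
  fi
}

# Usage (requires a running container named "monitoring"):
#   check_nofile "$(docker container exec monitoring sh -c 'ulimit -n')"
```

A warning here suggests that the container was started without the --ulimit option and needs to be recreated with it.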

 
