Troubleshooting "no space left on device" or filesystem full errors

Filesystems on the Checkmk Appliance can fill up for expected reasons or due to misconfiguration. This guide helps identify the cause and suggests solutions.

APPLICABLE TO ALL CHECKMK APPLIANCES

Problem

There are various reasons why a filesystem on the Checkmk Appliance may fill up. The first step is to identify which filesystem is affected, as each filesystem serves a different purpose and requires a different approach.

You should usually start by checking the Checkmk site running on the appliance itself. Since the appliance monitors its own resources, it often provides a quick overview of the current situation.

image-20260415-085155.png

On a typical Checkmk Appliance, you will see the following filesystems (filesystems ending in /tmp can be ignored):

  • /

  • /ro

  • /rw

  • /omd

The first three filesystems (/, /ro, /rw) should never fill up during normal operation. The /omd filesystem, however, can grow over time depending on the number of sites, hosts, services, and stored monitoring data.

 

Identifying disk usage

To get an overview of disk usage on the appliance, log in via SSH and run:

root@myappliance~$ df -hl -x tmpfs

 

Example output:

Filesystem                 Size  Used Avail Use% Mounted on
udev                       7,8G     0  7,8G   0% /dev
/dev/sdc2                  219G  1,8G  217G   1% /ro
/dev/disk/by-label/mk-rw   3,9G  3,9G    0G 100% /rw
overlay                    3,9G  3,9G    0G 100% /
efivarfs                   304K   79K  221K  27% /sys/firmware/efi/efivars
/dev/md0p2                 435G   22G  414G   5% /omd

In this example, both / and /rw are at 100% usage.
Because / is mounted as an overlay filesystem, you can ignore it for this guide. The physical disk usage is on /rw/, and space can only be freed there.
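
If you prefer not to read the table by eye, the output can be filtered for full filesystems. The following is a minimal sketch and not part of the appliance tooling: the function name full_filesystems and the default threshold of 90 percent are assumptions chosen for illustration. It expects input in the POSIX format produced by df -P and assumes that mount points contain no spaces.

```shell
#!/bin/sh
# Hypothetical helper: print mount point and usage for every filesystem
# whose usage is at or above a threshold (percent, assumed default: 90).
# Reads `df -P` style output on stdin, so it also works on saved output.
full_filesystems() {
    threshold="${1:-90}"
    awk -v limit="$threshold" '
        NR > 1 {                      # skip the header line
            use = $5                  # "Capacity" column, e.g. "100%"
            sub(/%$/, "", use)        # strip the percent sign
            if (use + 0 >= limit) print $6, $5
        }'
}

# Usage on a live system (local filesystems only, tmpfs excluded):
#   df -Pl -x tmpfs | full_filesystems 90
```

For the example output above, such a filter would report /rw and / with 100% each.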

 

Finding what consumes disk space

Once you know which filesystem is full, you need to identify the files or directories consuming the most space.

Start at /rw/ and work downwards to find the large files. Use du -sh to inspect disk usage:

root@myappliance~$ du -sh /rw/*
root@myappliance~$ du -sh /rw/overlay-rw/*
root@myappliance~$ du -sh /rw/overlay-rw/mnt/*
root@myappliance~$ du -sh /rw/overlay-rw/mnt/static/*

Each command lists the size of every entry directly below the given path. The paths shown follow one of the most likely cases, in which a failed mount ends up filling /rw/.

Identify the largest directories and continue drilling down until you find the files responsible for the disk usage.
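
The drill-down is quicker when the entries are sorted by size. The following is a minimal sketch under a few assumptions: the function name largest_entries and its defaults (/rw, ten lines) are hypothetical, and it relies on the GNU du option --max-depth. The -x option keeps du on a single filesystem, so other mounts below the path are not counted.

```shell
#!/bin/sh
# Hypothetical helper: show the largest entries one level below a directory.
# Arguments: directory (assumed default: /rw), number of lines (default: 10).
largest_entries() {
    dir="${1:-/rw}"
    count="${2:-10}"
    # -k prints sizes in KiB so that a plain numeric sort is sufficient;
    # -x stays on the filesystem that contains the directory.
    du -xk --max-depth=1 "$dir" 2>/dev/null | sort -rn | head -n "$count"
}

# Usage:
#   largest_entries /rw
#   largest_entries /rw/overlay-rw/mnt 5
```

The first line of the output is the total for the directory itself. Repeat the call on the largest subdirectory until you reach the files responsible.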

 

Examples

A full filesystem can cause scheduled maintenance tasks to fail. Typical error messages include:

/etc/cron.daily/dpkg

cp: error writing 'dpkg.status': No space left on device
gzip: .//dpkg.status.0.gz: No space left on device
mv: cannot stat './/dpkg.status.0.gz': No such file or directory

 

/etc/cron.daily/logrotate

/etc/cron.daily/logrotate:
error: Compressing program wrote following message to stderr when compressing log /var/log/apache2/ssl_access.log.1:
gzip: stdout: No space left on device
error: failed to compress log /var/log/apache2/ssl_access.log.1
run-parts: /etc/cron.daily/logrotate exited with return code 1

 

/etc/cron.daily/man-db

gdbm fatal: read error
run-parts: /etc/cron.daily/man-db exited with return code 1

These errors indicate that routine background jobs cannot write temporary or compressed files due to insufficient disk space.

 

Possible solutions

Cleanup

For additional cleanup steps, refer to the documentation:
How to remove custom changes to files in the appliance

If you have identified large files or directories, verify the following:

  • Possible failed mounts, typically located under /rw/overlay-rw/mnt/

    • Verify that the source works and is accessible

    • Clean up the local files and re-mount the share

    • Verify successful mount

  • If a log file is growing excessively

    • Is debug logging enabled?

    • Is a script writing excessive log output?

    • Is an application repeatedly crashing and logging errors?

  • Large files in any other location are most likely the result of an unsupported modification of the appliance firmware. Refer to the link above for how to remove such modifications.
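
Before cleaning up or re-mounting a share, it helps to confirm whether the directory is currently a mount point at all. The following is a minimal sketch, not an appliance command: the function name is_mounted is hypothetical, and the path in the usage line is a placeholder. It looks the directory up in a mount table in /proc/mounts format. The path must be given exactly as it appears there (no trailing slash, and spaces appear escaped as \040).

```shell
#!/bin/sh
# Hypothetical helper: succeed if a directory is listed as a mount point.
# Arguments: directory, mount table file (assumed default: /proc/mounts).
is_mounted() {
    dir="$1"
    table="${2:-/proc/mounts}"
    # Field 2 of each line in /proc/mounts is the mount point.
    awk -v target="$dir" '
        $2 == target { found = 1 }
        END { exit !found }' "$table"
}

# Usage (placeholder path, replace it with your own mount directory):
#   if is_mounted /path/to/share; then echo "mounted"; else echo "NOT mounted"; fi
```

If the directory is not mounted but contains data, that data is stored on the local disk and counts against the filesystem it resides on.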

 

Resize

It is normal for the /omd filesystem to grow over time. If cleanup does not free sufficient space, resizing may be required.

Currently, there is no online resizing option for the Checkmk Appliance. Expanding disk space requires setting up a new appliance and migrating existing sites.

 

Virtual appliance

  1. Create a new appliance.

  2. Before the first boot, resize the second disk to a sufficient size.

  3. Boot and configure the appliance.

  4. After the basic setup is complete, migrate your sites from the old appliance as described in the migration documentation.

 

Physical appliance

The process is technically similar. However, physical appliances are available only in fixed sizes.
If you are already using the larger rack5 model, no further resizing is possible.


Werk 9295

With Werk #9295, the default size of the /rw filesystem was increased from 800 MB to 4 GB.

To use the larger /rw filesystem, follow these steps:

  1. Update to the latest 1.4 firmware.

  2. Create a backup of the appliance as described in Configuring and using the appliance.

  3. Reset the appliance to the factory settings.

  4. Restore the backup.

After completing these steps, the /rw filesystem will have a size of 4 GB.

 
