How to debug failed backup jobs

This manual extends our general Checkmk backup article: https://docs.checkmk.com/master/en/backup.html

LAST TESTED ON CHECKMK 2.0.0P1

Problem

Basic information about mkbackup


After you configure a backup job in Webconf, a cron job is created for it. You can inspect this cron job, and run the backup job manually, on the command line after logging in via SSH as the site user:

OMD[mysite]:~$ cat etc/cron.d/mkbackup 
# Written by mkbackup configuration
0 0 * * * mkbackup backup mybackup >/dev/null

OMD[mysite]:~$ mkbackup backup mybackup
2022-05-17 16:02:22 --- Starting backup (Check_MK-cma-mysite-mybackup to mytarget) ---
2022-05-17 16:02:24 Verifying backup consistency
2022-05-17 16:02:24 Cleaning up previously completed backup
2022-05-17 16:02:24 --- Backup completed (Duration: 0:00:01, Size: 42.00 MB, IO: 0.42 B/s) ---
OMD[mysite]:~$ 


If you need more detailed output, add --verbose and --debug to the mkbackup command:

OMD[mysite]:~$ mkbackup --verbose --debug backup mybackup
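
Because the cron job discards the normal output, it can be useful to keep the debug output of a manual run in a file. The following is a minimal sketch, not part of Checkmk: it assumes it is run as the site user, and the function names and the log path are hypothetical. Only the mkbackup options shown above are used.

```python
import subprocess
from pathlib import Path
from typing import List, Sequence


def mkbackup_debug_command(job_id: str) -> List[str]:
    """Build the mkbackup call with both debugging options enabled."""
    return ["mkbackup", "--verbose", "--debug", "backup", job_id]


def run_and_log(command: Sequence[str], log_path: Path) -> int:
    """Run a command and write stdout and stderr together into log_path.

    Returns the exit code of the command so the caller can check for failure.
    """
    with log_path.open("wb") as log:
        result = subprocess.run(
            list(command),
            stdout=log,
            stderr=subprocess.STDOUT,  # tracebacks usually arrive on stderr
            check=False,               # a failed backup must not raise here
        )
    return result.returncode


# Example call (hypothetical job ID and log location):
#   exit_code = run_and_log(mkbackup_debug_command("mybackup"),
#                           Path.home() / "mkbackup_debug.log")
```

A non-zero return value means the run failed; the log file then contains the full output for further analysis.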

Collection of error messages

Failed to perform backup: [Errno 104] Connection reset by peer

2021-03-17 11:10:20 --- Starting backup (Check_MK_Appliance-test+stage+106-nfs+backup+appliance to nfs-backup-appliance) ---
2021-03-17 11:10:20 Performing system backup (system.tar)
2021-03-17 11:10:25 Performing system data backup (system-data.tar)
2021-03-17 11:10:48 Performing site backup: test
Site backup failed: Failed to perform backup: [Errno 104] Connection reset by peer

Solution

First, find the affected backup job:

OMD[mysite]:~$ mkbackup jobs
Job-ID                        Title                         
------------------------------------------------------------
myid                           mytitle                             
OMD[mysite]:~$ 


Run the backup directly on the command line and redirect the output to a file:

OMD[mysite]:~$ omd -v backup --no-compression mybackup - >~/path/to/my_backup.txt
 Pausing RRD updates for /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_read.rrd
 rrdcached command: SUSPEND /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_read.rrd
 rrdcached response: '-1 /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_read.rrd - No such file or directory\n'
 Resuming RRD updates for /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_read.rrd
 rrdcached command: RESUME /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_read.rrd
 skipping rrdcached command (broken pipe)
 Pausing RRD updates for /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_write.rrd
 rrdcached command: SUSPEND /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_write.rrd
 rrdcached response: '-1 /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_write.rrd - No such file or directory\n'
 Resuming RRD updates for /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_write.rrd
 rrdcached command: RESUME /omd/sites/mysite/var/pnp4nagios/perfdata/myhost/my_disk_write.rrd
 Failed to perform backup: [Errno 104] Connection reset by peer

The paths in this output (var/pnp4nagios/perfdata) indicate that the site still stores its performance data in the PNP4Nagios format rather than in the Checkmk RRD format. We recommend converting the performance data to the Checkmk RRD format. Follow the steps described here: https://docs.checkmk.com/latest/en/graphing.html#customise_rrds

Don't forget to stop the site before converting the files!

After the conversion, the backup should run without errors.
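
When the verbose output is long, the RRD files that rrdcached could not find can be pulled out of it with a few lines of Python. This is a minimal sketch that assumes the log lines have exactly the format shown in the output above; the function name is hypothetical.

```python
import re
from typing import List

# Matches lines such as:
#   rrdcached response: '-1 /path/to/file.rrd - No such file or directory\n'
_MISSING_RRD = re.compile(
    r"rrdcached response: '-1 (?P<path>.+?) - No such file or directory"
)


def missing_rrd_files(log_text: str) -> List[str]:
    """Return the RRD paths reported as missing, in order, without duplicates."""
    seen: List[str] = []
    for match in _MISSING_RRD.finditer(log_text):
        path = match.group("path")
        if path not in seen:
            seen.append(path)
    return seen
```

Paths that contain var/pnp4nagios/perfdata point to the situation described in this section.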

UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 122: surrogates not allowed

Job state: Site mysite Backup

Site backup
State: Failed
Runtime: Started at 2022-06-21 03:00:02, Finished at 2022-06-21 03:00:02 (Duration: 0:16:36)
Output:

2022-06-21 03:00:02 --- Starting backup (Check_MK-mysite+cmk2-mysite-mysite+bak to Reload) ---
2022-06-21 03:00:02 Found previous incomplete backup. Cleaning up those files.
Site backup failed: Traceback (most recent call last):
  File "/omd/sites/mysite/bin/omd", line 60, in <module>
    omdlib.main.main()
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/main.py", line 4022, in main
    command.handler(version_info, site, global_opts, args, command_options)
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/main.py", line 2753, in main_backup
    omdlib.backup.backup_site_to_tarfile(site, fh, tar_mode, options, global_opts.verbose)
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 54, in backup_site_to_tarfile
    _backup_site_files_to_tarfile(site, tar, options)
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 112, in _backup_site_files_to_tarfile
    tar.add(site.dir, site.name, filter=filter_files)
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p23.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p23.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p23.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p23.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p23.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p23.cee/lib/python3.8/tarfile.py", line 1971, in add
    self.addfile(tarinfo, f)
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 158, in addfile
    self._suspend_rrd_update(rrd_file_path)
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 169, in _suspend_rrd_update
    self._send_rrdcached_command("SUSPEND %s" % path)
  File "/omd/versions/2.0.0p23.cee/lib/python3/omdlib/backup.py", line 199, in _send_rrdcached_command
    self._sock.sendall(("%s\n" % cmd).encode("utf-8"))
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 122: surrogates not allowed

Solution

Run the backup directly on the command line and redirect the output to a file:

OMD[mysite]:~$ omd -v backup --no-compression mybackup - >~/path/to/my_backup.txt


Let's check the log now:

..
...
rrdcached command: SUSPEND /opt/omd/sites/mysite/var/pnp4nagios/perfdata/mysite/Check_MK_Jun_17_12_29_15_49152.18456_MSSQLSERVER_NT-AUTORIT�.rrd
Traceback (most recent call last):
  File "/omd/sites/rrd2/bin/omd", line 60, in <module>
    omdlib.main.main()
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/main.py", line 4022, in main
    command.handler(version_info, site, global_opts, args, command_options)
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/main.py", line 2753, in main_backup
    omdlib.backup.backup_site_to_tarfile(site, fh, tar_mode, options, global_opts.verbose)
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 54, in backup_site_to_tarfile
    _backup_site_files_to_tarfile(site, tar, options)
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 112, in _backup_site_files_to_tarfile
    tar.add(site.dir, site.name, filter=filter_files)
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p26.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p26.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p26.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p26.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p26.cee/lib/python3.8/tarfile.py", line 1977, in add
    self.add(os.path.join(name, f), os.path.join(arcname, f),
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 134, in add
    super(BackupTarFile, self).add(name, arcname, recursive, filter=filter)
  File "/omd/versions/2.0.0p26.cee/lib/python3.8/tarfile.py", line 1971, in add
    self.addfile(tarinfo, f)
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 158, in addfile
    self._suspend_rrd_update(rrd_file_path)
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 169, in _suspend_rrd_update
    self._send_rrdcached_command("SUSPEND %s" % path)
  File "/omd/versions/2.0.0p26.cee/lib/python3/omdlib/backup.py", line 199, in _send_rrdcached_command
    self._sock.sendall(("%s\n" % cmd).encode("utf-8"))
UnicodeEncodeError: 'utf-8' codec can't encode character '\udcc3' in position 122: surrogates not allowed


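The issue is that the name of this file contains a byte that is not valid UTF-8 near its end ("AUTORIT�.rrd"), so the path cannot be encoded when the SUSPEND command is sent to rrdcached.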
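
The error can be reproduced in isolation. On Linux, Python decodes file names with the surrogateescape error handler, so a byte that is not valid UTF-8 (here 0xC3) is turned into a lone surrogate ('\udcc3'), which a strict UTF-8 encode then rejects. The sketch below illustrates this; the file name is a shortened, hypothetical stand-in for the real one, and the helper function is not part of Checkmk.

```python
# Hypothetical raw file name as stored on disk: 0xC3 starts a two-byte
# UTF-8 sequence, but the second byte is missing, so the name is invalid.
raw_name = b"NT-AUTORIT\xc3.rrd"

# This mirrors how Python turns such a name into a str on Linux.
decoded = raw_name.decode("utf-8", errors="surrogateescape")


def can_encode_utf8(name: str) -> bool:
    """Return True if the name survives a strict UTF-8 encode."""
    try:
        name.encode("utf-8")
    except UnicodeEncodeError:
        return False
    return True


# The strict encode fails for the decoded name, as in the traceback above,
# while the surrogateescape handler restores the original bytes unchanged.
round_trip = decoded.encode("utf-8", errors="surrogateescape")
```

The same helper can be applied to any path taken from the log to check whether it would trigger this error.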

To correct this, delete or rename the file. Renaming is the safer option:

OMD[mysite]:~$ mv oldfilename newfilename
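
If more than one file is affected, it may help to list all candidates first. The following is a minimal sketch under a few assumptions: the function names are hypothetical, the replacement character "_" is an arbitrary choice, and the code only proposes new names without renaming anything.

```python
import os
from typing import Iterator, Tuple


def safe_name(raw_name: bytes) -> str:
    """Return the name with every undecodable byte sequence replaced by '_'."""
    return raw_name.decode("utf-8", errors="replace").replace("\ufffd", "_")


def find_undecodable(root: bytes) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (current path, proposed path) for files whose name is not valid UTF-8.

    root is passed as bytes so that os.walk also returns raw byte names.
    """
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                name.decode("utf-8")
            except UnicodeDecodeError:
                proposed = safe_name(name).encode("utf-8")
                yield os.path.join(dirpath, name), os.path.join(dirpath, proposed)


# Example (hypothetical site path), printing proposals only:
#   for old, new in find_undecodable(b"/omd/sites/mysite/var/pnp4nagios/perfdata"):
#       print(old, "->", new)
```

Review the proposed names before renaming, and stop the site first so that no process writes to the files in the meantime.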