Troubleshooting SSL and CA Issues after omd copy or omd move in Distributed Monitoring


After renaming a site with "omd cp" or "omd mv", the SSL/CA certificates are not automatically updated to reflect the new site name. Checkmk 2.5 introduces the new command "cmk-cert" for this purpose, and this article gives an example of how to use it. This article is a troubleshooting and operations guide for Checkmk in Docker.

LAST TESTED ON CHECKMK 2.5.0


Possible Symptom (Example):

Agent Bakery fails during "Bake & Sign" after renaming the site using "omd cp" or "omd mv"

Error in bakery plug-in "agent_controller_connections" ("files" section): Could not load certificate for site 'mysitenew' FileNotFoundError: /omd/sites/mysitenew/var/ssl/remote_sites_cas/mysitenew.pem

 

Root Cause

  • etc/ssl/ca.pem still contains the CN of the old site name (e.g. mysiteold)

  • var/ssl/remote_sites_cas/ only contains entries for the old site names

  • Entries for the new site names are missing (mysitenew.pem, remote1.pem, etc.)

  • Missing Authority Key Identifier (AKI), because the site certificate was created before 2.4.0p20 (see Werk 18990)
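The AKI root cause can be checked directly from the certificate's text dump. This is a minimal sketch: has_aki is a hypothetical helper, not part of Checkmk, and the certificate path in the usage comment (etc/ssl/sites/mysitenew.pem) is an assumption; point it at whichever site certificate you want to inspect.

```shell
# has_aki: hypothetical helper, not part of Checkmk.
# Reads the text dump of a certificate (openssl x509 -noout -text) on stdin
# and returns 0 if an Authority Key Identifier extension is listed.
has_aki() {
    grep -q "X509v3 Authority Key Identifier"
}

# Usage on a site (the certificate path is an assumption, adjust as needed):
# openssl x509 -in etc/ssl/sites/mysitenew.pem -noout -text | has_aki \
#     && echo "AKI present" \
#     || echo "AKI missing (certificate probably created before 2.4.0p20)"
```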

 

Solution

Werk 18988 introduces the new command "cmk-cert". The following step-by-step guide shows how to rotate the site certificates successfully in distributed monitoring with Checkmk 2.5.

Environment (example)

Role      Site name   Server
Central   mysitenew   192.168.0.232
Remote    remote1     192.168.0.233

  • Checkmk Version: 2.5.x CME

  • Renamed from mysiteold → mysitenew via omd cp

  • Renamed from remote1old → remote1 via omd cp

 

Regenerate the site CA on the central site

OMD[mysitenew]:~$ cmk-cert rotate site-ca
# Output: "rotation successfully initialized"
# Activate pending changes in the GUI:
# Setup > Activate Changes → Activate
OMD[mysitenew]:~$ cmk-cert rotate site-ca --finalize
# Output: "rotation successfully finalized"
# Activate pending changes in the GUI again
# Verify the CN is correct:
OMD[mysitenew]:~$ openssl x509 -in etc/ssl/ca.pem -noout -subject -dates
# Expected: CN=Site 'mysitenew' local CA, O=Checkmk Site mysitenew
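The two-phase rotation above can be scripted so that the finalize step only runs after you confirm the GUI activation. This is a minimal sketch: rotate_site_ca is a hypothetical helper, not part of Checkmk, and the CMK_CERT variable exists only so the command can be replaced for a dry run or a test.

```shell
# rotate_site_ca: hypothetical wrapper, not part of Checkmk.
# Runs "cmk-cert rotate site-ca", waits until the user has activated the
# pending changes in the GUI, then runs the finalize step.
# CMK_CERT can override the command name (defaults to cmk-cert).
rotate_site_ca() {
    _cmk_cert=${CMK_CERT:-cmk-cert}
    "$_cmk_cert" rotate site-ca || return 1
    echo "Activate the pending changes in the GUI (Setup > Activate Changes)," >&2
    echo "then press Enter to finalize the rotation." >&2
    read -r _dummy || true
    "$_cmk_cert" rotate site-ca --finalize || return 1
    echo "Activate the pending changes in the GUI again." >&2
}
```

The prompts go to stderr so that only the output of cmk-cert itself appears on stdout. The same function applies on a remote site, run as that site's user.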

 

Regenerate the site CA on a remote site in distributed monitoring

# On the remote server (192.168.0.233):
OMD[remote1]:~$ cmk-cert rotate site-ca
# Output: "rotation successfully initialized"
# If the GUI shows no pending changes → open any rule → Save (no changes needed)
# Then Activate Changes in the GUI
OMD[remote1]:~$ cmk-cert rotate site-ca --finalize
# Output: "rotation successfully finalized"
# Verify:
OMD[remote1]:~$ openssl x509 -in etc/ssl/ca.pem -noout -subject
# Expected: CN=Site 'remote1' local CA
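The CN verification can be reduced to a pass/fail check. This is a minimal sketch: ca_cn_matches_site is a hypothetical helper, not part of Checkmk, and it relies on the CN format shown in the expected output above. Matching the quotes around the site name keeps an old name such as remote1old from being accepted as remote1. OMD_SITE is normally set in the site user's environment; pass the site name explicitly if it is not.

```shell
# ca_cn_matches_site: hypothetical helper, not part of Checkmk.
# Reads the output of "openssl x509 -noout -subject" on stdin and returns 0
# if it contains "Site '<site>' local CA" for the given site name.
ca_cn_matches_site() {
    grep -qF "Site '$1' local CA"
}

# Usage on a site:
# openssl x509 -in etc/ssl/ca.pem -noout -subject | ca_cn_matches_site "$OMD_SITE" \
#     && echo "CA CN OK" || echo "CA CN still refers to another site name"
```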

 

Known issue in 2.5, "Fake Change" workaround required: After cmk-cert rotate site-ca, the finalize step may fail with:

cmk-cert: Aborting, there are still pending changes to review

This can happen even though the GUI shows no pending changes to activate.

Workaround: Open any rule in the GUI and click Save without making any changes. This creates a dummy pending change. Activate it, then run cmk-cert rotate site-ca --finalize again.
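To spot this known issue immediately, the finalize step can be wrapped so that it prints the workaround when the abort message appears. This is a minimal sketch: finalize_site_ca is a hypothetical helper, not part of Checkmk; it only matches the message text quoted above, and CMK_CERT exists only so the command can be replaced for a test.

```shell
# finalize_site_ca: hypothetical helper, not part of Checkmk.
# Runs "cmk-cert rotate site-ca --finalize", passes its output through and,
# if it fails with "still pending changes", prints the fake-change workaround.
finalize_site_ca() {
    _cmk_cert=${CMK_CERT:-cmk-cert}
    _rc=0
    _out=$("$_cmk_cert" rotate site-ca --finalize 2>&1) || _rc=$?
    printf '%s\n' "$_out"
    if [ "$_rc" -ne 0 ] && printf '%s\n' "$_out" | grep -q "still pending changes"; then
        echo "Hint: open any rule in the GUI, click Save without changes," >&2
        echo "activate that change, then run this step again." >&2
    fi
    return "$_rc"
}
```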

 

Add the central site CA to the remote sites CA store

OMD[mysitenew]:~$ cp etc/ssl/ca.pem var/ssl/remote_sites_cas/mysitenew.pem
OMD[mysitenew]:~$ chmod 600 var/ssl/remote_sites_cas/mysitenew.pem
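The copy and chmod steps above can be combined into one install call, which sets the mode as it copies. This is a minimal sketch: add_own_ca_to_store is a hypothetical helper, not part of Checkmk; it assumes it runs as the site user and that var/ssl/remote_sites_cas already exists. The optional second argument (the site directory) is only there to make testing easier.

```shell
# add_own_ca_to_store: hypothetical helper, not part of Checkmk.
# Copies <root>/etc/ssl/ca.pem to <root>/var/ssl/remote_sites_cas/<site>.pem
# with mode 600. <root> defaults to the current directory (the site home).
add_own_ca_to_store() {
    _site=$1
    _root=${2:-.}
    install -m 600 "$_root/etc/ssl/ca.pem" \
        "$_root/var/ssl/remote_sites_cas/$_site.pem"
}

# Usage as the site user in the site home directory:
# add_own_ca_to_store "$OMD_SITE"
```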

 

Fetch CA certificates from a remote site

# On the mysitenew server (192.168.0.232), fetch from the remote server (192.168.0.233):
root@mysitenew:~$ scp root@192.168.0.233:/omd/sites/remote1/etc/ssl/ca.pem \
    /omd/sites/mysitenew/var/ssl/remote_sites_cas/remote1.pem
# Fix ownership and permissions:
root@mysitenew:~$ chown mysitenew:mysitenew \
    /omd/sites/mysitenew/var/ssl/remote_sites_cas/remote1.pem
root@mysitenew:~$ chmod 600 \
    /omd/sites/mysitenew/var/ssl/remote_sites_cas/remote1.pem

 

Remove stale entries from the remote sites CA store

OMD[mysitenew]:~$ rm -f var/ssl/remote_sites_cas/mysiteold.pem \
    var/ssl/remote_sites_cas/remote1old.pem
# Adjust the filenames to match whatever old entries exist in your store
# Verify that only the new site names remain:
OMD[mysitenew]:~$ for f in var/ssl/remote_sites_cas/*.pem; do
    echo "=== $f ==="
    openssl x509 -in "$f" -noout -subject
done
# Expected:
# mysitenew.pem → CN=Site 'mysitenew' local CA
# remote1.pem → CN=Site 'remote1' local CA
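Instead of naming the old files one by one, the stale entries can be derived from the list of current site names. This is a minimal sketch, assuming the store should hold exactly one <site>.pem per current site: prune_ca_store is a hypothetical helper, not part of Checkmk. Without -f it only lists what it would remove.

```shell
# prune_ca_store: hypothetical helper, not part of Checkmk.
# Usage: prune_ca_store [-f] DIR SITE...
# Lists (default) or deletes (-f) every DIR/*.pem whose basename
# is not one of the given current site names.
prune_ca_store() {
    _force=0
    if [ "$1" = "-f" ]; then _force=1; shift; fi
    _dir=$1; shift
    for _f in "$_dir"/*.pem; do
        [ -e "$_f" ] || continue          # the glob matched nothing
        _name=$(basename "$_f" .pem)
        _keep=0
        for _s in "$@"; do
            if [ "$_name" = "$_s" ]; then _keep=1; fi
        done
        if [ "$_keep" -eq 1 ]; then continue; fi
        if [ "$_force" -eq 1 ]; then
            rm -f -- "$_f" && echo "removed: $_f"
        else
            echo "stale: $_f"
        fi
    done
    return 0
}

# Usage on the central site: first review, then delete.
# prune_ca_store var/ssl/remote_sites_cas mysitenew remote1
# prune_ca_store -f var/ssl/remote_sites_cas mysitenew remote1
```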

 

Trust remote site certificates in the GUI

In the GUI on the mysitenew site:

Setup → General → Distributed Monitoring

For a remote site:

  • Click the yellow shield icon next to the site

  • Confirm trusting the certificate

  • The new remote CA is added to the SSL Trust Store

Please check the certificate store in the site-specific global settings, as there may be different certificates configured there.

Do NOT delete old entries from the trust store before clicking the yellow shield: the connection will drop and the shield may not appear. Correct order: click the shield first → Activate Changes → then clean up old entries.

 

Clean up the SSL Trust Store

Setup → Global Settings → Site Management → Trusted certificate authorities for SSL

Delete all old entries (e.g. mysiteold, old remote CAs).

Keep only:

  • mysitenew local CA

  • remote1 local CA

Then Activate Changes.

 

Verify and test

# Final check of the trust store:
OMD[mysitenew]:~$ ls -la var/ssl/remote_sites_cas/
# Check the Distributed Monitoring connections:
# Setup > General > Distributed Monitoring
# All sites should show a green "Online" status
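The ls -la check can also be turned into an explicit pass/fail test. This is a minimal sketch: verify_ca_store is a hypothetical helper, not part of Checkmk. It expects one <site>.pem per listed site with mode 600, as set in the steps above, and stat -c assumes GNU coreutils, which is standard on Linux.

```shell
# verify_ca_store: hypothetical helper, not part of Checkmk.
# Usage: verify_ca_store DIR SITE...
# Checks that DIR/<site>.pem exists and has mode 600 for every listed site.
# Prints one line per problem and returns 1 if any problem was found.
verify_ca_store() {
    _dir=$1; shift
    _rc=0
    for _s in "$@"; do
        _f="$_dir/$_s.pem"
        if [ ! -f "$_f" ]; then
            echo "missing: $_f"; _rc=1
        elif [ "$(stat -c %a "$_f")" != "600" ]; then
            echo "wrong mode: $_f"; _rc=1
        fi
    done
    return "$_rc"
}

# Usage on the central site:
# verify_ca_store var/ssl/remote_sites_cas mysitenew remote1 && echo "CA store OK"
```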

 

Quick Diagnostic Commands

# Check all CA CNs in the remote sites store:
OMD[mysitenew]:~$ for f in var/ssl/remote_sites_cas/*.pem; do
    echo "=== $f ==="
    openssl x509 -in "$f" -noout -subject
done
# Check the mysitenew site CA CN:
OMD[mysitenew]:~$ openssl x509 -in etc/ssl/ca.pem -noout -subject -dates
# Find all PEM files and their CNs:
OMD[mysitenew]:~$ find etc/ssl var/ssl -name "*.pem" | \
    xargs -I{} sh -c 'echo "=== {} ==="; openssl x509 -in {} -noout -subject 2>/dev/null'
# Check for references to the old site name:
OMD[mysitenew]:~$ find var/ssl etc/ssl -name "*mysiteold*"

 

Bake & Sign is possible again
The Site CA (etc/ssl/ca.pem) was rotated. This is the CA that signs the site certificate used for Livestatus/TLS connections between sites.

Already registered agents should NOT need to re-register, because:

  • The agent registration uses the Agent CA (agents/ca.pem), which we left untouched.
    Its CN still refers to the old site name. This mismatch is cosmetically wrong but functionally harmless as long as you don't rotate the Agent CA.

  • The cmk-cert rotate site-ca command only rotates the Site CA, not the Agent CA

  • Existing agent certificates were issued by the old Agent CA, which is still in place

 

 

Optional: Rotate Agent CA (only if you want a fully clean setup)

OMD[mysitenew]:~$ cmk-cert rotate agent-ca

This means every registered agent must re-register. This is only viable if you have a small number of agents or are still in a test phase with no productive agents registered yet.

 

Related articles