How-to monitor SQL AG backups on secondary replicas with mk-sql

How-to monitor SQL AG backups on secondary replicas with mk-sql

Enables Checkmk to detect SQL Server Always On backups running on secondary replicas using a custom mk-sql configuration.

LAST TESTED ON CHECKMK 2.3.0P34

Table of Contents

Overview

In environments using Microsoft SQL Server Always On Availability Groups (AG), it is common practice to perform backups on a secondary replica to reduce load on the primary.
However, the default Checkmk mk-sql query filters for primary replicas, which can prevent the backup check from detecting these jobs.
This article provides a supported solution that enables Checkmk to properly recognize backups from secondary replicas using a deployable SQL configuration file.


Problem

By default, mk-sql only considers databases where rep.is_primary_replica is True.
This condition excludes backup results executed on secondary replicas, even though such configurations follow Microsoft’s best practices:
What is an Always On availability group?


As a result:

  • Backup checks may report missing or outdated backups on the secondary replica.

  • The monitoring output differs from the real backup state.

 

Solution

To ensure all backup activity is reported, including those on secondary replicas, Checkmk supports the use of a custom SQL file that adjusts the mk-sql query.
This SQL safely removes the restrictive filter and enables proper detection of secondary-side backups, without impacting primary replica behavior.

This solution can be rolled out using the Agent Bakery or manually deployed to the secondary SQL node.

Step-by-step instructions

  1. Create the folder structure

    On your Checkmk site or in your bakery configuration tree, create the following directory:

    local/share/check_mk/agents/custom/ └── backup2ndreplicaquery └── lib └── config └── mssql └── backup.sql


    This is what it will look like within Windows Explorer:

    image-20251023-151556.png



  2. Use the following SQL content

    Save the below as backup.sql inside the path shown in step 1.

    DECLARE @HADRStatus sql_variant; DECLARE @SQLCommand NVARCHAR(MAX); SET @HADRStatus = (SELECT SERVERPROPERTY ('IsHadrEnabled')); IF (@HADRStatus IS NULL or @HADRStatus <> 1) BEGIN SET @SQLCommand = ' SELECT CONVERT(NVARCHAR, DATEADD(s, MAX(DATEDIFF(s, ''19700101'', backup_finish_date) - (CASE WHEN time_zone IS NOT NULL AND time_zone <> 127 THEN 60 * 15 * time_zone ELSE 0 END)), ''19700101''), 120) AS last_backup_date, CAST(type AS NVARCHAR(128)) AS type, CAST(machine_name AS NVARCHAR(128)) AS machine_name, CAST(''True'' AS NVARCHAR(12))as is_primary_replica, CAST(''1'' AS NVARCHAR(12)) AS is_local, CAST('''' AS NVARCHAR(12)) AS replica_id, CAST(sys.databases.name AS NVARCHAR(MAX)) AS database_name FROM msdb.dbo.backupset LEFT OUTER JOIN sys.databases ON CAST(sys.databases.name AS NVARCHAR(MAX)) = CAST(msdb.dbo.backupset.database_name AS NVARCHAR(MAX)) WHERE UPPER(machine_name) = UPPER(CAST(SERVERPROPERTY(''Machinename'') AS NVARCHAR(MAX))) GROUP BY type, machine_name, CAST(sys.databases.name AS NVARCHAR(MAX)) ' END ELSE BEGIN SET @SQLCommand = ' SELECT CONVERT(NVARCHAR, DATEADD(s, MAX(DATEDIFF(s, ''19700101'', b.backup_finish_date) - (CASE WHEN time_zone IS NOT NULL AND time_zone <> 127 THEN 60 * 15 * time_zone ELSE 0 END)), ''19700101''), 120) AS last_backup_date, CAST(b.type AS NVARCHAR(MAX)) AS type, CAST(b.machine_name AS NVARCHAR(MAX)) AS machine_name, -- ISNULL(CONVERT(NVARCHAR(40), rep.is_primary_replica), '''') AS is_primary_replica, CAST(''1'' AS NVARCHAR(12)) AS is_primary_replica, -- patch rep.is_local, -- ISNULL(CONVERT(NVARCHAR(40), rep.replica_id), '''') AS replica_id, CAST('''' AS NVARCHAR(12)) AS replica_id, -- patch CAST(b.machine_name AS NVARCHAR(MAX)) AS machine_name, CAST(db.name AS NVARCHAR(MAX)) AS database_name FROM msdb.dbo.backupset b LEFT OUTER JOIN sys.databases db ON CAST(b.database_name AS NVARCHAR(MAX)) = CAST(db.name AS NVARCHAR(MAX)) LEFT OUTER JOIN sys.dm_hadr_database_replica_states rep ON db.database_id = rep.database_id WHERE (rep.is_local is null or rep.is_local = 1) -- AND (rep.is_primary_replica is null or rep.is_primary_replica = ''True'') AND UPPER(machine_name) = UPPER(CAST(SERVERPROPERTY(''Machinename'') AS NVARCHAR(120))) GROUP BY type, rep.replica_id, rep.is_primary_replica, rep.is_local, CAST(db.name AS NVARCHAR(MAX)), CAST(b.machine_name AS NVARCHAR(MAX)), rep.synchronization_state, rep.synchronization_health ' END EXEC (@SQLCommand)

     

  3. Deploy via the Agent Bakery

    1. Navigate to Setup → Agents → Windows, Linux, Solaris → Agent rules → Deploy custom files.

    2. Create a rule called MSSQL backup on secondary replica.

    3. Upload the backup.sql file.
      Windows destination path:
      C:\ProgramData\checkmk\agent\config\mssql\backup.sql

    4. Restrict the rule to the secondary AG node.

    5. Bake and deploy the agent.


  4. Validate

    Run the following test on the SQL host:

    PS C:\ProgramData\checkmk\agent\config> ..\plugins\mk-sql.exe -c .\mk-sql.yml

    You should now see the expected backup job data from the secondary replica.

    After confirming, perform a service discovery in Checkmk to verify that MSSQL backup services now report correctly.

Notes

  • This solution affects only the configured secondary node.

  • Do not deploy the modified SQL globally, as it intentionally changes replica filtering.

  • Attach agent outputs and SQL versions when opening support tickets for traceability.

 

Related articles