Troubleshooting the notification path

This article details how to troubleshoot notifications that are not being sent or are delayed.

Last tested on Checkmk 2.2.0p1

Background

In some cases it might be necessary to troubleshoot the notification path due to:

  • Missing alerts

  • Delayed alert delivery

The root cause can be a configuration error, e.g., a host check interval that does not match the service check interval.

For background information on the notification path, see: The path of a notification from beginning to end

Step-by-step guide


Enable Debug Logging

To gather more insight into what's happening during notification handling, you can enable increased log levels.

Increasing the log levels makes the log files grow faster, and the extra output can make later troubleshooting harder. Make sure to reset the log levels to their defaults as soon as your immediate troubleshooting concludes.


CMC Logging

  1. Navigate to Setup → Global Settings → Monitoring core → Logging of the core → Notification system

  2. Set to "Debug"

    This setting increases the verbosity of the cmc.log

Notification Log Level

  1. Navigate to Setup → Global Settings → Notifications → Notification log level

  2. Set to "Full dump of all variables and command"

    This setting increases the verbosity of the notify.log

Simulate a Notification


Note: Starting with Checkmk 2.3.0, the built-in "Test notifications" feature should be used instead of the steps below!


To reproduce the issue under controlled conditions:

  1. Open the Events of Service view for the affected service

  2. Disable flap detection via Master Control to avoid suppression

  3. Ensure no delay rules or check attempt rules are active

  4. Fake a check result (e.g., manually set a known service to CRIT)
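
Step 4 could also be scripted from the command line. The following is a minimal sketch, not an official procedure: the helper name build_fake_crit_command is hypothetical, and it assumes that the monitoring core accepts the standard PROCESS_SERVICE_CHECK_RESULT external command through Livestatus and that lq is available to the site user. The next regular check result will likely overwrite the fake state.

```shell
# Hypothetical helper: prints a Livestatus external command that submits a
# CRIT result (state 2) for a single service.
# Arguments: host name, service description, optional Unix timestamp
# (defaults to the current time).
build_fake_crit_command() {
    host="$1"
    service="$2"
    timestamp="${3:-$(date +%s)}"
    printf 'COMMAND [%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;2;Fake CRIT for notification test\n' \
        "$timestamp" "$host" "$service"
}

# Assumed usage as the site user (replace the placeholders first):
# lq "$(build_fake_crit_command HOSTNAME SERVICENAME)"
```

Printing the command before sending it makes it easy to check the host and service names, which must match the monitored objects exactly.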

Monitor Logs

To observe notification processing in real time inside the cmc.log:

OMD[mysite]:~/var/log$ tail -f cmc.log | grep "HOSTNAME;SERVICENAME"


Typical output might show lines like the following:

[notification helper] ... hard state change to CRITICAL
[notification helper] ... postponing, last host check not recent enough
[notification helper] ... sending PROBLEM notification to its contacts
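
To see how often a notification was postponed, rather than watching the live output, the matching lines can be counted after the fact. This is a minimal sketch: the helper name count_postponed is hypothetical, and the search word "postponing" is taken from the sample output above, so the exact wording in your log may differ.

```shell
# Hypothetical helper: counts the log lines for one service that mention a
# postponement.
# Arguments: log file, host name, service description.
count_postponed() {
    # -F treats "HOST;SERVICE" as a fixed string, not as a regular expression.
    grep -F "$2;$3" "$1" | grep -c "postponing"
}

# Assumed usage as the site user (replace the placeholders first):
# count_postponed ~/var/log/cmc.log HOSTNAME SERVICENAME
```

A count that keeps growing between the hard state change and the first sent notification points to the postponement as the source of the delay.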


To observe notification processing in real time inside the notify.log:

OMD[mysite]:~/var/log$ tail -f notify.log | grep "HOSTNAME;SERVICENAME"

If this log stays empty, the core most likely never handed the notification over to the notification module, so look for the cause in cmc.log.


Analysis

In the case investigated for this article, the logs showed that notifications were postponed repeatedly because the last host check was not recent enough. The notifications were sent only once a recent host check result was available.


Key Finding


Recommended Fix

Update the host check interval to match the service check interval:

  • Suggested setting: 60 seconds

Result: With this setting, the notification delay no longer occurred during observation.
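
Before and after changing the rule, the effective intervals can be compared through Livestatus. This is a sketch under assumptions: the helper names are hypothetical, the check_interval column is assumed to be available for both hosts and services, and its value is typically given in interval units (usually minutes) rather than seconds, so 60 seconds would appear as 1.

```shell
# Hypothetical helpers: print Livestatus queries that show the configured
# check interval of a host and of one of its services.
host_interval_query() {
    printf 'GET hosts\nColumns: name check_interval\nFilter: name = %s\n' "$1"
}

service_interval_query() {
    printf 'GET services\nColumns: host_name description check_interval\nFilter: host_name = %s\nFilter: description = %s\n' \
        "$1" "$2"
}

# Assumed usage as the site user (replace the placeholders first):
# host_interval_query HOSTNAME | lq
# service_interval_query HOSTNAME SERVICENAME | lq
```

If the host value is larger than the service value, the host check runs less often than the service check, which matches the postponement seen in the logs.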

