Customers sometimes report a lingering downtime that is no longer relevant. The issue is rare, but a similar customer case is documented HERE.

LAST TESTED ON CHECKMK 2.2.0P1

Problem

The customer cannot clear a permanent downtime on a specific host. The downtime was started without an associated end time.

Troubleshooting

  1. Check the downtimes for both services and hosts.


  2. If nothing relevant is found, run a Livestatus query to list all existing downtimes.

    lq "GET downtimes\nColumns: downtime_author downtime_comment downtime_duration downtime_end_time downtime_entry_time downtime_fixed downtime_id downtime_is_service downtime_origin downtime_recurring downtime_start_time host_has_been_checked host_labels host_name host_scheduled_downtime_depth host_state service_description service_has_been_checked service_state"
  3. If still nothing is found, search the history file in ~/var/check_mk/core for the summary of the downtime. In this example, the summary to search for is 'DT2'.

    OMD[mysite]~$ grep -rl <DOWNTIMESUMMARY> ~/var/check_mk/core/history 
    OMD[mysite]:~/var/check_mk/core$ grep -r DT2
    history:[1684243614] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;localhost2;1684243614;1684243734;1;0;0;cmkadmin;DT2
    history:[1684243614] HOST DOWNTIME ALERT: localhost2;STARTED;DT2
    OMD[mysite]:~/var/check_mk/core$
  4. If the history file is large, it can also help to search the files in ~/var/check_mk/core/archive. The entries in these history files carry Unix timestamps, which help to narrow down when the downtime was set.

    OMD[mysite]~$ grep -rl <DOWNTIMESUMMARY> ~/var/check_mk/core/archive/*
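
The Unix timestamps in a history entry can be converted to readable times on the command line. The following is a minimal sketch, not part of the official procedure. It uses the example entry from step 3, written with the command name SCHEDULE_HOST_DOWNTIME and semicolon-separated fields, and assumes GNU date is available. The helper name to_utc is made up for this illustration.

```shell
#!/bin/sh
# Example history entry taken from the grep output in step 3.
line='[1684243614] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;localhost2;1684243614;1684243734;1;0;0;cmkadmin;DT2'

# Hypothetical helper: print a Unix timestamp as UTC (requires GNU date).
to_utc() {
    date -u -d "@$1" '+%Y-%m-%d %H:%M:%S UTC'
}

# In this entry, the third and fourth semicolon-separated fields
# are the start and end time of the downtime.
start=$(printf '%s\n' "$line" | cut -d';' -f3)
end=$(printf '%s\n' "$line" | cut -d';' -f4)

echo "start:  $(to_utc "$start")"
echo "end:    $(to_utc "$end")"
echo "length: $((end - start)) seconds"
```

For the example entry, this reports a start time of 2023-05-16 13:26:54 UTC and a length of 120 seconds.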

Solution

Warning

Please note that the following steps are unsupported. Create a backup of the Checkmk site before proceeding.


If the event is found in the history file but nowhere else:

  1. Stop the site 

    OMD[mysite]~$ omd stop
  2. Open a CLI text editor and remove the entry from the history file. 

    OMD[mysite]~$ vi ~/var/check_mk/core/history
  3. Start the site again.

    OMD[mysite]~$ omd start

These steps clear the lingering downtime.
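
As an alternative to editing by hand, the removal can be sketched non-interactively. The example below is an illustration only and runs against a temporary sample file, not the real history file. The variable names HISTORY and SUMMARY are made up for this example, and the third sample line is invented filler to show that unrelated entries are kept. Like the manual edit, anything of this kind should only be applied to the real file while the site is stopped and after a backup, and the matched lines should be reviewed first.

```shell
#!/bin/sh
# Work on a throwaway sample file; HISTORY and SUMMARY are hypothetical names.
HISTORY=$(mktemp)
SUMMARY='DT2'

cat > "$HISTORY" <<'EOF'
[1684243614] EXTERNAL COMMAND: SCHEDULE_HOST_DOWNTIME;localhost2;1684243614;1684243734;1;0;0;cmkadmin;DT2
[1684243614] HOST DOWNTIME ALERT: localhost2;STARTED;DT2
[1684243700] HOST ALERT: localhost2;UP;HARD;1;OK
EOF

# Keep an untouched copy before changing anything.
cp -p "$HISTORY" "$HISTORY.bak"

# Show the lines that would be removed (entries ending in ";DT2").
grep -n ";$SUMMARY\$" "$HISTORY.bak"

# Write back every line that does not end in ";DT2".
grep -v ";$SUMMARY\$" "$HISTORY.bak" > "$HISTORY"

# Count the lines that are left in the edited file.
remaining=$(grep -c '' "$HISTORY")
echo "lines kept: $remaining"
```

Matching on the trailing ";DT2" rather than on the bare string reduces the chance of deleting unrelated entries that merely contain the same text.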

