Troubleshooting 110-second timeouts and high memory consumption by Apache

This article explains how to resolve 110-second request timeouts that occur together with high memory consumption by Apache processes.

LAST TESTED ON CHECKMK 2.0.0P1

Problem

You receive timeout errors from Apache after 110 seconds in several places in Checkmk, e.g., on the 'Background Jobs' page.

Additionally, the overall memory consumption of the server fluctuates, and you observe high memory consumption by the Apache processes. The overall fluctuation might be up to 10 GB, and individual Apache processes might grow beyond 300 MB, which would be a normal size.
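To check the memory symptom described above, you can list the resident memory of each Apache process. The following is a minimal sketch and not part of the original procedure: the helper name filter_apache_rss is hypothetical, and it assumes that the Apache processes are named apache2 or httpd, which depends on your distribution.

```shell
# Sketch: keep only Apache processes from "rss pid comm" lines and print
# their resident memory in MB. Assumption: the process name is apache2 or httpd.
filter_apache_rss() {
    awk '$3 ~ /^(apache2|httpd)$/ { printf "%d MB  pid %s  %s\n", $1 / 1024, $2, $3 }'
}

# Usage (ps reports rss in KiB; the trailing "=" suppresses the header line):
#   ps -eo rss=,pid=,comm= | filter_apache_rss
```

Processes that are reported far above 300 MB match the symptom described in this article.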

The following log messages can be found in web.log:

2022-02-02 01:02:03,405 [40] [cmk.web.job_manager 203828] http://localhost:5000/mysite/check_mk/run_cron.py/mysite/check_mk/run_cron.py Traceback (most recent call last):
  File "/omd/sites/mysite/lib/python3/cmk/gui/background_job.py", line 643, in do_housekeeping
    all_jobs.append((job_id, job_instances[job_id].get_status()))
  File "/omd/sites/mysite/lib/python3/cmk/gui/background_job.py", line 430, in get_status
    status = self._jobstatus.get_status_from_file()
  File "/omd/sites/mysite/lib/python3/cmk/gui/background_job.py", line 574, in get_status_from_file
    data["loginfo"][field_id] = f.read().splitlines()
  File "/omd/sites/mysite/lib/python3/cmk/gui/utils/timeout_manager.py", line 35, in handle_request_timeout
    raise RequestTimeout(
cmk.gui.exceptions.RequestTimeout: Your request timed out after 110 seconds. This issue may be related to a local configuration problem or a request which works with a too large number of objects. But if you think this issue is a bug, please send a crash report.
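To see how often this exception occurs, you can count the matching lines in the log. This is a minimal sketch: the function name count_request_timeouts is hypothetical, and the default path ~/var/log/web.log is an assumption about where the site stores web.log; pass a different path if your log is located elsewhere.

```shell
# Sketch: count the log lines that contain the RequestTimeout exception.
# Assumption: web.log is located at ~/var/log/web.log of the site user.
count_request_timeouts() {
    log="${1:-$HOME/var/log/web.log}"
    # -F treats the pattern as a fixed string, -c prints the number of matching lines.
    grep -cF 'cmk.gui.exceptions.RequestTimeout' "$log"
}

# Usage, as the site user:
#   count_request_timeouts
```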

Reason

This can be caused by the files and folders in the ~/var/check_mk/background_jobs/ directory when their number or total size grows too large. Typically, the directory should contain 10 to 15 files and folders, and its total size should be below 1 GB. If your site exceeds these values, something might be off.
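To compare your site against these values, you can count the entries in the directory and measure its size. The following is a minimal sketch; the function name check_background_jobs is hypothetical, and it assumes a find implementation that supports -mindepth and -maxdepth (as GNU find does).

```shell
# Sketch: report how many entries a directory holds and how much space it uses.
# The default path is the one named in this article; pass another path to override it.
check_background_jobs() {
    dir="${1:-$HOME/var/check_mk/background_jobs}"
    if [ ! -d "$dir" ]; then
        echo "directory not found: $dir" >&2
        return 1
    fi
    # Count the direct children (files and folders), not the directory itself.
    count=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
    # Total size of the directory tree in a human-readable unit.
    size=$(du -sh "$dir" | cut -f1)
    echo "entries: $count, size: $size"
}

# Usage, as the site user:
#   check_background_jobs
```

An entry count far above 15 or a size above 1 GB points to the cause described here.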

Solution

  1. Log into the affected site:

    root@linux:~# omd su mysite

  2. Stop the site:

    OMD[mysite]:~$ omd stop

  3. Make sure the site is completely stopped, for example by checking that omd status reports all services as stopped.

  4. Remove the contents of ~/var/check_mk/background_jobs/:

    OMD[mysite]:~$ rm -rf ~/var/check_mk/background_jobs/*

  5. Start the site:

    OMD[mysite]:~$ omd start

  6. Observe the memory consumption and verify that the timeouts no longer occur.
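
The checks before the cleanup (site stopped) and after the restart (site running) can be scripted with the exit code of omd status. The following is a minimal sketch under a stated assumption: it treats exit code 0 as "all services running", 1 as "all services stopped", and 2 as "partially running". Verify this mapping against your Checkmk version before relying on it. The function name describe_omd_status is hypothetical.

```shell
# Sketch: translate an exit code of "omd status" into a readable state.
# Assumption: 0 = running, 1 = stopped, 2 = partially running.
describe_omd_status() {
    case "$1" in
        0) echo "running" ;;
        1) echo "stopped" ;;
        2) echo "partially running" ;;
        *) echo "unknown" ;;
    esac
}

# Usage, as the site user:
#   omd status >/dev/null 2>&1; describe_omd_status $?
```

Only continue with the cleanup when the state is "stopped", and expect "running" again after the restart.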