Troubleshooting parent/child topology in a distributed setup

This article addresses a Proxy Error issue where the parent/child topology fails to load in a distributed setup.

LAST TESTED ON CHECKMK 2.4.0P1

Problem

The parent/child topology takes over two minutes to load and crashes with a Proxy Error in a distributed environment.



Proxy Error

The proxy server received an invalid response from an upstream server.
The proxy server could not handle the request

Reason: Error reading from remote server


Troubleshooting steps

The following steps help identify the cause of the issue.

Profiling

  1. Enable profiling for the parent/child topology view on both the central and remote sites: Enable Checkmk profiling

  2. Open the topology URL with the profile option appended:

    https://FQDN/SITENAME/check_mk/parent_child_topology.py?&profile=1

  3. Provide the Checkmk Support Team with the resulting profiling files:

    OMD[mysite]:~$ ls var/check_mk/multisite.*
    var/check_mk/multisite.cachegrind  var/check_mk/multisite.profile  var/check_mk/multisite.py
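Before sending the files, it can help to take a first look at the profile yourself. The following is a minimal sketch, assuming that multisite.profile is a standard Python cProfile dump (this is not confirmed by this article). The function name top_functions is hypothetical. Run it as the site user with the site's own Python, since profile dumps are not guaranteed to be readable across Python versions.

```python
import io
import pstats


def top_functions(profile_path, limit=15):
    """Return a text report of the most expensive calls, sorted by cumulative time.

    ASSUMPTION: profile_path points to a file written by cProfile/pstats.
    """
    buffer = io.StringIO()
    stats = pstats.Stats(profile_path, stream=buffer)
    # "cumulative" includes time spent in sub-calls, which helps to spot
    # the code path that dominates the page load.
    stats.sort_stats("cumulative").print_stats(limit)
    return buffer.getvalue()
```

For example, print(top_functions("var/check_mk/multisite.profile")) run from the site user's home directory lists the calls with the highest cumulative time first.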


Central site logs

  1. Set the Livestatus logging in cmc.log to debug (Setup → Global Settings → Monitoring core → Logging of the core → Livestatus = debug).

  2. Refresh the parent/child topology and check cmc.log for long-running Livestatus requests, as described here: Debug Livestatus performance. Collect cmc.log together with the following logs:

    ~/var/log/apache/
    ~/var/log/web.log*
    ~/var/log/liveproxyd.log*
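To find long-running requests in a large cmc.log more quickly, a small filter can help. The sketch below is only an illustration: the wording of the timing line ("processed request in <N> ms") is an assumption about the debug output and is not taken from this article, so adjust TIMING_PATTERN to match what your cmc.log actually contains. The function name slow_requests is hypothetical.

```python
import re

# ASSUMPTION: with Livestatus logging set to debug, the core writes a line
# containing "processed request in <N> ms". Adjust this to your log output.
TIMING_PATTERN = re.compile(r"processed request in (\d+) ms")


def slow_requests(lines, threshold_ms=1000, pattern=TIMING_PATTERN):
    """Return (duration_ms, line) pairs at or above threshold_ms, slowest first."""
    hits = []
    for line in lines:
        match = pattern.search(line)
        if match and int(match.group(1)) >= threshold_ms:
            hits.append((int(match.group(1)), line.rstrip("\n")))
    return sorted(hits, key=lambda hit: hit[0], reverse=True)
```

A possible use is to open var/log/cmc.log, pass the file object to slow_requests(), and print the returned lines. The query text that belongs to a slow request is usually found in the log lines directly before the timing line.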

Remote site logs

  1. Set the Livestatus logging in cmc.log to debug (Setup → Global Settings → Monitoring core → Logging of the core → Livestatus = debug).

  2. Refresh the parent/child topology and check cmc.log for long-running Livestatus requests, as described here: Debug Livestatus performance. Collect cmc.log together with the following logs:

    ~/var/log/apache/
    ~/var/log/web.log*

Possible solutions

  • Delete all files in OMD[SITENAME]:~/var/check_mk/topology/configs/ and restart the site.

  • If liveproxyd.log on the central site points to a specific remote site, disable that remote site and reload the view.

  • As a workaround, try opening the Parent/Child topology with filters such as:

    https://your_checkmk_server/your_central_site_name/check_mk/parent_child_topology.py?&site=your_remote_site_name
    https://your_checkmk_server/your_central_site_name/check_mk/parent_child_topology.py?&wato_folder=FOLDERNAME
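When several remote sites or folders need to be tested one by one, the filtered URLs can be generated instead of typed by hand. The following sketch builds them with the Python standard library. The function name topology_url and the example host and site names are hypothetical. Only the site and wato_folder parameters shown above are taken from this article.

```python
from urllib.parse import urlencode


def topology_url(server, central_site, **filters):
    """Build a parent/child topology URL with optional filter parameters.

    Filter values are URL-encoded, so folder names containing "/" or
    spaces are passed safely.
    """
    base = f"https://{server}/{central_site}/check_mk/parent_child_topology.py"
    query = urlencode(filters)
    return f"{base}?{query}" if query else base


# Hypothetical example: one URL per remote site, to find the site that slows the view down.
for remote_site in ["remote1", "remote2"]:
    print(topology_url("your_checkmk_server", "your_central_site_name", site=remote_site))
```

Opening the generated URLs one after another shows which remote site or folder causes the long load time.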