
...

  • Core
  • Debugging of Checkmk helpers

High fetcher usage although the fetcher helper count is already high

If you face the following problems:

  • Fetcher helper usage is permanently above 96% even though the fetcher count is already high (e.g., more than 50, or even 100 or more), and

  • the service "Check_MK" constantly goes CRIT with fetcher timeouts.
    • As site user, you can run the following command to list the services with the longest check execution times and narrow down slow-running active checks.

      lq "GET services\nColumns: execution_time host_name display_name" | awk -F';' '{ printf("%.2f %s %s\n", $1, $2, $3)}' | sort -rn | head
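
The command above prints only the ten slowest services. As a minimal sketch, the same output can instead be filtered by a runtime threshold. The function name `slow_services` and the default of 10 seconds are assumptions made for illustration and are not part of Checkmk.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Checkmk): print all services whose last
# check execution took longer than a threshold in seconds.
# Reads "execution_time;host_name;display_name" lines on stdin, which is the
# format produced by the lq query shown above.
slow_services() {
    local threshold="${1:-10}"   # assumed default, adjust to your environment
    awk -F';' -v limit="$threshold" \
        '$1 + 0 > limit + 0 { printf("%.2f %s %s\n", $1, $2, $3) }' | sort -rn
}

# Usage as site user (requires a running site):
# lq "GET services\nColumns: execution_time host_name display_name" | slow_services 30
```

Services that show up here with runtimes close to the configured timeout are the first candidates to investigate.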


This can have several reasons:

  • Firewalls are dropping traffic from Checkmk to the monitored systems. If packets are silently dropped rather than actively rejected, Checkmk has to wait for a timeout instead of terminating the fetch immediately.

  • You might have too many DOWN hosts that are still being checked. Checkmk keeps trying to query those hosts, and each attempt occupies a fetcher helper until the timeout expires. This can tie up a large number of fetcher helpers. Remove hosts that are in a DOWN state from your monitoring, either permanently or by setting their Criticality to "Do not monitor this host".

  • For classical operating systems (Linux, Windows, etc.), high fetcher usage indicates that agent plugins or local checks may have a long runtime. Increasing the number of fetchers further does not help here. Instead, identify the long-running plugins and local checks and set them to asynchronous execution, define (generous) cache settings, or set timeouts specifically for them.

  • For SNMP devices, some of the devices might respond slowly. To troubleshoot those, have a look at this blog post.
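
To check whether DOWN hosts are a likely cause, you can list them via Livestatus. The following is a minimal sketch: the function name `down_hosts` is hypothetical, and it assumes the `hosts` table is queried with the columns `name` and `state`, where state 1 means DOWN.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Checkmk): list hosts that are DOWN.
# Reads "name;state" lines on stdin. In Livestatus, host state 0 is UP,
# 1 is DOWN and 2 is UNREACH.
down_hosts() {
    awk -F';' '$2 == 1 { print $1 }' | sort
}

# Usage as site user (requires a running site):
# lq "GET hosts\nColumns: name state" | down_hosts
# lq "GET hosts\nColumns: name state" | down_hosts | wc -l
```

If the list is long, those hosts are worth reviewing first, since each of them can occupy a fetcher helper until its timeout expires.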


...