...
...
The only overview of required resources we have is the Checkmk Appliance page, and it is just a rough approximation: https://checkmk.com/product/appliances
We always recommend that customers use the specifications of the hardware appliance as a guideline.
...
You will find more information about the fetcher and checker architecture here:
https://checkmk.com/blog/checkmk-2-0-cmc
...
- Checkmk 2.0: The Core gets more power under the hood
- Werk #11500: Microcore: Improved memory efficiency of helper processes
Note |
---|
Important information about the checkers: the number of checkers should not exceed your CPU core count! |
...
Maximum concurrent Checkmk fetchers
- As you increase the number of fetchers, your RAM usage will rise, so adjust this setting carefully and keep an eye on the memory consumption of your server.
- The usage should stay under 80% on average.
Maximum concurrent Checkmk checkers
- The number of checkers should not be higher than your CPU core count! If you have more than two cores, the general rule of thumb is:
  Maximum checkers = number of cores - 1
- The usage should stay under 80% on average.
Maximum concurrent Livestatus connections
- In a distributed monitoring setup, it may be helpful to use different values for the remote sites. You will find guidance on how to do that here.
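
The rule of thumb for the checkers and the 80% guideline can be sketched in a few lines of Python. This is only an illustration, not part of Checkmk: the function names `max_checkers` and `average_usage_ok` are hypothetical, and the behavior for one or two cores (use the core count itself as the cap) is an assumption derived from "should not be higher than your CPU core count".

```python
import os


def max_checkers(cores):
    """Suggested upper bound for 'Maximum concurrent Checkmk checkers'."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    # The rule of thumb (cores - 1) applies only above two cores.
    # Assumption: with one or two cores, cap at the core count itself.
    return cores - 1 if cores > 2 else cores


def average_usage_ok(samples, limit=80.0):
    """Return True if the average of the usage samples (percent) is below the limit."""
    if not samples:
        raise ValueError("need at least one sample")
    return sum(samples) / len(samples) < limit


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    print(f"{cores} cores -> at most {max_checkers(cores)} checkers")
```

The usage samples would come from whatever you already use to observe the helper usage over time; a single spike above 80% is not a problem as long as the average stays below it.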
...
- Firewalls are dropping traffic from Checkmk to the monitored systems. If the packets are silently dropped rather than rejected, Checkmk must wait for a timeout instead of terminating the fetching process immediately.
- You might have too many DOWN hosts which are still being checked. Checkmk still tries to query those hosts, and the fetchers need to wait for a timeout every time. This can tie up a lot of fetcher helpers, which are blocked for that time. Remove hosts which are in a DOWN state from your monitoring, either permanently or by setting their Criticality to "Do not monitor this host".
- For classical operating systems (Linux, Windows, etc.), this indicates that you might have plugins or local checks with a long runtime. Increasing the number of fetchers further does not help here. Instead, identify the long-running plugins and local checks and set them to asynchronous execution, define (generous) cache settings, or set timeouts specifically for them.
- For SNMP devices, you might have poorly performing devices. To troubleshoot those, take a look at this blog post.
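
To see whether a host is costing you a full timeout (dropped packets or a DOWN host) rather than failing fast, you can time a plain TCP connection attempt from the Checkmk server. The following is a minimal sketch, not a Checkmk tool: the function name `probe` and the host names in the usage part are hypothetical, and it assumes the agent is reachable on the default agent port 6556.

```python
import socket
import time

AGENT_PORT = 6556  # default TCP port of the Checkmk agent


def probe(host, port=AGENT_PORT, timeout=5.0):
    """Try one TCP connection and return (result, elapsed_seconds)."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            result = "open"
    except socket.timeout:
        result = "timeout"  # typical when packets are silently dropped
    except ConnectionRefusedError:
        result = "refused"  # typical for a rejecting firewall or a closed port
    except OSError:
        result = "error"  # e.g. name resolution failure, network unreachable
    return result, time.monotonic() - start


if __name__ == "__main__":
    # Hypothetical host names; replace them with hosts from your own site.
    for host in ["server01.example.com", "server02.example.com"]:
        result, seconds = probe(host)
        print(f"{host}: {result} after {seconds:.2f}s")
```

Hosts that report `timeout` after the full timeout period are the ones that keep a fetcher helper blocked; `refused` returns almost immediately and is much cheaper for the fetchers.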
Related articles
...