Info
This article shows how to debug predictive monitoring and describes some common issues.

Status
Last tested on Checkmk 2.0.0p1

Step-by-Step How To

Before debugging predictive monitoring, increase the log level of Livestatus to debug, as described in "How to collect troubleshooting data for various issue types" (section "Core").

Let's find all the predictive metrics via Livestatus:

Code Block
OMD[mysite]:~$ lq "GET services\nColumns: host_name description metrics\nFilter: metrics ~ predict\nFilter: host_name ~ localhost|Windows"
localhost;CPU load;predict_load15,load15,load5,load1
localhost;CPU utilization;predict_util,util,wait,system,user


The predictive metrics in my example are:

Code Block
predict_load15
predict_util
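
With many hosts, the predictive metrics can be picked out of the Livestatus output programmatically. The following is a minimal sketch, assuming the default CSV output shown above (columns separated by ";", list items by ","). The function name `predictive_metrics` and the `predict_` prefix parameter are illustrative and not part of Checkmk.

```python
def predictive_metrics(lq_output: str, prefix: str = "predict_") -> dict:
    """Map (host, service) to the metric names that start with `prefix`."""
    result = {}
    for line in lq_output.splitlines():
        if not line.strip():
            continue
        # Assumed default Livestatus CSV: columns split by ';', list items by ','
        host, service, metrics = line.split(";", 2)
        matches = [m for m in metrics.split(",") if m.startswith(prefix)]
        if matches:
            result[(host, service)] = matches
    return result


# Sample taken from the lq output in this article
sample = """localhost;CPU load;predict_load15,load15,load5,load1
localhost;CPU utilization;predict_util,util,wait,system,user"""

for (host, service), metrics in predictive_metrics(sample).items():
    print(host, service, metrics)
```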


Now let's grep the core log for the Livestatus queries. In this example, I want to check "predict_load15":

Code Block
OMD[mysite]:~$ tail -f ~/var/log/cmc.log |grep "predict_load15"
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639566126.581912:1639580526.581912:20\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639566127:1639580527:162\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639490527:1639580527:1022\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1638889327:1639580527:7854\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1636556527:1639580527:34362\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1605020527:1639580527:392726\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off

OMD[mysite]:~$ date -d@1639566126.581912
Wed Dec 15 12:02:06 CET 2021
OMD[mysite]:~$ date -d@1639566127
Wed Dec 15 12:02:07 CET 2021
OMD[mysite]:~$ date -d@1639490527
Tue Dec 14 15:02:07 CET 2021
OMD[mysite]:~$ date -d@1638889327
Tue Dec  7 16:02:07 CET 2021
OMD[mysite]:~$ date -d@1636556527
Wed Nov 10 16:02:07 CET 2021
OMD[mysite]:~$ date -d@1605020527
Tue Nov 10 16:02:07 CET 2020
OMD[mysite]:~$ date -d@1639580527
Wed Dec 15 16:02:07 CET 2021
OMD[mysite]:~$ date -d@1639580526.581912
Wed Dec 15 16:02:06 CET 2021

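The log shows one query for every entry of the prediction. In this example, the queries all end at the time of the request and reach back 4 hours, 25 hours, 8 days, 35 days, and 400 days.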
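
To avoid converting every timestamp by hand with date, the column specification from the log can be decoded in a few lines. This is a sketch, assuming the fields after "rrddata" are a column title, the metric expression, the start time, the end time, and a resolution value, in that order (this is how the logged queries above read). The function name `parse_rrddata_column` is made up for this example.

```python
from datetime import datetime, timezone


def parse_rrddata_column(column: str) -> dict:
    """Split an rrddata column spec into its parts and compute the time window."""
    # Assumed field order: rrddata:<title>:<expression>:<start>:<end>:<resolution>
    prefix, title, expression, start, end, resolution = column.split(":")
    if prefix != "rrddata":
        raise ValueError(f"not an rrddata column: {column!r}")
    start_ts, end_ts = float(start), float(end)
    return {
        "title": title,
        "expression": expression,
        "start": datetime.fromtimestamp(start_ts, tz=timezone.utc),
        "end": datetime.fromtimestamp(end_ts, tz=timezone.utc),
        "window_hours": (end_ts - start_ts) / 3600,
        "resolution": int(resolution),
    }


# Column specs copied from the cmc.log excerpt in this article
columns = [
    "rrddata:predict_load15:predict_load15.max:1639566127:1639580527:162",
    "rrddata:predict_load15:predict_load15.max:1639490527:1639580527:1022",
    "rrddata:predict_load15:predict_load15.max:1605020527:1639580527:392726",
]

for column in columns:
    info = parse_rrddata_column(column)
    print(f"{info['start']:%Y-%m-%d %H:%M:%S} UTC  window: {info['window_hours']:.0f} h")
```

The times are printed in UTC, so they are one hour behind the CET values from the date calls above.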

Let's execute one query to check the result:

Code Block
OMD[mysite]:~/share/check_mk/checks$ lq "GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639566126.581912:1639580526.581912:20\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off"
200        1930
[[[1639566120,1639580580,60,3.06435,3.09255,3.2166,3.31536,3.31581,3.30978,3.31816,3.32769,3.34188,3.40987,3.40987,3.50678,3.52646,3.60033,3.45151,3.43412,3.43716,3.42292,3.40943,3.39457,3.37658,3.36772,3.31099,3.1323,3.11909,3.11909,3.08713,3.06339,3.05419,3.04785,3.03488,3.02283,2.99926,2.9806,2.97307,2.97948,2.99778,2.99544,2.96494,2.95892,2.94827,2.94685,2.91284,2.91284,2.91213,2.91284,2.91445,2.96208,2.98138,2.97591,2.9698,2.96678,2.9724,2.98398,2.98746,3.02909,3.02909,3.06068,3.05956,3.06067,3.06496,3.09694,3.23989,3.30752,3.30752,3.31084,3.31903,3.32933,3.34248,3.3564,3.41404,3.49619,3.49619,3.60583,3.42791,3.43915,3.4352,3.41852,3.40748,3.38854,3.37507,3.36383,3.25904,3.13075,3.12543,3.11936,3.10665,3.0829,3.06073,3.04169,3.04169,3.0195,2.99023,2.97748,2.97202,2.98214,2.9996,2.99401,2.96502,2.95936,2.91964,2.91964,2.91964,2.90655,2.91192,2.91283,2.91441,2.96792,2.96792,2.9697,2.9697,2.96678,2.97328,2.98388,2.98884,3.02909,3.02909,3.06068,3.05978,3.06046,3.06435,3.09182,3.2166,3.31516,3.31631,3.30963,3.31816,3.32742,3.37813,3.37813,3.46381,3.52538,3.52538,3.59483,3.46162,3.20276,3.09222,3.09445,3.09842,3.10299,3.11873,3.09722,3.09722,3.05229,3.05354,3.05238,3.05,3.0438,3.05046,3.07423,3.08348,3.08672,3.11572,3.12379,3.1263,3.12176,3.12176,3.09628,3.09628,3.07907,3.06702,3.05316,3.04101,2.91862,2.91862,2.87516,2.86114,2.86123,2.8403,2.8403,2.8403,2.85418,2.85418,2.85418,2.87496,2.86234,2.85637,2.85299,2.84596,2.84008,2.83742,2.84809,2.89343,2.9418,2.93344,2.92632,2.93675,2.93675,2.97311,2.99865,3.02315,3.01523,2.99345,2.98254,3.02738,3.08088,3.09228,3.09357,3.09813,3.1011,3.13395,3.13395,3.06299,3.05283,3.05338,3.05252,3.05023,3.04447,3.04967,3.07308,3.0838,3.08467,3.11556,3.1237,3.12684,3.12144,3.12144,3.11013,3.09773,3.08135,3.0684,3.05708,3.0454,2.95554,2.95554,2.86397,2.85806,2.86728,2.85463,2.8441,2.83506,2.8255,2.83713,2.85906,2.86726,2.87175,2.86031,2.84854,2.84854,2.83797,2.83696]]]
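
The response is easier to read once it is split into timestamped points. The sketch below assumes what the output above suggests: a header line with status code and length, followed by a Python literal whose innermost list starts with start time, end time, and step, followed by the values. Assigning the first value to the start time is an assumption as well. `parse_rrd_response` is a hypothetical helper, not a Checkmk function.

```python
import ast


def parse_rrd_response(raw: str) -> dict:
    """Parse a fixed16 response that contains one row with one rrddata column."""
    header, body = raw.split("\n", 1)
    status, _length = header.split()
    if status != "200":
        raise RuntimeError(f"Livestatus returned status {status}")
    # OutputFormat python3 yields a Python literal: rows -> columns -> values
    rows = ast.literal_eval(body.strip())
    start, end, step, *values = rows[0][0]
    # Assumption: the first value belongs to the start timestamp
    points = [(start + i * step, value) for i, value in enumerate(values)]
    return {"start": start, "end": end, "step": step, "points": points}


# Shortened sample in the same shape as the response above
sample = "200          46\n[[[1639566120,1639566300,60,3.06,3.09,3.21]]]\n"

parsed = parse_rrd_response(sample)
for timestamp, value in parsed["points"]:
    print(timestamp, value)
```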


Last but not least, please check this directory:

Code Block
./var/check_mk/prediction/<HOSTNAME>/

For every host and service, the prediction creates a directory with one subdirectory per metric:

Code Block
➜  prediction ls Windows/Memory/*
Windows/Memory/memory:
everyhour  everyhour.info

Windows/Memory/pagefile:
everyhour  everyhour.info


Tip

If these files exist, the prediction for this service should work!
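
To check many hosts at once, the directory can be walked with a short script. This is a sketch that assumes the layout shown above, <host>/<service>/<metric>/ with a data file and a matching .info file. The function name `list_predictions` is illustrative, and the relative path in the usage part assumes the script is started from the site directory.

```python
from pathlib import Path


def list_predictions(base: Path) -> list:
    """Return (host, service, metric, timegroup) for every stored prediction."""
    found = []
    # Assumed layout: <host>/<service>/<metric>/<timegroup> plus <timegroup>.info
    for info_file in sorted(base.glob("*/*/*/*.info")):
        metric_dir = info_file.parent
        data_file = metric_dir / info_file.stem
        if data_file.is_file():
            host, service, metric = metric_dir.relative_to(base).parts
            found.append((host, service, metric, info_file.stem))
    return found


# Run as the site user from the site directory
base = Path("var/check_mk/prediction")
if base.is_dir():
    for entry in list_predictions(base):
        print(*entry, sep=" / ")
```

A host or service that is missing from the output has no stored prediction on this site.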


Common issues

Predictive monitoring in a distributed setup

At the moment, predictive monitoring is only possible on a local site. In a distributed setup, you will receive this message on the master node.

This is due to the missing file inside ./var/check_mk/prediction/.

We are evaluating implementing this in future Checkmk releases!

No reference for prediction yet

Note

Checkmk shows a prediction only once it has enough historical data. If you don't have enough data yet, you can configure a shorter time horizon!


Crash report on prediction icon


This happens when performance data is missing for the requested period in the past. Checkmk can't interpret the resulting "None" values!

Code Block
OMD[mysite]:~/share/check_mk/checks$ lq "GET services\nColumns: rrddata:predict_load15:predict_load15.max:1605020527:1639580527:392726\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off"
200         470
[[[1604736000,1639814400,403200,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,None,]]]

We will fix this in a future Checkmk release!
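
A response can be checked for such gaps before opening the prediction graph. The following sketch computes the share of "None" values in a response body (the Python literal without the status header line). It assumes the same response shape as above, and the helper name `missing_share` is made up for this example.

```python
import ast


def missing_share(body: str) -> float:
    """Return the fraction of None values in one rrddata response body."""
    # Innermost list: start, end, step, followed by the values
    _start, _end, _step, *values = ast.literal_eval(body.strip())[0][0]
    if not values:
        return 1.0  # no values at all counts as completely missing
    return sum(value is None for value in values) / len(values)


# Shortened samples in the shape of the two responses in this article
healthy = "[[[1639566120,1639566300,60,3.06,3.09,3.21]]]"
empty = "[[[1604736000,1639814400,403200,None,None,None,]]]"

for name, body in (("healthy", healthy), ("empty", empty)):
    print(f"{name}: {missing_share(body):.0%} of the values are missing")
```

A share of 100% for a time range matches the situation described in this section.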

Files and directories

Warning

These files contain the prediction code. Please don't make any changes to them.


Code Block
➜  2.0.0p16.cee vi lib/python3/cmk/base/check_api.py
➜  2.0.0p16.cee vi lib/python3/cmk/gui/prediction.py
➜  2.0.0p16.cee vi lib/python3/cmk/base/prediction.py

