...
...
Debugging
Before we start debugging predictive monitoring, we need to raise the Livestatus log level to debug, as described in the "Core" section of "How to collect troubleshooting data for various issue types".
...
Code Block | ||||
---|---|---|---|---|
| ||||
OMD[mysite]:~$ lq "GET services\nColumns: host_name description metrics\nFilter: metrics ~ predict\nFilter: host_name ~ localhost|Windows"
localhost;CPU load;predict_load15,load15,load5,load1
localhost;CPU utilization;predict_util,util,wait,system,user |
The predictive metrics in my example are:
Code Block | ||||
---|---|---|---|---|
| ||||
predict_load15
predict_util |
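If the service list is longer than in my example, picking out the predictive metrics by eye gets tedious. The following is a minimal Python sketch that does it for output like the one shown above. It assumes the default `lq` output with `;` between columns and `,` between list items, and that predictive metrics start with `predict_` as they do here. The function name `predictive_metrics` is made up for this illustration.

```python
def predictive_metrics(lq_output: str, prefix: str = "predict_") -> dict:
    """Map (host, service) to the metric names that start with `prefix`."""
    result = {}
    for line in lq_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: columns are separated by ';', list items by ','.
        host, service, metrics = line.split(";", 2)
        matches = [m for m in metrics.split(",") if m.startswith(prefix)]
        if matches:
            result[(host, service)] = matches
    return result


# The two output lines from the example above.
sample = """localhost;CPU load;predict_load15,load15,load5,load1
localhost;CPU utilization;predict_util,util,wait,system,user"""

for (host, service), metrics in predictive_metrics(sample).items():
    print(f"{host} / {service}: {', '.join(metrics)}")
```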
Now let's grep for the Livestatus queries. In this example, I want to check "predict_load15":
Code Block | ||||
---|---|---|---|---|
| ||||
OMD[mysite]:~$ tail -f ~/var/log/cmc.log | grep "predict_load15"
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639566126.581912:1639580526.581912:20\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639566127:1639580527:162\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639490527:1639580527:1022\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1638889327:1639580527:7854\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1636556527:1639580527:34362\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: rrddata:predict_load15:predict_load15.max:1605020527:1639580527:392726\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off
OMD[mysite]:~$ date -d@1639566126.581912
Wed Dec 15 12:02:06 CET 2021
OMD[mysite]:~$ date -d@1639566127
Wed Dec 15 12:02:07 CET 2021
OMD[mysite]:~$ date -d@1639490527
Tue Dec 14 15:02:07 CET 2021
OMD[mysite]:~$ date -d@1638889327
Tue Dec 7 16:02:07 CET 2021
OMD[mysite]:~$ date -d@1636556527
Wed Nov 10 16:02:07 CET 2021
OMD[mysite]:~$ date -d@1605020527
Tue Nov 10 16:02:07 CET 2020
OMD[mysite]:~$ date -d@1639580527
Wed Dec 15 16:02:07 CET 2021
OMD[mysite]:~$ date -d@1639580526.581912
Wed Dec 15 16:02:06 CET 2021 |
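Converting every timestamp with `date` by hand works, but it can also be scripted. The sketch below pulls the `rrddata` column out of a logged request and prints the time window it covers. It assumes that the fields after `rrddata:` are a name, an expression, the start time, the end time, and a requested resolution, which is how the log lines above read; the function name `parse_rrddata` and the dictionary keys are invented for this example.

```python
import re
from datetime import datetime, timezone

# Assumed layout: rrddata:<name>:<expression>:<start>:<end>:<resolution>
RRDDATA = re.compile(
    r"rrddata:(?P<name>[^:]+):(?P<expr>[^:]+)"
    r":(?P<start>[\d.]+):(?P<end>[\d.]+):(?P<resolution>\d+)"
)


def parse_rrddata(log_line: str) -> dict:
    """Extract the queried metric and time window from a cmc.log request line."""
    match = RRDDATA.search(log_line)
    if match is None:
        raise ValueError("no rrddata column found in this line")
    start = float(match["start"])
    end = float(match["end"])
    return {
        "metric": match["name"],
        "expression": match["expr"],
        "start": datetime.fromtimestamp(start, tz=timezone.utc),
        "end": datetime.fromtimestamp(end, tz=timezone.utc),
        "hours": (end - start) / 3600,
        "resolution": int(match["resolution"]),
    }


# One of the request lines from the log above (shortened after the column).
line = (
    r"2021-12-15 16:02:07 [6] [client 1] request: GET services\nColumns: "
    r"rrddata:predict_load15:predict_load15.max:1638889327:1639580527:7854"
    r"\nFilter: host_name = localhost"
)

info = parse_rrddata(line)
print(f"{info['metric']}: {info['start']} to {info['end']} (UTC), "
      f"{info['hours']:.1f} hours, resolution {info['resolution']}")
```

The times are printed in UTC, so they are one hour behind the CET values that `date` shows above.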
I receive one query for every entry of the prediction.
Let's execute one query to check the result:
Code Block | ||||
---|---|---|---|---|
| ||||
OMD[mysite]:~/share/check_mk/checks$ lq "GET services\nColumns: rrddata:predict_load15:predict_load15.max:1639566126.581912:1639580526.581912:20\nFilter: host_name = localhost\nFilter: service_description = CPU load\nLocaltime: 1639580527\nOutputFormat: python3\nKeepAlive: on\nResponseHeader: fixed16\nColumnHeaders: off"
200 1930
[[[1639566120,1639580580,60,3.06435,3.09255,3.2166,3.31536,3.31581,3.30978,3.31816,3.32769,3.34188,3.40987,3.40987,3.50678,3.52646,3.60033,3.45151,3.43412,3.43716,3.42292,3.40943,3.39457,3.37658,3.36772,3.31099,3.1323,3.11909,3.11909,3.08713,3.06339,3.05419,3.04785,3.03488,3.02283,2.99926,2.9806,2.97307,2.97948,2.99778,2.99544,2.96494,2.95892,2.94827,2.94685,2.91284,2.91284,2.91213,2.91284,2.91445,2.96208,2.98138,2.97591,2.9698,2.96678,2.9724,2.98398,2.98746,3.02909,3.02909,3.06068,3.05956,3.06067,3.06496,3.09694,3.23989,3.30752,3.30752,3.31084,3.31903,3.32933,3.34248,3.3564,3.41404,3.49619,3.49619,3.60583,3.42791,3.43915,3.4352,3.41852,3.40748,3.38854,3.37507,3.36383,3.25904,3.13075,3.12543,3.11936,3.10665,3.0829,3.06073,3.04169,3.04169,3.0195,2.99023,2.97748,2.97202,2.98214,2.9996,2.99401,2.96502,2.95936,2.91964,2.91964,2.91964,2.90655,2.91192,2.91283,2.91441,2.96792,2.96792,2.9697,2.9697,2.96678,2.97328,2.98388,2.98884,3.02909,3.02909,3.06068,3.05978,3.06046,3.06435,3.09182,3.2166,3.31516,3.31631,3.30963,3.31816,3.32742,3.37813,3.37813,3.46381,3.52538,3.52538,3.59483,3.46162,3.20276,3.09222,3.09445,3.09842,3.10299,3.11873,3.09722,3.09722,3.05229,3.05354,3.05238,3.05,3.0438,3.05046,3.07423,3.08348,3.08672,3.11572,3.12379,3.1263,3.12176,3.12176,3.09628,3.09628,3.07907,3.06702,3.05316,3.04101,2.91862,2.91862,2.87516,2.86114,2.86123,2.8403,2.8403,2.8403,2.85418,2.85418,2.85418,2.87496,2.86234,2.85637,2.85299,2.84596,2.84008,2.83742,2.84809,2.89343,2.9418,2.93344,2.92632,2.93675,2.93675,2.97311,2.99865,3.02315,3.01523,2.99345,2.98254,3.02738,3.08088,3.09228,3.09357,3.09813,3.1011,3.13395,3.13395,3.06299,3.05283,3.05338,3.05252,3.05023,3.04447,3.04967,3.07308,3.0838,3.08467,3.11556,3.1237,3.12684,3.12144,3.12144,3.11013,3.09773,3.08135,3.0684,3.05708,3.0454,2.95554,2.95554,2.86397,2.85806,2.86728,2.85463,2.8441,2.83506,2.8255,2.83713,2.85906,2.86726,2.87175,2.86031,2.84854,2.84854,2.83797,2.83696]]] |
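A response like this is hard to judge by eye, so here is a small Python sketch for inspecting it. It assumes that the first three numbers of the inner list are start time, end time, and step (which matches the values 1639566120, 1639580580, and 60 above) and that everything after them is a data point. The helper names and the shortened sample are made up for the illustration; in particular, the `None` in the sample is inserted on purpose to show how a gap would look.

```python
import ast


def parse_rrddata_response(raw: str):
    """Split a 'fixed16' style response into start, end, step and values."""
    # Assumption: the header is "<status> <length>", followed by the body.
    status, _length, body = raw.split(None, 2)
    if status != "200":
        raise RuntimeError(f"Livestatus returned status {status}")
    # OutputFormat python3 yields a Python literal, so literal_eval can read it.
    rows = ast.literal_eval(body.strip())
    start, end, step, *values = rows[0][0]
    return start, end, step, values


def summarize(start: int, step: int, values: list) -> dict:
    """Count the data points and locate gaps (None values)."""
    present = [v for v in values if v is not None]
    return {
        "points": len(values),
        # Approximate slot time of each gap, derived from its list position.
        "missing_at": [start + i * step for i, v in enumerate(values) if v is None],
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


# Shortened, partly invented sample; the length field is ignored by the parser.
sample = "200 59 [[[1639566120,1639566360,60,3.06435,None,3.2166,3.31536]]]"

start, end, step, values = parse_rrddata_response(sample)
print(summarize(start, step, values))
```

If `missing_at` is not empty for a real response, the RRD has gaps in that window.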
Last but not least, please check this directory:
Code Block | ||||
---|---|---|---|---|
| ||||
./var/check_mk/prediction/<HOSTNAME>/ |
For every host and service, the prediction creates a directory, with one subdirectory per metric:
...
Tip |
---|
So the prediction for this should work! |
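To get an overview of everything that already has prediction data on disk, the directory can also be walked with a few lines of Python. This is only a sketch: it assumes the layout described above (host directory, then service directory, then metric directory) and that it is started from the site directory, so that `var/check_mk/prediction` resolves. The function name `list_prediction_dirs` is hypothetical.

```python
from pathlib import Path


def list_prediction_dirs(base: Path):
    """Yield (host, service, metric) for every metric directory below base."""
    # Assumed layout: <base>/<host>/<service>/<metric>/
    for metric_dir in sorted(base.glob("*/*/*")):
        if metric_dir.is_dir():
            host, service, metric = metric_dir.relative_to(base).parts
            yield host, service, metric


if __name__ == "__main__":
    base = Path("var/check_mk/prediction")
    if base.is_dir():
        for host, service, metric in list_prediction_dirs(base):
            print(f"{host}: {service} -> {metric}")
    else:
        print(f"{base} does not exist here")
```

A host or service that is missing from this list has no prediction data yet.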
Common issues
Predictive monitoring in a distributed setup
At the moment, predictive monitoring is only possible on a local site. In a distributed setup, the central site shows the error message below.
This is due to missing files inside ./var/check_mk/prediction/.
We are evaluating support for this in future Checkmk releases.
Code Block | ||||
---|---|---|---|---|
| ||||
Error: There is currently no prediction information available for this service. |
No reference for prediction yet
Note |
---|
Checkmk shows a prediction only once it has enough historical data. If you don't have enough data yet, you can configure a shorter time horizon. |
Crash report on prediction icon
Code Block | ||||
---|---|---|---|---|
| ||||
Internal error: '>' not supported between instances of 'NoneType' and 'NoneType'
An internal error occurred while processing your request. You can report this issue to the Checkmk team to help fixing this issue. Please open the crash report page and use the form for reporting the problem. |
This happens when performance data is missing for part of the past period. Checkmk cannot compare the resulting "None" values.
...
We will fix this in a future Checkmk release.
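The error text itself is a plain Python `TypeError`, and it can be reproduced outside Checkmk. The sketch below is not the Checkmk code. It only shows why comparing two missing values fails, together with one possible way to skip them; `safe_max` is a name invented for this example.

```python
def safe_max(values):
    """Return the largest value, ignoring None entries (gaps in the data)."""
    present = [v for v in values if v is not None]
    return max(present) if present else None


# Comparing two missing data points raises the error from the crash report.
try:
    None > None
except TypeError as exc:
    print(f"TypeError: {exc}")

# Skipping the gaps first avoids the comparison between None values.
print(safe_max([None, 2.91, 3.06, None]))
print(safe_max([None, None]))
```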
Files and directories
Warning |
---|
These files contain the code for the prediction. Please don't change anything inside these files. |
...
Code Block | ||||
---|---|---|---|---|
| ||||
~/lib/python3/cmk/base/check_api.py
~/lib/python3/cmk/gui/prediction.py
~/lib/python3/cmk/base/prediction.py |
...