Troubleshooting graphs not recording values after RRD migration

After a Round Robin Database (RRD) migration, some customers may find that certain graphs no longer record values.

LAST TESTED ON CHECKMK 2.1.0P1

Problem

After the RRD migration, some graphs no longer record values:

Screenshot of graphs not able to record values


In the cmc.log, you may see these kinds of messages:

~/var/log/cmc.log
2022-04-06 13:24:41 [4] [client 1] Error accessing RRD: No DS called '12' in '/opt/omd/sites/mysite/var/check_mk/rrd/schiller/Interface_eth1-02.45.rrd'
2022-04-06 13:24:41 [4] [client 1] Error accessing RRD: No DS called '6' in '/opt/omd/sites/mysite/var/check_mk/rrd/schiller/Interface_eth1-02.45.rrd'
2022-04-06 13:24:41 [4] [client 1] Error accessing RRD: No DS called '5' in '/opt/omd/sites/mysite/var/check_mk/rrd/schiller/Interface_eth1-02.45.rrd'
2022-04-06 13:24:41 [4] [client 1] Error accessing RRD: No DS called '11' in '/opt/omd/sites/mysite/var/check_mk/rrd/schiller/Interface_eth1-02.45.rrd'
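
To get an overview of which data sources are reported missing for which RRD file, the log messages above can be condensed with standard text tools. The following is a minimal sketch, not a Checkmk command: the function name `missing_ds_from_log` is hypothetical, and it assumes the message format shown above and RRD paths without spaces.

```shell
# Hypothetical helper: summarize "No DS called" errors from cmc.log text on stdin.
# Prints one line per RRD file: "<path> <ds> <ds> ...", DS names sorted numerically.
# Assumes the message format shown in this article and paths without spaces.
missing_ds_from_log() {
    # Reduce each matching message to "<path> <ds>"
    sed -n "s/.*Error accessing RRD: No DS called '\([^']*\)' in '\([^']*\)'.*/\2 \1/p" |
        # Drop repeated messages and sort by path, then by DS number
        sort -u -k1,1 -k2,2n |
        # Collect all DS names that belong to the same path on one line
        awk '{ ds[$1] = ds[$1] " " $2 } END { for (f in ds) print f ds[f] }' |
        sort
}
```

As a site user, it could be called as `missing_ds_from_log < ~/var/log/cmc.log`.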


Solution 1


To automate the steps below, you can get a script from Checkmk Support.


  1. Change to the RRD directory.

    OMD[mysite]:~$ cd var/check_mk/rrd/<HOSTNAME>
  2. How many metrics are stored in the .info file?

    OMD[mysite]:~/var/check_mk/rrd/<HOSTNAME>$ cat Interface_2.info
    HOST myhost
    SERVICE Interface eth0
    METRICS outqlen;in;out;inerr;inmcast;inbcast;inucast;innucast;indisc;outerr;outmcast;outbcast;outucast;outnucast;outdis

    15 metrics are stored here.

  3. How many data sources (DS) do we have in the RRD file?

    OMD[mysite]:~/var/check_mk/rrd/<HOSTNAME>$ rrdtool info Interface_2.rrd | grep "last_ds"
    ds[1].last_ds = "U"
    ds[2].last_ds = "U"
    ds[3].last_ds = "U"
    ds[4].last_ds = "0"
    ds[7].last_ds = "U"
    ds[8].last_ds = "U"
    ds[9].last_ds = "U"
    ds[10].last_ds = "U"
    ds[13].last_ds = "U"
    ds[14].last_ds = "U"
    ds[15].last_ds = "U"

    11 data sources are stored. As the cmc.log messages indicate, DS 5, 6, 11, and 12 are missing.

  4. The number of metrics in the .info file (15) does not match the number of data sources in the .rrd file (11).

  5. Add the missing data sources to the RRD file.

    Before continuing, please note that we do not provide support for broken .rrd files. Be careful and back up the affected files beforehand.

    1. Stop the site

      OMD[mysite]:~$ omd stop
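    2. Create the four missing data sources (run the commands in the RRD directory from step 1)

      OMD[mysite]:~/var/check_mk/rrd/<HOSTNAME>$ rrdtool tune Interface_2.rrd DS:5:GAUGE:8460:0:U
      OMD[mysite]:~/var/check_mk/rrd/<HOSTNAME>$ rrdtool tune Interface_2.rrd DS:6:GAUGE:8460:0:U
      OMD[mysite]:~/var/check_mk/rrd/<HOSTNAME>$ rrdtool tune Interface_2.rrd DS:11:GAUGE:8460:0:U
      OMD[mysite]:~/var/check_mk/rrd/<HOSTNAME>$ rrdtool tune Interface_2.rrd DS:12:GAUGE:8460:0:U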

      More information about the command can be found here: https://oss.oetiker.ch/rrdtool/doc/rrdtune.en.html.

    3. Start the site

      OMD[mysite]:~$ omd start
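
The comparison in steps 2 to 4 and the commands in step 5 can be sketched as a few small shell functions. This is only an illustration under stated assumptions, not the script from Checkmk Support: all function names are hypothetical, and the sketch assumes that the data sources are named 1 to N, where N is the number of metrics in the .info file, as in the example above. The functions only print the `rrdtool tune` commands and do not modify any file.

```shell
# Hypothetical helpers; they read text on stdin and never touch an .rrd file.

# Count the metric names on the METRICS line of a .info file.
count_info_metrics() {
    awk '$1 == "METRICS" { print split($2, names, ";") }'
}

# List the DS names found in `rrdtool info` output, one per line.
list_rrd_ds() {
    sed -n 's/^ds\[\([^]]*\)\]\.last_ds = .*/\1/p'
}

# Print every DS name from 1 to $1 that is absent from the list on stdin.
# Assumption: data sources are named 1..N as in the example above.
missing_ds() {
    awk -v n="$1" '{ seen[$1] = 1 }
        END { for (i = 1; i <= n; i++) if (!(i in seen)) print i }'
}

# Print (do not run) one tune command per DS name read from stdin.
# GAUGE, heartbeat 8460, and minimum 0 are taken from step 5 above.
print_tune_commands() {
    rrd_file=$1
    while read -r ds; do
        printf 'rrdtool tune %s DS:%s:GAUGE:8460:0:U\n' "$rrd_file" "$ds"
    done
}

# Possible usage inside the RRD directory of the host:
#   metrics=$(count_info_metrics < Interface_2.info)
#   rrdtool info Interface_2.rrd | list_rrd_ds | missing_ds "$metrics" |
#       print_tune_commands Interface_2.rrd
```

Printing the commands first makes it possible to review them, and to back up the .rrd file, before anything is changed.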

Solution 2

Ensure that the .info file does not list fewer metrics than the .rrd file has data sources. If it does, the .rrd file may need to be tuned to contain fewer data sources, for example:

OMD[mysite]:~$ omd stop && rrdtool tune ORA_MYHOST01_SQL_AVQ_MSG_IN_STATUS.rrd DEL:3 && omd start
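
Both solutions modify .rrd files in place, so a copy taken while the site is stopped is a cheap safeguard. The following is a minimal sketch of such a backup step. The function name `backup_rrd` and the `.bak.<timestamp>` suffix are assumptions chosen for illustration, not a Checkmk convention.

```shell
# Hypothetical helper: copy an .rrd file next to the original before tuning it.
# Prints the path of the copy on success; returns non-zero if the copy fails.
backup_rrd() {
    src=$1
    # Timestamped suffix so that repeated backups do not overwrite each other
    dest="$src.bak.$(date +%Y%m%d%H%M%S)"
    # -p preserves mode and timestamps of the original file
    cp -p -- "$src" "$dest" && printf '%s\n' "$dest"
}
```

With the site stopped, `backup_rrd Interface_2.rrd` would be run in the RRD directory before any `rrdtool tune` call. To roll back, copy the backup over the original file while the site is stopped.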