CWAgent fails to resolve linux mount point device to EBS VolumeId on nitro instances #1727

@montaguethomas

Description

Describe the bug
Linux allows mounting disks via a device alias (symlink), but the CWAgent is unable to resolve the EBS VolumeId for a device mounted that way.
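
For context, on Nitro instances EBS volumes attach as NVMe devices, and a udev rule creates the xvd*/sd* name as a symlink to the real device node. A quick way to see the alias (the device names here match the repro steps below and are otherwise assumptions):

python3 - <<'EOF'
import os

# /dev/xvdz is the alias used in the repro below; on a Nitro instance it is
# typically a udev-created symlink to the real NVMe node.
dev = "/dev/xvdz"
print(os.path.islink(dev))    # True when it is an alias
print(os.path.realpath(dev))  # e.g. /dev/nvme1n1
EOF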

Steps to reproduce

  1. Launch a Linux t3 instance (with the required instance profile)

  2. Install, configure, and start the CloudWatch agent:

yum install -y amazon-cloudwatch-agent
cat <<'EOF' > /tmp/amazon-cloudwatch-agent-config.json
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "aggregation_dimensions": [
      ["VolumeId"]
    ],
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "append_dimensions": {
          "VolumeId": "${aws:VolumeId}"
        },
        "ignore_file_system_types": ["devtmpfs", "overlay", "shm", "sysfs", "tmpfs"],
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["*"]
      }
    }
  }
}
EOF
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -c file:/tmp/amazon-cloudwatch-agent-config.json

  3. Confirm the base metrics are reporting and have VolumeId populated

  4. Create a new EBS volume and attach it to the instance as /dev/xvdz

  5. Format the EBS volume: mkfs.xfs /dev/xvdz

  6. Mount the EBS volume using the /dev/xvdz source device via a direct syscall:

cat <<'EOF' > ~/mount.py
#!/usr/bin/env python3
import ctypes
import ctypes.util
import os

# Call mount(2) through libc directly so the source path is passed to the
# kernel verbatim; mount(8) would canonicalize the symlink first.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_ulong, ctypes.c_char_p)

def mount(source, target, fs, options=""):
  ret = libc.mount(source.encode(), target.encode(), fs.encode(), 0, options.encode())
  if ret < 0:
    errno = ctypes.get_errno()
    raise OSError(errno, f"Error mounting {source} ({fs}) on {target} with options '{options}': {os.strerror(errno)}")

mount("/dev/xvdz", "/mnt/data-xvdz", "xfs", "")
EOF

mkdir -p /mnt/data-xvdz
python3 ~/mount.py

  7. The mounted volume shows up as /dev/xvdz in the output of df -h and cat /proc/mounts; running the mount command instead shows the resolved symlink target.

  8. Check for metrics for the newly mounted EBS volume and whether VolumeId is populated (a query sketch follows below)
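
One way to perform the check in step 8 programmatically is to list what the agent published. A minimal boto3 sketch, assuming the agent's default CWAgent namespace and the disk_used_percent metric name; the dimension names are likewise assumptions based on this config:

python3 - <<'EOF'
import boto3

# Assumes the default "CWAgent" namespace and the "disk_" prefix the agent
# applies to disk measurements; adjust if your config differs.
cw = boto3.client("cloudwatch")
resp = cw.list_metrics(Namespace="CWAgent", MetricName="disk_used_percent")
for metric in resp["Metrics"]:
    dims = {d["Name"]: d["Value"] for d in metric["Dimensions"]}
    print(dims.get("path"), "->", dims.get("VolumeId", "<missing>"))
EOF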

What did you expect to see?
Expected to see VolumeId populated for all disk mount points.

What did you see instead?
The VolumeId is not populated.

What version did you use?
Version: CWAgent/1.300054.1 (go1.23.8; linux; amd64)

What config did you use?

{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "aggregation_dimensions": [
      ["VolumeId"]
    ],
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "append_dimensions": {
          "VolumeId": "${aws:VolumeId}"
        },
        "ignore_file_system_types": ["devtmpfs", "overlay", "shm", "sysfs", "tmpfs"],
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["*"]
      }
    }
  }
}

Environment
OS: Amazon Linux 2 (amazon/amzn2-ami-ecs-hvm-2.0.20250610-x86_64-ebs)

Additional context
I make use of the Rexray EBS plugin to handle creation and mounting of EBS volumes for ECS services. It turns out the plugin calls the mount syscall without first resolving the symlink created for the NVMe device, so the kernel records the mount source as xvd* rather than the underlying nvme device (a resolution sketch follows the mount listings below).

[root@ip-10-0-91-150 ~]# df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/nvme0n1p1  100G  9.7G  90.3G  10% /
/dev/xvdp        50G   25G    25G  50% /var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount/volumes/my-app-data


[root@ip-10-0-91-150 ~]# cat /proc/mounts
/dev/nvme0n1p1 / xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/nvme0n1p1 /var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/nvme0n1p1 /var/lib/docker/plugins/399504751ea4753b38a6931240b4f1ae63be57bf6edaa50bf3535e11aae9ee34/propagated-mount xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/xvdp /var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount/volumes/my-app-data xfs rw,relatime,nouuid,attr2,inode64,noquota 0 0
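
The VolumeId is still recoverable once the alias is resolved, because EBS exposes the volume ID as the NVMe controller serial in sysfs. A sketch of that lookup (my own illustration, not agent code; it assumes whole-disk devices, the standard /sys/block layout, and the vol -> vol- serial normalization):

python3 - <<'EOF'
import os

def ebs_volume_id(device):
    # Resolve udev aliases such as /dev/xvdp to the real NVMe node first.
    real = os.path.realpath(device)          # e.g. /dev/nvme1n1
    name = os.path.basename(real)
    # EBS reports the volume ID as the NVMe serial, e.g. "vol0123456789abcdef0".
    with open(f"/sys/block/{name}/device/serial") as f:
        serial = f.read().strip()
    if serial.startswith("vol") and not serial.startswith("vol-"):
        serial = "vol-" + serial[3:]
    return serial

with open("/proc/mounts") as f:
    for line in f:
        dev, mountpoint = line.split()[:2]
        if not dev.startswith("/dev/"):
            continue
        try:
            print(mountpoint, "->", dev, "->", ebs_volume_id(dev))
        except OSError:
            pass  # partitions and non-EBS devices have no serial file here
EOF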

To verify what Telegraf itself is reporting, I adapted the config that CWAgent generates and ran the latest Telegraf:

cat <<'EOF' > ~/telegraf-config.toml
[agent]
  collection_jitter = "0s"
  debug = false
  flush_interval = "1s"
  flush_jitter = "0s"
  hostname = ""
  interval = "60s"
  logtarget = "stderr"
  metric_batch_size = 1000
  metric_buffer_limit = 10000
  omit_hostname = false
  precision = ""
  quiet = false
  round_interval = false

[inputs]

  [[inputs.disk]]
    fieldpass = ["used_percent"]
    ignore_fs = ["devtmpfs", "overlay", "shm", "sysfs", "tmpfs"]
    interval = "60s"
    tagexclude = ["mode"]
    [inputs.disk.tags]

[outputs]

  [[outputs.file]]
    files = ["stdout"]
EOF

curl -LO https://dl.influxdata.com/telegraf/releases/telegraf-1.34.4_linux_amd64.tar.gz
tar -xzf telegraf-1.34.4_linux_amd64.tar.gz
./telegraf-1.34.4/usr/bin/telegraf -config ~/telegraf-config.toml

Telegraf Results:

disk,device=nvme0n1p1,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,label=/,path=/ used_percent=9.79178633890271363 1749864322000000000
disk,device=nvme0n1p1,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,label=/,path=/var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount used_percent=9.79178633890271363 1749864322000000000
disk,device=nvme0n1p1,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,label=/,path=/var/lib/docker/plugins/399504751ea4753b38a6931240b4f1ae63be57bf6edaa50bf3535e11aae9ee34/propagated-mount used_percent=9.79178633890271363 1749864322000000000
disk,device=xvdp,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,path=/var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount/volumes/my-app-data used_percent=49.956185744611764 1749864322000000000
