Description
Describe the bug
Linux allows a disk to be mounted through a device alias (a symlink such as /dev/xvdz), but CWAgent cannot resolve the EBS VolumeId for a disk mounted this way.
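To make the failure concrete: a mount source such as /dev/xvdz is only an alias, so anything keyed on the device name has to follow the symlink first. The sketch below shows one possible way to go from a mount source to an EBS volume ID. It assumes that the NVMe controller's serial in sysfs carries the volume ID without its hyphen (for example vol0123456789abcdef0). The helper name `ebs_volume_id` and the `sysfs_root` parameter are hypothetical, and this is not a description of how CWAgent actually resolves VolumeId.

```python
import os
from typing import Optional


def ebs_volume_id(device_path: str, sysfs_root: str = "/sys") -> Optional[str]:
    """Hypothetical helper: map a block-device path (possibly a symlink alias
    such as /dev/xvdz) to an EBS volume ID via the NVMe serial in sysfs."""
    # Follow symlinks so an alias like /dev/xvdz becomes the kernel name, e.g. nvme1n1.
    kernel_name = os.path.basename(os.path.realpath(device_path))

    # /sys/class/block/<name> exists for both whole disks and partitions.
    block_dir = os.path.realpath(os.path.join(sysfs_root, "class", "block", kernel_name))
    if os.path.exists(os.path.join(block_dir, "partition")):
        # Partitions (e.g. nvme0n1p1) sit inside their parent disk's directory.
        block_dir = os.path.dirname(block_dir)

    try:
        with open(os.path.join(block_dir, "device", "serial")) as f:
            serial = f.read().strip()
    except OSError:
        return None  # not an NVMe device, or the sysfs layout differs

    # Assumption: EBS NVMe serials look like "vol0123456789abcdef0".
    if serial.startswith("vol") and not serial.startswith("vol-"):
        return "vol-" + serial[3:]
    return None
```

For example, on an instance where /dev/xvdz points to /dev/nvme1n1, `ebs_volume_id("/dev/xvdz")` and `ebs_volume_id("/dev/nvme1n1")` would return the same ID, because the alias is resolved before any lookup.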
Steps to reproduce
- Launch a Linux t3 instance (with the required instance profile).
- Install, configure, and start the CloudWatch agent:
yum install -y amazon-cloudwatch-agent
cat <<'EOF' > /tmp/amazon-cloudwatch-agent-config.json
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "aggregation_dimensions": [
      ["VolumeId"]
    ],
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "append_dimensions": {
          "VolumeId": "${aws:VolumeId}"
        },
        "ignore_file_system_types": ["devtmpfs", "overlay", "shm", "sysfs", "tmpfs"],
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["*"]
      }
    }
  }
}
EOF
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -s -c file:/tmp/amazon-cloudwatch-agent-config.json
- Confirm that the base metrics are reporting and have VolumeId populated.
- Create a new EBS volume and attach it to the instance as /dev/xvdz.
- Format the EBS volume:
mkfs.xfs /dev/xvdz
- Mount the EBS volume with /dev/xvdz as the source device, using a direct mount syscall:
cat <<'EOF' > ~/mount.py
#!/usr/bin/env python3
import ctypes
import ctypes.util
import os

# Call mount(2) directly so the source path reaches the kernel as-is,
# without the symlink canonicalization that the mount(8) command performs.
libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p, ctypes.c_ulong, ctypes.c_char_p)

def mount(source, target, fs, options=""):
    ret = libc.mount(source.encode(), target.encode(), fs.encode(), 0, options.encode())
    if ret < 0:
        errno = ctypes.get_errno()
        raise OSError(errno, f"Error mounting {source} ({fs}) on {target} with options '{options}': {os.strerror(errno)}")

mount("/dev/xvdz", "/mnt/data-xvdz", "xfs", "")
EOF
mkdir -p /mnt/data-xvdz
python3 ~/mount.py
- The mounted volume shows up as /dev/xvdz when running df -h and cat /proc/mounts. Running the mount command instead shows the name that the device symlink resolves to.
- Check the metrics for the newly mounted EBS volume and whether VolumeId is populated.
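The reproduction hinges on /proc/mounts recording the symlink path instead of the real device node. A small check like the sketch below (an illustration, not part of CWAgent) lists the mounts whose source is a symlink, together with what each resolves to. The function name `alias_mounts` is hypothetical, and the sketch does not decode the octal escapes (such as \040 for a space) that /proc/mounts uses in paths.

```python
import os
from typing import List, Tuple


def alias_mounts(mounts_text: str) -> List[Tuple[str, str, str]]:
    """Return (source, resolved_source, mount_point) for every mount whose
    source is a symlink, e.g. /dev/xvdz pointing at an nvme* device node."""
    results = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue
        # Note: octal escapes such as \040 in paths are left undecoded here.
        source, mount_point = fields[0], fields[1]
        # Skip pseudo filesystems (proc, tmpfs, ...) whose source is not a path,
        # and real device nodes that are not symlinks.
        if not source.startswith("/") or not os.path.islink(source):
            continue
        results.append((source, os.path.realpath(source), mount_point))
    return results


if __name__ == "__main__":
    with open("/proc/mounts") as f:
        for src, real, mnt in alias_mounts(f.read()):
            print(f"{mnt}: mounted via alias {src} -> {real}")
```

On the reproduction instance, this would be expected to report /mnt/data-xvdz as mounted via /dev/xvdz, while the root filesystem on /dev/nvme0n1p1 is not listed.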
What did you expect to see?
Expected to see VolumeId populated for all disk mount points.
What did you see instead?
VolumeId is not populated for the newly mounted volume.
What version did you use?
Version: CWAgent/1.300054.1 (go1.23.8; linux; amd64)
What config did you use?
{
  "agent": {
    "metrics_collection_interval": 60
  },
  "metrics": {
    "aggregation_dimensions": [
      ["VolumeId"]
    ],
    "append_dimensions": {
      "InstanceId": "${aws:InstanceId}"
    },
    "metrics_collected": {
      "disk": {
        "append_dimensions": {
          "VolumeId": "${aws:VolumeId}"
        },
        "ignore_file_system_types": ["devtmpfs", "overlay", "shm", "sysfs", "tmpfs"],
        "measurement": ["used_percent"],
        "metrics_collection_interval": 60,
        "resources": ["*"]
      }
    }
  }
}
Environment
OS: Amazon Linux 2 (amazon/amzn2-ami-ecs-hvm-2.0.20250610-x86_64-ebs)
Additional context
I use the Rexray EBS plugin to create and mount EBS volumes for ECS services. It turns out that the Rexray EBS plugin calls the mount syscall without first resolving the xvd* symlink created for the NVMe device. As a result, the kernel records the mount under the xvd* alias rather than the nvme* device name:
[root@ip-10-0-91-150 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/nvme0n1p1 100T 9.7G 90.3T 10% /
/dev/xvdp 50G 25G 25G 50% /var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount/volumes/my-app-data
[root@ip-10-0-91-150 ~]# cat /proc/mounts
/dev/nvme0n1p1 / xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/nvme0n1p1 /var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/nvme0n1p1 /var/lib/docker/plugins/399504751ea4753b38a6931240b4f1ae63be57bf6edaa50bf3535e11aae9ee34/propagated-mount xfs rw,noatime,attr2,inode64,noquota 0 0
/dev/xvdp /var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount/volumes/my-app-data xfs rw,relatime,nouuid,attr2,inode64,noquota 0 0
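On the mounting side, the difference from the mount command is that mount(8) canonicalizes the source path by default (its --no-canonicalize option turns this off), whereas a direct mount(2) call passes the string through unchanged. A minimal workaround sketch, adapted from the reproduction script, resolves the alias before the syscall. The names `canonical_source` and `mount_resolved` are hypothetical, and this is not the Rexray plugin's code.

```python
#!/usr/bin/env python3
import ctypes
import ctypes.util
import os


def canonical_source(source: str) -> str:
    """Resolve a device alias (e.g. /dev/xvdz) to the real device node
    (e.g. /dev/nvme1n1), mirroring what mount(8) does by default."""
    return os.path.realpath(source)


def mount_resolved(source: str, target: str, fs: str, options: str = "") -> None:
    # Load libc lazily so importing this module never touches mount(2).
    libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
    libc.mount.argtypes = (ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_char_p)
    real_source = canonical_source(source)
    ret = libc.mount(real_source.encode(), target.encode(), fs.encode(), 0, options.encode())
    if ret < 0:
        err = ctypes.get_errno()
        raise OSError(err, f"Error mounting {real_source} ({fs}) on {target}: {os.strerror(err)}")


if __name__ == "__main__":
    mount_resolved("/dev/xvdz", "/mnt/data-xvdz", "xfs")
```

Mounted this way, df -h and /proc/mounts would be expected to list the nvme* device name instead of /dev/xvdz.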
To check what Telegraf itself reports, I adapted the Telegraf config that CWAgent generates and ran the latest Telegraf release (1.34.4) directly:
cat <<'EOF' > ~/telegraf-config.toml
[agent]
collection_jitter = "0s"
debug = false
flush_interval = "1s"
flush_jitter = "0s"
hostname = ""
interval = "60s"
logtarget = "stderr"
metric_batch_size = 1000
metric_buffer_limit = 10000
omit_hostname = false
precision = ""
quiet = false
round_interval = false
[inputs]
[[inputs.disk]]
fieldpass = ["used_percent"]
ignore_fs = ["devtmpfs", "overlay", "shm", "sysfs", "tmpfs"]
interval = "60s"
tagexclude = ["mode"]
[inputs.disk.tags]
[outputs]
[[outputs.file]]
files = ["stdout"]
EOF
curl -LO https://dl.influxdata.com/telegraf/releases/telegraf-1.34.4_linux_amd64.tar.gz
tar -xzf telegraf-1.34.4_linux_amd64.tar.gz
./telegraf-1.34.4/usr/bin/telegraf -config ~/telegraf-config.toml
Telegraf Results:
disk,device=nvme0n1p1,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,label=/,path=/ used_percent=9.79178633890271363 1749864322000000000
disk,device=nvme0n1p1,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,label=/,path=/var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount used_percent=9.79178633890271363 1749864322000000000
disk,device=nvme0n1p1,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,label=/,path=/var/lib/docker/plugins/399504751ea4753b38a6931240b4f1ae63be57bf6edaa50bf3535e11aae9ee34/propagated-mount used_percent=9.79178633890271363 1749864322000000000
disk,device=xvdp,fstype=xfs,host=ip-10-0-91-150.us-east-2.compute.internal,path=/var/lib/docker/plugins/cfbcd2009d193760d0b441f622a2385bde857b3f4e1b66c827467e6b47fae543/propagated-mount/volumes/my-app-data used_percent=49.956185744611764 1749864322000000000
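The Telegraf output shows that the device tag for the Rexray volume is xvdp, the alias name, while the root volume is reported as nvme0n1p1. The simplified line-protocol parser below pulls out the tags so that the device name handed to any VolumeId lookup can be inspected. It is a hypothetical helper (`parse_disk_line`) that ignores line-protocol escaping and treats every field as a float, which holds for these disk lines.

```python
import sys
from typing import Dict, Tuple


def parse_disk_line(line: str) -> Tuple[str, Dict[str, str], Dict[str, float], int]:
    """Split one line-protocol record into measurement, tags, fields and
    timestamp. Simplified: assumes no escaped spaces, commas or equals signs,
    and float-valued fields."""
    series, fields_part, timestamp = line.strip().split(" ")
    measurement, *tag_pairs = series.split(",")
    # "label=/" splits on the first "=" only, so values may contain "/".
    tags = dict(pair.split("=", 1) for pair in tag_pairs)
    fields = {}
    for pair in fields_part.split(","):
        key, value = pair.split("=", 1)
        fields[key] = float(value)
    return measurement, tags, fields, int(timestamp)


if __name__ == "__main__":
    # Print the device tag and mount path for each disk metric on stdin.
    for raw in sys.stdin:
        if raw.startswith("disk,"):
            _, tags, _, _ = parse_disk_line(raw)
            print(tags["device"], tags.get("path", ""))
```

Applied to the four result lines above, it would print nvme0n1p1 three times and xvdp once.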