@dricross dricross commented Jul 17, 2025

Description of the issue

We are adding support for pushing exponential histograms with CloudWatch (PMD) as the destination. This PR adds integration tests for that functionality.

Note

See companion PR for new agent functionality: aws/amazon-cloudwatch-agent#1677

Description of changes

  • Update existing integration tests to support exponential histograms
  • Refactor the metric fetcher to support percentile metrics
  • Update the test suite framework to output test failure reasons, if provided

License

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

Tests

Ran the integration tests locally with the updated agent; all histogram tests pass.

Integration test run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16371811763
Histogram test: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16371811763/job/46261959442

Starting new run here after fixing merge conflict: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/17081105626

Example of test failures w/o reasons (current behavior):

2025/07/17 12:31:33 >>>>>>>>>>>>>><<<<<<<<<<<<<<
2025/07/17 12:31:33 >>>>>>>>>>>>>>Failed<<<<<<<<<<<<<<
2025/07/17 12:31:33 ==============otlp_histograms==============
2025/07/17 12:31:33 ==============Failed==============
my.delta.histogram/Minimum                        Successful
my.delta.histogram/Maximum                        Successful
my.delta.histogram/Sum                            Failed
my.delta.histogram/Average                        Successful
my.delta.histogram/SampleCount                    Failed
my.cumulative.histogram/Minimum                   Successful
my.cumulative.histogram/Maximum                   Successful
my.cumulative.histogram/Sum                       Failed
my.cumulative.histogram/Average                   Successful
my.cumulative.histogram/SampleCount               Failed
my.delta.exponential.histogram/Minimum            Successful
my.delta.exponential.histogram/Maximum            Successful
my.delta.exponential.histogram/Sum                Failed
my.delta.exponential.histogram/Average            Successful
my.delta.exponential.histogram/SampleCount        Failed
my.cumulative.exponential.histogram/Minimum       Successful
my.cumulative.exponential.histogram/Maximum       Successful
my.cumulative.exponential.histogram/Sum           Failed
my.cumulative.exponential.histogram/Average       Successful
my.cumulative.exponential.histogram/SampleCount   Failed
2025/07/17 12:31:33 ==============================
2025/07/17 12:31:33 >>>>>>>>>>>>>>><<<<<<<<<<<<<<<

Example of test failure w/ reasons:

2025/07/17 12:31:33 >>>>>>>>>>>>>><<<<<<<<<<<<<<
2025/07/17 12:31:33 >>>>>>>>>>>>>>Failed<<<<<<<<<<<<<<
2025/07/17 12:31:33 ==============otlp_histograms==============
2025/07/17 12:31:33 ==============Failed==============
my.delta.histogram/Minimum                        Successful   <nil>
my.delta.histogram/Maximum                        Successful   <nil>
my.delta.histogram/Sum                            Failed       The average value 72.000000 for metric my.delta.histogram are not within bound [20.400000, 27.600000]
my.delta.histogram/Average                        Successful   <nil>
my.delta.histogram/SampleCount                    Failed       The average value 36.000000 for metric my.delta.histogram are not within bound [10.200000, 13.800000]
my.cumulative.histogram/Minimum                   Successful   <nil>
my.cumulative.histogram/Maximum                   Successful   <nil>
my.cumulative.histogram/Sum                       Failed       The average value 4428.000000 for metric my.cumulative.histogram are not within bound [20.400000, 27.600000]
my.cumulative.histogram/Average                   Successful   <nil>
my.cumulative.histogram/SampleCount               Failed       The average value 2214.000000 for metric my.cumulative.histogram are not within bound [10.200000, 13.800000]
my.delta.exponential.histogram/Minimum            Successful   <nil>
my.delta.exponential.histogram/Maximum            Successful   <nil>
my.delta.exponential.histogram/Sum                Failed       The average value 180.000000 for metric my.delta.exponential.histogram are not within bound [51.000000, 69.000000]
my.delta.exponential.histogram/Average            Successful   <nil>
my.delta.exponential.histogram/SampleCount        Failed       The average value 54.000000 for metric my.delta.exponential.histogram are not within bound [15.300000, 20.700000]
my.cumulative.exponential.histogram/Minimum       Successful   <nil>
my.cumulative.exponential.histogram/Maximum       Successful   <nil>
my.cumulative.exponential.histogram/Sum           Failed       The average value 2052.000000 for metric my.cumulative.exponential.histogram are not within bound [10.200000, 13.800000]
my.cumulative.exponential.histogram/Average       Successful   <nil>
my.cumulative.exponential.histogram/SampleCount   Failed       The average value 2052.000000 for metric my.cumulative.exponential.histogram are not within bound [10.200000, 13.800000]
2025/07/17 12:31:33 ==============================
2025/07/17 12:31:33 >>>>>>>>>>>>>>><<<<<<<<<<<<<<<
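
The bounds in the failure reasons above are consistent with a ±15% tolerance around the expected value (for example, [20.4, 27.6] around 24). A minimal sketch of such a check, returning the reason as an error, might look like the following. The function name, the tolerance constant, and the message wording are assumptions for illustration, not the test framework's actual code.

```go
package main

import (
	"fmt"
	"math"
)

// tolerance is an assumption inferred from the logged bounds
// (for example, [20.4, 27.6] is 24 ± 15%).
const tolerance = 0.15

// checkWithinBounds is a hypothetical helper. It returns nil when actual lies
// within expected ± tolerance, otherwise an error explaining the mismatch.
func checkWithinBounds(metricName string, expected, actual float64) error {
	delta := math.Abs(expected) * tolerance
	lower, upper := expected-delta, expected+delta
	if actual < lower || actual > upper {
		return fmt.Errorf("value %f for metric %s is not within bound [%f, %f]",
			actual, metricName, lower, upper)
	}
	return nil
}

func main() {
	// The failing case uses values from the log above; the passing case is made up.
	cases := []struct {
		name             string
		expected, actual float64
	}{
		{"example.metric/Sum", 24, 25},
		{"my.delta.histogram/Sum", 24, 72},
	}
	for _, c := range cases {
		status := "Successful"
		err := checkWithinBounds(c.name, c.expected, c.actual)
		if err != nil {
			status = "Failed"
		}
		// A nil error prints as <nil>, matching the "w/ reasons" output.
		fmt.Printf("%-50s %-12s %v\n", c.name, status, err)
	}
}
```

Returning an error rather than a bare boolean lets the summary table print the reason next to each status without any extra bookkeeping.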

"github.com/aws/amazon-cloudwatch-agent-test/util/common"
)

func TestOTLPMetrics(t *testing.T) {
instanceID := awsservice.GetInstanceId()

Contributor Author

Pulling the instance ID from IMDS instead of hardcoding a dummy value, so that concurrent integration tests don't interfere with each other.
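
For readers unfamiliar with IMDS, here is a rough sketch of how an instance ID can be read through the IMDSv2 token flow using only the standard library. It is not the implementation of awsservice.GetInstanceId(); fetchInstanceID, doAndRead, and stubIMDS are hypothetical names, and the stub returns made-up values so the sketch runs outside EC2.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"strings"
)

// imdsBase is the link-local IMDS endpoint on EC2.
const imdsBase = "http://169.254.169.254"

// fetchInstanceID follows the IMDSv2 flow: obtain a session token with a PUT,
// then read the instance-id metadata key using that token.
func fetchInstanceID(client *http.Client, base string) (string, error) {
	tokenReq, err := http.NewRequest(http.MethodPut, base+"/latest/api/token", nil)
	if err != nil {
		return "", err
	}
	tokenReq.Header.Set("X-aws-ec2-metadata-token-ttl-seconds", "60")
	token, err := doAndRead(client, tokenReq)
	if err != nil {
		return "", err
	}

	idReq, err := http.NewRequest(http.MethodGet, base+"/latest/meta-data/instance-id", nil)
	if err != nil {
		return "", err
	}
	idReq.Header.Set("X-aws-ec2-metadata-token", token)
	return doAndRead(client, idReq)
}

// doAndRead sends the request and returns the trimmed body of a 200 response.
func doAndRead(client *http.Client, req *http.Request) (string, error) {
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return "", err
	}
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("IMDS returned status %d", resp.StatusCode)
	}
	return strings.TrimSpace(string(body)), nil
}

// roundTripFunc adapts a function to http.RoundTripper so IMDS can be stubbed.
type roundTripFunc func(*http.Request) (*http.Response, error)

func (f roundTripFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// stubIMDS returns a client that answers like IMDS with made-up values,
// so the sketch also runs outside EC2.
func stubIMDS() *http.Client {
	return &http.Client{Transport: roundTripFunc(func(r *http.Request) (*http.Response, error) {
		status, body := http.StatusUnauthorized, ""
		switch {
		case r.Method == http.MethodPut && r.URL.Path == "/latest/api/token":
			status, body = http.StatusOK, "stub-token"
		case r.Method == http.MethodGet && r.URL.Path == "/latest/meta-data/instance-id" &&
			r.Header.Get("X-aws-ec2-metadata-token") == "stub-token":
			status, body = http.StatusOK, "i-0123456789abcdef0"
		}
		return &http.Response{
			StatusCode: status,
			Header:     make(http.Header),
			Body:       io.NopCloser(strings.NewReader(body)),
		}, nil
	})}
}

func main() {
	// On an EC2 instance, pass &http.Client{} instead of the stub.
	id, err := fetchInstanceID(stubIMDS(), imdsBase)
	if err != nil {
		fmt.Println("could not read instance ID:", err)
		return
	}
	fmt.Println("instance ID:", id)
}
```

Because each host reports its own ID, metrics published by concurrent test runs end up under distinct dimension values instead of colliding on a shared dummy one.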

@@ -34,7 +36,6 @@ func TestOTLPMetrics(t *testing.T) {
expected []struct {
stat types.Statistic
value float64
check func(t *testing.T, expected, actual float64)

Contributor Author

This was actually completely unused

testGroupResult = t.TestRunner.Validate()
}
if testGroupResult.GetStatus() != status.SUCCESSFUL {
log.Printf("%v test group failed due to %v", testName, err)

Contributor Author

This would often print "... test group failed due to <nil>", because err comes from the RunAgent() call while the status comes from the Validate() call. I decided to rework RunAgent to return only an error.

lisguo previously approved these changes Jul 21, 2025
@@ -34,8 +34,8 @@
"aggregationTemporality": 1,
"dataPoints": [
{
- "startTimeUnixNano": START_TIME,
- "timeUnixNano": START_TIME,
+ "startTimeUnixNano": METRIC_TIME,

Contributor

I hate this weird template sed logic...it would be better to use some tool to generate these metrics programmatically like otelgen: https://github.com/krzko/otelgen

Not your fault though. I started this

Contributor Author

@dricross dricross Jul 21, 2025

I agree. I had Amazon Q write up a metric generator using the OTEL SDK. It got everything working except cumulative or delta exponential histograms. I then spent a day trying to figure out how to add exponential histograms, but I couldn't figure it out and eventually gave up. You can actually see the generator I had in the commit history.

okankoAMZ previously approved these changes Jul 23, 2025
uses: actions/setup-go@v3
with:
- go-version: ~1.20.0
+ go-version: ~1.23.0

Contributor

why are we bumping go as part of this?

Contributor Author

@dricross dricross Jul 24, 2025

I was trying to upgrade the whole package from go 1.20 to go 1.23, hit an issue (forget what it was now...), and then tried to revert back, but I missed this. I don't think it should hurt as 1.23 is backwards compatible with our 1.20 go.mod file.

@dricross dricross dismissed stale reviews from okankoAMZ and lisguo via cbf9ede August 6, 2025 15:52
Comment on lines 11 to 15
AVERAGE types.Statistic = "Average"
SAMPLE_COUNT types.Statistic = "SampleCount"
MINIMUM types.Statistic = "Minimum"
MAXUMUM types.Statistic = "Maximum"
SUM types.Statistic = "Sum"

Contributor Author

I started to remove the usage of these consts, as the types package in the SDK already defines them, but it would have made this PR even larger. I'd rather do that in a separate PR.

@dricross dricross force-pushed the dricross/exphistograms branch from 812b8e4 to 2603f8e Compare August 19, 2025 20:29
@dricross dricross merged commit 451f0c5 into main Aug 20, 2025
6 checks passed
@dricross dricross deleted the dricross/exphistograms branch August 20, 2025 16:27