Add exponential histogram support to CloudWatch PMD Exporter #1677
This PR was marked stale due to lack of activity.
@@ -4,6 +4,10 @@ go 1.24.4

replace github.com/influxdata/telegraf => github.com/aws/telegraf v0.10.2-0.20250113150713-a2dfaa4cdf6d

replace collectd.org v0.4.0 => github.com/collectd/go-collectd v0.4.0
are these dependencies of cumulativetodeltaprocessor? I don't see go-collectd or clock used in the new code
I was hitting some issues with downloading the collectd dependency after clearing my local go cache and I needed to redirect to github to pick it up. I'm not exactly sure what happened but it looks like collectd.org stopped vending their package via collectd.org. The issue may have been resolved by now though, so I can try again.
../../../.gvm/pkgsets/go1.22.7/global/pkg/mod/github.com/aws/[email protected]/plugins/parsers/collectd/parser.go:8:2: unrecognized import path "collectd.org": https fetch: Get "https://collectd.org/?go-get=1": dial tcp: lookup collectd.org on 10.4.4.10:53: read udp 10.169.109.191:52627->10.4.4.10:53: i/o timeout
I think it's been resolved. Was able to pull vendor files from collectd.org directly the other day. Maybe their GitHub page (https://github.com/collectd/collectd.github.io) was down for a bit. Not opposed to this though.
func (d *ExpHistogramDistribution) Size() int {
	size := len(d.negativeBuckets) + len(d.positiveBuckets)
	if d.zeroCount > 0 {
what is zeroCount? the number of datapoints with 0?
Yes, pretty much. OTLP exponential histograms split the data into three sections: negative values, zero values, and positive values. Positive and negative buckets are defined separately by a series of buckets+counts. The zero values don't have any buckets, so it's just a counter stored in the histogram structure. Side note: the definition of "0" in OTLP exponential histograms is loose. Datapoints with a magnitude less than the configurable "zero threshold" are treated as 0.
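To make the three-section layout concrete, here is a minimal sketch of how a value might be routed to the zero counter or to a positive/negative bucket. The `expHistogram` type, `bucketIndex`, and `record` are hypothetical names for illustration and are not the types used in this PR. It assumes base 2 (scale 0) and treats magnitudes at or below the threshold as zero.

```go
package main

import (
	"fmt"
	"math"
)

// expHistogram is a hypothetical, simplified stand-in for an exponential
// histogram data point. It is not the type used in this PR.
type expHistogram struct {
	zeroThreshold float64        // assumption: magnitudes <= this count as zero
	zeroCount     uint64         // plain counter, no buckets
	positive      map[int]uint64 // bucket index -> count
	negative      map[int]uint64 // bucket index (by magnitude) -> count
}

// bucketIndex returns the base-2 bucket index for a positive magnitude,
// where bucket i covers the range (2^i, 2^(i+1)].
func bucketIndex(magnitude float64) int {
	return int(math.Ceil(math.Log2(magnitude))) - 1
}

// record routes a single value into the zero counter or a signed bucket.
func (h *expHistogram) record(v float64) {
	m := math.Abs(v)
	switch {
	case m <= h.zeroThreshold:
		h.zeroCount++
	case v > 0:
		h.positive[bucketIndex(m)]++
	default:
		h.negative[bucketIndex(m)]++
	}
}

func main() {
	h := &expHistogram{
		zeroThreshold: 0.001,
		positive:      map[int]uint64{},
		negative:      map[int]uint64{},
	}
	for _, v := range []float64{0, 0.0005, -0.0005, 3, 5, -3} {
		h.record(v)
	}
	fmt.Println(h.zeroCount, h.positive, h.negative)
}
```

Note that the zero section contributes at most one entry to the output, which is why `Size()` only adds to the bucket total when `zeroCount > 0`.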
}

func (d *ExpHistogramDistribution) Resize(_ int) []*ExpHistogramDistribution {
	// TODO: split data points into separate PMD requests if the number of buckets exceeds the API limit
what happens if we exceed the API limit?
Based on the API documentation, I believe the PMD request will be rejected with error code 400 InvalidParameterValue, though I haven't tried.
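One possible way to address the TODO, sketched under assumptions: split the parallel values/counts arrays into chunks no larger than a configurable limit. The `datum` type and `splitDatum` function are hypothetical and are not part of this PR. The limit is passed in as a parameter rather than hard-coded, since the exact API limit should be taken from the CloudWatch documentation.

```go
package main

import "fmt"

// datum is a hypothetical pair of parallel slices, mirroring the Values and
// Counts arrays of a PMD metric datum. Both slices must have equal length.
type datum struct {
	values []float64
	counts []float64
}

// splitDatum breaks one datum into several, each holding at most maxValues
// entries. A non-positive maxValues disables splitting.
func splitDatum(d datum, maxValues int) []datum {
	if maxValues <= 0 || len(d.values) <= maxValues {
		return []datum{d}
	}
	var out []datum
	for start := 0; start < len(d.values); start += maxValues {
		end := start + maxValues
		if end > len(d.values) {
			end = len(d.values)
		}
		out = append(out, datum{
			values: d.values[start:end],
			counts: d.counts[start:end],
		})
	}
	return out
}

func main() {
	d := datum{
		values: []float64{3, 6, 12, 24, 48},
		counts: []float64{1, 2, 3, 4, 5},
	}
	for i, part := range splitDatum(d, 2) {
		fmt.Println(i, part.values, part.counts)
	}
}
```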
// ValuesAndCounts outputs two arrays representing the midpoints of each exponential histogram bucket and the
// counter of datapoints within the corresponding exponential histogram buckets
func (d *ExpHistogramDistribution) ValuesAndCounts() ([]float64, []float64) {
nit: we can make the name of the function more descriptive. Maybe GetMidpointsAndCounts?
This naming scheme was based off of the existing function in the Distribution interface. I think ValuesAndCounts is a fairly descriptive name as that's what is actually pushed to CloudWatch in the PMD request (an array of values and an array of counts).
@@ -78,34 +77,6 @@ func setNewDistributionFunc(maxValuesPerDatumLimit int) {
	}
}

func resize(dist distribution.Distribution, listMaxSize int) (distList []distribution.Distribution) {
refactored as functions on each distribution
metric/distribution/exph/exph.go
Outdated
	return values, counts
}

func (d *ExpHistogramDistribution) AddDistribution(other *ExpHistogramDistribution) {
can probably remove this since the weights are all 1
		expected: []int{0, 1, 2, 3, 4},
	},
	{
		// for negative values, histogram buckets use lower-inclusive boundaries
confusing considering the test case uses positive values
not quite complete. need more unit tests
* Move OTLP implementation to separate file
* Simplify map key sorting
The PMD request already has Values/Counts, so we don't need to send StatisticSet. This simplifies the logic: histograms that have been converted from cumulative to delta have invalid min/max values, which default to 0 in the StatisticSet and cause all percentile metrics to be 0 as well.
Description of the issue
The CloudWatch/PMD exporter currently drops all exponential histogram metrics.
Description of changes
Note
See companion PR for updating cumulativetodelta processor: amazon-contributing/opentelemetry-collector-contrib#331
License
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.
Tests
The current histogram test fails for cumulative histograms because the current agent behavior for cumulative histograms is incorrect and is being fixed in this PR. Specifically, the agent does not convert cumulative histograms to delta before pushing to CloudWatch. With these changes, the agent will. So the existing test fails, but the updated tests pass.
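For readers unfamiliar with the cumulative-to-delta conversion mentioned above, here is a minimal sketch of the idea applied to bucket counts. It is not the cumulativetodelta processor's implementation. `cumulativeToDelta` is a hypothetical helper that assumes both snapshots describe the same buckets in the same order, and treats any decreasing count as a reset.

```go
package main

import "fmt"

// cumulativeToDelta returns the per-bucket difference between two cumulative
// snapshots. If any count decreased, the series is assumed to have reset and
// the current counts are returned unchanged.
func cumulativeToDelta(prev, curr []uint64) []uint64 {
	delta := make([]uint64, len(curr))
	for i, c := range curr {
		if i >= len(prev) {
			// Bucket not present in the previous snapshot.
			delta[i] = c
			continue
		}
		if c < prev[i] {
			// Counter went backwards: treat as a reset.
			reset := make([]uint64, len(curr))
			copy(reset, curr)
			return reset
		}
		delta[i] = c - prev[i]
	}
	return delta
}

func main() {
	// A cumulative count that grows by 2 between reports yields a delta of 2.
	fmt.Println(cumulativeToDelta([]uint64{0, 4}, []uint64{0, 6}))
}
```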
Note
See companion PR for integration test: aws/amazon-cloudwatch-agent-test#558
Integration test run: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16839298907
Histogram test: https://github.com/aws/amazon-cloudwatch-agent/actions/runs/16839298907/job/47706510747
The data we are sending as part of the test is defined here: https://github.com/aws/amazon-cloudwatch-agent-test/blob/dricross/exphistograms/test/histograms/resources/otlp_metrics.json#L188 where CUMULATIVE_HIST_COUNT and CUMULATIVE_HIST_SUM increase by 2 every 10 seconds. Important things to note:
Formula for mapping bucket index to value range:
(Base^(index+offset), Base^(index+offset+1)]
The first bucket (with 0 count) maps to range (2, 4]: (2^(0+1), 2^(0+1+1)] = (2, 4]
The second bucket (with a count of 2) maps to range (4, 8]: (2^(1+1), 2^(1+1+1)] = (4, 8]
Since we take the midpoints of the buckets when sending exponential histograms to CloudWatch (consistent with the EMF exporter), the values/counts we send to CloudWatch are:
That list of values and counts is sent 6 times per minute
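The index-to-range mapping described above can be sketched as follows. `bucketBounds` and `midpoint` are hypothetical helpers rather than functions from this PR, and the midpoint is assumed to be the arithmetic mean of the bucket bounds.

```go
package main

import (
	"fmt"
	"math"
)

// bucketBounds returns the (lower, upper] range for a bucket using the formula
// (base^(index+offset), base^(index+offset+1)].
func bucketBounds(base float64, index, offset int) (lower, upper float64) {
	lower = math.Pow(base, float64(index+offset))
	upper = math.Pow(base, float64(index+offset+1))
	return lower, upper
}

// midpoint returns the arithmetic midpoint of a bucket range (an assumption
// about how the bucket is reduced to a single value).
func midpoint(lower, upper float64) float64 {
	return (lower + upper) / 2
}

func main() {
	// Base 2 with offset 1, matching the two buckets discussed above.
	for index := 0; index < 2; index++ {
		lo, hi := bucketBounds(2, index, 1)
		fmt.Printf("bucket %d: (%g, %g], midpoint %g\n", index, lo, hi, midpoint(lo, hi))
	}
}
```

With base 2 and offset 1, this gives midpoints of 3 for the (2, 4] bucket and 6 for the (4, 8] bucket.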
CloudWatch metrics:

EMF Logs from test as verification (this is existing behavior but demonstrates the intended output)
Requirements
Before committing the code, please complete the following steps.
make fmt and make fmt-sh
make lint