document metrics temporal aggregation #63

# Elastic Distribution of OpenTelemetry limitations

## Collector limitations

The Elastic Distribution of the OpenTelemetry Collector has the following limitations:

- Because of an upstream limitation, `host.network.*` metrics aren't present from the OpenTelemetry side.
- `process.state` isn't present in the OpenTelemetry host metrics. It's set to a dummy value of **Unknown** in the **State** column of the host processes table.
- The Elasticsearch exporter handles the resource attributes, but **Host OS version** and **Operating system** may show as "N/A".
- The CPU scraper needs to be enabled to collect the `system.load.cores` metric, which affects the **Normalized Load** column in the **Hosts** table and the **Normalized Load** visualization on the host detailed view.
- The [`hostmetrics` receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) doesn't support CPU and disk metrics on macOS. These values stay empty for collectors running on macOS.
- The console shows error log messages when the [`hostmetrics` receiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/hostmetricsreceiver) can't access some of the process information due to permission issues.
- The console initially shows mapping errors until the mappings are in place.

## Metrics temporal aggregation

OpenTelemetry SDKs can be configured with one of several temporality preferences for reporting metrics:

- cumulative (default)
- delta preferred
- low memory

The [aggregation temporality documentation](https://opentelemetry.io/docs/specs/otel/metrics/supplementary-guidelines/#aggregation-temporality) provides a complete description and examples.
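
For example, the same counter increments produce different data points under each temporality: cumulative points carry the running total since the series start, while delta points carry only the change during each export interval. The following stdlib-only Python sketch shows that relationship; the function names are hypothetical and not part of any OpenTelemetry SDK, and it assumes the counter is never reset.

```python
from itertools import accumulate


def cumulative_points(increments: list[int]) -> list[int]:
    """Running total since the series start, as cumulative temporality reports it."""
    return list(accumulate(increments))


def delta_points(cumulative: list[int]) -> list[int]:
    """Per-interval change recovered from consecutive cumulative points.

    Assumes the counter is never reset between points.
    """
    return [cur - prev for prev, cur in zip([0] + cumulative, cumulative)]


if __name__ == "__main__":
    increments = [5, 2, 0, 3]  # counter increments observed in four export intervals
    cumulative = cumulative_points(increments)
    print("cumulative:", cumulative)           # [5, 7, 7, 10]
    print("delta:", delta_points(cumulative))  # [5, 2, 0, 3]
```

Both series describe the same measurements; they only differ in whether the consumer or the producer is responsible for computing the per-interval change.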

The effect of temporal aggregation depends on the OpenTelemetry metric type. Gauges and up-down counters always report the "last value", which means that the producers of those metrics only read the last value; they don't keep track of previous values or compute a delta.

| metric type / temporal aggregation | cumulative | delta preferred | low memory                                   |
|------------------------------------|------------|-----------------|----------------------------------------------|
| gauge                              | last value | last value      | last value                                   |
| up-down counter                    | last value | last value      | last value                                   |
| counter                            | cumulative | delta           | synchronous: delta, asynchronous: cumulative |
| histogram                          | cumulative | delta           | delta                                        |

When metrics are stored in Elasticsearch using the `otel` mode, OpenTelemetry metrics are written to time series data streams (TSDS), which currently only support delta histograms.

As a consequence, metrics sent to Elasticsearch currently need to use the "delta preferred" temporality so that histograms are stored properly; otherwise, the collector discards those histograms.

For OpenTelemetry SDKs, setting the `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE=delta` environment variable should change the default temporality preference. This environment variable only applies to SDKs, not to the collector; see the [specification compliance matrix](https://github.com/open-telemetry/opentelemetry-specification/blob/main/spec-compliance-matrix.md#environment-variables) for the SDKs that support it.
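
To show what each preference changes per instrument, the following stdlib-only Python sketch reproduces the selection rules that the OpenTelemetry OTLP exporter specification defines for the `cumulative`, `delta`, and `lowmemory` values of `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE`. It is not an SDK API: `temporality_for` and the constants are hypothetical names, and real SDKs apply these rules inside their metric exporters. Gauges have no temporality and are left out.

```python
import os
import warnings
from typing import Optional

ENV_VAR = "OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE"

# Instrument kinds as named in the OpenTelemetry metrics API specification.
COUNTER = "Counter"
ASYNC_COUNTER = "Asynchronous Counter"
UP_DOWN_COUNTER = "UpDownCounter"
ASYNC_UP_DOWN_COUNTER = "Asynchronous UpDownCounter"
HISTOGRAM = "Histogram"

# Kinds that use delta temporality under each preference; every other kind stays
# cumulative. Up-down counters always stay cumulative: they report their current
# sum, which this page describes as the "last value".
DELTA_KINDS = {
    "cumulative": set(),
    "delta": {COUNTER, ASYNC_COUNTER, HISTOGRAM},
    "lowmemory": {COUNTER, HISTOGRAM},
}


def temporality_for(kind: str, preference: Optional[str] = None) -> str:
    """Return "delta" or "cumulative" for an instrument kind.

    When no preference is passed, it is read from the environment variable,
    defaulting to "cumulative" when the variable is unset.
    """
    if preference is None:
        preference = os.environ.get(ENV_VAR, "cumulative")
    preference = preference.strip().lower()  # enum values are case-insensitive
    if preference not in DELTA_KINDS:
        # Unrecognized values are ignored with a warning, keeping the default.
        warnings.warn(f"ignoring unsupported temporality preference {preference!r}")
        preference = "cumulative"
    return "delta" if kind in DELTA_KINDS[preference] else "cumulative"


if __name__ == "__main__":
    for kind in (COUNTER, ASYNC_COUNTER, UP_DOWN_COUNTER, HISTOGRAM):
        print(f"{kind}: {temporality_for(kind, 'lowmemory')}")
```

With `lowmemory`, this prints delta for the synchronous counter and the histogram and cumulative for the asynchronous counter and the up-down counter, matching the "low memory" column of the table.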

When the producer of `counter` or `histogram` metrics can't be configured with the "delta preferred" behavior to report them with `delta` temporal aggregation, the collector [`cumulativetodelta`](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/cumulativetodeltaprocessor) processor can be used to convert them from `cumulative` to `delta`.

> **Review comment:** I'm not sure we should recommend converting counter metrics to delta temporality. There are challenges with visualizing counter metrics today, but I'm not sure it's always worth doing a cumulative-to-delta conversion to avoid them. Plus, ES|QL will be enhanced with better support for counter rates. What remains is that queries need to be aware of the temporality of the metrics.

> **Review comment:** Histograms are the only reason we are asking for "delta preferred", because we can't store them otherwise. Doing this change has a side effect all the […] If we already have dashboards that rely on cumulative counters, then we need to not apply this conversion, which means we need to instruct users to apply this conversion to some metrics and not others, which brings another layer of complexity to the end user. In a sense, what we need is to apply […] I wonder if contributing the ability to set the time aggregation per metric type would be less painful than the path we are trying to take here.

Using the [`cumulativetodelta`](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/processor/cumulativetodeltaprocessor) processor does, however, involve some challenges, because it makes the processing stateful:

- metrics from a given producer must be sent to the same collector instance
- memory usage increases to keep track of per-metric state
- metrics need to be configured at the collector level to opt in to or out of this processing
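
The stateful nature of this conversion can be sketched in a few lines of stdlib Python. This is not the `cumulativetodelta` processor's implementation: the class name, the series key, and the choice to emit nothing for the first point of a series and after a detected reset are simplifying assumptions. It does show where the challenges listed come from: the converter must remember the last cumulative value of every series, so memory grows with the number of series, and every point of a series must reach the same converter instance.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# A series is identified by its metric name plus its sorted attribute pairs.
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


@dataclass
class CumulativeToDelta:
    """Simplified, in-memory cumulative-to-delta conversion for monotonic sums."""

    # Last cumulative value seen per series: this state grows with the number
    # of series and only exists in this one instance.
    last_seen: Dict[SeriesKey, float] = field(default_factory=dict)

    def convert(self, name: str, attributes: Dict[str, str], value: float) -> Optional[float]:
        key: SeriesKey = (name, tuple(sorted(attributes.items())))
        previous = self.last_seen.get(key)
        self.last_seen[key] = value
        if previous is None:
            # First point of the series: there is no baseline, so emit nothing.
            return None
        if value < previous:
            # The cumulative value went down: assume a reset and start a new baseline.
            return None
        return value - previous


if __name__ == "__main__":
    converter = CumulativeToDelta()
    # "requests" and the "host" attribute are hypothetical example values.
    for value in [10, 15, 15, 22, 3, 8]:
        print(converter.convert("requests", {"host": "a"}, value))
```

Run as a script, this prints `None`, `5`, `0`, `7`, `None` (the reset), and `5`. A second converter instance receiving only some of these points would compute wrong deltas, which is why a producer's metrics must all reach the same collector.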

As a consequence, the `cumulativetodelta` processor is recommended close to the edge, where metrics are produced, and is less suitable later in the data pipeline because of scalability challenges.

> **Review comment:** This section is looking at the issue from an SDK point of view. For example, the `OTEL_EXPORTER_OTLP_METRICS_TEMPORALITY_PREFERENCE` environment variable is only relevant to SDKs. Maybe point that out a little more clearly.