Description
Context
Histograms are one of the metric types supported by the Prometheus toolkit. In general, histograms provide pre-aggregated numerical values grouped into buckets.
In our Prometheus integration we currently support the histogram type through the Use types option. When this option is enabled, we retrieve Prometheus metrics categorised as histograms and index them in Elasticsearch. We have identified that supporting histograms in Elasticsearch requires specific pre-processing at index time in our integration package. Additionally, related efforts (1165, 26903) revealed possible enhancements that can be added to our code.
Diagnosis
Users have reported differences between the histograms scraped from Prometheus and the ones we store in Elasticsearch. This revealed the extra calculation we perform at ingestion time, and also the need to document and explain the procedure to our users.
Prometheus Buckets scraped:
Bucket | Value |
---|---|
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="+Inf"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.1"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.2"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.4"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="1"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="120"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="20"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="3"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="60"} | 1 |
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="8"} | 1 |
Elasticsearch histograms we ingest (retrieved from Kibana Discover):

    "prometheus": {
      "prometheus_http_request_duration_seconds": {
        "histogram": {
          "counts": [0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
          "values": [0.05, 0.15000000000000002, 0.30000000000000004, 0.7, 2, 5.5, 14, 40, 90, 180]
        }
      },
      "labels": {
        "handler": "/api/v1/label/:name/values",
        "instance": "prometheus-server-server.kube-system:80",
        "job": "prometheus"
      }
    }

Questions that we need to answer:
- Why are the le bucket values different from the ones we see in Elastic?
- What is the value in Elasticsearch of the +Inf bucket?
(Code Ref) In our example, the +Inf bucket is indexed as 120 + (120 - 60) = 180, which matches the last entry (180) of the values array.
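The arithmetic behind the values array can be sketched as follows. This is a reading of the numbers shown in this issue, not a copy of histogram.go: it assumes each finite bucket is represented by the midpoint between its upper bound and the previous one (starting from 0), and that +Inf is extrapolated by the width of the last finite bucket, as in the 120 + (120 - 60) example. The function name `centroids` is hypothetical.

```python
import math


def centroids(upper_bounds):
    """Map Prometheus `le` upper bounds to one representative value per bucket.

    Assumption: midpoint rule for finite buckets, extrapolation for +Inf.
    """
    bounds = sorted(upper_bounds)  # numeric order; +Inf sorts last
    values = []
    last_upper = 0.0  # upper bound of the previous finite bucket
    prev_upper = 0.0  # upper bound of the bucket before that
    for upper in bounds:
        if math.isinf(upper):
            # Extrapolate by the width of the last finite bucket.
            values.append(last_upper + (last_upper - prev_upper))
        else:
            # Midpoint between the previous upper bound and this one.
            values.append(last_upper + (upper - last_upper) / 2.0)
            prev_upper, last_upper = last_upper, upper
    return values


# `le` labels in the order they appear in the scraped table.
le = [float("inf"), 0.1, 0.2, 0.4, 1, 120, 20, 3, 60, 8]
print(centroids(le))
```

Under these assumptions the ten results line up one-to-one with the values array above, which suggests the indexed values are bucket centroids rather than the original le bounds.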
Additionally, for prometheus_http_request_duration_seconds, Prometheus exposes the prometheus_http_request_duration_seconds_count and prometheus_http_request_duration_seconds_sum series:
prometheus_http_request_duration_seconds_count{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus"}: 1
prometheus_http_request_duration_seconds_sum{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus"}: 1
The count and sum values are not returned by our code, so they are not present in Elasticsearch. Is there any valid scenario where they might be needed?
Also, the prometheus_http_request_duration_seconds_histogram field cannot be searched or used in filters in Kibana Discover.
Action
This story summarizes the actions, grouped by category, that are needed to enhance Prometheus histogram support in our integration:
Code Enhancements:
- Account for negative count values inside initial buckets
- Use the preceding bucket's value for +Inf "le"
- For the first bucket only: if it has a negative "le", use the value as-is; otherwise use half its value (midpoint to zero)
- Investigate whether we need to provide sum and count values in addition to the ones we provide now
- Can we retrieve and index histogram buckets exactly as retrieved from Prometheus? If not, we need to document this; if yes, we need to evaluate whether to support it as a new enhancement in the code. Are there any Elasticsearch limitations that prevent us from doing this?
Kibana Support:
- We need to create a visualisation based on histograms, and understand the functions recommended for use with histograms, such as aggregations and buckets.
Check available use cases of histograms here
Documentation Enhancement:
- Document and explain the current centroid calculation (https://github.com/elastic/beats/blob/main/x-pack/metricbeat/module/prometheus/collector/histogram.go#L34)
- Document why we have chosen to implement only the T-Digest type of histogram (see comment here). We will probably need to sync with the Elasticsearch team to better understand the reasoning behind this choice
Deliverables
- Relevant code improvements in Prometheus code base
- Documentation updates that will explain the end-to-end user journey and support of Histogram type
Relevant Links
- https://www.elastic.co/guide/en/elasticsearch/reference/current/histogram.html (we support only the T-Digest type of histogram)
- Prometheus code base