Enhance Histogram feature implementation of Prometheus Server Integration #5042

@gizas

Description

Context

Histograms are one of the metric types supported by the Prometheus toolkit. In general, histograms provide pre-aggregated numerical values grouped into buckets.

In our Prometheus Integration we currently support the histogram type through the Use types option. When this option is enabled, we retrieve Prometheus metrics categorised as histograms and index them in Elasticsearch. We have identified that supporting histograms in Elasticsearch requires specific pre-processing at index time in our integration package. Additionally, related efforts (1165, 26903) revealed possible enhancements that can be added to our code.

Diagnosis

Users have reported differences between the histograms scraped from Prometheus and the ones we save in Elasticsearch. This revealed the extra calculation we perform at ingestion time, and also the need to document and explain the procedure to our users.

Prometheus Buckets scraped:

Bucket Value
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="+Inf"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.1"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.2"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="0.4"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="1"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="120"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="20"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="3"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="60"} 1
prometheus_http_request_duration_seconds_bucket{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus", le="8"} 1

Elasticsearch histograms we ingest (retrieved from Kibana Discover):

"prometheus": {
      "prometheus_http_request_duration_seconds": {
        "histogram": {
          "counts": [
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0,
            0
          ],
          "values": [
            0.05,
            0.15000000000000002,
            0.30000000000000004,
            0.7,
            2,
            5.5,
            14,
            40,
            90,
            180
          ]
        }
      },
      "labels": {
        "handler": "/api/v1/label/:name/values",
        "instance": "prometheus-server-server.kube-system:80",
        "job": "prometheus"
      }

Questions that we need to answer:

  • Why are the le bucket values different from the ones we see in Elastic?
  • What is the value in Elasticsearch of the +Inf bucket?
    (Code Ref) In our example: 120 + (120 - 60) = 180, which matches the last value in the list, 180.
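The relationship between the two listings above can be sketched as follows. This is a minimal illustration inferred from the numbers in this issue, not a copy of the integration code: each finite bucket is represented by the midpoint between the previous upper bound and its own `le`, and the `+Inf` bucket is extrapolated by the width of the last finite bucket. The function name `es_bucket_values` is hypothetical, and the bounds are assumed to be sorted numerically (the scraped listing above is sorted lexically, so `120` appears before `20`).

```python
import math


def es_bucket_values(upper_bounds):
    """Map numerically sorted Prometheus `le` bounds to one value per bucket.

    Assumption: finite buckets use the midpoint of (previous bound, le), with
    the first bucket starting at zero; +Inf extends the previous bound by the
    width of the preceding bucket.
    """
    values = []
    for i, le in enumerate(upper_bounds):
        prev = upper_bounds[i - 1] if i > 0 else 0.0
        if math.isinf(le):
            # No finite upper bound: reuse the width of the preceding bucket.
            before_prev = upper_bounds[i - 2] if i > 1 else 0.0
            values.append(prev + (prev - before_prev))
        else:
            values.append(prev + (le - prev) / 2)
    return values


# The le bounds from the scraped buckets, sorted numerically.
bounds = [0.1, 0.2, 0.4, 1, 3, 8, 20, 60, 120, math.inf]
print(es_bucket_values(bounds))
```

Under these assumptions the output lines up with the `values` array shown in the Elasticsearch document, including the floating-point artefacts on the second and third entries.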

Additionally, for prometheus_http_request_duration_seconds, Prometheus exposes prometheus_http_request_duration_seconds_count and prometheus_http_request_duration_seconds_sum, both with value 1 in this example:

prometheus_http_request_duration_seconds_count{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus"} 1

prometheus_http_request_duration_seconds_sum{handler="/api/v1/label/:name/values", instance="localhost:9090", job="prometheus"} 1

Count and Sum values are not returned by our code, so they are not present in Elasticsearch. Is there any valid scenario where they might be needed?

Also, the prometheus_http_request_duration_seconds_histogram field is not available for searching or filtering in Kibana Discover.

[Screenshot 2023-01-19 at 2 59 27 PM]

Compared to other fields:
[Screenshot 2023-01-19 at 3 01 45 PM]

Action

This story summarizes the actions we have identified as necessary to enhance Prometheus histogram support in our integration:

Code Enhancements:

  • Account for negative count values inside initial buckets
  • Use the preceding bucket's value for +Inf "le"
  • For the first bucket only: if it has a negative "le", use the value as-is; otherwise use half its value (midpoint to zero)
  • Investigate whether we need to provide sum and count values in addition to the ones we provide now
  • Can we retrieve and index histogram buckets exactly as retrieved from Prometheus? If not, we need to document this. If yes, we need to evaluate whether to support it as a new enhancement in the code. Are there any Elasticsearch limitations that prevent us from doing this?
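The first three items in the list above could look roughly like the sketch below. It is one possible reading of those bullets, not the agreed design: `proposed_histogram` is a hypothetical name, "the preceding bucket's value" is assumed to mean the preceding finite `le` bound, and "negative count values" is assumed to mean a per-bucket count that turns negative when the cumulative Prometheus counts are de-accumulated, which is clamped to zero here.

```python
import math


def proposed_histogram(upper_bounds, cumulative_counts):
    """Sketch of the proposed rules for one scrape of a histogram.

    upper_bounds: numerically sorted `le` bounds, +Inf last.
    cumulative_counts: Prometheus bucket counts (cumulative by definition).
    Returns (values, counts) with one entry per bucket.
    """
    values, counts = [], []
    prev_cumulative = 0
    for i, (le, cumulative) in enumerate(zip(upper_bounds, cumulative_counts)):
        if math.isinf(le):
            # Assumption: +Inf reuses the preceding finite upper bound.
            value = upper_bounds[i - 1] if i > 0 else 0.0
        elif i == 0:
            # First bucket: negative bound as-is, otherwise midpoint to zero.
            value = le if le < 0 else le / 2
        else:
            prev = upper_bounds[i - 1]
            value = prev + (le - prev) / 2
        values.append(value)
        # De-accumulate and clamp, so inconsistent input never yields
        # a negative per-bucket count (assumed interpretation).
        counts.append(max(cumulative - prev_cumulative, 0))
        prev_cumulative = cumulative
    return values, counts


values, counts = proposed_histogram([0.1, 0.2, 0.4, math.inf], [1, 1, 1, 1])
print(values, counts)
```

Compared with the current behaviour described earlier in this issue, the only visible change for the example metric would be the last bucket value, which would no longer be extrapolated past the largest finite bound.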

Kibana Support:

  • We need to create a visualisation based on histograms, and understand the different functions that are suggested for use with histograms, such as aggregations and buckets.

Check available Use Cases of histograms here

Documentation Enhancement:

Deliverables

  • Relevant code improvements in Prometheus code base
  • Documentation updates that will explain the end-to-end user journey and support of Histogram type

Relevant Links

Useful External links
