Cardinality for Elasticsearch span names is too high #439

@felixbarny


Currently, the span name is Elasticsearch: ${http.method} ${http.url.path}. The issue is that the path can contain document IDs, such as GET /customers/_doc/42.

This is problematic if we want to roll up metrics based on span.name, because every distinct document ID produces a distinct span name.

Agents may only have access to the HTTP request info that the low-level RestClients expose.

An idea that needs further validation is to remove all path segments that come after the first path segment that begins with an underscore. Examples:

  • GET /customers/_doc/42 -> GET /customers/_doc
  • GET /_cluster/health/my-index-000001 -> GET /_cluster
  • GET /_alias/my-alias -> GET /_alias
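A minimal sketch of that truncation rule, to make the examples above concrete. The function names `sanitize_path` and `span_name` are hypothetical and not part of any agent API, and the sketch assumes the path has already been separated from any query string:

```python
def sanitize_path(path: str) -> str:
    """Keep path segments up to and including the first one that starts with '_'.

    Hypothetical helper: assumes `path` carries no query string or fragment.
    """
    kept = []
    for segment in path.split("/"):
        if not segment:
            # Skip empty pieces produced by leading, trailing, or doubled slashes.
            continue
        kept.append(segment)
        if segment.startswith("_"):
            # Everything after the first underscore-prefixed segment is dropped.
            break
    return "/" + "/".join(kept)


def span_name(method: str, path: str, prefix: str = "Elasticsearch: ") -> str:
    """Build the span name from the HTTP method and the truncated path."""
    return f"{prefix}{method} {sanitize_path(path)}"


if __name__ == "__main__":
    for method, path in [
        ("GET", "/customers/_doc/42"),
        ("GET", "/_cluster/health/my-index-000001"),
        ("GET", "/_alias/my-alias"),
    ]:
        print(span_name(method, path))
```

A path without any underscore-prefixed segment is returned unchanged by this sketch, so an index name that appears alone (for example `/customers`) would still contribute to cardinality. The `prefix` parameter is only there so the same helper works whether or not the Elasticsearch: prefix is kept.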

While this strategy might sometimes result in lower cardinality than necessary (for example, keeping GET /_cluster/health/my-index-000001 or GET /_cluster/health as the span name would not be problematic), it's a simple strategy that should work quite generically.

As this would chop off some parts of the URL, it becomes important for agents to collect the full URL in context.http.url. This overlaps with elastic/apm-agent-nodejs#2019.

While at it, we should also consider dropping the Elasticsearch: prefix as suggested here: #420 (comment)
