Skip to content

a span ended in an async function can become of a parent of spans after that function ends #2611

@trentm

Description

@trentm

The Problem

const apm = require('elastic-apm-node').start()
  
async function somethingToAwait () {
  return 42
}

async function firstThing () {
  // await Promise.resolve()
  const span = apm.startSpan('firstThing')
  await somethingToAwait()
  span.end()
}

async function secondThing () {
  const span = apm.startSpan('secondThing')
  // ...
  span.end()
}

async function main () {
  const trans = apm.startTransaction('myTransaction')
  await firstThing()
  await secondThing()
  trans.end()
}

main()

Tracing this results in this trace:

    trace 5d54ad
    `- transaction 731c00 "myTransaction"
       `- span 67b9ad "firstThing"
         `- span 2f21b1 "secondThing"

The problem is that secondThing is a child of firstThing when we would not expect it to be so.

Workaround

If you comment out await somethingToAwait() (i.e. have an async function that does not use await) then the parent/child relationships are as expected. However, obviously, avoiding await isn't a reasonable solution.

    trace b877f3
    `- transaction 6c3739 "myTransaction"
       `- span 285f98 "firstThing"
       `- span 54ac1b "secondThing"

A different workaround is to add await Promise.resolve() before apm.startSpan(...).
This is effectively the solution that Dario has used in @kbn/apm-utils's withSpan(): https://github.com/elastic/kibana/blob/595b6948a1492a04fff14c8d6b2f8b50308681c0/packages/kbn-apm-utils/src/index.ts#L54

This workaround is still required after the run context work, as discuss in the comments starting at: #2181 (comment)

(/cc @AlexanderWert This is the same thing you have hit in your Lambda example.)

To be continued...

tl;dr: Execution of an async function creates and returns a Promise either when it hits return (or the end of the function) or when it hits the first await. The way run context (a.k.a. async context) tracking is done with async_hooks, is to (indirectly) attach the current tracing state to newly created async tasks. When await somethingToAwait() is hit, the currentSpan is Span firstThing, and this is returned to the calling function main. When main resumes, its current execution async task is the same as that promise. By design, it doesn't matter that Span firstThing has ended, it is still used as the parent for the subsequently created Span secondThing. Why "by design"? See https://github.com/elastic/apm-agent-nodejs/blob/main/test/instrumentation/run-context/fixtures/parentage-with-ended-span.js

TODO: I'll add some details, and why I think an automatic fix for startSpan() may not be feasible, and adding a withSpan API (akin to OTel's withActiveSpan() and similar in other tracers) is an appropriate fix.

Metadata

Metadata

Assignees

Labels

agent-nodejsMake available for APM Agents project planning.

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions