Skip to content

feat: OpenTelemetry Bridge #2641

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 78 commits into from
May 11, 2022
Merged

feat: OpenTelemetry Bridge #2641

merged 78 commits into from
May 11, 2022

Conversation

trentm
Copy link
Member

@trentm trentm commented Apr 7, 2022

The APM agent can act as an OpenTelemetry JS SDK that can be started with:
node -r elastic-apm-node/opentelemetry-sdk myapp.js
The usual OTel JS API can then be used in myapp.js for manual tracing.
Currently just OTel tracing, the OTel metrics API isn't yet supported.

Fixes: #2369

Current status

All working. I need to add user docs and the changelog entry. I need to check if there are any TypeScript snafus -- I don't think there should be, given it is just the straight @opentelemetry/api API.

Reviewer notes

Start here: https://github.com/elastic/apm-agent-nodejs/blob/trentm/otel-bridge/lib/opentelemetry-bridge/README.md

Checklist

  • Implement code
  • Add tests
  • Update documentation
  • Add CHANGELOG.asciidoc entry
  • add deprecation notice to elastic-apm-node-opentracing repo and to its npm package

trentm added 2 commits April 7, 2022 14:12
This commit is a first stab that is mostly just stubbing out adding
global providers (e.g. `api.trace.setGlobalTracerProvider`). It
includes a first very incomplete Tracer.startSpan() such that this:

    cd examples/otelbridge1
    npm install
    node -r elastic-apm-node/opentelemetry.js example-http-request.js

yields a trace like:

    trace 266fcd
    `- transaction ba0b1f "makeRequest" (218.075ms, outcome=unknown)

IOW, context management isn't being handled yet, so instrumentation that
adds spans does nothing.
…ntext class; add no-op setValue et al to RunContext so it supports OTel 'Context' interface
@trentm trentm self-assigned this Apr 7, 2022
@github-actions github-actions bot added the agent-nodejs Make available for APM Agents project planning. label Apr 7, 2022
@apmmachine
Copy link
Contributor

apmmachine commented Apr 7, 2022

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2022-05-11T22:13:34.073+0000

  • Duration: 21 min 42 sec

Test stats 🧪

Test Results
Failed 0
Passed 255639
Skipped 0
Total 255639

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • run module tests for <modules> : Run TAV tests for one or more modules, where <modules> can be either a comma separated list of modules (e.g. memcached,redis) or the string literal ALL to test all modules

  • run benchmark tests : Run the benchmark test only.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

trentm added 25 commits April 7, 2022 17:00
The examples/otelbridge/example-http-request.js now generates:

    trace 2f773e
    `- transaction f87f78 "makeRequest" (125.801ms, outcome=unknown)
       `- span c4b91a "GET httpstat.us" (121.768ms, http)

because the otel.trace.with() context (with the transaction) is
now passed through properly.
Also drop eapm from simple test case to not confuse the issue and to
ensure there is no future accidental side-effect on having directly
imported the APM agent.
…geNonRecordingSpan concept (probably still underdeveloped)
Also implement the {get,set,delete}Value methods on RunContext so that
OTel Context values are now propagated.
This takes the coming changes from #2653 so that tests pass with
this restriction lifted.
@trentm
Copy link
Member Author

trentm commented May 7, 2022

The user docs and changelog entry remain, but otherwise this is ready for review.

@trentm trentm marked this pull request as ready for review May 7, 2022 01:00
@trentm trentm requested a review from astorm May 7, 2022 01:00
Copy link
Contributor

@astorm astorm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left some comments with minor places things might be improved but I'm generally OK with getting this going in as-is once we've

  1. Documented the new configuration for enabling/disabling
  2. Added a change log entry

With this many files/new lines-of-code I kept my review more on a macro level -- I focused on ensuring the examples/apis worked/behaved logically and ensuring that the new code wouldn't impact the current agent behavior/stability in a negative way.

From what I can see the configuration does isolate most of the the new behavior, and things that cross over (introduction of the root context, serializing/encoding the additional span information, etc.) make sense and stable to me.

If there's a specific area of the bridge you think needs eyes on it please let me know and I'll take a deeper look, otherwise 👍

const transOpts = otelSpanOptions.startTime === undefined
? {}
: { startTime: epochMsFromOTelTimeInput(otelSpanOptions.startTime) }
newTransOrSpan = this._ins.createTransaction(name, transOpts)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed the create transaction calls end up creating transactions of unknown type. Unsure if there's a better choice of it unknown is acceptable for now
.
Screen Shot 2022-05-10 at 12 07 09 PM

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is my understanding of the OTel tracing spec: https://github.com/elastic/apm/blob/main/specs/agents/tracing-api-otel.md#compatibility-mapping

There is an inference algorithm for setting transaction.type. It defaults to 'unknown' (rather than our API's effective default of 'custom' for apm.startTransaction(name)) unless there are some particular span kind and attribute values.

@trentm
Copy link
Member Author

trentm commented May 11, 2022

This may be a good reason for us to eventually cordon the OpenTelemetry Bridge off into its own module that users can add to their project if they want. That would keep the 2.6MB out of the agent proper.

I'd considered a separate package, but ultimately rejected it. Here are my notes from that:

Notes on whether to have a separate package for the OTel Bridge

(For comparison, the old OpenTracing bridge is/was a separate package.)

Separate package, e.g. '@elastic/opentelemetry-sdk-node' or whatever name, or just a part of current 'elastic-apm-node'?

Pros (for a separate package):

  • Package size. npm install @opentelemetry/api adds about 1.4M to our current install size of ~16M. While not a huge concern, this could be relevant for Lambda environments where the size of all layers is limited.
  • Supporting multiple major versions of OpenTelemetry is cleaner. If/when @opentelemetry/[email protected] is released, to support both with a given release of the APM Agent would require the agent (a) installing both major vers of the OTel API package and (b) registering global providers for both major versions.
  • Might facilitate writing the otel bridge in TypeScript, making the whole repo a TypeScript project. This assumes writing this in TypeScript is considered a good/helpful thing. :) The type checking against OTel API interfaces would likely be nice. Compilation/source-maps etc. would be a pain... but that's mostly learning curve.

Cons:

  • Will it be possible to implement the Bridge using just the public APM agent API? Seems unlikely. Do we need to restrict to a public API? For this reason we are using the same package. There is a fair amount of internal API usage for bridging.
  • Maintaining more packages can be a pain. For example, fixes/changes to dependent behaviour can often lead to two PRs (one in each repo) and release coordination, rather than just a single PR to one repo. (There are monorepos. Possible, but might be a big change.)

Pro or Con? Not sure:

  • Would OTel-specific compatiblity tests be easier in a separate package and repo?
  • What user experience do we want for users that are going to use the OTel Bridge? If the eventual goal or expectation is OTel API preference, then the "start the agent as an OTel provider" will become more important. Is that user experience nicer or more obtuse with a separate package? If the API available from require('whatever our package name') is just about OTel APIs, then maybe that is less confusing for the user.
  • Adding more packages that can be an entry point for the agent can be confusing for users, take this state, for example:
    @opentelemetry/node
    @opentelemetry/api
    @opentelemetry/core
    @opentelemetry/auto-instrumentations-node
    @opentelemetry/sdk-node
    @opentelemetry/node-sdk
    @opentelemetry/sdk-trace-node
    @sumologic/opentelemetry-node
    lightstep-opentelemetry-launcher-node
    @aspecto/opentelemetry
    @newrelic/telemetry-sdk
    @splunk/otel
    

@trentm
Copy link
Member Author

trentm commented May 11, 2022

... once ...

  1. Documented the new configuration for enabling/disabling

Added in 8ed5623

Rendered version here: https://apm-agent-nodejs_2641.docs-preview.app.elstc.co/diff
The main added doc page is: https://apm-agent-nodejs_2641.docs-preview.app.elstc.co/guide/en/apm/agent/nodejs/master/opentelemetry-bridge.html

@trentm trentm merged commit 60740b0 into main May 11, 2022
@trentm trentm deleted the trentm/otel-bridge branch May 11, 2022 23:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
agent-nodejs Make available for APM Agents project planning.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[META 524] OpenTelemetry Bridge
3 participants