
Commit 48565ba

SylvainJuge, beniwohli, felixbarny, trentm, stuartnelson3 authored
OTel bridge spec (#516)
* add OpenTracing spec from apm#32
* add otel bridge spec
* cleanup + add label fallback
* add some clarification on server-side mapping
* fix wording specs/agents/tracing-api.md Co-authored-by: Benjamin Wohlwend <[email protected]>
* extend spec with fallbacks + context activations
* Apply suggestions from code review Co-authored-by: Felix Barnsteiner <[email protected]>
* remove opentracing from spec
* Apply suggestions from code review Co-authored-by: Felix Barnsteiner <[email protected]>
* Fix typo specs/agents/tracing-api-otel.md Co-authored-by: Trent Mick <[email protected]>
* Update tracing-api-otel.md add supported apm-server version for translation
* add span type 'db' to spec
* add type,subtype & resource algorithm
* add sub-sections for algorithms
* active context impl
* add gherkin spec
* Update specs/agents/README.md Co-authored-by: Trent Mick <[email protected]>
* Update specs/agents/tracing-api-otel.md Co-authored-by: Felix Barnsteiner <[email protected]>
* add status mapping + configurability
* update gherkin spec
* add a few clarifications
* clarify user-experience
* clarify bridge limitations
* MAY use labels for server < 7.16
* Update specs/agents/tracing-api-otel.md Co-authored-by: Colton Myers <[email protected]>
* clarify error capture + supported features

Co-authored-by: Benjamin Wohlwend <[email protected]>
Co-authored-by: Felix Barnsteiner <[email protected]>
Co-authored-by: Trent Mick <[email protected]>
Co-authored-by: stuart nelson <[email protected]>
Co-authored-by: Colton Myers <[email protected]>
1 parent 273953d commit 48565ba


4 files changed, +529 -2 lines changed


specs/agents/README.md

Lines changed: 1 addition & 0 deletions
@@ -55,6 +55,7 @@ You can find details about each of these in the [APM Data Model](https://www.ela
 - [Messaging systems](tracing-instrumentation-messaging.md)
 - [gRPC](tracing-instrumentation-grpc.md)
 - [GraphQL](tracing-instrumentation-graphql.md)
+- [OpenTelemetry API Bridge](tracing-api-otel.md)
 - [Error/exception tracking](error-tracking.md)
 - [Metrics](metrics.md)
 - [Logging Correlation](log-correlation.md)

specs/agents/tracing-api-otel.md

Lines changed: 277 additions & 0 deletions
@@ -0,0 +1,277 @@
## OpenTelemetry API (Tracing)

[OpenTelemetry](https://opentelemetry.io) (OTel for short) provides a vendor-neutral API that allows capturing tracing, logs and metrics data.

Agents MAY provide a bridge implementation of the OpenTelemetry Tracing API following this specification.
When available, the implementation MUST be configurable and should be disabled by default while it is marked as `experimental`.

The bridge implementation relies on APM Server version 7.16 or later. Agents SHOULD recommend this minimum version to users in the bridge documentation.

Bridging here means that for each OTel span created with the API, a native span/transaction will be created and sent to APM Server.

### User experience

At a high level, from the perspective of the application code, using the OTel bridge should not differ from using the
OTel API for tracing. See [limitations](#limitations) below for details on the currently unsupported OTel features.
For tracing, the support should include:
- creating spans with attributes
- context propagation
- capturing errors

The aim of the bridge is to allow any application or library instrumented with the OTel API to have its OTel spans
seamlessly delegated to Elastic APM spans/transactions. It also provides a vendor-neutral alternative to any existing
manual agent API with similar features.

One major difference, though, is that since the implementation of the OTel API is delegated to the Elastic APM agent,
any OTel configuration that might be present in the application code (OTel processor pipeline) or deployment
(environment variables) will be ignored.

### Limitations

The OTel API/specification goes beyond tracing; as a result, the following OTel features are not supported:
- metrics
- logs
- span events
- span links

### Spans and Transactions

OTel only defines Spans, whereas Elastic APM relies on both Spans and Transactions.
OTel allows users to provide the _remote context_ when creating a span, which is equivalent to providing a parent to a transaction or span.
It also allows providing a (local) parent span.

As a result, when creating Spans through the OTel API with a bridge, agents must implement the following algorithm:

```javascript
// otel_span contains the properties set through the OTel API
let span_or_transaction = null;
if (otel_span.remote_context != null) {
  span_or_transaction = createTransactionWithParent(otel_span.remote_context);
} else if (otel_span.parent == null) {
  span_or_transaction = createRootTransaction();
} else {
  span_or_transaction = createSpanWithParent(otel_span.parent);
}
```
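The selection rule above can be exercised with a small, self-contained sketch. It assumes a hypothetical `agent` object exposing `createTransactionWithParent`, `createRootTransaction` and `createSpanWithParent` (the names used in the pseudocode, not a real agent API) and a plain object standing in for the properties set through the OTel API. The branch order matters: a remote context takes precedence over a local parent.

```javascript
// Minimal sketch: decide whether a bridged OTel span becomes an Elastic
// transaction or span. `agent` is a hypothetical stand-in for the agent's
// internals; its three factory methods mirror the pseudocode names.
function createBridgedItem(otelSpan, agent) {
  // A remote context (e.g. extracted from incoming headers) always starts a
  // new transaction, even if a local parent is also set.
  if (otelSpan.remote_context != null) {
    return agent.createTransactionWithParent(otelSpan.remote_context);
  }
  // No remote context and no local parent: root of a new trace.
  if (otelSpan.parent == null) {
    return agent.createRootTransaction();
  }
  // Otherwise the OTel span is a child of a local span or transaction.
  return agent.createSpanWithParent(otelSpan.parent);
}

// Hypothetical fake agent that only records what would be created.
const fakeAgent = {
  createTransactionWithParent: (ctx) => ({ kind: 'transaction', parent: ctx }),
  createRootTransaction: () => ({ kind: 'transaction', parent: null }),
  createSpanWithParent: (parent) => ({ kind: 'span', parent }),
};

// Remote context wins: a transaction whose parent is 'remote-ctx'.
console.log(createBridgedItem({ remote_context: 'remote-ctx', parent: 'local' }, fakeAgent));
```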
### Span Kind

OTel spans have a `SpanKind` property ([specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#spankind)) which is close, but not strictly equivalent, to our definition of spans and transactions.

For both transactions and spans, an optional `otel.span_kind` property will be provided by agents when set through
the OTel API.
This value should be stored in Elasticsearch documents to preserve OTel semantics and help future OTel integration.

Possible values are `CLIENT`, `SERVER`, `PRODUCER`, `CONSUMER` and `INTERNAL`; refer to the [specification](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#spankind) for details on their semantics.

By default, the OTel API implementation sets the `SpanKind` of OTel spans to `INTERNAL`, so it is assumed to always be provided when using the bridge.

For existing agents without an OTel bridge, or for data captured without the bridge, APM Server has to infer the value of `otel.span_kind` with the following algorithm:

```javascript
let span_kind = null;
if (isTransaction(item)) {
  if (item.type == "messaging") {
    span_kind = "CONSUMER";
  } else if (item.type == "request") {
    span_kind = "SERVER";
  }
} else {
  // span
  if (item.type == "external" || item.type == "storage" || item.type == "db") {
    span_kind = "CLIENT";
  }
}

if (span_kind == null) {
  span_kind = "INTERNAL";
}
```

While optional, inferring the value of `otel.span_kind` helps keep the data model closer to the OTel specification, even if the original data was sent using the native agent protocol.

### Span status

OTel spans have a [Status](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#set-status)
field to indicate the status of the underlying task they represent.

When the [Set Status](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/trace/api.md#set-status) operation of the OTel API is used, its value can be mapped directly to `span.outcome`:
- OK => Success
- Error => Failure
- Unset (default) => Unknown

However, when the outcome is not provided explicitly, agents can infer it from the presence of a reported error.
Users of the OTel API, which has its own status, do not expect this inference, so bridged spans/transactions should NOT have their outcome
altered by reporting (or not reporting) an error. The behavior here should be identical to when the end user sets
the outcome explicitly: the value mapped from the OTel status takes priority over the inferred value.
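A minimal sketch of this precedence rule follows. The function name and the shape of its argument are hypothetical; the OTel status codes are written as `'OK'`, `'ERROR'` and `'UNSET'`, and outcomes use the agent protocol's lowercase `success`, `failure` and `unknown`. The native inference is deliberately simplified to "error reported means failure"; real agents apply their own, more detailed rules.

```javascript
// Maps an OTel status code to an Elastic `outcome` value.
const STATUS_TO_OUTCOME = { OK: 'success', ERROR: 'failure', UNSET: 'unknown' };

// Returns the outcome to report when a span/transaction ends.
// `bridged` is true for items created through the OTel bridge.
function resolveOutcome({ bridged, otelStatus, explicitOutcome, errorReported }) {
  if (bridged) {
    // Bridged items: the outcome comes only from the OTel status, which
    // defaults to UNSET; a reported error never changes it.
    return STATUS_TO_OUTCOME[otelStatus || 'UNSET'];
  }
  if (explicitOutcome) {
    // Native items where the user set the outcome through the agent API.
    return explicitOutcome;
  }
  // Native items without an explicit outcome: simplified error-based inference.
  return errorReported ? 'failure' : 'success';
}

console.log(resolveOutcome({ bridged: true, errorReported: true })); // unknown
console.log(resolveOutcome({ bridged: false, errorReported: true })); // failure
```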
### Attributes mapping

OTel relies on key-value pairs for span attributes.
Keys and values are protocol-specific and are defined in the [semantic conventions](https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification/trace/semantic_conventions) specification.

In order to minimize the mapping complexity in agents, most of the mapping between OTel attributes and the agent protocol will be delegated to APM Server:
- All OTel span attributes should be captured as-is and written to the agent protocol.
- APM Server will handle the mapping between OTel attributes and their native transaction/span equivalents.
- Some native span/transaction attributes will still require mapping within agents for [compatibility with existing features](#compatibility-mapping).

OpenTelemetry attributes should be stored in `otel.attributes` as a flat key-value mapping added to `span` and `transaction` objects:
```json
{
  // [...] other span/transaction attributes
  "otel": {
    "span_kind": "CLIENT",
    "attributes": {
      "db.system": "mysql",
      "db.statement": "SELECT * from table_1"
    }
  }
}
```

Starting with version 7.16, APM Server must provide a mapping that is equivalent to the native OpenTelemetry Protocol (OTLP) intake for the
fields provided in `otel.attributes`.

When sending data to APM Server versions before 7.16, agents MAY use span and transaction labels as a fallback to store OTel attributes and avoid dropping information.
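The sketch below shows one way an agent could serialize bridged attributes depending on the APM Server version. The function names, the `[major, minor]` version tuple and the choice to keep only string, number and boolean values as labels are assumptions made for illustration; any key or value sanitization the agent applies to labels is left out.

```javascript
// Returns true when the server version is at least 7.16.
function supportsOtelField([major, minor]) {
  return major > 7 || (major === 7 && minor >= 16);
}

// Hypothetical serializer for a bridged span/transaction object.
function serializeOtel(item, spanKind, attributes, serverVersion) {
  if (supportsOtelField(serverVersion)) {
    // APM Server >= 7.16: send attributes as-is, the server maps them.
    item.otel = { span_kind: spanKind, attributes: { ...attributes } };
  } else {
    // Older servers: fall back to labels so attributes are not dropped.
    // Assumption: only primitive values are kept as label values.
    item.labels = item.labels || {};
    for (const [key, value] of Object.entries(attributes)) {
      if (['string', 'number', 'boolean'].includes(typeof value)) {
        item.labels[key] = value;
      }
    }
  }
  return item;
}

const attrs = { 'db.system': 'mysql', 'db.statement': 'SELECT * from table_1' };
console.log(JSON.stringify(serializeOtel({ name: 'query' }, 'CLIENT', attrs, [7, 16])));
console.log(JSON.stringify(serializeOtel({ name: 'query' }, 'CLIENT', attrs, [7, 15])));
```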
### Compatibility mapping

Agents should ensure compatibility with the following features:
- breakdown metrics
- [dropped spans statistics](handling-huge-traces/tracing-spans-dropped-stats.md)
- [compressed spans](handling-huge-traces/tracing-spans-compress.md)

As a consequence, agents must provide values for the following attributes:
- `transaction.name` or `span.name`: value directly provided by the OTel API
- `transaction.type`: see inference algorithm below
- `span.type` and `span.subtype`: see inference algorithm below
- `span.destination.service.resource`: see inference algorithm below

#### Transaction type

```javascript
const a = transaction.otel.attributes;
const span_kind = transaction.otel.span_kind;
const isRpc = a['rpc.system'] !== undefined;
const isHttp = a['http.url'] !== undefined || a['http.scheme'] !== undefined;
const isMessaging = a['messaging.system'] !== undefined;
let type;
if (span_kind == 'SERVER' && (isRpc || isHttp)) {
  type = 'request';
} else if (span_kind == 'CONSUMER' && isMessaging) {
  type = 'messaging';
} else {
  type = 'unknown';
}
```
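To make the decision table concrete, here is the same logic wrapped in a standalone function with a few sample inputs. The function name and the attribute values are illustrative only. Note that a `CLIENT` span that ends up as a transaction (for example because it has no parent) falls through to `unknown`.

```javascript
// Standalone version of the transaction type inference above.
// `a` is the flat map of OTel attributes of the bridged transaction.
function inferTransactionType(spanKind, a) {
  const isRpc = a['rpc.system'] !== undefined;
  const isHttp = a['http.url'] !== undefined || a['http.scheme'] !== undefined;
  const isMessaging = a['messaging.system'] !== undefined;
  if (spanKind === 'SERVER' && (isRpc || isHttp)) return 'request';
  if (spanKind === 'CONSUMER' && isMessaging) return 'messaging';
  return 'unknown';
}

console.log(inferTransactionType('SERVER', { 'http.scheme': 'https' })); // request
console.log(inferTransactionType('CONSUMER', { 'messaging.system': 'kafka' })); // messaging
console.log(inferTransactionType('CLIENT', { 'http.url': 'http://example.com' })); // unknown
```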
#### Span type, sub-type and destination service resource

```javascript
const a = span.otel.attributes;
let type = undefined;
let subtype = undefined;
let resource = undefined;

const httpPortFromScheme = function (scheme) {
  if ('http' == scheme) {
    return 80;
  } else if ('https' == scheme) {
    return 443;
  }
  return -1;
};

// extracts 'host' or 'host:port' from URL
const parseNetName = function (url) {
  const u = new URL(url); // https://developer.mozilla.org/en-US/docs/Web/API/URL
  if (u.port != '') {
    return u.host; // host:port already in URL
  } else {
    const port = httpPortFromScheme(u.protocol.substring(0, u.protocol.length - 1));
    return port > 0 ? u.host + ':' + port : u.host;
  }
};

const peerPort = a['net.peer.port'];
let netName = a['net.peer.name'] || a['net.peer.ip'];

if (netName && peerPort > 0) {
  netName += ':';
  netName += peerPort;
}

if (a['db.system']) {
  type = 'db';
  subtype = a['db.system'];
  resource = netName || subtype;
  if (a['db.name']) {
    resource += '/';
    resource += a['db.name'];
  }

} else if (a['messaging.system']) {
  type = 'messaging';
  subtype = a['messaging.system'];

  if (!netName && a['messaging.url']) {
    netName = parseNetName(a['messaging.url']);
  }
  resource = netName || subtype;
  if (a['messaging.destination']) {
    resource += '/';
    resource += a['messaging.destination'];
  }

} else if (a['rpc.system']) {
  type = 'external';
  subtype = a['rpc.system'];
  resource = netName || subtype;
  if (a['rpc.service']) {
    resource += '/';
    resource += a['rpc.service'];
  }

} else if (a['http.url'] || a['http.scheme']) {
  type = 'external';
  subtype = 'http';

  if (a['http.host'] && a['http.scheme']) {
    resource = a['http.host'] + ':' + httpPortFromScheme(a['http.scheme']);
  } else if (a['http.url']) {
    resource = parseNetName(a['http.url']);
  }
}

if (type === undefined) {
  if (span.otel.span_kind == 'INTERNAL') {
    type = 'app';
    subtype = 'internal';
  } else {
    type = 'unknown';
  }
}
span.type = type;
span.subtype = subtype;
span.destination.service.resource = resource;
```
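The two URL helpers above can be checked in isolation. This standalone copy (runnable in Node.js, where `URL` is a global) illustrates why `parseNetName` returns `u.host` rather than `u.hostname`: `host` keeps a non-default port, while the WHATWG `URL` parser drops a port that matches the scheme's default, so that port is added back from the scheme for `http` and `https` only. The sample URLs are made up.

```javascript
// Standalone copies of the helpers from the span type algorithm.
function httpPortFromScheme(scheme) {
  if (scheme === 'http') return 80;
  if (scheme === 'https') return 443;
  return -1;
}

// Extracts 'host' or 'host:port' from a URL string.
function parseNetName(url) {
  const u = new URL(url);
  if (u.port !== '') {
    // Non-default port: `host` already contains 'hostname:port'.
    return u.host;
  }
  // Default or missing port: add it back for http/https only.
  const port = httpPortFromScheme(u.protocol.slice(0, -1));
  return port > 0 ? `${u.host}:${port}` : u.host;
}

for (const url of [
  'https://example.com/path', // implied default port -> example.com:443
  'http://example.com:80/', // explicit default port is normalized away, then restored
  'http://example.com:8080/x', // non-default port kept as-is
  'amqp://broker:5672/vhost', // non-http scheme with explicit port
  'amqp://broker/vhost', // non-http scheme without port -> host only
]) {
  console.log(url, '->', parseNetName(url));
}
```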
### Active Spans and Context

When possible, the bridge implementation MUST ensure proper interoperability between Elastic transactions/spans and OTel spans when
used from their respective APIs:
- After activating an Elastic span via the agent's API, the [`Context`] returned via the [get current context API] should contain that Elastic span.
- When an OTel context is [attached] (aka activated), the [get current context API] should return the same [`Context`] instance.
- Starting an OTel span in the scope of an active Elastic span should make the OTel span a child of the Elastic span.
- Starting an Elastic span in the scope of an active OTel span should make the Elastic span a child of the OTel span.

[`Context`]: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/context/context.md
[attached]: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/context/context.md#attach-context
[get current context API]: https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/context/context.md#get-current-context

OTel and our agents each have their own definition of what the "active context" is, for example:
- Java Agent: the Elastic active context is implemented as a thread-local stack
- Java OTel API: the active context is implemented as a key-value map propagated through a thread local

In order to avoid potentially complex and tedious synchronization issues between OTel and our existing agent
implementations, the bridge implementation SHOULD provide an abstraction to have a single "active context" storage.
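As an illustration only, the sketch below models such an abstraction as one stack shared by both APIs, which is enough to satisfy the four interoperability rules listed above. `ContextStorage`, `activateElastic`, `attachOtel`, `startElasticSpan`, `startOtelSpan` and the `{ span }` context objects are all hypothetical; a real agent would need per-thread or async-aware storage (for example `AsyncLocalStorage` in Node.js) rather than a single global stack.

```javascript
// Hypothetical single "active context" storage shared by the agent API and
// the OTel bridge. A plain stack is used for simplicity.
class ContextStorage {
  constructor() {
    this.stack = [{ span: null }]; // root context with no active span
  }
  current() {
    return this.stack[this.stack.length - 1];
  }
  // Pushes a context and returns a function that restores the previous one.
  attach(context) {
    this.stack.push(context);
    return () => {
      if (this.current() !== context) throw new Error('out-of-order detach');
      this.stack.pop();
    };
  }
}

const storage = new ContextStorage();

// Both APIs parent new spans on whatever span is active in the shared storage.
const startElasticSpan = (name) => ({ name, parent: storage.current().span });
const startOtelSpan = (name) => ({ name, parent: storage.current().span });
// Agent API: activating an Elastic span creates a context that contains it.
const activateElastic = (span) => storage.attach({ span });
// OTel API: attaching a context stores that exact instance.
const attachOtel = (context) => storage.attach(context);

// Usage: interleave both APIs.
const transaction = startElasticSpan('GET /');
const detachTransaction = activateElastic(transaction);
const otelSpan = startOtelSpan('db query'); // child of the Elastic transaction
const otelContext = { span: otelSpan };
const detachOtelContext = attachOtel(otelContext);
const sameInstance = storage.current() === otelContext; // same Context instance
const elasticSpan = startElasticSpan('render'); // child of the OTel span
detachOtelContext();
detachTransaction();
console.log(otelSpan.parent.name, '|', elasticSpan.parent.name, '|', sameInstance);
```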

specs/agents/tracing-api.md

Lines changed: 5 additions & 2 deletions
@@ -1,6 +1,9 @@
 ## Tracer APIs

-All agents must provide an API to enable developers to instrument their applications manually, in addition to any automatic instrumentation. Agents document their APIs in the elastic.co docs:
+All agents must provide a native API to enable developers to instrument their applications manually, in addition to any
+automatic instrumentation.
+
+Agents document their APIs in the elastic.co docs:

 - [Node.js Agent](https://www.elastic.co/guide/en/apm/agent/nodejs/current/api.html)
 - [Go Agent](https://www.elastic.co/guide/en/apm/agent/go/current/api.html)
@@ -10,4 +13,4 @@ All agents must provide an API to enable developers to instrument their applicat
 - [Ruby Agent](https://www.elastic.co/guide/en/apm/agent/ruby/current/api.html)
 - [RUM JS Agent](https://www.elastic.co/guide/en/apm/agent/js-base/current/api.html)

-In addition to each agent having a "native" API for instrumentation, they also implement the [OpenTracing APIs](https://opentracing.io). Agents should align implementations according to https://github.com/elastic/apm/issues/32.
+In addition, each agent may provide "bridge" implementations of the vendor-neutral [OpenTelemetry API](tracing-api-otel.md).
