Introduced trace post-processing #6800
Benchmarks
Startup
Summary
Found 0 performance improvements and 0 performance regressions! Performance is the same for 49 metrics, 14 unstable metrics.
Startup time reports for petclinic
gantt
title petclinic - global startup overhead: candidate=1.33.0-SNAPSHOT~ae1ecea607, baseline=1.33.0-SNAPSHOT~90a6e4126b
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.076 s) : 0, 1075677
Total [baseline] (10.386 s) : 0, 10385832
Agent [candidate] (1.082 s) : 0, 1081685
Total [candidate] (10.41 s) : 0, 10409678
section appsec
Agent [baseline] (1.205 s) : 0, 1204729
Total [baseline] (10.511 s) : 0, 10511249
Agent [candidate] (1.205 s) : 0, 1204655
Total [candidate] (10.435 s) : 0, 10435305
section iast
Agent [baseline] (1.199 s) : 0, 1198556
Total [baseline] (10.707 s) : 0, 10707283
Agent [candidate] (1.198 s) : 0, 1197673
Total [candidate] (10.837 s) : 0, 10837312
section profiling
Agent [baseline] (1.266 s) : 0, 1265901
Total [baseline] (10.624 s) : 0, 10624017
Agent [candidate] (1.268 s) : 0, 1268075
Total [candidate] (10.656 s) : 0, 10655848
gantt
title petclinic - break down per module: candidate=1.33.0-SNAPSHOT~ae1ecea607, baseline=1.33.0-SNAPSHOT~90a6e4126b
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (673.774 ms) : 0, 673774
BytebuddyAgent [candidate] (678.246 ms) : 0, 678246
GlobalTracer [baseline] (309.564 ms) : 0, 309564
GlobalTracer [candidate] (311.037 ms) : 0, 311037
AppSec [baseline] (49.642 ms) : 0, 49642
AppSec [candidate] (49.58 ms) : 0, 49580
Remote Config [baseline] (666.389 µs) : 0, 666
Remote Config [candidate] (655.047 µs) : 0, 655
Telemetry [baseline] (7.614 ms) : 0, 7614
Telemetry [candidate] (7.534 ms) : 0, 7534
section appsec
BytebuddyAgent [baseline] (699.616 ms) : 0, 699616
BytebuddyAgent [candidate] (699.182 ms) : 0, 699182
GlobalTracer [baseline] (292.955 ms) : 0, 292955
GlobalTracer [candidate] (292.925 ms) : 0, 292925
AppSec [baseline] (150.216 ms) : 0, 150216
AppSec [candidate] (149.764 ms) : 0, 149764
IAST [baseline] (19.177 ms) : 0, 19177
IAST [candidate] (18.868 ms) : 0, 18868
Remote Config [baseline] (611.363 µs) : 0, 611
Remote Config [candidate] (610.55 µs) : 0, 611
Telemetry [baseline] (7.489 ms) : 0, 7489
Telemetry [candidate] (8.735 ms) : 0, 8735
section iast
BytebuddyAgent [baseline] (794.453 ms) : 0, 794453
BytebuddyAgent [candidate] (793.569 ms) : 0, 793569
GlobalTracer [baseline] (287.88 ms) : 0, 287880
GlobalTracer [candidate] (288.583 ms) : 0, 288583
AppSec [baseline] (51.607 ms) : 0, 51607
AppSec [candidate] (49.523 ms) : 0, 49523
IAST [baseline] (22.441 ms) : 0, 22441
IAST [candidate] (22.822 ms) : 0, 22822
Remote Config [baseline] (561.653 µs) : 0, 562
Remote Config [candidate] (591.211 µs) : 0, 591
Telemetry [baseline] (7.32 ms) : 0, 7320
Telemetry [candidate] (8.234 ms) : 0, 8234
section profiling
BytebuddyAgent [baseline] (676.142 ms) : 0, 676142
BytebuddyAgent [candidate] (676.922 ms) : 0, 676922
GlobalTracer [baseline] (379.506 ms) : 0, 379506
GlobalTracer [candidate] (381.176 ms) : 0, 381176
AppSec [baseline] (50.116 ms) : 0, 50116
AppSec [candidate] (50.104 ms) : 0, 50104
Remote Config [baseline] (723.43 µs) : 0, 723
Remote Config [candidate] (706.439 µs) : 0, 706
Telemetry [baseline] (7.427 ms) : 0, 7427
Telemetry [candidate] (7.483 ms) : 0, 7483
ProfilingAgent [baseline] (95.706 ms) : 0, 95706
ProfilingAgent [candidate] (95.438 ms) : 0, 95438
Profiling [baseline] (95.73 ms) : 0, 95730
Profiling [candidate] (95.462 ms) : 0, 95462
Startup time reports for insecure-bank
gantt
title insecure-bank - global startup overhead: candidate=1.33.0-SNAPSHOT~ae1ecea607, baseline=1.33.0-SNAPSHOT~90a6e4126b
dateFormat X
axisFormat %s
section tracing
Agent [baseline] (1.081 s) : 0, 1081499
Total [baseline] (8.559 s) : 0, 8558976
Agent [candidate] (1.099 s) : 0, 1099098
Total [candidate] (8.639 s) : 0, 8639062
section iast
Agent [baseline] (1.196 s) : 0, 1196121
Total [baseline] (9.001 s) : 0, 9000551
Agent [candidate] (1.2 s) : 0, 1199857
Total [candidate] (9.005 s) : 0, 9005325
section iast_HARDCODED_SECRET_DISABLED
Agent [baseline] (1.206 s) : 0, 1206467
Total [baseline] (8.978 s) : 0, 8977733
Agent [candidate] (1.2 s) : 0, 1199716
Total [candidate] (8.98 s) : 0, 8980289
section iast_TELEMETRY_OFF
Agent [baseline] (1.196 s) : 0, 1196195
Total [baseline] (9.012 s) : 0, 9011770
Agent [candidate] (1.207 s) : 0, 1206999
Total [candidate] (9.067 s) : 0, 9067034
gantt
title insecure-bank - break down per module: candidate=1.33.0-SNAPSHOT~ae1ecea607, baseline=1.33.0-SNAPSHOT~90a6e4126b
dateFormat X
axisFormat %s
section tracing
BytebuddyAgent [baseline] (677.732 ms) : 0, 677732
BytebuddyAgent [candidate] (688.508 ms) : 0, 688508
GlobalTracer [baseline] (311.152 ms) : 0, 311152
GlobalTracer [candidate] (316.532 ms) : 0, 316532
AppSec [baseline] (49.781 ms) : 0, 49781
AppSec [candidate] (50.484 ms) : 0, 50484
Remote Config [baseline] (663.623 µs) : 0, 664
Remote Config [candidate] (681.641 µs) : 0, 682
Telemetry [baseline] (7.578 ms) : 0, 7578
Telemetry [candidate] (7.723 ms) : 0, 7723
section iast
BytebuddyAgent [baseline] (792.546 ms) : 0, 792546
BytebuddyAgent [candidate] (794.825 ms) : 0, 794825
GlobalTracer [baseline] (287.101 ms) : 0, 287101
GlobalTracer [candidate] (288.854 ms) : 0, 288854
AppSec [baseline] (48.84 ms) : 0, 48840
AppSec [candidate] (50.179 ms) : 0, 50179
IAST [baseline] (23.916 ms) : 0, 23916
IAST [candidate] (22.138 ms) : 0, 22138
Remote Config [baseline] (569.674 µs) : 0, 570
Remote Config [candidate] (574.756 µs) : 0, 575
Telemetry [baseline] (8.888 ms) : 0, 8888
Telemetry [candidate] (8.924 ms) : 0, 8924
section iast_HARDCODED_SECRET_DISABLED
BytebuddyAgent [baseline] (800.093 ms) : 0, 800093
BytebuddyAgent [candidate] (795.162 ms) : 0, 795162
GlobalTracer [baseline] (289.549 ms) : 0, 289549
GlobalTracer [candidate] (288.548 ms) : 0, 288548
AppSec [baseline] (51.472 ms) : 0, 51472
AppSec [candidate] (50.821 ms) : 0, 50821
IAST [baseline] (22.875 ms) : 0, 22875
IAST [candidate] (21.997 ms) : 0, 21997
Remote Config [baseline] (579.551 µs) : 0, 580
Remote Config [candidate] (612.189 µs) : 0, 612
Telemetry [baseline] (7.331 ms) : 0, 7331
Telemetry [candidate] (8.171 ms) : 0, 8171
section iast_TELEMETRY_OFF
BytebuddyAgent [baseline] (792.158 ms) : 0, 792158
BytebuddyAgent [candidate] (799.567 ms) : 0, 799567
GlobalTracer [baseline] (287.819 ms) : 0, 287819
GlobalTracer [candidate] (291.483 ms) : 0, 291483
AppSec [baseline] (51.962 ms) : 0, 51962
AppSec [candidate] (49.858 ms) : 0, 49858
IAST [baseline] (22.644 ms) : 0, 22644
IAST [candidate] (21.987 ms) : 0, 21987
Remote Config [baseline] (586.448 µs) : 0, 586
Remote Config [candidate] (591.283 µs) : 0, 591
Telemetry [baseline] (6.649 ms) : 0, 6649
Telemetry [candidate] (8.896 ms) : 0, 8896
Load
Summary
Found 0 performance improvements and 1 performance regressions! Performance is the same for 10 metrics, 17 unstable metrics.
Request duration reports for petclinic
gantt
title petclinic - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~ae1ecea607, baseline=1.33.0-SNAPSHOT~90a6e4126b
dateFormat X
axisFormat %s
section baseline
no_agent (1.326 ms) : 1307, 1345
. : milestone, 1326,
appsec (1.717 ms) : 1692, 1742
. : milestone, 1717,
appsec_no_iast (1.73 ms) : 1706, 1754
. : milestone, 1730,
iast (1.49 ms) : 1467, 1513
. : milestone, 1490,
profiling (1.491 ms) : 1466, 1515
. : milestone, 1491,
tracing (1.455 ms) : 1430, 1479
. : milestone, 1455,
section candidate
no_agent (1.344 ms) : 1324, 1363
. : milestone, 1344,
appsec (1.684 ms) : 1659, 1709
. : milestone, 1684,
appsec_no_iast (1.732 ms) : 1709, 1755
. : milestone, 1732,
iast (1.498 ms) : 1476, 1520
. : milestone, 1498,
profiling (1.554 ms) : 1528, 1580
. : milestone, 1554,
tracing (1.506 ms) : 1481, 1532
. : milestone, 1506,
Request duration reports for insecure-bank
gantt
title insecure-bank - request duration [CI 0.99] : candidate=1.33.0-SNAPSHOT~ae1ecea607, baseline=1.33.0-SNAPSHOT~90a6e4126b
dateFormat X
axisFormat %s
section baseline
no_agent (359.641 µs) : 340, 379
. : milestone, 360,
iast (476.093 µs) : 455, 497
. : milestone, 476,
iast_FULL (535.183 µs) : 514, 557
. : milestone, 535,
iast_GLOBAL (500.211 µs) : 478, 522
. : milestone, 500,
iast_HARDCODED_SECRET_DISABLED (472.058 µs) : 451, 493
. : milestone, 472,
iast_INACTIVE (443.423 µs) : 423, 464
. : milestone, 443,
iast_TELEMETRY_OFF (461.946 µs) : 442, 482
. : milestone, 462,
tracing (442.266 µs) : 421, 463
. : milestone, 442,
section candidate
no_agent (364.933 µs) : 345, 385
. : milestone, 365,
iast (473.102 µs) : 451, 495
. : milestone, 473,
iast_FULL (533.563 µs) : 513, 554
. : milestone, 534,
iast_GLOBAL (492.803 µs) : 471, 514
. : milestone, 493,
iast_HARDCODED_SECRET_DISABLED (467.422 µs) : 447, 488
. : milestone, 467,
iast_INACTIVE (444.682 µs) : 423, 466
. : milestone, 445,
iast_TELEMETRY_OFF (464.09 µs) : 443, 485
. : milestone, 464,
tracing (440.667 µs) : 420, 462
. : milestone, 441,
Dacapo
I have a few questions about the overall approach.
   *
   * @param postProcessing flag to indicate the need for post-processing
   */
  void setRequiresPostProcessing(boolean postProcessing);
This seems like a rather low-level detail that I was hoping we wouldn't need to expose.
The idea was to set a flag in the context (in some way) that indicates the post-processing requirement. Since `TraceSegment` is implemented in `DDSpanContext`, it looked to me like a perfect candidate for setting the `requiresPostProcessing` flag from any request handler code.
But I suppose we can easily move it to `DDSpanContext` to avoid exposure.
`setRequiresPostProcessing` is now fully moved to `DDSpanContext`.
@@ -133,5 +133,9 @@ public final class TracerConfig {

  public static final String TRACE_FLUSH_INTERVAL = "trace.flush.interval";

  public static final String TRACE_POST_PROCESSING_ENABLED = "trace.post-processing.enabled";
This seems like it will cut across products. Do we want the user to be able to control this directly?
I think it might be better for there to be knobs that control the individual pieces of post-processing.
Otherwise, we might disable this for ASM only to adversely impact APM as well.
Good point. I was imagining that in the future we could have various independent post-processing tasks (each called by the `dd-post-processor` thread one by one). They could be represented as a list:
class TraceProcessingWorker {
  List<TracePostProcessor> tracePostProcessors;
  ...
}
Each `TracePostProcessor` would then need its own way to be disabled (via config).
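As a rough illustration of that idea, here is a minimal sketch, assuming each post-processor exposes its own enabled flag. `TracePostProcessor.isEnabled()`, `TaggingPostProcessor` and `runAll` are hypothetical names made up for this example, and spans are plain strings rather than `DDSpan` instances.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PostProcessorChainSketch {

  /** Hypothetical contract: each post-processor reports whether its own config knob enables it. */
  interface TracePostProcessor {
    boolean isEnabled();

    void process(List<String> trace);
  }

  /** Appends a tag to every span name; stands in for one product's post-processing step. */
  static final class TaggingPostProcessor implements TracePostProcessor {
    private final String tag;
    private final boolean enabled;

    TaggingPostProcessor(String tag, boolean enabled) {
      this.tag = tag;
      this.enabled = enabled;
    }

    @Override
    public boolean isEnabled() {
      return enabled;
    }

    @Override
    public void process(List<String> trace) {
      trace.replaceAll(span -> span + "+" + tag);
    }
  }

  /** Runs the enabled post-processors one by one, in registration order. */
  static void runAll(List<TracePostProcessor> processors, List<String> trace) {
    for (TracePostProcessor processor : processors) {
      if (processor.isEnabled()) {
        processor.process(trace);
      }
    }
  }

  public static void main(String[] args) {
    List<String> trace = new ArrayList<>(Arrays.asList("root", "child"));
    runAll(
        Arrays.<TracePostProcessor>asList(
            new TaggingPostProcessor("asm", false), // disabled: skipped
            new TaggingPostProcessor("apm", true)),
        trace);
    System.out.println(trace); // prints [root+apm, child+apm]
  }
}
```

With this shape, disabling one product's post-processor leaves the others untouched, which is the concern raised above.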
Removed the option to disable post-processing. Instead, each post-processor will be disabled individually.
      return;
    }

    Thread thread =
Why do we need to split off a separate thread here?
Given the join further down, it looks like this is basically still operating synchronously.
IIRC the intent was to limit the post-processing time, i.e. allow cancellation of post-processing when the limit was exceeded.
If the post-processing can be represented as a loop, where each iteration is small enough, then this could be achieved without using a separate thread - by checking whether the limit has been exceeded on each iteration and breaking out of the loop early.
The code as it is assumes that the post-processing work can be safely interrupted...
Now that I see the overall PR I think a better approach would be for the post-processor to accept a `BooleanSupplier` (or `Predicate` if there's a context object shared between the worker and post-processor).
The post-processor could then call that at the start of each loop to decide whether it needs to abort or not.
Then all the worker would need to do is provide an implementation of that `BooleanSupplier` which changes the result when the time has been exceeded. Another benefit of this approach is that testing could provide a different implementation, which uses a different condition.
In other words - rather than have a separate thread, and assume the post-processing is interrupt-safe, we invert the check and have the post-processor check the condition supplied by the worker whether it should continue or abort. That would avoid the need for extra threads/interrupts.
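A minimal sketch of that inversion, under stated assumptions: `ChunkedPostProcessor`, `CountingPostProcessor` and `deadlineCheck` are hypothetical names for illustration, the work is modelled as a list of string chunks, and the deadline uses `System.nanoTime()` as one possible monotonic clock choice rather than whatever the PR ends up using.

```java
import java.util.List;
import java.util.function.BooleanSupplier;

final class TimeoutCheckSketch {

  /** Hypothetical contract: returns false if processing was aborted because of the timeout. */
  interface ChunkedPostProcessor {
    boolean process(List<String> chunks, BooleanSupplier timeoutCheck);
  }

  /** Processes chunks one by one, consulting the worker-supplied check before each iteration. */
  static final class CountingPostProcessor implements ChunkedPostProcessor {
    int processed;

    @Override
    public boolean process(List<String> chunks, BooleanSupplier timeoutCheck) {
      for (String chunk : chunks) {
        if (timeoutCheck.getAsBoolean()) {
          return false; // limit exceeded: abort cooperatively, no thread interrupt needed
        }
        processed++; // stand-in for the real work done on one chunk
      }
      return true;
    }
  }

  /** Worker side: a check that starts returning true once the deadline has passed. */
  static BooleanSupplier deadlineCheck(long timeoutMillis) {
    final long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
    return () -> System.nanoTime() - deadline > 0;
  }
}
```

A test can pass its own condition instead of a clock, for example a supplier that returns true after a fixed number of calls, which makes the abort path deterministic.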
I confirm that I used a thread because of the need to interrupt it on timeout. I think the post-processing duration could be related to the amount of data (imagine a request with a very large body).
@mcculls, good point! I'll try to get rid of creating new threads. My only concern is that, in my particular case, I will need to call the WAF (a native module) during post-processing, and there will be no way to interrupt native code execution (apart from the built-in WAF timeout mechanism).
Removed thread creation and introduced `timeoutCheck` as discussed.
@@ -16,6 +16,7 @@ public enum AgentThread {
  TRACE_STARTUP("dd-agent-startup-datadog-tracer"),
  TRACE_MONITOR("dd-trace-monitor"),
  TRACE_PROCESSOR("dd-trace-processor"),
  TRACE_POST_PROCESSOR("dd-trace-post-processor"),
This is no longer needed?
Thanks, that was left over from the previous implementation. Removed.
dd-trace-core/src/main/java/datadog/trace/core/postprocessor/SpanPostProcessor.java
@@ -140,6 +140,7 @@ public class DDSpanContext
  private final boolean injectBaggageAsTags;
  private volatile int encodedOperationName;
  private volatile int encodedResourceName;
  private boolean requiresPostProcessing;
Does this need to be `volatile`?
Right, my bad
    BooleanSupplier timeoutCheck = () -> System.currentTimeMillis() > deadline;

    for (DDSpan span : trace) {
      if (timeoutCheck.getAsBoolean()) {
This will incur the cost of calling `System.currentTimeMillis()` for every span in the trace, even if no spans need post-processing. How about doing an initial loop to collect any spans that need post-processing and only setting up the timeout check, etc. when we know we have spans to post-process?
You could also lazily create the collection to hold the spans to post-process, i.e. only create it when you find a span to post-process, in which case for traces with no spans to post-process the only cost would be that first iteration to check the flag (i.e. no allocation or time calls).
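A rough sketch of that two-pass shape, under stated assumptions: `Span` is a hypothetical stand-in for `DDSpan` with the flag inlined, and `collectSpansToPostProcess` and `postProcess` are illustrative names, not the methods in this PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

final class LazyCollectSketch {

  /** Hypothetical stand-in for a span whose context carries the requiresPostProcessing flag. */
  static final class Span {
    final String name;
    final boolean requiresPostProcessing;

    Span(String name, boolean requiresPostProcessing) {
      this.name = name;
      this.requiresPostProcessing = requiresPostProcessing;
    }
  }

  /** First pass: returns null when nothing is flagged, so the common case allocates nothing. */
  static List<Span> collectSpansToPostProcess(List<Span> trace) {
    List<Span> result = null;
    for (Span span : trace) {
      if (span.requiresPostProcessing) {
        if (result == null) {
          result = new ArrayList<>(); // created only once a flagged span is found
        }
        result.add(span);
      }
    }
    return result;
  }

  /** Second pass: the clock is read only when there is something to post-process. */
  static int postProcess(List<Span> trace, long timeoutMillis) {
    List<Span> spansToPostProcess = collectSpansToPostProcess(trace);
    if (spansToPostProcess == null) {
      return 0; // early exit: no allocation, no time calls
    }
    final long deadline = System.currentTimeMillis() + timeoutMillis;
    BooleanSupplier timeoutCheck = () -> System.currentTimeMillis() > deadline;
    int processed = 0;
    for (Span span : spansToPostProcess) {
      if (timeoutCheck.getAsBoolean()) {
        break;
      }
      processed++; // stand-in for the real per-span post-processing
    }
    return processed;
  }
}
```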
Reworked as suggested
dd-trace-core/src/main/java/datadog/trace/core/postprocessor/SpanPostProcessor.java
      }

      if (context.isRequiresPostProcessing()) {
        spanPostProcessor.process(span, context, timeoutCheck);
The `process` method returns a `boolean` but its value isn't checked here - is the plan to use it to abort the post-processing or should the `process` method be changed to return `void`?
Yeah, I missed it. Changed the logic to check the return value.
`maybeTracePostProcessing` should be changed to do an initial scan of the spans and collect those requiring post-processing. If no spans need post-processing it can exit early. Otherwise set up the timeout check, etc. and start post-processing the smaller subset of spans that need it.
Also the description mentions a feature-flag `dd.trace.post-processing.enabled` to control this feature, but that doesn't appear to be implemented yet?
    for (DDSpan span : spansToPostProcess) {
      if (timeoutCheck.getAsBoolean() || !spanPostProcessor.process(span, timeoutCheck)) {
        log.debug("Span post-processing is interrupted due to timeout.");
- log.debug("Span post-processing is interrupted due to timeout.");
+ log.debug("Span post-processing interrupted due to timeout.");
I'm OK with the updated implementation - just a question about whether we should leave the no-op `AppSecPostProcessor` in place before merging
@@ -225,6 +231,7 @@ public TraceSerializingHandler(
      } else {
        this.ticksRequiredToFlush = Long.MAX_VALUE;
      }
      this.spanPostProcessor = new AppSecPostProcessor();
Since `AppSecPostProcessor` is currently a no-op, how about leaving `spanPostProcessor` as `null` for the initial merge? You could then check `spanPostProcessor != null` before calling `maybeTracePostProcessing` (or check it inside that method) so the post-processing only happens when there's a registered post-processor...
That could be reasonable, but in that case test coverage would not meet the required minimum.
I don't want to write our code to satisfy the coverage tool. The coverage tool is supposed to help make code better, not worse.
I have to believe there's a better solution.
Two options:
- expose a way to set a test post-processor, used to test out the implementation
  - this would also help you test edge conditions as well as do micro-benchmarking
- exclude the class from the code-coverage requirements
  - this is not ideal, but for some situations it is necessary
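The first option could look roughly like the sketch below. It is an illustration under stated assumptions, not the handler from this PR: `Handler`, `setSpanPostProcessor` and `maybePostProcess` are hypothetical names, the `SpanPostProcessor` interface is simplified, and spans are plain strings.

```java
import java.util.List;

final class InjectablePostProcessorSketch {

  /** Hypothetical, simplified post-processor contract; false means "stop processing". */
  interface SpanPostProcessor {
    boolean process(String span);
  }

  /** Stand-in for the serializing handler; no post-processor is registered by default. */
  static final class Handler {
    private SpanPostProcessor spanPostProcessor; // stays null until one is registered

    /** Package-private hook intended for tests and micro-benchmarks. */
    void setSpanPostProcessor(SpanPostProcessor spanPostProcessor) {
      this.spanPostProcessor = spanPostProcessor;
    }

    /** Returns how many spans were post-processed; 0 when nothing is registered. */
    int maybePostProcess(List<String> trace) {
      if (spanPostProcessor == null) {
        return 0; // nothing registered: skip post-processing entirely
      }
      int processed = 0;
      for (String span : trace) {
        if (!spanPostProcessor.process(span)) {
          break;
        }
        processed++;
      }
      return processed;
    }
  }
}
```

A test can then register a post-processor that fails or stalls on a chosen span to exercise the edge conditions, without shipping a no-op implementation in production code.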
Done + added test
@@ -2036,6 +2038,8 @@ PROFILING_DATADOG_PROFILER_ENABLED, isDatadogProfilerSafeInCurrentEnvironment())
          "Agentless profiling activated but no api key provided. Profile uploading will likely fail");
    }

    this.tracePostProcessingTimeout = configProvider.getLong(TRACE_POST_PROCESSING_TIMEOUT, 1000);
I'd like to see the default as a static variable.
Added default value
@@ -945,4 +946,12 @@ private String getTagName(String key) {
    // TODO is this decided?
    return "_dd." + key + ".json";
  }

  public void setRequiresPostProcessing(boolean postProcessing) {
I don't particularly like that these methods are public.
In my case, `setRequiresPostProcessing` should be called from the `appsec` module, so we need to expose it in some way, either in `TraceSegment` or in `DDSpanContext`.
Do you have any other options in mind?
   * @return {@code true} if the span was successfully processed; {@code false} in case of a
   *     timeout.
   */
  boolean process(DDSpan span, BooleanSupplier timeoutCheck);
Why the BooleanSupplier? Would it not be better to just have two different methods to cover the two different cases?
The idea of the supplier was for situations where the processor was doing something non-trivial - assuming that processing could be broken down into smaller chunks then the supplier would allow it to check to see if the timeout had been exceeded mid-processing (otherwise there's no easy way for a processor to check to see if it should abort its processing)
In other words during the `process` call the value returned by the supplier could change, and it's not possible to represent that as two methods.
This is only really needed when the `process` call might take a while and need to be cancelled mid-call.
Ah, that makes sense. Thanks for the clarification.
…panPostProcessor.java Co-authored-by: Stuart McCulloch <[email protected]>
What Does This Do
Introduced Trace Post-Processing, an experimental feature.
It allows a trace to be processed at a very late stage (in the `dd-trace-processor` thread), before serialization and dispatch to the Agent.
The trace processor checks whether there are spans that require post-processing and runs `SpanPostProcessor` on each of them with a timeout. The timeout covers the case where post-processing takes too long and needs to be dropped.
Post-processing happens only for span contexts marked as `requiresPostProcessing`.
Also added an extra configuration option:
`trace.post-processing.timeout` - to set the timeout for the post-processing thread (default: 1000ms)
Motivation
The idea is to have a mechanism that can safely do extra computations without adding overhead to request latency.
This is preparation for the new API Security sampling, where heavy schema extraction should happen after the request has finished, when we have information about the span sampling priority. We intend to compute the schema only if the span will be retained.
Additional Notes