refactor(app/core): outline control plane recovery backoff by cratelyn · Pull Request #4450 · linkerd/linkerd2-proxy

cratelyn · 2026-03-11T18:29:44Z

linkerd_app_core::control provides utilities used by the data plane to
communicate with the linkerd control plane. alongside features such as
load-balancing and configurable settings like connection timeout
durations, this includes error recovery that respects a DNS record's
negative TTL.

as of today, we do this within an inline, anonymous closure.

this commit moves this logic out of the inline closure and into an
explicit pair of structures.
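to illustrate why a named type can replace a closure without changing callers, here is a minimal std-only sketch. it assumes a hypothetical `Recover` trait with a blanket impl for closures; the proxy's real trait is stream-based and lives elsewhere in linkerd, so `Recover`, `Error`, and `NamedRecover` here are illustrative stand-ins, and backoff is modeled as an iterator of delays.

```rust
use std::time::Duration;

// Hypothetical boxed error alias; a stand-in, not the linkerd definition.
type Error = Box<dyn std::error::Error + Send + Sync + 'static>;

/// Hypothetical recovery trait: decide whether an error is recoverable and,
/// if so, which backoff schedule to follow (modeled as a sequence of delays).
trait Recover {
    type Backoff: Iterator<Item = Duration>;
    fn recover(&self, err: Error) -> Result<Self::Backoff, Error>;
}

// Blanket impl: any closure with the right shape is a `Recover`. This is
// what lets an inline, anonymous closure satisfy the trait.
impl<F, B> Recover for F
where
    F: Fn(Error) -> Result<B, Error>,
    B: Iterator<Item = Duration>,
{
    type Backoff = B;
    fn recover(&self, err: Error) -> Result<B, Error> {
        (self)(err)
    }
}

// The same role filled by a named type: it can carry docs, be unit tested,
// and be named in type signatures, none of which an anonymous closure allows.
#[derive(Clone, Debug, Default)]
struct NamedRecover;

impl Recover for NamedRecover {
    type Backoff = std::iter::Repeat<Duration>;
    fn recover(&self, _err: Error) -> Result<Self::Backoff, Error> {
        // Illustrative policy: retry every second, forever.
        Ok(std::iter::repeat(Duration::from_secs(1)))
    }
}
```

both forms are used identically through the trait, so swapping one for the other is invisible to code that is generic over `Recover`.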

ResolveRecover is the Recover implementation responsible for choosing
the proper backoff strategy for a given boxed error. ResolveBackoff is
a sum type that encompasses either a TTL-driven interval or an
exponential backoff.
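the shape of such a sum type can be sketched with a plain enum. the diff in this PR shows a `NegativeTtl` variant wrapping an `IntervalStream`; the other variant's type is not shown, so this std-only sketch is hypothetical: `BackoffSketch` and its `Exponential` variant are illustrative names, each item is the delay before the next retry, and there is no jitter.

```rust
use std::time::Duration;

/// Hypothetical, std-only model of a two-variant backoff: either a fixed
/// interval taken from a DNS negative TTL, or a capped exponential backoff.
/// Each yielded item is the delay to wait before the next retry.
#[derive(Debug)]
enum BackoffSketch {
    NegativeTtl { interval: Duration },
    Exponential { next: Duration, max: Duration },
}

impl Iterator for BackoffSketch {
    type Item = Duration;

    fn next(&mut self) -> Option<Duration> {
        match self {
            // Retry at a steady cadence equal to the negative TTL.
            BackoffSketch::NegativeTtl { interval } => Some(*interval),
            // Yield the current delay, then double it, capped at `max`.
            BackoffSketch::Exponential { next, max } => {
                let delay = *next;
                *next = next.saturating_mul(2).min(*max);
                Some(delay)
            }
        }
    }
}
```

because both variants implement one `Iterator` impl, a caller holding a `BackoffSketch` does not need to know which strategy was chosen.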

see also #4449, which introduces additional guardrails to prevent
panicking if a negative TTL of zero is encountered.
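for context, tokio's `time::interval` panics when given a zero period, so a zero negative TTL cannot be passed to it directly. one possible guardrail is sketched below; `usable_negative_ttl` is a hypothetical helper, and the approach taken in #4449 may differ (for example, it could clamp to a minimum instead of falling back).

```rust
use std::time::Duration;

/// Hypothetical guardrail: accept a negative TTL as a retry interval only if
/// it is non-zero. Returning `None` signals the caller to fall back to
/// exponential backoff instead of building a zero-period timer.
fn usable_negative_ttl(ttl: Option<Duration>) -> Option<Duration> {
    ttl.filter(|d| !d.is_zero())
}
```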

Signed-off-by: katelyn martin <kate@buoyant.io>


unleashed · 2026-03-12T17:55:12Z

linkerd/app/core/src/control.rs

+            // If we are recovering due to a DNS resolution error, check for a negative TTL.
+            if let Some(e) = crate::errors::cause_ref::<dns::ResolveError>(&*error) {
+                if let Some(ttl) = e.negative_ttl() {
+                    let interval = tokio::time::interval(ttl);
+                    let stream = IntervalStream::new(interval);
+                    return Ok(ResolveBackoff::NegativeTtl(stream));
+                }
+            }
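the excerpt asks `crate::errors::cause_ref` for a `dns::ResolveError` somewhere in the boxed error. its implementation is not shown here; the sketch below assumes it walks the `std::error::Error::source()` chain and downcasts each link. `DnsResolveError`, `find_cause`, `negative_ttl_of`, and `Wrapped` are all hypothetical names used only for illustration.

```rust
use std::error::Error as StdError;
use std::fmt;
use std::time::Duration;

/// Hypothetical stand-in for a DNS resolution error that may carry a
/// negative TTL.
#[derive(Debug)]
struct DnsResolveError {
    negative_ttl: Option<Duration>,
}

impl DnsResolveError {
    fn negative_ttl(&self) -> Option<Duration> {
        self.negative_ttl
    }
}

impl fmt::Display for DnsResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("dns resolution failed")
    }
}

impl StdError for DnsResolveError {}

/// Walk the `source()` chain and return the first cause of type `E`.
fn find_cause<'a, E: StdError + 'static>(
    mut err: &'a (dyn StdError + 'static),
) -> Option<&'a E> {
    loop {
        if let Some(e) = err.downcast_ref::<E>() {
            return Some(e);
        }
        // Stop (returning `None`) once the chain runs out.
        err = err.source()?;
    }
}

/// The negative TTL, if some DNS cause carries one; `None` means "fall back
/// to exponential backoff".
fn negative_ttl_of(err: &(dyn StdError + 'static)) -> Option<Duration> {
    find_cause::<DnsResolveError>(err).and_then(DnsResolveError::negative_ttl)
}

/// Hypothetical outer error that wraps a DNS error as its source.
#[derive(Debug)]
struct Wrapped(DnsResolveError);

impl fmt::Display for Wrapped {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("control plane connection failed")
    }
}

impl StdError for Wrapped {
    fn source(&self) -> Option<&(dyn StdError + 'static)> {
        Some(&self.0)
    }
}
```

searching the whole chain, rather than only the top-level error, means the TTL is still found when the DNS failure has been wrapped by outer layers.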


I know this is just keeping the existing behavior, but isn't tokio::time::interval() going to fire immediately, so we'll retry right away?
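per tokio's documentation, the first tick of `tokio::time::interval` completes immediately, whereas `tokio::time::interval_at(start, period)` first completes at `start`. whether that first tick actually triggers a retry depends on how the recovery stream is consumed, which the excerpt does not show. the schedules can be compared with a small std-only model; `tick_offsets` is a hypothetical helper giving each tick's offset from the moment the timer is created.

```rust
use std::time::Duration;

/// Hypothetical model of a periodic timer: the offset of each of the first
/// `n` ticks, measured from timer creation, given the delay before the first
/// tick and the period between ticks.
fn tick_offsets(first: Duration, period: Duration, n: u32) -> Vec<Duration> {
    (0..n).map(|i| first + period * i).collect()
}
```

`interval(ttl)` corresponds to `tick_offsets(Duration::ZERO, ttl, n)`, so the first tick lands at zero. a possible fix, assuming the tokio API above, is `interval_at(tokio::time::Instant::now() + ttl, ttl)`, which corresponds to `tick_offsets(ttl, ttl, n)` and delays the first tick by one full TTL.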

cratelyn self-assigned this Mar 11, 2026

cratelyn marked this pull request as ready for review March 11, 2026 19:51

cratelyn requested a review from a team as a code owner March 11, 2026 19:51

cratelyn requested a review from unleashed March 12, 2026 00:30

unleashed reviewed Mar 12, 2026

cratelyn wants to merge 1 commit into main from kate/dns.outline-app-core-recover
