refactor(app/core): outline control plane recovery backoff #4450

Open
cratelyn wants to merge 1 commit into `main` from `kate/dns.outline-app-core-recover`

@cratelyn
Member

`linkerd_app_core::control` provides utilities used by the data plane to
communicate with the linkerd control plane. alongside features such as
load-balancing and configurable settings like connection timeouts, this
includes error recovery that respects a DNS record's negative TTL.

as of today, we do this within an inline, anonymous closure.

this commit pulls this business logic out of an inline closure, and into
an explicit pair of structures.

`ResolveRecover` is the `Recover` implementation that identifies the
proper backoff strategy when presented with a given boxed error.
`ResolveBackoff` is the sum type that encompasses either a TTL-driven
interval or an exponential backoff.
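
for readers who want the shape of this split without opening the diff, here is a minimal std-only sketch. it is an illustration under assumptions, not the code in this PR: `ResolveError`, `ConnectError`, and this `cause_ref` are hypothetical stand-ins, the `min`/`max` fields are invented for the example, and the variants hold plain `Duration`s where the real `ResolveBackoff` wraps streams. the real `recover` also returns a `Result`, which is dropped here for brevity.

```rust
use std::error::Error;
use std::fmt;
use std::time::Duration;

/// Hypothetical stand-in for `dns::ResolveError`.
#[derive(Debug)]
struct ResolveError {
    negative_ttl: Option<Duration>,
}

impl ResolveError {
    fn negative_ttl(&self) -> Option<Duration> {
        self.negative_ttl
    }
}

impl fmt::Display for ResolveError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "dns resolution failed")
    }
}

impl Error for ResolveError {}

/// Hypothetical wrapper, so the DNS failure sits one level down the source chain.
#[derive(Debug)]
struct ConnectError(ResolveError);

impl fmt::Display for ConnectError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "connect failed")
    }
}

impl Error for ConnectError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.0)
    }
}

/// Sum type: either a TTL-driven interval or an exponential backoff.
/// (Simplified: holds durations rather than streams.)
#[derive(Debug, PartialEq)]
enum ResolveBackoff {
    NegativeTtl(Duration),
    Exponential { min: Duration, max: Duration },
}

/// Simplified stand-in for `crate::errors::cause_ref`: walks the source
/// chain and returns the first error of type `T`, if any.
fn cause_ref<'e, T: Error + 'static>(mut error: &'e (dyn Error + 'static)) -> Option<&'e T> {
    loop {
        if let Some(e) = error.downcast_ref::<T>() {
            return Some(e);
        }
        error = error.source()?;
    }
}

/// Picks a backoff strategy for a boxed error.
struct ResolveRecover {
    min: Duration,
    max: Duration,
}

impl ResolveRecover {
    fn recover(&self, error: Box<dyn Error + Send + Sync>) -> ResolveBackoff {
        // A DNS resolution error carrying a negative TTL drives the interval.
        if let Some(e) = cause_ref::<ResolveError>(&*error) {
            if let Some(ttl) = e.negative_ttl() {
                return ResolveBackoff::NegativeTtl(ttl);
            }
        }
        // Anything else falls back to exponential backoff.
        ResolveBackoff::Exponential { min: self.min, max: self.max }
    }
}
```

the point of the named types is that each branch of the old closure becomes a variant that can be matched on and tested in isolation.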

see also, #4449. that introduces some additional
guardrails to prevent panicking if a negative ttl of zero is
encountered.

Signed-off-by: katelyn martin <kate@buoyant.io>

@cratelyn cratelyn self-assigned this Mar 11, 2026
@cratelyn cratelyn marked this pull request as ready for review March 11, 2026 19:51
@cratelyn cratelyn requested a review from a team as a code owner March 11, 2026 19:51
@cratelyn cratelyn requested a review from unleashed March 12, 2026 00:30
Comment on lines +322 to +329
// If we are recovering due to a DNS resolution error, check for a negative TTL.
if let Some(e) = crate::errors::cause_ref::<dns::ResolveError>(&*error) {
    if let Some(ttl) = e.negative_ttl() {
        let interval = tokio::time::interval(ttl);
        let stream = IntervalStream::new(interval);
        return Ok(ResolveBackoff::NegativeTtl(stream));
    }
}
Member

I know this is just keeping the existing behavior, but isn't `tokio::time::interval()` going to fire immediately, so we'll retry right away?
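
For context on the question: tokio documents that the first tick of `tokio::time::interval(period)` completes immediately, while `tokio::time::interval_at(start, period)` delays the first tick until `start`. The sketch below is a std-only model of those two schedules, not tokio itself and not a proposed change to this PR; the function names are hypothetical and exist only to make the difference concrete.

```rust
use std::time::Duration;

/// Offsets from "now" at which the first `n` ticks complete, for an interval
/// whose first tick is at `first` and which then repeats every `period`.
fn tick_offsets(first: Duration, period: Duration, n: u32) -> Vec<Duration> {
    (0..n).map(|i| first + period * i).collect()
}

/// Models `tokio::time::interval(ttl)`: the first tick completes immediately.
fn interval_offsets(ttl: Duration, n: u32) -> Vec<Duration> {
    tick_offsets(Duration::ZERO, ttl, n)
}

/// Models `tokio::time::interval_at(Instant::now() + ttl, ttl)`:
/// the first tick is deferred by one full TTL.
fn deferred_interval_offsets(ttl: Duration, n: u32) -> Vec<Duration> {
    tick_offsets(ttl, ttl, n)
}
```

With a 5 second negative TTL, the first schedule ticks at 0s, 5s, 10s and the second at 5s, 10s, 15s, so only the second waits out the TTL before the first retry.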
