Open
see linkerd/linkerd2#14954.

some user reports describe situations in which, when the linkerd control plane's destination controller is OOM-killed, DNS resolution can momentarily cause the proxy to compute a negative-TTL duration of zero. this causes a panic in production environments, because `tokio::time::interval` asserts that it has not been provided a duration of zero.

this manifests in errors that look like this:

```
thread 'main' panicked at linkerd/app/core/src/control.rs:118:49: period must be non-zero.
```

this commit patches `linkerd-dns::ResolveError::negative_ttl()` so that it will now log a warning and instead return `None` when a negative TTL of zero is encountered. a shared `duration_from_error()` helper (bikeshedding welcome) helps do this for both A/AAAA and SRV records.

X-Ref: #3807
Signed-off-by: katelyn martin <kate@buoyant.io>
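As a rough illustration of the guard described in this commit message, here is a minimal sketch using only the standard library. The `ResolveError` struct and its `negative_ttl_secs` field below are hypothetical stand-ins, not the actual `linkerd-dns` types, and the warning is written to stderr rather than through the proxy's logging.

```rust
use std::time::Duration;

/// Hypothetical stand-in for a DNS resolution error that may carry a
/// negative TTL. The real `linkerd-dns` type is shaped differently.
#[derive(Debug)]
struct ResolveError {
    /// Negative TTL in seconds as reported by the resolver, if any.
    negative_ttl_secs: Option<u32>,
}

impl ResolveError {
    /// Returns the negative TTL, or `None` when it is absent or zero.
    ///
    /// A zero duration must never reach `tokio::time::interval`, which
    /// panics with "period must be non-zero".
    fn negative_ttl(&self) -> Option<Duration> {
        let ttl = Duration::from_secs(u64::from(self.negative_ttl_secs?));
        if ttl.is_zero() {
            // Assumption: the real patch logs a warning; this sketch
            // prints to stderr instead.
            eprintln!("ignoring negative TTL of zero");
            return None;
        }
        Some(ttl)
    }
}
```

With this shape, a caller that receives `None` can fall back to its ordinary backoff instead of constructing an interval from the TTL.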
i will tend to linter errors in the morning.
Signed-off-by: katelyn martin <kate@buoyant.io>
cratelyn added a commit that referenced this pull request Mar 11, 2026
`linkerd_app_core::control` provides utilities used by the data plane to communicate with the linkerd control plane. this includes, among other features such as load-balancing and configurable settings like connection timeout durations, an error recovery strategy that respects a DNS record's negative TTL.

as of today, we do this within an inline, anonymous closure. this commit pulls this business logic out of the closure, and into an explicit pair of structures.

`ResolveRecover` is the `Recover` implementation that identifies the proper backoff strategy when presented with a given boxed error.

`ResolveBackoff` is the sum type that encompasses either a TTL-driven interval or an exponential backoff.

see also #4449, which introduces some additional guardrails to prevent panicking if a negative TTL of zero is encountered.

Signed-off-by: katelyn martin <kate@buoyant.io>
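The split between a recovery policy and a backoff sum type might look roughly like the sketch below. It only borrows the names `ResolveRecover` and `ResolveBackoff` from the commit message. The fields, the `recover` and `next_delay` methods, and the use of an `Option<Duration>` in place of a boxed error are all assumptions made for illustration, and the real types are asynchronous rather than plain functions.

```rust
use std::time::Duration;

/// Sketch of a backoff that is either a fixed, TTL-driven interval or an
/// exponential backoff. Variant and field names are hypothetical.
#[derive(Debug, Clone, PartialEq)]
enum ResolveBackoff {
    /// Retry at a fixed interval derived from a negative TTL.
    Ttl(Duration),
    /// Retry with a delay that doubles up to `max`.
    Exponential { next: Duration, max: Duration },
}

impl ResolveBackoff {
    /// Returns the delay before the next attempt and advances the state.
    fn next_delay(&mut self) -> Duration {
        match self {
            ResolveBackoff::Ttl(ttl) => *ttl,
            ResolveBackoff::Exponential { next, max } => {
                let delay = *next;
                *next = next.saturating_mul(2).min(*max);
                delay
            }
        }
    }
}

/// Sketch of the recovery policy that picks a backoff for a failure.
struct ResolveRecover {
    min: Duration,
    max: Duration,
}

impl ResolveRecover {
    /// Assumption: the negative TTL has already been extracted from the
    /// error. A missing or zero TTL falls back to exponential backoff.
    fn recover(&self, negative_ttl: Option<Duration>) -> ResolveBackoff {
        match negative_ttl {
            Some(ttl) if !ttl.is_zero() => ResolveBackoff::Ttl(ttl),
            _ => ResolveBackoff::Exponential {
                next: self.min,
                max: self.max,
            },
        }
    }
}
```

Naming the two roles makes each testable on its own: the policy can be checked without waiting on timers, and the backoff can be stepped without constructing errors.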
unleashed approved these changes Mar 11, 2026
Looks good 👍, just a couple minor comments
#4449 (comment) Signed-off-by: katelyn martin <kate@buoyant.io>
we follow a convention in the proxy of "punning" fairly liberally. this comment was pointing to the internal ResolveError, not the hickory_resolver version of this type. #4449 (comment) Signed-off-by: katelyn martin <kate@buoyant.io>
we introduced logic for enforcing a minimum TTL in #3807. this commit moves that logic from the outer layer in `linkerd-dns-resolve` and into the `linkerd-dns` library. this will help us reuse/consolidate the same logic for *negative* TTL's. Signed-off-by: katelyn martin <kate@buoyant.io>
Signed-off-by: katelyn martin <kate@buoyant.io>
this introduces a new function to `linkerd_dns::minimum_ttl`, for working with `Duration`s. this is used in `negative_ttl_of()` so that we not only check for TTL's of zero, but also for pathologically small TTL's. Signed-off-by: katelyn martin <kate@buoyant.io>
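One way a lower bound on `Duration`s could be enforced is sketched below. The 5-second `MINIMUM_TTL` value, the body of `with_minimum_duration`, and the choice to clamp short values upward (rather than discard them) are assumptions for illustration, not details taken from this pull request.

```rust
use std::time::Duration;

/// Assumed lower bound. The value actually used by `linkerd-dns` is not
/// stated in this conversation.
const MINIMUM_TTL: Duration = Duration::from_secs(5);

/// Raises a duration to the minimum TTL if it is shorter.
/// The signature is a guess at the helper's shape.
fn with_minimum_duration(ttl: Duration) -> Duration {
    ttl.max(MINIMUM_TTL)
}

/// Applies the lower bound to an optional negative TTL, so that both zero
/// and pathologically small values become `MINIMUM_TTL`.
fn negative_ttl_of(raw: Option<Duration>) -> Option<Duration> {
    raw.map(with_minimum_duration)
}
```

Under these assumptions, a negative TTL of zero or of a few milliseconds yields the same 5-second wait, while longer TTLs pass through unchanged.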
this tweaks our `sleep_until_expired` function so that it provides a similar signature to `with_minimum_duration`. this way, callers that interact with Instants and Durations both have common interfaces. Signed-off-by: katelyn martin <kate@buoyant.io>
i've renamed this pull request: now that review feedback has yielded additional changes, it does slightly more than check that negative TTL's are non-zero. i have also updated the pull request description to reflect that this goes beyond zero TTL's and also enforces a lower bound.
see linkerd/linkerd2#14954.
some user reports describe situations in which, when the linkerd control
plane's destination controller is OOM-killed, DNS resolution can
momentarily cause the proxy to compute a negative-TTL duration of zero.
this causes a panic in production environments, because
`tokio::time::interval` asserts that it has not been provided a duration
of zero.
this manifests in errors that look like this:
`thread 'main' panicked at linkerd/app/core/src/control.rs:118:49: period must be non-zero.`
this branch introduces changes to enforce a lower-bound for negative TTL's
that are zero, which would cause a panic, or are pathologically short,
which could cause proxies encountering resolution errors to thrash the
DNS server trying to recover.