Skip to content

Conversation

@olix0r
Copy link
Member

@olix0r olix0r commented Nov 23, 2024

There are a few things about the policy controller logging that can be cleaned
up for consistency and clarity:

  • We frequently log ERROR messages when processing resources with unexpected values. These messages are more appropriately emitted at WARN--we want to surface these situations, but they are not really exceptional.
  • The leadership status of the status controller is not logged at INFO level, so it's not possible to know about status changes without DEBUG logging.
  • We generally use Sentence-cased log messages when emitting user-facing messages. There are a few situations where we are not consistent.
  • The status controller reconciliation logging is somewhat noisy and misleading.
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder has changed
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder has changed
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder has changed
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l

The "Lease holder has changed" message actually indicates that the lease has changed, though the holder may be unchanged.

To improve logging clarity, this change does the following:

  • Adds an INFO level log when the leadership status of the controller changes.
  • Adds an INFO level log when the status controller patches resources.
  • Reconciliation logging is moved to TRACE level.
  • Consistently uses sentence capitalization in user-facing log messages
  • Reduces ERROR messages to WARN when handling invalid user-provided data
    (including cluster resources). This ensures that ERRORs are reserved for
    exceptional policy controller states.

@olix0r olix0r requested a review from a team as a code owner November 23, 2024 02:09
@olix0r olix0r force-pushed the ver/policy-log-noise branch 5 times, most recently from a02eb15 to a9d05d1 Compare November 23, 2024 18:07
Comment on lines -536 to -567
} => match original_dst {
Some(original_dst) => outbound::Backend {
metadata: Some(Metadata {
kind: Some(metadata::Kind::Resource(api::meta::Resource {
group: "policy.linkerd.io".to_string(),
kind: "EgressNetwork".to_string(),
name,
namespace,
section: Default::default(),
port: u16::from(policy.port).into(),
})),
}),
} => {
debug_assert!(
original_dst.is_some(),
"Must not serve EgressNetwork for named lookups; IP:PORT required"
);
let metadata = Some(Metadata {
kind: Some(metadata::Kind::Resource(api::meta::Resource {
group: "policy.linkerd.io".to_string(),
kind: "EgressNetwork".to_string(),
name,
namespace,
section: Default::default(),
port: u16::from(policy.port).into(),
})),
});

let Some(addr) = original_dst else {
tracing::error!(
?metadata,
"Unexpected state: EgressNetworks should only be returned when lookup is by IP:PORT; synthesizing invalid backend"
);
return outbound::Backend {
metadata,
queue: None,
kind: None,
};
};

outbound::Backend {
metadata,
queue: Some(default_queue_config()),
kind: Some(outbound::backend::Kind::Forward(
destination::WeightedAddr {
addr: Some(original_dst.into()),
addr: Some(addr.into()),
weight: 1,
..Default::default()
},
)),
},
None => {
tracing::error!("no original_dst for Egresspolicy");
outbound::Backend {
metadata: Some(Metadata {
kind: Some(metadata::Kind::Default("invalid".to_string())),
}),
queue: None,
kind: None,
}
}
},
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zaharidichev I tried to get clearer about this state, but I want to confirm that I understand it properly: this is really exceptional, it should not be possible for a named address to resolve to an EgressNetwork, so if we hit this, there's a bug.

To that end, I've made a debug_assert! so this will blow up in tests, etc. I've preserved the initial metadata for clarity and clarified the error message.

@olix0r olix0r changed the title chore(policy): improve status controller logging chore(policy): improve logging consistency Nov 23, 2024
@olix0r olix0r force-pushed the ver/policy-log-noise branch 5 times, most recently from 96d6e52 to cbf6550 Compare November 24, 2024 19:23
@olix0r
Copy link
Member Author

olix0r commented Nov 24, 2024

Reconciliation logs have been improved so that we log the number of resources being inspected, as well as the total patches in a reconciliation. Tracing contexts are set so we know which resource is being updated.

The awkward "Lease non-holder skipping controller update" DEBUG messages have been consolidated in a helper as

    fn reconcile_if_leader(&self) {
        let lease = self.claims.borrow();
        if !lease.is_current_for(&self.name) {
            tracing::trace!(%lease.holder, "Reconcilation skipped");
            return;
        }
        self.reconcile();
    } 

Example log output here.

@olix0r olix0r force-pushed the ver/policy-log-noise branch 3 times, most recently from bd88260 to 5e3ff63 Compare November 25, 2024 01:31
@olix0r olix0r changed the title chore(policy): improve logging consistency chore(policy): improve logging clarity Nov 25, 2024
@olix0r olix0r changed the title chore(policy): improve logging clarity chore(policy): polish logging Nov 25, 2024
Copy link
Member

@cratelyn cratelyn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks good to me. thank you for doing this! ✔️

@cratelyn
Copy link
Member

note that two ci jobs appear to be failing, my understanding is that these are known to be flaky.

@olix0r olix0r force-pushed the ver/policy-log-noise branch from 5e3ff63 to 712c9bd Compare November 25, 2024 16:04
Copy link
Member

@alpeb alpeb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! 👍

@olix0r olix0r enabled auto-merge (squash) November 25, 2024 16:24
There are a few things about the policy controller logging that can be cleaned
up for consistency and clarity:

* We frequently log ERROR messages when processing resources with unexpected
  values. These messages are more appropriately emitted at WARN--we want to
  surface these situations, but they are not really exceptional.
* The leadership status of the status controller is not logged at INFO level, so
  it's not possible to know about status changes without DEBUG logging.
* We generally use Sentence-cased log messages when emitting user-facing
  messages. There are a few situations where we are not consistent.
* The status controller reconciliation logging is somewhat noisy and misleading.
* The status controller does not log any messages when patching resources.

```
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder has changed
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder has changed
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder has changed
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
DEBUG status::Index: linkerd_policy_controller_k8s_status::index: Lease holder reconciling cluster index.name=linkerd-destination-74d7fdc45d-xfb8l
```

The "Lease holder has changed" message actually indicates that the _lease_ has
changed, though the holder may be unchanged.

To improve logging clarity, this change does the following:

* Adds an INFO level log when the leadership status of the controller changes.
* Adds an INFO level log when the status controller patches resources.
* Adds DEBUG level logs when the status controller patches resources.
* Reconciliation housekeeping logging is moved to TRACE level.
* Consistently uses sentence capitalization in user-facing log messages
* Reduces ERROR messages to WARN when handling invalid user-provided data
  (including cluster resources). This ensures that ERRORs are reserved for
  exceptional policy controller states.
@olix0r olix0r force-pushed the ver/policy-log-noise branch from 712c9bd to 9bd5387 Compare November 25, 2024 16:39
@olix0r olix0r merged commit 5a4b9fd into main Nov 25, 2024
43 checks passed
@olix0r olix0r deleted the ver/policy-log-noise branch November 25, 2024 17:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants