fix: workaround for long certificate lifetimes #363

Open: wants to merge 1 commit into main

Conversation

dervoeti
Member

Description

Workaround for #362

Tested by temporarily setting the maximum delay to 1 minute (instead of 6 months) and requesting a cert with a lifetime of a few minutes. The check was rescheduled several times before the Pod was restarted with a new cert, so the Pod lived for multiple minutes even though the maximum delay was 1 minute. See the recheck_delay in these logs:

2025-06-24T13:47:21.234336Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=60s
2025-06-24T13:48:21.237989Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=60s
2025-06-24T13:49:21.240752Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=60s
2025-06-24T13:50:21.244534Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=60s
2025-06-24T13:51:21.247748Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=60s
2025-06-24T13:52:21.250501Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=60s
2025-06-24T13:53:21.254136Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=60s
2025-06-24T13:54:21.257055Z  INFO reconciling object{object.ref=Pod.v1./example-secret-consumer-0.default object.reason=reconciler requested retry}: stackable_commons_operator::restart_controller::pod: Pod still valid, rescheduling check pod.expires_at=Some(2025-06-24T13:55:11.594779133+00:00) recheck_delay=50.33773894s

This confirms that certificate lifetimes greater than the new maximum delay of 6 months still work as expected.
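The rescheduling behaviour visible in the logs can be sketched roughly as follows. This is a minimal illustration, not the actual commons-operator code: the names `MAX_RECHECK_DELAY`, `NextStep`, and `next_step` are hypothetical, and "6 months" is approximated as 180 days. The idea is that each recheck delay is capped at a maximum, and the check is repeated until the remaining lifetime fits within a single delay.

```rust
use std::time::Duration;

/// Hypothetical cap on a single recheck delay.
/// Assumption: "6 months" is approximated here as 180 days.
const MAX_RECHECK_DELAY: Duration = Duration::from_secs(180 * 24 * 60 * 60);

/// Hypothetical type describing what the reconciler does next.
#[derive(Debug, PartialEq)]
enum NextStep {
    /// The certificate has expired (or expiry is already in the past): restart the Pod.
    RestartNow,
    /// The Pod is still valid: check again after this delay.
    RecheckAfter(Duration),
}

/// Decide the next step from the time left until expiry.
/// `None` is taken to mean that the expiry time has already passed.
fn next_step(time_until_expiry: Option<Duration>, max_delay: Duration) -> NextStep {
    match time_until_expiry {
        None => NextStep::RestartNow,
        Some(remaining) if remaining.is_zero() => NextStep::RestartNow,
        // Never wait longer than `max_delay` in one go; a long lifetime is
        // covered by several consecutive rechecks instead of one long wait.
        Some(remaining) => NextStep::RecheckAfter(remaining.min(max_delay)),
    }
}
```

With a 60-second cap and roughly 470 seconds left, this sketch yields a 60-second delay; once less than 60 seconds remain, the delay equals the remaining time, which matches the pattern of the final log line above.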

Definition of Done Checklist

  • Not all of these items are applicable to all PRs; the author should update this template to leave in only the boxes that are relevant
  • Please make sure all these things are done and tick the boxes

Author

  • Changes are OpenShift compatible
  • CRD changes approved
  • CRD documentation for all fields, following the style guide.
  • Helm chart can be installed and deployed operator works
  • Integration tests passed (for non trivial changes)
  • Changes need to be "offline" compatible
  • Links to generated (nightly) docs added
  • Release note snippet added

Reviewer

  • Code contains useful comments
  • Code contains useful logging statements
  • (Integration-)Test cases added
  • Documentation added or updated. Follows the style guide.
  • Changelog updated
  • Cargo.toml only contains references to git tags (not specific commits or branches)

Acceptance

  • Feature Tracker has been updated
  • Proper release label has been added
  • Links to generated (nightly) docs added
  • Release note snippet added
  • Add type/deprecation label & add to the deprecation schedule
  • Add type/experimental label & add to the experimental features tracker

@dervoeti force-pushed the fix/long-cert-expiry branch from 73844ad to 2e7e96c on June 24, 2025 14:19
@dervoeti self-assigned this on Jun 25, 2025
@lfrancke
Member

The original error occurred when someone requested a cert with a 3-year lifetime, which led to a crash. Have you tested this case as well?

@dervoeti
Member Author

> The original error occurred when someone requested a cert with a 3-year lifetime, which led to a crash. Have you tested this case as well?

Yes, commons-operator crashlooped before, with this fix it is stable.

My upstream PR just got approved, so we could also decide not to merge this and wait for a new kube-rs release.
This fix is tested, though, so we could merge it before the next SDP release if a new kube-rs version has not been released by then.

@lfrancke
Member

Yes, please still merge, then create a follow-up ticket and send it to me.
