
Conversation

ChenYi015
Member

@ChenYi015 commented Dec 24, 2024

Purpose of this PR

Proposed changes:

  • Add a Helm pre-install and pre-upgrade hook job to upgrade CRDs. (CRDs will not be rolled back on helm rollback.)
  • Add a new value hook.upgradeCrd (defaults to false); a minimal values sketch follows this list.
  • The hook job and its RBAC resources will be created only if hook.upgradeCrd is true.
  • The RBAC resources related to the hook will be deleted immediately after the hook job completes.
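
For illustration only, a minimal sketch of the new value in YAML (only the `hook.upgradeCrd` key is introduced by this PR; the surrounding layout is assumed):

```yaml
# values.yaml (sketch, not the chart's full values file)
# hook.upgradeCrd defaults to false, so existing installations keep their current behaviour.
hook:
  upgradeCrd: true   # opt in to the pre-install/pre-upgrade CRD upgrade hook job
```

The same opt-in can be passed on the command line, e.g. `--set hook.upgradeCrd=true` on `helm install` or `helm upgrade`.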

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

@ChenYi015
Member Author

/hold for review

@ChenYi015
Member Author

/assign @yuchaoran2011 @jacobsalway @ImpSy

@jacobsalway
Member

The fact that Helm doesn't upgrade CRDs during an upgrade has bitten me in the past, so I'm definitely in favour of including some way to do so. It goes without saying that we should never delete CRDs. My only concern would be maintaining Go code to do this versus some other options I've seen in the wild:

If one of these options can work for our use case, there'd be less code/surface area to maintain. Curious what you think as well?

@ChenYi015
Member Author

> The fact that Helm doesn't upgrade CRDs during an upgrade has bitten me in the past, so I'm definitely in favour of including some way to do so. It goes without saying that we should never delete CRDs. My only concern would be maintaining Go code to do this versus some other options I've seen in the wild:
>
> If one of these options can work for our use case, there'd be less code/surface area to maintain. Curious what you think as well?

  1. Helm creates a Secret for each revision of the chart release, containing the base64 encoding of all rendered templates. By default, the size of a Secret is limited to 1 MiB, so placing the CRD files under the chart templates would push the release Secret over that limit.
  2. It is just my personal bias that I prefer a self-contained Spark operator Docker image rather than using a non-official kubectl image to update the CRDs. BTW, if we choose the kubectl way, we need to use kubectl replace -f instead of kubectl apply -f due to the size limit on annotations.
  3. I believe maintaining the CRDs in another Helm chart would definitely increase the complexity of installing/upgrading the Spark operator.

@jacobsalway
Member

Again, apologies for the late reply.

  1. Good point, I had forgotten how large the CRDs are and hadn't thought about the release secret in any case.
  2. My personal preference is to avoid writing code if I can help it, especially for non-core pieces of functionality like CRD management. I think it helps reduce the total surface area of code to maintain and lowers the risk of bugs being introduced (especially without test coverage), but I'm open to going with your approach if you'd prefer, as you've already written the code.
    1. If we did decide to use kubectl, we would need a custom image with the CRDs embedded, since they couldn't be mounted via a ConfigMap due to the size limitation discussed above. What about using kubectl apply --server-side rather than kubectl replace as well? (A sketch comparing the commands follows this list.)
  3. Fair point and agreed. I think most users are used to installing a single Helm chart for most operators.
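
To make the kubectl trade-off concrete, here is a hedged sketch of the hook container's command under each option; the image name is hypothetical, and the CRD path follows the one adopted later in this PR. The relevant constraint is that plain client-side `kubectl apply` stores the full object in the `kubectl.kubernetes.io/last-applied-configuration` annotation, and the total size of an object's annotations is capped at 262144 bytes, which the large Spark CRDs exceed; `kubectl replace` and server-side apply both avoid writing that annotation.

```yaml
# Sketch only: container spec fragment for a CRD-upgrade hook job.
# The image name is a placeholder; /etc/spark-operator/crds matches the path used later in this PR.
containers:
  - name: upgrade-crd
    image: example.com/spark-operator-kubectl:latest   # hypothetical image bundling kubectl and the CRD files
    command: ["kubectl"]
    # Server-side apply (the option discussed above, adopted later in this PR):
    # the API server manages field ownership and no last-applied-configuration annotation is written.
    args: ["apply", "--server-side", "-f", "/etc/spark-operator/crds"]
    # Alternative: "kubectl replace" also avoids the annotation, but it fails if the CRDs
    # do not exist yet, so it only covers upgrades, not first installs.
    # args: ["replace", "-f", "/etc/spark-operator/crds"]
    # Plain client-side "kubectl apply -f" would fail here because the annotation it writes
    # pushes these CRDs over the annotation size limit.
```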

@ChenYi015
Member Author

If we decide to use the kubectl way to update CRDs, we will need to maintain another image that contains the kubectl binary and the CRD files, as they cannot be mounted as a ConfigMap due to the size limit.


This pull request has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@ChenYi015
Member Author

/lifecycle frozen

Contributor

@ChenYi015: The lifecycle/frozen label cannot be applied to Pull Requests.

In response to this:

/lifecycle frozen

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@ChenYi015 force-pushed the helm/hooks branch 3 times, most recently from a543e9e to 2d9ef1d on June 10, 2025 12:59
@ChenYi015
Member Author

> My personal preference is to avoid writing code if I can help it, especially for non-core pieces of functionality like CRD management. I think it helps reduce the total surface area of code to maintain and has less risk of bugs being introduced (especially without test coverage), but open to go with your approach if you'd prefer as you've already written the code.
>
> If we did decide to use kubectl, we would need a custom image with the CRDs embedded since they couldn't be mounted via a ConfigMap due to the above release secret limitation. What about using kubectl apply --server-side rather than kubectl replace as well?

@jacobsalway I have updated this PR to use a custom kubectl image to run kubectl apply --server-side -f /etc/spark-operator/crds to update CRDs.
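
For readers following the thread, a heavily simplified sketch of what such a pre-install/pre-upgrade hook Job can look like is below. The resource names and image are placeholders rather than the chart's actual template; only the hook annotations, the delete-on-success behaviour described in the PR description, and the `kubectl apply --server-side` invocation reflect the approach taken here.

```yaml
# Sketch only, with placeholder names; not the chart's actual template.
apiVersion: batch/v1
kind: Job
metadata:
  name: spark-operator-upgrade-crd            # placeholder name
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade   # run before install and before upgrade
    "helm.sh/hook-delete-policy": before-hook-creation,hook-succeeded
    "helm.sh/hook-weight": "1"                # order relative to the hook's RBAC resources
spec:
  template:
    spec:
      serviceAccountName: spark-operator-upgrade-crd   # hook-scoped ServiceAccount with CRD update permissions
      restartPolicy: Never
      containers:
        - name: kubectl
          image: example.com/spark-operator-kubectl:latest   # hypothetical image with kubectl and the CRD files baked in
          command: ["kubectl"]
          args: ["apply", "--server-side", "-f", "/etc/spark-operator/crds"]
```

If the hook's RBAC resources (ServiceAccount, ClusterRole, ClusterRoleBinding) carry similar hook annotations with a hook-succeeded delete policy, Helm removes them as soon as the Job finishes, which matches the behaviour described in the PR description.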

@ChenYi015
Member Author

@nabuskey Could you take a look at this PR?

Contributor

@nabuskey left a comment


CRD management is a pain and no method is perfect. I am inclined to support the kubectl approach.

Contributor

@nabuskey left a comment


LGTM

@jacobsalway Any other concerns from you?

@ChenYi015
Member Author

/approve
/hold cancel

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

The google-oss-prow bot merged commit a154f1d into kubeflow:master on Jul 21, 2025
15 checks passed
@ChenYi015 deleted the helm/hooks branch on July 21, 2025 03:38
@ChenYi015 mentioned this pull request on Jul 22, 2025