Skip to content

Conversation

pvbouwel
Copy link
Contributor

@pvbouwel pvbouwel commented Mar 5, 2025

Before this fix if you have a Spark 3.x spec where the executor has a lifecycle then the webhook will fail to identify the correct container. As described in issue 2457

Purpose of this PR

Solve the bug as detailed in #2457 where spark3 executors are blocked if they have a lifecycle

Proposed changes:

  • re-use code to identify the container in the pod

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly => Not applicable ?
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

pvbouwel added 2 commits March 5, 2025 10:20
Before this fix if you have  a Spark 3.x spec where the executor has a lifecycle then the webhook will fail to identify the correct container. As described in issue [2457](kubeflow#2457)

Signed-off-by: pvbouwel <[email protected]>
@ChenYi015
Copy link
Member

@pvbouwel Thank you for reporting the bug and raising a PR to fix it!
Please run make go-fmt to format the code to pass the CI check. By the way, we recommend using pod template feature to configure lifecycle of Spark pods since we are going to deprecate the webhook in the future.

@ChenYi015 ChenYi015 changed the title Bugfix/lifecycle on executor spark3 fix: webhook fail to add lifecycle to Spark3 executor pods Mar 5, 2025
Signed-off-by: pvbouwel <[email protected]>
@pvbouwel
Copy link
Contributor Author

pvbouwel commented Mar 5, 2025

Pushed formatting. Thanks for the info; I will look into the pod template feature such that we can migrate our workflows early on

Copy link
Member

@ChenYi015 ChenYi015 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-prow google-oss-prow bot merged commit 3128c7f into kubeflow:master Mar 6, 2025
16 checks passed
@pvbouwel pvbouwel deleted the bugfix/lifecycle_on_executor_spark3 branch March 7, 2025 07:35
ChenYi015 pushed a commit that referenced this pull request Mar 21, 2025
* bugfix: A lifecycle on a spark3 executor should not fail

Before this fix if you have  a Spark 3.x spec where the executor has a lifecycle then the webhook will fail to identify the correct container. As described in issue [2457](#2457)

Signed-off-by: pvbouwel <[email protected]>

* tests: Add coverage for spark3 executor with a lifecycle

Signed-off-by: pvbouwel <[email protected]>

* make go-fmt

Signed-off-by: pvbouwel <[email protected]>

---------

Signed-off-by: pvbouwel <[email protected]>
(cherry picked from commit 3128c7f)
@ChenYi015 ChenYi015 mentioned this pull request Mar 21, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants