Conversation

ChenYi015
Member

Purpose of this PR

Close #1801.

Proposed changes:

  • Add a new CRD named SparkConnect
  • Implement a controller for SparkConnect
  • Add an example SparkConnect manifest (a minimal sketch follows this list)
  • Update the Helm chart
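
For illustration, a minimal sketch of what a SparkConnect manifest can look like (the API group/version is inferred from the v1alpha1 controller types referenced below; all field values are illustrative, and the example manifest added by this PR is the authoritative reference):

apiVersion: sparkoperator.k8s.io/v1alpha1
kind: SparkConnect
metadata:
  name: spark-connect
  namespace: spark
spec:
  sparkVersion: 4.0.0
  sparkConf:
    spark.driver.memory: 1g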

Change Category

  • Bugfix (non-breaking change which fixes an issue)
  • Feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that could affect existing functionality)
  • Documentation update

Rationale

Checklist

  • I have conducted a self-review of my own code.
  • I have updated documentation accordingly.
  • I have added tests that prove my changes are effective or that my feature works.
  • Existing unit tests pass locally with my changes.

Additional Notes

@google-oss-prow google-oss-prow bot requested review from ImpSy and nabuskey June 26, 2025 06:55
@ChenYi015
Member Author

/hold for review


// mutateServerService mutates the server service for the SparkConnect resource.
func (r *Reconciler) mutateServerService(ctx context.Context, conn *v1alpha1.SparkConnect, svc *corev1.Service) error {
if svc.CreationTimestamp.IsZero() {

Contributor

Do we want to ensure this is applied every reconciliation loop? Not just the first time it's created.

Member Author

We should set immutable fields when creating the server pod. For mutable fields, we can try to update them on every reconciliation.
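
To illustrate the split being described here (a rough sketch, not the PR's actual implementation; the v1alpha1 import path, the label key, and the Reconciler fields are assumptions):

package controller

import (
    "context"

    corev1 "k8s.io/api/core/v1"
    "k8s.io/apimachinery/pkg/runtime"
    "k8s.io/apimachinery/pkg/util/intstr"
    "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"

    "github.com/kubeflow/spark-operator/api/v1alpha1" // import path is an assumption
)

// Reconciler is a stub for this sketch; the real controller has more fields.
type Reconciler struct {
    scheme *runtime.Scheme
}

// mutateServerService sets create-only fields when the Service does not exist yet
// (zero CreationTimestamp) and reapplies mutable fields on every reconciliation.
func (r *Reconciler) mutateServerService(ctx context.Context, conn *v1alpha1.SparkConnect, svc *corev1.Service) error {
    if svc.CreationTimestamp.IsZero() {
        // Create-only: tie the Service's lifecycle to the SparkConnect resource.
        if err := controllerutil.SetControllerReference(conn, svc, r.scheme); err != nil {
            return err
        }
    }

    // Mutable fields: keep the selector and ports in sync on every loop.
    svc.Spec.Selector = map[string]string{
        "sparkoperator.k8s.io/spark-connect-name": conn.Name, // label key is illustrative
    }
    svc.Spec.Ports = []corev1.ServicePort{{
        Name:       "grpc",
        Port:       15002, // Spark Connect gRPC port
        TargetPort: intstr.FromInt32(15002),
    }}
    return nil
}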

@hiboyang
Contributor

This is great! Thanks @ChenYi015 for the PR!

A quick question: Spark Connect will need a gRPC ingress to expose the driver-side Spark Connect server endpoint, similar to the HTTP ingress that exposes the Spark UI. Does this PR contain code to create such a gRPC ingress?

@ChenYi015 ChenYi015 force-pushed the feature/spark-connect branch from 7db7c03 to c96c667 on July 1, 2025 04:03
@ChenYi015 ChenYi015 force-pushed the feature/spark-connect branch from c96c667 to cadc7e5 on July 1, 2025 13:14
@ChenYi015 ChenYi015 force-pushed the feature/spark-connect branch from cadc7e5 to 6715caf on July 1, 2025 13:18
@ChenYi015
Member Author

A quick question: Spark Connect will need a gRPC ingress to expose the driver-side Spark Connect server endpoint, similar to the HTTP ingress that exposes the Spark UI. Does this PR contain code to create such a gRPC ingress?

@hiboyang I have not included it yet. We will implement this feature in this PR or in follow-up PRs.
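
In the meantime, a rough sketch of what such a gRPC ingress could look like if created manually (not part of this PR; it assumes an NGINX ingress controller and a Service exposing the Spark Connect port, and the service name, namespace, and host are hypothetical):

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: spark-connect
  namespace: spark
  annotations:
    # Tell ingress-nginx to speak gRPC (HTTP/2) to the backend.
    nginx.ingress.kubernetes.io/backend-protocol: "GRPC"
spec:
  ingressClassName: nginx
  rules:
    - host: spark-connect.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: spark-connect-server
                port:
                  number: 15002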

@nabuskey
Contributor

LGTM!!

@MonkeyCanCode

A quick question: Spark Connect will need a gRPC ingress to expose the driver-side Spark Connect server endpoint, similar to the HTTP ingress that exposes the Spark UI. Does this PR contain code to create such a gRPC ingress?

I think it will just be HTTP/2. You can port-forward the pod/svc on port 15002 and then connect from your local machine.
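
For example (a sketch; the Service name and namespace are hypothetical, and the client API is standard PySpark with the connect extra installed):

# Forward the Spark Connect port locally, e.g.:
#   kubectl -n spark port-forward svc/spark-connect-server 15002:15002
# then connect from a local client over the Spark Connect protocol:
from pyspark.sql import SparkSession

spark = SparkSession.builder.remote("sc://localhost:15002").getOrCreate()
spark.range(10).show()  # quick sanity check against the remote server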

@ChenYi015
Member Author

I will merge this PR and improve it in follow-up PRs.
/approve

Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ChenYi015

@ChenYi015
Member Author

/unhold

@google-oss-prow google-oss-prow bot merged commit 9773369 into kubeflow:master Jul 2, 2025
15 checks passed
@ChenYi015 ChenYi015 deleted the feature/spark-connect branch July 2, 2025 03:21
@kydim

kydim commented Jul 11, 2025

I am very happy to see Spark Connect support being added, thank you.

When will Spark Connect be released?

@wkd-woo

wkd-woo commented Jul 14, 2025

+1

@airkhin

airkhin commented Jul 21, 2025

+1

@ChenYi015 ChenYi015 mentioned this pull request Jul 22, 2025
@rafagsiqueira

@ChenYi015 Thank you for your work on this!
I created a SparkConnect object using the example you provided, but it doesn't seem to have any effect. I do not see any pods created, and the status of the SparkConnect object never changes:

NAMESPACE   NAME            AGE   STATUS   PODNAME
spark       spark-connect   18m

Am I missing something?

@ChenYi015
Member Author

@rafagsiqueira Please check whether the namespace of the SparkConnect object is included in spark.jobNamespaces; otherwise it will not be processed by the operator.
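
For reference, a sketch of the relevant Helm chart values (the namespace list is illustrative):

# values.yaml: the operator only reconciles SparkConnect objects in watched namespaces.
spark:
  jobNamespaces:
    - spark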

@rafagsiqueira

@ChenYi015 I suspected that, but since I used the same namespace as the spark operator, I thought it wouldn't be an issue. Let me try with a different namespace.

@rafagsiqueira

@ChenYi015 that was indeed the problem. Thank you very much for clarifying! Looking forward to using my newly deployed spark connect.

@torsol

torsol commented Aug 21, 2025

This is great! The example YAML specifies Spark 4.0.0, but I'm limited to Spark 3.5 because I need Sedona 1.7.2. Which Spark versions is this CRD limited to?

edit:
Found the sparkConf setting, where you can specify the missing Spark Connect jar. I also had to resolve an error about the Ivy cache directory not being found.

spec:
  sparkVersion: 3.5.4
  sparkConf: 
    spark.jars.packages: org.apache.spark:spark-connect_2.12:3.5.4
    spark.driver.extraJavaOptions: "-Divy.cache.dir=/tmp -Divy.home=/tmp"
    spark.jars.ivy: /tmp/.ivy2

@ChenYi015
Member Author

The example YAML specifies Spark 4.0.0, but I'm limited to Spark 3.5 because I need Sedona 1.7.2. Which Spark versions is this CRD limited to?

@torsol For Spark 4.x, the Spark Connect jar is included by default. For Spark 3.5, you will need to build an image that contains the Spark Connect jar, or use Ivy to pull it in as a dependency (as you did above).
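
A sketch of the image-based option (the base image tag, jar URL, and paths are assumptions):

FROM apache/spark:3.5.4

USER root
# Put the Spark Connect server plugin jar on the default classpath.
ADD https://repo1.maven.org/maven2/org/apache/spark/spark-connect_2.12/3.5.4/spark-connect_2.12-3.5.4.jar /opt/spark/jars/
RUN chmod 644 /opt/spark/jars/spark-connect_2.12-3.5.4.jar
USER spark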
