
[SPARK-46912] Use worker JAVA_HOME and SPARK_HOME instead of from submitter #44943


Closed
wants to merge 1 commit

Conversation

thanhdanh1803

What changes were proposed in this pull request?

Replace the submitter's JAVA_HOME and SPARK_HOME with the worker's own values when building the localCommand.
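
For context, a minimal sketch of the idea, with illustrative names (this is not the actual patch, and the helper below is not Spark's real API): the worker overlays the environment captured from the submitter with its own JAVA_HOME and SPARK_HOME before launching the driver.

// Minimal sketch of the proposed behavior; all names are illustrative,
// not the actual change to Spark's command-building code.
object WorkerEnvOverride {
  /** Prefer the worker's own JAVA_HOME and SPARK_HOME over the submitter's. */
  def withWorkerPaths(submitterEnv: Map[String, String]): Map[String, String] = {
    val overrides = Seq("JAVA_HOME", "SPARK_HOME").flatMap { key =>
      sys.env.get(key).map(key -> _) // the worker's value wins when it is set
    }
    submitterEnv ++ overrides
  }

  def main(args: Array[String]): Unit = {
    // Environment as captured on the submitter (machine A):
    val fromSubmitter = Map("JAVA_HOME" -> "/usr/java/default")
    // After the overlay, the worker's JAVA_HOME (if set) is used instead.
    println(withWorkerPaths(fromSubmitter))
  }
}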

Why are the changes needed?

There is a problem when submitting a job in cluster mode to a standalone cluster: the worker starts the driver's Java process using the submitter's JAVA_HOME instead of its own.

Does this PR introduce any user-facing change?

No

Was this patch authored or co-authored using generative AI tooling?

No

@github-actions github-actions bot added the CORE label Jan 30, 2024
@srowen
Member

srowen commented Jan 30, 2024

Hm, how does JAVA_HOME get here from the 'submitter'? What do you mean, the application submitter? But the worker is already running by that point.

@thanhdanh1803
Author

> Hm, how does JAVA_HOME get here from the 'submitter'? What do you mean, the application submitter? But the worker is already running by that point.

  • The submitter is the client machine that runs the spark-submit command (with --deploy-mode cluster).
  • The worker is already running at this point, but the driver is not. When the master receives a submit request, it starts a driver on a worker; at that point, the driver takes the command and environment variables from the submit command and uses them in its session (see the sketch below). It sounds weird, but that is what I am facing in my case.
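
To illustrate the flow described above, a self-contained sketch; the case classes are simplified stand-ins for Spark's internal driver-description messages, not Spark's actual code:

// Illustrative sketch only: the environment map captured on the submitter
// travels inside the driver description and is later used verbatim by the
// worker when it launches the driver JVM.
case class Command(mainClass: String, environment: Map[String, String])
case class DriverDescription(command: Command)

object SubmitFlowSketch {
  def main(args: Array[String]): Unit = {
    // Captured on the submitter (machine A): JAVA_HOME is the client's path.
    val submitted = DriverDescription(
      Command("org.apache.spark.examples.SparkPi",
              Map("JAVA_HOME" -> "/usr/java/default")))

    // Later, on the worker: java is resolved from the carried-over
    // environment, i.e. the client's path on the worker's filesystem.
    val javaBin = submitted.command.environment("JAVA_HOME") + "/bin/java"
    println(javaBin) // prints /usr/java/default/bin/java, which may not exist on the worker
  }
}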


We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label May 11, 2024
@github-actions github-actions bot closed this May 12, 2024
@jywjyw

jywjyw commented Dec 6, 2024

It's a bug. Try this command on machine A:

bin/spark-submit --master spark://{REMOTE}:7077 --deploy-mode cluster  --class org.apache.spark.examples.SparkPi  file:///opt/spark/examples/jars/spark-examples_2.12-3.5.3.jar 

It will submit the application to the standalone cluster (important: deploy-mode=cluster), and then this error occurs:

Exception from cluster was: java.io.IOException: Cannot run program "/usr/java/default//bin/java" (in directory "/opt/bitnami/spark/work/driver-20241206013526-0001"): error=2, No such file or directory

This happens because the worker node looked for Java at "/usr/java/default//bin/java", but that is machine A's Java path, not the worker's.
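
A small sketch of why the failure only surfaces when the driver is launched (the path and the check are illustrative; the real error is raised by the OS when the worker starts the process):

import java.io.File

// Illustrative only: the worker builds the java path from the carried-over
// JAVA_HOME and fails at exec time when that path does not exist locally.
object JavaPathCheck {
  def main(args: Array[String]): Unit = {
    val carriedJavaHome = "/usr/java/default/"          // machine A's JAVA_HOME
    val javaExe = new File(carriedJavaHome, "bin/java") // resolved on the worker
    if (!javaExe.exists()) {
      // Same condition behind "error=2, No such file or directory".
      println(s"Cannot run program: ${javaExe.getPath}")
    }
  }
}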

@MeltonSmith

I've opened a new PR: #51314
