[SPARK-3706][PySpark] Cannot run IPython REPL with IPYTHON set to "1" and PYSPARK_PYTHON unset #2554
Conversation
… and PYSPARK_PYTHON unset
Can one of the admins verify this patch? |
Thanks for identifying this issue and doing the analysis. The whole business of having a separate IPYTHON env variable complicates the situation. What about deprecating it? Say, introduce a PYSPARK_PYTHON_OPTS and change the docs to "set PYSPARK_PYTHON=ipython and PYSPARK_PYTHON_OPTS=notebook...". For backward compatibility, the top of the file can detect IPYTHON and IPYTHON_OPTS and set up the defaults correctly. |
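A minimal sketch of what such a backward-compatibility shim could look like (hypothetical; `resolve_python` is an illustrative helper, not code from the PR):

```shell
#!/bin/sh
# Hypothetical sketch of the proposed shim for the top of bin/pyspark:
# map the deprecated IPYTHON/IPYTHON_OPTS variables onto the new
# PYSPARK_PYTHON/PYSPARK_PYTHON_OPTS variables, then fall back to python.
unset PYSPARK_PYTHON PYSPARK_PYTHON_OPTS IPYTHON IPYTHON_OPTS

resolve_python() {
  if [ -z "$PYSPARK_PYTHON" ] && [ -n "$IPYTHON" ]; then
    PYSPARK_PYTHON="ipython"
    PYSPARK_PYTHON_OPTS="${PYSPARK_PYTHON_OPTS:-$IPYTHON_OPTS}"
  fi
  echo "${PYSPARK_PYTHON:-python}"
}

# The deprecated flag still selects ipython; otherwise plain python is used.
( IPYTHON=1 resolve_python )   # ipython
resolve_python                 # python
```

An explicit PYSPARK_PYTHON always wins over the deprecated flag, so existing scripts keep working either way.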
Also, `test "$IPYTHON" = "1"` should be written as `test -n "$IPYTHON"`; requiring the value to be 1 isn't very shell-ish. |
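To illustrate the difference (a standalone example, not from the script): an exact comparison only matches the literal string "1", while `test -n` accepts any non-empty value:

```shell
#!/bin/sh
# test "$VAR" = "1" matches only the literal value "1";
# test -n "$VAR" matches any non-empty value, the more idiomatic check.
IPYTHON="yes"

if test "$IPYTHON" = "1"; then echo "exact"; else echo "no exact match"; fi
if test -n "$IPYTHON"; then echo "non-empty"; fi
```

With IPYTHON set to "yes", the first test fails and the second succeeds.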
Thank you for the comment. I agree that using the PYSPARK_PYTHON and PYSPARK_PYTHON_OPTS environment variables is simpler and that the IPYTHON flag should not be exposed. I will keep backward compatibility for IPYTHON and IPYTHON_OPTS. Please review the additional commit. |
… execution of PySpark REPL
Jenkins, this is ok to test. |
QA tests have started for PR 2554 at commit
|
QA tests have started for PR 2554 at commit
|
much nicer. you could even remove the doc note about backward compatibility. +1 lgtm |
Thanks for the very thorough description of this issue. I think that the original motivation for the separate IPYTHON flag no longer applies. The approach in this PR is very nice, since we no longer require special handling / detection of IPython. This looks good to me, too, so I'd like to merge it (pending Jenkins). |
QA tests have finished for PR 2554 at commit
|
Test FAILed. |
QA tests have finished for PR 2554 at commit
|
Test PASSed. |
This looks great, but I noticed one minor problem when running some manual tests: If I run
The problem here is that |
Switching to PYTHONUNBUFFERED should be a one- or two-line fix. Just remove the
|
…ad of -u option Because IPython cannot recognize the -u option, we will use the PYTHONUNBUFFERED environment variable, which has exactly the same effect as the -u option.
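A standalone illustration (not from the PR) of the equivalence: both `python -u` and `PYTHONUNBUFFERED=1` force unbuffered standard streams, observable via `sys.stdout.write_through` (Python 3.7+):

```shell
# Both invocations below force unbuffered stdout, so write_through is True.
python3 -u -c 'import sys; print(sys.stdout.write_through)'
PYTHONUNBUFFERED=1 python3 -c 'import sys; print(sys.stdout.write_through)'
# Without either, a piped stdout is block-buffered and write_through is False.
env -u PYTHONUNBUFFERED python3 -c 'import sys; print(sys.stdout.write_through)'
```

Unlike `-u`, the environment variable is inherited by child processes, which is what makes it a clean fit here.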
QA tests have started for PR 2554 at commit
|
Thank you for the suggestions, @mattf and @JoshRosen. I deleted the sentence about IPYTHON and IPYTHON_OPTS. To confirm that PYTHONUNBUFFERED is set:

```python
# env.py
import os
print os.environ['PYTHONUNBUFFERED']
```

```
$ PYSPARK_PYTHON=ipython ./bin/pyspark env.py
...
YES
```
|
QA tests have finished for PR 2554 at commit
|
Test PASSed. |
This looks good to me; I tested it out locally and everything works as expected. Thanks! |
…upport improvements: This pull request addresses a few issues related to PySpark's IPython support:

- Fix the remaining uses of the '-u' flag, which IPython doesn't support (see SPARK-3772).
- Change PYSPARK_PYTHON_OPTS to PYSPARK_DRIVER_PYTHON_OPTS, so that the old name is reserved in case we ever want to allow the worker Python options to be customized (this variable was introduced in #2554 and hasn't landed in a release yet, so this doesn't break any compatibility).
- Introduce a PYSPARK_DRIVER_PYTHON option that allows the driver to use `ipython` while the workers use a different Python version.
- Attempt to use Python 2.7 by default if PYSPARK_PYTHON is not specified.
- Retain the old semantics for IPYTHON=1 and IPYTHON_OPTS (to avoid breaking existing example programs).

There are more details in a block comment in `bin/pyspark`.

Author: Josh Rosen <[email protected]>

Closes #2651 from JoshRosen/SPARK-3772 and squashes the following commits:

7b8eb86 [Josh Rosen] More changes to PySpark python executable configuration:
c4f5778 [Josh Rosen] [SPARK-3772] Allow ipython to be used by Pyspark workers; IPython fixes:
Problem
The section "Using the shell" in Spark Programming Guide (https://spark.apache.org/docs/latest/programming-guide.html#using-the-shell) says that we can run pyspark REPL through IPython.
But the following command does not run IPython but the default Python executable.
The spark/bin/pyspark script at commit b235e01 decides which executable and options to use in the following way.
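The shape of the problem, paraphrased into a runnable sketch (not the verbatim script; the echoed command stands in for the script's `exec`, and `launch_cmd` is an illustrative name):

```shell
#!/bin/sh
# Paraphrased sketch of the decision logic in bin/pyspark at b235e01.
# launch_cmd echoes the executable the script would exec.
unset PYSPARK_PYTHON IPYTHON IPYTHON_OPTS

launch_cmd() {
  # PYSPARK_PYTHON is defaulted before the IPython branch is reached...
  if [ -z "$PYSPARK_PYTHON" ]; then
    PYSPARK_PYTHON="python"
  fi
  if [ "$IPYTHON" = "1" ]; then
    # ...so this fallback to ipython is dead code: PYSPARK_PYTHON is
    # always non-empty here, and plain python runs instead of ipython.
    echo "${PYSPARK_PYTHON:-ipython}"
  else
    echo "$PYSPARK_PYTHON"
  fi
}

( IPYTHON=1 launch_cmd )   # prints "python", demonstrating the bug
```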
Therefore, when PYSPARK_PYTHON is unset, python is executed even though IPYTHON is "1".
In other words, when PYSPARK_PYTHON is unset, IPYTHON and IPYTHON_OPTS have no effect on deciding which command to use.
Suggestion
The pyspark script should first determine whether the user wants to run IPython or another executable.
See the pull request for the detailed modifications.
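The suggested ordering could be sketched as follows (hypothetical; `choose_python` is an illustrative name, and the actual fix in the PR may differ in detail):

```shell
#!/bin/sh
# Hypothetical sketch of the suggested ordering: check for an IPython
# request first, and only then fall back to PYSPARK_PYTHON or python.
unset PYSPARK_PYTHON IPYTHON IPYTHON_OPTS

choose_python() {
  if [ -n "$IPYTHON" ] || [ -n "$IPYTHON_OPTS" ]; then
    echo "ipython"
  else
    echo "${PYSPARK_PYTHON:-python}"
  fi
}

( IPYTHON=1 choose_python )   # ipython, even with PYSPARK_PYTHON unset
```

Because the IPython check comes first, the default for PYSPARK_PYTHON can no longer shadow the user's request.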