
Commit 4e9b551

[SPARK-3772] Allow ipython to be used by PySpark workers; IPython support improvements

This pull request addresses a few issues related to PySpark's IPython support:

- Fix the remaining uses of the '-u' flag, which IPython doesn't support (see SPARK-3772).
- Rename PYSPARK_PYTHON_OPTS to PYSPARK_DRIVER_PYTHON_OPTS, so that the old name is reserved in case we ever want to allow the worker Python options to be customized (this variable was introduced in apache#2554 and hasn't landed in a release yet, so this doesn't break any compatibility).
- Introduce a PYSPARK_DRIVER_PYTHON option that allows the driver to use `ipython` while the workers use a different Python version.
- Attempt to use Python 2.7 by default if PYSPARK_PYTHON is not specified.
- Retain the old semantics for IPYTHON=1 and IPYTHON_OPTS (to avoid breaking existing example programs).

There are more details in a block comment in `bin/pyspark`.

Author: Josh Rosen <[email protected]>

Closes apache#2651 from JoshRosen/SPARK-3772 and squashes the following commits:

7b8eb86 [Josh Rosen] More changes to PySpark python executable configuration:
c4f5778 [Josh Rosen] [SPARK-3772] Allow ipython to be used by Pyspark workers; IPython fixes:
1 parent ac30205 commit 4e9b551
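
For a quick sense of the configuration surface this commit introduces, here is a usage sketch. The commands mirror the updated `docs/programming-guide.md` below and assume `ipython` is on the PATH:

    # Run the driver under IPython while the workers use the default Python:
    PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark

    # Pass options to the driver's interpreter only (example from the docs diff):
    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --pylab inline" ./bin/pyspark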

File tree

4 files changed: +51 −20 lines


bin/pyspark

Lines changed: 38 additions & 13 deletions
@@ -50,22 +50,47 @@ fi
 
 . "$FWDIR"/bin/load-spark-env.sh
 
-# Figure out which Python executable to use
+# In Spark <= 1.1, setting IPYTHON=1 would cause the driver to be launched using the `ipython`
+# executable, while the worker would still be launched using PYSPARK_PYTHON.
+#
+# In Spark 1.2, we removed the documentation of the IPYTHON and IPYTHON_OPTS variables and added
+# PYSPARK_DRIVER_PYTHON and PYSPARK_DRIVER_PYTHON_OPTS to allow IPython to be used for the driver.
+# Now, users can simply set PYSPARK_DRIVER_PYTHON=ipython to use IPython and set
+# PYSPARK_DRIVER_PYTHON_OPTS to pass options when starting the Python driver
+# (e.g. PYSPARK_DRIVER_PYTHON_OPTS='notebook'). This supports full customization of the IPython
+# and executor Python executables.
+#
+# For backwards-compatibility, we retain the old IPYTHON and IPYTHON_OPTS variables.
+
+# Determine the Python executable to use if PYSPARK_PYTHON or PYSPARK_DRIVER_PYTHON isn't set:
+if hash python2.7 2>/dev/null; then
+  # Attempt to use Python 2.7, if installed:
+  DEFAULT_PYTHON="python2.7"
+else
+  DEFAULT_PYTHON="python"
+fi
+
+# Determine the Python executable to use for the driver:
+if [[ -n "$IPYTHON_OPTS" || "$IPYTHON" == "1" ]]; then
+  # If IPython options are specified, assume user wants to run IPython
+  # (for backwards-compatibility)
+  PYSPARK_DRIVER_PYTHON_OPTS="$PYSPARK_DRIVER_PYTHON_OPTS $IPYTHON_OPTS"
+  PYSPARK_DRIVER_PYTHON="ipython"
+elif [[ -z "$PYSPARK_DRIVER_PYTHON" ]]; then
+  PYSPARK_DRIVER_PYTHON="${PYSPARK_PYTHON:-"$DEFAULT_PYTHON"}"
+fi
+
+# Determine the Python executable to use for the executors:
 if [[ -z "$PYSPARK_PYTHON" ]]; then
-  if [[ "$IPYTHON" = "1" || -n "$IPYTHON_OPTS" ]]; then
-    # for backward compatibility
-    PYSPARK_PYTHON="ipython"
+  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && $DEFAULT_PYTHON != "python2.7" ]]; then
+    echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2
+    exit 1
   else
-    PYSPARK_PYTHON="python"
+    PYSPARK_PYTHON="$DEFAULT_PYTHON"
   fi
 fi
 export PYSPARK_PYTHON
 
-if [[ -z "$PYSPARK_PYTHON_OPTS" && -n "$IPYTHON_OPTS" ]]; then
-  # for backward compatibility
-  PYSPARK_PYTHON_OPTS="$IPYTHON_OPTS"
-fi
-
 # Add the PySpark classes to the Python path:
 export PYTHONPATH="$SPARK_HOME/python/:$PYTHONPATH"
 export PYTHONPATH="$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip:$PYTHONPATH"

@@ -93,9 +118,9 @@ if [[ -n "$SPARK_TESTING" ]]; then
   unset YARN_CONF_DIR
   unset HADOOP_CONF_DIR
   if [[ -n "$PYSPARK_DOC_TEST" ]]; then
-    exec "$PYSPARK_PYTHON" -m doctest $1
+    exec "$PYSPARK_DRIVER_PYTHON" -m doctest $1
   else
-    exec "$PYSPARK_PYTHON" $1
+    exec "$PYSPARK_DRIVER_PYTHON" $1
   fi
   exit
 fi

@@ -111,5 +136,5 @@ if [[ "$1" =~ \.py$ ]]; then
 else
   # PySpark shell requires special handling downstream
   export PYSPARK_SHELL=1
-  exec "$PYSPARK_PYTHON" $PYSPARK_PYTHON_OPTS
+  exec "$PYSPARK_DRIVER_PYTHON" $PYSPARK_DRIVER_PYTHON_OPTS
 fi
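
To summarize the resolution order the script now implements, a short sketch (the precedence comments restate the logic above; the example invocation assumes both interpreters are installed and on the PATH):

    # Driver:  IPYTHON/IPYTHON_OPTS (legacy, wins) > PYSPARK_DRIVER_PYTHON
    #          > PYSPARK_PYTHON > python2.7 if installed, else python
    # Workers: PYSPARK_PYTHON > python2.7 if installed, else python

    # Driver in IPython, workers pinned to a specific interpreter:
    PYSPARK_DRIVER_PYTHON=ipython PYSPARK_PYTHON=python2.7 ./bin/pyspark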

core/src/main/scala/org/apache/spark/api/python/PythonWorkerFactory.scala

Lines changed: 6 additions & 2 deletions
@@ -108,10 +108,12 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
     serverSocket = new ServerSocket(0, 1, InetAddress.getByAddress(Array(127, 0, 0, 1)))
 
     // Create and start the worker
-    val pb = new ProcessBuilder(Seq(pythonExec, "-u", "-m", "pyspark.worker"))
+    val pb = new ProcessBuilder(Seq(pythonExec, "-m", "pyspark.worker"))
     val workerEnv = pb.environment()
     workerEnv.putAll(envVars)
     workerEnv.put("PYTHONPATH", pythonPath)
+    // This is equivalent to setting the -u flag; we use it because ipython doesn't support -u:
+    workerEnv.put("PYTHONUNBUFFERED", "YES")
     val worker = pb.start()
 
     // Redirect worker stdout and stderr

@@ -149,10 +151,12 @@ private[spark] class PythonWorkerFactory(pythonExec: String, envVars: Map[String
 
     try {
       // Create and start the daemon
-      val pb = new ProcessBuilder(Seq(pythonExec, "-u", "-m", "pyspark.daemon"))
+      val pb = new ProcessBuilder(Seq(pythonExec, "-m", "pyspark.daemon"))
       val workerEnv = pb.environment()
       workerEnv.putAll(envVars)
       workerEnv.put("PYTHONPATH", pythonPath)
+      // This is equivalent to setting the -u flag; we use it because ipython doesn't support -u:
+      workerEnv.put("PYTHONUNBUFFERED", "YES")
       daemon = pb.start()
 
       val in = new DataInputStream(daemon.getInputStream)
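
The swap from `-u` to `PYTHONUNBUFFERED` can be sanity-checked outside Spark; a minimal sketch, assuming a POSIX shell and a CPython `python` on the PATH (the printed text is illustrative):

    # Both invocations run with unbuffered stdout/stderr; any non-empty value
    # of PYTHONUNBUFFERED is equivalent to -u. Only the second form also works
    # under ipython, which rejects the -u flag:
    python -u -c 'print("unbuffered")'
    PYTHONUNBUFFERED=YES python -c 'print("unbuffered")'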

core/src/main/scala/org/apache/spark/deploy/PythonRunner.scala

Lines changed: 3 additions & 1 deletion
@@ -34,7 +34,8 @@ object PythonRunner {
     val pythonFile = args(0)
     val pyFiles = args(1)
     val otherArgs = args.slice(2, args.length)
-    val pythonExec = sys.env.get("PYSPARK_PYTHON").getOrElse("python") // TODO: get this from conf
+    val pythonExec =
+      sys.env.getOrElse("PYSPARK_DRIVER_PYTHON", sys.env.getOrElse("PYSPARK_PYTHON", "python"))
 
     // Format python file paths before adding them to the PYTHONPATH
     val formattedPythonFile = formatPath(pythonFile)

@@ -57,6 +58,7 @@
     val builder = new ProcessBuilder(Seq(pythonExec, formattedPythonFile) ++ otherArgs)
     val env = builder.environment()
     env.put("PYTHONPATH", pythonPath)
+    // This is equivalent to setting the -u flag; we use it because ipython doesn't support -u:
     env.put("PYTHONUNBUFFERED", "YES") // value is needed to be set to a non-empty string
     env.put("PYSPARK_GATEWAY_PORT", "" + gatewayServer.getListeningPort)
     builder.redirectErrorStream(true) // Ugly but needed for stdout and stderr to synchronize
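
The `getOrElse` chain above has a near-equivalent shell spelling; a sketch where the variable names and the `python` default match the commit and the `echo` is purely illustrative. Note that bash's `${VAR:-default}` also treats an empty string as unset, a slight difference from the Scala `sys.env` lookup:

    # PYSPARK_DRIVER_PYTHON wins, then PYSPARK_PYTHON, then plain "python":
    PYTHON_EXEC="${PYSPARK_DRIVER_PYTHON:-${PYSPARK_PYTHON:-python}}"
    echo "driver will launch with: $PYTHON_EXEC"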

docs/programming-guide.md

Lines changed: 4 additions & 4 deletions
@@ -211,17 +211,17 @@ For a complete list of options, run `pyspark --help`. Behind the scenes,
 
 It is also possible to launch the PySpark shell in [IPython](http://ipython.org), the
 enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To
-use IPython, set the `PYSPARK_PYTHON` variable to `ipython` when running `bin/pyspark`:
+use IPython, set the `PYSPARK_DRIVER_PYTHON` variable to `ipython` when running `bin/pyspark`:
 
 {% highlight bash %}
-$ PYSPARK_PYTHON=ipython ./bin/pyspark
+$ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
 {% endhighlight %}
 
-You can customize the `ipython` command by setting `PYSPARK_PYTHON_OPTS`. For example, to launch
+You can customize the `ipython` command by setting `PYSPARK_DRIVER_PYTHON_OPTS`. For example, to launch
 the [IPython Notebook](http://ipython.org/notebook.html) with PyLab plot support:
 
 {% highlight bash %}
-$ PYSPARK_PYTHON=ipython PYSPARK_PYTHON_OPTS="notebook --pylab inline" ./bin/pyspark
+$ PYSPARK_DRIVER_PYTHON=ipython PYSPARK_DRIVER_PYTHON_OPTS="notebook --pylab inline" ./bin/pyspark
 {% endhighlight %}
 
 </div>
