
Commit 427a5f0

Update docs
Note that this reflects changes incorporated in #799.
1 parent d32072c commit 427a5f0

2 files changed: +19 -16 lines changed

docs/index.md

Lines changed: 2 additions & 1 deletion
@@ -46,9 +46,10 @@ locally with one thread, or `local[N]` to run locally with N threads. You should
 Spark also provides a Python interface. To run an example Spark application written in Python, use
 `bin/pyspark <program> [params]`. For example,
 
-    ./bin/pyspark examples/src/main/python/pi.py local[2] 10
+    ./bin/pyspark examples/src/main/python/pi.py 10
 
 or simply `bin/pyspark` without any arguments to run Spark interactively in a python interpreter.
+As in Spark shell, you can also pass in the `--master` option to configure your master URL.
 
 # Launching on a Cluster

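For readers curious what such an example program does, here is a heavily simplified sketch of a Monte Carlo estimate of pi in PySpark. This is illustrative only, not the shipped `pi.py`; it assumes a SparkContext is already available as `sc`, as it is in the `bin/pyspark` shell.

{% highlight python %}
import random

# Sample random points in the unit square and count how many land
# inside the quarter circle; the ratio approximates pi / 4.
def inside(_):
    x, y = random.random(), random.random()
    return 1 if x * x + y * y < 1 else 0

n = 100000
count = sc.parallelize(range(n)).map(inside).reduce(lambda a, b: a + b)
print("Pi is roughly %f" % (4.0 * count / n))
{% endhighlight %}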
docs/python-programming-guide.md

Lines changed: 17 additions & 15 deletions
@@ -60,13 +60,9 @@ By default, PySpark requires `python` to be available on the system `PATH` and u
 
 All of PySpark's library dependencies, including [Py4J](http://py4j.sourceforge.net/), are bundled with PySpark and automatically imported.
 
-Standalone PySpark applications should be run using the `bin/spark-submit` script, which automatically
-configures the Java and Python environment for running Spark.
-
-
 # Interactive Use
 
-The `bin/pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `pyspark` interactively, first build Spark, then launch it directly from the command line without any options:
+The `bin/pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `pyspark` interactively, first build Spark, then launch it directly from the command line:
 
 {% highlight bash %}
 $ sbt/sbt assembly
@@ -83,20 +79,24 @@ The Python shell can be used explore data interactively and is a simple way to l
 {% endhighlight %}
 
 By default, the `bin/pyspark` shell creates SparkContext that runs applications locally on all of
-your machine's logical cores.
-To connect to a non-local cluster, or to specify a number of cores, set the `MASTER` environment variable.
-For example, to use the `bin/pyspark` shell with a [standalone Spark cluster](spark-standalone.html):
+your machine's logical cores. To connect to a non-local cluster, or to specify a number of cores,
+set the `--master` flag. For example, to use the `bin/pyspark` shell with a
+[standalone Spark cluster](spark-standalone.html):
 
 {% highlight bash %}
-$ MASTER=spark://IP:PORT ./bin/pyspark
+$ ./bin/pyspark --master spark://1.2.3.4:7077
 {% endhighlight %}
 
 Or, to use exactly four cores on the local machine:
 
 {% highlight bash %}
-$ MASTER=local[4] ./bin/pyspark
+$ ./bin/pyspark --master local[4]
 {% endhighlight %}
 
+Under the hood `bin/pyspark` is a wrapper around the
+[Spark submit script](cluster-overview.html#launching-applications-with-spark-submit), so these
+two scripts share the same list of options. For a complete list of options, run `bin/pyspark` with
+the `--help` option.
 
 ## IPython

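As a quick way to confirm which master the new flag selected, the `sc` object that the shell pre-creates exposes it directly. A hypothetical interactive session, assuming the shell was started with `./bin/pyspark --master local[4]`:

{% highlight python %}
# Inside the bin/pyspark shell, `sc` already exists; its attributes reflect
# whatever was passed on the command line via --master.
print(sc.master)               # e.g. local[4]
print(sc.defaultParallelism)   # cores used by default for operations like parallelize()
{% endhighlight %}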
@@ -115,13 +115,14 @@ the [IPython Notebook](http://ipython.org/notebook.html) with PyLab graphing sup
 $ IPYTHON_OPTS="notebook --pylab inline" ./bin/pyspark
 {% endhighlight %}
 
-IPython also works on a cluster or on multiple cores if you set the `MASTER` environment variable.
+IPython also works on a cluster or on multiple cores if you set the `--master` flag.
 
 
 # Standalone Programs
 
-PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using `bin/spark-submit`.
-The Quick Start guide includes a [complete example](quick-start.html#standalone-applications) of a standalone Python application.
+PySpark can also be used from standalone Python scripts by creating a SparkContext in your script
+and running the script using `bin/spark-submit`. The Quick Start guide includes a
+[complete example](quick-start.html#standalone-applications) of a standalone Python application.
 
 Code dependencies can be deployed by passing .zip or .egg files in the `--py-files` option of `spark-submit`:
 
@@ -138,6 +139,7 @@ You can set [configuration properties](configuration.html#spark-properties) by p
 {% highlight python %}
 from pyspark import SparkConf, SparkContext
 conf = (SparkConf()
+         .setMaster("local")
          .setAppName("My app")
          .set("spark.executor.memory", "1g"))
 sc = SparkContext(conf = conf)
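To show the snippet above in context, here is a minimal, self-contained sketch of a standalone script built around it. The file name `simple_app.py` and the toy data are illustrative assumptions, not part of the documented example.

{% highlight python %}
# simple_app.py -- illustrative sketch only; the name and data are made up.
from pyspark import SparkConf, SparkContext

# Same configuration pattern as the documented snippet; the master is set
# in code here, but it could also be supplied via spark-submit's --master.
conf = (SparkConf()
         .setMaster("local")
         .setAppName("My app")
         .set("spark.executor.memory", "1g"))
sc = SparkContext(conf=conf)

# A trivial job so the script does something observable.
counts = (sc.parallelize(["spark", "python", "spark"])
            .map(lambda word: (word, 1))
            .reduceByKey(lambda a, b: a + b))
print(counts.collect())

sc.stop()
{% endhighlight %}

Assuming that layout, the script would be launched with `./bin/spark-submit simple_app.py`.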
@@ -164,6 +166,6 @@ some example applications.
 PySpark also includes several sample programs in the [`examples/src/main/python` folder](https://github.com/apache/spark/tree/master/examples/src/main/python).
 You can run them by passing the files to `pyspark`; e.g.:
 
-    ./bin/spark-submit examples/src/main/python/wordcount.py local[2] README.md
+    ./bin/spark-submit examples/src/main/python/wordcount.py README.md
 
-Each program prints usage help when run without arguments.
+Each program prints usage help when run without sufficient arguments.
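The usage-help behaviour mentioned in that last line typically comes from a small argument check at the top of each example. A hedged sketch of the general pattern (not the shipped `wordcount.py` source) might look like this:

{% highlight python %}
import sys
from pyspark import SparkContext

if __name__ == "__main__":
    # Validate the arguments first; the master no longer comes from argv,
    # it is supplied by spark-submit (e.g. via --master).
    if len(sys.argv) != 2:
        sys.stderr.write("Usage: wordcount <file>\n")
        sys.exit(-1)
    sc = SparkContext(appName="PythonWordCount")
    counts = (sc.textFile(sys.argv[1])
                .flatMap(lambda line: line.split(" "))
                .map(lambda word: (word, 1))
                .reduceByKey(lambda a, b: a + b))
    for word, count in counts.collect():
        print("%s: %i" % (word, count))
    sc.stop()
{% endhighlight %}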
