docs/python-programming-guide.md

By default, PySpark requires `python` to be available on the system `PATH` and uses it to run programs.

All of PySpark's library dependencies, including [Py4J](http://py4j.sourceforge.net/), are bundled with PySpark and automatically imported.

Standalone PySpark applications should be run using the `bin/spark-submit` script, which automatically configures the Java and Python environment for running Spark.

# Interactive Use

The `bin/pyspark` script launches a Python interpreter that is configured to run PySpark applications. To use `pyspark` interactively, first build Spark, then launch it directly from the command line:

{% highlight bash %}
$ sbt/sbt assembly
$ ./bin/pyspark
{% endhighlight %}

The Python shell can be used to explore data interactively and is a simple way to learn the API:
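
For instance, a minimal session might look like the sketch below; the `README.md` path is only an assumed example, and `sc` is the SparkContext that the shell creates for you at startup:

{% highlight python %}
>>> lines = sc.textFile("README.md")                 # assumed example input file
>>> lines.count()                                    # total number of lines
>>> lines.filter(lambda l: "Spark" in l).count()     # lines that mention Spark
{% endhighlight %}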

By default, the `bin/pyspark` shell creates a SparkContext that runs applications locally on all of your machine's logical cores. To connect to a non-local cluster, or to specify a number of cores, set the `--master` flag. For example, to use the `bin/pyspark` shell with a [standalone Spark cluster](spark-standalone.html):

IPython also works on a cluster or on multiple cores if you set the `--master` flag.

# Standalone Programs

PySpark can also be used from standalone Python scripts by creating a SparkContext in your script and running the script using `bin/spark-submit`. The Quick Start guide includes a [complete example](quick-start.html#standalone-applications) of a standalone Python application.
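
As an illustrative sketch only (the file name `simple_app.py` and the job itself are assumptions, not taken from the guide), such a script might look like this:

{% highlight python %}
# simple_app.py -- a minimal standalone PySpark program (hypothetical example)
from pyspark import SparkContext

# The master URL can be supplied by bin/spark-submit (e.g. via --master);
# here we only name the application.
sc = SparkContext(appName="SimpleApp")

# A small job: count the even numbers in a local range.
count = sc.parallelize(range(1000)).filter(lambda x: x % 2 == 0).count()
print("Even numbers: %d" % count)

sc.stop()
{% endhighlight %}

It would then be launched with something like `bin/spark-submit simple_app.py`.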

Code dependencies can be deployed by passing .zip or .egg files in the `--py-files` option of `spark-submit`:
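
Dependencies can also be attached from inside the driver program with `SparkContext.addPyFile`; the short sketch below is illustrative only, and `deps.zip` is an assumed file name rather than one from the guide:

{% highlight python %}
from pyspark import SparkContext

sc = SparkContext(appName="DepsExample")

# Ship an archive of helper modules to every worker; modules inside it
# become importable in tasks submitted after this call.
sc.addPyFile("deps.zip")
{% endhighlight %}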

You can set [configuration properties](configuration.html#spark-properties) by passing a `SparkConf` object to `SparkContext`:

{% highlight python %}
from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("local")
        .setAppName("My app")
        .set("spark.executor.memory", "1g"))
sc = SparkContext(conf=conf)
{% endhighlight %}

PySpark also includes several sample programs in the [`examples/src/main/python` folder](https://github.com/apache/spark/tree/master/examples/src/main/python).
You can run them by passing the files to `pyspark`; e.g.: