Commit 6ceed85

jaceklaskowski authored and srowen committed

Docs small fixes

Author: Jacek Laskowski <[email protected]>

Closes apache#8629 from jaceklaskowski/docs-fixes.

1 parent 9d8e838 · commit 6ceed85

File tree

2 files changed: +19 -19 lines changed

docs/building-spark.md

Lines changed: 11 additions & 12 deletions
@@ -61,12 +61,13 @@ If you don't run this, you may see errors like the following:
 You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
 
 **Note:**
-* *For Java 8 and above this step is not required.*
-* *If using `build/mvn` and `MAVEN_OPTS` were not already set, the script will automate this for you.*
+
+* For Java 8 and above this step is not required.
+* If using `build/mvn` with no `MAVEN_OPTS` set, the script will automate this for you.
 
 # Specifying the Hadoop Version
 
-Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
+Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the `hadoop.version` property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
 
 <table class="table">
 <thead>
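The `MAVEN_OPTS` note above can be made concrete with a small sketch; the memory settings here are illustrative assumptions, not values mandated by the guide, and when `build/mvn` is used this step is handled automatically:

```shell
# Sketch: export MAVEN_OPTS before invoking Maven (needed on JDK 7 and
# earlier; not required on Java 8+). The sizes below are illustrative.
export MAVEN_OPTS="-Xmx2g -XX:MaxPermSize=512M -XX:ReservedCodeCacheSize=512m"
echo "$MAVEN_OPTS"
```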
@@ -91,7 +92,7 @@ mvn -Dhadoop.version=1.2.1 -Phadoop-1 -DskipTests clean package
 mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package
 {% endhighlight %}
 
-You can enable the "yarn" profile and optionally set the "yarn.version" property if it is different from "hadoop.version". Spark only supports YARN versions 2.2.0 and later.
+You can enable the `yarn` profile and optionally set the `yarn.version` property if it is different from `hadoop.version`. Spark only supports YARN versions 2.2.0 and later.
 
 Examples:
 
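The paragraph above about the `yarn` profile can be sketched as a build invocation of the kind the guide's (elided) examples section contains. The version numbers are illustrative assumptions; this is not runnable outside a Spark source checkout with Maven installed:

```shell
# Hypothetical invocation: enable the yarn profile, pin hadoop.version,
# and override yarn.version when it differs. Versions are illustrative.
mvn -Pyarn -Dhadoop.version=2.4.0 -Dyarn.version=2.4.0 -DskipTests clean package
```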
@@ -125,7 +126,7 @@ mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Phive -Phive-thriftserver -Dskip
 # Building for Scala 2.11
 To produce a Spark package compiled with Scala 2.11, use the `-Dscala-2.11` property:
 
-    dev/change-scala-version.sh 2.11
+    ./dev/change-scala-version.sh 2.11
     mvn -Pyarn -Phadoop-2.4 -Dscala-2.11 -DskipTests clean package
 
 Spark does not yet support its JDBC component for Scala 2.11.
@@ -163,11 +164,9 @@ the `spark-parent` module).
 
 Thus, the full flow for running continuous-compilation of the `core` submodule may look more like:
 
-```
-$ mvn install
-$ cd core
-$ mvn scala:cc
-```
+    $ mvn install
+    $ cd core
+    $ mvn scala:cc
 
 # Building Spark with IntelliJ IDEA or Eclipse

@@ -193,11 +192,11 @@ then ship it over to the cluster. We are investigating the exact cause for this.
 
 # Packaging without Hadoop Dependencies for YARN
 
-The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with yarn.application.classpath. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
+The assembly jar produced by `mvn package` will, by default, include all of Spark's dependencies, including Hadoop and some of its ecosystem projects. On YARN deployments, this causes multiple versions of these to appear on executor classpaths: the version packaged in the Spark assembly and the version on each node, included with `yarn.application.classpath`. The `hadoop-provided` profile builds the assembly without including Hadoop-ecosystem projects, like ZooKeeper and Hadoop itself.
 
 # Building with SBT
 
-Maven is the official recommendation for packaging Spark, and is the "build of reference".
+Maven is the official build tool recommended for packaging Spark, and is the *build of reference*.
 But SBT is supported for day-to-day development since it can provide much faster iterative
 compilation. More advanced developers may wish to use SBT.

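The SBT note above can be illustrated with a sketch of the SBT equivalent of a Maven package build, using the `build/sbt` wrapper that ships in the Spark source tree. The profile flags are illustrative assumptions, and the command only runs inside a Spark checkout:

```shell
# Hypothetical sketch: SBT equivalent of the Maven package step.
# Profiles mirror the Maven ones; choices here are illustrative.
build/sbt -Pyarn -Phadoop-2.4 assembly
```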
docs/cluster-overview.md

Lines changed: 8 additions & 7 deletions
@@ -5,18 +5,19 @@ title: Cluster Mode Overview
 
 This document gives a short overview of how Spark runs on clusters, to make it easier to understand
 the components involved. Read through the [application submission guide](submitting-applications.html)
-to submit applications to a cluster.
+to learn about launching applications on a cluster.
 
 # Components
 
-Spark applications run as independent sets of processes on a cluster, coordinated by the SparkContext
+Spark applications run as independent sets of processes on a cluster, coordinated by the `SparkContext`
 object in your main program (called the _driver program_).
+
 Specifically, to run on a cluster, the SparkContext can connect to several types of _cluster managers_
-(either Spark's own standalone cluster manager or Mesos/YARN), which allocate resources across
+(either Spark's own standalone cluster manager, Mesos or YARN), which allocate resources across
 applications. Once connected, Spark acquires *executors* on nodes in the cluster, which are
 processes that run computations and store data for your application.
 Next, it sends your application code (defined by JAR or Python files passed to SparkContext) to
-the executors. Finally, SparkContext sends *tasks* for the executors to run.
+the executors. Finally, SparkContext sends *tasks* to the executors to run.
 
 <p style="text-align: center;">
 <img src="img/cluster-overview.png" title="Spark cluster components" alt="Spark cluster components" />
@@ -33,9 +34,9 @@ There are several useful things to note about this architecture:
 2. Spark is agnostic to the underlying cluster manager. As long as it can acquire executor
 processes, and these communicate with each other, it is relatively easy to run it even on a
 cluster manager that also supports other applications (e.g. Mesos/YARN).
-3. The driver program must listen for and accept incoming connections from its executors throughout
-its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
-section](configuration.html#networking)). As such, the driver program must be network
+3. The driver program must listen for and accept incoming connections from its executors throughout
+   its lifetime (e.g., see [spark.driver.port and spark.fileserver.port in the network config
+   section](configuration.html#networking)). As such, the driver program must be network
 addressable from the worker nodes.
 4. Because the driver schedules tasks on the cluster, it should be run close to the worker
 nodes, preferably on the same local area network. If you'd like to send requests to the

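The driver/executor flow described in the cluster-overview hunks can be sketched as a launch command. The master URL, main class, and jar path below are hypothetical placeholders, and the command only runs where a Spark distribution is installed:

```shell
# Hypothetical sketch: launching a driver program on a standalone cluster.
# The driver process created here hosts the SparkContext, which then
# acquires executors from the cluster manager and sends them tasks.
./bin/spark-submit \
  --class org.example.MyApp \
  --master spark://master-host:7077 \
  path/to/my-app.jar
```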