
Commit de1b207

Update docs to reflect new ports
1 parent b565079 commit de1b207

3 files changed: 192 additions & 114 deletions

docs/configuration.md

Lines changed: 53 additions & 0 deletions
@@ -558,13 +558,66 @@ Apart from these, the following properties are also available, and may be useful
 <td>(local hostname)</td>
 <td>
 Hostname or IP address for the driver to listen on.
+This is used for communicating with the executors and the standalone Master.
 </td>
 </tr>
 <tr>
 <td><code>spark.driver.port</code></td>
 <td>(random)</td>
 <td>
 Port for the driver to listen on.
+This is used for communicating with the executors and the standalone Master.
+</td>
+</tr>
+<tr>
+<td><code>spark.fileserver.port</code></td>
+<td>(random)</td>
+<td>
+Port for the driver's HTTP file server to listen on.
+</td>
+</tr>
+<tr>
+<td><code>spark.broadcast.port</code></td>
+<td>(random)</td>
+<td>
+Port for the driver's HTTP broadcast server to listen on.
+This is not relevant for torrent broadcast.
+</td>
+</tr>
+<tr>
+<td><code>spark.replClassServer.port</code></td>
+<td>(random)</td>
+<td>
+Port for the driver's HTTP class server to listen on.
+This is only relevant for Spark shell.
+</td>
+</tr>
+<tr>
+<td><code>spark.blockManager.port</code></td>
+<td>(random)</td>
+<td>
+Port for all block managers to listen on. These exist on both the driver and the executors.
+</td>
+</tr>
+<tr>
+<td><code>spark.executor.port</code></td>
+<td>(random)</td>
+<td>
+Port for the executor to listen on. This is used for communicating with the driver.
+</td>
+</tr>
+<tr>
+<td><code>spark.executor.env.port</code></td>
+<td>(random)</td>
+<td>
+Port used by the executor's actor system for various purposes.
+</td>
+</tr>
+<tr>
+<td><code>spark.standalone.cluster.port</code></td>
+<td>(random)</td>
+<td>
+Port used by <code>org.apache.spark.deploy.Client</code> in standalone cluster deploy mode.
 </td>
 </tr>
 <tr>
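The ports added above all default to random values, which is awkward behind a firewall that only opens a fixed range. The following is a minimal sketch, assuming you want consecutive ports starting at an arbitrary base of 51000 (not a Spark default) and that the values go into `conf/spark-defaults.conf`, which takes one whitespace-separated `key value` pair per line. The helper names `pinned_ports` and `to_spark_defaults` are hypothetical; only the property names come from this commit, and only a subset of them is covered.

```python
# Hypothetical helper: pin a subset of the driver/executor ports documented in
# this commit to consecutive fixed values so a firewall can open a known range.
# The default base port (51000) is an arbitrary assumption, not a Spark default.

PORT_PROPERTIES = [
    "spark.driver.port",
    "spark.fileserver.port",
    "spark.broadcast.port",
    "spark.replClassServer.port",
    "spark.blockManager.port",
    "spark.executor.port",
]


def pinned_ports(base_port=51000):
    """Assign each property a distinct port, starting at base_port."""
    last_port = base_port + len(PORT_PROPERTIES) - 1
    # Assumption: stay out of the privileged range and below the TCP maximum.
    if base_port < 1024 or last_port > 65535:
        raise ValueError("base_port must leave room for every property in 1024-65535")
    return {prop: base_port + i for i, prop in enumerate(PORT_PROPERTIES)}


def to_spark_defaults(ports):
    """Render the mapping as 'key value' lines in the spark-defaults.conf style."""
    return "\n".join(f"{key} {value}" for key, value in ports.items())


if __name__ == "__main__":
    print(to_spark_defaults(pinned_ports()))
```

Running it prints one line per property, starting with `spark.driver.port 51000`. The sketch does not address what happens when several executors on the same host are given the same pinned `spark.executor.port`.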

docs/security.md

Lines changed: 136 additions & 3 deletions
@@ -7,14 +7,147 @@ Spark currently supports authentication via a shared secret. Authentication can
 
 * For Spark on [YARN](running-on-yarn.html) deployments, configuring `spark.authenticate` to `true` will automatically handle generating and distributing the shared secret. Each application will use a unique shared secret.
 * For other types of Spark deployments, the Spark parameter `spark.authenticate.secret` should be configured on each of the nodes. This secret will be used by all the Master/Workers and applications.
+* **IMPORTANT NOTE:** *The experimental Netty shuffle path (`spark.shuffle.use.netty`) is not secured, so do not use Netty for shuffles if running with authentication.*
+
+## Web UI
 
 The Spark UI can also be secured by using [javax servlet filters](http://docs.oracle.com/javaee/6/api/javax/servlet/Filter.html) via the `spark.ui.filters` setting. A user may want to secure the UI if it has data that other users should not be allowed to see. The javax servlet filter specified by the user can authenticate the user and then once the user is logged in, Spark can compare that user versus the view ACLs to make sure they are authorized to view the UI. The configs `spark.ui.acls.enable` and `spark.ui.view.acls` control the behavior of the ACLs. Note that the user who started the application always has view access to the UI.
 On YARN, the Spark UI uses the standard YARN web application proxy mechanism and will authenticate via any installed Hadoop filters.
 
+## Event Logging
+
 If your applications are using event logging, the directory where the event logs go (`spark.eventLog.dir`) should be manually created and have the proper permissions set on it. If you want those log files secured, the permissions should be set to `drwxrwxrwxt` for that directory. The owner of the directory should be the super user who is running the history server and the group permissions should be restricted to super user group. This will allow all users to write to the directory but will prevent unprivileged users from removing or renaming a file unless they own the file or directory. The event log files will be created by Spark with permissions such that only the user and group have read and write access.
 
-**IMPORTANT NOTE:** *The experimental Netty shuffle path (`spark.shuffle.use.netty`) is not secured, so do not use Netty for shuffles if running with authentication.*
+## Configuring Ports for Network Security
+
+Spark makes heavy use of the network, and some environments have strict requirements for using tight
+firewall settings. Below are the primary ports that Spark uses for its communication and how to
+configure those ports.
+
+### Standalone mode only
+
+<table class="table">
+<tr>
+<th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
+Setting</th><th>Notes</th>
+</tr>
+<tr>
+<td>Browser</td>
+<td>Standalone Master</td>
+<td>8080</td>
+<td>Web UI</td>
+<td><code>master.ui.port<br>SPARK_MASTER_WEBUI_PORT</code></td>
+<td>Jetty-based. Standalone mode only.</td>
+</tr>
+<tr>
+<td>Browser</td>
+<td>Standalone Worker</td>
+<td>8081</td>
+<td>Web UI</td>
+<td><code>worker.ui.port<br>SPARK_WORKER_WEBUI_PORT</code></td>
+<td>Jetty-based. Standalone mode only.</td>
+</tr>
+<tr>
+<td>Driver<br>Standalone Worker</td>
+<td>Standalone Master</td>
+<td>7077</td>
+<td>Submit job to cluster<br>Join cluster</td>
+<td><code>SPARK_MASTER_PORT</code></td>
+<td>Akka-based. Set to "0" to choose a port randomly. Standalone mode only.</td>
+</tr>
+<tr>
+<td>Standalone Master</td>
+<td>Standalone Worker</td>
+<td>(random)</td>
+<td>Schedule executors</td>
+<td><code>SPARK_WORKER_PORT</code></td>
+<td>Akka-based. Set to "0" to choose a port randomly. Standalone mode only.</td>
+</tr>
+</table>
+
+### All cluster managers
+
+<table class="table">
+<tr>
+<th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
+Setting</th><th>Notes</th>
+</tr>
+<tr>
+<td>Browser</td>
+<td>Application</td>
+<td>4040</td>
+<td>Web UI</td>
+<td><code>spark.ui.port</code></td>
+<td>Jetty-based</td>
+</tr>
+<tr>
+<td>Browser</td>
+<td>History Server</td>
+<td>18080</td>
+<td>Web UI</td>
+<td><code>spark.history.ui.port</code></td>
+<td>Jetty-based</td>
+</tr>
+<tr>
+<td>Executor<br>Standalone Master</td>
+<td>Driver</td>
+<td>(random)</td>
+<td>Connect to application<br>Notify executor state changes</td>
+<td><code>spark.driver.port</code></td>
+<td>Akka-based. Set to "0" to choose a port randomly.</td>
+</tr>
+<tr>
+<td>Driver</td>
+<td>Executor</td>
+<td>(random)</td>
+<td>Schedule tasks</td>
+<td><code>spark.executor.port</code></td>
+<td>Akka-based. Set to "0" to choose a port randomly.</td>
+</tr>
+<tr>
+<td>Driver</td>
+<td>Executor</td>
+<td>(random)</td>
+<td>Executor actor system port</td>
+<td><code>spark.executor.env.port</code></td>
+<td>Akka-based. Set to "0" to choose a port randomly.</td>
+</tr>
+<tr>
+<td>Executor</td>
+<td>Driver</td>
+<td>(random)</td>
+<td>File server for files and jars</td>
+<td><code>spark.fileserver.port</code></td>
+<td>Jetty-based</td>
+</tr>
+<tr>
+<td>Executor</td>
+<td>Driver</td>
+<td>(random)</td>
+<td>HTTP Broadcast</td>
+<td><code>spark.broadcast.port</code></td>
+<td>Jetty-based. Not used by TorrentBroadcast, which sends data through the block manager
+instead.</td>
+</tr>
+<tr>
+<td>Executor</td>
+<td>Driver</td>
+<td>(random)</td>
+<td>Class file server</td>
+<td><code>spark.replClassServer.port</code></td>
+<td>Jetty-based. Only used in Spark shells.</td>
+</tr>
+<tr>
+<td>Executor / Driver</td>
+<td>Executor / Driver</td>
+<td>(random)</td>
+<td>Block Manager port</td>
+<td><code>spark.blockManager.port</code></td>
+<td>Raw socket via ServerSocketChannel</td>
+</tr>
+</table>
 
-See the [configuration page](configuration.html) for more details on the security configuration parameters.
 
-See <a href="{{site.SPARK_GITHUB_URL}}/tree/master/core/src/main/scala/org/apache/spark/SecurityManager.scala"><code>org.apache.spark.SecurityManager</code></a> for implementation details about security.
+See the [configuration page](configuration.html) for more details on the security configuration
+parameters, and <a href="{{site.SPARK_GITHUB_URL}}/tree/master/core/src/main/scala/org/apache/spark/SecurityManager.scala">
+<code>org.apache.spark.SecurityManager</code></a> for implementation details about security.
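To turn the "All cluster managers" table added to `docs/security.md` into firewall rules, it helps to ask which ports a given role has to accept. The following is a minimal sketch under these assumptions: the rows are transcribed by hand from that table (the shared `Executor / Driver` block manager row is split into one row per destination), every `(random)` port must be pinned explicitly before a fixed rule can be written, and `ROWS` and `inbound_ports` are hypothetical names, not anything Spark provides.

```python
# Hypothetical transcription of the "All cluster managers" table:
# (source roles, destination role, default port or None if random, setting).
ROWS = [
    (("Browser",), "Application", 4040, "spark.ui.port"),
    (("Browser",), "History Server", 18080, "spark.history.ui.port"),
    (("Executor", "Standalone Master"), "Driver", None, "spark.driver.port"),
    (("Driver",), "Executor", None, "spark.executor.port"),
    (("Driver",), "Executor", None, "spark.executor.env.port"),
    (("Executor",), "Driver", None, "spark.fileserver.port"),
    (("Executor",), "Driver", None, "spark.broadcast.port"),
    (("Executor",), "Driver", None, "spark.replClassServer.port"),
    # The block manager row lists "Executor / Driver" on both sides,
    # so it is split into one row per destination role.
    (("Executor", "Driver"), "Executor", None, "spark.blockManager.port"),
    (("Executor", "Driver"), "Driver", None, "spark.blockManager.port"),
]


def inbound_ports(role, pinned):
    """Return {setting: port} that a firewall in front of `role` must allow.

    `pinned` maps a setting to an explicitly chosen port. Rows whose default is
    random must appear in `pinned`, otherwise no fixed rule can be written.
    """
    required = {}
    for _sources, dest, default, setting in ROWS:
        if dest != role:
            continue
        port = pinned.get(setting, default)
        if port is None:
            raise ValueError(f"{setting} defaults to a random port; pin it first")
        required[setting] = port
    return required


if __name__ == "__main__":
    print(inbound_ports("History Server", {}))  # {'spark.history.ui.port': 18080}
```

The sketch is deliberately conservative: it requires `spark.broadcast.port` and `spark.replClassServer.port` on the driver even though the table notes they only matter for HTTP broadcast and the Spark shells, respectively.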

docs/spark-standalone.md

Lines changed: 3 additions & 111 deletions
@@ -299,118 +299,10 @@ You can run Spark alongside your existing Hadoop cluster by just launching it as
 
 # Configuring Ports for Network Security
 
-Spark makes heavy use of the network, and some environments have strict requirements for using tight
-firewall settings. Below are the primary ports that Spark uses for its communication and how to
-configure those ports.
+Spark makes heavy use of the network, and some environments have strict requirements for using
+tight firewall settings. For a complete list of ports to configure, see the
+[security page](security.html#configuring-ports-for-network-security).
 
-<table class="table">
-<tr>
-<th>From</th><th>To</th><th>Default Port</th><th>Purpose</th><th>Configuration
-Setting</th><th>Notes</th>
-</tr>
-<!-- Web UIs -->
-<tr>
-<td>Browser</td>
-<td>Master</td>
-<td>8080</td>
-<td>Web UI</td>
-<td><code>master.ui.port<br>SPARK_MASTER_WEBUI_PORT</code></td>
-<td>Jetty-based</td>
-</tr>
-<tr>
-<td>Browser</td>
-<td>Worker</td>
-<td>8081</td>
-<td>Web UI</td>
-<td><code>worker.ui.port<br>SPARK_WORKER_WEBUI_PORT</code></td>
-<td>Jetty-based</td>
-</tr>
-<tr>
-<td>Browser</td>
-<td>Application</td>
-<td>4040</td>
-<td>Web UI</td>
-<td><code>spark.ui.port</code></td>
-<td>Jetty-based</td>
-</tr>
-<tr>
-<td>Browser</td>
-<td>History Server</td>
-<td>18080</td>
-<td>Web UI</td>
-<td><code>spark.history.ui.port</code></td>
-<td>Jetty-based</td>
-</tr>
-<!-- Cluster interactions -->
-<tr>
-<td>Driver<br>Worker</td>
-<td>Master</td>
-<td>7077</td>
-<td>Submit job to cluster<br>Join cluster</td>
-<td><code>SPARK_MASTER_PORT</code></td>
-<td>Akka-based. Set to "0" to choose a port randomly.</td>
-</tr>
-<tr>
-<td>Master</td>
-<td>Worker</td>
-<td>(random)</td>
-<td>Schedule executors</td>
-<td><code>SPARK_WORKER_PORT</code></td>
-<td>Akka-based. Set to "0" to choose a port randomly.</td>
-</tr>
-<tr>
-<td>Executor<br>Master</td>
-<td>Driver</td>
-<td>(random)</td>
-<td>Connect to application<br>Notify Master and executor state changes</td>
-<td><code>spark.driver.port</code></td>
-<td>Akka-based. Set to "0" to choose a port randomly.</td>
-</tr>
-<tr>
-<td>Driver</td>
-<td>Executor</td>
-<td>(random)</td>
-<td>Schedule tasks</td>
-<td><code>spark.executor.port</code></td>
-<td>Akka-based. Set to "0" to choose a port randomly.</td>
-</tr>
-
-<!-- Other misc stuff -->
-<tr>
-<td>Executor</td>
-<td>Driver</td>
-<td>(random)</td>
-<td>File server for files and jars</td>
-<td><code>spark.fileserver.port</code></td>
-<td>Jetty-based</td>
-</tr>
-<tr>
-<td>Executor</td>
-<td>Driver</td>
-<td>(random)</td>
-<td>HTTP Broadcast</td>
-<td><code>spark.broadcast.port</code></td>
-<td>Jetty-based. Not used by TorrentBroadcast, which sends data through the block manager
-instead.</td>
-</tr>
-<tr>
-<td>Executor</td>
-<td>Driver</td>
-<td>(random)</td>
-<td>Class file server</td>
-<td><code>spark.replClassServer.port</code></td>
-<td>Jetty-based. Only used in Spark shells.</td>
-</tr>
-<tr>
-<td>Executor</td>
-<td>Executor</td>
-<td>(random)</td>
-<td>Block Manager port</td>
-<td><code>spark.blockManager.port</code></td>
-<td>Raw socket via ServerSocketChannel</td>
-</tr>
-
-</table>
 
 # High Availability
 