docs/security.md (24 additions, 0 deletions)
@@ -20,6 +20,30 @@ Spark allows for a set of administrators to be specified in the acls who always
If your applications use event logging, the directory where the event logs go (`spark.eventLog.dir`) should be created manually with the proper permissions set on it. To secure the log files, set the permissions on that directory to `drwxrwxrwt` (mode `1777`: world-writable with the sticky bit set). The owner of the directory should be the super user who runs the history server, and the group should be restricted to the super user group. This allows all users to write to the directory but prevents unprivileged users from removing or renaming a file unless they own the file or the directory. Spark creates the event log files with permissions such that only the owning user and group have read and write access.
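As an illustration of the permissions described above, here is a minimal Python sketch that creates a directory and applies mode `1777`. The helper name `create_event_log_dir` and the temporary path are hypothetical; changing the owner and group to the history server's super user is not covered, since that normally requires elevated privileges.

```python
import os
import stat
import tempfile


def create_event_log_dir(path):
    """Create `path` if needed and make it world-writable with the sticky bit."""
    os.makedirs(path, exist_ok=True)
    # 0o1777 = rwx for user, group and others, plus the sticky bit.
    # os.chmod is applied explicitly because the mode argument of
    # os.makedirs is filtered through the process umask.
    os.chmod(path, 0o1777)
    # Return the ls-style permission string so the result can be checked.
    return stat.filemode(os.stat(path).st_mode)


if __name__ == "__main__":
    # A throwaway location is used here; a real deployment would use the
    # directory named by spark.eventLog.dir.
    demo_dir = os.path.join(tempfile.mkdtemp(), "spark-events")
    print(create_event_log_dir(demo_dir))
```

On a POSIX file system the returned string should match the permission string quoted in the paragraph above.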
## Encryption

Spark supports SSL for the Akka and HTTP (broadcast and file server) protocols. SSL is not yet supported for the web UI or the block transfer service.

Connection encryption (SSL) configuration is organized hierarchically. The user can configure default SSL settings, which are used for all supported communication protocols unless they are overridden by protocol-specific settings. This lets the user provide common settings for all protocols while still being able to configure each one individually. The common SSL settings are in the `spark.ssl` namespace of the Spark configuration; the Akka SSL configuration is in `spark.ssl.akka`, and the SSL configuration for the HTTP broadcast and file server is in `spark.ssl.fs`. The full breakdown can be found on the [configuration page](configuration.html).

SSL must be configured on each node, and for each component involved in communication over a given protocol.
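The hierarchical lookup can be pictured with a small Python sketch. It only illustrates the fallback order (protocol-specific namespace first, then the common `spark.ssl` namespace) and is not Spark's own implementation. The function `resolve_ssl_option`, the option names `enabled` and `keyStore`, and the file paths are assumptions made for the example; the configuration page lists the actual options.

```python
def resolve_ssl_option(conf, protocol, option, default=None):
    """Look up spark.ssl.<protocol>.<option>, falling back to spark.ssl.<option>."""
    specific_key = "spark.ssl.{}.{}".format(protocol, option)
    common_key = "spark.ssl.{}".format(option)
    if specific_key in conf:
        # A protocol-specific setting overrides the common one.
        return conf[specific_key]
    return conf.get(common_key, default)


if __name__ == "__main__":
    # Option names and paths are placeholders for illustration only.
    conf = {
        "spark.ssl.enabled": "true",
        "spark.ssl.keyStore": "/path/to/common-keystore.jks",
        "spark.ssl.akka.keyStore": "/path/to/akka-keystore.jks",
    }
    for protocol in ("akka", "fs"):
        print(protocol, resolve_ssl_option(conf, protocol, "keyStore"))
```

In this sketch the `akka` lookup returns the overriding value, while `fs` falls back to the common one because no `spark.ssl.fs.keyStore` entry is present.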
### YARN mode

The key-store can be prepared on the client side and then distributed to and used by the executors as part of the application. This is possible because, in YARN, the user can deploy files before the application starts by using the `spark.yarn.dist.files` or `spark.yarn.dist.archives` configuration settings. Encrypting the transfer of these files is the responsibility of YARN and has nothing to do with Spark.
### Standalone mode

The user needs to provide key-stores and configuration options for the master and the workers. These are set by attaching the appropriate Java system properties to the `SPARK_MASTER_OPTS` and `SPARK_WORKER_OPTS` environment variables, or just to `SPARK_DAEMON_JAVA_OPTS`. In this mode, the user may allow executors to use the SSL settings inherited from the worker that spawned them, by setting `spark.ssl.useNodeLocalConf` to `true`. If that parameter is set, the settings provided by the user on the client side are not used by the executors.
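As a rough sketch of what "attaching Java system properties" can look like, the Python helper below renders a dictionary of settings as `-Dkey=value` options suitable for one of the environment variables named above. The helper `to_java_opts`, the option names, and the paths are illustrative assumptions; values containing spaces would need extra quoting that this sketch does not handle.

```python
def to_java_opts(settings):
    """Render a dict of properties as space-separated -Dkey=value options."""
    # Keys are sorted only to make the output deterministic.
    return " ".join(
        "-D{}={}".format(key, value) for key, value in sorted(settings.items())
    )


if __name__ == "__main__":
    # Placeholder option names and paths, not taken from the text above.
    node_settings = {
        "spark.ssl.enabled": "true",
        "spark.ssl.keyStore": "/path/to/keystore.jks",
        "spark.ssl.trustStore": "/path/to/truststore.jks",
    }
    print('SPARK_DAEMON_JAVA_OPTS="{}"'.format(to_java_opts(node_settings)))
```

The printed line could then be placed wherever the daemon environment is defined for the master and worker processes.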
### Preparing the key-stores

Key-stores can be generated with the `keytool` program; see the
[keytool reference documentation](https://docs.oracle.com/javase/7/docs/technotes/tools/solaris/keytool.html). The most basic
steps to configure the key-stores and the trust-store for the standalone deployment mode are as
follows:
* Generate a key pair for each node
* Export the public key of the key pair to a file on each node
* Import all exported public keys into a single trust-store
* Distribute the trust-store to all the nodes
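The first three steps above can be sketched as `keytool` invocations. The Python helper below only builds the argument lists and does not run them, so it works without a JDK installed. The helper name `keytool_commands`, the file-naming scheme, the aliases, and the use of one shared password are assumptions made for illustration. Distributing the resulting trust-store (the last step) depends on the environment and is left out.

```python
def keytool_commands(nodes, store_password, trust_store="truststore.jks"):
    """Return keytool argument lists for generating, exporting and importing keys."""
    commands = []
    for node in nodes:
        key_store = "{}-keystore.jks".format(node)  # hypothetical naming scheme
        cert_file = "{}.cer".format(node)
        # Step 1 (run on the node): generate a key pair.
        commands.append([
            "keytool", "-genkeypair", "-alias", node, "-keyalg", "RSA",
            "-dname", "CN={}".format(node),
            "-keystore", key_store,
            "-storepass", store_password, "-keypass", store_password,
        ])
        # Step 2 (run on the node): export the public key as a certificate file.
        commands.append([
            "keytool", "-exportcert", "-alias", node, "-file", cert_file,
            "-keystore", key_store, "-storepass", store_password,
        ])
        # Step 3 (run where the trust-store is assembled): import the certificate.
        commands.append([
            "keytool", "-importcert", "-noprompt", "-alias", node,
            "-file", cert_file, "-keystore", trust_store,
            "-storepass", store_password,
        ])
    return commands


if __name__ == "__main__":
    # keytool requires passwords of at least six characters.
    for command in keytool_commands(["node1", "node2"], "changeit"):
        print(" ".join(command))
```

Each returned list could be passed to `subprocess.run` on the appropriate machine; passing passwords on the command line is done here only to keep the sketch short.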
## Configuring Ports for Network Security
Spark makes heavy use of the network, and some environments have strict requirements for using tight