Commit fb31b49

SPARK-3883: Added SSL setup documentation

1 parent 2532668 commit fb31b49

4 files changed: +188 -17 lines changed

core/src/main/scala/org/apache/spark/SSLOptions.scala

Lines changed: 48 additions & 12 deletions

@@ -22,6 +22,23 @@ import java.io.File
 import com.typesafe.config.{Config, ConfigFactory, ConfigValueFactory}
 import org.eclipse.jetty.util.ssl.SslContextFactory
 
+/** SSLOptions class is a common container for SSL configuration options. It offers methods to
+  * generate specific objects to configure SSL for different communication protocols.
+  *
+  * SSLOptions is intended to provide the maximal common set of SSL settings supported by the
+  * protocols for which it can generate configuration. Since Akka doesn't support client
+  * authentication with SSL, SSLOptions cannot support it either.
+  *
+  * @param enabled            enables or disables SSL; if it is set to false, the rest of the
+  *                           settings are disregarded
+  * @param keyStore           a path to the key-store file
+  * @param keyStorePassword   a password to access the key-store file
+  * @param keyPassword        a password to access the private key in the key-store
+  * @param trustStore         a path to the trust-store file
+  * @param trustStorePassword a password to access the trust-store file
+  * @param protocol           SSL protocol (remember that SSLv3 was compromised) supported by Java
+  * @param enabledAlgorithms  a set of encryption algorithms to use
+  */
 private[spark] case class SSLOptions(
     enabled: Boolean = false,
     keyStore: Option[File] = None,
@@ -32,9 +49,8 @@ private[spark] case class SSLOptions(
     protocol: Option[String] = None,
     enabledAlgorithms: Set[String] = Set.empty) {
 
-  /**
-   * Creates a Jetty SSL context factory according to the SSL settings represented by this object.
-   */
+  /** Creates a Jetty SSL context factory according to the SSL settings represented by this object.
+    */
   def createJettySslContextFactory(): Option[SslContextFactory] = {
     if (enabled) {
       val sslContextFactory = new SslContextFactory()
@@ -53,10 +69,9 @@ private[spark] case class SSLOptions(
     }
   }
 
-  /**
-   * Creates an Akka configuration object which contains all the SSL settings represented by this
-   * object. It can be used then to compose the ultimate Akka configuration.
-   */
+  /** Creates an Akka configuration object which contains all the SSL settings represented by this
+    * object. It can then be used to compose the final Akka configuration.
+    */
   def createAkkaConfig: Option[Config] = {
     import scala.collection.JavaConversions._
     if (enabled) {
@@ -84,6 +99,7 @@ private[spark] case class SSLOptions(
     }
   }
 
+  /** Returns a string representation of this SSLOptions with all the passwords masked. */
   override def toString: String = s"SSLOptions{enabled=$enabled, " +
     s"keyStore=$keyStore, keyStorePassword=${keyStorePassword.map(_ => "xxx")}, " +
     s"trustStore=$trustStore, trustStorePassword=${trustStorePassword.map(_ => "xxx")}, " +
@@ -93,11 +109,31 @@ private[spark] case class SSLOptions(
 
 private[spark] object SSLOptions extends Logging {
 
-  /**
-   * Resolves SSLOptions settings from a given Spark configuration object at a given namespace.
-   * The parent directory of that location is used as a base directory to resolve relative paths
-   * to keystore and truststore.
-   */
+  /** Resolves SSLOptions settings from a given Spark configuration object at a given namespace.
+    *
+    * The following settings are allowed:
+    * $ - `[ns].enabled` - `true` or `false`, to enable or disable SSL respectively
+    * $ - `[ns].keyStore` - a path to the key-store file; can be relative to the current directory
+    * $ - `[ns].keyStorePassword` - a password to the key-store file
+    * $ - `[ns].keyPassword` - a password to the private key
+    * $ - `[ns].trustStore` - a path to the trust-store file; can be relative to the current
+    *                         directory
+    * $ - `[ns].trustStorePassword` - a password to the trust-store file
+    * $ - `[ns].protocol` - a protocol name supported by a particular Java version
+    * $ - `[ns].enabledAlgorithms` - a comma-separated list of ciphers
+    *
+    * For a list of protocols and ciphers supported by particular Java versions, see the
+    * [[https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https Oracle
+    * blog page]].
+    *
+    * You can optionally specify the default configuration. If you do, for each setting which is
+    * missing in SparkConf, the corresponding setting from the default configuration is used.
+    *
+    * @param conf Spark configuration object where the settings are collected from
+    * @param ns the namespace name
+    * @param defaults the default configuration
+    * @return [[org.apache.spark.SSLOptions]] object
+    */
   def parse(conf: SparkConf, ns: String, defaults: Option[SSLOptions] = None): SSLOptions = {
     val enabled = conf.getBoolean(s"$ns.enabled", defaultValue = defaults.exists(_.enabled))
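
A minimal usage sketch for the `parse` contract above, assuming code in the `org.apache.spark` package (since `SSLOptions` is `private[spark]`); the key-store path and passwords are placeholders:

    import org.apache.spark.{SparkConf, SSLOptions}

    // Global `spark.ssl.*` settings plus one override in the protocol-specific
    // `spark.ssl.fs` namespace.
    val conf = new SparkConf()
      .set("spark.ssl.enabled", "true")
      .set("spark.ssl.keyStore", "/path/to/keystore.jks") // placeholder path
      .set("spark.ssl.keyStorePassword", "changeit")      // placeholder password
      .set("spark.ssl.fs.protocol", "TLSv1.2")            // applies to `fs` only

    val defaultOpts = SSLOptions.parse(conf, "spark.ssl")
    // Every setting missing under `spark.ssl.fs` falls back to `defaultOpts`,
    // so `fsOpts` inherits `enabled` and the key-store but uses TLSv1.2.
    val fsOpts = SSLOptions.parse(conf, "spark.ssl.fs", defaults = Some(defaultOpts))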

core/src/main/scala/org/apache/spark/SecurityManager.scala

Lines changed: 36 additions & 5 deletions

@@ -59,7 +59,7 @@ import org.apache.spark.network.sasl.SecretKeyHolder
  * Spark also has a set of admin acls (`spark.admin.acls`) which is a set of users/administrators
  * who always have permission to view or modify the Spark application.
  *
- * Spark does not currently support encryption after authentication.
+ * Starting from version 1.3, Spark has partial support for encrypted connections with SSL.
  *
  * At this point spark has multiple communication protocols that need to be secured and
  * different underlying mechanisms are used depending on the protocol:
@@ -71,8 +71,9 @@ import org.apache.spark.network.sasl.SecretKeyHolder
  *            to connect to the server. There is no control of the underlying
  *            authentication mechanism so it's not clear if the password is passed in
  *            plaintext or uses DIGEST-MD5 or some other mechanism.
- *            Akka also has an option to turn on SSL, this option is not currently supported
- *            but we could add a configuration option in the future.
+ *
+ *            Akka also has an option to turn on SSL; this option is now supported (see
+ *            the details below).
  *
  *  - HTTP for broadcast and file server (via HttpServer) -> Spark currently uses Jetty
  *            for the HttpServer. Jetty supports multiple authentication mechanisms -
@@ -81,8 +82,9 @@ import org.apache.spark.network.sasl.SecretKeyHolder
  *            to authenticate using DIGEST-MD5 via a single user and the shared secret.
  *            Since we are using DIGEST-MD5, the shared secret is not passed on the wire
  *            in plaintext.
- *            We currently do not support SSL (https), but Jetty can be configured to use it
- *            so we could add a configuration option for this in the future.
+ *
+ *            We now support SSL (https) for this communication protocol (see the details
+ *            below).
  *
  *            The Spark HttpServer installs the HashLoginServer and configures it to DIGEST-MD5.
  *            Any clients must specify the user and password. There is a default
@@ -146,6 +148,35 @@ import org.apache.spark.network.sasl.SecretKeyHolder
  *  authentication. Spark will then use that user to compare against the view acls to do
  *  authorization. If no filter is in place the user is generally null and no authorization
  *  can take place.
+ *
+ *  Connection encryption (SSL) configuration is organized hierarchically. The user can configure
+ *  the default SSL settings, which will be used for all the supported communication protocols
+ *  unless they are overwritten by protocol-specific settings. This way the user can easily
+ *  provide the common settings for all the protocols without disabling the ability to configure
+ *  each one individually.
+ *
+ *  All the SSL settings like `spark.ssl.xxx`, where `xxx` is a particular configuration property,
+ *  denote the global configuration for all the supported protocols. In order to override the
+ *  global configuration for a particular protocol, the properties must be overwritten in the
+ *  protocol-specific namespace. Use `spark.ssl.yyy.xxx` settings to overwrite the global
+ *  configuration for the particular protocol denoted by `yyy`. Currently `yyy` can be either
+ *  `akka` for Akka-based connections or `fs` for broadcast and file server.
+ *
+ *  Refer to [[org.apache.spark.SSLOptions]] documentation for the list of
+ *  options that can be specified.
+ *
+ *  SecurityManager initializes SSLOptions objects for different protocols separately. An
+ *  SSLOptions object parses the Spark configuration at a given namespace and builds the common
+ *  representation of SSL settings. SSLOptions is then used to provide protocol-specific
+ *  configuration, like the Typesafe configuration for Akka or the SslContextFactory for Jetty.
+ *
+ *  SSL must be configured on each node and configured for each component involved in
+ *  communication using the particular protocol. In YARN clusters, the key-store can be prepared
+ *  on the client side, then distributed and used by the executors as part of the application
+ *  (YARN allows the user to deploy files before the application is started).
+ *  In standalone deployment, the user needs to provide key-stores and configuration
+ *  options for master and workers. In this mode, the user may allow the executors to use the SSL
+ *  settings inherited from the worker which spawned that executor. This can be accomplished by
+ *  setting `spark.ssl.useNodeLocalConf` to `true`.
  */
 
 private[spark] class SecurityManager(sparkConf: SparkConf)
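
A minimal sketch of the flow just described: an SSLOptions instance (built directly here; in practice SecurityManager obtains it via `SSLOptions.parse`) yields the protocol-specific objects for Jetty and Akka. Store paths and passwords are placeholders, and the password field types are assumed to be `Option[String]` per the `@param` docs in SSLOptions.scala above.

    import java.io.File
    import org.apache.spark.SSLOptions

    val opts = SSLOptions(
      enabled = true,
      keyStore = Some(new File("/opt/spark/conf/keystore.jks")),     // placeholder
      keyStorePassword = Some("changeit"),                           // placeholder
      keyPassword = Some("changeit"),                                // placeholder
      trustStore = Some(new File("/opt/spark/conf/truststore.jks")), // placeholder
      trustStorePassword = Some("changeit"),                         // placeholder
      protocol = Some("TLSv1.2"))

    // Jetty side: configures the HTTPS broadcast/file server.
    val jettyFactory = opts.createJettySslContextFactory() // Option[SslContextFactory]

    // Akka side: a Typesafe Config fragment to merge into the actor system config.
    val akkaSslConfig = opts.createAkkaConfig // Option[Config]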

docs/configuration.md

Lines changed: 80 additions & 0 deletions

@@ -1234,6 +1234,86 @@ Apart from these, the following properties are also available, and may be useful
 </tr>
 </table>
 
+#### Encryption
+
+<table class="table">
+  <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
+  <tr>
+    <td><code>spark.ssl.enabled</code></td>
+    <td>false</td>
+    <td>
+      <p>Whether to enable SSL connections on all supported protocols.</p>
+
+      <p>All the SSL settings like <code>spark.ssl.xxx</code>, where <code>xxx</code> is a
+      particular configuration property, denote the global configuration for all the supported
+      protocols. In order to override the global configuration for a particular protocol,
+      the properties must be overwritten in the protocol-specific namespace.</p>
+
+      <p>Use <code>spark.ssl.YYY.XXX</code> settings to overwrite the global configuration for
+      a particular protocol denoted by <code>YYY</code>. Currently <code>YYY</code> can be
+      either <code>akka</code> for Akka-based connections or <code>fs</code> for broadcast and
+      file server.</p>
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.ssl.keyStore</code></td>
+    <td>None</td>
+    <td>
+      A path to a key-store file. The path can be absolute or relative to the directory
+      in which the component is started.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.ssl.keyStorePassword</code></td>
+    <td>None</td>
+    <td>
+      A password to the key-store.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.ssl.keyPassword</code></td>
+    <td>None</td>
+    <td>
+      A password to the private key in the key-store.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.ssl.trustStore</code></td>
+    <td>None</td>
+    <td>
+      A path to a trust-store file. The path can be absolute or relative to the directory
+      in which the component is started.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.ssl.trustStorePassword</code></td>
+    <td>None</td>
+    <td>
+      A password to the trust-store.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.ssl.protocol</code></td>
+    <td>None</td>
+    <td>
+      A protocol name. The protocol must be supported by the JVM. A reference list of
+      protocols can be found on
+      <a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https">this page</a>.
+    </td>
+  </tr>
+  <tr>
+    <td><code>spark.ssl.enabledAlgorithms</code></td>
+    <td>Empty</td>
+    <td>
+      A comma-separated list of ciphers. The specified ciphers must be supported by the JVM.
+      A reference list of supported ciphers can be found on
+      <a href="https://blogs.oracle.com/java-platform-group/entry/diagnosing_tls_ssl_and_https">this page</a>.
+    </td>
+  </tr>
+</table>
+
+
 #### Spark Streaming
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
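
To illustrate the override rule documented in the table above, a sketch of one possible configuration, expressed as SparkConf calls (paths, passwords, and the cipher list are placeholder values, not recommendations):

    import org.apache.spark.SparkConf

    val conf = new SparkConf()
      // Global defaults shared by all supported protocols.
      .set("spark.ssl.enabled", "true")
      .set("spark.ssl.keyStore", "/opt/spark/conf/keystore.jks")
      .set("spark.ssl.keyStorePassword", "changeit")
      .set("spark.ssl.trustStore", "/opt/spark/conf/truststore.jks")
      .set("spark.ssl.trustStorePassword", "changeit")
      .set("spark.ssl.protocol", "TLSv1.2")
      // Per-protocol override: Akka connections use a restricted cipher list,
      // while `fs` inherits everything from `spark.ssl.*`.
      .set("spark.ssl.akka.enabledAlgorithms",
        "TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA")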

docs/security.md

Lines changed: 24 additions & 0 deletions

@@ -20,6 +20,30 @@ Spark allows for a set of administrators to be specified in the acls who always
 
 If your applications are using event logging, the directory where the event logs go (`spark.eventLog.dir`) should be manually created and have the proper permissions set on it. If you want those log files secured, the permissions should be set to `drwxrwxrwxt` for that directory. The owner of the directory should be the super user who is running the history server and the group permissions should be restricted to super user group. This will allow all users to write to the directory but will prevent unprivileged users from removing or renaming a file unless they own the file or directory. The event log files will be created by Spark with permissions such that only the user and group have read and write access.
 
+## Encryption
+
+Spark supports SSL for the Akka and HTTP (broadcast and file server) protocols. However, SSL is not yet supported for the WebUI and the block transfer service.
+
+Connection encryption (SSL) configuration is organized hierarchically. The user can configure the default SSL settings, which will be used for all the supported communication protocols unless they are overwritten by protocol-specific settings. This way the user can easily provide the common settings for all the protocols without disabling the ability to configure each one individually. The common SSL settings are in the `spark.ssl` namespace in the Spark configuration, while the Akka SSL configuration is at `spark.ssl.akka` and the SSL configuration of HTTP for broadcast and file server is at `spark.ssl.fs`. The full breakdown can be found on the [configuration page](configuration.html).
+
+SSL must be configured on each node and configured for each component involved in communication using the particular protocol.
+
+### YARN mode
+The key-store can be prepared on the client side and then distributed and used by the executors as part of the application. This is possible because the user can deploy files before the application is started in YARN, using the `spark.yarn.dist.files` or `spark.yarn.dist.archives` configuration settings. The responsibility for encrypting the transfer of these files lies with YARN and has nothing to do with Spark.
+
+### Standalone mode
+The user needs to provide key-stores and configuration options for master and workers. They have to be set by attaching appropriate Java system properties to the `SPARK_MASTER_OPTS` and `SPARK_WORKER_OPTS` environment variables, or just to `SPARK_DAEMON_JAVA_OPTS`. In this mode, the user may allow the executors to use the SSL settings inherited from the worker which spawned that executor. This can be accomplished by setting `spark.ssl.useNodeLocalConf` to `true`. If that parameter is set, the settings provided by the user on the client side are not used by the executors.
+
+### Preparing the key-stores
+Key-stores can be generated by the `keytool` program. The reference documentation for this tool is
+[here](https://docs.oracle.com/javase/7/docs/technotes/tools/solaris/keytool.html). The most basic
+steps to configure the key-stores and the trust-store for the standalone deployment mode are as
+follows (a sketch that sanity-checks the resulting stores appears after this list):
+* Generate a key pair for each node
+* Export the public key of the key pair to a file on each node
+* Import all exported public keys into a single trust-store
+* Distribute the trust-store to the nodes
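
The keytool invocations themselves are host-specific, so as a sanity check here is a short Scala sketch (paths and passwords are placeholders) that loads the generated key-store and trust-store the same way JSSE would, failing fast on a wrong password or a corrupt file:

    import java.io.FileInputStream
    import java.security.KeyStore
    import scala.collection.JavaConverters._

    // Load a JKS store; `load` throws if the file is corrupt or the password is wrong.
    def loadStore(path: String, password: String): KeyStore = {
      val ks = KeyStore.getInstance("JKS")
      val in = new FileInputStream(path)
      try ks.load(in, password.toCharArray) finally in.close()
      ks
    }

    val keyStore = loadStore("/opt/spark/conf/keystore.jks", "changeit")     // placeholders
    val trustStore = loadStore("/opt/spark/conf/truststore.jks", "changeit") // placeholders

    // After the steps above, the trust-store should hold one trusted
    // certificate entry per node.
    trustStore.aliases().asScala.foreach { alias =>
      println(s"$alias: certificate entry = ${trustStore.isCertificateEntry(alias)}")
    }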
 
 ## Configuring Ports for Network Security
 
 Spark makes heavy use of the network, and some environments have strict requirements for using tight
