Commit 0edb093

joel-hamill authored and Suzanne Scala committed

[DOCS-1985] [WIP] Spark doc fixup (apache#156)

* [WIP] Spark doc fixup
* More
* Remove extraneous files
* Feedback
* Added missing quotation per DOCS-1989
* Feedback from Michael
* Corbin's feedback

1 parent 00f45be · 19 files changed (+310, −224 lines)

.gitignore (1 addition, 0 deletions)

````diff
@@ -1,3 +1,4 @@
+.idea/
 .cache/
 build/
 dcos-commons-tools/
````

docs/custom-docker.md (3 additions, 3 deletions)

````diff
@@ -19,9 +19,9 @@ You can customize the Docker image in which Spark runs by extending the standard
 
 1. Then, build an image from your Dockerfile.
 
-        $ docker build -t username/image:tag .
-        $ docker push username/image:tag
+        docker build -t username/image:tag .
+        docker push username/image:tag
 
 1. Reference your custom Docker image with the `--docker-image` option when running a Spark job.
 
-        $ dcos spark run --docker-image=myusername/myimage:v1 --submit-args="http://external.website/mysparkapp.py 30"
+        dcos spark run --docker-image=myusername/myimage:v1 --submit-args="http://external.website/mysparkapp.py 30"
````
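The workflow in the diff above starts from a Dockerfile that extends a standard Spark image. A minimal sketch, assuming a `mesosphere/spark` base image and an illustrative extra dependency (both are placeholders, not taken from this commit):

```shell
# Write a hypothetical Dockerfile that extends a base Spark image.
# The base image tag and the added package are assumptions for illustration.
mkdir -p /tmp/custom-spark
cd /tmp/custom-spark

cat > Dockerfile <<'EOF'
FROM mesosphere/spark:latest
# Example only: add a dependency your Spark jobs need at runtime
RUN apt-get update && apt-get install -y python3
EOF

# The doc's build-and-push steps would then be (not executed here):
echo "docker build -t username/image:tag ."
echo "docker push username/image:tag"
```

The image must be pushed to a registry the cluster can reach before `--docker-image` can reference it.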

docs/fault-tolerance.md (3 additions, 3 deletions)

````diff
@@ -7,13 +7,13 @@ enterprise: 'no'
 
 Failures such as host, network, JVM, or application failures can affect the behavior of three types of Spark components:
 
-- DC/OS Spark Service
+- DC/OS Apache Spark Service
 - Batch Jobs
 - Streaming Jobs
 
-# DC/OS Spark Service
+# DC/OS Apache Spark Service
 
-The DC/OS Spark service runs in Marathon and includes the Mesos Cluster Dispatcher and the Spark History Server. The Dispatcher manages jobs you submit via `dcos spark run`. Job data is persisted to Zookeeper. The Spark History Server reads event logs from HDFS. If the service dies, Marathon will restart it, and it will reload data from these highly available stores.
+The DC/OS Apache Spark service runs in Marathon and includes the Mesos Cluster Dispatcher and the Spark History Server. The Dispatcher manages jobs you submit via `dcos spark run`. Job data is persisted to Zookeeper. The Spark History Server reads event logs from HDFS. If the service dies, Marathon will restart it, and it will reload data from these highly available stores.
 
 # Batch Jobs
 
````

docs/hdfs.md (19 additions, 16 deletions)

````diff
@@ -5,16 +5,19 @@ menu_order: 20
 enterprise: 'no'
 ---
 
-To configure Spark for a specific HDFS cluster, configure `hdfs.config-url` to be a URL that serves your `hdfs-site.xml` and `core-site.xml`. For example:
+You can configure Spark for a specific HDFS cluster.
 
-    {
-      "hdfs": {
-        "config-url": "http://mydomain.com/hdfs-config"
-      }
-    }
+To configure `hdfs.config-url` to be a URL that serves your `hdfs-site.xml` and `core-site.xml`, use this example where `http://mydomain.com/hdfs-config/hdfs-site.xml` and `http://mydomain.com/hdfs-config/core-site.xml` are valid URLs:
 
+```json
+{
+  "hdfs": {
+    "config-url": "http://mydomain.com/hdfs-config"
+  }
+}
+```
 
-where `http://mydomain.com/hdfs-config/hdfs-site.xml` and `http://mydomain.com/hdfs-config/core-site.xml` are valid URLs.[Learn more][8].
+For more information, see [Inheriting Hadoop Cluster Configuration][8].
 
 For DC/OS HDFS, these configuration files are served at `http://<hdfs.framework-name>.marathon.mesos:<port>/v1/connection`, where `<hdfs.framework-name>` is a configuration variable set in the HDFS package, and `<port>` is the port of its marathon app.
 
@@ -24,13 +27,13 @@ You can access external (i.e. non-DC/OS) Kerberos-secured HDFS clusters from Spa
 
 ## HDFS Configuration
 
-Once you've set up a Kerberos-enabled HDFS cluster, configure Spark to connect to it. See instructions [here](#hdfs).
+After you've set up a Kerberos-enabled HDFS cluster, configure Spark to connect to it. See instructions [here](#hdfs).
 
 ## Installation
 
-1. A krb5.conf file tells Spark how to connect to your KDC. Base64 encode this file:
+1. A `krb5.conf` file tells Spark how to connect to your KDC. Base64 encode this file:
 
-        $ cat krb5.conf | base64
+        cat krb5.conf | base64
 
 1. Add the following to your JSON configuration file to enable Kerberos in Spark:
 
@@ -42,11 +45,11 @@ Once you've set up a Kerberos-enabled HDFS cluster, configure Spark to connect t
       }
     }
 
-1. If you've enabled the history server via `history-server.enabled`, you must also configure the principal and keytab for the history server. **WARNING**: The keytab contains secrets, so you should ensure you have SSL enabled while installing DC/OS Spark.
+1. If you've enabled the history server via `history-server.enabled`, you must also configure the principal and keytab for the history server. **WARNING**: The keytab contains secrets, so you should ensure you have SSL enabled while installing DC/OS Apache Spark.
 
     Base64 encode your keytab:
 
-        $ cat spark.keytab | base64
+        cat spark.keytab | base64
 
     And add the following to your configuration file:
 
@@ -61,25 +64,25 @@ Once you've set up a Kerberos-enabled HDFS cluster, configure Spark to connect t
 
 1. Install Spark with your custom configuration, here called `options.json`:
 
-        $ dcos package install --options=options.json spark
+        dcos package install --options=options.json spark
 
 ## Job Submission
 
-To authenticate to a Kerberos KDC, DC/OS Spark supports keytab files as well as ticket-granting tickets (TGTs).
+To authenticate to a Kerberos KDC, DC/OS Apache Spark supports keytab files as well as ticket-granting tickets (TGTs).
 
 Keytabs are valid infinitely, while tickets can expire. Especially for long-running streaming jobs, keytabs are recommended.
 
 ### Keytab Authentication
 
 Submit the job with the keytab:
 
-        $ dcos spark run --submit-args="--principal user@REALM --keytab <keytab-file-path>..."
+        dcos spark run --submit-args="--principal user@REALM --keytab <keytab-file-path>..."
 
 ### TGT Authentication
 
 Submit the job with the ticket:
 
-        $ dcos spark run --principal user@REALM --tgt <ticket-file-path>
+        dcos spark run --principal user@REALM --tgt <ticket-file-path>
 
 **Note:** These credentials are security-critical. We highly recommended configuring SSL encryption between the Spark components when accessing Kerberos-secured HDFS clusters. See the Security section for information on how to do this.
 
````
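The `cat krb5.conf | base64` step above can be exercised locally with a dummy `krb5.conf` (realm and KDC host are placeholders); decoding the result should round-trip back to the original file:

```shell
# Create a placeholder krb5.conf (values are illustrative only)
cat > krb5.conf <<'EOF'
[libdefaults]
  default_realm = EXAMPLE.COM

[realms]
  EXAMPLE.COM = {
    kdc = kdc.example.com
  }
EOF

# Encode as a single line, suitable for pasting into a JSON options file
KRB5_B64="$(base64 < krb5.conf | tr -d '\n')"

# Sanity check: decoding must reproduce the original file byte-for-byte
printf '%s' "$KRB5_B64" | base64 -d | cmp -s - krb5.conf && echo "round-trip OK"
```

Stripping the newlines that `base64` inserts keeps the encoded value on one line, which is what a JSON string field expects.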

docs/history-server.md (9 additions, 9 deletions)

````diff
@@ -4,18 +4,18 @@ menu_order: 30
 enterprise: 'no'
 ---
 
-DC/OS Spark includes The [Spark History Server][3]. Because the history server requires HDFS, you must explicitly enable it.
+DC/OS Apache Spark includes The [Spark History Server][3]. Because the history server requires HDFS, you must explicitly enable it.
 
 1. Install HDFS:
 
-        $ dcos package install hdfs
+        dcos package install hdfs
 
     **Note:** HDFS requires 5 private nodes.
 
 1. Create a history HDFS directory (default is `/history`). [SSH into your cluster][10] and run:
 
-        $ docker run -it mesosphere/hdfs-client:1.0.0-2.6.0 bash
-        $ ./bin/hdfs dfs -mkdir /history
+        docker run -it mesosphere/hdfs-client:1.0.0-2.6.0 bash
+        ./bin/hdfs dfs -mkdir /history
 
 1. Create `spark-history-options.json`:
 
@@ -25,26 +25,26 @@ DC/OS Spark includes The [Spark History Server][3]. Because the history server r
 
 1. Install The Spark History Server:
 
-        $ dcos package install spark-history --options=spark-history-options.json
+        dcos package install spark-history --options=spark-history-options.json
 
 1. Create `spark-dispatcher-options.json`;
 
     {
      "service": {
-       "spark-history-server-url": "http://<dcos_url>/service/spark-history
+       "spark-history-server-url": "http://<dcos_url>/service/spark-history"
      },
      "hdfs": {
        "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
      }
     }
 
-1. Install The Spark Dispatcher:
+1. Install the Spark dispatcher:
 
-        $ dcos package install spark --options=spark-dispatcher-options.json
+        dcos package install spark --options=spark-dispatcher-options.json
 
 1. Run jobs with the event log enabled:
 
-        $ dcos spark run --submit-args="--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history ... --class MySampleClass http://external.website/mysparkapp.jar"
+        dcos spark run --submit-args="--conf spark.eventLog.enabled=true --conf spark.eventLog.dir=hdfs://hdfs/history ... --class MySampleClass http://external.website/mysparkapp.jar"
 
 1. Visit your job in the dispatcher at `http://<dcos_url>/service/spark/`. It will include a link to the history server entry for that job.
 
````
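One of the changes above (per DOCS-1989) adds a missing closing quotation mark to `spark-dispatcher-options.json`; with that fix the file parses as valid JSON, which can be checked locally:

```shell
# Reproduce the corrected spark-dispatcher-options.json from the diff
cat > spark-dispatcher-options.json <<'EOF'
{
  "service": {
    "spark-history-server-url": "http://<dcos_url>/service/spark-history"
  },
  "hdfs": {
    "config-url": "http://api.hdfs.marathon.l4lb.thisdcos.directory/v1/endpoints"
  }
}
EOF

# Any JSON validator works; python3's json.tool is used here as one option
python3 -m json.tool spark-dispatcher-options.json > /dev/null && echo "valid JSON"
```

Without the closing quote, the same check fails with a parse error, which is the bug the commit fixes.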

docs/img/spark-gui-install.png (42.3 KB, added)

docs/index.md (7 additions, 5 deletions)

````diff
@@ -7,20 +7,22 @@ feature_maturity: stable
 enterprise: 'no'
 ---
 
+Welcome to the documentation for the DC/OS Apache Spark. For more information about new and changed features, see the [release notes](https://github.com/mesosphere/spark-build/releases/).
+
 Apache Spark is a fast and general-purpose cluster computing system for big data. It provides high-level APIs in Scala, Java, Python, and R, and an optimized engine that supports general computation graphs for data analysis. It also supports a rich set of higher-level tools including Spark SQL for SQL and DataFrames, MLlib for machine learning, GraphX for graph processing, and Spark Streaming for stream processing. For more information, see the [Apache Spark documentation][1].
 
-Apache DC/OS Spark consists of [Apache Spark with a few custom commits][17] along with [DC/OS-specific packaging][18].
+DC/OS Apache Spark consists of [Apache Spark with a few custom commits][17] along with [DC/OS-specific packaging][18].
 
-DC/OS Spark includes:
+DC/OS Apache Spark includes:
 
 * [Mesos Cluster Dispatcher][2]
 * [Spark History Server][3]
-* DC/OS Spark CLI
+* DC/OS Apache Spark CLI
 * Interactive Spark shell
 
 # Benefits
 
-* Utilization: DC/OS Spark leverages Mesos to run Spark on the same cluster as other DC/OS services
+* Utilization: DC/OS Apache Spark leverages Mesos to run Spark on the same cluster as other DC/OS services
 * Improved efficiency
 * Simple Management
 * Multi-team support
@@ -52,4 +54,4 @@ DC/OS Spark includes:
 [5]: https://docs.mesosphere.com/service-docs/kafka/
 [6]: https://zeppelin.incubator.apache.org/
 [17]: https://github.com/mesosphere/spark
-[18]: https://github.com/mesosphere/spark-build
+[18]: https://github.com/mesosphere/spark-build
````

docs/install.md (75 additions, 43 deletions)

````diff
@@ -5,98 +5,130 @@ feature_maturity: stable
 enterprise: 'no'
 ---
 
-Spark is available in the Universe and can be installed by using either the web interface or the DC/OS CLI.
+Spark is available in the Universe and can be installed by using either the GUI or the DC/OS CLI.
 
-## <a name="install-enterprise"></a>Prerequisites
+**Prerequisites:**
 
-- Depending on your security mode in Enterprise DC/OS, you may [need to provision a service account](https://docs.mesosphere.com/service-docs/spark/spark-auth/) before installing Spark. Only someone with `superuser` permission can create the service account.
-  - `strict` [security mode](https://docs.mesosphere.com/1.9/installing/custom/configuration-parameters/#security) requires a service account.
-  - `permissive` security mode a service account is optional.
-  - `disabled` security mode does not require a service account.
+- [DC/OS and DC/OS CLI installed](https://docs.mesosphere.com/1.9/installing/).
+- Depending on your [security mode](https://docs.mesosphere.com/1.9/overview/security/security-modes/), Spark requires service authentication for access to DC/OS. For more information, see [Configuring DC/OS Access for Spark](https://docs.mesosphere.com/service-docs/spark/spark-auth/).
+
+  | Security mode | Service Account |
+  |---------------|-----------------|
+  | Disabled      | Not available   |
+  | Permissive    | Optional        |
+  | Strict        | Required        |
 
 # Default Installation
+To install the DC/OS Apache Spark service, run the following command on the DC/OS CLI. This installs the Spark DC/OS service, Spark CLI, dispatcher, and, optionally, the history server. See [Custom Installation][7] to install the history server.
 
-To start a basic Spark cluster, run the following command on the DC/OS CLI.
+```bash
+dcos package install spark
+```
 
-    $ dcos package install spark
+Go to the **Services** > **Deployments** tab of the DC/OS GUI to monitor the deployment. When it has finished deploying, visit Spark at `http://<dcos-url>/service/spark/`.
 
-This command installs the dispatcher, and, optionally, the history server. See [Custom Installation][7] to install the history server.
+You can also [install Spark via the DC/OS GUI](https://docs.mesosphere.com/1.9/usage/webinterface/#universe).
 
-Go to the **Services** > **Deployments** tab of the DC/OS web interface to monitor the deployment. Once it is
-complete, visit Spark at `http://<dcos-url>/service/spark/`.
 
-You can also [install Spark via the DC/OS web interface](https://docs.mesosphere.com/1.9/usage/webinterface/#universe).
+## Spark CLI
+You can install the Spark CLI with this command. This is useful if you already have a Spark cluster running, but need the Spark CLI.
 
-**Note:** If you install Spark via the web interface, run the following command from the DC/OS CLI to install the Spark CLI:
+**Important:** If you install Spark via the DC/OS GUI, you must install the Spark CLI as a separate step from the DC/OS CLI.
 
-    $ dcos package install spark --cli
+```bash
+dcos package install spark --cli
+```
 
 <a name="custom"></a>
 
 # Custom Installation
 
 You can customize the default configuration properties by creating a JSON options file and passing it to `dcos package install --options`. For example, to install the history server, create a file called `options.json`:
 
-    {
-      "history-server": {
-        "enabled": true
-      }
-    }
+```json
+{
+  "history-server": {
+    "enabled": true
+  }
+}
+```
 
-Then, install Spark with your custom configuration:
+Install Spark with the configuration specified in the `options.json` file:
 
-    $ dcos package install --options=options.json spark
+```bash
+dcos package install --options=options.json spark
+```
 
-Run the following command to see all configuration options:
+**Tip:** Run this command to see all configuration options:
 
-    $ dcos package describe spark --config
+```bash
+dcos package describe spark --config
+```
 
 ## Customize Spark Distribution
 
-DC/OS Spark does not support arbitrary Spark distributions, but Mesosphere does provide multiple pre-built distributions, primarily used to select Hadoop versions. To use one of these distributions, first select your desired Spark distribution from [here](https://github.com/mesosphere/spark-build/blob/master/docs/spark-versions.md), then select the corresponding docker image from [here](https://hub.docker.com/r/mesosphere/spark/tags/), then use those values to set the following configuration variables:
+DC/OS Apache Spark does not support arbitrary Spark distributions, but Mesosphere does provide multiple pre-built distributions, primarily used to select Hadoop versions.
 
-    {
-      "service": {
-        "spark-dist-uri": "<spark-dist-uri>"
-        "docker-image": "<docker-image>"
-      }
-    }
+To use one of these distributions, select your Spark distribution from [here](https://github.com/mesosphere/spark-build/blob/master/docs/spark-versions.md), then select the corresponding Docker image from [here](https://hub.docker.com/r/mesosphere/spark/tags/), then use those values to set the following configuration variables:
+
+```json
+{
+  "service": {
+    "spark-dist-uri": "<spark-dist-uri>"
+    "docker-image": "<docker-image>"
+  }
+}
+```
 
 # Minimal Installation
 
-For development purposes, you may wish to install Spark on a local DC/OS cluster. For this, you can use [dcos-vagrant][16].
+For development purposes, you can install Spark on a local DC/OS cluster. For this, you can use [dcos-vagrant][16].
 
 1. Install DC/OS Vagrant:
 
     Install a minimal DC/OS Vagrant according to the instructions [here][16].
 
 1. Install Spark:
 
-        $ dcos package install spark
+    ```bash
+    dcos package install spark
+    ```
 
 1. Run a simple Job:
 
-        $ dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar"
+    ```bash
+    dcos spark run --submit-args="--class org.apache.spark.examples.SparkPi http://downloads.mesosphere.com.s3.amazonaws.com/assets/spark/spark-examples_2.10-1.5.0.jar"
+    ```
 
-NOTE: A limited resource environment such as DC/OS Vagrant restricts some of the features available in DC/OS Spark. For example, unless you have enough resources to start up a 5-agent cluster, you will not be able to install DC/OS HDFS, and you thus won't be able to enable the history server.
+NOTE: A limited resource environment such as DC/OS Vagrant restricts some of the features available in DC/OS Apache Spark. For example, unless you have enough resources to start up a 5-agent cluster, you will not be able to install DC/OS HDFS, and you thus won't be able to enable the history server.
 
 Also, a limited resource environment can restrict how you size your executors, for example with `spark.executor.memory`.
 
 # Multiple Installations
 
-Installing multiple instances of the DC/OS Spark package provides basic multi-team support. Each dispatcher displays only the jobs submitted to it by a given team, and each team can be assigned different resources.
+Installing multiple instances of the DC/OS Apache Spark package provides basic multi-team support. Each dispatcher displays only the jobs submitted to it by a given team, and each team can be assigned different resources.
+
+To install multiple instances of the DC/OS Apache Spark package, set each `service.name` to a unique name (e.g.: `spark-dev`) in your JSON configuration file during installation. For example, create a JSON options file named `multiple.json`:
+
+```json
+{
+  "service": {
+    "name": "spark-dev"
+  }
+}
+```
 
-To install mutiple instances of the DC/OS Spark package, set each `service.name` to a unique name (e.g.: "spark-dev") in your JSON configuration file during installation:
+Install Spark with the options file specified:
 
-    {
-      "service": {
-        "name": "spark-dev"
-      }
-    }
+```bash
+dcos package install --options=multiple.json spark
+```
 
-To use a specific Spark instance from the DC/OS Spark CLI:
+Alternatively, you can specify a Spark instance directly from the CLI. For example:
 
-    $ dcos config set spark.app_id <service.name>
+```bash
+dcos config set spark.app_id spark-dev
+```
 
 [7]: #custom
 [16]: https://github.com/mesosphere/dcos-vagrant
````
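The options-file pattern above composes: assuming the package accepts several top-level sections in one file, the `history-server` and `service.name` examples from this page can be merged into a single options file (the service name `spark-dev` is the doc's own example):

```shell
# Combine the custom-install and multiple-install examples into one options file
cat > options.json <<'EOF'
{
  "service": {
    "name": "spark-dev"
  },
  "history-server": {
    "enabled": true
  }
}
EOF

# Validate the file before passing it to the installer
python3 -m json.tool options.json > /dev/null && echo "valid JSON"

# Then install with it (not executed here):
echo "dcos package install --options=options.json spark"
```

Validating locally catches problems like the missing-quote bug this commit fixes before `dcos package install` ever sees the file.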

docs/job-scheduling.md (1 addition, 1 deletion)

````diff
@@ -59,7 +59,7 @@ The following is a description of the most common Spark on Mesos scheduling prop
 <tr>
   <td>spark.executor.cores</td>
   <td>All available cores in the offer</td>
-  <td>Coarse-grained mode only. DC/OS Spark >= 1.6.1. Executor CPU allocation.</td>
+  <td>Coarse-grained mode only. DC/OS Apache Spark >= 1.6.1. Executor CPU allocation.</td>
 </tr>
 
 <tr>
````
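Like other Spark properties, `spark.executor.cores` is passed as a `--conf` inside `--submit-args`. A sketch that only assembles the command string (the core count, class name, and jar URL are illustrative placeholders in the style of the doc's examples):

```shell
# Build the submit command string; nothing is actually submitted here
SUBMIT_ARGS='--conf spark.executor.cores=4 --class MySampleClass http://external.website/mysparkapp.jar'
CMD="dcos spark run --submit-args=\"$SUBMIT_ARGS\""
echo "$CMD"
```

Per the table row above, the setting takes effect in coarse-grained mode only, on DC/OS Apache Spark >= 1.6.1.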
