
Commit a0a7f2f

[SPARK-6511] [DOCUMENTATION] Explain how to use Hadoop provided builds
This provides preliminary documentation pointing out how to use the Hadoop free builds. I am hoping over time this list can grow to include most of the popular Hadoop distributions. Getting more people using these builds will help us reduce the number of binaries we build over the long term.

Author: Patrick Wendell <[email protected]>

Closes #6729 from pwendell/hadoop-provided and squashes the following commits:

1113b76 [Patrick Wendell] [SPARK-6511] [Documentation] Explain how to use Hadoop provided builds

(cherry picked from commit 6e4fb0c)
Signed-off-by: Patrick Wendell <[email protected]>
1 parent 1175cfe commit a0a7f2f

File tree

2 files changed: +33 / -3 lines


docs/hadoop-provided.md

Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
+---
+layout: global
+displayTitle: Using Spark's "Hadoop Free" Build
+title: Using Spark's "Hadoop Free" Build
+---
+
+Spark uses Hadoop client libraries for HDFS and YARN. Starting in Spark 1.4, the project packages "Hadoop free" builds that let you more easily connect a single Spark binary to any Hadoop version. To use these builds, you need to modify `SPARK_DIST_CLASSPATH` to include Hadoop's package jars. The most convenient place to do this is by adding an entry in `conf/spark-env.sh`.
+
+This page describes how to connect Spark to Hadoop for different types of distributions.
+
+# Apache Hadoop
+For Apache distributions, you can use Hadoop's `classpath` command. For instance:
+
+{% highlight bash %}
+### in conf/spark-env.sh ###
+
+# If 'hadoop' binary is on your PATH
+export SPARK_DIST_CLASSPATH=$(hadoop classpath)
+
+# With explicit path to 'hadoop' binary
+export SPARK_DIST_CLASSPATH=$(/path/to/hadoop/bin/hadoop classpath)
+
+# Passing a Hadoop configuration directory
+export SPARK_DIST_CLASSPATH=$(hadoop classpath --config /path/to/configs)
+
+{% endhighlight %}
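Since `SPARK_DIST_CLASSPATH` is picked up from `conf/spark-env.sh` each time a Spark process launches, a quick sanity check is to print the resulting value and then start a local shell. A minimal sketch, assuming a hypothetical Hadoop install under `/opt/hadoop` (substitute your own path):

{% highlight bash %}
### quick sanity check, run from the Spark installation directory ###

# /opt/hadoop is a hypothetical install location
export SPARK_DIST_CLASSPATH=$(/opt/hadoop/bin/hadoop classpath)

# Should print Hadoop's share/hadoop/* jar directories
echo "$SPARK_DIST_CLASSPATH"

# Start a local shell; HDFS/YARN classes are now loaded from this classpath
./bin/spark-shell --master local[2]
{% endhighlight %}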

docs/index.md

Lines changed: 7 additions & 3 deletions
@@ -12,9 +12,13 @@ It also supports a rich set of higher-level tools including [Spark SQL](sql-prog
 
 # Downloading
 
-Get Spark from the [downloads page](http://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. The downloads page
-contains Spark packages for many popular HDFS versions. If you'd like to build Spark from
-scratch, visit [Building Spark](building-spark.html).
+Get Spark from the [downloads page](http://spark.apache.org/downloads.html) of the project website. This documentation is for Spark version {{site.SPARK_VERSION}}. Spark uses Hadoop's client libraries for HDFS and YARN. Downloads are pre-packaged for a handful of popular Hadoop versions.
+Users can also download a "Hadoop free" binary and run Spark with any Hadoop version
+[by augmenting Spark's classpath](hadoop-provided.html).
+
+If you'd like to build Spark from
+source, visit [Building Spark](building-spark.html).
+
 
 Spark runs on both Windows and UNIX-like systems (e.g. Linux, Mac OS). It's easy to run
 locally on one machine --- all you need is to have `java` installed on your system `PATH`,
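For readers who prefer to produce a "Hadoop free" binary themselves, the source tree's `hadoop-provided` Maven profile can be combined with the distribution script. A minimal sketch, assuming the `make-distribution.sh` script at the repository root as of the Spark 1.4 source tree:

{% highlight bash %}
### build a "Hadoop free" distribution from a Spark source checkout ###

# -Phadoop-provided excludes Hadoop's jars from the Spark assembly;
# --tgz packages the result as a compressed tarball
./make-distribution.sh --name hadoop-provided --tgz -Phadoop-provided
{% endhighlight %}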
