
Commit 7fb715d

FavioVazquez authored and srowen committed
[SPARK-7249] Updated Hadoop dependencies due to inconsistency in the versions
Updated Hadoop dependencies due to inconsistency in the versions. The global properties are now the ones used by the hadoop-2.2 profile, and that profile was emptied but kept for backwards-compatibility reasons. These changes were proposed by vanzin following pull request #5783, which did not fix the problem correctly. Please let me know if this is the correct way of doing this; vanzin's comments are in the pull request mentioned.

Author: FavioVazquez <[email protected]>

Closes #5786 from FavioVazquez/update-hadoop-dependencies and squashes the following commits:

11670e5 [FavioVazquez] Added missing instance of -Phadoop-2.2 in create-release.sh
379f50d [FavioVazquez] Added instances of -Phadoop-2.2 in create-release.sh, run-tests, scalastyle and building-spark.md; reworked the docs to not ask users to rely on default behavior
3f9249d [FavioVazquez] Merge branch 'master' of https://github.com/apache/spark into update-hadoop-dependencies
31bdafa [FavioVazquez] Added missing instances of -Phadoop-1 in create-release.sh, run-tests and the building-spark documentation
cbb93e8 [FavioVazquez] Added comment related to SPARK-3710 about hadoop-yarn-server-tests in Hadoop 2.2 failing to pull some needed dependencies
83dc332 [FavioVazquez] Cleaned up the main POM concerning the yarn profile; erased the hadoop-2.2 profile from yarn/pom.xml and integrated its content into yarn/pom.xml
93f7624 [FavioVazquez] Deleted unnecessary comments and the <activation> tag on the YARN profile in the main POM
668d126 [FavioVazquez] Moved the <dependencies>, <activation> and <properties> sections of the hadoop-2.2 profile in the YARN POM to the YARN profile in the root POM; erased the now-unnecessary hadoop-2.2 profile from the YARN POM
fda6a51 [FavioVazquez] Updated hadoop1 releases in create-release.sh due to the change in the default hadoop version; erased an unnecessary instance of -Dyarn.version=2.2.0 in create-release.sh; prettified a comment in yarn/pom.xml
0470587 [FavioVazquez] Erased unnecessary instances of -Phadoop-2.2 -Dhadoop.version=2.2.0 in create-release.sh, scalastyle and run-tests; updated how releases are made in create-release.sh now that the default hadoop version is 2.2.0; gave a better example in hadoop-third-party-distributions.md
a650779 [FavioVazquez] Set the default value of avro.mapred.classifier to hadoop2 in pom.xml; cleaned up the hadoop-2.3 and 2.4 profiles due to the change of default
199f40b [FavioVazquez] Erased an unnecessary CDH5-specific note and the -Phadoop-2.2 -Dhadoop.version=2.2.0 example in docs/building-spark.md; added a comment in yarn/pom.xml noting that the hadoop-2.2 profile is enabled when the Hadoop version is 2.2.0, now the default
88a8b88 [FavioVazquez] Simplified Hadoop profiles due to the new global properties in pom.xml; added a comment noting that hadoop-2.2 is now the default hadoop profile; removed hadoop-2.2 from related profiles in make-distribution.sh now that it is a no-op
70b8344 [FavioVazquez] Fixed a typo in make-distribution.sh and added hadoop-1 to the related profiles
287fa2f [FavioVazquez] Updated the documentation about specifying the hadoop version in building-spark; it is now clear that Spark builds against Hadoop 2.2.0 by default; added a Cloudera CDH 5.3.3 without MapReduce example
1354292 [FavioVazquez] Fixed the hadoop-1 version to match the jenkins hadoop1.0 build profile in tests and documentation
6b4bfaf [FavioVazquez] Cleaned up the hadoop-2.x profiles, which contained mostly redundant settings
7e9955d [FavioVazquez] Updated Hadoop dependencies due to inconsistency in the versions; the global properties are now those of the hadoop-2.2 profile, which was emptied but kept for backwards compatibility
660decc [FavioVazquez] Updated Hadoop dependencies due to inconsistency in the versions; the global properties are now those of the hadoop-2.2 profile, which was emptied but kept for backwards compatibility
ec91ce3 [FavioVazquez] Updated the protobuf-java version of the com.google.protobuf dependency to fix a blocking error when connecting to HDFS via Hadoop Cloudera HDFS CDH5 (fix for version 2.5.0-cdh5.3.3)
1 parent c1080b6 commit 7fb715d

File tree

8 files changed: +79 -90 lines changed

dev/create-release/create-release.sh

Lines changed: 7 additions & 7 deletions

@@ -118,14 +118,14 @@ if [[ ! "$@" =~ --skip-publish ]]; then
 
   rm -rf $SPARK_REPO
 
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-    -Pyarn -Phive -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
+    -Phive-thriftserver -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
     clean install
 
   ./dev/change-version-to-2.11.sh
 
-  build/mvn -DskipTests -Dhadoop.version=2.2.0 -Dyarn.version=2.2.0 \
-    -Dscala-2.11 -Pyarn -Phive -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
+  build/mvn -DskipTests -Pyarn -Phive \
+    -Dscala-2.11 -Phadoop-2.2 -Pspark-ganglia-lgpl -Pkinesis-asl \
     clean install
 
   ./dev/change-version-to-2.10.sh
@@ -228,9 +228,9 @@ if [[ ! "$@" =~ --skip-package ]]; then
 
   # We increment the Zinc port each time to avoid OOM's and other craziness if multiple builds
   # share the same Zinc server.
-  make_binary_release "hadoop1" "-Phive -Phive-thriftserver -Dhadoop.version=1.0.4" "3030" &
-  make_binary_release "hadoop1-scala2.11" "-Phive -Dscala-2.11" "3031" &
-  make_binary_release "cdh4" "-Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0" "3032" &
+  make_binary_release "hadoop1" "-Phadoop-1 -Phive -Phive-thriftserver" "3030" &
+  make_binary_release "hadoop1-scala2.11" "-Phadoop-1 -Phive -Dscala-2.11" "3031" &
+  make_binary_release "cdh4" "-Phadoop-1 -Phive -Phive-thriftserver -Dhadoop.version=2.0.0-mr1-cdh4.2.0" "3032" &
   make_binary_release "hadoop2.3" "-Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn" "3033" &
   make_binary_release "hadoop2.4" "-Phadoop-2.4 -Phive -Phive-thriftserver -Pyarn" "3034" &
   make_binary_release "mapr3" "-Pmapr3 -Phive -Phive-thriftserver" "3035" &
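
The hunk above launches each packaging in the background on its own Zinc port. A minimal sketch of that pattern, with `make_binary_release` reduced to a hypothetical stub (the real function in create-release.sh runs a full Maven build):

```shell
#!/usr/bin/env bash
# Stub standing in for the real make_binary_release, which runs a full
# Maven packaging; here it only reports what it would do.
make_binary_release() {
  local name="$1" flags="$2" zinc_port="$3"
  echo "building ${name} with '${flags}' on Zinc port ${zinc_port}"
}

# Each build gets a distinct Zinc port so concurrent builds do not
# share (and overload) a single Zinc incremental-compile server.
make_binary_release "hadoop1"   "-Phadoop-1 -Phive -Phive-thriftserver" "3030" &
make_binary_release "hadoop2.3" "-Phadoop-2.3 -Phive -Phive-thriftserver -Pyarn" "3033" &
wait
```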

dev/run-tests

Lines changed: 3 additions & 3 deletions

@@ -40,11 +40,11 @@ function handle_error () {
 {
   if [ -n "$AMPLAB_JENKINS_BUILD_PROFILE" ]; then
     if [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop1.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=1.0.4"
+      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=1.0.4"
     elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.0" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Dhadoop.version=2.0.0-mr1-cdh4.1.1"
+      export SBT_MAVEN_PROFILES_ARGS="-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1"
     elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.2" ]; then
-      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0"
+      export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.2"
     elif [ "$AMPLAB_JENKINS_BUILD_PROFILE" = "hadoop2.3" ]; then
       export SBT_MAVEN_PROFILES_ARGS="-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0"
     fi
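
The branch above amounts to a mapping from Jenkins build profile to sbt/Maven flags. A hedged sketch of that mapping as a standalone function (`profile_args` is illustrative, not part of dev/run-tests):

```shell
#!/usr/bin/env bash
# Illustrative mapping from AMPLAB_JENKINS_BUILD_PROFILE to build flags,
# mirroring the updated logic: the hadoop1.x/2.0 branches now add
# -Phadoop-1, and the 2.2 branch drops -Dhadoop.version because 2.2.0
# is now the default set by the root POM's global properties.
profile_args() {
  case "$1" in
    hadoop1.0) echo "-Phadoop-1 -Dhadoop.version=1.0.4" ;;
    hadoop2.0) echo "-Phadoop-1 -Dhadoop.version=2.0.0-mr1-cdh4.1.1" ;;
    hadoop2.2) echo "-Pyarn -Phadoop-2.2" ;;
    hadoop2.3) echo "-Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0" ;;
    *) return 1 ;;
  esac
}

profile_args hadoop2.2
```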

dev/scalastyle

Lines changed: 2 additions & 2 deletions

@@ -20,8 +20,8 @@
 echo -e "q\n" | build/sbt -Phive -Phive-thriftserver scalastyle > scalastyle.txt
 echo -e "q\n" | build/sbt -Phive -Phive-thriftserver test:scalastyle >> scalastyle.txt
 # Check style with YARN built too
-echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 scalastyle >> scalastyle.txt
-echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 test:scalastyle >> scalastyle.txt
+echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 scalastyle >> scalastyle.txt
+echo -e "q\n" | build/sbt -Pyarn -Phadoop-2.2 test:scalastyle >> scalastyle.txt
 
 ERRORS=$(cat scalastyle.txt | awk '{if($1~/error/)print}')
 rm scalastyle.txt

docs/building-spark.md

Lines changed: 6 additions & 5 deletions

@@ -59,14 +59,14 @@ You can fix this by setting the `MAVEN_OPTS` variable as discussed before.
 
 # Specifying the Hadoop Version
 
-Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 1.0.4 by default. Note that certain build profiles are required for particular Hadoop versions:
+Because HDFS is not protocol-compatible across versions, if you want to read from HDFS, you'll need to build Spark against the specific HDFS version in your environment. You can do this through the "hadoop.version" property. If unset, Spark will build against Hadoop 2.2.0 by default. Note that certain build profiles are required for particular Hadoop versions:
 
 <table class="table">
   <thead>
     <tr><th>Hadoop version</th><th>Profile required</th></tr>
   </thead>
   <tbody>
-    <tr><td>1.x to 2.1.x</td><td>(none)</td></tr>
+    <tr><td>1.x to 2.1.x</td><td>hadoop-1</td></tr>
     <tr><td>2.2.x</td><td>hadoop-2.2</td></tr>
     <tr><td>2.3.x</td><td>hadoop-2.3</td></tr>
     <tr><td>2.4.x</td><td>hadoop-2.4</td></tr>
@@ -77,19 +77,20 @@ For Apache Hadoop versions 1.x, Cloudera CDH "mr1" distributions, and other Hado
 
 {% highlight bash %}
 # Apache Hadoop 1.2.1
-mvn -Dhadoop.version=1.2.1 -DskipTests clean package
+mvn -Dhadoop.version=1.2.1 -Phadoop-1 -DskipTests clean package
 
 # Cloudera CDH 4.2.0 with MapReduce v1
-mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -DskipTests clean package
+mvn -Dhadoop.version=2.0.0-mr1-cdh4.2.0 -Phadoop-1 -DskipTests clean package
 {% endhighlight %}
 
 You can enable the "yarn" profile and optionally set the "yarn.version" property if it is different from "hadoop.version". Spark only supports YARN versions 2.2.0 and later.
 
 Examples:
 
 {% highlight bash %}
+
 # Apache Hadoop 2.2.X
-mvn -Pyarn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
+mvn -Pyarn -Phadoop-2.2 -DskipTests clean package
 
 # Apache Hadoop 2.3.X
 mvn -Pyarn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
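
The version-to-profile table in the docs above can be summarized as a small lookup. This helper (`required_profile`) is hypothetical, purely to illustrate the mapping:

```shell
#!/usr/bin/env bash
# Hypothetical lookup mirroring the docs table: Hadoop version -> profile.
required_profile() {
  case "$1" in
    1.*|2.0.*|2.1.*) echo "hadoop-1" ;;   # 1.x to 2.1.x, incl. CDH4 "mr1" builds
    2.2.*)           echo "hadoop-2.2" ;; # the new default
    2.3.*)           echo "hadoop-2.3" ;;
    2.4.*)           echo "hadoop-2.4" ;;
    *)               echo "unknown"; return 1 ;;
  esac
}

required_profile 2.2.0
```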

docs/hadoop-third-party-distributions.md

Lines changed: 1 addition & 1 deletion

@@ -14,7 +14,7 @@ property. For certain versions, you will need to specify additional profiles. Fo
 see the guide on [building with maven](building-spark.html#specifying-the-hadoop-version):
 
     mvn -Dhadoop.version=1.0.4 -DskipTests clean package
-    mvn -Phadoop-2.2 -Dhadoop.version=2.2.0 -DskipTests clean package
+    mvn -Phadoop-2.3 -Dhadoop.version=2.3.0 -DskipTests clean package
 
 The table below lists the corresponding `hadoop.version` code for each CDH/HDP release. Note that
 some Hadoop releases are binary compatible across client versions. This means the pre-built Spark

make-distribution.sh

Lines changed: 1 addition & 1 deletion

@@ -58,7 +58,7 @@ while (( "$#" )); do
     --hadoop)
       echo "Error: '--hadoop' is no longer supported:"
       echo "Error: use Maven profiles and options -Dhadoop.version and -Dyarn.version instead."
-      echo "Error: Related profiles include hadoop-2.2, hadoop-2.3 and hadoop-2.4."
+      echo "Error: Related profiles include hadoop-1, hadoop-2.2, hadoop-2.3 and hadoop-2.4."
       exit_with_usage
       ;;
     --with-yarn)

pom.xml

Lines changed: 15 additions & 18 deletions

@@ -122,9 +122,9 @@
     <slf4j.version>1.7.10</slf4j.version>
     <log4j.version>1.2.17</log4j.version>
     <hadoop.version>2.2.0</hadoop.version>
-    <protobuf.version>2.4.1</protobuf.version>
+    <protobuf.version>2.5.0</protobuf.version>
     <yarn.version>${hadoop.version}</yarn.version>
-    <hbase.version>0.98.7-hadoop1</hbase.version>
+    <hbase.version>0.98.7-hadoop2</hbase.version>
     <hbase.artifact>hbase</hbase.artifact>
     <flume.version>1.4.0</flume.version>
     <zookeeper.version>3.4.5</zookeeper.version>
@@ -143,7 +143,7 @@
     <oro.version>2.0.8</oro.version>
     <codahale.metrics.version>3.1.0</codahale.metrics.version>
     <avro.version>1.7.7</avro.version>
-    <avro.mapred.classifier></avro.mapred.classifier>
+    <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
     <jets3t.version>0.7.1</jets3t.version>
     <aws.java.sdk.version>1.8.3</aws.java.sdk.version>
     <aws.kinesis.client.version>1.1.0</aws.kinesis.client.version>
@@ -155,7 +155,7 @@
     <jline.version>${scala.version}</jline.version>
     <jline.groupid>org.scala-lang</jline.groupid>
     <jodd.version>3.6.3</jodd.version>
-    <codehaus.jackson.version>1.8.8</codehaus.jackson.version>
+    <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
     <fasterxml.jackson.version>2.4.4</fasterxml.jackson.version>
     <snappy.version>1.1.1.7</snappy.version>
     <netlib.java.version>1.1.2</netlib.java.version>
@@ -1644,39 +1644,36 @@
     -->
 
     <profile>
-      <id>hadoop-2.2</id>
+      <id>hadoop-1</id>
       <properties>
-        <hadoop.version>2.2.0</hadoop.version>
-        <protobuf.version>2.5.0</protobuf.version>
-        <hbase.version>0.98.7-hadoop2</hbase.version>
-        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
-        <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
+        <hadoop.version>1.0.4</hadoop.version>
+        <protobuf.version>2.4.1</protobuf.version>
+        <hbase.version>0.98.7-hadoop1</hbase.version>
+        <avro.mapred.classifier>hadoop1</avro.mapred.classifier>
+        <codehaus.jackson.version>1.8.8</codehaus.jackson.version>
       </properties>
     </profile>
 
+    <profile>
+      <id>hadoop-2.2</id>
+      <!-- SPARK-7249: Default hadoop profile. Uses global properties. -->
+    </profile>
+
     <profile>
       <id>hadoop-2.3</id>
       <properties>
         <hadoop.version>2.3.0</hadoop.version>
-        <protobuf.version>2.5.0</protobuf.version>
         <jets3t.version>0.9.3</jets3t.version>
-        <hbase.version>0.98.7-hadoop2</hbase.version>
         <commons.math3.version>3.1.1</commons.math3.version>
-        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
-        <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
       </properties>
     </profile>
 
     <profile>
       <id>hadoop-2.4</id>
       <properties>
         <hadoop.version>2.4.0</hadoop.version>
-        <protobuf.version>2.5.0</protobuf.version>
         <jets3t.version>0.9.3</jets3t.version>
-        <hbase.version>0.98.7-hadoop2</hbase.version>
         <commons.math3.version>3.1.1</commons.math3.version>
-        <avro.mapred.classifier>hadoop2</avro.mapred.classifier>
-        <codehaus.jackson.version>1.9.13</codehaus.jackson.version>
       </properties>
     </profile>
yarn/pom.xml

Lines changed: 44 additions & 53 deletions

@@ -30,6 +30,7 @@
   <name>Spark Project YARN</name>
   <properties>
     <sbt.project.name>yarn</sbt.project.name>
+    <jersey.version>1.9</jersey.version>
   </properties>
 
   <dependencies>
@@ -85,7 +86,12 @@
       <artifactId>jetty-servlet</artifactId>
     </dependency>
     <!-- End of shaded deps. -->
-
+
+    <!--
+      See SPARK-3710. hadoop-yarn-server-tests in Hadoop 2.2 fails to pull some needed
+      dependencies, so they need to be added manually for the tests to work.
+    -->
+
     <dependency>
       <groupId>org.apache.hadoop</groupId>
       <artifactId>hadoop-yarn-server-tests</artifactId>
@@ -97,59 +103,44 @@
       <artifactId>mockito-all</artifactId>
       <scope>test</scope>
     </dependency>
+    <dependency>
+      <groupId>org.mortbay.jetty</groupId>
+      <artifactId>jetty</artifactId>
+      <version>6.1.26</version>
+      <exclusions>
+        <exclusion>
+          <groupId>org.mortbay.jetty</groupId>
+          <artifactId>servlet-api</artifactId>
+        </exclusion>
+      </exclusions>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.sun.jersey</groupId>
+      <artifactId>jersey-core</artifactId>
+      <version>${jersey.version}</version>
+      <scope>test</scope>
+    </dependency>
+    <dependency>
+      <groupId>com.sun.jersey</groupId>
+      <artifactId>jersey-json</artifactId>
+      <version>${jersey.version}</version>
+      <scope>test</scope>
+      <exclusions>
+        <exclusion>
+          <groupId>stax</groupId>
+          <artifactId>stax-api</artifactId>
+        </exclusion>
+      </exclusions>
+    </dependency>
+    <dependency>
+      <groupId>com.sun.jersey</groupId>
+      <artifactId>jersey-server</artifactId>
+      <version>${jersey.version}</version>
+      <scope>test</scope>
+    </dependency>
   </dependencies>
-
-  <!--
-    See SPARK-3710. hadoop-yarn-server-tests in Hadoop 2.2 fails to pull some needed
-    dependencies, so they need to be added manually for the tests to work.
-  -->
-  <profiles>
-    <profile>
-      <id>hadoop-2.2</id>
-      <properties>
-        <jersey.version>1.9</jersey.version>
-      </properties>
-      <dependencies>
-        <dependency>
-          <groupId>org.mortbay.jetty</groupId>
-          <artifactId>jetty</artifactId>
-          <version>6.1.26</version>
-          <exclusions>
-            <exclusion>
-              <groupId>org.mortbay.jetty</groupId>
-              <artifactId>servlet-api</artifactId>
-            </exclusion>
-          </exclusions>
-          <scope>test</scope>
-        </dependency>
-        <dependency>
-          <groupId>com.sun.jersey</groupId>
-          <artifactId>jersey-core</artifactId>
-          <version>${jersey.version}</version>
-          <scope>test</scope>
-        </dependency>
-        <dependency>
-          <groupId>com.sun.jersey</groupId>
-          <artifactId>jersey-json</artifactId>
-          <version>${jersey.version}</version>
-          <scope>test</scope>
-          <exclusions>
-            <exclusion>
-              <groupId>stax</groupId>
-              <artifactId>stax-api</artifactId>
-            </exclusion>
-          </exclusions>
-        </dependency>
-        <dependency>
-          <groupId>com.sun.jersey</groupId>
-          <artifactId>jersey-server</artifactId>
-          <version>${jersey.version}</version>
-          <scope>test</scope>
-        </dependency>
-      </dependencies>
-    </profile>
-  </profiles>
-
+
   <build>
     <outputDirectory>target/scala-${scala.binary.version}/classes</outputDirectory>
     <testOutputDirectory>target/scala-${scala.binary.version}/test-classes</testOutputDirectory>