[SPARK-17058] [build] Add maven snapshots-and-staging profile to build/test against staging artifacts #14646

steveloughran · 2016-08-15T11:17:30Z

What changes were proposed in this pull request?

Adds a snapshots-and-staging profile so that RCs of projects like Hadoop and HBase can be used in developer-only build and test runs. There's a comment above the profile telling people not to use this in production.

There's no attempt to do the same for SBT, as Ivy is different.

How was this patch tested?

Tested by building against the Hadoop 2.7.3 RC1 JARs.

Without the profile (and without any local copy of the 2.7.3 artifacts), the build failed:

mvn install -DskipTests -Pyarn,hadoop-2.7,hive -Dhadoop.version=2.7.3

...

[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Launcher 2.1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/2.7.3/hadoop-client-2.7.3.pom
[WARNING] The POM for org.apache.hadoop:hadoop-client:jar:2.7.3 is missing, no dependency information available
Downloading: https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/2.7.3/hadoop-client-2.7.3.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [  4.482 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 17.402 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 11.252 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 13.458 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  9.043 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 16.027 s]
[INFO] Spark Project Launcher ............................. FAILURE [  1.653 s]
[INFO] Spark Project Core ................................. SKIPPED
...

With the profile, the build completed:

mvn install -DskipTests -Pyarn,hadoop-2.7,hive,snapshots-and-staging -Dhadoop.version=2.7.3
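The two invocations above differ only in the profile list. A minimal sketch of a wrapper that makes the staging profile an explicit opt-in is shown here; the function name `spark_mvn_args` and its `staging` argument are hypothetical and not part of this patch, and the script only prints the commands rather than running Maven.

```shell
#!/bin/sh
# Hypothetical helper (not part of the patch): print the Maven arguments for a
# Spark build against a given Hadoop version. Passing "staging" as the second
# argument appends the snapshots-and-staging profile; anything else leaves it off.
spark_mvn_args() {
  hadoop_version="$1"
  mode="${2:-release}"
  profiles="yarn,hadoop-2.7,hive"
  if [ "$mode" = "staging" ]; then
    # Developer-only: lets the build resolve RC artifacts that are not on Maven Central.
    profiles="$profiles,snapshots-and-staging"
  fi
  printf 'install -DskipTests -P%s -Dhadoop.version=%s\n' "$profiles" "$hadoop_version"
}

# Dry run: show both commands instead of invoking Maven.
echo "mvn $(spark_mvn_args 2.7.3)"
echo "mvn $(spark_mvn_args 2.7.3 staging)"
```

Keeping the profile behind an explicit argument mirrors the intent of the patch: staging artifacts are only used when someone asks for them.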


SparkQA · 2016-08-15T13:25:12Z

Test build #63782 has finished for PR 14646 at commit 09d96be.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

steveloughran · 2016-08-15T14:19:20Z

Note that Jenkins, being SBT-based, isn't going to exercise the codepath here.

srowen · 2016-08-15T15:02:03Z

How about adding this to the default set of repos rather than putting it behind a profile -- does that cause a problem?

steveloughran · 2016-08-15T16:45:02Z

I'd be against making it the default, for a few reasons:

1. You don't want to accidentally pick up some staging artifact or upstream snapshot.
2. I don't know how SBT/Ivy handles remote staging artifact locations; having looked through existing JIRAs related to differences between SBT and Maven repositories, I didn't want the two builds to diverge.
3. If you are doing local development of upstream code, it's a real pain when a build suddenly decides to fetch a remote snapshot of an artifact which you haven't built locally that day. You generally prefer the build to halt rather than pick up some remote snapshot which doesn't contain your code. Best bit: Maven will do this inside a project itself if the build spans midnight.

Keeping it isolated avoids that.
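For the local-development case, one way to make the build halt instead of silently using something else is to check the local repository before starting. The sketch below assumes the standard local repository layout under `~/.m2/repository`; the function names and the `M2_REPO` override variable are hypothetical, and it only checks a single artifact (`hadoop-client`) as a stand-in for the full set of Hadoop modules.

```shell
#!/bin/sh
# Hypothetical helper: map Maven coordinates to the JAR path in a local
# repository, using the standard layout
#   <repo>/<groupId with dots as slashes>/<artifactId>/<version>/<artifactId>-<version>.jar
# M2_REPO is an assumed override for this sketch; Maven itself does not read it.
m2_jar_path() {
  group_path=$(printf '%s' "$1" | tr '.' '/')
  repo="${M2_REPO:-$HOME/.m2/repository}"
  printf '%s/%s/%s/%s/%s-%s.jar\n' "$repo" "$group_path" "$2" "$3" "$2" "$3"
}

# Succeeds only if the artifact's JAR is already present in the local repository.
have_local_artifact() {
  [ -f "$(m2_jar_path "$1" "$2" "$3")" ]
}

if have_local_artifact org.apache.hadoop hadoop-client 2.8.0-SNAPSHOT; then
  echo "local hadoop-client 2.8.0-SNAPSHOT found; build without the staging profile"
else
  echo "no local hadoop-client 2.8.0-SNAPSHOT; run 'mvn install' in Hadoop first"
fi
```

This is a pre-flight check only; it does not stop Maven from contacting remote repositories once the build starts.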

What I could do is add a section on this to the building-spark doc, to avoid people having to read through the POM to find it

rxin · 2016-08-16T07:49:19Z

Are you thinking about testing Hadoop versions that are unpublished? But if that's the case, you'd need to modify the version anyway. Why not just add this when you need to do those tests?

steveloughran · 2016-08-16T09:47:51Z

I'm adding the ability to test against staged releases, such as Hadoop 2.7.3 RC1. With this profile, testing that Spark runs with the new RC is a matter of setting the version with a -D and asking for staging artifacts; there's no need to edit the POMs at all:

dev/make-distribution.sh  -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3

If all I wanted to do was test with locally built stuff, I wouldn't need the profile: just do the mvn install in Hadoop, then build Spark with -Dhadoop.version=2.8.0-SNAPSHOT; this works perfectly well. What this patch adds is the ability to test against the real ASF RC artifacts, and so do regression testing against them.

I used this as part of the review of the RC; it'll need to be repeated when the 2.8.x RCs are out.

+1 binding


1. built and tested apache slider (incubating) against the Hadoop 2.7.3 artifacts

2. did a build & test of the Apache Spark master branch with the 2.7.3 JARs.

For that I had to tweak spark's build to support the staging repo; hopefully that will get into Spark 

https://issues.apache.org/jira/browse/SPARK-17058

3. did a test run of my WiP SPARK-7481 spark-cloud module; after fixing a couple of things on the test setup side related to HADOOP-13058, 

    mvn test --pl cloud -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3 -Dcloud.test.configuration.file=../conf/cloud-tests.xml

all was well, albeit measurably slower than Hadoop 2.8. That's proof that the 2.8 version of s3a really does deliver a measurable speedup for those tests (currently just file input/seek; more to come). I had originally thought things were broken as S3 init was failing, but that's because the S3 bucket was in Frankfurt, and the AWS library used can't talk to that endpoint (v4 auth protocol, see).

4. did a full spark distribution build of that SPARK-7481 branch

    dev/make-distribution.sh  -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3

ran command line test to do read of s3a data:

    bin/spark-submit --class org.apache.spark.cloud.s3.examples.S3LineCount \
                      --conf spark.hadoop.fs.s3a.access.key=$AWS_KEY \
                      --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET \
                      examples/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar

5. Pulled out the Microsoft Azure JAR azure-storage-2.0.0.jar and repeated step 4.

This showed that the 2.7.x branch does handle the failure to load a filesystem due to dependency or other classloading problems. This was proving a big problem in adding the AWS & Azure stuff to the Spark build, as it would stop Spark from starting up if the dependencies were absent.

I've not done any of the .tar.gz diligence; I've just looked at the staged JARs and how they worked with downstream apps, that being a key way that Hadoop artifacts are adopted.

SparkQA · 2016-09-28T06:40:31Z

Test build #66019 has finished for PR 14646 at commit 09d96be.

This patch passes all tests.
This patch merges cleanly.
This patch adds no public classes.

steveloughran · 2016-11-02T12:04:36Z

Has anyone had a chance to review this? It's nicely self-contained, and it makes it easier to use Spark for regression testing of ASF prerelease binaries of any dependent project.

srowen

I think it's OK to add. It's off by default, isn't complicated, and is actually required if you want to test against an ASF snapshot, and I think that's a reasonable use case.

rxin · 2016-11-02T18:51:39Z

OK for the sake of moving it forward I'm going to merge this.

…/test against staging artifacts

Author: Steve Loughran
Closes #14646 from steveloughran/stevel/SPARK-17058-support-asf-snapshots.
(cherry picked from commit 37d9522)
Signed-off-by: Reynold Xin

steveloughran · 2016-11-02T19:13:26Z

thanks


[SPARK-17058]

09d96be


srowen mentioned this pull request Sep 20, 2016

[SPARK-17558] Bump Hadoop 2.7 version from 2.7.2 to 2.7.3 #15115

Closed

srowen approved these changes Nov 2, 2016


asfgit closed this in 37d9522 Nov 2, 2016

steveloughran deleted the stevel/SPARK-17058-support-asf-snapshots branch November 2, 2016 19:14
