Contributor

@steveloughran steveloughran commented Aug 15, 2016

What changes were proposed in this pull request?

Adds a snapshots-and-staging profile so that RCs of projects like Hadoop and HBase can be used in developer-only build and test runs. There's a comment above the profile telling people not to use this in production.

There's no attempt to do the same for SBT, as Ivy is different.

How was this patch tested?

Tested by building against the Hadoop 2.7.3 RC 1 JARs

Without the profile (and without any local copy of the 2.7.3 artifacts), the build failed:

mvn install -DskipTests -Pyarn,hadoop-2.7,hive -Dhadoop.version=2.7.3

...

[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Launcher 2.1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/2.7.3/hadoop-client-2.7.3.pom
[WARNING] The POM for org.apache.hadoop:hadoop-client:jar:2.7.3 is missing, no dependency information available
Downloading: https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/2.7.3/hadoop-client-2.7.3.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO] 
[INFO] Spark Project Parent POM ........................... SUCCESS [  4.482 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 17.402 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 11.252 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 13.458 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  9.043 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 16.027 s]
[INFO] Spark Project Launcher ............................. FAILURE [  1.653 s]
[INFO] Spark Project Core ................................. SKIPPED
...

With the profile, the build completed:

mvn install -DskipTests -Pyarn,hadoop-2.7,hive,snapshots-and-staging -Dhadoop.version=2.7.3
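
The two invocations in this description differ only in the profile list. A minimal shell sketch of that toggle follows. The function name `spark_mvn_cmd` and its second argument are hypothetical conveniences, not part of the patch, and the function only prints the command so it can be inspected before being run.

```shell
# Hypothetical helper (not part of the patch): print the Maven command for a
# Spark build against a given Hadoop version, optionally enabling the
# snapshots-and-staging profile described in this PR.
spark_mvn_cmd() {
  local hadoop_version="$1"
  local use_staging="${2:-false}"
  local profiles="yarn,hadoop-2.7,hive"
  # Only add the staging profile when explicitly asked for, mirroring the
  # "off by default" behaviour of the profile itself.
  if [ "$use_staging" = "true" ]; then
    profiles="${profiles},snapshots-and-staging"
  fi
  echo "mvn install -DskipTests -P${profiles} -Dhadoop.version=${hadoop_version}"
}
```

Calling `spark_mvn_cmd 2.7.3` prints the first command from the description; `spark_mvn_cmd 2.7.3 true` prints the second.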

@SparkQA

SparkQA commented Aug 15, 2016

Test build #63782 has finished for PR 14646 at commit 09d96be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Contributor Author

Note that Jenkins, being SBT-based, isn't going to exercise the code path added here.

@srowen
Member

srowen commented Aug 15, 2016

How about adding this to the default set of repos rather than putting it behind a profile -- does that cause a problem?

@steveloughran
Contributor Author

I'd be against making it default for a few reasons

  1. You don't want to accidentally pick up some staging artifact or upstream snapshot.
  2. I don't know how SBT/Ivy handles remote staging artifact locations; having looked through existing JIRAs related to differences between SBT and Maven repositories, I didn't want to make the two builds diverge further.
  3. If you are doing local dev of upstream code, it's a real pain when a build suddenly decides to fetch a remote snapshot of an artifact which you haven't built locally that day. You'd generally rather the build halt than pick up some remote snapshot which doesn't have your code in it. Best bit: Maven will do this within a single project if the build spans midnight.

Keeping it isolated avoids that.
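
Point 3 turns on whether the artifact is already in the local Maven repository. A rough way to check that before a build is sketched below. `has_local_artifact` is a hypothetical helper, the path assumes the default repository layout under `~/.m2/repository`, and the `M2_REPO` override is a convention of this sketch only, not a setting Maven reads.

```shell
# Hypothetical helper: succeed if the hadoop-client JAR for the given version
# is already present in the local Maven repository.
# M2_REPO is a convention of this sketch only; Maven itself does not read it.
has_local_artifact() {
  local version="$1"
  local repo="${M2_REPO:-$HOME/.m2/repository}"
  [ -f "${repo}/org/apache/hadoop/hadoop-client/${version}/hadoop-client-${version}.jar" ]
}

# Example use: warn before a build that would have to go to a remote repo.
# if ! has_local_artifact 2.8.0-SNAPSHOT; then
#   echo "hadoop-client 2.8.0-SNAPSHOT is not installed locally" >&2
# fi
```

Maven's `-o` (`--offline`) flag is another way to make a build fail rather than download anything; this check only reports what is on disk.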

What I could do is add a section on this to the building-spark doc, to avoid people having to read through the POM to find it

@rxin
Contributor

rxin commented Aug 16, 2016

Are you thinking about testing Hadoop versions that are unpublished? But if that's the case, you'd need to modify the version anyway. Why not just add this when you need to do those tests?

@steveloughran
Contributor Author

I'm adding the ability to test against staged releases, such as Hadoop 2.7.3 RC1. With this profile, testing that Spark runs with the new RC is a matter of setting the version with a -D option and asking for staging artifacts; there's no need to edit the POMs at all:

dev/make-distribution.sh  -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3

If all I wanted to do was test with locally built artifacts, I wouldn't need the profile: just run mvn install in Hadoop, then build Spark with -Dhadoop.version=2.8.0-SNAPSHOT; this works perfectly well. What this patch adds is the ability to build against the real ASF RC artifacts, and so to do regression testing against them.
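
Whether a staged RC artifact is resolvable can be checked before starting a full build. The sketch below prints a `mvn dependency:get` command for a single artifact. `staged_artifact_check_cmd` is a hypothetical helper, and the staging URL is an assumption about the ASF Nexus layout rather than something stated in this thread, so it is worth confirming against the RC announcement.

```shell
# Assumed URL of the ASF Nexus staging group; confirm against the RC vote email.
ASF_STAGING_URL="${ASF_STAGING_URL:-https://repository.apache.org/content/groups/staging/}"

# Hypothetical helper: print a command that asks Maven to resolve one
# artifact (given as group:artifact:version) from the staging repository,
# without pulling in its transitive dependencies.
staged_artifact_check_cmd() {
  local coords="$1"
  echo "mvn dependency:get -Dartifact=${coords} -DremoteRepositories=${ASF_STAGING_URL} -Dtransitive=false"
}
```

For example, `staged_artifact_check_cmd org.apache.hadoop:hadoop-client:2.7.3` prints a command that should fail quickly if the RC is not staged at the assumed location.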

I used this as part of the review of the RC; it'll need to be repeated when the 2.8.x RCs are out.

+1 binding


1. built and tested apache slider (incubating) against the Hadoop 2.7.3 artifacts

2. did a build & test of Apache Spark master branch with 2.7.3 JARs.

For that I had to tweak Spark's build to support the staging repo; hopefully that will get into Spark:

https://issues.apache.org/jira/browse/SPARK-17058

3. did a test run of my WiP SPARK-7481 spark-cloud module; after fixing a couple of things on the test setup side related to HADOOP-13058, 

    mvn test --pl cloud -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3 -Dcloud.test.configuration.file=../conf/cloud-tests.xml

all was well, albeit measurably slower than Hadoop 2.8. That's proof that the 2.8 version of s3a really does deliver measurable speedup for those tests (currently just file input/seek; more to come). I had originally thought things were broken as S3 init was failing, but that's because the S3 bucket was in Frankfurt, and the AWS library used can't talk to that endpoint (v4 auth protocol, see).

4. did a full spark distribution build of that SPARK-7481 branch

    dev/make-distribution.sh  -Pyarn,hadoop-2.7,snapshots-and-staging -Dhadoop.version=2.7.3

Ran a command-line test to read s3a data:

    bin/spark-submit --class org.apache.spark.cloud.s3.examples.S3LineCount \
                      --conf spark.hadoop.fs.s3a.access.key=$AWS_KEY \
                      --conf spark.hadoop.fs.s3a.secret.key=$AWS_SECRET \
                      examples/jars/spark-examples_2.11-2.1.0-SNAPSHOT.jar

5. Pulled out the Microsoft Azure JAR azure-storage-2.0.0.jar and repeated step 4.

This showed that the 2.7.x branch does handle the failure to load a filesystem due to dependency or other classloading problems. This had been proving a big problem in adding the AWS and Azure modules to the Spark build, as it would stop Spark from starting up if the dependencies were absent.

I've not done any of the .tar.gz diligence; I've just looked at the staged JARs and how they worked with downstream apps, that being a key way that Hadoop artifacts are adopted.

@SparkQA

SparkQA commented Sep 28, 2016

Test build #66019 has finished for PR 14646 at commit 09d96be.

  • This patch passes all tests.
  • This patch merges cleanly.
  • This patch adds no public classes.

@steveloughran
Contributor Author

Has anyone had a chance to review this? It's nicely self-contained, and it makes it easier to use Spark for regression testing of ASF prerelease binaries of any project it depends on.

Member

@srowen srowen left a comment

I think it's OK to add. It's off by default, isn't complicated, and is actually required if you want to test against an ASF snapshot, and I think that's a reasonable use case.

@rxin
Contributor

rxin commented Nov 2, 2016

OK for the sake of moving it forward I'm going to merge this.

@asfgit asfgit closed this in 37d9522 Nov 2, 2016
asfgit pushed a commit that referenced this pull request Nov 2, 2016
…/test against staging artifacts

## What changes were proposed in this pull request?

Adds a `snapshots-and-staging` profile so that RCs of projects like Hadoop and HBase can be used in developer-only build and test runs. There's a comment above the profile telling people not to use this in production.

There's no attempt to do the same for SBT, as Ivy is different.

## How was this patch tested?

Tested by building against the Hadoop 2.7.3 RC 1 JARs

Without the profile (and without any local copy of the 2.7.3 artifacts), the build failed:

```
mvn install -DskipTests -Pyarn,hadoop-2.7,hive -Dhadoop.version=2.7.3

...

[INFO] ------------------------------------------------------------------------
[INFO] Building Spark Project Launcher 2.1.0-SNAPSHOT
[INFO] ------------------------------------------------------------------------
Downloading: https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/2.7.3/hadoop-client-2.7.3.pom
[WARNING] The POM for org.apache.hadoop:hadoop-client:jar:2.7.3 is missing, no dependency information available
Downloading: https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-client/2.7.3/hadoop-client-2.7.3.jar
[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Spark Project Parent POM ........................... SUCCESS [  4.482 s]
[INFO] Spark Project Tags ................................. SUCCESS [ 17.402 s]
[INFO] Spark Project Sketch ............................... SUCCESS [ 11.252 s]
[INFO] Spark Project Networking ........................... SUCCESS [ 13.458 s]
[INFO] Spark Project Shuffle Streaming Service ............ SUCCESS [  9.043 s]
[INFO] Spark Project Unsafe ............................... SUCCESS [ 16.027 s]
[INFO] Spark Project Launcher ............................. FAILURE [  1.653 s]
[INFO] Spark Project Core ................................. SKIPPED
...
```

With the profile, the build completed

```
mvn install -DskipTests -Pyarn,hadoop-2.7,hive,snapshots-and-staging -Dhadoop.version=2.7.3
```

Author: Steve Loughran <[email protected]>

Closes #14646 from steveloughran/stevel/SPARK-17058-support-asf-snapshots.

(cherry picked from commit 37d9522)
Signed-off-by: Reynold Xin <[email protected]>
@steveloughran
Contributor Author

thanks

@steveloughran steveloughran deleted the stevel/SPARK-17058-support-asf-snapshots branch November 2, 2016 19:14
uzadude pushed a commit to uzadude/spark that referenced this pull request Jan 27, 2017
…/test against staging artifacts

Author: Steve Loughran <[email protected]>

Closes apache#14646 from steveloughran/stevel/SPARK-17058-support-asf-snapshots.