SPARK-3069 [DOCS] Build instructions in README are outdated #2014
Changes from all commits: b1c04a1, 8e83934, c18d140, 999544e, 91c921f, be82027, db2bd97, 501507e
@@ -0,0 +1,12 @@
## Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original
author. Along with any pull requests, please state that the contribution is
your original work and that you license the work to the project under the
project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.

Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
for more information.
@@ -13,16 +13,19 @@ and Spark Streaming for stream processing.
## Online Documentation

You can find the latest Spark documentation, including a programming
guide, on the project webpage at <http://spark.apache.org/documentation.html>.
guide, on the [project web page](http://spark.apache.org/documentation.html).
This README file only contains basic setup instructions.

## Building Spark

Spark is built on Scala 2.10. To build Spark and its example programs, run:
Spark is built using [Apache Maven](http://maven.apache.org/).
To build Spark and its example programs, run:

    ./sbt/sbt assembly
    mvn -DskipTests clean package
Review comment: Per the related discussion here, I don't think we want to change this. There are definitely two ways on offer to build Spark; perhaps we should just clarify this here.

Review comment: Actually, the officially documented way of building Spark is through Maven: http://spark.apache.org/docs/latest/building-with-maven.html. We should keep this consistent with the docs. (The discussion you linked to refers to tests.)

Review comment: Yeah, we cleared this up in the main discussion thread. I stand corrected.
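For reference, a minimal sketch of the two build paths weighed in this thread, using only commands that already appear in this diff (assuming a Spark source checkout of this era):

    # sbt build, as the old README documented
    ./sbt/sbt assembly

    # Maven build, as the building-with-maven docs and the new README describe
    mvn -DskipTests clean package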
(You do not need to do this if you downloaded a pre-built package.)
More detailed documentation is available from the project site, at
["Building Spark"](http://spark.apache.org/docs/latest/building-spark.html).

## Interactive Scala Shell

@@ -71,73 +74,24 @@ can be run using:

    ./dev/run-tests

Please see the guidance on how to
[run all automated tests](https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-AutomatedTesting)

## A Note About Hadoop Versions

Spark uses the Hadoop core library to talk to HDFS and other Hadoop-supported
storage systems. Because the protocols have changed in different versions of
Hadoop, you must build Spark against the same version that your cluster runs.
You can change the version by setting `-Dhadoop.version` when building Spark.

For Apache Hadoop versions 1.x, Cloudera CDH MRv1, and other Hadoop
versions without YARN, use:

    # Apache Hadoop 1.2.1
    $ sbt/sbt -Dhadoop.version=1.2.1 assembly

    # Cloudera CDH 4.2.0 with MapReduce v1
    $ sbt/sbt -Dhadoop.version=2.0.0-mr1-cdh4.2.0 assembly

For Apache Hadoop 2.2.X, 2.1.X, 2.0.X, 0.23.x, Cloudera CDH MRv2, and other Hadoop versions
with YARN, also set `-Pyarn`:

    # Apache Hadoop 2.0.5-alpha
    $ sbt/sbt -Dhadoop.version=2.0.5-alpha -Pyarn assembly

    # Cloudera CDH 4.2.0 with MapReduce v2
    $ sbt/sbt -Dhadoop.version=2.0.0-cdh4.2.0 -Pyarn assembly

    # Apache Hadoop 2.2.X and newer
    $ sbt/sbt -Dhadoop.version=2.2.0 -Pyarn assembly
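As a hedged aside: under the Maven build this PR points readers to, the same selections are expressed with Maven flags. The profile and property names below are assumed from the removed sbt lines above and from the linked "Building Spark" documentation, so verify them against the current docs:

    # Hadoop 1.x / CDH MRv1-style build (version number shown only as an example)
    mvn -Dhadoop.version=1.2.1 -DskipTests clean package

    # Hadoop 2.x with YARN (again, the version is illustrative)
    mvn -Pyarn -Dhadoop.version=2.2.0 -DskipTests clean package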
When developing a Spark application, specify the Hadoop version by adding the
"hadoop-client" artifact to your project's dependencies. For example, if you're
using Hadoop 1.2.1 and build your application using SBT, add this entry to
`libraryDependencies`:

    "org.apache.hadoop" % "hadoop-client" % "1.2.1"

If your project is built with Maven, add this to your POM file's `<dependencies>` section:

    <dependency>
      <groupId>org.apache.hadoop</groupId>
      <artifactId>hadoop-client</artifactId>
      <version>1.2.1</version>
    </dependency>
## A Note About Thrift JDBC server and CLI for Spark SQL

Spark SQL supports Thrift JDBC server and CLI.
See sql-programming-guide.md for more information about using the JDBC server and CLI.
You can use those features by setting `-Phive` when building Spark as follows.

    $ sbt/sbt -Phive assembly

Please refer to the build documentation at
["Specifying the Hadoop Version"](http://spark.apache.org/docs/latest/building-spark.html#specifying-the-hadoop-version)
for detailed guidance on building for a particular distribution of Hadoop, including
building for particular Hive and Hive Thriftserver distributions. See also
["Third Party Hadoop Distributions"](http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html)
for guidance on building a Spark application that works with a particular
distribution.
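A hedged sketch of the Maven counterpart to the removed sbt line, since the replacement text defers to the build documentation for Hive and Thrift server support (the -Phive profile name is assumed from the sbt flag above and the linked build page):

    # Enable Spark SQL's Hive support, including the Thrift JDBC server and CLI
    mvn -Phive -DskipTests clean package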
## Configuration

Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
in the online documentation for an overview on how to configure Spark.

## Contributing to Spark

Contributions via GitHub pull requests are gladly accepted from their original
author. Along with any pull requests, please state that the contribution is
your original work and that you license the work to the project under the
project's open source license. Whether or not you state this explicitly, by
submitting any copyrighted material via pull request, email, or other means
you agree to license the material under the project's open source license and
warrant that you have the legal authority to do so.

Please see [Contributing to Spark wiki page](https://cwiki.apache.org/SPARK/Contributing+to+Spark)
for more information.
@@ -1,5 +1,7 @@
pygments: true
highlighter: pygments
markdown: kramdown
gems:
- jekyll-redirect-from
Review comment: Does this config mean that users don't need to install the gem manually?
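On the question above: as far as I understand Jekyll's plugin loading of that era (an assumption, not something stated in this PR), the gems: entry only tells Jekyll which plugins to load; the gem itself still has to be installed once, for example:

    # One-time install of the plugin gem
    gem install jekyll-redirect-from

    # The normal docs build then picks the plugin up via _config.yml
    jekyll build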

# These allow the documentation to be updated with new releases
# of Spark, Scala, and Mesos.
Review comment: @srowen I think it's a good idea to have a CONTRIBUTING file here, for the reasons you explained elsewhere, like this one. I'd actually favor moving the Contributing to Spark page entirely out of the wiki and into here. I believe Spark accepts contributions entirely through GitHub, so it makes sense to have the contributing instructions live here.

Review comment: Yeah, seems fine to have this here. It might make it easier for people to find the contributing wiki page if they start on GitHub.

Review comment: @pwendell Perhaps for a future PR: what do you think about removing the contributing guide from the wiki and having it live exclusively on GitHub? It seems like a better home for it, since GitHub is the only way we accept contributions.

Review comment: Having this is pretty nice! I like the banner you get when opening a new PR.