Commit a24fefc

Merge remote-tracking branch 'apache/master' into state-cleanup
Conflicts:
    core/src/main/scala/org/apache/spark/MapOutputTracker.scala
    core/src/main/scala/org/apache/spark/SparkContext.scala
    core/src/main/scala/org/apache/spark/scheduler/ResultTask.scala
    core/src/main/scala/org/apache/spark/scheduler/ShuffleMapTask.scala
    core/src/main/scala/org/apache/spark/storage/BlockManager.scala
    core/src/main/scala/org/apache/spark/util/TimeStampedHashMap.scala
    core/src/test/scala/org/apache/spark/MapOutputTrackerSuite.scala
2 parents 8512612 + 5d98cfc commit a24fefc
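For context, a merge commit like this one is normally produced with a workflow along the following lines. This is a minimal sketch: the remote name apache and the branch name state-cleanup are taken from the commit message above; the placeholder for resolved files and everything else is illustrative rather than part of the commit.

    # on the state-cleanup branch, bring in the latest upstream master
    git checkout state-cleanup
    git fetch apache
    git merge apache/master
    # git stops on the conflicting files listed above; edit each one, then
    git add <resolved-file>...   # stage every resolved file
    git commit                   # records the merge with the two parents shown above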

File tree: 503 files changed (+11427 lines, -4236 lines)


LICENSE

Lines changed: 32 additions & 0 deletions

@@ -396,3 +396,35 @@ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
 CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
 ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
 POSSIBILITY OF SUCH DAMAGE.
+
+
+========================================================================
+For sbt and sbt-launch-lib.bash in sbt/:
+========================================================================
+
+// Generated from http://www.opensource.org/licenses/bsd-license.php
+Copyright (c) 2011, Paul Phillips.
+All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+    * Redistributions of source code must retain the above copyright notice,
+      this list of conditions and the following disclaimer.
+    * Redistributions in binary form must reproduce the above copyright notice,
+      this list of conditions and the following disclaimer in the documentation
+      and/or other materials provided with the distribution.
+    * Neither the name of the author nor the names of its contributors may be
+      used to endorse or promote products derived from this software without
+      specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
+LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
+NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE,
+EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

README.md

Lines changed: 3 additions & 14 deletions

@@ -1,12 +1,12 @@
 # Apache Spark
 
-Lightning-Fast Cluster Computing - <http://spark.incubator.apache.org/>
+Lightning-Fast Cluster Computing - <http://spark.apache.org/>
 
 
 ## Online Documentation
 
 You can find the latest Spark documentation, including a programming
-guide, on the project webpage at <http://spark.incubator.apache.org/documentation.html>.
+guide, on the project webpage at <http://spark.apache.org/documentation.html>.
 This README file only contains basic setup instructions.
 
 

@@ -92,21 +92,10 @@ If your project is built with Maven, add this to your POM file's `<dependencies>
 
 ## Configuration
 
-Please refer to the [Configuration guide](http://spark.incubator.apache.org/docs/latest/configuration.html)
+Please refer to the [Configuration guide](http://spark.apache.org/docs/latest/configuration.html)
 in the online documentation for an overview on how to configure Spark.
 
 
-## Apache Incubator Notice
-
-Apache Spark is an effort undergoing incubation at The Apache Software
-Foundation (ASF), sponsored by the Apache Incubator. Incubation is required of
-all newly accepted projects until a further review indicates that the
-infrastructure, communications, and decision making process have stabilized in
-a manner consistent with other successful ASF projects. While incubation status
-is not necessarily a reflection of the completeness or stability of the code,
-it does indicate that the project has yet to be fully endorsed by the ASF.
-
-
 ## Contributing to Spark
 
 Contributions via GitHub pull requests are gladly accepted from their original

assembly/pom.xml

Lines changed: 6 additions & 3 deletions

@@ -21,17 +21,20 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>0.9.0-incubating-SNAPSHOT</version>
+    <version>1.0.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-assembly_2.10</artifactId>
   <name>Spark Project Assembly</name>
-  <url>http://spark.incubator.apache.org/</url>
+  <url>http://spark.apache.org/</url>
+  <packaging>pom</packaging>
 
   <properties>
-    <spark.jar>${project.build.directory}/scala-${scala.binary.version}/${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</spark.jar>
+    <spark.jar.dir>scala-${scala.binary.version}</spark.jar.dir>
+    <spark.jar.basename>${project.artifactId}-${project.version}-hadoop${hadoop.version}.jar</spark.jar.basename>
+    <spark.jar>${project.build.directory}/${spark.jar.dir}/${spark.jar.basename}</spark.jar>
     <deb.pkg.name>spark</deb.pkg.name>
     <deb.install.path>/usr/share/spark</deb.install.path>
     <deb.user>root</deb.user>

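The property split above lets the assembly descriptor refer to the jar's directory and file name separately (see assembly.xml below); the combined spark.jar path itself is unchanged. A rough illustration of how the properties would expand, assuming Scala 2.10 and a hypothetical hadoop.version of 1.0.4 (the actual values depend on the build):

    # spark.jar.dir      -> scala-2.10
    # spark.jar.basename -> spark-assembly_2.10-1.0.0-SNAPSHOT-hadoop1.0.4.jar
    # spark.jar          -> <assembly build dir>/scala-2.10/spark-assembly_2.10-1.0.0-SNAPSHOT-hadoop1.0.4.jar
    ls assembly/target/scala-2.10/spark-assembly_2.10-1.0.0-SNAPSHOT-hadoop1.0.4.jar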
assembly/src/main/assembly/assembly.xml

Lines changed: 11 additions & 0 deletions

@@ -55,6 +55,15 @@
       <include>**/*</include>
     </includes>
   </fileSet>
+  <fileSet>
+    <directory>
+      ${project.parent.basedir}/assembly/target/${spark.jar.dir}
+    </directory>
+    <outputDirectory>/</outputDirectory>
+    <includes>
+      <include>${spark.jar.basename}</include>
+    </includes>
+  </fileSet>
 </fileSets>
 
 <dependencySets>

@@ -75,6 +84,8 @@
     <excludes>
       <exclude>org.apache.hadoop:*:jar</exclude>
       <exclude>org.apache.spark:*:jar</exclude>
+      <exclude>org.apache.zookeeper:*:jar</exclude>
+      <exclude>org.apache.avro:*:jar</exclude>
     </excludes>
   </dependencySet>
 </dependencySets>

bagel/pom.xml

Lines changed: 16 additions & 2 deletions

@@ -21,15 +21,29 @@
   <parent>
     <groupId>org.apache.spark</groupId>
     <artifactId>spark-parent</artifactId>
-    <version>0.9.0-incubating-SNAPSHOT</version>
+    <version>1.0.0-SNAPSHOT</version>
     <relativePath>../pom.xml</relativePath>
   </parent>
 
   <groupId>org.apache.spark</groupId>
   <artifactId>spark-bagel_2.10</artifactId>
   <packaging>jar</packaging>
   <name>Spark Project Bagel</name>
-  <url>http://spark.incubator.apache.org/</url>
+  <url>http://spark.apache.org/</url>
+
+  <profiles>
+    <profile>
+      <!-- SPARK-1121: Adds an explicit dependency on Avro to work around
+        a Hadoop 0.23.X issue -->
+      <id>yarn-alpha</id>
+      <dependencies>
+        <dependency>
+          <groupId>org.apache.avro</groupId>
+          <artifactId>avro</artifactId>
+        </dependency>
+      </dependencies>
+    </profile>
+  </profiles>
 
   <dependencies>
     <dependency>

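The yarn-alpha profile above only takes effect when it is activated explicitly at build time. A minimal sketch of such an invocation, assuming a Hadoop 0.23.x version string (the profile id and the hadoop.version property come from the Spark build; the exact version and the module-selection flags are illustrative):

    # build the Bagel module (and its upstream modules) against Hadoop 0.23.x,
    # which pulls in the explicit Avro dependency added above
    mvn -Pyarn-alpha -Dhadoop.version=0.23.9 -pl bagel -am -DskipTests package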
bagel/src/main/scala/org/apache/spark/bagel/Bagel.scala

Lines changed: 35 additions & 25 deletions

@@ -27,22 +27,24 @@ object Bagel extends Logging {
 
   /**
    * Runs a Bagel program.
-   * @param sc [[org.apache.spark.SparkContext]] to use for the program.
-   * @param vertices vertices of the graph represented as an RDD of (Key, Vertex) pairs. Often the Key will be
-   * the vertex id.
-   * @param messages initial set of messages represented as an RDD of (Key, Message) pairs. Often this will be an
-   * empty array, i.e. sc.parallelize(Array[K, Message]()).
-   * @param combiner [[org.apache.spark.bagel.Combiner]] combines multiple individual messages to a given vertex into one
-   * message before sending (which often involves network I/O).
-   * @param aggregator [[org.apache.spark.bagel.Aggregator]] performs a reduce across all vertices after each superstep,
-   * and provides the result to each vertex in the next superstep.
-   * @param partitioner [[org.apache.spark.Partitioner]] partitions values by key
+   * @param sc org.apache.spark.SparkContext to use for the program.
+   * @param vertices vertices of the graph represented as an RDD of (Key, Vertex) pairs. Often the
+   *                 Key will be the vertex id.
+   * @param messages initial set of messages represented as an RDD of (Key, Message) pairs. Often
+   *                 this will be an empty array, i.e. sc.parallelize(Array[K, Message]()).
+   * @param combiner [[org.apache.spark.bagel.Combiner]] combines multiple individual messages to a
+   *                 given vertex into one message before sending (which often involves network
+   *                 I/O).
+   * @param aggregator [[org.apache.spark.bagel.Aggregator]] performs a reduce across all vertices
+   *                   after each superstep and provides the result to each vertex in the next
+   *                   superstep.
+   * @param partitioner org.apache.spark.Partitioner partitions values by key
    * @param numPartitions number of partitions across which to split the graph.
    *                      Default is the default parallelism of the SparkContext
-   * @param storageLevel [[org.apache.spark.storage.StorageLevel]] to use for caching of intermediate RDDs in each superstep.
-   * Defaults to caching in memory.
-   * @param compute function that takes a Vertex, optional set of (possibly combined) messages to the Vertex,
-   * optional Aggregator and the current superstep,
+   * @param storageLevel org.apache.spark.storage.StorageLevel to use for caching of
+   *                     intermediate RDDs in each superstep. Defaults to caching in memory.
+   * @param compute function that takes a Vertex, optional set of (possibly combined) messages to
+   *                the Vertex, optional Aggregator and the current superstep,
    * and returns a set of (Vertex, outgoing Messages) pairs
    * @tparam K key
    * @tparam V vertex type

@@ -71,7 +73,7 @@ object Bagel extends Logging {
     var msgs = messages
     var noActivity = false
     do {
-      logInfo("Starting superstep "+superstep+".")
+      logInfo("Starting superstep " + superstep + ".")
       val startTime = System.currentTimeMillis
 
       val aggregated = agg(verts, aggregator)

@@ -97,7 +99,8 @@ object Bagel extends Logging {
     verts
   }
 
-  /** Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]] and the default storage level */
+  /** Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]] and the default
+    * storage level */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest, C: Manifest](
     sc: SparkContext,
     vertices: RDD[(K, V)],

@@ -106,8 +109,8 @@ object Bagel extends Logging {
     partitioner: Partitioner,
     numPartitions: Int
   )(
-    compute: (V, Option[C], Int) => (V, Array[M])
-  ): RDD[(K, V)] = run(sc, vertices, messages, combiner, numPartitions, DEFAULT_STORAGE_LEVEL)(compute)
+    compute: (V, Option[C], Int) => (V, Array[M])): RDD[(K, V)] = run(sc, vertices, messages,
+    combiner, numPartitions, DEFAULT_STORAGE_LEVEL)(compute)
 
   /** Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]] */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest, C: Manifest](

@@ -127,8 +130,8 @@ object Bagel extends Logging {
   }
 
   /**
-   * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]], default [[org.apache.spark.HashPartitioner]]
-   * and default storage level
+   * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]], default
+   * org.apache.spark.HashPartitioner and default storage level
    */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest, C: Manifest](
     sc: SparkContext,

@@ -138,9 +141,13 @@ object Bagel extends Logging {
     numPartitions: Int
   )(
     compute: (V, Option[C], Int) => (V, Array[M])
-  ): RDD[(K, V)] = run(sc, vertices, messages, combiner, numPartitions, DEFAULT_STORAGE_LEVEL)(compute)
+  ): RDD[(K, V)] = run(sc, vertices, messages, combiner, numPartitions,
+    DEFAULT_STORAGE_LEVEL)(compute)
 
-  /** Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]] and the default [[org.apache.spark.HashPartitioner]]*/
+  /**
+   * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]] and the
+   * default org.apache.spark.HashPartitioner
+   */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest](
     sc: SparkContext,
     vertices: RDD[(K, V)],

@@ -158,7 +165,8 @@ object Bagel extends Logging {
   }
 
   /**
-   * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]], default [[org.apache.spark.HashPartitioner]],
+   * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]],
+   * default org.apache.spark.HashPartitioner,
    * [[org.apache.spark.bagel.DefaultCombiner]] and the default storage level
    */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest](

@@ -171,7 +179,8 @@ object Bagel extends Logging {
   ): RDD[(K, V)] = run(sc, vertices, messages, numPartitions, DEFAULT_STORAGE_LEVEL)(compute)
 
   /**
-   * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]], the default [[org.apache.spark.HashPartitioner]]
+   * Runs a Bagel program with no [[org.apache.spark.bagel.Aggregator]],
+   * the default org.apache.spark.HashPartitioner
    * and [[org.apache.spark.bagel.DefaultCombiner]]
    */
   def run[K: Manifest, V <: Vertex : Manifest, M <: Message[K] : Manifest](

@@ -227,8 +236,9 @@ object Bagel extends Logging {
       })
 
       numMsgs += newMsgs.size
-      if (newVert.active)
+      if (newVert.active) {
         numActiveVerts += 1
+      }
 
       Some((newVert, newMsgs))
     }.persist(storageLevel)

bin/spark-class

Lines changed: 28 additions & 20 deletions

@@ -40,34 +40,46 @@ if [ -z "$1" ]; then
   exit 1
 fi
 
-# If this is a standalone cluster daemon, reset SPARK_JAVA_OPTS and SPARK_MEM to reasonable
-# values for that; it doesn't need a lot
-if [ "$1" = "org.apache.spark.deploy.master.Master" -o "$1" = "org.apache.spark.deploy.worker.Worker" ]; then
-  SPARK_MEM=${SPARK_DAEMON_MEMORY:-512m}
-  SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.akka.logLifecycleEvents=true"
-  # Do not overwrite SPARK_JAVA_OPTS environment variable in this script
-  OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS"   # Empty by default
-else
-  OUR_JAVA_OPTS="$SPARK_JAVA_OPTS"
+if [ -n "$SPARK_MEM" ]; then
+  echo "Warning: SPARK_MEM is deprecated, please use a more specific config option"
+  echo "(e.g., spark.executor.memory or SPARK_DRIVER_MEMORY)."
 fi
 
+# Use SPARK_MEM or 512m as the default memory, to be overridden by specific options
+DEFAULT_MEM=${SPARK_MEM:-512m}
+
+SPARK_DAEMON_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS -Dspark.akka.logLifecycleEvents=true"
 
-# Add java opts for master, worker, executor. The opts maybe null
+# Add java opts and memory settings for master, worker, executors, and repl.
 case "$1" in
+  # Master and Worker use SPARK_DAEMON_JAVA_OPTS (and specific opts) + SPARK_DAEMON_MEMORY.
  'org.apache.spark.deploy.master.Master')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_MASTER_OPTS"
+    OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_MASTER_OPTS"
+    OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
    ;;
  'org.apache.spark.deploy.worker.Worker')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_WORKER_OPTS"
+    OUR_JAVA_OPTS="$SPARK_DAEMON_JAVA_OPTS $SPARK_WORKER_OPTS"
+    OUR_JAVA_MEM=${SPARK_DAEMON_MEMORY:-$DEFAULT_MEM}
    ;;
+
+  # Executors use SPARK_JAVA_OPTS + SPARK_EXECUTOR_MEMORY.
  'org.apache.spark.executor.CoarseGrainedExecutorBackend')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$DEFAULT_MEM}
    ;;
  'org.apache.spark.executor.MesosExecutorBackend')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS $SPARK_EXECUTOR_OPTS"
+    OUR_JAVA_MEM=${SPARK_EXECUTOR_MEMORY:-$DEFAULT_MEM}
    ;;
+
+  # All drivers use SPARK_JAVA_OPTS + SPARK_DRIVER_MEMORY. The repl also uses SPARK_REPL_OPTS.
  'org.apache.spark.repl.Main')
-    OUR_JAVA_OPTS="$OUR_JAVA_OPTS $SPARK_REPL_OPTS"
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS $SPARK_REPL_OPTS"
+    OUR_JAVA_MEM=${SPARK_DRIVER_MEMORY:-$DEFAULT_MEM}
+    ;;
+  *)
+    OUR_JAVA_OPTS="$SPARK_JAVA_OPTS"
+    OUR_JAVA_MEM=${SPARK_DRIVER_MEMORY:-$DEFAULT_MEM}
    ;;
 esac
 

@@ -83,14 +95,10 @@ else
   fi
 fi
 
-# Set SPARK_MEM if it isn't already set since we also use it for this process
-SPARK_MEM=${SPARK_MEM:-512m}
-export SPARK_MEM
-
 # Set JAVA_OPTS to be able to load native libraries and to set heap size
 JAVA_OPTS="$OUR_JAVA_OPTS"
 JAVA_OPTS="$JAVA_OPTS -Djava.library.path=$SPARK_LIBRARY_PATH"
-JAVA_OPTS="$JAVA_OPTS -Xms$SPARK_MEM -Xmx$SPARK_MEM"
+JAVA_OPTS="$JAVA_OPTS -Xms$OUR_JAVA_MEM -Xmx$OUR_JAVA_MEM"
 # Load extra JAVA_OPTS from conf/java-opts, if it exists
 if [ -e "$FWDIR/conf/java-opts" ] ; then
   JAVA_OPTS="$JAVA_OPTS `cat $FWDIR/conf/java-opts`"

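The net effect of the spark-class change is that heap size is now chosen per role rather than from a single SPARK_MEM value: daemons read SPARK_DAEMON_MEMORY, executors read SPARK_EXECUTOR_MEMORY, and drivers (including the repl) read SPARK_DRIVER_MEMORY, all falling back to SPARK_MEM and then 512m. A small illustration of the new behavior; the variable and class names come from the script above, while the concrete values and invocations are made up:

    # a standalone master picks up the daemon memory setting: JVM started with -Xms1g -Xmx1g
    SPARK_DAEMON_MEMORY=1g bin/spark-class org.apache.spark.deploy.master.Master

    # an executor backend would use SPARK_EXECUTOR_MEMORY instead
    export SPARK_EXECUTOR_MEMORY=2g

    # the repl (a driver) uses SPARK_DRIVER_MEMORY; setting only SPARK_MEM still works
    # as a fallback, but now prints the deprecation warning
    SPARK_DRIVER_MEMORY=512m bin/spark-class org.apache.spark.repl.Main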