[SPARK-21708][BUILD] Migrate build to sbt 1.x #29286

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed · wants to merge 15 commits · Changes from all commits

17 changes: 17 additions & 0 deletions .sbtopts
@@ -0,0 +1,17 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

-J-Xmx4G
-J-Xss4m

Member:
@gemelen sorry for the very late comment, but do you remember why we added this? The default memory is set in build/sbt-launch-lib.bash (e.g., see 35bab33). Were you using plain sbt instead of build/sbt locally?

This file effectively overrides the -mem option of the build/sbt script:

./build/sbt -mem 6144
.../jdk-11.0.3.jdk/Contents/Home as default JAVA_HOME.
Note, this will be overridden by -java-home if it is set.
Error occurred during initialization of VM
Initial heap size set to a larger value than the maximum heap size

because the options from this file are appended last:

/.../bin/java -Xms6144m -Xmx6144m -XX:ReservedCodeCacheSize=256m -Xmx4G -Xss4m -jar build/sbt-launch-1.5.0.jar 1

and Java respects the rightmost memory configuration.
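
A quick way to see the rightmost-wins behavior (a minimal sketch; the flag values are illustrative and not taken from the build):

# The later -Xmx wins: the effective max heap is 4G, not 6G
java -Xmx6g -Xmx4g -XX:+PrintFlagsFinal -version | grep -w MaxHeapSize

# An -Xms larger than the effective -Xmx reproduces the startup failure shown above
java -Xms6g -Xmx6g -Xmx4g -version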

Member:
Ohh, okay, it sets the stack size here. I misread it. Okay, but I will still move this to the script.

Contributor Author:
@HyukjinKwon AFAIR the stack size increase was introduced to overcome failures in some tasks (tests, probably). Yep, it could definitely be set wherever it is more suitable.

Contributor Author:
Actually, the commit message says it was for the GitHub Actions environment.

Member:
Thanks for the explanation!

2 changes: 1 addition & 1 deletion build/sbt-launch-lib.bash
@@ -39,7 +39,7 @@ dlog () {

acquire_sbt_jar () {
SBT_VERSION=`awk -F "=" '/sbt\.version/ {print $2}' ./project/build.properties`
URL1=https://dl.bintray.com/typesafe/ivy-releases/org.scala-sbt/sbt-launch/${SBT_VERSION}/sbt-launch.jar
URL1=https://repo1.maven.org/maven2/org/scala-sbt/sbt-launch/${SBT_VERSION}/sbt-launch-${SBT_VERSION}.jar
JAR=build/sbt-launch-${SBT_VERSION}.jar

sbt_jar=$JAR
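For reference, a quick sanity check (a hedged sketch, not part of this change) that the Maven Central URL resolves for the sbt version pinned in project/build.properties:

SBT_VERSION=$(awk -F "=" '/sbt\.version/ {print $2}' ./project/build.properties)
curl -fsSI "https://repo1.maven.org/maven2/org/scala-sbt/sbt-launch/${SBT_VERSION}/sbt-launch-${SBT_VERSION}.jar" | head -n 1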
17 changes: 9 additions & 8 deletions project/MimaBuild.scala
@@ -22,9 +22,7 @@ import com.typesafe.tools.mima.core._
import com.typesafe.tools.mima.core.MissingClassProblem
import com.typesafe.tools.mima.core.MissingTypesProblem
import com.typesafe.tools.mima.core.ProblemFilters._
import com.typesafe.tools.mima.plugin.MimaKeys.{mimaBinaryIssueFilters, mimaPreviousArtifacts}
import com.typesafe.tools.mima.plugin.MimaPlugin.mimaDefaultSettings

import com.typesafe.tools.mima.plugin.MimaKeys.{mimaBinaryIssueFilters, mimaPreviousArtifacts, mimaFailOnNoPrevious}

object MimaBuild {

@@ -86,14 +84,17 @@ object MimaBuild {
ignoredMembers.flatMap(excludeMember) ++ MimaExcludes.excludes(currentSparkVersion)
}

def mimaSettings(sparkHome: File, projectRef: ProjectRef) = {
def mimaSettings(sparkHome: File, projectRef: ProjectRef): Seq[Setting[_]] = {
val organization = "org.apache.spark"
val previousSparkVersion = "2.4.0"
val previousSparkVersion = "3.0.0"
val project = projectRef.project
val fullId = "spark-" + project + "_2.12"
mimaDefaultSettings ++
Seq(mimaPreviousArtifacts := Set(organization % fullId % previousSparkVersion),
mimaBinaryIssueFilters ++= ignoredABIProblems(sparkHome, version.value))

Seq(
mimaFailOnNoPrevious := true,
mimaPreviousArtifacts := Set(organization % fullId % previousSparkVersion),
mimaBinaryIssueFilters ++= ignoredABIProblems(sparkHome, version.value)
)
}

}
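
Since mimaFailOnNoPrevious is disabled globally in SparkBuild.scala (below) and re-enabled here for modules that have previous artifacts, the compatibility check can be exercised per module with the plugin's standard task — a hedged example; Spark normally drives this through the dev/mima script:

./build/sbt "core/mimaReportBinaryIssues"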
30 changes: 30 additions & 0 deletions project/MimaExcludes.scala
@@ -36,6 +36,36 @@ object MimaExcludes {

// Exclude rules for 3.1.x
lazy val v31excludes = v30excludes ++ Seq(
// mima plugin update caused new incompatibilities to be detected
// core module
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.shuffle.sort.io.LocalDiskShuffleMapOutputWriter.commitAllPartitions"),
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.shuffle.api.ShuffleMapOutputWriter.commitAllPartitions"),
ProblemFilters.exclude[ReversedMissingMethodProblem]("org.apache.spark.shuffle.api.ShuffleMapOutputWriter.commitAllPartitions"),
// mllib module
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionTrainingSummary.totalIterations"),
ProblemFilters.exclude[DirectMissingMethodProblem]("org.apache.spark.ml.classification.LogisticRegressionTrainingSummary.$init$"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.labels"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.truePositiveRateByLabel"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.falsePositiveRateByLabel"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.precisionByLabel"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.recallByLabel"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.fMeasureByLabel"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.fMeasureByLabel"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.accuracy"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.weightedTruePositiveRate"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.weightedFalsePositiveRate"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.weightedRecall"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.weightedPrecision"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.weightedFMeasure"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.LogisticRegressionSummary.weightedFMeasure"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.BinaryLogisticRegressionSummary.roc"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.BinaryLogisticRegressionSummary.areaUnderROC"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.BinaryLogisticRegressionSummary.pr"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.BinaryLogisticRegressionSummary.fMeasureByThreshold"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.BinaryLogisticRegressionSummary.precisionByThreshold"),
ProblemFilters.exclude[NewMixinForwarderProblem]("org.apache.spark.ml.classification.BinaryLogisticRegressionSummary.recallByThreshold"),
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.classification.FMClassifier.trainImpl"),
ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.ml.regression.FMRegressor.trainImpl"),
// [SPARK-31077] Remove ChiSqSelector dependency on mllib.ChiSqSelectorModel
// private constructor
ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.ml.feature.ChiSqSelectorModel.this"),
96 changes: 61 additions & 35 deletions project/SparkBuild.scala
@@ -28,13 +28,13 @@ import scala.collection.mutable.Stack
import sbt._
import sbt.Classpaths.publishTask
import sbt.Keys._
import sbtunidoc.Plugin.UnidocKeys.unidocGenjavadocVersion
import com.etsy.sbt.checkstyle.CheckstylePlugin.autoImport._
import com.simplytyped.Antlr4Plugin._
import com.typesafe.sbt.pom.{PomBuild, SbtPomKeys}
import com.typesafe.tools.mima.plugin.MimaKeys
import org.scalastyle.sbt.ScalastylePlugin.autoImport._
import org.scalastyle.sbt.Tasks
import sbtassembly.AssemblyPlugin.autoImport._

import spray.revolver.RevolverPlugin._

@@ -83,6 +83,8 @@ object BuildCommons {
object SparkBuild extends PomBuild {

import BuildCommons._
import sbtunidoc.GenJavadocPlugin
import sbtunidoc.GenJavadocPlugin.autoImport._
import scala.collection.mutable.Map

val projectsMap: Map[String, Seq[Setting[_]]] = Map.empty
@@ -106,13 +108,10 @@ object SparkBuild extends PomBuild {
override val userPropertiesMap = System.getProperties.asScala.toMap

lazy val MavenCompile = config("m2r") extend(Compile)
lazy val publishLocalBoth = TaskKey[Unit]("publish-local", "publish local for m2 and ivy")
lazy val publishLocalBoth = TaskKey[Unit]("localPublish", "publish local for m2 and ivy", KeyRanks.ATask)

lazy val sparkGenjavadocSettings: Seq[sbt.Def.Setting[_]] = Seq(
libraryDependencies += compilerPlugin(
"com.typesafe.genjavadoc" %% "genjavadoc-plugin" % unidocGenjavadocVersion.value cross CrossVersion.full),
lazy val sparkGenjavadocSettings: Seq[sbt.Def.Setting[_]] = GenJavadocPlugin.projectSettings ++ Seq(
scalacOptions ++= Seq(
"-P:genjavadoc:out=" + (target.value / "java"),
"-P:genjavadoc:strictVisibility=true" // hide package private types
)
)
@@ -157,7 +156,7 @@ object SparkBuild extends PomBuild {
val scalaSourceV = Seq(file(scalaSource.in(config).value.getAbsolutePath))
val configV = (baseDirectory in ThisBuild).value / scalaStyleOnCompileConfig
val configUrlV = scalastyleConfigUrl.in(config).value
val streamsV = streams.in(config).value
val streamsV = (streams.in(config).value: @sbtUnchecked)
val failOnErrorV = true
val failOnWarningV = false
val scalastyleTargetV = scalastyleTarget.in(config).value
@@ -204,7 +203,6 @@ object SparkBuild extends PomBuild {
javaHome := sys.env.get("JAVA_HOME")
.orElse(sys.props.get("java.home").map { p => new File(p).getParentFile().getAbsolutePath() })
.map(file),
incOptions := incOptions.value.withNameHashing(true),
publishMavenStyle := true,
unidocGenjavadocVersion := "0.16",

@@ -219,10 +217,12 @@
),
externalResolvers := resolvers.value,
otherResolvers := SbtPomKeys.mvnLocalRepository(dotM2 => Seq(Resolver.file("dotM2", dotM2))).value,
publishLocalConfiguration in MavenCompile :=
new PublishConfiguration(None, "dotM2", packagedArtifacts.value, Seq(), ivyLoggingLevel.value),
publishLocalConfiguration in MavenCompile := PublishConfiguration()
.withResolverName("dotM2")
.withArtifacts(packagedArtifacts.value.toVector)
.withLogging(ivyLoggingLevel.value),
publishMavenStyle in MavenCompile := true,
publishLocal in MavenCompile := publishTask(publishLocalConfiguration in MavenCompile, deliverLocal).value,
publishLocal in MavenCompile := publishTask(publishLocalConfiguration in MavenCompile).value,
publishLocalBoth := Seq(publishLocal in MavenCompile, publishLocal).dependOn.value,

javacOptions in (Compile, doc) ++= {
@@ -251,6 +251,8 @@
"-sourcepath", (baseDirectory in ThisBuild).value.getAbsolutePath // Required for relative source links in scaladoc
),

SbtPomKeys.profiles := profiles,

// Remove certain packages from Scaladoc
scalacOptions in (Compile, doc) := Seq(
"-groups",
@@ -273,14 +275,15 @@
val out = streams.value

def logProblem(l: (=> String) => Unit, f: File, p: xsbti.Problem) = {
l(f.toString + ":" + p.position.line.fold("")(_ + ":") + " " + p.message)
val jmap = new java.util.function.Function[Integer, String]() {override def apply(i: Integer): String = {i.toString}}
l(f.toString + ":" + p.position.line.map[String](jmap.apply).map(_ + ":").orElse("") + " " + p.message)
l(p.position.lineContent)
l("")
}

var failed = 0
analysis.infos.allInfos.foreach { case (k, i) =>
i.reportedProblems foreach { p =>
analysis.asInstanceOf[sbt.internal.inc.Analysis].infos.allInfos.foreach { case (k, i) =>
i.getReportedProblems foreach { p =>
val deprecation = p.message.contains("deprecated")

if (!deprecation) {
Expand All @@ -302,7 +305,10 @@ object SparkBuild extends PomBuild {
sys.error(s"$failed fatal warnings")
}
analysis
}
},
// disable Mima check for all modules,
// to be enabled in specific ones that have previous artifacts
MimaKeys.mimaFailOnNoPrevious := false
)

def enable(settings: Seq[Setting[_]])(projectRef: ProjectRef) = {
@@ -411,7 +417,7 @@
}
))(assembly)

enable(Seq(sparkShell := sparkShell in LocalProject("assembly")))(spark)
enable(Seq(sparkShell := (sparkShell in LocalProject("assembly")).value))(spark)

// TODO: move this to its upstream project.
override def projectDefinitions(baseDirectory: File): Seq[Project] = {
@@ -485,20 +491,20 @@ object SparkParallelTestGrouping {
testGrouping in Test := {
val tests: Seq[TestDefinition] = (definedTests in Test).value
val defaultForkOptions = ForkOptions(
bootJars = Nil,
javaHome = javaHome.value,
connectInput = connectInput.value,
outputStrategy = outputStrategy.value,
runJVMOptions = (javaOptions in Test).value,
bootJars = Vector.empty[java.io.File],
workingDirectory = Some(baseDirectory.value),
runJVMOptions = (javaOptions in Test).value.toVector,
connectInput = connectInput.value,
envVars = (envVars in Test).value
)
tests.groupBy(test => testNameToTestGroup(test.name)).map { case (groupName, groupTests) =>
val forkOptions = {
if (groupName == DEFAULT_TEST_GROUP) {
defaultForkOptions
} else {
defaultForkOptions.copy(runJVMOptions = defaultForkOptions.runJVMOptions ++
defaultForkOptions.withRunJVMOptions(defaultForkOptions.runJVMOptions ++
Seq(s"-Djava.io.tmpdir=${baseDirectory.value}/target/tmp/$groupName"))
}
}
@@ -512,6 +518,7 @@
}

object Core {
import scala.sys.process.Process
lazy val settings = Seq(
resourceGenerators in Compile += Def.task {
val buildScript = baseDirectory.value + "/../build/spark-build-info"
@@ -557,6 +564,7 @@ object DockerIntegrationTests {
*/
object KubernetesIntegrationTests {
import BuildCommons._
import scala.sys.process.Process

val dockerBuild = TaskKey[Unit]("docker-imgs", "Build the docker images for ITs.")
val runITs = TaskKey[Unit]("run-its", "Only run ITs, skip image build.")
@@ -634,7 +642,9 @@ object ExcludedDependencies {
*/
object OldDeps {

lazy val project = Project("oldDeps", file("dev"), settings = oldDepsSettings)
lazy val project = Project("oldDeps", file("dev"))
.settings(oldDepsSettings)
.disablePlugins(com.typesafe.sbt.pom.PomReaderPlugin)

lazy val allPreviousArtifactKeys = Def.settingDyn[Seq[Set[ModuleID]]] {
SparkBuild.mimaProjects
@@ -650,7 +660,10 @@
}

object Catalyst {
lazy val settings = antlr4Settings ++ Seq(
import com.simplytyped.Antlr4Plugin
import com.simplytyped.Antlr4Plugin.autoImport._

lazy val settings = Antlr4Plugin.projectSettings ++ Seq(
antlr4Version in Antlr4 := SbtPomKeys.effectivePom.value.getProperties.get("antlr4.version").asInstanceOf[String],
antlr4PackageName in Antlr4 := Some("org.apache.spark.sql.catalyst.parser"),
antlr4GenListener in Antlr4 := true,
@@ -660,6 +673,9 @@
}

object SQL {

import sbtavro.SbtAvro.autoImport._

lazy val settings = Seq(
initialCommands in console :=
"""
@@ -681,8 +697,10 @@
|import sqlContext.implicits._
|import sqlContext._
""".stripMargin,
cleanupCommands in console := "sc.stop()"
cleanupCommands in console := "sc.stop()",
Test / avroGenerate := (Compile / avroGenerate).value
)

}

object Hive {
@@ -721,27 +739,27 @@

object Assembly {
import sbtassembly.AssemblyUtils._
import sbtassembly.Plugin._
import AssemblyKeys._
import sbtassembly.AssemblyPlugin.autoImport._

val hadoopVersion = taskKey[String]("The version of hadoop that spark is compiled against.")

lazy val settings = assemblySettings ++ Seq(
lazy val settings = baseAssemblySettings ++ Seq(
test in assembly := {},
hadoopVersion := {
sys.props.get("hadoop.version")
.getOrElse(SbtPomKeys.effectivePom.value.getProperties.get("hadoop.version").asInstanceOf[String])
},
jarName in assembly := {
assemblyJarName in assembly := {
lazy val hdpVersion = hadoopVersion.value
if (moduleName.value.contains("streaming-kafka-0-10-assembly")
|| moduleName.value.contains("streaming-kinesis-asl-assembly")) {
s"${moduleName.value}-${version.value}.jar"
} else {
s"${moduleName.value}-${version.value}-hadoop${hadoopVersion.value}.jar"
s"${moduleName.value}-${version.value}-hadoop${hdpVersion}.jar"
}
},
jarName in (Test, assembly) := s"${moduleName.value}-test-${version.value}.jar",
mergeStrategy in assembly := {
assemblyJarName in (Test, assembly) := s"${moduleName.value}-test-${version.value}.jar",
assemblyMergeStrategy in assembly := {
case m if m.toLowerCase(Locale.ROOT).endsWith("manifest.mf")
=> MergeStrategy.discard
case m if m.toLowerCase(Locale.ROOT).matches("meta-inf.*\\.sf$")
@@ -756,8 +774,7 @@
}

object PySparkAssembly {
import sbtassembly.Plugin._
import AssemblyKeys._
import sbtassembly.AssemblyPlugin.autoImport._
import java.util.zip.{ZipOutputStream, ZipEntry}

lazy val settings = Seq(
@@ -807,8 +824,13 @@
object Unidoc {

import BuildCommons._
import sbtunidoc.Plugin._
import UnidocKeys._
import sbtunidoc.BaseUnidocPlugin
import sbtunidoc.JavaUnidocPlugin
import sbtunidoc.ScalaUnidocPlugin
import sbtunidoc.BaseUnidocPlugin.autoImport._
import sbtunidoc.GenJavadocPlugin.autoImport._
import sbtunidoc.JavaUnidocPlugin.autoImport._
import sbtunidoc.ScalaUnidocPlugin.autoImport._

private def ignoreUndocumentedPackages(packages: Seq[Seq[File]]): Seq[Seq[File]] = {
packages
@@ -838,6 +860,7 @@
.map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/catalog/v2/utils")))
.map(_.filterNot(_.getCanonicalPath.contains("org/apache/hive")))
.map(_.filterNot(_.getCanonicalPath.contains("org/apache/spark/sql/v2/avro")))
.map(_.filterNot(_.getCanonicalPath.contains("SSLOptions")))
}

private def ignoreClasspaths(classpaths: Seq[Classpath]): Seq[Classpath] = {
@@ -848,7 +871,10 @@

val unidocSourceBase = settingKey[String]("Base URL of source links in Scaladoc.")

lazy val settings = scalaJavaUnidocSettings ++ Seq (
lazy val settings = BaseUnidocPlugin.projectSettings ++
ScalaUnidocPlugin.projectSettings ++
JavaUnidocPlugin.projectSettings ++
Seq (
publish := {},

unidocProjectFilter in(ScalaUnidoc, unidoc) :=
2 changes: 1 addition & 1 deletion project/build.properties
@@ -14,4 +14,4 @@
# See the License for the specific language governing permissions and
# limitations under the License.
#
sbt.version=0.13.18
sbt.version=1.3.13
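
After the bump, the launcher downloaded by the wrapper should report the new version — a minimal check, assuming the build/sbt wrapper is used:

./build/sbt sbtVersion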