
Commit 36f199d

steveloughran authored and dongjoon-hyun committed
[SPARK-46793][CORE] Revert S3A endpoint fixup logic of SPARK-35878
### What changes were proposed in this pull request?

Revert [SPARK-35878][CORE] "Add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null".

Removing the region/endpoint patching code of SPARK-35878 avoids authentication problems with versions of the S3A connector built with the AWS v2 SDK, as is the case in Hadoop 3.4.0. That is: if fs.s3a.endpoint is unset, it stays unset. The v2 SDK does its binding to AWS services differently, in what can be described as "region first" binding. Spark setting the endpoint blocks S3 Express support and is incompatible with HADOOP-18975 "S3A: Add option fs.s3a.endpoint.fips to use AWS FIPS endpoints" (apache/hadoop#6277).

The change is compatible with all releases of the S3A connector other than Hadoop 3.3.1 binaries deployed outside EC2 and without the endpoint explicitly set.

### Why are the changes needed?

The AWS v2 SDK has a different, more complex binding mechanism; it does not need the endpoint to be set if the region (fs.s3a.endpoint.region) value is set. This means the Spark code to fix up an endpoint is not only unneeded, it causes problems when trying to use specific storage options (S3 Express) or security options (FIPS).

### Does this PR introduce _any_ user-facing change?

Only visible on the Hadoop 3.3.1 S3A connector when deployed outside of EC2, the situation the original patch was added to work around. All other 3.3.x releases are good.

### How was this patch tested?

Removed some obsolete tests. Relying on GitHub and Jenkins to do the testing, so marking this PR as WIP until they are happy.

### Was this patch authored or co-authored using generative AI tooling?

No

Closes #44834 from steveloughran/SPARK-46793-revert-region-fixup-SPARK-35878.

Authored-by: Steve Loughran <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
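For anyone hit by the user-facing change above (the Hadoop 3.3.1 S3A connector deployed outside EC2), the endpoint or region can be set explicitly instead. A minimal spark-shell sketch, not part of this commit; `s3.amazonaws.com` is the value the removed code used to inject, and `us-east-1` is only an example region:

```scala
import org.apache.spark.SparkConf

// With the fixup reverted, Spark no longer defaults fs.s3a.endpoint.
// The "spark.hadoop." prefix copies these keys into the Hadoop configuration.
val conf = new SparkConf()
  // restore the old default endpoint explicitly...
  .set("spark.hadoop.fs.s3a.endpoint", "s3.amazonaws.com")
  // ...or let the v2 SDK's "region first" binding resolve it from the region
  .set("spark.hadoop.fs.s3a.endpoint.region", "us-east-1")
```

The same keys work in spark-defaults.conf; per the description above, either one on its own should be enough, and they are shown together only to illustrate both routes.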
1 parent: 0b907ed · commit: 36f199d

2 files changed: +0 −43 lines

core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala

Lines changed: 0 additions & 10 deletions
```diff
@@ -529,16 +529,6 @@ private[spark] object SparkHadoopUtil extends Logging {
     if (conf.getOption("spark.hadoop.fs.s3a.downgrade.syncable.exceptions").isEmpty) {
       hadoopConf.set("fs.s3a.downgrade.syncable.exceptions", "true", setBySpark)
     }
-    // In Hadoop 3.3.1, AWS region handling with the default "" endpoint only works
-    // in EC2 deployments or when the AWS CLI is installed.
-    // The workaround is to set the name of the S3 endpoint explicitly,
-    // if not already set. See HADOOP-17771.
-    if (hadoopConf.get("fs.s3a.endpoint", "").isEmpty &&
-      hadoopConf.get("fs.s3a.endpoint.region") == null) {
-      // set to US central endpoint which can also connect to buckets
-      // in other regions at the expense of a HEAD request during fs creation
-      hadoopConf.set("fs.s3a.endpoint", "s3.amazonaws.com", setBySpark)
-    }
   }

   private def appendSparkHiveConfigs(conf: SparkConf, hadoopConf: Configuration): Unit = {
```
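For readers skimming the diff, here is the defaulting rule that no longer runs, restated as a standalone, runnable sketch. The object and method names are hypothetical, and it assumes only `hadoop-common` on the classpath:

```scala
import org.apache.hadoop.conf.Configuration

object EndpointFixupSketch {
  // Mirror of the removed condition: Spark injected the US central endpoint
  // only when fs.s3a.endpoint was unset/empty AND fs.s3a.endpoint.region was absent.
  def wouldHaveFixedUp(hadoopConf: Configuration): Boolean =
    hadoopConf.get("fs.s3a.endpoint", "").isEmpty &&
      hadoopConf.get("fs.s3a.endpoint.region") == null

  def main(args: Array[String]): Unit = {
    val conf = new Configuration(false)
    println(wouldHaveFixedUp(conf))  // true: the old code would have set s3.amazonaws.com
    conf.set("fs.s3a.endpoint.region", "")
    println(wouldHaveFixedUp(conf))  // false: a region value, even blank, suppressed the fixup
  }
}
```

Note the asymmetry the removed tests below relied on: an empty endpoint counted as unset, while an empty region counted as set.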

core/src/test/scala/org/apache/spark/deploy/SparkHadoopUtilSuite.scala

Lines changed: 0 additions & 33 deletions
```diff
@@ -39,19 +39,6 @@ class SparkHadoopUtilSuite extends SparkFunSuite {
     assertConfigMatches(hadoopConf, "orc.filterPushdown", "true", SOURCE_SPARK_HADOOP)
     assertConfigMatches(hadoopConf, "fs.s3a.downgrade.syncable.exceptions", "true",
       SET_TO_DEFAULT_VALUES)
-    assertConfigMatches(hadoopConf, "fs.s3a.endpoint", "s3.amazonaws.com", SET_TO_DEFAULT_VALUES)
-  }
-
-  /**
-   * An empty S3A endpoint will be overridden just as a null value
-   * would.
-   */
-  test("appendSparkHadoopConfigs with S3A endpoint set to empty string") {
-    val sc = new SparkConf()
-    val hadoopConf = new Configuration(false)
-    sc.set("spark.hadoop.fs.s3a.endpoint", "")
-    new SparkHadoopUtil().appendSparkHadoopConfigs(sc, hadoopConf)
-    assertConfigMatches(hadoopConf, "fs.s3a.endpoint", "s3.amazonaws.com", SET_TO_DEFAULT_VALUES)
   }

   /**
@@ -61,28 +48,8 @@
     val sc = new SparkConf()
     val hadoopConf = new Configuration(false)
     sc.set("spark.hadoop.fs.s3a.downgrade.syncable.exceptions", "false")
-    sc.set("spark.hadoop.fs.s3a.endpoint", "s3-eu-west-1.amazonaws.com")
     new SparkHadoopUtil().appendSparkHadoopConfigs(sc, hadoopConf)
     assertConfigValue(hadoopConf, "fs.s3a.downgrade.syncable.exceptions", "false")
-    assertConfigValue(hadoopConf, "fs.s3a.endpoint",
-      "s3-eu-west-1.amazonaws.com")
-  }
-
-  /**
-   * If the endpoint region is set (even to a blank string) in
-   * "spark.hadoop.fs.s3a.endpoint.region" then the endpoint is not set,
-   * even when the s3a endpoint is "".
-   * This supports a feature in hadoop 3.3.1 where this configuration
-   * pair triggers a revert to the "SDK to work out the region" algorithm,
-   * which works on EC2 deployments.
-   */
-  test("appendSparkHadoopConfigs with S3A endpoint region set to an empty string") {
-    val sc = new SparkConf()
-    val hadoopConf = new Configuration(false)
-    sc.set("spark.hadoop.fs.s3a.endpoint.region", "")
-    new SparkHadoopUtil().appendSparkHadoopConfigs(sc, hadoopConf)
-    // the endpoint value will not have been set
-    assertConfigValue(hadoopConf, "fs.s3a.endpoint", null)
   }

   /**
```
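This commit only removes tests; a test of the new behavior is not included. A hypothetical addition in the suite's existing style, reusing its `assertConfigValue` helper, would assert that the endpoint is simply left alone:

```scala
test("appendSparkHadoopConfigs leaves fs.s3a.endpoint unset") {
  val sc = new SparkConf()
  val hadoopConf = new Configuration(false)
  new SparkHadoopUtil().appendSparkHadoopConfigs(sc, hadoopConf)
  // with the fixup reverted, Spark no longer injects any endpoint value
  assertConfigValue(hadoopConf, "fs.s3a.endpoint", null)
}
```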
