[SPARK-47113][CORE] Revert S3A endpoint fixup logic of SPARK-35878 #45193
+0
−43
What changes were proposed in this pull request?
Revert [SPARK-35878][CORE] Add fs.s3a.endpoint if unset and fs.s3a.endpoint.region is null
Removing the region/endpoint patching code of SPARK-35878 avoids authentication problems with versions of the S3A connector built with the AWS v2 SDK, as is the case in Hadoop 3.4.0.
That is: if fs.s3a.endpoint is unset it will stay unset.
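As a rough illustration of the behaviour change, the sketch below models the reverted fixup in Python, with a plain dict standing in for the Hadoop configuration. The condition follows the SPARK-35878 title (endpoint unset and fs.s3a.endpoint.region null). The function names are hypothetical, and the default endpoint value `s3.amazonaws.com` is an assumption about what the removed code applied, not something stated in this PR.

```python
# Simplified model only: a dict stands in for Hadoop's Configuration object.
# Assumption: the removed code fell back to the central AWS endpoint.
ASSUMED_DEFAULT_ENDPOINT = "s3.amazonaws.com"


def with_endpoint_fixup(conf):
    """Pre-revert behaviour: patch in an endpoint when neither
    fs.s3a.endpoint nor fs.s3a.endpoint.region is set."""
    patched = dict(conf)
    endpoint_unset = not patched.get("fs.s3a.endpoint", "")
    region_unset = patched.get("fs.s3a.endpoint.region") is None
    if endpoint_unset and region_unset:
        patched["fs.s3a.endpoint"] = ASSUMED_DEFAULT_ENDPOINT
    return patched


def without_endpoint_fixup(conf):
    """Post-revert behaviour: the configuration is passed through untouched,
    so an unset endpoint stays unset."""
    return dict(conf)
```

With an empty configuration, the first function adds an endpoint while the second leaves it absent, which is what lets the v2 SDK resolve the service from the region alone.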
The v2 SDK binds to AWS services differently, in what can be described as "region first" binding. Spark setting the endpoint blocks S3 Express support and is incompatible with HADOOP-18975 (S3A: Add option fs.s3a.endpoint.fips to use AWS FIPS endpoints).
The change is compatible with all releases of the S3A connector other than Hadoop 3.3.1 binaries deployed outside EC2 without the endpoint explicitly set.
Why are the changes needed?
The AWS v2 SDK has a different, more complex binding mechanism; it doesn't need the endpoint to
be set if the region (fs.s3a.endpoint.region) value is set. This means the Spark code that
fixes up the endpoint is not only unnecessary, it also causes problems when trying to use specific
storage options (S3 Express) or security options (FIPS).
Does this PR introduce any user-facing change?
Only visible with the Hadoop 3.3.1 S3A connector when deployed outside of EC2, the situation the original patch was added to work around. All other 3.3.x releases are unaffected.
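For anyone still on Hadoop 3.3.1 outside EC2, one possible workaround is to set the endpoint explicitly, relying on Spark's `spark.hadoop.` prefix to pass the option through to the Hadoop configuration. The helper below is a hypothetical sketch that builds the matching `spark-submit` arguments; the function name and the endpoint value in the usage note are illustrative assumptions, not part of this PR.

```python
def s3a_endpoint_conf_args(endpoint):
    """Build spark-submit arguments that set fs.s3a.endpoint explicitly.

    Spark copies any `spark.hadoop.*` property into the Hadoop
    configuration with the prefix stripped.
    """
    if not endpoint:
        raise ValueError("endpoint must be a non-empty string")
    return ["--conf", f"spark.hadoop.fs.s3a.endpoint={endpoint}"]
```

Calling `s3a_endpoint_conf_args("s3.amazonaws.com")` yields arguments that could be appended to a `spark-submit` command line; the appropriate endpoint depends on the deployment.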
How was this patch tested?
Removed some obsolete tests. Relying on GitHub and Jenkins to do the testing, so marking this PR as WiP until they are happy.
Was this patch authored or co-authored using generative AI tooling?
No