Description
What docker image(s) are you using?
all-spark-notebook
Host OS system and architecture running docker image
Ubuntu 22.04
What Docker command are you running?
docker run -it -p 8888:8888 --user root -e GRANT_SUDO=yes -v $(pwd):/home/jovyan/work jupyter/all-spark-notebook:spark-3.4.1
How to Reproduce the problem?
Visit localhost:8888
Open Terminal from Launcher
(base) jovyan@745e84c0ed21:/home$ find /usr/local/spark-3.4.1-bin-hadoop3/ -name "hadoop*"
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-yarn-server-web-proxy-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-shaded-guava-1.1.1.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-runtime-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-api-3.3.4.jar
(base) jovyan@745e84c0ed21:/home$
Command output
No response
Expected behavior
I expected to see hadoop-client-api-3.3.6.jar. Hadoop should be updated to the latest release, which is 3.3.6, or later.
Actual behavior
Although Spark is at version 3.4.1, the bundled Hadoop libraries are still at 3.3.4:
(base) jovyan@745e84c0ed21:/home$ find /usr/local/spark-3.4.1-bin-hadoop3/ -name "hadoop*"
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-yarn-server-web-proxy-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-shaded-guava-1.1.1.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-runtime-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-api-3.3.4.jar
(base) jovyan@745e84c0ed21:/home$
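To make the check easier to repeat, here is a minimal sketch that reduces the jar listing above to the distinct Hadoop client versions. The function name `hadoop_client_versions` is hypothetical and only used for illustration; it relies on the `hadoop-client-<name>-<version>.jar` naming seen in the output above.

```shell
# Read jar paths on stdin and print the distinct versions of the
# hadoop-client-* jars (for example hadoop-client-api-3.3.4.jar -> 3.3.4).
# Jars that do not match the pattern, such as hadoop-shaded-guava, are ignored.
hadoop_client_versions() {
  sed -n 's/.*hadoop-client-[a-z]*-\([0-9][0-9.]*\)\.jar$/\1/p' | sort -u
}

# Usage inside the container (path taken from the output above):
#   find /usr/local/spark-3.4.1-bin-hadoop3/jars -name 'hadoop-client-*.jar' | hadoop_client_versions
```

If this prints a single version, every Hadoop client jar shipped with Spark is at that version.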
Anything else?
Our project uses AWS S3 and requires the requester-pays header on all S3 requests. Support for this was requested in the following issue and fixed in Hadoop 3.3.5:
https://issues.apache.org/jira/browse/HADOOP-14661
The patch is here:
https://issues.apache.org/jira/secure/attachment/12877218/HADOOP-14661.patch
Per the patch, we need to set "fs.s3a.requester-pays.enabled" to "true".
This fix shipped in hadoop-aws 3.3.5, which was released on Mar 27, 2023.
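For reference, a minimal sketch of how that option could be passed through Spark once the image ships Hadoop 3.3.5 or later. It assumes Spark's standard behavior of copying any `spark.hadoop.*` property into the Hadoop configuration; the variable names are hypothetical, and the command is only printed rather than run.

```shell
# Hadoop S3A option named in the patch.
S3A_REQUESTER_PAYS_KEY="fs.s3a.requester-pays.enabled"

# Spark forwards properties prefixed with "spark.hadoop." to Hadoop,
# so the option is passed as a regular --conf argument.
SPARK_CONF_ARG="spark.hadoop.${S3A_REQUESTER_PAYS_KEY}=true"

# Print the command instead of running it, so the snippet is safe to paste anywhere.
echo "pyspark --conf ${SPARK_CONF_ARG}"
```

With the 3.3.4 jars currently in the image, I would expect this setting to have no effect, which matches what I'm seeing.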
I've tried upgrading Hadoop in various ways, and it still doesn't work. I finally noticed that Hadoop in the image is fixed at version 3.3.4, and I can't seem to upgrade it to 3.3.5. Since Hadoop 3.3.5 was released only recently, maybe something extra is needed to get the upgrade into the Jupyter image.
Latest Docker version
- I've updated my Docker version to the latest available, and the issue still persists