Update to latest Hadoop 3.3.6 #1937

@mikev

Description

What docker image(s) are you using?

all-spark-notebook

Host OS system and architecture running docker image

Ubuntu 22.04

What Docker command are you running?

docker run -it -p 8888:8888 --user root -e GRANT_SUDO=yes -v $(pwd):/home/jovyan/work jupyter/all-spark-notebook:spark-3.4.1

How to Reproduce the problem?

Visit localhost:8888

Open Terminal from Launcher

(base) jovyan@745e84c0ed21:/home$ find /usr/local/spark-3.4.1-bin-hadoop3/ -name "hadoop*"
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-yarn-server-web-proxy-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-shaded-guava-1.1.1.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-runtime-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-api-3.3.4.jar
(base) jovyan@745e84c0ed21:/home$
(base) jovyan@745e84c0ed21:/home$

Command output

No response

Expected behavior

I expected to see hadoop-client-api-3.3.6.jar. Hadoop should be updated to the latest release, which is 3.3.6 or later.

Actual behavior

Although Spark is at version 3.4.1, the bundled Hadoop libraries are still at 3.3.4:

(base) jovyan@745e84c0ed21:/home$ find /usr/local/spark-3.4.1-bin-hadoop3/ -name "hadoop*"
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-yarn-server-web-proxy-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-shaded-guava-1.1.1.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-runtime-3.3.4.jar
/usr/local/spark-3.4.1-bin-hadoop3/jars/hadoop-client-api-3.3.4.jar
(base) jovyan@745e84c0ed21:/home$
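
For a shorter check than reading the full listing, the version can be pulled out of the jar file names. This is only a sketch: `hadoop_client_versions` is a hypothetical helper name, and the default path is the Spark directory shown in the output above, so it would need adjusting for other image tags.

```shell
# Print the distinct versions of the hadoop-client-* jars under a Spark install.
# The default directory is the one from this issue; pass another path as $1.
hadoop_client_versions() {
    dir="${1:-/usr/local/spark-3.4.1-bin-hadoop3}"
    find "$dir" -name 'hadoop-client-*.jar' 2>/dev/null \
        | sed -E 's/.*-([0-9]+(\.[0-9]+)+)\.jar$/\1/' \
        | sort -u
}
```

Running `hadoop_client_versions` in the notebook terminal prints one line per distinct version found among the `hadoop-client-*` jars.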

Anything else?

Our project uses AWS S3 and requires the requester-pays header on all S3 requests. The lack of support for this header was described in the following Hadoop issue and fixed in Hadoop 3.3.5:

https://issues.apache.org/jira/browse/HADOOP-14661
The patch is here:
https://issues.apache.org/jira/secure/attachment/12877218/HADOOP-14661.patch

Per the patch, we're required to set "fs.s3a.requester-pays.enabled" to "true".
This fix shipped in hadoop-aws 3.3.5, which was released on Mar 27, 2023.
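
As a rough sketch of how the setting could be passed once a new enough Hadoop is in the image: Spark forwards properties prefixed with `spark.hadoop.` to the Hadoop configuration, so the option can be given as a `--conf` flag. The function names below are hypothetical, the 3.3.5 minimum is the version cited in this issue, and `sort -V` assumes GNU coreutils.

```shell
# Succeed if version $1 is at least 3.3.5, the release this issue cites for the fix.
supports_requester_pays() {
    min="3.3.5"
    # sort -V orders version strings; the minimum must come first (or be equal).
    [ "$(printf '%s\n%s\n' "$min" "$1" | sort -V | head -n 1)" = "$min" ]
}

# Print the Spark flag that forwards the S3A option to Hadoop,
# or fail with a message if the given Hadoop version is too old.
requester_pays_conf() {
    if supports_requester_pays "$1"; then
        echo "--conf spark.hadoop.fs.s3a.requester-pays.enabled=true"
    else
        echo "Hadoop $1 is older than 3.3.5; requester-pays option not expected to work" >&2
        return 1
    fi
}
```

With the 3.3.4 jars currently in the image, `requester_pays_conf 3.3.4` fails, which matches what I'm seeing.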

I've tried to upgrade Hadoop in various ways and it still doesn't work. I finally noticed that Hadoop in the image is fixed at version 3.3.4, and I can't seem to upgrade it to 3.3.5. Since Hadoop 3.3.5 was released only recently, maybe something extra is needed to get the upgrade into the Jupyter image.

Latest Docker version

  • I've updated my Docker version to the latest available, and the issue still persists

Metadata

Labels

tag:Upstream (A problem with one of the upstream packages installed in the docker images)
type:Bug (A problem with the definition of one of the docker images maintained here)
