HADOOP-19624 Thread leak in ABFS AbfsClientThrottlingAnalyzer #7852


Open. Wants to merge 8 commits into base branch trunk.

Conversation

@mattkduran commented on Aug 3, 2025

Description of PR

The ABFS driver's auto-throttling feature (fs.azure.enable.autothrottling=true) creates Timer threads in AbfsClientThrottlingAnalyzer that are never cleaned up. The resulting thread leak grows without bound and eventually causes OutOfMemoryError in long-running applications such as Hive Metastore.

Impact:

  • Thread count grows indefinitely (observed >100,000 timer threads)
  • Affects any long-running service that creates multiple ABFS filesystem instances

Root Cause:

AbfsClientThrottlingAnalyzer creates Timer objects in its constructor but provides no mechanism to cancel them. When AbfsClient instances are closed, the associated timer threads continue running indefinitely.
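To make the failure mode concrete, here is a minimal sketch of the leak pattern (a simplified stand-in class, not the actual Hadoop source; only the timer thread-name prefix matches the one reported in the test results below):

```java
import java.util.Timer;
import java.util.TimerTask;

// Simplified stand-in for the analyzer: a per-instance Timer is created in
// the constructor and scheduled forever, and nothing ever cancels it.
class LeakyAnalyzerSketch {
  private final Timer timer;

  LeakyAnalyzerSketch(String name, long analysisPeriodMs) {
    // One dedicated timer thread per analyzer instance.
    this.timer = new Timer("abfs-timer-client-throttling-analyzer-" + name, true);
    this.timer.schedule(new TimerTask() {
      @Override
      public void run() {
        // periodic throttling analysis would run here
      }
    }, analysisPeriodMs, analysisPeriodMs);
  }
  // No close(): the timer thread outlives the AbfsClient that created it.
}
```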

Solution

Implement proper resource cleanup by making the throttling components implement Closeable and ensuring timers are cancelled when ABFS clients are closed; a sketch of the resulting cleanup chain follows the change list below.

Changes Made

  1. AbfsClientThrottlingAnalyzer.java
  • Added: implements Closeable
  • Added: close() method that calls timer.cancel() and timer.purge()
  • Purpose: Ensures timer threads are properly terminated when the analyzer is no longer needed
  2. AbfsThrottlingIntercept.java (Interface)
  • Added: extends Closeable
  • Added: close() method signature
  • Purpose: Establishes a cleanup contract for all throttling intercept implementations
  3. AbfsClientThrottlingIntercept.java
  • Added: close() method that closes both readThrottler and writeThrottler
  • Purpose: Coordinates cleanup of both the read and write throttling analyzers
  4. AbfsNoOpThrottlingIntercept.java
  • Added: No-op close() method
  • Purpose: Satisfies the interface contract for the no-op implementation
  5. AbfsClient.java
  • Added: IOUtils.cleanupWithLogger(LOG, intercept) in the existing close() method
  • Purpose: Integrates throttling cleanup into the existing client resource management
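The resulting cleanup chain, as a rough sketch (simplified class names, not the actual Hadoop sources; the concrete edits are the ones listed above):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Timer;

// Analyzer side: cancel and purge the timer on close().
class AnalyzerSketch implements Closeable {
  private final Timer timer =
      new Timer("abfs-timer-client-throttling-analyzer", true);

  @Override
  public void close() throws IOException {
    timer.cancel();  // stops the timer thread after any running task finishes
    timer.purge();   // drops references to cancelled tasks
  }
}

// Intercept side: closing the intercept closes both analyzers.
class InterceptSketch implements Closeable {
  private final AnalyzerSketch readThrottler = new AnalyzerSketch();
  private final AnalyzerSketch writeThrottler = new AnalyzerSketch();

  @Override
  public void close() throws IOException {
    readThrottler.close();
    writeThrottler.close();
  }
}

// Per the change list above, AbfsClient.close() then releases the intercept
// via Hadoop's IOUtils.cleanupWithLogger(LOG, intercept), which logs rather
// than propagates any exception thrown by close().
```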

Reproduction tool: https://github.com/mattkduran/ABFSleaktest
Mailing list thread: https://www.mail-archive.com/[email protected]/msg43483.html

How was this patch tested?

Standalone Validation Tool

This fix was validated using a standalone reproduction and testing tool that directly exercises the ABFS auto-throttling components outside of a full Hadoop deployment.
Repository: ABFSLeakTest

Testing Scope

  • Problem reproduction confirmed - demonstrates the timer thread leak
  • Fix validation confirmed - proves close() method resolves the leak
  • Resource cleanup verified - shows proper timer cancellation
  • Limited integration testing - standalone tool, not full Hadoop test suite

Test Results

Leak Reproduction Evidence

# Without fix: Timer threads accumulate over filesystem creation cycles
Cycle    Total Threads    ABFS Timer Threads    Status
1        50->52          0->2                   LEAK DETECTED
50       150->152        98->100               LEAK GROWING  
200      250->252        398->400              LEAK CONFIRMED

Final Analysis: 400 leaked timer threads named "abfs-timer-client-throttling-analyzer-*"
Memory Impact: ~90MB additional heap usage

# Direct analyzer testing:
🔴 Without close(): +3 timer threads (LEAKED)
✅ With close():    +0 timer threads (NO LEAK)

Test Environment

  • Java Version: OpenJDK 11.0.x
  • Hadoop Version: 3.3.6/3.4.1 (both affected)
  • Test Duration: 200 filesystem creation/destruction cycles
  • Thread Monitoring: JMX ThreadMXBean (see the counting sketch below)
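For reference, a minimal sketch of the kind of ThreadMXBean check used to count the leaked threads, assuming they carry the thread-name prefix reported above (illustrative only, not the standalone tool's actual code):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Counts live threads whose names start with the analyzer's timer prefix.
public final class TimerThreadCounter {
  private static final String PREFIX = "abfs-timer-client-throttling-analyzer";

  public static long countAnalyzerTimerThreads() {
    ThreadMXBean mx = ManagementFactory.getThreadMXBean();
    long count = 0;
    for (ThreadInfo info : mx.getThreadInfo(mx.getAllThreadIds())) {
      // getThreadInfo() returns null entries for threads that have already exited.
      if (info != null && info.getThreadName().startsWith(PREFIX)) {
        count++;
      }
    }
    return count;
  }

  public static void main(String[] args) {
    // With the fix, this count stays flat across filesystem create/close
    // cycles instead of growing by two (read + write analyzer) per cycle.
    System.out.println("abfs timer threads: " + countAnalyzerTimerThreads());
  }
}
```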

For code changes:

  • [ X ] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
  • Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
  • If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under ASF 2.0?
  • If applicable, have you updated the LICENSE, LICENSE-binary, NOTICE-binary files?

@hadoop-yetus

💔 -1 overall

Vote Subsystem Runtime Logfile Comment
+0 🆗 reexec 20m 54s Docker mode activated.
_ Prechecks _
+1 💚 dupname 0m 0s No case conflicting files found.
+0 🆗 codespell 0m 1s codespell was not available.
+0 🆗 detsecrets 0m 1s detect-secrets was not available.
+1 💚 @author 0m 0s The patch does not contain any @author tags.
-1 ❌ test4tests 0m 0s The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
_ trunk Compile Tests _
+1 💚 mvninstall 46m 41s trunk passed
+1 💚 compile 0m 42s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 compile 0m 36s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 checkstyle 0m 32s trunk passed
+1 💚 mvnsite 0m 41s trunk passed
+1 💚 javadoc 0m 42s trunk passed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04
+1 💚 javadoc 0m 34s trunk passed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
+1 💚 spotbugs 1m 10s trunk passed
+1 💚 shadedclient 41m 9s branch has no errors when building and testing our client artifacts.
_ Patch Compile Tests _
-1 ❌ mvninstall 0m 21s /patch-mvninstall-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ compile 0m 23s /patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ javac 0m 23s /patch-compile-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ compile 0m 21s /patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ javac 0m 21s /patch-compile-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ blanks 0m 0s /blanks-eol.txt The patch has 3 line(s) that end in blanks. Use git apply --whitespace=fix <<patch_file>>. Refer https://git-scm.com/docs/git-apply
+1 💚 checkstyle 0m 21s the patch passed
-1 ❌ mvnsite 0m 23s /patch-mvnsite-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
-1 ❌ javadoc 0m 22s /patch-javadoc-hadoop-tools_hadoop-azure-jdkUbuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.txt hadoop-azure in the patch failed with JDK Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04.
-1 ❌ javadoc 0m 26s /patch-javadoc-hadoop-tools_hadoop-azure-jdkPrivateBuild-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.txt hadoop-azure in the patch failed with JDK Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09.
-1 ❌ spotbugs 0m 22s /patch-spotbugs-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
+1 💚 shadedclient 44m 10s patch has no errors when building and testing our client artifacts.
_ Other Tests _
-1 ❌ unit 0m 25s /patch-unit-hadoop-tools_hadoop-azure.txt hadoop-azure in the patch failed.
+1 💚 asflicense 0m 36s The patch does not generate ASF License warnings.
Total: 160m 49s
Subsystem Report/Notes
Docker ClientAPI=1.51 ServerAPI=1.51 base: https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7852/1/artifact/out/Dockerfile
GITHUB PR #7852
Optional Tests dupname asflicense compile javac javadoc mvninstall mvnsite unit shadedclient spotbugs checkstyle codespell detsecrets
uname Linux 39b56ca5b682 5.15.0-143-generic #153-Ubuntu SMP Fri Jun 13 19:10:45 UTC 2025 x86_64 x86_64 x86_64 GNU/Linux
Build tool maven
Personality dev-support/bin/hadoop.sh
git revision trunk / fee9861
Default Java Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Multi-JDK versions /usr/lib/jvm/java-11-openjdk-amd64:Ubuntu-11.0.27+6-post-Ubuntu-0ubuntu120.04 /usr/lib/jvm/java-8-openjdk-amd64:Private Build-1.8.0_452-8u452-gaus1-0ubuntu120.04-b09
Test Results https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7852/1/testReport/
Max. process+thread count 536 (vs. ulimit of 5500)
modules C: hadoop-tools/hadoop-azure U: hadoop-tools/hadoop-azure
Console output https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-7852/1/console
versions git=2.25.1 maven=3.6.3 spotbugs=4.2.2
Powered by Apache Yetus 0.14.0 https://yetus.apache.org

This message was automatically generated.
