
Scaffolding tests failing on main branch with thread leaks and RuntimeError #4974

@ccs96307


Hi team,

Before starting my work on issue #3332 (Support more sampling parameters with openai worker), I wanted to ensure the existing tests on the main branch were passing in my local environment.

I ran the scaffolding unit tests with the following command:

cd tests/unittests/

pytest scaffolding/

However, the tests failed with multiple thread leak errors and a RuntimeError. Here is the summary from the test output:

...

scaffolding/test_worker.py::test_trtllm_worker_generation
  /code/tensorrt_llm/.venv-3.12/lib/python3.12/site-packages/_pytest/runner.py:246: PluggyTeardownRaisedWarning: A plugin raised an exception during an old-style hookwrapper teardown.
  Plugin: threadleak, Hook: pytest_runtest_call
  Failed: Test leaked [<Thread(Thread-12 (_manager_spawn), started 129198052079296)>]
  For more information see https://pluggy.readthedocs.io/en/stable/api_reference.html#pluggy.PluggyTeardownRaisedWarning
    lambda: runtest_hook(item=item, **kwds), when=when, reraise=reraise

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================= slowest durations =============================================================================
10.02s call     scaffolding/test_bench.py::test_scaffolding_benchmark
10.01s call     scaffolding/test_parallel_process.py::test_parallel_process_helper_with_two_level
8.01s call     scaffolding/test_parallel_process.py::test_parallel_process_helper
6.51s setup    scaffolding/test_worker.py::test_trtoai_worker_generation[pytorch-enable_processpool]
5.75s call     scaffolding/test_scaffolding.py::test_unbatched_scaffolding_sync
5.52s call     scaffolding/test_scaffolding.py::test_batched_scaffolding_sync
5.48s call     scaffolding/test_scaffolding.py::test_majority_vote
5.48s call     scaffolding/test_scaffolding.py::test_async_scaffolding_generation
5.43s call     scaffolding/test_worker.py::test_trtllm_worker_generation

(20 durations < 0.005s hidden.  Use -vv to show these durations.)
========================================================================== short test summary info ==========================================================================
FAILED scaffolding/test_bench.py::test_scaffolding_benchmark - Failed: Test leaked [<Thread(Thread-3 (main_loop_thread), started 129206254659264)>]
FAILED scaffolding/test_scaffolding.py::test_unbatched_scaffolding_sync - Failed: Test leaked [<Thread(Thread-6 (_manager_spawn), started 129204058322624)>]
FAILED scaffolding/test_scaffolding.py::test_batched_scaffolding_sync - Failed: Test leaked [<Thread(Thread-7 (_manager_spawn), started 129202548373184)>]
FAILED scaffolding/test_scaffolding.py::test_async_scaffolding_generation - Failed: Test leaked [<Thread(Thread-8 (_manager_spawn), started 129201004869312)>]
FAILED scaffolding/test_scaffolding.py::test_majority_vote - Failed: Test leaked [<Thread(Thread-9 (_manager_spawn), started 129199595583168)>]
FAILED scaffolding/test_worker.py::test_trtllm_worker_generation - Failed: Test leaked [<Thread(Thread-12 (_manager_spawn), started 129198052079296)>]
ERROR scaffolding/test_worker.py::test_trtoai_worker_generation[pytorch-enable_processpool] - RuntimeError: Server exited unexpectedly.
======================================================== 6 failed, 3 passed, 6 warnings, 1 error in 67.46s (0:01:07) ========================================================

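It looks like some tests are not cleaning up their threads properly: six tests leak a `_manager_spawn` or `main_loop_thread` thread, and one test errors during setup because the server exited unexpectedly.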
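For context, here is a minimal sketch of how I understand a thread-leak check to work in general. It is not the actual `threadleak` plugin from the repo; `detect_leaked_threads`, `leaky_test_body`, and `clean_test_body` are hypothetical names, and the thread name is only borrowed from the log above for illustration. The idea is to snapshot the live threads before the test body, then report any new thread that is still alive afterwards.

```python
import threading
from contextlib import contextmanager


@contextmanager
def detect_leaked_threads(join_timeout=1.0):
    """Hypothetical helper: yields a list that is filled, on exit,
    with threads started inside the block that are still alive."""
    before = set(threading.enumerate())
    leaked = []
    yield leaked
    for thread in threading.enumerate():
        if thread in before or thread is threading.current_thread():
            continue
        # Give the thread a short grace period before calling it a leak.
        thread.join(join_timeout)
        if thread.is_alive():
            leaked.append(thread)


def leaky_test_body(stop_event):
    # Starts a background thread and returns without stopping or joining it.
    threading.Thread(
        target=stop_event.wait, name="main_loop_thread", daemon=True
    ).start()


def clean_test_body(stop_event):
    # Starts the same kind of thread, then signals it to stop and joins it.
    thread = threading.Thread(target=stop_event.wait, name="main_loop_thread")
    thread.start()
    stop_event.set()
    thread.join()


if __name__ == "__main__":
    stop = threading.Event()
    with detect_leaked_threads(join_timeout=0.1) as leaked:
        leaky_test_body(stop)
    print("leaked threads:", [t.name for t in leaked])
    stop.set()  # let the leaked thread exit
```

If the real plugin works along these lines, each failure above would mean the test returned while a thread it started was still running, so the fix would be an explicit shutdown or join in the test or in the code under test.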

Could you please help confirm if this is a known issue on the main branch? I want to make sure my development environment is set up correctly before proceeding.

Any help or guidance would be greatly appreciated~ Thanks!

Labels: bug (Something isn't working), triaged (Issue has been triaged by maintainers)
