
[rocprofiler-sdk] Optimize HSA queue write interceptor and async signal handler#4276

Open
jrmadsen wants to merge 8 commits into develop from users/jrmadsen/optimize-hsa-write-interceptor

Conversation

@jrmadsen
Contributor

Motivation

Rewrites how rocprofiler-sdk handles signal creation and async signal handlers in queue interception.

Technical Details

  • Creates an initial batch of 4096 signals and creates new batches of 4096 as needed.
  • Only assigns the async signal handler to the last packet in a batch of packets.
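
The batching idea above can be sketched generically. This is an illustrative pool only: the class name, members, and the O(n) `release()` are assumptions for clarity, not the actual `pool`/`pool_object` implementation added by this PR.

```cpp
#include <cstddef>
#include <deque>
#include <memory>
#include <mutex>
#include <vector>

// Sketch of a pool that constructs objects in batches (the PR uses
// batches of 4096 signals) and reuses them instead of re-creating.
template <typename Tp, size_t BatchSize = 4096>
class batched_pool
{
public:
    // Acquire a free object; a new batch of BatchSize objects is only
    // constructed when the free list is exhausted.
    Tp& acquire()
    {
        std::lock_guard<std::mutex> _lk{m_mutex};
        if(m_free.empty()) add_batch();
        size_t _idx = m_free.front();
        m_free.pop_front();
        return *m_objects.at(_idx);
    }

    // Return an object to the pool for reuse (nothing is destroyed).
    void release(const Tp& _obj)
    {
        std::lock_guard<std::mutex> _lk{m_mutex};
        for(size_t i = 0; i < m_objects.size(); ++i)
            if(m_objects.at(i).get() == &_obj) m_free.push_back(i);
    }

    size_t capacity() const { return m_objects.size(); }

private:
    void add_batch()
    {
        for(size_t i = 0; i < BatchSize; ++i)
        {
            m_free.push_back(m_objects.size());
            m_objects.emplace_back(std::make_unique<Tp>());
        }
    }

    std::mutex                       m_mutex   = {};
    std::deque<size_t>               m_free    = {};
    std::vector<std::unique_ptr<Tp>> m_objects = {};
};
```

The key property is that steady-state dispatches pay only a free-list pop instead of an HSA signal creation.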

JIRA ID

Test Plan

  • Ideally, this just improves performance and any breakages will be detected in the existing tests.
  • Developing a test to prevent a performance regression will be difficult.

Test Result

Submission Checklist

@jrmadsen jrmadsen requested review from a team as code owners March 20, 2026 23:37
Copilot AI review requested due to automatic review settings March 20, 2026 23:37
@jrmadsen jrmadsen changed the title Users/jrmadsen/optimize hsa write interceptor [rocprofiler-sdk] Optimize HSA queue write interceptor and async signal handler Mar 20, 2026
Contributor

Copilot AI left a comment


Pull request overview

This PR refactors rocprofiler-sdk’s HSA queue write interception and async signal handling to reduce per-dispatch overhead by batching per-packet state and introducing pooled/batched HSA signals.

Changes:

  • Introduces packet_data_t and updates completion callbacks to operate on per-packet data rather than session-wide fields.
  • Adds a pooled signal infrastructure (pool/pool_object) and rewires queue interception to allocate/reuse signals in batches.
  • Adds a new HIP test binary (hip-graph-bubbles) intended to create many graph-based kernel dispatches.

Reviewed changes

Copilot reviewed 31 out of 34 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
projects/rocprofiler-sdk/tests/bin/hip-graph-bubbles/hip-graph-bubbles.cpp New test program that builds/launches a HIP graph repeatedly with roctx ranges.
projects/rocprofiler-sdk/tests/bin/hip-graph-bubbles/CMakeLists.txt Build rules for the new hip-graph-bubbles test binary.
projects/rocprofiler-sdk/tests/bin/CMakeLists.txt Adds hip-graph-bubbles subdirectory to the test build.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/tracing/fwd.hpp Changes external correlation map type to a small_vector-backed container.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/thread_trace/core.hpp Updates post_kernel_call signature to take packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/thread_trace/core.cpp Threads packet_data_t.user_data through post-dispatch data iteration.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/pc_sampling/tests/pc_sampling_internals.hpp Updates session type name references for completion callback signatures.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/pc_sampling/hsa_adapter.cpp Adapts to renamed session type and small_vector external correlation map.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/kernel_dispatch/tracing.hpp Updates dispatch tracing APIs to use queue_info_session_t + packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/kernel_dispatch/tracing.cpp Moves dispatch callback inputs from session-wide to per-packet storage.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/signal.hpp Adds signal_t wrapper used by pooled signal objects.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/queue_info_session.hpp Introduces packet_data_t and refactors session to hold a small_vector of packet data.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/queue.hpp Updates async completion callback signature and adds pooled-signal APIs.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/queue.cpp Core refactor: batching packet data, pooled signals, and async handler changes.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/memory_allocation.cpp Switches external correlation map alias to the new small_vector-backed type.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/async_copy.cpp Switches external correlation map alias to the new small_vector-backed type.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/hsa/CMakeLists.txt Adds signal.hpp to installed/compiled HSA headers list.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/tests/core.cpp Updates tests for renamed session type and new completed_cb signature.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/sample_processing.hpp Plumbs packet_data_t into callback processing params.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/sample_processing.cpp Reads dispatch info/user_data/external corr IDs from packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/dispatch_handlers.hpp Updates completed callback signature to include packet_data_t.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/dispatch_handlers.cpp Passes packet_data_t through to sample processing.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/counters/core.cpp Updates controller callback wiring for new completed callback signature.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/code_object/code_object.cpp Switches external correlation map alias to the new small_vector-backed type.
projects/rocprofiler-sdk/source/lib/rocprofiler-sdk/buffer.cpp Optimizes get_buffer lookup from linear scan to direct indexing.
projects/rocprofiler-sdk/source/lib/common/utility.hpp Generalizes get_val to work with containers providing find (incl. small_vector pairs).
projects/rocprofiler-sdk/source/lib/common/mpl.hpp Extends pair detection trait to expose first_type / second_type.
projects/rocprofiler-sdk/source/lib/common/container/static_vector.hpp Simplifies emplace_back assignment path.
projects/rocprofiler-sdk/source/lib/common/container/stable_vector.hpp Initializes members to defaults to avoid uninitialized state.
projects/rocprofiler-sdk/source/lib/common/container/small_vector.hpp Adds map-like helpers for small_vector-of-pairs (find, at, emplace).
projects/rocprofiler-sdk/source/lib/common/container/record_header_buffer.cpp Uses memset to clear only the used header range; adds <cstring>.
projects/rocprofiler-sdk/source/lib/common/container/pool_object.hpp New pooled object wrapper with acquire/release semantics.
projects/rocprofiler-sdk/source/lib/common/container/pool.hpp New pool implementation for batched reusable objects (used for signals).
projects/rocprofiler-sdk/source/lib/common/container/CMakeLists.txt Adds new pool headers to the common container header list.


Comment on lines +854 to +864
if(auto* pool = get_signal_pool(); use_pool && pool && attribute == 0)
{
auto& _signal = pool->acquire(construct_hsa_signal, 0, 0, nullptr, attribute);
ROCP_FATAL_IF(!_signal.in_use()) << "Acquired signal from pool that is not in use";
*signal = _signal.get().value;
// ROCP_INFO << fmt::format("acquired signal {} from pool: hsa_signal_t{{.handle={}}}",
// _signal.index(),
// _signal.get().value.handle);
get_core_table()->hsa_signal_store_screlease_fn(_signal.get().value, 1);
return &_signal;
}

Copilot AI Mar 20, 2026


The pooled-signal path calls pool->acquire(construct_hsa_signal, ...), which creates a brand new HSA signal each time the pooled object is acquired. That overwrites the previous hsa_signal_t handle stored in the pool object without destroying it, effectively leaking HSA signals and defeating reuse.

For pooled signals, create the HSA signal once when the pool batch is constructed and only reset its value on reuse (e.g., via hsa_signal_store_*), or explicitly destroy the previous handle before re-creating it.
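
A minimal sketch of the suggested fix: create the handle once when the pool slot is constructed and only reset its value on reuse. Here `fake_signal_create` is a stand-in for `hsa_amd_signal_create_fn`, and the counter exists only to make "reuse does not allocate" observable.

```cpp
#include <cassert>

// Stand-in for hsa_amd_signal_create_fn: counts creations so that
// reuse vs. re-creation can be distinguished.
static int g_signals_created = 0;

struct fake_signal
{
    long handle = 0;
};

inline fake_signal
fake_signal_create()
{
    ++g_signals_created;
    return fake_signal{g_signals_created};
}

struct pooled_signal
{
    // the underlying handle is created exactly once, when the pool
    // batch constructs this slot...
    fake_signal value = fake_signal_create();
    long        count = 0;

    // ...and on reuse only the signal's counter is reset (analogous
    // to hsa_signal_store_screlease(value, 1)); the handle itself is
    // never re-created, so nothing leaks
    void reset() { count = 1; }
};
```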

Comment on lines 305 to 309
ROCP_INFO << fmt::format("WriteInterceptor called with pkt_count={}", pkt_count);

using callback_record_t = packet_data_t::callback_record_t;
using packet_vector_t = common::container::small_vector<rocprofiler_packet, 512>;


Copilot AI Mar 20, 2026


ROCP_INFO logging inside WriteInterceptor will run on every intercepted queue write and can add significant overhead / log spam in performance-sensitive code paths.

Consider removing this log or demoting it to ROCP_TRACE / VLOG guarded by a debug flag.

@powderluv

I pulled this PR into a clean local worktree and tried it against the same HIP graph kernel-trace repro cases we have been using for the queue/signal issue.

One build caveat first: on my ROCm 7.13 / TheRock venv, the PR head (9113c23e9d) does not build as-is because it is missing the separate fmt.hpp memory-copy-op compatibility fix for ROCm 7.13. I applied only that minimal compatibility patch locally, with no queue/signal behavior changes on top of the PR, so I could test the runtime behavior.

With that single compatibility patch added, I still could not get the PR branch to pass the HIP graph repro:

  • 1000 x 300 with --kernel-trace segfaulted very early, before any CSV output was written.
  • 256 x 200 with --kernel-trace also segfaulted before any CSV output was written.

I put the exact compatibility patch and the two crash logs into a secret gist here:

https://gist.github.com/powderluv/f65f4560fe338effd090fd7dd57d833d

Files in the gist:

  • README.md
  • pr4276_rocm713_compat.patch
  • pr4276_k1000_i300_run.log
  • pr4276_k256_i200_run.log

So at least on this setup, this alternative implementation is not yet passing the existing HIP graph test cases.

@powderluv

I pulled this into a clean workspace and iterated on top of the PR head locally. The updated branch is here:

Local commit stack on top of the PR branch:

  • d33b45eda3 rocprofiler-sdk: handle ROCm 7.13 memory copy op layouts
  • d88004b100 rocprofiler-sdk: avoid host-thread state on async queue callbacks
  • 5e6cd16418 rocprofiler-sdk: prearm queue completion callbacks for hip graphs

What changed at a high level:

  • stopped using host-thread-only state on ROCr async-doorbell callbacks
  • skipped tool-side kernel rename / HIP stream external-correlation setup when there is no host correlation id
  • switched the queue completion path to fresh one-shot pre-armed slots
  • kept pre-armed handlers alive until a real queue session is attached
  • changed the queue completion async-handler condition from EQ -1 to LT 1, which was the turning point for actually draining dispatch completions on this HIP graph case

Validation on the HIP graph reproducer (--kernel-trace, CSV output):

  • 256 x 20: passes, 5120 rows / 5120 unique dispatch ids
  • 256 x 200: passes, 51200 rows / 51200 unique dispatch ids
  • 1000 x 200: passes, 200000 rows / 200000 unique dispatch ids
  • 2000 x 200: passes, 400000 rows / 400000 unique dispatch ids

This is materially different from the original state I tested earlier on this machine, where the branch either failed to build on the ROCm 7.13 venv or segfaulted / failed to emit profiler output on the same HIP graph kernel-trace cases.

@powderluv

I added a local hotspot pass on the current pr4276-based branch using the HIP graph reproducer with queue-signal timing enabled.

Method:

  • staged local rocprofv3 from the current pr4276 workspace
  • ROCPROFILER_QUEUE_SIGNAL_TRACE=1
  • ROCPROFILER_QUEUE_SIGNAL_TRACE_PERIOD=65536
  • compared the first ~65536 traced-dispatch summary on two shapes:
    • 2000 x 300
    • 3000 x 200

The main result is that the async completion callback is not the dominant performance hotspot.

At the first summary window:

  • 2000 x 300
    • dispatch_setup_avg_us=32.306
    • completion_avg_us=1.265
    • create_avg_us=0.641
    • register_avg_us=1.262
    • enqueue_latency_avg_us=4152.152
    • direct_create_calls=24406
  • 3000 x 200
    • dispatch_setup_avg_us=33.673
    • completion_avg_us=1.384
    • create_avg_us=0.696
    • register_avg_us=1.202
    • enqueue_latency_avg_us=4457.345
    • direct_create_calls=23946

Interpretation:

  • enqueue-side WriteInterceptor work is roughly 24x-26x larger than the async completion callback work
  • hsa_amd_signal_create and hsa_amd_signal_async_handler are visible, but neither is the dominant cost by itself
  • callback subphases are small:
    • get_dispatch_avg_us ~ 0.116-0.125
    • dispatch_complete_avg_us ~ 0.367-0.406
    • callback_avg_us ~ 0.070-0.072
  • the queue is still accumulating noticeable completion lag (enqueue_latency_avg_us ~ 4.1-4.5 ms), but the direct callback body is not expensive enough to explain the overall slowdown
  • the prearmed slot path still falls back to direct creates frequently (~24k misses in the first ~65k dispatches), so slot availability is still part of the picture

The next useful step is finer instrumentation inside WriteInterceptor itself, especially around:

  • correlation / external-correlation work
  • tracing enter/exit callback overhead
  • queue callback fanout on enqueue
  • packet transformation / serialization path
  • slot-acquire miss path versus ready-slot hit path

So the current evidence says: optimize enqueue-side setup first, not async callback execution.

@powderluv

Follow-up hotspot note from a second local instrumentation pass on the HIP graph repro.

I split the enqueue-side dispatch_setup_avg_us bucket into non-overlapping pieces on the current local pr4276 worktree and sampled the first ~65536 traced dispatches of two shapes:

  • 2000 x 300

    • dispatch_setup_avg_us=45.849
    • dispatch_packet_avg_us=0.180
    • dispatch_signal_avg_us=44.611
      • dispatch_signal_create_avg_us=44.392
      • dispatch_signal_arm_avg_us=0.219
    • completion_avg_us=0.849
    • enqueue_latency_avg_us=5814.085
    • direct_create_calls=8834 / 65532
  • 3000 x 200

    • dispatch_setup_avg_us=54.235
    • dispatch_packet_avg_us=0.488
    • dispatch_signal_avg_us=52.278
      • dispatch_signal_create_avg_us=52.044
      • dispatch_signal_arm_avg_us=0.233
    • completion_avg_us=1.618
    • enqueue_latency_avg_us=6893.844
    • direct_create_calls=1064 / 65521

Takeaway:

  • The main enqueue-side hotspot is the completion-signal acquisition / creation stage in WriteInterceptor, not packet building and not async-handler arm/register.
  • Packet build is sub-0.5 us here.
  • Arm/register is only about 0.22-0.23 us.
  • Completion callback work is still small (<2 us).
  • The wider graph shape (3000x200) is slower mainly because the signal-create/acquire stage grows, and enqueue latency grows with it.

One nuance: the raw create_avg_us counter for hsa_amd_signal_create itself is still sub-1 us, so this larger dispatch_signal_create_avg_us bucket is measuring the broader completion-signal acquisition path, not just the raw runtime call in isolation. That points more toward ready-slot acquisition / fallback / surrounding queue bookkeeping than the async callback path.

@powderluv

Follow-up after cleaning up the local diff and updating the comparison branch.

I pushed a cleaned queue-only commit on top of users/powderluv/pr4276-hip-graph-fix:

  • b0db72c610 rocprofiler-sdk: use a ready queue for prearmed signals

What changed in this cleanup:

  • kept only the ready-queue optimization for prearmed completion slots
  • dropped the temporary hotspot instrumentation
  • kept the header-side async_signal_* type placement needed for a clean rebuild in this branch layout

Validated from a clean rebuild/stage in the venv-backed environment at:

  • /data/anush/github/bubble/SWDEV-583475/stage/rocprofiler-sdk-pr4276-push

Wide HIP graph kernel-trace reruns on the cleaned stage:

  • 3000 x 200: passed on rerun, full CSV written
    • log: /data/anush/github/bubble/SWDEV-583475/logs/hip-graph-cleanpush-k3000-i200-rerun-20260322T080944Z/run.log
    • csv: /data/anush/github/bubble/SWDEV-583475/profiles/hip-graph-cleanpush-k3000-i200-rerun-20260322T080944Z/rocprofv3/trace_kernel_trace.csv
    • result: 600000 rows / 600000 unique Dispatch_Id
  • 2000 x 300: passed, full CSV written
    • log: /data/anush/github/bubble/SWDEV-583475/logs/hip-graph-cleanpush-k2000-i300-20260322T080957Z/run.log
    • csv: /data/anush/github/bubble/SWDEV-583475/profiles/hip-graph-cleanpush-k2000-i300-20260322T080957Z/rocprofv3/trace_kernel_trace.csv
    • result: 600000 rows / 600000 unique Dispatch_Id

One caveat: the first fresh 3000 x 200 run after the clean rebuild hit a one-off hip::stream::get_stream_id segfault:

  • /data/anush/github/bubble/SWDEV-583475/logs/hip-graph-cleanpush-k3000-i200-20260322T080840Z/run.log

That fault did not reproduce on the immediate rerun above, and the second wide case also passed. So the ready-queue throughput fix is on the branch now, but there is still some residual instability outside the queue ready-queue path that may need a separate follow-up.

@bwelton
Contributor

bwelton commented Mar 23, 2026

Creates an initial batch of 4096 signals and creates new batches of 4096 as needed.

If it ever needs to exceed 4096, you may run into this exact issue again. There is a limit to the number of signals that can be created before polling must be used for all of them (I believe that limit is 4096).

Is this specifically only with kernel-trace? Do we have experiments that show that this change is enough to resolve the underlying problem?

@bwelton
Contributor

bwelton commented Mar 23, 2026

Given the discussion in https://amd-hub.atlassian.net/browse/ROCM-20396 as well. We should consider just doing the out of band solution for getting the profiling time for these kernels. It doesn't make much sense to hack on both sides here to get around an issue that could be resolved by just supporting out of band performance metrics collection. I suspect both of these independent solutions will be fragile in that they will either see performance degradation under different circumstances or experience bugs (which is more in relation to the HSA changes in the PR for ROCM_20396).

@jrmadsen jrmadsen force-pushed the users/jrmadsen/optimize-hsa-write-interceptor branch from 63341bc to 3f0a2bd Compare March 31, 2026 15:40
// indicate the number of used elements.
if(_n > 0)
{
std::memset(m_headers.data(), 0, _n * sizeof(rocprofiler_record_header_t));
Contributor


Why is this here? I am not opposed to the change but why this PR?

Contributor Author


This got pulled in from @itrowbri's optimizations in another PR. He found some performance improvements, so they are included as part of the overall performance improvement of the write interceptor.

// Use direct indexing instead of linear search (same pattern as destroy_buffer)
// See allocate_buffer below that the idx is assigned based on the size + address
auto idx = buffer_id.handle - get_buffer_offset();
auto& buf = get_buffers()->at(idx);
Contributor


Similar question to record_header_buffer.cpp, why is this specifically included in this PR?

-    m_data[_idx] = {std::forward<Args>(_v)...};
-else
-    m_data[_idx] = Tp{std::forward<Args>(_v)...};
+m_data[_idx] = Tp{std::forward<Args>(_v)...};
Contributor


Are there issues with constexpr above?

Contributor Author


It had to do with is_assignable being malformed when sizeof...(Args) > 1

@@ -297,6 +298,8 @@ class small_vector_template_common : public small_vector_base<small_vector_size_
using value_type = T;
using iterator = T*;
using const_iterator = const T*;
using key_type = typename mpl::is_pair<T>::first_type; // will be void if not pair
using mapped_type = typename mpl::is_pair<T>::second_type; // will be void if not pair
Contributor


This is no longer a small vector but a flat map with this change. The change that needs this (external_correlation_id_map_t from map -> this flat map) doesn't seem like it's necessary in this PR and is really a separate optimization.

}

hsa_status_t status =
get_amd_ext_table()->hsa_amd_signal_create_fn(1, 0, nullptr, attribute, signal);
Copy Markdown
Contributor


We should do a general check here for get_amd_ext_table vs _ext_api usage at some point. These seem to be non-uniformly used in this PR.

{
ROCP_TRACE << fmt::format("Destroying interrupt signal {{.handle={}}}",
packet.interrupt_signal.handle);
hsa::get_core_table()->hsa_signal_destroy_fn(packet.interrupt_signal);
Copy Markdown
Contributor


Would it be simpler here to maybe just use the pools for all signals? There is some higher risk to this change, so we may want to do it separately.

@@ -213,14 +311,17 @@ WriteInterceptor(const void* packets,
return;
}

Copy Markdown
Contributor


There is a really nasty edge case when serialization and batching are both enabled. If they are both enabled, it seems like there would be a deadlock here, since batching could cause kernel_completion_signal() to not be triggered when we expect it to be.

I would suggest we actually gate batching off when serialization is enabled (the performance bubbles don't matter in serialized cases anyway).

Collaborator

@ammarwa ammarwa left a comment


PR Review: [rocprofiler-sdk] Optimize HSA queue write interceptor and async signal handler

Reviewed the signal pooling, batched packet processing, and refactored queue_info_session.

Found 2 critical, 2 important, 2 suggestions, and 1 nit.

🤖 Generated with Claude Code

}
}

return acquire();
Collaborator


💡 SUGGESTION: acquire() uses unbounded recursion for retry

After creating a new batch (lines 147-161), this calls return acquire(); recursively. Under extreme contention (many threads exhausting the pool simultaneously), the new batch could be consumed by other threads before this recursive call runs, leading to repeated batch creation and unbounded stack growth.

With 4096-element batches this is extremely unlikely in practice, but a while(true) loop would be strictly safer and equally readable:

while(true)
{
    // ... try to acquire from m_available ...
    if(_idx.has_value()) { /* return */ }
    // ... create new batch if needed ...
}
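
The same suggestion fleshed out as a small standalone sketch. The `free_list` type and its members are illustrative stand-ins for the pool's internals, not the PR's actual implementation (locking is omitted for brevity).

```cpp
#include <deque>
#include <optional>

// Illustrative free-list whose acquire() retries in a loop rather
// than recursing; batch_size mirrors the PR's 4096-element batches.
struct free_list
{
    std::deque<int> available  = {};
    int             next_index = 0;
    int             batch_size = 4096;

    std::optional<int> try_pop()
    {
        if(available.empty()) return std::nullopt;
        int _idx = available.front();
        available.pop_front();
        return _idx;
    }

    void add_batch()
    {
        for(int i = 0; i < batch_size; ++i)
            available.push_back(next_index++);
    }

    // If a fresh batch were drained before this caller retried, the
    // loop just iterates again; stack depth stays constant, unlike
    // the recursive `return acquire();` form.
    int acquire()
    {
        while(true)
        {
            if(auto _idx = try_pop()) return *_idx;
            add_batch();
        }
    }
};
```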

// pair of hsa signal and user data pointer for async handler
struct signal_t
{
// bool handler_is_set = false;
Collaborator


💡 SUGGESTION: Remove commented-out members

These commented-out members (handler_is_set, data) appear to be leftover development code. They add noise and may confuse future readers about whether they should be re-enabled.

struct signal_t
{
    hsa_signal_t value = {.handle = 0};
};

// Copy kernel pkt, copy is to allow for signal to be modified
rocprofiler_packet kernel_pkt = packets_arr[i];
_packet_data.kernel_packet = packets_arr[i];
// create a referencce for short hand access
Collaborator


NIT: Typo

referencce → reference

@jrmadsen
Contributor Author

jrmadsen commented Apr 1, 2026

There is a larger problem with the entire system... The background thread for processing counters (which appears to have been implemented by @ApoKalipse-V) is asynchronously operating on signals after they've been released back into the pool. It will take some time to resolve this.

