
[AINode] Preliminary version of concurrent inference #15884


Status: Open — wants to merge 3 commits into master
Conversation

jtmer (Contributor) commented Jul 8, 2025

Description

This PR adds a request-pooling engine for multi-request inference on time-series models such as TimerXL. The change set introduces three core Python modules—requestpool.py, request.py, and utils.py—plus a self-contained benchmark harness (guarded by if __name__ == "__main__":) to compare pooled vs. baseline generation speed and numerical fidelity.

Design

  • Overlap multiple user requests on one device: RequestPool.step() batches all ready requests every 15 ms when no batch is in flight, and feeds a single forward pass to the model.
  • Handle variable sequence lengths: left-padding to max_len per tensor type preserves causal semantics while enabling torch.cat.
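The left-padding scheme above can be sketched as follows. This is a minimal illustration, not the PR's actual code: the helper name `left_pad_batch` and the use of `torch.stack` are assumptions; the key point is that padding on the left keeps the newest tokens right-aligned, which preserves causal semantics.

```python
import torch
import torch.nn.functional as F

def left_pad_batch(seqs):
    """Left-pad 1-D tensors to a common length so they can be batched.

    Left-padding keeps the most recent tokens right-aligned, so an
    autoregressive model still attends to a contiguous causal suffix.
    """
    max_len = max(s.size(0) for s in seqs)
    # F.pad's (left, right) pair pads only the last dimension here.
    padded = [F.pad(s, (max_len - s.size(0), 0)) for s in seqs]
    return torch.stack(padded)  # shape: (batch, max_len)

batch = left_pad_batch([torch.tensor([1.0, 2.0, 3.0]),
                        torch.tensor([4.0, 5.0])])
```

The shorter sequence ends up as `[0., 4., 5.]`, so the real values of every request share the same right edge of the batch tensor.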

Behavior & configuration

  • RequestPool.add_request truncates inputs that are not an exact multiple of config.input_token_len, ensuring model state alignment.
  • Oversized write attempts are silently clipped to max_new_steps.
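The two rules above reduce to simple integer arithmetic. The sketch below is illustrative only; the function names are hypothetical, and the real add_request and buffer-write paths in the PR may apply these rules differently.

```python
def truncate_to_token_multiple(seq_len: int, input_token_len: int) -> int:
    """Drop the trailing remainder so the input length is an exact
    multiple of the model's input token length (keeps state aligned)."""
    return (seq_len // input_token_len) * input_token_len

def clip_steps(requested_steps: int, max_new_steps: int) -> int:
    """Silently clip an oversized write attempt to the buffer capacity."""
    return min(requested_steps, max_new_steps)

# e.g. with a (hypothetical) input_token_len of 96:
aligned = truncate_to_token_multiple(1000, 96)  # 960; remainder of 40 dropped
steps = clip_steps(500, 256)                    # 256
```

Truncating rather than raising keeps the pool's hot path exception-free, at the cost of silently discarding a partial trailing chunk.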

Class & method organization

RequestPool
  • Public API: add_request, run_inference (starts the loop), step (single scheduling + forward pass).
Request
  • Fields: id, chunk_size, …, state, cur_step_idx, output_tensor.
  • write_step_output pre-allocates a fixed buffer and fills it in place; no Python-side reallocation after start.
utils
  • split_moe_output slices Moe[Causal]LMOutputWithPast into per-request objects.
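The pre-allocated buffer and the per-request splitting interact as sketched below. This is a simplified stand-in: the Request constructor arguments and the split_batch_output helper are assumptions, and the real split_moe_output operates on a MoeCausalLMOutputWithPast rather than a bare tensor.

```python
import torch

class Request:
    def __init__(self, req_id: int, max_new_steps: int, step_width: int):
        self.id = req_id
        self.cur_step_idx = 0
        # Fixed-size buffer allocated once up front; later writes fill it
        # in place, so no Python-side reallocation happens after start.
        self.output_tensor = torch.zeros(max_new_steps, step_width)

    def write_step_output(self, step_out: torch.Tensor) -> None:
        self.output_tensor[self.cur_step_idx].copy_(step_out)
        self.cur_step_idx += 1

def split_batch_output(batched: torch.Tensor, requests) -> None:
    """Slice one batched forward-pass output row-by-row into the
    per-request buffers (the role split_moe_output plays in the PR)."""
    for row, req in zip(batched, requests):
        req.write_step_output(row)

reqs = [Request(i, max_new_steps=4, step_width=2) for i in range(3)]
split_batch_output(torch.ones(3, 2), reqs)  # one pooled step, 3 requests
```

Because each request owns a fixed output_tensor, the scheduler can run steps for many requests concurrently without any request-side allocation on the hot path.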


This PR has:

  • been self-reviewed.
  • added comments explaining the "why" and the intent of the code wherever it would
    not be obvious to an unfamiliar reader.
  • added unit tests.

Key changed/added classes (or packages if there are too many classes) in this PR

ainode.core.inference.requestpool.RequestPool
ainode.core.inference.request.Request
ainode.core.inference.utils.split_moe_output
