[AINode] Preliminary version of concurrent inference #15884
Description
This PR adds a request-pooling engine for multi-request inference on time-series models such as TimerXL. The change set introduces three core Python modules (requestpool.py, request.py, and utils.py) plus a self-contained benchmark harness (guarded by if __name__ == "__main__":) that compares pooled vs. baseline generation speed and numerical fidelity.
Design
Overlap multiple user requests on one device: every 15 ms, when no batch is currently in flight, RequestPool.step() gathers all ready requests and feeds them to the model in a single forward pass.
Handle variable sequence lengths: each tensor type is left-padded to the batch max_len, which preserves causal semantics while enabling torch.cat (a minimal sketch follows this list).
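
A minimal sketch of the left-padding step, assuming 1-D series tensors; the helper name left_pad_and_batch is illustrative and not part of the PR, but the real RequestPool applies the same idea per tensor type before the batched forward pass.

```python
import torch

def left_pad_and_batch(tensors, pad_value=0.0):
    """Left-pad each sequence to the batch max_len and stack into one input."""
    max_len = max(t.shape[0] for t in tensors)
    padded = []
    for t in tensors:
        pad_len = max_len - t.shape[0]
        if pad_len > 0:
            pad = torch.full((pad_len,) + tuple(t.shape[1:]), pad_value,
                             dtype=t.dtype, device=t.device)
            t = torch.cat([pad, t], dim=0)  # padding on the left keeps causality intact
        padded.append(t)
    return torch.stack(padded, dim=0)  # [batch, max_len, ...]

# Three requests with different history lengths -> one [3, 192] batch
batch = left_pad_and_batch([torch.randn(96), torch.randn(192), torch.randn(64)])
print(batch.shape)  # torch.Size([3, 192])
```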
Behavior & configuration
RequestPool.add_request truncates inputs whose length is not an exact multiple of config.input_token_len, keeping model state aligned;
oversized write attempts are silently clipped to max_new_steps (both rules are sketched below).
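
A hedged sketch of the two clamping rules above, assuming a trailing time axis; the helper names and exact clipping semantics are illustrative, not the actual add_request implementation.

```python
import torch

def truncate_to_token_multiple(inputs: torch.Tensor, input_token_len: int) -> torch.Tensor:
    """Drop the oldest points so the length is an exact multiple of input_token_len."""
    usable = (inputs.shape[-1] // input_token_len) * input_token_len
    return inputs[..., inputs.shape[-1] - usable:]

def clip_to_max_new_steps(requested_steps: int, max_new_steps: int) -> int:
    """Silently clamp an oversized request to the configured ceiling."""
    return min(requested_steps, max_new_steps)

series = torch.randn(1, 250)                          # 250 is not a multiple of 96
print(truncate_to_token_multiple(series, 96).shape)   # torch.Size([1, 192])
print(clip_to_max_new_steps(10_000, 720))             # 720
```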
Class & method organization
RequestPool
Public API: add_request, run_inference (starts the inference loop), step (one scheduling pass plus a single batched forward pass).
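
A hypothetical driver for this API; the import path follows the module listing at the end of this description, while the constructor and argument names are assumptions for illustration, not the exact signatures.

```python
import torch
from ainode.core.inference.requestpool import RequestPool

pool = RequestPool()                 # assumed: model/config wiring happens in the constructor

series = torch.randn(1, 192)         # one incoming time series
req_id = pool.add_request(series)    # enqueue a request (return value assumed)

# Either let the pool drive itself ...
# pool.run_inference()               # starts the scheduling/inference loop
# ... or advance it manually, one batched forward pass at a time:
pool.step()                          # schedule ready requests + one forward pass
```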
Request
id, chunk_size, …, state, cur_step_idx, output_tensor
write_step_output writes into a pre-allocated, fixed-size buffer in place; there is no Python-side reallocation after generation starts.
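
A simplified sketch of that pre-allocated buffer pattern; field and method names mirror the listing above, but the shapes, dtype, and class body are assumptions rather than the actual Request code.

```python
import torch

class RequestSketch:
    """Illustrative stand-in for Request; the real class carries more fields and state."""
    def __init__(self, req_id: int, chunk_size: int, max_new_steps: int):
        self.id = req_id
        self.chunk_size = chunk_size
        self.cur_step_idx = 0
        # One fixed buffer allocated up front; never resized during generation.
        self.output_tensor = torch.zeros(max_new_steps * chunk_size)

    def write_step_output(self, step_output: torch.Tensor) -> None:
        """Copy one decoding step into its slot in place; no reallocation."""
        start = self.cur_step_idx * self.chunk_size
        self.output_tensor[start:start + self.chunk_size] = step_output
        self.cur_step_idx += 1

req = RequestSketch(req_id=0, chunk_size=96, max_new_steps=4)
req.write_step_output(torch.randn(96))
print(req.cur_step_idx, req.output_tensor.shape)  # 1 torch.Size([384])
```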
utils
split_moe_output slices a Moe[Causal]LMOutputWithPast into per-request output objects.
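
Conceptually, the split looks like the sketch below, which only handles the logits field and uses an assumed helper name; the real split_moe_output also distributes the other fields of the MoE output object.

```python
import torch

def split_batched_logits(logits: torch.Tensor, request_ids):
    """logits: [batch, seq_len, dim] -> {request_id: [seq_len, dim]} slices."""
    return {rid: logits[i] for i, rid in enumerate(request_ids)}

per_request = split_batched_logits(torch.randn(3, 96, 16), ["r0", "r1", "r2"])
print(per_request["r1"].shape)  # torch.Size([96, 16])
```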
Key changed/added classes (or packages if there are too many classes) in this PR
ainode.core.inference.requestpool.RequestPool
ainode.core.inference.request.Request
ainode.core.inference.utils.split_moe_output