
Request: 20–30 min technical screen + routing (H100 retrieval/memory primitive; reproducible under NDA) #10164

@StanByriukov02

Description


Proposal to improve performance

Hi NVIDIA team,

I’m looking to be routed to the right engineering owner for a short 20–30 minute technical screen.

Public-safe evidence from our H100 runs:

  • Explicit N×N fp16 materialization becomes infeasible at large N: we measure a CUDA OOM at N = 500,000, where the requested allocation is roughly 466 GiB (500,000² × 2 bytes).
  • An indexed, O(N)-memory retrieval path keeps operating at the same N, because the N×N matrix is never constructed.
  • Memoizing repeated queries yields a large hot-path speedup (measured example: 863×); an illustrative sketch of these three points follows this list.
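
To keep this public-safe, here is a minimal, illustrative PyTorch sketch (not the NDA’d implementation; all names, shapes, and chunk sizes are placeholder assumptions) of the contrast the bullets describe: the arithmetic behind the N×N fp16 allocation, a chunked top-k retrieval that never materializes the full matrix, and a simple memoization cache for repeated queries.

```python
# Illustrative sketch only (not the NDA'd code). Names, shapes, and chunk sizes
# are placeholder assumptions chosen so the script runs on any machine; the
# production primitive and its measured 863x figure are not reproduced by this toy.
import torch


def full_matrix_gib(n: int, bytes_per_elem: int = 2) -> float:
    """GiB needed to materialize an n x n matrix (fp16 = 2 bytes/element)."""
    return n * n * bytes_per_elem / 2**30


# At N = 500,000 the explicit fp16 matrix needs ~466 GiB, which is why the
# allocation fails with CUDA OOM on a single device.
print(f"N=500,000 full fp16 matrix: {full_matrix_gib(500_000):.0f} GiB")


def topk_retrieve(query: torch.Tensor, keys: torch.Tensor,
                  k: int = 10, chunk: int = 65_536) -> torch.Tensor:
    """Indices of the k rows of `keys` most similar to `query` (dot product),
    scanned in fixed-size chunks so peak extra memory is O(chunk), never O(N^2)."""
    best_scores = torch.full((k,), float("-inf"), device=keys.device)
    best_idx = torch.zeros(k, dtype=torch.long, device=keys.device)
    for start in range(0, keys.shape[0], chunk):
        block = keys[start:start + chunk]                      # (chunk, d)
        scores = (block @ query).float()                       # (chunk,)
        idx = torch.arange(start, start + block.shape[0], device=keys.device)
        merged_scores = torch.cat([best_scores, scores])
        merged_idx = torch.cat([best_idx, idx])
        top = torch.topk(merged_scores, k)
        best_scores, best_idx = top.values, merged_idx[top.indices]
    return best_idx


# Memoization: repeated queries are served from a cache instead of re-scanning.
_cache: dict = {}


def cached_retrieve(query: torch.Tensor, keys: torch.Tensor, k: int = 10) -> torch.Tensor:
    key = (k, tuple(query.flatten().tolist()))
    if key not in _cache:
        _cache[key] = topk_retrieve(query, keys, k)
    return _cache[key]


if __name__ == "__main__":
    device = "cuda" if torch.cuda.is_available() else "cpu"
    dtype = torch.float16 if device == "cuda" else torch.float32
    n, d = 100_000, 128                                        # small demo sizes
    keys = torch.randn(n, d, dtype=dtype, device=device)
    q = torch.randn(d, dtype=dtype, device=device)
    first = cached_retrieve(q, keys)       # cold path: chunked O(N) scan
    second = cached_retrieve(q, keys)      # hot path: cache hit, no scan
    assert torch.equal(first, second)
```

The point of the sketch is that the O(N) path never allocates more than one (chunk × d) block plus a length-k running top-k, so the same code shape survives at N = 500,000 where the full matrix cannot be allocated.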

Under NDA, we can provide a reproducible runbook and evidence bundle (logs and scripts) for a controlled review (no repository handover).

Could you route this to the appropriate CUDA/perf and Triton/TensorRT-LLM owner for a 20–30 minute technical screen this week?

Thanks,

Stanislav Byriukov

@NVIDIA/trt-llm-triton-backend-devs, @NVIDIA/trt-llm-qa-perf, @QiJune

Report of performance regression

No response

Misc discussion on performance

No response

Your current environment (if you think it is necessary)

No response

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and checked the documentation and examples for answers to frequently asked questions.

Metadata


Labels

  • General perf<NV>: broad performance issues not specific to a particular component
  • Performance: TRTLLM model inference speed, throughput, efficiency; latency, benchmarks, regressions, optimizations
