[https://nvbugs/5467062][fix] pass logitsPostProcessorBatched by reference #7110

milesial · 2025-08-21T06:14:59Z

Avoids an unnecessary GIL acquisition on the C++ scheduling thread, which was blocking kernel scheduling for the sampler. Up to 20% throughput increase at high concurrency when a batched processor is registered.
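
As a rough illustration of why the parameter passing mode matters, the sketch below counts how often a wrapped callable is copied when a `std::optional<std::function<...>>` is passed by value versus by const reference. `CountingCallable`, `Batched`, `callByValue`, and `callByConstRef` are hypothetical names that are not taken from the TensorRT-LLM sources. That a copy of the real functor has to touch Python state (and therefore needs the GIL) is an assumption here, not something the PR spells out.

```cpp
#include <functional>
#include <optional>
#include <vector>

// Hypothetical stand-in for a callable bound from Python. In a real binding a
// copy might touch Python reference counts (assumption); here we only count copies.
struct CountingCallable
{
    static inline int copies = 0;

    CountingCallable() = default;
    CountingCallable(CountingCallable const&) { ++copies; }
    CountingCallable(CountingCallable&&) noexcept = default;

    void operator()(std::vector<int>&) const {}
};

// Hypothetical alias; the real LogitsPostProcessorBatched takes different arguments.
using Batched = std::function<void(std::vector<int>&)>;

// Old style: every call with an lvalue copies the optional and the std::function inside it.
void callByValue(std::optional<Batched> cb = std::nullopt)
{
    std::vector<int> logits;
    if (cb)
    {
        (*cb)(logits);
    }
}

// New style: the caller's optional is only referenced, never copied.
void callByConstRef(std::optional<Batched> const& cb = std::nullopt)
{
    std::vector<int> logits;
    if (cb)
    {
        (*cb)(logits);
    }
}

int copiesWhenPassedByValue(int calls)
{
    std::optional<Batched> cb = Batched(CountingCallable{});
    CountingCallable::copies = 0; // ignore any copies made during setup
    for (int i = 0; i < calls; ++i)
    {
        callByValue(cb);
    }
    return CountingCallable::copies;
}

int copiesWhenPassedByConstRef(int calls)
{
    std::optional<Batched> cb = Batched(CountingCallable{});
    CountingCallable::copies = 0; // ignore any copies made during setup
    for (int i = 0; i < calls; ++i)
    {
        callByConstRef(cb);
    }
    return CountingCallable::copies;
}
```

In this sketch the by-value variant copies the callable once per call, while the by-reference variant never does, which is consistent with the cost being paid on every scheduling iteration before the change.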

coderabbitai · 2025-08-21T06:15:06Z

Walkthrough

Updated LogitsPostProcessor::operator() to accept the optional LogitsPostProcessorBatched parameter by const reference instead of by value in both declaration and definition. No logic or control flow changes.

Changes

| Cohort / File(s) | Summary of Changes |
|---|---|
| LogitsPostProcessor signature update: `cpp/include/tensorrt_llm/batch_manager/logitsPostProcessor.h`, `cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp` | Changed operator() parameter from `std::optional<LogitsPostProcessorBatched>` to `std::optional<LogitsPostProcessorBatched> const&`, retaining default `std::nullopt`; no internal logic changes. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

IzzyPutterman · 2025-08-21T06:18:16Z

/bot run

coderabbitai

Actionable comments posted: 0

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1)
73-81: Guard against nullopt before dereferencing logitsPostProcessorBatched

If any request sets mApplyLogitsPostProcessorBatched but the caller provides no batched functor (defaulted/nullopt), this will dereference an empty optional. Add a guard.

Minimal fix:
if (!reqIdsVec.empty())
{
    logitsPostProcessorIsApplied = true;
    if (replicateLogitsPostProcessor || worldConfig.isFirstTensorParallelRank())
    {
        if (!logitsPostProcessorBatched)
        {
            // Choose one: log-and-return, assert, or throw (consistent with project policy).
            TLLM_LOG_ERROR("Batched logits post processor is required but not provided.");
            return false;
        }
        (*logitsPostProcessorBatched)(reqIdsVec, logitsVec, beamTokensVec, stream, clientIdsVec);
    }
}

🧹 Nitpick comments (4)

cpp/include/tensorrt_llm/batch_manager/logitsPostProcessor.h (3)
48-51: ABI heads-up: signature change alters the mangled symbol

Changing from by-value to by-const-ref is an ABI-breaking change for external consumers compiled against older headers. Please confirm this is internal-only or plan a coordinated rebuild/release note.
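
To make the ABI point concrete, here is a minimal sketch showing that the by-value and by-const-reference forms are distinct function types, which is why a name-mangling scheme that encodes parameter types (such as the Itanium C++ ABI) emits a different symbol for each. `Callback`, `ByValueSig`, and `ByConstRefSig` are hypothetical aliases and do not reproduce the real `operator()` signature.

```cpp
#include <functional>
#include <optional>
#include <type_traits>

// Hypothetical callback type standing in for LogitsPostProcessorBatched.
using Callback = std::function<void(int)>;

// The two parameter passing modes yield two different function types.
using ByValueSig = bool(std::optional<Callback>);
using ByConstRefSig = bool(std::optional<Callback> const&);

constexpr bool signaturesDiffer()
{
    return !std::is_same_v<ByValueSig, ByConstRefSig>;
}

// Different function types mean different mangled names, so objects built
// against the old header would not link against the new definition.
static_assert(signaturesDiffer(), "by-value and by-const-ref signatures are distinct types");
```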

18-18: Add include guard per project guidelines

Guidelines require TRTLLM_<NAME>_H include guards. Replace or complement #pragma once with a guard.

Example:
// At top (replace #pragma once)
#ifndef TRTLLM_LOGITSPOSTPROCESSOR_H
#define TRTLLM_LOGITSPOSTPROCESSOR_H

// ... file contents ...

#endif // TRTLLM_LOGITSPOSTPROCESSOR_H

2-2: Update copyright year to include 2025

The .cpp already uses 2025. Please align the header.
- * SPDX-FileCopyrightText: Copyright (c) 2022-2024 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
+ * SPDX-FileCopyrightText: Copyright (c) 2022-2025 NVIDIA CORPORATION & AFFILIATES. All rights reserved.

cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1)
42-47: Micro-opt: pre-reserve vectors when collecting batched arguments

To avoid repeated reallocations when decoderRequests.size() is known, reserve capacity before the loop.
reqIdsVec.reserve(inputBuffers.decoderRequests.size());
logitsVec.reserve(inputBuffers.decoderRequests.size());
beamTokensVec.reserve(inputBuffers.decoderRequests.size());
clientIdsVec.reserve(inputBuffers.decoderRequests.size());
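
The effect of reserving up front can be checked with a small standalone sketch that counts capacity changes while filling a vector. `countReallocations` is a hypothetical helper written for illustration and is not part of the codebase.

```cpp
#include <cstddef>
#include <vector>

// Counts how many times the vector's capacity changes while pushing n elements.
// Each capacity change corresponds to a reallocation of the underlying buffer.
std::size_t countReallocations(std::size_t n, bool reserveFirst)
{
    std::vector<int> values;
    if (reserveFirst)
    {
        values.reserve(n); // one allocation up front
    }

    std::size_t reallocations = 0;
    std::size_t capacity = values.capacity();
    for (std::size_t i = 0; i < n; ++i)
    {
        values.push_back(static_cast<int>(i));
        if (values.capacity() != capacity)
        {
            ++reallocations;
            capacity = values.capacity();
        }
    }
    return reallocations;
}
```

With `reserveFirst` set, the loop triggers no reallocations, because `push_back` does not reallocate while the size stays within the reserved capacity. Without it, the count depends on the standard library's growth policy.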

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between ba0a86e and 827213e.

📒 Files selected for processing (2)

cpp/include/tensorrt_llm/batch_manager/logitsPostProcessor.h (1 hunks)
cpp/tensorrt_llm/batch_manager/logitsPostProcessor.cpp (1 hunks)

🧰 Additional context used

📓 Path-based instructions (5)

**/*.{cpp,cxx,cc,cu,h,hpp,hxx,hh,cuh}