
[Bug] Invalid "Runner" Export in __all__ and Incorrect Metric Aggregation #659

@Muhammad-Ikhwan-Fathulloh

Description

Two issues were identified in the current codebase:

  1. rllm/__init__.py lists "Runner" in __all__, but the package defines no corresponding Runner symbol. The declared public API therefore does not match what can actually be imported, which can mislead users and tools that rely on __all__.

  2. reduce_metrics_by_trajectory_name in rllm/trainer/algorithms/metrics.py incorrectly overwrites rewards when multiple trajectories share the same name. Instead of aggregating all rewards, only the last reward value is retained, resulting in inaccurate metric calculations.

Expected Behavior

  • __all__ should only contain valid, importable symbols.
  • Metrics should be calculated using all rewards associated with a trajectory name.

Actual Behavior

  • Runner is listed as a public export but does not exist.
  • Reward values are overwritten rather than accumulated.

Steps to Reproduce

Issue 1: Invalid Export

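import rllm

print("Runner" in rllm.__all__)  # the name is declared as a public export
print("Runner" in dir(rllm))     # yet it is not defined on the package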

Observe that Runner is not available despite being included in __all__.
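
A check along the following lines makes the mismatch explicit. It is a minimal sketch, not code from rLLM: `missing_exports` is a hypothetical helper, and `fake_pkg` is a stand-in module built in place so the example runs without rLLM installed.

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that the module cannot resolve."""
    declared = getattr(module, "__all__", [])
    return [name for name in declared if not hasattr(module, name)]


# Stand-in for a package whose __all__ lists a symbol that was never defined.
fake_pkg = types.ModuleType("fake_pkg")
fake_pkg.Agent = object  # hypothetical symbol that does exist
fake_pkg.__all__ = ["Agent", "Runner"]

print(missing_exports(fake_pkg))  # ['Runner']
```

Calling `missing_exports(rllm)` on the real package should report `Runner` for as long as the dangling export remains, and a unit test asserting an empty result would catch regressions. In standard Python, `from package import *` raises AttributeError when `__all__` lists a name the module cannot resolve, so the same check also guards wildcard imports.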

Issue 2: Metric Aggregation

Use multiple trajectory results with the same trajectory name:

metrics = [
    {"trajectory_name": "task_a", "reward": 0.5},
    {"trajectory_name": "task_a", "reward": 0.8},
]

Run reduce_metrics_by_trajectory_name() on these results.

Expected:

  • Both rewards (0.5 and 0.8) contribute to the metrics for task_a.

Actual:

  • Only the last reward (0.8) is retained.
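
The symptom is consistent with a reduction that assigns into a dict keyed by trajectory name. The sketch below illustrates that pattern only. `reduce_last_wins` is a hypothetical function written for this report, not the actual body of reduce_metrics_by_trajectory_name.

```python
def reduce_last_wins(results):
    """Hypothetical illustration of the overwrite pattern described above."""
    by_name = {}
    for item in results:
        # Plain assignment replaces any earlier reward stored under the same name.
        by_name[item["trajectory_name"]] = item["reward"]
    return by_name


metrics = [
    {"trajectory_name": "task_a", "reward": 0.5},
    {"trajectory_name": "task_a", "reward": 0.8},
]

print(reduce_last_wins(metrics))  # {'task_a': 0.8}
```

The first reward (0.5) never reaches the result, so any statistic computed afterwards sees a single sample per name.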

Error Output / Traceback

No runtime exception is generated.

This is a logic bug rather than a crash. Its effects are:

  • Invalid public API exports.
  • Incorrect metric aggregation results.

rLLM Version

0fb9d26

Training Backend

verl

Python Version

3.11.4

GPU / CUDA Version

N/A

vLLM Version (if applicable)

N/A

Training Script / Config

Not required to reproduce.

The issue can be reproduced directly by inspecting:

* `rllm/__init__.py`
* `rllm/trainer/algorithms/metrics.py`

Additional Context

Proposed fix:

  1. Remove "Runner" from rllm/__init__.py::__all__.
  2. Update reduce_metrics_by_trajectory_name() to collect rewards into lists instead of overwriting previous values.
  3. Safely ignore None rewards during aggregation to avoid invalid metric entries.

This ensures accurate metric reporting and a consistent public API.
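
Steps 2 and 3 of the proposed fix could be sketched as follows. This is an outline under stated assumptions, not a patch: `aggregate_rewards_by_name` is a hypothetical name, the input shape mirrors the reproduction example, and the mean/min/max/count statistics are chosen for illustration rather than taken from what the trainer actually reports.

```python
from collections import defaultdict
from statistics import mean


def aggregate_rewards_by_name(results):
    """Collect every non-None reward per trajectory name, then summarize."""
    rewards = defaultdict(list)
    for item in results:
        reward = item.get("reward")
        if reward is None:
            # Skip missing rewards so they cannot distort the statistics.
            continue
        rewards[item["trajectory_name"]].append(reward)

    # Names whose rewards were all None never enter `rewards`,
    # so they are absent from the summary instead of producing empty entries.
    return {
        name: {
            "mean": mean(values),
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
        for name, values in rewards.items()
    }


results = [
    {"trajectory_name": "task_a", "reward": 0.5},
    {"trajectory_name": "task_a", "reward": 0.8},
    {"trajectory_name": "task_a", "reward": None},
    {"trajectory_name": "task_b", "reward": 1.0},
]

summary = aggregate_rewards_by_name(results)
print(summary["task_a"]["count"])  # 2
```

Collecting into lists before reducing keeps the aggregation independent of input order, and dropping None values up front means the summary functions only ever receive numeric samples.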
