Description
Two issues were identified in the current codebase:
- `rllm/__init__.py` exports `"Runner"` in `__all__`, but no corresponding `Runner` symbol exists in the package. This creates an inconsistent public API and may cause confusion for users relying on package exports.
- `reduce_metrics_by_trajectory_name` in `rllm/trainer/algorithms/metrics.py` incorrectly overwrites rewards when multiple trajectories share the same name. Instead of aggregating all rewards, only the last reward value is retained, resulting in inaccurate metric calculations.
Expected Behavior
- `__all__` should only contain valid, importable symbols.
- Metrics should be calculated using all rewards associated with a trajectory name.
Actual Behavior
- `Runner` is listed as a public export but does not exist.
- Reward values are overwritten rather than accumulated.
Steps to Reproduce
Issue 1: Invalid Export
```python
import rllm
print("Runner" in dir(rllm))
```
Observe that Runner is not available despite being included in __all__.
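The same check can be generalized to every name in `__all__`. The sketch below is illustrative only: `missing_exports` and `fake_pkg` are hypothetical names that are not part of rllm, and the stand-in module is used so the snippet runs without the package installed.

```python
import types


def missing_exports(module):
    """Return names listed in module.__all__ that the module cannot resolve."""
    declared = getattr(module, "__all__", [])
    return [name for name in declared if not hasattr(module, name)]


# Hypothetical stand-in for the package (assumption: not the real rllm layout).
fake_pkg = types.ModuleType("fake_pkg")
fake_pkg.Agent = object
fake_pkg.__all__ = ["Agent", "Runner"]

print(missing_exports(fake_pkg))  # ['Runner']
```

Calling `missing_exports(rllm)` on the affected commit would be expected to include `"Runner"`. A check like this could also serve as a small regression test for the package's exports.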
Issue 2: Metric Aggregation
Use multiple trajectory results with the same trajectory name:
```python
metrics = [
    {"trajectory_name": "task_a", "reward": 0.5},
    {"trajectory_name": "task_a", "reward": 0.8},
]
```
Run reduce_metrics_by_trajectory_name().
Expected:
- Both rewards are aggregated.
Actual:
- Only the last reward (`0.8`) is retained.
Error Output / Traceback
No runtime exception is generated.
This is a logical bug causing:
* Invalid public API exports.
* Incorrect metric aggregation results.
rLLM Version
0fb9d26
Training Backend
verl
Python Version
3.11.4
GPU / CUDA Version
N/A
vLLM Version (if applicable)
N/A
Training Script / Config
Not required to reproduce.
The issue can be reproduced directly by inspecting:
* `rllm/__init__.py`
* `rllm/trainer/algorithms/metrics.py`
Additional Context
Proposed fix:
- Remove `"Runner"` from `rllm/__init__.py::__all__`.
- Update `reduce_metrics_by_trajectory_name()` to collect rewards into lists instead of overwriting previous values.
- Safely ignore `None` rewards during aggregation to avoid invalid metric entries.
This ensures accurate metric reporting and a consistent public API.
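A minimal sketch of the list-based aggregation, assuming the input is a list of dicts shaped like the reproduction example. The real signature and input type of `reduce_metrics_by_trajectory_name()` are not shown in this report, so `collect_rewards_by_name` and `mean_reward_by_name` are hypothetical helpers, not the actual implementation.

```python
from collections import defaultdict


def collect_rewards_by_name(results):
    """Group rewards by trajectory name, skipping entries whose reward is None."""
    rewards = defaultdict(list)
    for item in results:
        reward = item.get("reward")
        if reward is None:
            continue  # ignore missing rewards instead of recording an invalid entry
        rewards[item["trajectory_name"]].append(reward)
    return dict(rewards)


def mean_reward_by_name(results):
    """Average the collected rewards per trajectory name."""
    grouped = collect_rewards_by_name(results)
    return {name: sum(vals) / len(vals) for name, vals in grouped.items()}


# Assumed input shape, taken from the reproduction steps plus one None reward.
results = [
    {"trajectory_name": "task_a", "reward": 0.5},
    {"trajectory_name": "task_a", "reward": 0.8},
    {"trajectory_name": "task_a", "reward": None},
]
print(collect_rewards_by_name(results))  # {'task_a': [0.5, 0.8]}
```

Appending to a list keeps every reward available for whichever statistic is computed afterwards (mean, min, max), whereas assigning to a plain dict key keeps only the last value seen.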