Weaken num_new_engines into has_new_engines by fzyzcjy · Pull Request #938 · radixark/miles

fzyzcjy · 2026-04-07T09:17:57Z

To prepare for multiple start() calls in one step.

gemini-code-assist

Code Review

This pull request refactors the rollout engine tracking logic by replacing the integer-based num_new_engines with a boolean has_new_engines across the FSDP, Megatron, and Ray rollout modules. The start_engines method was also modified to return a tuple containing initialization handles and the count of new engines. Review feedback identifies several instances where docstrings, comments, and assertion messages were not updated to reflect these variable name and return type changes.
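One plausible reading of the motivation: once start() can run more than once in a step, a per-call count is awkward to store on the object, while a boolean can simply be OR-ed across calls. The sketch below illustrates that idea only. `EngineTracker`, `record_start`, and `clear` are hypothetical names and not taken from the miles codebase; only `has_new_engines` and `curr_num_new_engines` come from the PR.

```python
from dataclasses import dataclass


@dataclass
class EngineTracker:
    """Hypothetical stand-in for a server group's bookkeeping (not the miles class)."""

    has_new_engines: bool = False

    def record_start(self, curr_num_new_engines: int) -> None:
        # A flag can be OR-ed across several start() calls in one step, so there is
        # no need to decide whether per-call counts should be summed or overwritten.
        self.has_new_engines = self.has_new_engines or curr_num_new_engines > 0

    def clear(self) -> None:
        # Assumed to be called once the new engines have received updated weights.
        self.has_new_engines = False


tracker = EngineTracker()
tracker.record_start(2)  # first start() in the step brought up two engines
tracker.record_start(0)  # second start() brought up none; the flag stays set
print(tracker.has_new_engines)
```

The per-call count stays a local value (as `curr_num_new_engines` does in the diff), so it can still be logged or asserted against without being stored as state.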

gemini-code-assist · 2026-04-07T09:20:31Z

miles/ray/rollout/server_group.py

-        logger.info(f"Recovered {self.num_new_engines} dead rollout engines (worker_type={self.worker_type})")
-        assert self.num_new_engines == len(dead_indices), "num_new_engines does not match dead_indices length"
+        logger.info(f"Recovered {curr_num_new_engines} dead rollout engines (worker_type={self.worker_type})")
+        assert curr_num_new_engines == len(dead_indices), "curr_num_new_engines does not match dead_indices length"


The assertion message still refers to num_new_engines. It should be updated to match the new variable name curr_num_new_engines for consistency.

Suggested change

assert curr_num_new_engines == len(dead_indices), "curr_num_new_engines does not match dead_indices length"


gemini-code-assist · 2026-04-07T09:20:31Z

miles/ray/rollout/rollout_manager.py

+    def clear_updatable_has_new_engines(self):
+        # when fault tolerance is not enabled, we need to manually clear has_new_engines after update_weights


The comment on line 199 still refers to num_new_engines. It should be updated to has_new_engines to reflect the recent changes.

Suggested change

    def clear_updatable_has_new_engines(self):
        # when fault tolerance is not enabled, we need to manually clear has_new_engines after update_weights

gemini-code-assist · 2026-04-07T09:20:31Z

miles/ray/rollout/server_group.py

+    def start_engines(self, port_cursors: PortCursors) -> tuple[list, int]:
        """Create Ray actors, allocate ports, and fire ``engine.init()`` without waiting.

-        Returns ``(init_handles, port_cursors)`` where *init_handles* is a list
+        Returns ``(init_handles, curr_num_new_engines)`` where *init_handles* is a list
        of Ray ObjectRefs and *port_cursors* maps node index -> next free port.
        """


The docstring for start_engines is outdated. It incorrectly states that the method returns (init_handles, port_cursors), whereas it now returns (init_handles, curr_num_new_engines). Additionally, port_cursors is modified in-place rather than returned.

Suggested change

    def start_engines(self, port_cursors: PortCursors) -> tuple[list, int]:
        """Create Ray actors, allocate ports, and fire ``engine.init()`` without waiting.

        Returns ``(init_handles, curr_num_new_engines)`` where *init_handles* is a list
        of Ray ObjectRefs and *curr_num_new_engines* is the number of newly started engines.
        """
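The contract described in this comment (return the handles and the count, mutate `port_cursors` in place) can be sketched without Ray as follows. This is an illustration under stated assumptions, not the miles implementation: `start_engines_sketch`, the `engines_per_node` parameter, the base port 30000, and the string handles standing in for Ray ObjectRefs are all hypothetical, and `PortCursors` is assumed to be a plain dict from node index to next free port.

```python
PortCursors = dict[int, int]  # assumed shape: node index -> next free port


def start_engines_sketch(
    dead_indices: list[int], port_cursors: PortCursors, engines_per_node: int = 2
) -> tuple[list[str], int]:
    """Return (init_handles, curr_num_new_engines); port_cursors is updated in place."""
    init_handles: list[str] = []
    for idx in dead_indices:
        node = idx // engines_per_node
        port = port_cursors.setdefault(node, 30000)
        # Advance the caller's cursor directly; the dict is not part of the return value.
        port_cursors[node] = port + 1
        # Placeholder string where the real code would hold a Ray ObjectRef.
        init_handles.append(f"engine-{idx}@node{node}:{port}")
    return init_handles, len(init_handles)


cursors: PortCursors = {0: 30000}
dead = [0, 1, 2]
handles, curr_num_new_engines = start_engines_sketch(dead, cursors)
assert curr_num_new_engines == len(dead), "curr_num_new_engines does not match dead_indices length"
```

Because the caller's dict is mutated, a second call with the same `cursors` continues from the advanced ports, which is why the docstring no longer needs to list `port_cursors` as a return value.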

fzyzcjy added 3 commits April 7, 2026 17:13

more

39a7d37

more

43bcb16

more

1d0ad2d

fzyzcjy requested review from maocheng23, yueming-yuan and yushengsu-thu as code owners April 7, 2026 09:17

more

d37f480

gemini-code-assist bot reviewed Apr 7, 2026

fzyzcjy wants to merge 4 commits into rollout_ft/20 from rollout_ft/21
