Improve time measurement in benchmark #822
Conversation
📝 Walkthrough

Replace per-iteration wall-clock timing with graph-capture-aware benchmark timing: add `Example.benchmark_time`, introduce a Warp kernel to apply random joint targets, capture/launch CUDA graphs where available, and use graph-captured timing for ASV benchmarks instead of direct `time.time()` measurements.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    actor ASV as ASV Benchmark
    participant Track as track_simulate()
    participant EX as Example
    participant GPU as Device / CUDA Graph
    ASV->>Track: run benchmark
    Track->>EX: reset()
    loop per-benchmark-step
        Track->>GPU: synchronize()
        Track->>EX: step()
        Note right of EX: sync -> start_time<br/>(capture & launch graph) or simulate<br/>end_time -> accumulate benchmark_time
    end
    Track->>GPU: synchronize()
    Track-->>ASV: ms_per_env_step = (EX.benchmark_time / steps) * 1000
```
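For orientation, here is a minimal sketch of the two pieces the diagram references: a Warp kernel that writes random joint targets (so the control update is a kernel launch that CUDA graph capture can record) and a graph-aware timed step. Names such as `apply_random_control`, `use_cuda_graph`, and `benchmark_time` mirror the PR; the `timed_step` helper is a simplified illustration under those assumptions, not the actual implementation.

```python
import time

import warp as wp


@wp.kernel
def apply_random_control(seed: int, joint_target: wp.array(dtype=float)):
    # One thread per joint DOF: write a random target in [-1, 1) so the
    # control update can be baked into the captured CUDA graph.
    tid = wp.tid()
    state = wp.rand_init(seed, tid)
    joint_target[tid] = wp.randf(state, -1.0, 1.0)


def timed_step(example):
    # Synchronize before and after the launch so the wall-clock interval
    # covers only this step's GPU work, whether replayed as a graph or run
    # eagerly, then accumulate into the public benchmark counter.
    wp.synchronize_device()
    start = time.perf_counter()
    if example.use_cuda_graph:
        wp.capture_launch(example.graph)
    else:
        example.simulate()
    wp.synchronize_device()
    example.benchmark_time += time.perf_counter() - start
```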
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
asv/benchmarks/simulation/bench_mujoco.py (5)
**76-96**: Bug: KPI returns only last sample's time but divides by samples

You must accumulate `benchmark_time` across samples before averaging.

Apply this diff:

```diff
     @skip_benchmark_if(wp.get_cuda_device_count() == 0)
     def track_simulate(self, num_envs):
-        for _iter in range(self.samples):
-            example = Example(
+        total = 0.0
+        for _iter in range(self.samples):
+            example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
             )
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total += example.benchmark_time
+        return total * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
```
**148-170**: Bug: KPI Cartpole averages without summing samples

Accumulate time across samples before dividing.

Apply this diff:

```diff
     @skip_benchmark_if(wp.get_cuda_device_count() == 0)
     def track_simulate(self, num_envs):
-        for _iter in range(self.samples):
+        total = 0.0
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total += example.benchmark_time
+        return total * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
```
**222-244**: Bug: KPI G1 averages without summing samples

Accumulate `benchmark_time` across samples.

Apply this diff:

```diff
     @skip_benchmark_if(wp.get_cuda_device_count() == 0)
     def track_simulate(self, num_envs):
-        for _iter in range(self.samples):
+        total = 0.0
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total += example.benchmark_time
+        return total * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
```
**296-318**: Bug: KPI H1 averages without summing samples

Sum per-sample times before averaging.

Apply this diff:

```diff
     @skip_benchmark_if(wp.get_cuda_device_count() == 0)
     def track_simulate(self, num_envs):
-        for _iter in range(self.samples):
+        total = 0.0
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total += example.benchmark_time
+        return total * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
```
**369-391**: Bug: KPI Humanoid averages without summing samples

Accumulate over samples to avoid underreporting.

Apply this diff:

```diff
     @skip_benchmark_if(wp.get_cuda_device_count() == 0)
     def track_simulate(self, num_envs):
-        for _iter in range(self.samples):
+        total = 0.0
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total += example.benchmark_time
+        return total * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
```
🧹 Nitpick comments (1)
newton/examples/example_mujoco.py (1)
**284-294**: Use monotonic high-resolution timer for benchmarking

`time.perf_counter()` (or `perf_counter_ns`) is preferred over `time.time()` to avoid clock adjustments and improve resolution.

Apply this diff:

```diff
-        wp.synchronize_device()
-        start_time = time.time()
+        wp.synchronize_device()
+        start_time = time.perf_counter()
         if self.use_cuda_graph:
             wp.capture_launch(self.graph)
         else:
             self.simulate()
-        wp.synchronize_device()
-        end_time = time.time()
+        wp.synchronize_device()
+        end_time = time.perf_counter()
         self.benchmark_time += end_time - start_time
```
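As a side note, Warp also ships a `wp.ScopedTimer` that wraps the same synchronize-measure-synchronize pattern; a minimal sketch, assuming `example` holds a previously captured graph:

```python
import warp as wp

# With synchronize=True the timer syncs the device on enter and exit, so
# the reported elapsed time (in milliseconds) covers only the GPU work.
with wp.ScopedTimer("graph_step", synchronize=True, print=False) as timer:
    wp.capture_launch(example.graph)
print(f"graph_step: {timer.elapsed:.3f} ms")
```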
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- asv/benchmarks/simulation/bench_mujoco.py (10 hunks)
- newton/examples/example_mujoco.py (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-19T12:27:30.629Z
Learnt from: preist-nvidia
PR: newton-physics/newton#579
File: newton/examples/example_mujoco.py:350-354
Timestamp: 2025-08-19T12:27:30.629Z
Learning: In Newton examples, there's a distinction between solver parameters and Example class attributes. The Example class can have its own use_mujoco attribute for controlling example-level behavior (like CUDA graphs, rendering logic), while the solver uses use_mujoco_cpu for backend selection. These serve different purposes and should not be conflated during API renames.
Applied to files:
newton/examples/example_mujoco.py
📚 Learning: 2025-07-14T03:57:29.670Z
Learnt from: Kenny-Vilella
PR: newton-physics/newton#398
File: newton/examples/example_mujoco.py:352-352
Timestamp: 2025-07-14T03:57:29.670Z
Learning: The use_mujoco option in newton/examples/example_mujoco.py is currently unsupported and causes crashes. The code automatically disables this option with a warning message when users attempt to enable it. This is intentionally kept as a placeholder for future implementation.
Applied to files:
newton/examples/example_mujoco.py
🧬 Code graph analysis (2)

asv/benchmarks/simulation/bench_mujoco.py (1)
- newton/examples/example_mujoco.py (1): step (279-294)

newton/examples/example_mujoco.py (1)
- newton/examples/robot/example_robot_anymal_c_walk.py (1): simulate (188-198)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
🔇 Additional comments (1)

newton/examples/example_mujoco.py (1)

**216**: LGTM: Dedicated `benchmark_time` counter added. A public accumulator is appropriate for downstream benchmarks.
```python
    def track_simulate(self):
        for _ in range(self.num_frames):
            self.example.step()
        wp.synchronize_device()

        return self.example.benchmark_time * 1000 / (self.num_frames * self.example.sim_substeps * self.num_envs)

    track_simulate.unit = "ms/env-step"
```
This is what I started with, but I don't recommend it since `track` benchmarks only run once.
Suggestion is to leave the `Fast*` benchmarks reporting as `time_` benchmarks.
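For context on the trade-off being discussed: ASV times `time_*` methods itself, calling the body repeatedly, while `track_*` methods run once and report whatever number they return. A generic sketch of the two styles (the `do_work` placeholder is hypothetical and stands in for `example.step()`):

```python
import time


def do_work():
    # Hypothetical workload standing in for example.step().
    sum(range(10_000))


class BenchStyles:
    # time_* benchmark: ASV measures and repeats the call itself, so any
    # Python overhead inside the body is part of the reported number.
    def time_workload(self):
        do_work()

    # track_* benchmark: ASV runs this once and records the return value,
    # so the method must do its own sampling and averaging internally.
    def track_workload(self):
        n = 100
        start = time.perf_counter()
        for _ in range(n):
            do_work()
        return (time.perf_counter() - start) / n * 1000

    track_workload.unit = "ms/iteration"
```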
Yep, but that means we measure the time taken by everything, including all the Python code, applying the control, and so on.
A possible approach would be to reimplement the step function here; that way we may get the best of both worlds.
That sounds promising!
Updated the FastAnt benchmark.
If you are OK with it, I will update the other Fast benchmarks.
It is not perfect, but at least the random actuation is included in the CUDA graph, so we get a more robust time measurement.
Changed all the Fast benchmarks
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
asv/benchmarks/simulation/bench_mujoco.py (5)
**99-117**: Average across every sample, not just the final one

After looping over `self.samples`, we only read the last `Example`'s `benchmark_time` while still dividing by `self.samples`, so earlier samples are dropped from the numerator. The same pattern appears in the other KPI benchmarks.

```diff
-        for _iter in range(self.samples):
+        total_time = 0.0
+        sim_substeps = None
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
                 ls_iteration=10,
             )
             for _ in range(self.num_frames):
                 example.step()
-            wp.synchronize_device()
-
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total_time += example.benchmark_time
+            sim_substeps = example.sim_substeps
+            wp.synchronize_device()
+
+        return total_time * 1000 / (self.num_frames * sim_substeps * num_envs * self.samples)
```
**167-186**: Incorporate every sample when computing KPI Cartpole timing

Same issue here: only the last sample's runtime survives, yet the denominator still multiplies by `self.samples`. Please accumulate the per-sample `benchmark_time` before averaging.

```diff
-        for _iter in range(self.samples):
+        total_time = 0.0
+        sim_substeps = None
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
                 ls_iteration=3,
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total_time += example.benchmark_time
+            sim_substeps = example.sim_substeps
+
+        return total_time * 1000 / (self.num_frames * sim_substeps * num_envs * self.samples)
```
**238-257**: Sum benchmark time across KPI G1 samples

Here too, the numerator drops all but the last sample. Please accumulate before dividing by `self.samples`.

```diff
-        for _iter in range(self.samples):
+        total_time = 0.0
+        sim_substeps = None
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
                 ls_iteration=10,
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total_time += example.benchmark_time
+            sim_substeps = example.sim_substeps
+
+        return total_time * 1000 / (self.num_frames * sim_substeps * num_envs * self.samples)
```
**309-328**: Include every KPI H1 sample in the reported average

Identical averaging bug: we divide by `self.samples` but only keep the final iteration's time. Please accumulate `benchmark_time` across the loop.

```diff
-        for _iter in range(self.samples):
+        total_time = 0.0
+        sim_substeps = None
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
                 ls_iteration=10,
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total_time += example.benchmark_time
+            sim_substeps = example.sim_substeps
+
+        return total_time * 1000 / (self.num_frames * sim_substeps * num_envs * self.samples)
```
**379-398**: Aggregate KPI Humanoid runtimes across all samples

Once more, only the last sample contributes to the numerator. Please add each sample's `benchmark_time` before dividing.

```diff
-        for _iter in range(self.samples):
+        total_time = 0.0
+        sim_substeps = None
+        for _iter in range(self.samples):
             example = Example(
                 stage_path=None,
                 robot=self.robot,
                 randomize=True,
                 headless=True,
                 actuation="random",
                 num_envs=num_envs,
                 use_cuda_graph=True,
                 builder=self.builder[num_envs],
                 ls_iteration=15,
             )
             wp.synchronize_device()
             for _ in range(self.num_frames):
                 example.step()
             wp.synchronize_device()
-
-        return example.benchmark_time * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)
+            total_time += example.benchmark_time
+            sim_substeps = example.sim_substeps
+
+        return total_time * 1000 / (self.num_frames * sim_substeps * num_envs * self.samples)
```
🧹 Nitpick comments (1)
newton/examples/example_mujoco.py (1)
**306-315**: Prefer a monotonic clock for benchmarking accuracy

`time.time()` can jump when the system clock is adjusted, which skews the accumulated `benchmark_time`. Switching to `time.perf_counter()` keeps the measurement monotonic and higher resolution.

```diff
-        start_time = time.time()
+        start_time = time.perf_counter()
@@
-        end_time = time.time()
+        end_time = time.perf_counter()
```
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)

- asv/benchmarks/simulation/bench_mujoco.py (7 hunks)
- newton/examples/example_mujoco.py (3 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-07-14T03:57:29.670Z
Learnt from: Kenny-Vilella
PR: newton-physics/newton#398
File: newton/examples/example_mujoco.py:352-352
Timestamp: 2025-07-14T03:57:29.670Z
Learning: The use_mujoco option in newton/examples/example_mujoco.py is currently unsupported and causes crashes. The code automatically disables this option with a warning message when users attempt to enable it. This is intentionally kept as a placeholder for future implementation.
Applied to files:
newton/examples/example_mujoco.py
🧬 Code graph analysis (2)

newton/examples/example_mujoco.py (2)
- newton/examples/robot/example_robot_h1.py (1): simulate (97-108)
- newton/examples/selection/example_selection_materials.py (1): simulate (167-181)

asv/benchmarks/simulation/bench_mujoco.py (3)
- newton/examples/robot/example_robot_anymal_c_walk.py (2): capture (178-186), simulate (188-198)
- newton/_src/sim/state.py (1): joint_dof_count (108-112)
- newton/examples/example_mujoco.py (1): simulate (294-298)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run GPU Benchmarks (Pull Request)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
Actionable comments posted: 5
📜 Review details
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)

- asv/benchmarks/simulation/bench_mujoco.py (7 hunks)
🧰 Additional context used

🧬 Code graph analysis (1)

asv/benchmarks/simulation/bench_mujoco.py (4)
- newton/examples/robot/example_robot_anymal_c_walk.py (2): capture (178-186), simulate (188-198)
- newton/examples/selection/example_selection_cartpole.py (2): capture (126-131), simulate (133-141)
- newton/_src/sim/state.py (1): joint_dof_count (108-112)
- newton/examples/example_mujoco.py (1): simulate (294-298)
⏰ Context from checks skipped due to timeout of 900000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Run GPU Unit Tests on AWS EC2 (Pull Request)
- GitHub Check: Run GPU Benchmarks (Pull Request)
🔇 Additional comments (1)
asv/benchmarks/simulation/bench_mujoco.py (1)
**51-75**: Initialize `self.graph` when graph capture is unavailable.

If mempool capture is off, `time_simulate()` still calls `wp.capture_launch(self.graph)`, but `self.graph` is never assigned. That crashes the benchmark instead of falling back to the pre-recorded graph in the `Example`. Please seed `self.graph` (e.g., from `self.example.graph`) before the availability check and only overwrite it when the recapture succeeds.

```diff
-        cuda_graph_comp = wp.get_device().is_cuda and wp.is_mempool_enabled(wp.get_device())
-        if not cuda_graph_comp:
-            print("Cannot use graph capture. Graph capture is disabled.")
-        else:
+        cuda_graph_comp = wp.get_device().is_cuda and wp.is_mempool_enabled(wp.get_device())
+        self.graph = self.example.graph
+        if not cuda_graph_comp:
+            print("Cannot use graph capture. Graph capture is disabled.")
+        else:
             state = wp.rand_init(self.example.seed)
             with wp.ScopedCapture() as capture:
                 wp.launch(
                     apply_random_control,
                     dim=(self.example.model.joint_dof_count,),
                     inputs=[state],
                     outputs=[self.example.control.joint_target],
                 )
                 self.example.simulate()
-            self.graph = capture.graph
+            self.graph = capture.graph
+            self.example.graph = self.graph
```
```python
        # Recapture the graph with control application included
        cuda_graph_comp = wp.get_device().is_cuda and wp.is_mempool_enabled(wp.get_device())
        if not cuda_graph_comp:
            print("Cannot use graph capture. Graph capture is disabled.")
```
Suggest changing this to raising a `SkipNotImplemented` (`from asv_runner.benchmarks.mark import SkipNotImplemented`).
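A minimal sketch of that suggestion (a hypothetical fragment of the benchmark's `setup`; raising `SkipNotImplemented` during setup makes ASV report the benchmark as skipped rather than silently timing a fallback path):

```python
import warp as wp
from asv_runner.benchmarks.mark import SkipNotImplemented


class FastAnt:  # illustrative fragment, not the full benchmark class
    def setup(self):
        cuda_graph_comp = wp.get_device().is_cuda and wp.is_mempool_enabled(wp.get_device())
        if not cuda_graph_comp:
            # Skip instead of print-and-continue so results stay comparable.
            raise SkipNotImplemented("graph capture unavailable on this device")
```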
```python
        cuda_graph_comp = wp.get_device().is_cuda and wp.is_mempool_enabled(wp.get_device())
        if not cuda_graph_comp:
            print("Cannot use graph capture. Graph capture is disabled.")
        else:
            state = wp.rand_init(self.example.seed)
            with wp.ScopedCapture() as capture:
                wp.launch(
                    apply_random_control,
                    dim=(self.example.model.joint_dof_count,),
                    inputs=[state],
                    outputs=[self.example.control.joint_target],
                )
                self.example.simulate()
            self.graph = capture.graph
```
Might be a good time to define a base class or two for these benchmarks to inherit from, to reuse code like Gilles did for benchmarks such as https://github.com/NVIDIA/warp/blob/main/asv/benchmarks/fem/integrate.py
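A rough sketch of that refactor, under the assumption that the per-robot classes differ only in configuration and that `Example` is importable from newton/examples/example_mujoco.py (the base-class name, `ls_iteration` defaults, and `_make_example` helper below are hypothetical):

```python
import warp as wp


class KpiBenchmarkBase:
    """Shared KPI logic; subclasses pin the robot-specific configuration."""

    robot = None
    ls_iteration = 10
    num_frames = 50
    samples = 4

    def _make_example(self, num_envs):
        # Hypothetical helper collapsing the Example(...) construction
        # repeated across the per-robot benchmarks in this file.
        return Example(
            stage_path=None, robot=self.robot, randomize=True, headless=True,
            actuation="random", num_envs=num_envs, use_cuda_graph=True,
            builder=self.builder[num_envs], ls_iteration=self.ls_iteration,
        )

    def track_simulate(self, num_envs):
        total = 0.0
        for _ in range(self.samples):
            example = self._make_example(num_envs)
            for _ in range(self.num_frames):
                example.step()
            wp.synchronize_device()
            total += example.benchmark_time
        return total * 1000 / (self.num_frames * example.sim_substeps * num_envs * self.samples)

    track_simulate.unit = "ms/env-step"


class KpiHumanoid(KpiBenchmarkBase):
    robot = "humanoid"
    ls_iteration = 15
```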
Description
This will fix #745 and is also related to #652.
Newton Migration Guide

Please ensure the migration guide for warp.sim users is up-to-date with the changes made in this PR.

- `docs/migration.rst` is up to date

Before your PR is "Ready for review"

- Tests are added or updated where needed (`newton/tests/test_examples.py`)
- `pre-commit run -a` passes
Summary by CodeRabbit
New Features
Refactor
Chores