This repository was archived by the owner on Oct 7, 2024. It is now read-only.
Calculate best episode using full episode return in cartpole_swingup.
Return is non-monotonic in this problem; before this change, the best-episode calculation cherry-picked the peak of the return during the episode.
Also applied the same change to base cartpole for consistency and efficiency; cartpole return is monotonic, so the old behavior was not a bug there.
PiperOrigin-RevId: 308033113
Change-Id: I9add00d41f8e87d518e00c3fef9cd9ad7ad18d0b
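The difference between the peak of the running return and the full episode return, as described in the commit message, can be illustrated with a minimal sketch. The function names and reward sequences below are hypothetical and are not taken from the repository; they only show why the two quantities disagree when return is non-monotonic and agree when it is monotonic.

```python
from itertools import accumulate


def full_episode_return(rewards):
    """Sum of all rewards in the episode (the quantity the commit switches to)."""
    return sum(rewards)


def peak_running_return(rewards):
    """Maximum of the cumulative return over the episode (non-empty rewards assumed)."""
    return max(accumulate(rewards))


# Hypothetical rewards: negative steps make the running return non-monotonic.
swingup_like = [1.0, 1.0, -0.5, -0.5]
# Hypothetical rewards: all non-negative, so the running return is monotonic.
balance_like = [1.0, 1.0, 1.0]

print(peak_running_return(swingup_like), full_episode_return(swingup_like))
print(peak_running_return(balance_like), full_episode_return(balance_like))
```

With non-negative rewards the peak of the running return is always reached at the final step, so both measures coincide, which is consistent with the commit's note that the base cartpole case was not a bug.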