Summary
- A rolling leaderboard to support continuous submissions.
- Rule updates aimed at cost efficiency, e.g., removing held-out workloads, limiting to 3 repetition studies, and adjusting workload runtime budgets based on competition results.
- JIT-sharding for JAX workloads.
- Important bug fixes (e.g., batch norm behavior) and a more flexible API (e.g., a new prepare_for_eval function).
What's Changed
An improved and streamlined version of the benchmark, including important bug fixes, API improvements, and benchmark protocol changes based on lessons learned from the first competition.
Added
- [Code, Rules] Updated API to allow for a `prepare_for_eval` function (PR/Issue). A minimal sketch of how a submission might use this hook follows this list.
- [Docs] Document default dropout values for each workload (PR/Issue).
- [Docs] Unified versioning policy section (PR).
- [Code] Add the ability to change dropout values during training (PR/Issue).
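
For orientation, here is a minimal, hypothetical sketch of a submission-side `prepare_for_eval` hook. The parameter list mirrors `update_params` and is an assumption of this sketch, not the authoritative signature (see the submission API documentation); the `ema_params` entry is likewise hypothetical.

```python
# Hypothetical sketch only: the exact prepare_for_eval signature is defined by
# the AlgoPerf submission API; this parameter list mirrors update_params.
def prepare_for_eval(workload,
                     current_param_container,
                     current_params_types,
                     model_state,
                     hyperparameters,
                     loss_type,
                     optimizer_state,
                     eval_results,
                     global_step,
                     rng):
  """Called before each evaluation, letting the submission swap in
  evaluation-specific parameters (e.g., averaged weights)."""
  # Assumes the submission stored an optional 'ema_params' entry in its own
  # optimizer_state dict during update_params; falls back to the live weights.
  eval_params = optimizer_state.get('ema_params', current_param_container)
  return optimizer_state, eval_params, model_state
```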
Changed/Removed
- [Code, Docs] Rename package to `algoperf` (PR).
- [Code, Docs] Switch to `ruff` for linting and formatting (PR).
- [Code, Rules] Pass `train_state` to the `update_params` function (PR/Issue).
- [Code, Rules] Reduced number of studies from 5 to 3 (PR). See also Section 5.1 in our results paper.
- [Code, Rules] Remove held-out workloads from the benchmark (PR). See also Section 5.1 in our results paper.
- [Code] Remove sacrebleu dependency (PR).
- [Code] Switch to `pyproject.toml` for package management (PR).
- [Code] Update Python version to 3.11 and dependencies accordingly (PR/Issue).
- [Rules] Modify the runtime budgets and step hints for each workload (PR/Issue). See also Section 5.1 in our results paper.
- [Code] Automatically determine the package version via the latest GitHub tag (PR).
- [Code, Docs] Move all algorithms into a dedicated `algorithms` directory (PR).
- [Code] Migrate from `pmap` to `jit` in JAX for better performance and scalability (PR). A minimal sketch of the `jit` + sharding pattern follows this list.
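
Since the summary highlights the JIT-sharding migration, the snippet below is a minimal, self-contained sketch of the general `jit` + sharding pattern (replicated parameters, batch split across devices). It is illustrative only and not the benchmark's actual training loop; the model and shapes are placeholders.

```python
# Illustrative data-parallel pattern with jax.jit + NamedSharding (placeholder
# model and shapes; not the benchmark's training loop).
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P

mesh = Mesh(jax.devices(), axis_names=('batch',))
replicated = NamedSharding(mesh, P())            # parameters: one copy per device
data_parallel = NamedSharding(mesh, P('batch'))  # batch: split along leading axis

params = jax.device_put(jnp.ones((32, 8)), replicated)
batch = jax.device_put(jnp.ones((64, 32)), data_parallel)

@jax.jit
def forward(params, batch):
  # Stand-in computation; jit compiles one SPMD program across devices and
  # inserts the cross-device communication that pmap required explicitly.
  return jnp.mean(batch @ params)

loss = forward(params, batch)
```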
Fixed
- [Code] Batch norm bug (PR/PR/Issue).
- [Code] Fix a bug that could give a free evaluation to a submission that exceeds `max_runtime` (PR/Issue). A short illustrative sketch of the intended behavior follows this list.
- [Code] Fix so that models in the self-tuning ruleset are always initialized with the default dropout values (PR/PR/Issue).
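
To make the `max_runtime` fix concrete, the fragment below sketches the intended protocol behavior using hypothetical names (`accumulated_submission_time` and `run_eval` are not the harness's actual identifiers): once the submission's time budget is spent, no further evaluation is scored.

```python
# Hypothetical sketch of the budget check behind the max_runtime fix; names
# are illustrative, not the benchmark harness's actual code.
def maybe_eval(accumulated_submission_time: float,
               max_runtime: float,
               run_eval) -> dict | None:
  if accumulated_submission_time > max_runtime:
    # Out of budget: the submission gets no "free" extra evaluation.
    return None
  return run_eval()  # Still within budget, so the evaluation counts.
```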
Full Changelog: algoperf-benchmark-0.5.0...algoperf-benchmark-0.6.0