This repository follows a disciplined, maintainable, and CPU-first workflow. Agents must keep changes small, readable, and well tested, while aligning with Dribia's coding standards.
- CPU-only: all training and scripts must default to CPU execution. Do not assume GPU availability.
- Package management: use uv / uvx for environments, dependencies, and tool execution (e.g., `uv venv`, `uv run`, `uvx ...`). Do not use `uv pip`.
- Testing: use pytest for unit tests (`make test`, `make test-unit`).
- Linting/formatting: use ruff (lint + format) and tombi for TOML formatting (`make lint`, `make format`).
- Type checking: use ty (`make lint`).
- Docs: MkDocs with Material theme (`mkdocs.yml`, `docs/`).
- Write Pythonic, maintainable code emphasizing readability, composability, and clarity.
- Follow PEP 8 naming conventions and use explicit, descriptive names.
- Add module header docstrings and Google-style docstrings for functions and classes.
- Prefer vectorized pandas operations; avoid `apply` when possible.
- Avoid `inplace=True` mutations; return new objects instead.
- Keep code in English for identifiers, comments, and documentation.
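A minimal sketch of these conventions, with a hypothetical helper module and function (the names and columns are illustrative, not part of the library):

```python
"""Feature preparation helpers (hypothetical example module)."""

import pandas as pd


def add_lag_features(frame: pd.DataFrame, column: str, lags: list[int]) -> pd.DataFrame:
    """Add shifted copies of a column as lag features.

    Args:
        frame: Time-ordered input data, one row per timestamp.
        column: Name of the column to lag.
        lags: Positive row offsets to shift the column by.

    Returns:
        A new DataFrame with one ``<column>_lag<k>`` column per lag; the input
        is never mutated in place.
    """
    lagged = {f"{column}_lag{lag}": frame[column].shift(lag) for lag in lags}
    return frame.assign(**lagged)  # vectorized shift, no apply, no inplace=True
```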
- Keep tests organized: one test file per module, named `tests/test_<module>.py`.
- When testing main library code, prefer property-based testing with Hypothesis for invariant-driven behavior such as tensor shapes, default-resolution rules, configuration contracts, and deterministic transformations. Keep example-based tests for fixed API facts, imports/exports, object identity, and heavy integration behavior where property-based testing would add more noise than value.
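A hedged sketch of that split, with a hypothetical `mylib.scaling.scale_series` function standing in for real library code:

```python
# tests/test_scaling.py — module and import names below are hypothetical placeholders.
import numpy as np
from hypothesis import given
from hypothesis import strategies as st
from hypothesis.extra.numpy import arrays

import mylib
from mylib.scaling import scale_series


@given(arrays(np.float64, st.integers(1, 256), elements=st.floats(-1e6, 1e6)))
def test_scale_series_preserves_shape(values):
    """Property-based: scaling must never change the input shape."""
    assert scale_series(values).shape == values.shape


def test_scale_series_is_exported():
    """Example-based: a fixed API fact that a property would not clarify."""
    assert "scale_series" in mylib.__all__
```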
- Before implementing custom logic, inspect the upstream library source and prefer built-in APIs (for example, `cross_validation` or native baseline models in NeuralForecast / StatsForecast) over handwritten benchmark orchestration whenever they provide the needed behavior.
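For instance, StatsForecast ships a built-in `cross_validation` method for rolling-window backtesting; this sketch assumes a recent StatsForecast release and a long-format frame with `unique_id`, `ds`, and `y` columns (the data path is hypothetical):

```python
import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive

df = pd.read_parquet("data/train.parquet")  # hypothetical path, long format

sf = StatsForecast(models=[SeasonalNaive(season_length=7)], freq="D")
cv_df = sf.cross_validation(df=df, h=7, n_windows=3, step_size=7)
# cv_df pairs each window's forecasts with the observed `y`, so MAE-style
# comparisons need no handwritten backtest loop.
```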
- TDD red/green only:
  - Write unit tests first, mocking external dependencies as needed.
  - Run tests to confirm they fail (red).
  - Implement the code to make tests pass (green).
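A minimal sketch of the red stage, using a hypothetical `mylib.report.render_report` function and a mocked storage dependency (the first run fails because the function does not exist yet):

```python
# tests/test_report.py — written before the implementation (red stage).
from unittest.mock import MagicMock

from mylib.report import render_report  # hypothetical; import fails until implemented


def test_render_report_reads_metrics_from_storage():
    storage = MagicMock()
    storage.load.return_value = {"mae": 1.5}

    text = render_report(storage)

    storage.load.assert_called_once()
    assert "1.5" in text
```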
- Keep unit and integration tests separate:
  - Treat tests marked `integration` as heavy, optional validation.
  - The default quality gate is unit-only: use `make test` or `make test-unit`.
  - Run integration tests with `make test-integration` only when they are justified by the change.
  - In every substantial update, explicitly decide whether integration tests are worth running and state that decision in your summary.
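A sketch of how such a test is tagged, assuming the `integration` marker is registered in the pytest configuration:

```python
import pytest


@pytest.mark.integration  # skipped by the default unit-only gate
def test_end_to_end_training_roundtrip(tmp_path):
    """Heavy cross-module check; run via `make test-integration` when justified."""
    ...
```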
- After each change:
  - Run the standard fast quality gates: `make format`, `make lint`, `make test`.
  - Run `uv run --frozen python scripts/check_forecast_mae.py` only when substantive model changes are made, and display the MAE comparison table.
  - Run `make test-integration` only for changes that affect end-to-end training, NeuralForecast integration, CLI workflows, or other cross-module behavior that unit tests cannot cover well.
- When everything passes:
  - Update README and relevant documentation using the `write-docs` skill.
  - Ensure the README notes that changes are agent-made when relevant.
- Use feature branches for new work:
  - Whenever a new feature is requested, create a dedicated branch before making changes.
  - Use a clear, descriptive branch name tied to the feature.
  - Keep feature work isolated on that branch until the feature is considered complete.
- Commit after each relevant completed change:
  - Use the `commit` skill and follow Conventional Commits.
  - Do not leave substantial finished code or documentation changes uncommitted.
  - After a relevant change clears its intended quality gates, create a commit before moving on to the next substantial task.
- Close feature branches deliberately:
  - When a feature is considered complete, explicitly prompt the user for validation before closing the work.
  - After validation, squash the branch commits into a clean final commit.
  - Provide a concise summary of what is being merged.
  - Merge the squashed result and delete the feature branch once closure is complete.
- No cheating or shortcuts when training or evaluating models.
- Do not peek at future data, leak labels, or use the target horizon to initialize model states.
- Do not seed models with ground-truth values beyond the input window.
- All baselines and models must use the same train/validation/test splits.
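A minimal sketch of a leakage-free chronological split shared by every model; the file path, column names, and cutoff dates are illustrative assumptions:

```python
import pandas as pd

df = pd.read_parquet("data/series.parquet").sort_values("ds")  # hypothetical path

# Fixed chronological cutoffs: nothing at or beyond a cutoff is visible earlier.
val_cutoff = pd.Timestamp("2024-01-01")
test_cutoff = pd.Timestamp("2024-07-01")

train = df[df["ds"] < val_cutoff]
valid = df[(df["ds"] >= val_cutoff) & (df["ds"] < test_cutoff)]
test = df[df["ds"] >= test_cutoff]
# Every baseline and candidate model is fit on `train`, tuned on `valid`, and
# scored once on `test`; none is initialized with values from the test horizon.
```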
- When creating or modifying CLIs, use the `create-cli` skill and ensure the interface follows consistent CLI UX standards (clear help text, sensible defaults, subcommands where appropriate).
- CLI rendering: all CLI output must use Rich.
- Scripts/entrypoints: any script must be implemented with Typer.
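A minimal sketch of the expected entrypoint shape (a Typer command rendering through Rich); the script name, command, option, and table rows are illustrative placeholders:

```python
# scripts/example_cli.py — hypothetical entrypoint.
import typer
from rich.console import Console
from rich.table import Table

app = typer.Typer(help="Summarize forecast runs (CPU-only by default).")
console = Console()


@app.command()
def summarize(horizon: int = typer.Option(7, help="Forecast horizon in periods.")) -> None:
    """Render the summary with Rich instead of bare print()."""
    table = Table(title=f"Forecast summary (h={horizon})")
    table.add_column("model")
    table.add_column("MAE", justify="right")
    table.add_row("baseline", "n/a")  # placeholder row; fill from real results
    console.print(table)


if __name__ == "__main__":
    app()
```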
- Prefer Matplotlib for charts, diagnostics, and generated reports across the repository.
- For HTML outputs, embed Matplotlib figures as static images rather than relying on browser-side chart runtimes.
- Keep figures readable, reproducible, scriptable, and suitable for docs, reports, and offline review.
- Avoid introducing Altair for new visualization work in this repository.
- When updating existing plots, prefer migrating them toward Matplotlib if the change is substantial or the plot is user-facing.
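A hedged sketch of one way to embed a Matplotlib figure as a static image in generated HTML; the plotted values are placeholders:

```python
import base64
import io

import matplotlib

matplotlib.use("Agg")  # headless backend: reproducible, CPU-only rendering
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot([0, 1, 2, 3], [1.0, 0.8, 0.9, 0.7], label="metric per window")  # placeholder data
ax.set_xlabel("window")
ax.set_ylabel("metric")
ax.legend()
fig.tight_layout()

buffer = io.BytesIO()
fig.savefig(buffer, format="png", dpi=150)
encoded = base64.b64encode(buffer.getvalue()).decode("ascii")
html_snippet = f'<img alt="metric per window" src="data:image/png;base64,{encoded}"/>'
plt.close(fig)
```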
- Keep README and docs accurate and concise.
- Ensure any agent-driven change is reflected in documentation where it matters.
- After each run, refresh affected docs with MkDocs-friendly formatting (headings, lists, code blocks) and follow the repo's MkDocs conventions.
- The repository maintains two long-lived branches with different purposes:
  - `benchmark`: full benchmarking, tuning, experiment scripts, benchmark-oriented tests, and workflow docs.
  - `main`: release-oriented library branch with publishable package code, curated docs, and published benchmark result pages, but without benchmark-generation scripts or related scaffolding.
- Agents must treat `benchmark` as the source branch for benchmark workflow development.
- When preparing `main`, agents should curate from `benchmark` rather than reimplementing content independently.
- On `main`, keep benchmark result reports and images when they support library documentation, but remove benchmark orchestration scripts, tuning scripts, and benchmark-only test scaffolding.
- On `main`, README and docs must clearly state that full benchmarking and tuning workflows live on the `benchmark` branch.
- When curating `main`, preserve the library package, core library tests, and user-facing docs first; remove only workflow machinery that is not needed for the release-oriented branch.
- Treat `benchmark` as the canonical source branch for any release preparation. `main` must not be ahead of `benchmark` in shared library functionality, shared workflows, or shared metadata.
- Do not implement release changes independently on `main` when the same change belongs on `benchmark` first.
- Classify branch differences into three buckets before curating `main`:
  - shared files: files that should stay aligned across both branches, such as package code, shared tests, shared workflow files, and common metadata;
  - benchmark-only files: scripts, tuning artifacts, benchmark-only tests, and workflow docs that should remain only on `benchmark`;
  - main-curated files: README and user-facing docs that may be rewritten on `main` to explain that full workflows live on `benchmark`.
- When generating a release from `benchmark` to `main`, follow this order:
  - update and validate the intended changes on `benchmark` first;
  - diff `benchmark` against `main` and confirm that every shared-file difference is intentional;
  - bring shared files on `main` back in sync with `benchmark` before curating branch-specific removals;
  - remove benchmark-only files from the `main` candidate rather than reimplementing them differently;
  - keep curated benchmark result pages and images on `main` only when they are copied from `benchmark` intentionally, not regenerated independently on `main`;
  - update `main` README and docs so they explicitly point users to `benchmark` for reproducible workflows.
- Before finalizing release work, verify that:
  - `main` is a curated subset of `benchmark` plus intentional main-only documentation wording;
  - `main` is not ahead of `benchmark` in shared code or shared repository configuration;
  - the remaining branch diff is explained entirely by benchmark-only removals or approved main-doc curation.
- Always generate logs for script and command executions.
- Maintain a rolling log with a maximum size of 5 MB (rotate or truncate as needed).
- Follow Dribia logging practices: use the standard `logging` module (no bare `print` for operational logs), apply consistent log levels, and prefer structured, readable messages that satisfy Ruff logging rules.
- If a script must emit output for human consumption, keep it concise and mirror it to the log file.
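A minimal sketch of a size-capped rolling log in line with these rules; the log path and the logged values are illustrative assumptions:

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path

Path("logs").mkdir(exist_ok=True)

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

handler = RotatingFileHandler(
    "logs/run.log",            # hypothetical path
    maxBytes=5 * 1024 * 1024,  # rotate at 5 MB
    backupCount=1,
)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
logger.addHandler(handler)

# Ruff-friendly logging call: lazy %-style arguments instead of an f-string.
logger.info("run finished with %d series processed", 12)  # illustrative value
```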