
Add URL option to fetch packages from GitHub tarballs #968

Merged
rapids-bot[bot] merged 8 commits into rapidsai:main from bdice:support-github-tarballs
Feb 17, 2026


Contributor

@bdice bdice commented Jan 21, 2026

Summary

  • Adds support for url and url_hash fields in versions.json as an alternative to git_url/git_tag
  • Packages can now use either git mode (git_url + git_tag) or url mode (url + url_hash)
  • URL mode enables faster downloads via tarballs with SHA256 integrity verification
  • Overrides can switch between modes (must provide all fields for the chosen mode)
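The two-mode rule above can be illustrated with a minimal Python sketch. This is not the PR's CMake implementation; the function name `fetch_mode` and the error messages are hypothetical, and only the four field names come from the summary.

```python
# Hypothetical sketch of the "exactly one complete mode" rule.
# Only the field names (git_url, git_tag, url, url_hash) come from the PR.
GIT_FIELDS = ("git_url", "git_tag")
URL_FIELDS = ("url", "url_hash")


def fetch_mode(entry):
    """Return "git" or "url" for a package entry, or raise ValueError."""
    has_git = [field in entry for field in GIT_FIELDS]
    has_url = [field in entry for field in URL_FIELDS]
    if any(has_git) and any(has_url):
        raise ValueError("entry mixes git mode and url mode fields")
    if all(has_git):
        return "git"
    if all(has_url):
        return "url"
    # Covers an empty entry and half-specified modes (e.g. url without url_hash).
    raise ValueError("entry needs git_url + git_tag or url + url_hash")
```

An entry with only `url` (no `url_hash`) is rejected here, which mirrors the "must provide all fields for the chosen mode" requirement.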

Size comparison for CCCL

| Method | Download Size | Disk Size |
|---|---|---|
| Tarball | 9.4 MB | 68 MB |
| Shallow clone (--depth 1) | ~14 MB | 83 MB |
| Full clone (git_shallow: false) | ~140 MB | 210 MB |

The tarball is 15x smaller to download than a full clone and 3x smaller on disk.

Validation

  • Added unit tests for rapids_cpm_package_details_internal with url/url_hash mode
  • Added unit tests for rapids_cpm_package_info to verify URL/URL_HASH vs GIT_REPOSITORY/GIT_TAG arguments
  • All 6 new url-mode tests pass
  • All 27 existing CCCL tests pass (including override tests that switch from url to git mode)

When use_github_tarball is set to true in versions.json, the package
will be fetched using a tarball URL instead of git clone. The tarball
URL is automatically constructed from the git_url and git_tag fields.

This enables faster downloads since tarballs don't require git history
and can be cached more efficiently by CDNs.
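As a rough illustration of how a tarball URL might be derived from git_url and git_tag, here is a Python sketch. It assumes GitHub's `https://github.com/<owner>/<repo>/archive/<ref>.tar.gz` archive pattern; the function name and the example tag are hypothetical, and the construction actually used in package_details.cmake is not shown in this conversation.

```python
# Hypothetical sketch: derive a GitHub archive URL from a clone URL and a ref.
def github_tarball_url(git_url, git_tag):
    prefix = "https://github.com/"
    if not git_url.startswith(prefix):
        raise ValueError("not a GitHub URL: " + git_url)
    # Strip a trailing slash and the optional ".git" suffix from "owner/repo.git".
    repo_path = git_url[len(prefix):].rstrip("/")
    if repo_path.endswith(".git"):
        repo_path = repo_path[: -len(".git")]
    return f"{prefix}{repo_path}/archive/{git_tag}.tar.gz"


# "v0.0.0" is a placeholder tag, not a value from versions.json.
print(github_tarball_url("https://github.com/NVIDIA/cccl.git", "v0.0.0"))
```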

Changes:
- package_details.cmake: Parse use_github_tarball field and construct
  tarball URL when set
- package_info.cmake: Use URL argument instead of GIT_REPOSITORY/GIT_TAG
  when tag is empty (signaling tarball mode)
- versions.json: Enable use_github_tarball for CCCL
- Add tests for the new feature
@bdice bdice requested a review from a team as a code owner January 21, 2026 18:11
Contributor Author

bdice commented Jan 21, 2026

CPM Package Size Comparison

Comparison of tarball vs git clone sizes for packages in versions.json.

This data shows CCCL is by far our largest dependency. We could do the same optimization for all packages, though.

| Package | Tarball Download | Tarball Extracted | Git Clone (full) | Git .git Dir | Download Savings |
|---|---|---|---|---|---|
| benchmark | 200.0 KB | 878.4 KB | 4.7 MB | 3.7 MB | 95% |
| bs_thread_pool | 80.7 KB | 348.1 KB | 1.3 MB | 604.3 KB | 87% |
| CCCL | 9.4 MB | 44.4 MB | 185.8 MB | 139.5 MB | 93% |
| cuco | 1.0 MB | 4.2 MB | 12.3 MB | 8.1 MB | 87% |
| rapids_logger | 30.4 KB | 109.9 KB | 295.2 KB | 195.9 KB | 85% |
| GTest | 856.9 KB | 3.9 MB | 18.6 MB | 14.7 MB | 94% |
| nvbench | 513.3 KB | 2.4 MB | 4.4 MB | 1.9 MB | 74% |
| nvtx3 | 3.8 MB | 11.9 MB | 19.7 MB | 6.3 MB | 39% |
| rmm | 395.9 KB | 1.4 MB | 9.0 MB | 7.4 MB | 95% |

Total tarball download: 16.3 MB
Total git download (approx): 182.3 MB
Total savings: 91%

Notes:

  • Git clone sizes use git_shallow: false (full history) as configured in versions.json
  • 'Git .git Dir' represents approximate download size for git clone
  • nvcomp excluded (uses proprietary binary distribution)
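The savings figures appear to follow savings = 1 - (tarball download / .git dir size). A short Python sketch of that arithmetic, using the totals and the CCCL row, is below. The formula is an inference from the numbers rather than something stated in the comment, and recomputing other rows from the rounded table values can differ from the listed percentage by a point.

```python
# Assumed formula: percentage saved by downloading the tarball instead of cloning.
def download_savings(tarball_mb, git_mb):
    return 100.0 * (1.0 - tarball_mb / git_mb)


total = download_savings(16.3, 182.3)  # totals from the comment
cccl = download_savings(9.4, 139.5)    # CCCL row: tarball vs .git dir
print(f"total: {total:.0f}%, CCCL: {cccl:.0f}%")  # prints: total: 91%, CCCL: 93%
```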

@bdice bdice self-assigned this Jan 21, 2026
@bdice bdice added the feature request (New feature or request) and non-breaking (Introduces a non-breaking change) labels Jan 21, 2026
Contributor

@robertmaynard robertmaynard left a comment


Instead of hijacking GIT_URL and GIT_TAG, we will want full support for url and url_hash in versions.json.

The JSON parser would therefore require either git_url and git_tag, or url and url_hash. This also provides cleaner rules for overrides (an override can use a different mode but must provide all arguments for that mode).

bdice added 2 commits January 23, 2026 14:37
Replace the use_github_tarball flag with explicit url and url_hash
fields in versions.json. This provides:

- Clear separation between git mode (git_url + git_tag) and url mode
  (url + url_hash)
- Integrity verification via url_hash (SHA256)
- Support for overrides to switch between modes
- Cleaner validation: exactly one mode must be specified

The override logic handles mode switching - when an override provides
git_url/git_tag, it clears any inherited url/url_hash from the default
(and vice versa), allowing overrides to change fetch modes.
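The mode-switching behavior described above can be sketched in Python as follows. This is an illustration under assumptions, not the CMake code in package_details.cmake; `apply_override` and the dictionary representation of an entry are hypothetical.

```python
# Hypothetical sketch of override merging with mode switching.
GIT_FIELDS = ("git_url", "git_tag")
URL_FIELDS = ("url", "url_hash")


def apply_override(default, override):
    """Merge an override onto a default entry, dropping the other mode's fields."""
    merged = dict(default)
    if any(field in override for field in GIT_FIELDS):
        # Override selects git mode: clear inherited url/url_hash.
        for field in URL_FIELDS:
            merged.pop(field, None)
    if any(field in override for field in URL_FIELDS):
        # Override selects url mode: clear inherited git_url/git_tag.
        for field in GIT_FIELDS:
            merged.pop(field, None)
    merged.update(override)
    return merged
```

In this sketch, an override that supplies only part of a new mode leaves an incomplete entry, which a separate validation step would then be expected to reject.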

CCCL now uses url mode with SHA256 hash verification.
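For readers unfamiliar with hash verification, the check amounts to comparing the SHA256 digest of the downloaded tarball against the recorded value. The Python sketch below shows the idea with the standard `hashlib` module. It assumes the expected value is a bare hex digest; the exact url_hash format stored in versions.json is not shown in this conversation, and the function names are hypothetical.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def sha256_matches(data, expected_hex):
    """Compare the SHA256 of in-memory bytes with an expected hex digest."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()
```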
@bdice bdice changed the title Add use_github_tarball option to fetch packages from GitHub tarballs Add URL option to fetch packages from GitHub tarballs Jan 23, 2026
Contributor Author

bdice commented Jan 23, 2026

@robertmaynard I updated the logic. Please let me know if this is the design you had in mind.

@bdice bdice requested a review from robertmaynard January 23, 2026 20:53
Contributor

@robertmaynard robertmaynard left a comment


In addition, we need:

  1. Updates to the JSON docs/ that cover these new properties
  2. Tests for cpm_generate_pins that verify it records url and url_hash correctly

bdice added 2 commits January 26, 2026 13:22
Fix pinning to correctly handle URL mode packages by preserving url and
url_hash fields instead of extracting (incorrect) git info. Add test to
verify URL mode packages generate valid pinned versions.
- Remove double negatives in package_details.cmake override detection
- Add error handling for incomplete modes (url without hash, hash without url)
- Add matching docs in git_url section about url mode alternative
- Simplify cpm_package_info-url-mode.cmake test with for loops
- Add tests for mode switching: url-to-git and git-to-url overrides
- Add tests for invalid incomplete modes (SHOULD_FAIL tests)
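The pinning fix can be pictured with a small Python sketch: a url-mode entry keeps its url and url_hash unchanged, while a git-mode entry records a resolved commit. This is only an illustration; `pinned_entry`, its `resolved_git_sha` parameter, and the output layout are hypothetical and do not reflect the actual cpm_generate_pins implementation.

```python
# Hypothetical sketch of pin generation for both fetch modes.
def pinned_entry(entry, resolved_git_sha=None):
    pinned = {"version": entry["version"]}
    if "url" in entry:
        # url mode: the URL plus hash already identifies exact content.
        pinned["url"] = entry["url"]
        pinned["url_hash"] = entry["url_hash"]
    else:
        # git mode: prefer the resolved commit over a movable tag or branch.
        pinned["git_url"] = entry["git_url"]
        pinned["git_tag"] = resolved_git_sha or entry["git_tag"]
    return pinned
```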
@bdice bdice requested a review from robertmaynard February 10, 2026 21:31
Contributor Author

bdice commented Feb 17, 2026

/merge

@rapids-bot rapids-bot bot merged commit a8e2496 into rapidsai:main Feb 17, 2026
18 checks passed
