Add URL option to fetch packages from GitHub tarballs #968
rapids-bot[bot] merged 8 commits into rapidsai:main
When use_github_tarball is set to true in versions.json, the package is fetched from a tarball URL instead of via git clone. The tarball URL is constructed automatically from the git_url and git_tag fields. This enables faster downloads, since tarballs don't include git history and can be cached more efficiently by CDNs.

Changes:
- package_details.cmake: Parse the use_github_tarball field and construct the tarball URL when it is set
- package_info.cmake: Use the URL argument instead of GIT_REPOSITORY/GIT_TAG when the tag is empty (signaling tarball mode)
- versions.json: Enable use_github_tarball for CCCL
- Add tests for the new feature
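As a rough illustration of how a tarball URL can be derived from git_url and git_tag, here is a minimal Python sketch. It is not the CMake code from this PR. The helper name github_tarball_url and the tag v2.5.0 are assumptions chosen for the example; the only external fact it relies on is GitHub's /archive/<ref>.tar.gz URL layout.

```python
def github_tarball_url(git_url: str, git_tag: str) -> str:
    """Build a GitHub archive URL from a clone URL and a tag, branch, or commit.

    Hypothetical helper for illustration; not part of rapids-cmake.
    """
    # Drop a trailing slash and an optional ".git" suffix from the clone URL.
    base = git_url.rstrip("/")
    if base.endswith(".git"):
        base = base[: -len(".git")]
    if not base.startswith("https://github.com/"):
        raise ValueError(f"not a GitHub HTTPS URL: {git_url}")
    # GitHub serves a gzipped tarball of a ref at /archive/<ref>.tar.gz.
    return f"{base}/archive/{git_tag}.tar.gz"


# Example values only; the tag is an assumption, not taken from versions.json.
url = github_tarball_url("https://github.com/NVIDIA/cccl.git", "v2.5.0")
print(url)
```

A derived URL like this carries no integrity hash, so a consumer has to trust whatever the server returns for that ref.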
CPM Package Size Comparison
[Table: tarball vs. git clone sizes per package]
This data shows CCCL is by far our largest dependency. We could apply the same optimization to all packages, though.
Total tarball download: 16.3 MB
robertmaynard left a comment
Instead of hijacking GIT_URL and GIT_TAG, we will want full support for url and url_hash in versions.json.
The json parser would therefore require either git_url and git_tag, or url and url_hash. This also provides cleaner rules for overrides (an override can use a different mode but must provide all arguments for that mode).
Replace the use_github_tarball flag with explicit url and url_hash fields in versions.json. This provides:
- Clear separation between git mode (git_url + git_tag) and url mode (url + url_hash)
- Integrity verification via url_hash (SHA256)
- Support for overrides to switch between modes
- Cleaner validation: exactly one mode must be specified

The override logic handles mode switching: when an override provides git_url/git_tag, it clears any inherited url/url_hash from the default (and vice versa), allowing overrides to change fetch modes.

CCCL now uses url mode with SHA256 hash verification.
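The mode-switching rule for overrides can be sketched in Python as follows. This only illustrates the described behaviour and is not the CMake implementation; apply_override, the example URL, and the placeholder hash are all assumptions.

```python
GIT_KEYS = ("git_url", "git_tag")
URL_KEYS = ("url", "url_hash")


def apply_override(default: dict, override: dict) -> dict:
    """Merge an override entry onto a default entry, letting it switch fetch mode.

    Hypothetical helper for illustration; not part of rapids-cmake.
    """
    merged = dict(default)
    # An override that touches one mode drops the inherited fields of the other mode.
    if any(key in override for key in GIT_KEYS):
        for key in URL_KEYS:
            merged.pop(key, None)
    if any(key in override for key in URL_KEYS):
        for key in GIT_KEYS:
            merged.pop(key, None)
    # Remaining fields (such as version) are inherited unless overridden.
    merged.update(override)
    return merged


# Placeholder values: the URL and hash below are not real artifacts.
default_entry = {
    "version": "2.5.0",
    "url": "https://example.com/cccl.tar.gz",
    "url_hash": "0" * 64,
}
override_entry = {"git_url": "https://github.com/NVIDIA/cccl.git", "git_tag": "main"}

merged = apply_override(default_entry, override_entry)
print(sorted(merged))
```

Clearing the inherited fields before merging is what lets a url-mode default be overridden by a git-mode entry without leaving a stale url/url_hash pair behind.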
@robertmaynard I updated the logic. Please let me know if this is the design you had in mind.
robertmaynard left a comment
In addition we need:
- Updates to the json docs/ that cover these new properties
- Tests for cpm_generate_pins that verify it records url and url_hash correctly
Fix pinning to correctly handle URL mode packages by preserving url and url_hash fields instead of extracting (incorrect) git info. Add test to verify URL mode packages generate valid pinned versions.
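The intent of the pinning fix can be illustrated with a small Python sketch. It assumes that a git-mode pin replaces git_tag with the resolved commit SHA, while a url-mode pin copies url and url_hash through unchanged. The function name pinned_entry and all example values are hypothetical.

```python
from typing import Optional


def pinned_entry(entry: dict, resolved_sha: Optional[str] = None) -> dict:
    """Produce a pinned copy of a package entry (illustrative sketch only)."""
    pinned = {"version": entry["version"]}
    if "url" in entry:
        # URL mode: the hash already fixes the tarball contents, so keep both fields.
        pinned["url"] = entry["url"]
        pinned["url_hash"] = entry["url_hash"]
    else:
        # Git mode (assumed behaviour): pin the tag to the exact checked-out commit.
        if resolved_sha is None:
            raise ValueError("git mode needs the resolved commit SHA")
        pinned["git_url"] = entry["git_url"]
        pinned["git_tag"] = resolved_sha
    return pinned


# Placeholder values: the URL and hash below are not real artifacts.
url_pkg = {"version": "1.0", "url": "https://example.com/pkg.tar.gz", "url_hash": "0" * 64}
print(pinned_entry(url_pkg))
```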
- Remove double negatives in package_details.cmake override detection
- Add error handling for incomplete modes (url without hash, hash without url)
- Add matching docs in git_url section about url mode alternative
- Simplify cpm_package_info-url-mode.cmake test with for loops
- Add tests for mode switching: url-to-git and git-to-url overrides
- Add tests for invalid incomplete modes (SHOULD_FAIL tests)
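The validation rules listed above (complete pairs only, exactly one mode) can be sketched in Python as follows. This is an illustration rather than the CMake code; validate_entry and its error messages are assumptions.

```python
def validate_entry(name: str, entry: dict) -> str:
    """Return "git" or "url" for a package entry, or raise on an invalid combination.

    Hypothetical helper for illustration; not part of rapids-cmake.
    """
    has = {key: key in entry for key in ("git_url", "git_tag", "url", "url_hash")}
    # Incomplete url mode: url without hash, or hash without url.
    if has["url"] != has["url_hash"]:
        raise ValueError(f"{name}: url and url_hash must be provided together")
    # Incomplete git mode: git_url without git_tag, or the reverse.
    if has["git_url"] != has["git_tag"]:
        raise ValueError(f"{name}: git_url and git_tag must be provided together")
    if has["url"] and has["git_url"]:
        raise ValueError(f"{name}: specify either git mode or url mode, not both")
    if not has["url"] and not has["git_url"]:
        raise ValueError(f"{name}: no fetch mode specified")
    return "url" if has["url"] else "git"


# Placeholder values: the URL and hash below are not real artifacts.
mode = validate_entry("pkg", {"url": "https://example.com/pkg.tar.gz", "url_hash": "0" * 64})
print(mode)
```

Checking each pair for completeness first means an entry such as url without url_hash reports that specific problem, instead of being treated as having no mode at all.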
/merge
Summary
- url and url_hash fields in versions.json as an alternative to git_url/git_tag

Size comparison for CCCL
[Table: sizes for the tarball, a shallow clone (--depth 1), and a full clone (git_shallow: false)]
The tarball is 15x smaller to download than a full clone and 3x smaller on disk.
Validation
- rapids_cpm_package_details_internal with url/url_hash mode
- rapids_cpm_package_info to verify URL/URL_HASH vs GIT_REPOSITORY/GIT_TAG arguments
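For context, the argument selection these tests check could look roughly like the Python sketch below. It assumes the entry has already been validated and that url_hash is passed through unchanged. The function name cpm_arguments is hypothetical; URL, URL_HASH, GIT_REPOSITORY and GIT_TAG are the argument names mentioned in this PR.

```python
def cpm_arguments(entry: dict) -> list:
    """Map a validated package entry to fetch arguments (illustrative sketch only)."""
    if "url" in entry:
        # URL mode: download the tarball and verify it against the recorded hash.
        return ["URL", entry["url"], "URL_HASH", entry["url_hash"]]
    # Git mode: clone the repository at the given tag.
    return ["GIT_REPOSITORY", entry["git_url"], "GIT_TAG", entry["git_tag"]]


# Placeholder values: the URL and hash below are not real artifacts.
url_args = cpm_arguments({"url": "https://example.com/pkg.tar.gz", "url_hash": "0" * 64})
git_args = cpm_arguments({"git_url": "https://github.com/NVIDIA/cccl.git", "git_tag": "main"})
print(url_args[0], git_args[0])
```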