Match Python cache fallback when symlinks are unavailable #172
Open
i386 wants to merge 1 commit into
Summary
Mirror Python `huggingface_hub` cache behavior for platforms where snapshot symlinks cannot be created, especially Windows without Developer Mode/admin symlink privileges.
- ... `blobs/<etag>` and `snapshots/<commit>/<filename>` as full-size files.
Python reference
This follows the Python `huggingface_hub` cache fallback behavior:
- `_create_symlink` documents the degraded behavior: newly downloaded blobs are moved into the destination, while existing blobs are copied to avoid breaking other snapshots: https://github.com/huggingface/huggingface_hub/blob/07a6ccc819fd531c3fdb984954133c27271a412a/src/huggingface_hub/file_download.py#L608-L634
- ... `new_blob=True` and copies otherwise: https://github.com/huggingface/huggingface_hub/blob/07a6ccc819fd531c3fdb984954133c27271a412a/src/huggingface_hub/file_download.py#L693-L699
Why
The Python implementation avoids keeping both the extensionless blob and the named snapshot file for a fresh download when symlinks are unavailable. The current Rust implementation always copies on Windows, which can double disk usage for large model files.
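
As a rough illustration of the move-versus-copy rule described above, here is a minimal sketch using only the standard library. The function name `place_without_symlink` and its signature are hypothetical and are not taken from this PR or from `hf-hub`; the sketch only assumes that the caller knows whether the blob was freshly downloaded.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Hypothetical helper (not the crate's API): materialize `blob` at
/// `snapshot` on a platform where a symlink cannot be created.
fn place_without_symlink(blob: &Path, snapshot: &Path, new_blob: bool) -> io::Result<()> {
    // Make sure snapshots/<commit>/ exists before writing into it.
    if let Some(parent) = snapshot.parent() {
        fs::create_dir_all(parent)?;
    }
    if new_blob {
        // Fresh download: no other snapshot references this blob yet,
        // so moving it keeps a single full-size copy on disk.
        if fs::rename(blob, snapshot).is_err() {
            // Rename can fail (for example across volumes); fall back to
            // copying and then removing the source.
            fs::copy(blob, snapshot)?;
            fs::remove_file(blob)?;
        }
    } else {
        // Existing blob: other snapshots may rely on it, so leave it in
        // place and copy instead.
        fs::copy(blob, snapshot)?;
    }
    Ok(())
}
```

With `new_blob` set to `true` the blob path no longer exists afterwards; with `false` both files remain, which is the case where disk usage is intentionally duplicated.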
Tests
Added focused regression coverage for the degraded no-symlink path:
- `pointer_fallback_moves_new_blob` verifies fresh blobs are moved to the snapshot path and the blob path is removed.
- `pointer_fallback_copies_existing_blob` verifies existing blobs are copied so other snapshot references remain valid.
- `cache_download_reuses_regular_snapshot_when_blob_is_absent` verifies a cache already in degraded no-symlink mode reuses the regular snapshot file instead of downloading again.
Validation
- `cargo fmt --check` with nightly rustfmt
- `cargo test -p hf-hub --lib` (167 passed)
- `cargo clippy -p hf-hub --all-features -- -D warnings`
- `git diff --check`
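
For reviewers who want a mental model of the degraded-cache reuse case covered by the third regression test, the check can be sketched roughly as follows. The name `cached_snapshot_file` is hypothetical and is not the function used in this PR; the sketch assumes only that a cache hit should depend on the snapshot path resolving to a readable regular file, not on the blob being present.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical lookup (not the crate's API): return the snapshot path if
/// it can be served from the cache without downloading again.
fn cached_snapshot_file(snapshot: &Path) -> Option<PathBuf> {
    // `fs::metadata` follows symlinks, so this accepts both a symlink that
    // points at a live blob and a regular file left behind by the
    // no-symlink fallback. A dangling symlink or missing path yields `Err`.
    match fs::metadata(snapshot) {
        Ok(meta) if meta.is_file() => Some(snapshot.to_path_buf()),
        _ => None,
    }
}
```

Because the check never looks at `blobs/<etag>`, a snapshot file that was moved out of the blob directory still counts as a cache hit.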