fix: follow relative redirects on metadata HEAD requests (#163) #176
Draft
assafvayner wants to merge 1 commit into
When resolving file metadata via HEAD on a repo's resolve URL, the client used a no-redirect HTTP client and ignored 3xx responses. Mirror endpoints (e.g. hf-mirror.com) and renamed repos answer with a redirect that does not itself carry the ETag/X-Repo-Commit headers, so extract_etag returned None and downloads failed with "missing ETag header".

Mirror Python's _httpx_follow_relative_redirects_with_backoff: add HFClient::head_following_relative_redirects, which follows only *relative* redirects (renamed repos, mirrors) while leaving absolute CDN redirects unfollowed so X-Linked-Etag / X-Repo-Commit stay readable off the 302.

Use it for single-file downloads, snapshot downloads, and get_file_metadata. The latter previously followed *all* redirects, losing X-Repo-Commit for LFS/Xet files served via a CDN 302 on the plain hub endpoint.

Closes #163
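The relative-versus-absolute decision described above can be sketched with the standard library alone. This is an illustration, not the PR's code: `next_relative_hop` and `path_only` are hypothetical names, the hosts are placeholders, and the string handling only approximates what a real URL parser does. It assumes the Python-style behavior of keeping the request URL's scheme, host and query and swapping in only the redirect's path.

```rust
/// Returns the path component of `s`: everything before the first '?' or '#'.
fn path_only(s: &str) -> &str {
    let end = s.find(|c: char| c == '?' || c == '#').unwrap_or(s.len());
    &s[..end]
}

/// Hypothetical helper (not the PR's API). Returns the next URL to request
/// when `location` is a relative redirect, or `None` when it names a host
/// (absolute or protocol-relative) and should be left unfollowed.
fn next_relative_hop(request_url: &str, location: &str) -> Option<String> {
    let loc_path = path_only(location);

    // "https://cdn/..." has a scheme before any '/', "//cdn/..." is
    // protocol-relative: both point at a host, so neither is followed.
    let has_scheme = loc_path
        .find("://")
        .map_or(false, |i| !loc_path[..i].contains('/'));
    if has_scheme || loc_path.starts_with("//") {
        return None;
    }

    // Split the request URL into "scheme://host" and "path?query#fragment".
    let authority_start = request_url.find("://")? + 3;
    let path_start = request_url[authority_start..]
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .map_or(request_url.len(), |i| authority_start + i);
    let (origin, tail) = request_url.split_at(path_start);

    // Keep the original query and fragment, replace only the path.
    let query_and_fragment = &tail[path_only(tail).len()..];
    let sep = if loc_path.is_empty() || loc_path.starts_with('/') { "" } else { "/" };
    Some(format!("{origin}{sep}{loc_path}{query_and_fragment}"))
}
```

A caller would keep issuing `HEAD` requests while this returns `Some(url)`, and stop with the response it already has when it returns `None`, which is what keeps the headers on an absolute CDN redirect readable.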
Summary
Fixes #163: downloads via a mirror endpoint (e.g. `hf-mirror.com`) failed with `missing ETag header`.

When resolving file metadata with a `HEAD` on a repo's `resolve` URL, the client used `no_redirect_client()` and ignored `3xx` responses. The plain hub endpoint serves the metadata headers (`X-Linked-Etag`, `X-Repo-Commit`, `X-Linked-Size`) on the `302` redirect to the CDN, so not following it works there. But mirror endpoints, and renamed repos, answer with a redirect that does not carry those headers; the metadata only appears on the final response after following. `extract_etag` then returned `None`, surfacing as `missing ETag header`.

This change mirrors how the Python client behaves. `huggingface_hub`'s `get_hf_file_metadata` uses `_httpx_follow_relative_redirects_with_backoff`, which follows only relative redirects (a `Location` without a host: renamed repos / mirror-internal hops) and leaves absolute redirects (the CDN `302`) unfollowed so it can read the metadata headers off them.

Changes
- `HFClient::head_following_relative_redirects` (`client.rs`): a `HEAD` that follows only relative redirects, retrying each hop. Absolute / cross-host `Location`s (including protocol-relative `//host/…`) are returned unfollowed; the path-swap logic matches Python's `urlparse(url)._replace(path=urlparse(location).path)`. Bounded by `MAX_RELATIVE_REDIRECTS`.
- `repository/download.rs`: single-file and snapshot download `HEAD`s now use the helper.
- `repository/listing.rs::get_file_metadata`: now uses the helper too. Previously it used `http_client()`, which followed all redirects and chased the CDN `302`, dropping `X-Repo-Commit` and breaking metadata for LFS/Xet files on the plain hub endpoint. A redirect response is now expected and not treated as an error by `check_response`.
- `extract_location` (`files.rs`): resolves the download location as `Location` header → request URL (Python parity), used by `get_file_metadata` and the snapshot path.

Behavior matrix
`HEAD` result:
- `302` → absolute CDN, headers on `302`: `X-Linked-Etag` / `X-Repo-Commit` off `302` (unchanged)
- `3xx` → relative path
- `3xx` → relative, no ETag on it
- `304 Not Modified` / `404`
- `Location`

Testing
- Unit tests (`client.rs`, local mock server): `follows_relative_redirect_to_reach_etag` (relative redirect chased to the `200` carrying the ETag) and `does_not_follow_absolute_redirect` (absolute CDN `302` left unfollowed, `X-Linked-Etag` stays readable). Written test-first: the first failed (got `302`, want `200`) before the fix.
- `test_get_file_metadata_lfs_file`: resolves metadata for an LFS/Xet-backed file, whose `HEAD` returns a `302` to an absolute CDN URL. This would have failed before the fix (the old `http_client()` chased the CDN and lost `X-Repo-Commit`).
- `cargo test -p hf-hub --all-features` (187 passing), `cargo test -p integration-tests` download/cache/xet/metadata suites all green, `cargo clippy -p hf-hub --all-features -D warnings` clean, `cargo +nightly fmt`.

Note on reproducing the mirror case
The fix targets exact Python parity (follow relative redirects). From some network vantage points `hf-mirror.com` returns an absolute `308` straight to `huggingface.co`, which neither Python nor this change follows; affected users hit an edge that serves a relative redirect, which is the path this fix handles. The deterministic mock-server unit tests cover both branches rather than depending on mirror geography.
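To make the two branches concrete without a network, here is a minimal in-memory sketch of a bounded follow loop. It is not the PR's test code: `MockResponse`, `resp`, `head_following_relative` and the routes are hypothetical, a `HashMap` stands in for the mock server, and `MAX_HOPS = 5` is an assumed cap, since the value of `MAX_RELATIVE_REDIRECTS` is not stated here. Relative targets are simplified to paths that start with a single `/`.

```rust
use std::collections::HashMap;

/// Assumed cap for this sketch; the real MAX_RELATIVE_REDIRECTS value is not given.
const MAX_HOPS: usize = 5;

/// Hypothetical stand-in for a HEAD response: status, Location, and an ETag-like header.
#[derive(Clone, Debug, PartialEq)]
struct MockResponse {
    status: u16,
    location: Option<String>,
    etag: Option<String>,
}

/// Convenience constructor for building mock routes.
fn resp(status: u16, location: Option<&str>, etag: Option<&str>) -> MockResponse {
    MockResponse {
        status,
        location: location.map(String::from),
        etag: etag.map(String::from),
    }
}

/// Looks up mock HEAD responses starting at `start_path`, following only
/// relative `Location`s. Returns the first response that is not a relative
/// redirect, or `None` if a route is missing or the hop cap is exceeded.
fn head_following_relative(
    routes: &HashMap<String, MockResponse>,
    start_path: &str,
) -> Option<MockResponse> {
    let mut path = start_path.to_string();
    // One initial request plus at most MAX_HOPS followed redirects.
    for _ in 0..=MAX_HOPS {
        let response = routes.get(&path)?.clone();
        let is_redirect = (300..400).contains(&response.status);
        let next = match &response.location {
            // Relative: a path on the same host, not "//host/..." and not a full URL.
            Some(loc) if is_redirect && loc.starts_with('/') && !loc.starts_with("//") => {
                Some(loc.clone())
            }
            _ => None,
        };
        match next {
            Some(next_path) => path = next_path,
            // Absolute redirect or final response: hand it back unfollowed.
            None => return Some(response),
        }
    }
    None
}
```

With a route table holding a relative `307` that leads to a `200` with an ETag, and a `302` or `308` pointing at an absolute URL, the first case resolves to the `200` while the absolute redirects come back untouched, matching the two branches described above. A self-referencing relative redirect returns `None` once the cap is reached.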