Skip to content

fix: follow relative redirects on metadata HEAD requests (#163)#176

Draft
assafvayner wants to merge 1 commit into main from assaf/fix-163-mirror-etag-redirects

@assafvayner

Contributor

Summary

Fixes #163: downloads through a mirror endpoint (e.g. hf-mirror.com) failed with a "missing ETag header" error.

When resolving file metadata with a HEAD request on a repo's resolve URL, the client used no_redirect_client() and did not follow 3xx responses. On the plain hub endpoint this works, because the metadata headers (X-Linked-Etag, X-Repo-Commit, X-Linked-Size) are served on the 302 redirect to the CDN itself. Mirror endpoints and renamed repos, however, answer with a redirect that does not carry those headers; the metadata only appears on the final response after the redirect is followed. extract_etag therefore returned None, which surfaced as the "missing ETag header" error.
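A minimal sketch of why stopping at the mirror's redirect produced None. It assumes extract_etag follows the Python client's precedence (X-Linked-Etag first, then ETag, with a weak `W/` prefix and surrounding quotes stripped); the `Headers` alias and lower-cased keys are illustrative assumptions, not the crate's actual types.

```rust
use std::collections::HashMap;

/// Hypothetical header map for this sketch: lower-cased name -> value.
pub type Headers = HashMap<String, String>;

/// Sketch of ETag extraction with Python-client precedence (an assumption about
/// extract_etag, not its actual code): prefer X-Linked-Etag, fall back to ETag,
/// then strip a weak "W/" prefix and surrounding quotes.
pub fn extract_etag(headers: &Headers) -> Option<String> {
    let raw: &str = headers
        .get("x-linked-etag")
        .or_else(|| headers.get("etag"))
        .map(String::as_str)?;
    let strong = raw.strip_prefix("W/").unwrap_or(raw);
    Some(strong.trim_matches('"').to_string())
}
```

A relative redirect from a mirror that carries neither header yields None here; the fix is to keep following until a response that does carry ETag is reached.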

This matches the Python client's behavior. In huggingface_hub, get_hf_file_metadata uses _httpx_follow_relative_redirects_with_backoff, which follows only relative redirects (a Location without a host, as produced by renamed repos and mirror-internal hops) and leaves absolute redirects (the CDN 302) unfollowed so the metadata headers can be read off them.
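The relative/absolute split and Python's path swap can be sketched with the standard library only. `is_relative_location` mirrors `urlparse(location).netloc == ""` and `swap_path` mirrors `urlparse(url)._replace(path=urlparse(location).path).geturl()`; both names are hypothetical, and the hand-rolled parsing ignores URL edge cases (userinfo, `;params`, percent-encoding) that a real URL parser would handle.

```rust
/// Strip a leading URI scheme ("https:", "http:", ...) if present, per RFC 3986:
/// ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ':'.
fn strip_scheme(s: &str) -> &str {
    if let Some(idx) = s.find(':') {
        let mut chars = s[..idx].chars();
        if let Some(first) = chars.next() {
            if first.is_ascii_alphabetic()
                && chars.all(|c| c.is_ascii_alphanumeric() || "+-.".contains(c))
            {
                return &s[idx + 1..];
            }
        }
    }
    s
}

/// Relative iff the Location has no host, i.e. Python's urlparse(location).netloc == "".
/// Absolute ("https://cdn/...") and protocol-relative ("//cdn/...") both carry a host.
pub fn is_relative_location(location: &str) -> bool {
    !strip_scheme(location).starts_with("//")
}

/// Python's urlparse(url)._replace(path=urlparse(location).path).geturl():
/// keep the request URL's scheme, host, query and fragment; take only the path
/// from `location`. Only meaningful for relative Locations.
/// Returns None if `request_url` has no "scheme://" prefix.
pub fn swap_path(request_url: &str, location: &str) -> Option<String> {
    let loc = strip_scheme(location);
    let loc_path = &loc[..loc.find(|c: char| c == '?' || c == '#').unwrap_or(loc.len())];
    let host_start = request_url.find("://")? + 3;
    let path_start = request_url[host_start..]
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .map_or(request_url.len(), |i| host_start + i);
    let tail_start = request_url[path_start..]
        .find(|c: char| c == '?' || c == '#')
        .map_or(request_url.len(), |i| path_start + i);
    // geturl() inserts a '/' when a host is present and the new path lacks one.
    let sep = if loc_path.is_empty() || loc_path.starts_with('/') { "" } else { "/" };
    Some(format!(
        "{}{}{}{}",
        &request_url[..path_start],
        sep,
        loc_path,
        &request_url[tail_start..]
    ))
}
```

Dropping the Location's own query and keeping the request URL's query is parity with the Python expression, not an oversight.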

Changes

  • HFClient::head_following_relative_redirects (client.rs): a HEAD request that follows only relative redirects, retrying each hop. Absolute or cross-host Locations (including protocol-relative //host/…) are returned unfollowed. The path-swap logic matches Python's urlparse(url)._replace(path=urlparse(location).path), and the number of hops is bounded by MAX_RELATIVE_REDIRECTS.
  • repository/download.rs: the single-file and snapshot download HEADs now use the helper.
  • repository/listing.rs::get_file_metadata: now uses the helper too. Previously it used http_client(), which followed all redirects and chased the CDN 302, dropping X-Repo-Commit and breaking metadata for LFS/Xet files on the plain hub endpoint. check_response now treats a redirect response as expected rather than as an error.
  • extract_location (files.rs): resolves the download location as the Location header, falling back to the request URL (Python parity); used by get_file_metadata and the snapshot path.
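The two small rules in the last bullets, sketched under assumptions: `extract_location` follows Python's `location = r.headers.get("Location") or r.request.url`, and `is_expected_metadata_status` is a hypothetical stand-in for the check_response change, accepting 3xx because an unfollowed absolute CDN redirect is a valid final answer. Neither signature is taken from the crate.

```rust
use std::collections::HashMap;

/// Hypothetical: statuses a metadata HEAD may end on without being treated as an error.
/// 3xx is accepted because an absolute (CDN) redirect is deliberately left unfollowed
/// and carries X-Linked-Etag / X-Repo-Commit itself; 304 falls in this range too.
pub fn is_expected_metadata_status(status: u16) -> bool {
    (200..400).contains(&status)
}

/// Download location: the Location header if present, otherwise the URL that was requested.
/// Header names are assumed to be lower-cased in this sketch.
pub fn extract_location(headers: &HashMap<String, String>, request_url: &str) -> String {
    headers
        .get("location")
        .cloned()
        .unwrap_or_else(|| request_url.to_string())
}
```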

Behavior matrix

| Endpoint / case | HEAD result | Followed? | Outcome |
|---|---|---|---|
| Plain hub, LFS file | 302 → absolute CDN URL, headers on the 302 | no | X-Linked-Etag / X-Repo-Commit read off the 302 (unchanged) |
| Renamed repo | 3xx → relative path | yes | reaches the final response carrying the metadata |
| Mirror, metadata redirect (relative) | 3xx → relative path, no ETag on it | yes | reaches the response carrying the ETag (fixes #163) |
| 304 Not Modified / 404 | no usable Location | no | returned as-is (cache / not-found logic unchanged) |
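The matrix can be exercised with a sketch of the follow loop, assuming a hypothetical `HeadTransport` trait so it runs against an in-memory mock instead of the network. `MAX_RELATIVE_REDIRECTS = 10` is an assumed value (the PR does not state it), per-hop retries are omitted, and the relative check is simplified to "Location starts with a single `/`".

```rust
use std::collections::HashMap;

/// Hypothetical minimal HEAD response: status plus lower-cased header names.
#[derive(Debug, Clone)]
pub struct HeadResponse {
    pub status: u16,
    pub headers: HashMap<String, String>,
}

/// Hypothetical transport seam so the loop can run against an in-memory mock.
pub trait HeadTransport {
    fn head(&mut self, url: &str) -> Result<HeadResponse, String>;
}

/// Assumed bound; the PR names MAX_RELATIVE_REDIRECTS but not its value.
pub const MAX_RELATIVE_REDIRECTS: usize = 10;

/// Simplified relative check: a Location starting with a single '/' has no host.
/// Returns only the path part (query/fragment dropped), like urlparse(location).path.
fn relative_path(location: &str) -> Option<&str> {
    if location.starts_with('/') && !location.starts_with("//") {
        location.split(|c: char| c == '?' || c == '#').next()
    } else {
        None
    }
}

/// Replace the path of `url`, keeping its scheme, host, query and fragment.
fn with_path(url: &str, new_path: &str) -> String {
    let host_start = url.find("://").map_or(0, |i| i + 3);
    let path_start = url[host_start..]
        .find(|c: char| c == '/' || c == '?' || c == '#')
        .map_or(url.len(), |i| host_start + i);
    let tail_start = url[path_start..]
        .find(|c: char| c == '?' || c == '#')
        .map_or(url.len(), |i| path_start + i);
    format!("{}{}{}", &url[..path_start], new_path, &url[tail_start..])
}

/// Follow relative redirects only; any other response (2xx, 304, 4xx, or an
/// absolute / protocol-relative redirect) is returned unchanged to the caller.
pub fn head_following_relative_redirects<T: HeadTransport>(
    transport: &mut T,
    url: &str,
) -> Result<HeadResponse, String> {
    let mut current = url.to_string();
    for _ in 0..=MAX_RELATIVE_REDIRECTS {
        let response = transport.head(&current)?;
        let next = if (300..400).contains(&response.status) {
            response.headers.get("location").and_then(|loc| relative_path(loc))
        } else {
            None
        };
        match next {
            Some(path) => current = with_path(&current, path),
            None => return Ok(response),
        }
    }
    Err(format!(
        "more than {} relative redirects starting from {}",
        MAX_RELATIVE_REDIRECTS, url
    ))
}

/// In-memory mock: exact URL -> canned response, recording every URL requested.
pub struct MapTransport {
    pub routes: HashMap<String, HeadResponse>,
    pub seen: Vec<String>,
}

impl HeadTransport for MapTransport {
    fn head(&mut self, url: &str) -> Result<HeadResponse, String> {
        self.seen.push(url.to_string());
        self.routes.get(url).cloned().ok_or_else(|| format!("no route for {}", url))
    }
}

/// Build a canned response from (header, value) pairs.
pub fn mock_response(status: u16, headers: &[(&str, &str)]) -> HeadResponse {
    HeadResponse {
        status,
        headers: headers.iter().map(|(k, v)| (k.to_string(), v.to_string())).collect(),
    }
}
```

Returning the unfollowed 302 rather than an error is what lets the caller read X-Linked-Etag and X-Repo-Commit off it, as in the first row of the matrix.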

Testing

  • Unit tests (client.rs, local mock server): follows_relative_redirect_to_reach_etag (a relative redirect is chased to the 200 that carries the ETag) and does_not_follow_absolute_redirect (an absolute CDN 302 is left unfollowed, so X-Linked-Etag stays readable). Both were written test-first; the first one failed (got 302, want 200) before the fix.
  • Integration test test_get_file_metadata_lfs_file: resolves metadata for an LFS/Xet-backed file, whose HEAD returns a 302 to an absolute CDN URL. This test would have failed before the fix, because the old http_client() chased the CDN redirect and lost X-Repo-Commit.
  • cargo test -p hf-hub --all-features (187 passing), cargo test -p integration-tests download/cache/xet/metadata suites all green, cargo clippy -p hf-hub --all-features -D warnings clean, cargo +nightly fmt.

Note on reproducing the mirror case

The fix targets exact parity with Python (follow relative redirects only). From some network vantage points, hf-mirror.com returns an absolute 308 straight to huggingface.co, which neither Python nor this change follows; affected users hit an edge node that serves a relative redirect, which is the path this fix handles. The deterministic mock-server unit tests cover both branches instead of depending on mirror geography.

When resolving file metadata via HEAD on a repo's resolve URL, the client
used a no-redirect HTTP client and ignored 3xx responses. Mirror endpoints
(e.g. hf-mirror.com) and renamed repos answer with a redirect that does not
itself carry the ETag/X-Repo-Commit headers, so extract_etag returned None
and downloads failed with "missing ETag header".

Mirror Python's _httpx_follow_relative_redirects_with_backoff: add
HFClient::head_following_relative_redirects, which follows only *relative*
redirects (renamed repos, mirrors) while leaving absolute CDN redirects
unfollowed so X-Linked-Etag / X-Repo-Commit stay readable off the 302.

Use it for single-file downloads, snapshot downloads, and get_file_metadata.
The latter previously followed *all* redirects, losing X-Repo-Commit for
LFS/Xet files served via a CDN 302 on the plain hub endpoint.

Closes #163

Development

Successfully merging this pull request may close these issues.

missing ETag header while use mirror endpoint