Eliminate HEAD requests during downloads, especially for faster transfers of small files #363
This PR optimizes downloads through the TransferManager by removing the upfront HEAD request. Previously, every download issued a HEAD request to determine object size before starting the GET request. This
change eliminates that extra round-trip by extracting metadata from the first GET response instead.
For small files, download time is dominated by request latency, so eliminating one of the two requests reduces download time by up to ~50%. For large files, the effect is less noticeable because download time is dominated by transfer time and because there are multiple chunks to download. In both cases, we save the cost of the HEAD request.
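To make the reasoning above concrete, here is a rough back-of-the-envelope model, not a measurement. It assumes a download costs one fixed latency per request plus size divided by bandwidth, and it ignores chunk parallelism. The function names and the example numbers are hypothetical and are not part of this PR.

```python
def download_time(latency_s: float, size_bytes: int, bandwidth_bytes_per_s: float, requests: int) -> float:
    """Simplified model: a fixed latency per request plus the raw transfer time."""
    return requests * latency_s + size_bytes / bandwidth_bytes_per_s


def head_savings(latency_s: float, size_bytes: int, bandwidth_bytes_per_s: float) -> float:
    """Fraction of total time saved by dropping the HEAD request (2 requests -> 1)."""
    before = download_time(latency_s, size_bytes, bandwidth_bytes_per_s, requests=2)
    after = download_time(latency_s, size_bytes, bandwidth_bytes_per_s, requests=1)
    return (before - after) / before


# Hypothetical link: 50 ms per request, 12.5 MB/s of bandwidth.
small = head_savings(0.05, 1_000, 12.5e6)          # 1 kB object
large = head_savings(0.05, 1_000_000_000, 12.5e6)  # 1 GB object
```

Under these assumptions the saving approaches one half as the object size shrinks and becomes negligible once transfer time dominates.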
What Changed
• Removed HEAD requests: Downloads now start immediately with a ranged GET request for the first chunk
• Dynamic size detection: Extract object size and ETag from the first GET response headers (ContentRange or ContentLength)
• Dynamic chunk scheduling: After the first chunk completes, schedule additional chunks only if the object is larger than the chunk size
• Simplified code flow: Consolidated download logic into a single path instead of branching on size upfront
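The size detection and chunk scheduling steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names are hypothetical, the 8 MiB chunk size is taken from the flow diagrams, and the `Content-Range` value is assumed to have the standard `bytes first-last/total` form.

```python
import re
from typing import List, Optional, Tuple

CHUNK_SIZE = 8 * 1024 * 1024  # assumed chunk size (8 MiB), as in the flow diagrams

_CONTENT_RANGE = re.compile(r"^bytes (\d+)-(\d+)/(\d+)$")


def object_size_from_response(content_range: Optional[str], content_length: Optional[int]) -> int:
    """Derive the full object size from the first GET response.

    Prefer the total in Content-Range (present on a ranged 206 response);
    fall back to Content-Length when the whole object was returned.
    """
    if content_range:
        match = _CONTENT_RANGE.match(content_range.strip())
        if match is None:
            raise ValueError(f"unrecognized Content-Range: {content_range!r}")
        return int(match.group(3))
    if content_length is not None:
        return content_length
    raise ValueError("response carries neither Content-Range nor Content-Length")


def remaining_ranges(total_size: int, chunk_size: int = CHUNK_SIZE) -> List[Tuple[int, int]]:
    """Inclusive byte ranges still to fetch once the first chunk has arrived."""
    return [
        (start, min(start + chunk_size, total_size) - 1)
        for start in range(chunk_size, total_size, chunk_size)
    ]
```

For a 20 MiB object, the first response would report `bytes 0-8388607/20971520`, and `remaining_ranges(20971520)` yields two further ranges. For an object no larger than one chunk it returns an empty list, so the download completes after the first GET.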
Testing
Unit, functional, and integration tests pass. I also added a new script to benchmark downloading many small files. For downloading 1000 1 kB files on my laptop, the total duration dropped 41%, from 15.0 s to 8.9 s.
Backward Compatibility
External API unchanged. All download methods have the same signatures.
Flow Diagrams
Before (with HEAD request)
flowchart TD
    A[HEAD Request] --> C{Size < 8MB?}
    C -->|Yes| D[GET Request]
    C -->|No| E[Multiple GET Requests]
    D --> F[Complete]
    E --> F
After (no HEAD request)
flowchart TD
    A[GET First Chunk] --> B{Size < 8MB?}
    B -->|Yes| C[Complete]
    B -->|No| D[GET Remaining Chunks]
    D --> C
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.