load_file: load tensors ordered by their offsets #571
Conversation
I do not yet understand the torch-vs-others numbers: with the old version, sdxl loads faster with np than with pt, but the result is reversed for revAnimated.
(force-pushed from 539b255 to 6f24f48)
@yousong do you mind sharing the setup in more detail?
Sequential reads should only matter for old drives (HDDs); afaik they shouldn't impact SSDs at all. I'm not against offset-based reading, but if it has a real impact, then maybe I should revisit the saving methods to enforce some kind of order.
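To make the save-side idea concrete, here is a minimal stdlib-only sketch (not the library's actual save path; `save_ordered` and its `name -> (dtype, shape, bytes)` input format are hypothetical) of what "enforcing some kind of order" at save time could mean: lay out tensor data in key-iteration order so that header key order and on-disk offset order coincide, and a naive key-order reader is already sequential.

```python
import json
import struct

def save_ordered(fname, tensors):
    # Hypothetical helper: `tensors` maps name -> (dtype_str, shape, raw_bytes).
    # Data offsets are assigned in key-iteration order, so iterating keys
    # in header order later reads the file front to back.
    header, offset = {}, 0
    for name, (dtype, shape, data) in tensors.items():
        header[name] = {"dtype": dtype, "shape": shape,
                        "data_offsets": [offset, offset + len(data)]}
        offset += len(data)
    blob = json.dumps(header).encode("utf-8")
    with open(fname, "wb") as f:
        f.write(struct.pack("<Q", len(blob)))  # 8-byte little-endian header length
        f.write(blob)
        for _, (_, _, data) in tensors.items():
            f.write(data)
```

With such an invariant in place, offset-ordered and key-ordered loading would be the same thing.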
For a simpler implementation, maybe we just add
OS and Python version are not very relevant here. The test stats posted above are from the current tip of this repo. We store safetensors files on an S3-like object store and mount it with FUSE daemons like ossfs, so in the end it's all HTTP request/response with long latency. ossfs is a fork of s3fs. It can do direct_read, which prefetches data into memory directly (vs. a local tmp store). Prefetching is needed to maximize (network) throughput, and sequential reads are a requirement for prefetching to be effective and useful in this case.
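For reference, the offset order being discussed can be recovered from the file header itself. Here is a minimal stdlib-only sketch (the `keys_by_offset` helper is hypothetical, but the layout it assumes is the documented safetensors format: an 8-byte little-endian header length, then a JSON header whose entries carry `"data_offsets": [begin, end]`):

```python
import json
import struct

def keys_by_offset(fname):
    # Read the safetensors header and return tensor names sorted by
    # their begin offset in the data section.
    with open(fname, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    header.pop("__metadata__", None)  # optional metadata entry has no offsets
    return sorted(header, key=lambda k: header[k]["data_offsets"][0])
```

Iterating tensors in this order makes each read start where the previous one ended, which is exactly the access pattern a prefetcher can exploit.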
Link to the test files
Simple test code:

```python
import time
import sys

import safetensors
import safetensors.torch

def safe_open(fname, framework='np'):
    f = safetensors.safe_open(fname, framework=framework)
    keys = f.keys()
    #keys_ordered = sorted(keys[:])
    tensors = {}
    for k in keys:
        #print(k)
        tensors[k] = f.get_tensor(k)
    return tensors

def torch_open(fname):
    tensors = safetensors.torch.load_file(fname)
    return tensors

#print(safetensors.__file__)
fname = sys.argv[1]
framework = 'torch'
if len(sys.argv) > 2:
    framework = sys.argv[2]
stime = time.perf_counter()
if framework == 'pt':
    torch_open(fname)
else:
    safe_open(fname)
etime = time.perf_counter()
print('{:.2f}'.format(etime - stime))
```
I prepared a branch with the
(force-pushed from 6f24f48 to 84a6bd1)
LGTM. Thanks |
(force-pushed from 84a6bd1 to 03fb4c5)
(cherry picked from commit 467a605)
Hi @Narsil, may I ask when you plan to release a new version including this patch? In our tests, this sequential-read optimization significantly reduces model loading time.

What does this PR do?

This change introduces a new method `offset_keys()` that returns tensor names ordered by their offsets in the file. `load_file()` is then changed to use the new method, so data is loaded sequentially regardless of how tensors are named.

We put model files on S3-like object storage and use a prefetch strategy to maximize throughput. The change can greatly shorten load time for files like `revAnimated_v122.safetensors`, and has a negligible effect on others (e.g. `sd_xl_base_1.0.safetensors`).

Load time in seconds.
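The offset-ordered load loop that `offset_keys()` enables can be sketched in stdlib-only terms (this is an illustrative reimplementation, not the PR's actual code; it returns raw bytes per tensor rather than framework arrays, and assumes the documented safetensors layout):

```python
import json
import struct

def load_file_ordered(fname):
    # Sketch of offset-ordered loading: parse the header, then read each
    # tensor's byte range in ascending begin-offset order, so every read
    # continues where the previous one ended.
    with open(fname, "rb") as f:
        (n,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(n))
        header.pop("__metadata__", None)
        base = 8 + n  # data section starts right after the JSON header
        tensors = {}
        for k in sorted(header, key=lambda k: header[k]["data_offsets"][0]):
            begin, end = header[k]["data_offsets"]
            f.seek(base + begin)
            tensors[k] = f.read(end - begin)
    return tensors
```

The real `load_file()` does the same traversal through the library's native reader; the point is only that the iteration order follows on-disk position, not key names.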