
Conversation

yousong (Contributor) commented Feb 10, 2025

What does this PR do?

This change introduces a new method offset_keys() that returns tensor names ordered by their offsets in the file. load_file() is then changed to use the new method so that tensor data is loaded sequentially, independent of how the tensors are named.

We store model files on S3-like object storage and use a prefetch strategy to maximize throughput. The change greatly shortens load time for files like revAnimated_v122.safetensors and has a negligible effect on others (e.g. sd_xl_base_1.0.safetensors).

Load time in seconds:

| File | Framework | Old | New |
|---|---|---|---|
| sd_xl_base_1.0.safetensors | np | 15.66 | 17.65 |
| sd_xl_base_1.0.safetensors | pt | 32.34 | 31.55 |
| revAnimated_v122.safetensors | np | 67.13 | 14.15 |
| revAnimated_v122.safetensors | pt | 43.42 | 28.31 |

File sizes:

```
-rwxrwxrwx 1 root root 5.2G Sep 19 02:55 revAnimated_v122.safetensors
-rwxrwxrwx 1 root root 6.5G Dec 31 09:27 sd_xl_base_1.0.safetensors
```

yousong (Author) commented Feb 10, 2025

I do not yet understand the np vs. pt stats: in the old version, sdxl loads faster with np than with pt, but the result is reversed for revAnimated.

yousong force-pushed the yousong-keys-ordered branch from 539b255 to 6f24f48 on February 11, 2025, 02:44
Narsil (Contributor) commented Feb 26, 2025

@yousong do you mind sharing the setup in more detail?

  • OS (WSL if applicable)
  • Python version
  • Safetensors version
  • Link to the actual file
  • Code for the loading
  • Setup of the disk, especially things related to network mounts.

Sequential reads should only matter on old drives (HDDs); AFAIK it shouldn't impact SSDs at all.
Is it a mounted disk? Mounted network disks cause all kinds of issues, key order being the least of them (the issue is that mmap doesn't play well with network reads).

I'm not against doing offset-based reading, but if it has an actual impact, then maybe I should revisit the saving methods to enforce some kind of order.
Note that a lot of code does lazy loading anyway, so the performance gain wouldn't be guaranteed.

Narsil (Contributor) commented Feb 26, 2025

For a simpler implementation, maybe we just add an offset_keys() method everywhere and use that within load_file instead of keys(). The benefit is that it wouldn't be a breaking change anymore, and reading keys in alphabetical order still has merits (for instance, comparing files that might be written in different orders).

yousong (Author) commented Feb 27, 2025

> @yousong do you mind sharing the setup in more detail?
>
>   • OS (WSL if applicable)
>   • Python version
>   • Safetensors version
>   • Link to the actual file
>   • Code for the loading
>   • Setup of the disk, especially things related to network mounts.
>
> Sequential reads should only matter on old drives (HDDs); AFAIK it shouldn't impact SSDs at all. Is it a mounted disk? Mounted network disks cause all kinds of issues, key order being the least of them (the issue is that mmap doesn't play well with network reads).
>
> I'm not against doing offset-based reading, but if it has an actual impact, then maybe I should revisit the saving methods to enforce some kind of order. Note that a lot of code does lazy loading anyway, so the performance gain wouldn't be guaranteed.

OS and Python version are not very relevant here. The stats posted above were collected against the current tip of this repo.

We store safetensors files on an S3-like object store and mount them with FUSE daemons like ossfs. In the end it's all HTTP request/response with long latency.

ossfs is a fork of s3fs. It supports direct_read, which prefetches data directly into memory (vs. a local tmp store). Prefetching is needed to maximize (network) throughput, and sequential reads are required to make prefetching effective and useful in this case.

yousong (Author) commented Feb 27, 2025

Link to the test files

Simple test code:

```python
import time
import sys

import safetensors
import safetensors.torch


def safe_open(fname, framework='np'):
    # Open via the generic safe_open API and read every tensor in key order.
    f = safetensors.safe_open(fname, framework=framework)
    tensors = {}
    for k in f.keys():
        tensors[k] = f.get_tensor(k)
    return tensors


def torch_open(fname):
    return safetensors.torch.load_file(fname)


fname = sys.argv[1]
# Second argument selects the framework: 'pt' uses torch load_file,
# anything else (default 'np') uses the generic safe_open path.
framework = sys.argv[2] if len(sys.argv) > 2 else 'np'

stime = time.perf_counter()
if framework == 'pt':
    torch_open(fname)
else:
    safe_open(fname)
etime = time.perf_counter()
print('{:.2f}'.format(etime - stime))
```

yousong (Author) commented Feb 28, 2025

> For a simpler implementation, maybe we just add an offset_keys() method everywhere and use that within load_file instead of keys(). The benefit is that it wouldn't be a breaking change anymore, and reading keys in alphabetical order still has merits (for instance, comparing files that might be written in different orders).

I prepared a branch with the offset_keys() implementation: yousong-offset-keys. I'll force-push it to this PR at a later time. Not changing existing interface behavior is definitely the better practice.

yousong force-pushed the yousong-keys-ordered branch from 6f24f48 to 84a6bd1 on February 28, 2025, 06:42
yousong changed the title from "Let keys() ordered by offset" to "load_file: load tensors ordered by their offsets" on Feb 28, 2025
Narsil (Contributor) commented Mar 5, 2025

LGTM. Thanks

yousong force-pushed the yousong-keys-ordered branch from 84a6bd1 to 03fb4c5 on March 5, 2025, 11:36
yousong (Author) commented Mar 5, 2025

v3 (vs. v2): fixed a lint issue flagged by clippy.


Narsil merged commit 467a605 into huggingface:main on Mar 5, 2025 (40 of 42 checks passed).
yousong added a commit to yousong/safetensors that referenced this pull request Mar 31, 2025
(cherry picked from commit 467a605)
(cherry picked from commit e01b18d)
polyrabbit commented May 29, 2025

Hi @Narsil, may I ask when you plan to release a new version that includes this patch? In our tests, this sequential-read optimization significantly reduces model loading time.
