
Optimize directory lookup#23

Merged: dmcgowan merged 2 commits into erofs:main from dmcgowan:read-dir-by-name on Apr 8, 2026

@dmcgowan (Member) commented Apr 7, 2026

Previously, resolving a path scanned each directory in full and then iterated through the resulting entries. With this change, the lookup is done directly while reading the directory blocks. It performs a binary search within each block, since EROFS directory entries are sorted by name.
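As a rough illustration of the in-block search, here is a minimal sketch that assumes a block has already been decoded into a slice of entries sorted by name in byte order. The `dirent` type and `lookupInBlock` function are hypothetical names for this example and are not taken from go-erofs.

```go
package main

import (
	"fmt"
	"sort"
)

// dirent is a simplified, hypothetical stand-in for a decoded directory entry.
type dirent struct {
	Name string
	Nid  uint64
}

// lookupInBlock binary-searches the entries of one block.
// The entries must be sorted by name in byte order.
func lookupInBlock(entries []dirent, name string) (dirent, bool) {
	// Find the first entry whose name is >= the target.
	i := sort.Search(len(entries), func(i int) bool {
		return entries[i].Name >= name
	})
	if i < len(entries) && entries[i].Name == name {
		return entries[i], true
	}
	return dirent{}, false
}

func main() {
	block := []dirent{{"bin", 10}, {"etc", 11}, {"lib", 12}, {"usr", 13}}
	if e, ok := lookupInBlock(block, "lib"); ok {
		fmt.Println("found", e.Name, "nid", e.Nid)
	}
	if _, ok := lookupInBlock(block, "opt"); !ok {
		fmt.Println("opt not found")
	}
}
```

Go compares strings byte by byte, which is why plain `>=` works as the ordering here; a real implementation would compare against names read from the block buffer instead of a pre-built slice.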

Also fixes a panic when closing a file that was opened but never read.
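The general shape of such a fix can be sketched as a `Close` method that tolerates state which was never initialized. The `lazyFile` and `fileState` types below are hypothetical and only illustrate the pattern; the actual change in `erofs.go` may look different.

```go
package main

import "fmt"

// fileState is hypothetical per-file state that is only set up on first read.
type fileState struct {
	release func() error
}

// lazyFile is a hypothetical handle whose state stays nil until it is read.
type lazyFile struct {
	state *fileState
}

// Close returns nil for a handle that was opened but never read,
// instead of dereferencing the nil state.
func (f *lazyFile) Close() error {
	if f.state == nil {
		return nil
	}
	st := f.state
	f.state = nil // make repeated Close calls harmless
	if st.release != nil {
		return st.release()
	}
	return nil
}

func main() {
	var f lazyFile // opened, never read
	fmt.Println("close error:", f.Close())
}
```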

Benchmark: directory lookup

goos: linux
goarch: amd64
cpu: Intel(R) Xeon(R) CPU D-1528 @ 1.90GHz

Time (sec/op)

| Case | main | this PR | Change |
|---|---|---|---|
| shallow | 37.20µs | 26.26µs | -29.41% |
| deep | 72.82µs | 52.05µs | -28.52% |
| bigdir-first | 4094.9µs | 43.71µs | -98.93% |
| bigdir-last | 3912.6µs | 41.61µs | -98.94% |
| bigdir-notfound | 3911.6µs | 39.34µs | -98.99% |
| geomean | 701.4µs | 39.63µs | -94.35% |

Memory (B/op)

| Case | main | this PR | Change |
|---|---|---|---|
| shallow | 16.8Ki | 8.4Ki | -50.30% |
| deep | 33.4Ki | 16.7Ki | -50.05% |
| bigdir-first | 659.8Ki | 8.4Ki | -98.73% |
| bigdir-last | 659.9Ki | 8.4Ki | -98.73% |
| bigdir-notfound | 659.8Ki | 8.3Ki | -98.74% |
| geomean | 174.4Ki | 9.6Ki | -94.50% |

Allocations (allocs/op)

| Case | main | this PR | Change |
|---|---|---|---|
| shallow | 25 | 8 | -68.00% |
| deep | 43 | 16 | -62.79% |
| bigdir-first | 10,065 | 8 | -99.92% |
| bigdir-last | 10,065 | 8 | -99.92% |
| bigdir-notfound | 10,064 | 7 | -99.93% |
| geomean | 1,019 | 8.9 | -99.12% |

Copilot AI review requested due to automatic review settings April 7, 2026 03:38

Copilot AI left a comment


Pull request overview

This PR optimizes EROFS path resolution by replacing full-directory scans with an on-the-fly directory entry lookup while reading directory blocks, improving lookup time and reducing allocations.

Changes:

  • Added dir.lookup that performs per-block binary search (with linear-scan fallback) during path resolution instead of reading all entries.
  • Standardized several path/operation errors via exported ErrNotDirectory, ErrIsDirectory, and ErrLoop, and fixed a Close() panic when a file is opened but never initialized via readInfo.
  • Expanded tests/benchmarks to cover new error behaviors and measure lookup performance.
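For callers, standardized errors like these are typically matched with `errors.Is`. The sketch below defines local stand-ins with the same names so that it compiles on its own; the messages, the `openThroughFile` helper, and the wrapping in `fs.PathError` are assumptions for illustration, not the library's confirmed behavior.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
)

// Local stand-ins for the exported errors named above (messages are assumed).
var (
	ErrNotDirectory = errors.New("not a directory")
	ErrIsDirectory  = errors.New("is a directory")
	ErrLoop         = errors.New("too many levels of symbolic links")
)

// openThroughFile is a hypothetical failure: a path component is a regular file.
func openThroughFile(path string) error {
	return &fs.PathError{Op: "open", Path: path, Err: ErrNotDirectory}
}

// describe classifies an error by sentinel, looking through any wrapping.
func describe(err error) string {
	switch {
	case errors.Is(err, ErrNotDirectory):
		return "path component is not a directory"
	case errors.Is(err, ErrIsDirectory):
		return "operation needs a non-directory"
	case errors.Is(err, ErrLoop):
		return "symlink loop"
	default:
		return "other"
	}
}

func main() {
	err := openThroughFile("etc/passwd/extra")
	fmt.Println(err)
	fmt.Println(describe(err))
}
```

Matching on sentinels rather than on message text keeps caller code stable if the wording of an error changes.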

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.

| File | Description |
|---|---|
| erofs.go | Implements block-level directory lookup for path resolution and adds/uses standardized errors; includes Close() panic fix. |
| internal/erofstest/testcase.go | Adds test helpers and new assertions for correct errors on invalid operations (non-dir path components, ReadDir on file, ReadLink on regular file). |
| erofs_test.go | Adds a benchmark that exercises shallow/deep and large-directory lookup scenarios. |


Copilot AI review requested due to automatic review settings April 7, 2026 21:15
@dmcgowan dmcgowan marked this pull request as ready for review April 7, 2026 21:19

Copilot AI left a comment


Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.



@hsiangkao (Member)

readDir:
...
			} else {
				name = string(buf[dirents[0].NameOff:])
			}

needs to be fixed too.

dmcgowan added 2 commits April 7, 2026 20:28
When resolving paths, previously an entire directory would be scanned
then iterated through. With this change it is able to directly do the
lookup while reading from the blocks. It performs binary search within a
block for the common case and falls back to full scan when not found.

Signed-off-by: Derek McGowan <derek@mcg.dev>
Signed-off-by: Derek McGowan <derek@mcg.dev>
Copilot AI review requested due to automatic review settings April 8, 2026 03:28
@dmcgowan (Member, Author) commented Apr 8, 2026

@hsiangkao Updated. Binary search across the blocks is now reflected in the benchmarks, and it is much faster.
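One way to picture a search across blocks is a two-level binary search: first pick the candidate block by comparing against each block's first name, then search inside that block. This is a sketch under simplifying assumptions (all blocks are in memory, non-empty, and globally sorted in byte order); the `block` type and `lookup` function are hypothetical and are not the code merged in this PR.

```go
package main

import (
	"fmt"
	"sort"
)

// block is a hypothetical in-memory view of one directory block:
// a non-empty list of names sorted in byte order.
type block []string

// lookup assumes blocks are ordered so that every name in block i
// sorts before every name in block i+1.
func lookup(blocks []block, name string) bool {
	// Find the first block whose first name is greater than the target;
	// the only block that can contain the target is the one before it.
	i := sort.Search(len(blocks), func(i int) bool {
		return blocks[i][0] > name
	})
	if i == 0 {
		return false // target sorts before every entry
	}
	b := blocks[i-1]
	j := sort.SearchStrings(b, name)
	return j < len(b) && b[j] == name
}

func main() {
	dir := []block{{"a", "c"}, {"e", "g"}, {"i"}}
	for _, name := range []string{"g", "f", "i", "z"} {
		fmt.Printf("%s: %v\n", name, lookup(dir, name))
	}
}
```

With this layout, a lookup touches a logarithmic number of blocks instead of all of them, which is in line with the large-directory cases benefiting the most in the benchmark tables.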

Copilot AI left a comment


Pull request overview

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.




@hsiangkao (Member) left a comment

Do we have any randomized tests in go-erofs for the tree hierarchy? I suggest introducing a random test with at least two levels of directories and random numbers of dirs/files, and checking correctness by looking up each individual path.
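A randomized check along these lines might be structured as follows. This is only a sketch: it uses `testing/fstest.MapFS` from the standard library as a stand-in file system, whereas a real test would build an EROFS image from the generated tree and open it with go-erofs. The helper names `randomTree`, `verifyLookups`, and `runOnce` are hypothetical.

```go
package main

import (
	"fmt"
	"io/fs"
	"math/rand"
	"testing/fstest"
)

// randomTree returns file paths for a two-level hierarchy with a random
// number of directories and a random number of files in each.
func randomTree(rng *rand.Rand) []string {
	var paths []string
	nDirs := 1 + rng.Intn(8)
	for d := 0; d < nDirs; d++ {
		nFiles := 1 + rng.Intn(50)
		for f := 0; f < nFiles; f++ {
			paths = append(paths, fmt.Sprintf("dir%03d/file%04d", d, f))
		}
	}
	return paths
}

// verifyLookups stats every path and reports the first one that fails.
func verifyLookups(fsys fs.FS, paths []string) error {
	for _, p := range paths {
		if _, err := fs.Stat(fsys, p); err != nil {
			return fmt.Errorf("lookup %q: %w", p, err)
		}
	}
	return nil
}

// runOnce builds a tree from the seed and verifies every path resolves.
// MapFS is a stand-in here; swap in the file system under test.
func runOnce(seed int64) (int, error) {
	rng := rand.New(rand.NewSource(seed))
	paths := randomTree(rng)
	fsys := fstest.MapFS{}
	for _, p := range paths {
		fsys[p] = &fstest.MapFile{Data: []byte(p)}
	}
	return len(paths), verifyLookups(fsys, paths)
}

func main() {
	// A fixed seed keeps any failure reproducible.
	n, err := runOnce(1)
	if err != nil {
		fmt.Println("FAIL:", err)
		return
	}
	fmt.Printf("resolved %d paths\n", n)
}
```

Logging the seed on failure, or looping over several seeds, makes it easier to reproduce a bad tree layout later.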

Otherwise it looks good to me.

@dmcgowan dmcgowan merged commit 3394750 into erofs:main Apr 8, 2026
10 checks passed
@dmcgowan dmcgowan deleted the read-dir-by-name branch April 8, 2026 04:21
openshift-merge-bot Bot pushed a commit to openshift/assisted-image-service that referenced this pull request Apr 23, 2026
Current (4.22) ISOs with erofs type root filesystems cause a crash while
trying to find the nmstatectl binary. This is due to a bug in the
go-erofs library that was fixed
(erofs/go-erofs#23), but even after the fix the library
doesn't support the compression strategy this filesystem uses leading to
errors like "unsupported incompatible feature 0x2: not implemented".

Using dump.erofs directly for searching for the file as well as
extracting it is the strategy that is least likely to break as the
dump.erofs utility is tied directly to the reference implementation of
the filesystem spec.

Resolves https://redhat.atlassian.net/browse/ACM-33009
Co-authored-by: Nick Carboni <ncarboni@redhat.com>