Optimize directory lookup#23
Conversation
63f6fde to
e862057
Compare
There was a problem hiding this comment.
Pull request overview
This PR optimizes EROFS path resolution by replacing full-directory scans with an on-the-fly directory entry lookup while reading directory blocks, improving lookup time and reducing allocations.
Changes:
- Added
dir.lookupthat performs per-block binary search (with linear-scan fallback) during path resolution instead of reading all entries. - Standardized several path/operation errors via exported
ErrNotDirectory,ErrIsDirectory, andErrLoop, and fixed aClose()panic when a file is opened but never initialized viareadInfo. - Expanded tests/benchmarks to cover new error behaviors and measure lookup performance.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
erofs.go |
Implements block-level directory lookup for path resolution and adds/uses standardized errors; includes Close() panic fix. |
internal/erofstest/testcase.go |
Adds test helpers and new assertions for correct errors on invalid operations (non-dir path components, ReadDir on file, ReadLink on regular file). |
erofs_test.go |
Adds a benchmark that exercises shallow/deep and large-directory lookup scenarios. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
e862057 to
68f7fb7
Compare
68f7fb7 to
385f7d5
Compare
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 2 out of 2 changed files in this pull request and generated 6 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
385f7d5 to
928b795
Compare
readDir:
...
} else {
name = string(buf[dirents[0].NameOff:])
}needs to be fixed too. |
When resolving paths, previously an entire directory would be scanned then iterated though. With this change it is able to directly do the lookup while reading from the blocks. It peforms binary search within a block for the common case and falls back to full scan when not found. Signed-off-by: Derek McGowan <derek@mcg.dev>
Signed-off-by: Derek McGowan <derek@mcg.dev>
928b795 to
0732d45
Compare
|
@hsiangkao updated, binary search across the blocks is reflect in the benchmarks now, much faster |
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
There was a problem hiding this comment.
Do we have some random tests in go-erofs for the tree hierarchy? I suggest introducing a random test with two-level directories at least with random numbers of dirs/files; and check the correction with looking up each individual path.
Otherwise it looks good to me.
Current (4.22) ISOs with erofs type root filesystems cause a crash while trying to find the nmstatectl binary. This is due to a bug in the go-erofs library that was fixed (erofs/go-erofs#23), but even after the fix the library doesn't support the compression strategy this filesystem uses leading to errors like "unsupported incompatible feature 0x2: not implemented". Using dump.erofs directly for searching for the file as well as extracting it is the strategy that is least likely to break as the dump.erofs utility is tied directly to the reference implementation of the filesystem spec. Resolves https://redhat.atlassian.net/browse/ACM-33009
Current (4.22) ISOs with erofs type root filesystems cause a crash while trying to find the nmstatectl binary. This is due to a bug in the go-erofs library that was fixed (erofs/go-erofs#23), but even after the fix the library doesn't support the compression strategy this filesystem uses leading to errors like "unsupported incompatible feature 0x2: not implemented". Using dump.erofs directly for searching for the file as well as extracting it is the strategy that is least likely to break as the dump.erofs utility is tied directly to the reference implementation of the filesystem spec. Resolves https://redhat.atlassian.net/browse/ACM-33009 Co-authored-by: Nick Carboni <ncarboni@redhat.com>
When resolving paths, previously an entire directory would be scanned then iterated through. With this change it is able to directly do the lookup while reading from the blocks. It performs binary search within each block since EROFS directories are sorted by name.
Also fixes a panic when closing a file that was opened but never read.
Benchmark: directory lookup
Time (sec/op)
Memory (B/op)
Allocations (allocs/op)