Skip to content

zfs-2.3.3 patchset #17468

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 105 commits into from
Closed

Conversation

behlendorf
Copy link
Contributor

@behlendorf behlendorf commented Jun 17, 2025

Motivation and Context

Proposed zfs-2.3.3 patchset based on the commits in the zfs-2.3.3-staging branch.

Description

Bump META file to support 6.15 kernel, zfs send/recv encryption fixes, misc fixes.

How Has This Been Tested?

Previously tested by CI, well be retested in this PR.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

amotin and others added 30 commits May 28, 2025 16:00
Previous implementation of zap_leaf_array_free() put chunks on the
free list in reverse order.  Also zap_leaf_transfer_entry() and
zap_entry_remove() were freeing name and value arrays in reverse
order.  Together this created a mess in the free list, making
following allocations much more fragmented than necessary.

This patch re-implements zap_leaf_array_free() to keep existing
chunks order, and implements non-destructive zap_leaf_array_copy()
to be used in zap_leaf_transfer_entry() to allow properly ordered
freeing name and value arrays there and in zap_entry_remove().

With this change test of some writes and deletes shows percent of
non-contiguous chunks in DDT reducing from 61% and 47% to 0% and
17% for arrays and frees respectively.  Sure some explicit sorting
could do even better, especially for ZAPs with variable-size arrays,
but it would also cost much more, while this should be very cheap.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16766
(cherry picked from commit 9a81484)
Since embedded blocks introduction 11 years ago, their writing was
blocked if dedup is enabled.  After searching through the modern
code I see no reason for this restriction to exist.  Same time
embedded blocks are dramatically cheaper.  Even regular write of
so small blocks would likely be cheaper than deduplication, even
if the last is successful, not mentioning otherwise.

Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17113
(cherry picked from commit 09f4dd0)
It seems `fio` in `ddt_dedup_vdev_limit` overwhelms the system
with the amount of dirty data caused by DDT updates within one
TXG due to tiny 1KB records used, while I see no reason for this
test to extend the TXGs beyond default.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: @ImAwsumm
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 21850f5)
dbuf_prefetch_impl() should look on level of current indirect, not
the target prefetch level.  dbuf_prefetch_indirect_done() should
call dnode_level_is_l2cacheable() if we have dpa_dnode to pass it.
It should fix some both false positive and negative L2ARC caching.

While there, fix redacted feature activation assertions.  One was
always true, while another could give false positive if dpa_dnode
is NULL.

George Amanakis <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17204
(cherry picked from commit a497c5f)
…nzfs#17228)

Various tools will display draid vdev names with parameters embedded in
them, but would not accept them as valid vdev names when looking them
up, making it difficult to build pipelines involving draid vdevs.

This commit makes it so that if a full draid name is offered for match,
it gets truncated at the first ':' character.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 131df3b)
- Fix VERIFY3B() when given non-boolean values.
 - Map EQUIV() into VERIFY3B(,==,) as equivalent.
 - Tune messages for better readability and to closer match source
code for easier search.  Unify user-space messages with kernel.
 - Tune printed types and remove %px outside of Linux kernel.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Reviewed-by: @ImAwsumm
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 4866c2f)
The test writes 1M of 1KB blocks, which may produce up to 1GB of
dirty data.  On top of that ashift=12 likely produces additional
4GB of ZIO buffers during sync process.  On top of that we likely
need some page cache since the pool reside on files.  And finally
we need to cache the DDT.  Not surprising that the test regularly
ends up in OOMs, possibly depending on TXG size variations.

Also replace fio with pretty strange parameter set with a set of
dd writes and TXG commits, just as we neeed here.

While here, remove compression.  It has nothing to do here, but
waste CI CPU time.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 1d8f625)
 - Kill workload first for faster cleanup.
 - Use `zpool wait` for resilver instead of `sleep`.
 - Remove irrelevant workload from `online_offline_003_neg`.

Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes: openzfs#17259
(cherry picked from commit cb49e77)
Replace `sleep 15` with `zpool wait`, which should take much less
than the 15 seconds.  And considering it is called 16 times, this
should save us up to 4 minutes total.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes: openzfs#17257
(cherry picked from commit 8f2c2de)
All defined variants:
- freebsd13-4r, freebsd13-5r, freebsd14-1r, freebsd14-2r (RELEASE)
- freebsd13-5s, freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Used for testing:
- freebsd13-4r (RELEASE)
- freebsd14-2s (STABLE)
- freebsd15-0c (CURRENT)

Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes: openzfs#17260
(cherry picked from commit 6afb405)
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes: openzfs#17267
(cherry picked from commit 38c3a8b)
Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed by: Attila Fülöp <[email protected]>
Signed-off-by: Artem-OSSRevival <[email protected]>
Fixes: openzfs#14538
Closes: openzfs#17234
(cherry picked from commit 37a3e26)
Originally the Lustre ZFS OSD code was going to use zfs_uio_t structs
for supporting Direct I/O with ZFS. However, this has changed to using
abd_t structs instead. This exports the proper symbols that will be used
by the Lustre ZFS OSD code.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Brian Atkinson <[email protected]>
Closes openzfs#17256
(cherry picked from commit 7031a48)
This commit adds support for using llvm-libunwind for kernels built
using llvm and clang. The two differences are that the largest register
index is given by _LIBUNWIND_HIGHEST_DWARF_REGISTER, we need to check
whether the register is a floating point register and the prototype
for unw_regname takes the unwind cursor as the first argument.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Sebastian Pauka <[email protected]>
Closes openzfs#17230
(cherry picked from commit 1b4826b)
Those tests are write-mostly at the nested pool.  Considering we have
3 more layers of caching underneath, we can hint ZFS how to use the
memory better by setting primarycache=metadata.

While there, add missing zpool sync after rm in checkpoint_capacity
before we could potentially see the freed space, would not there be
a pool checkpoint.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.

Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 1ef706c)
Sometimes it fails unable to see any injected write errors.
I guess writing 25KB of zeroes might be not enough to trigger
errors with probability set to 10%.  Lets try to write more.

Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17270
(cherry picked from commit d947b9a)
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#17278
(cherry picked from commit 88ec6c4)
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Alexander Ziaee <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by: Quentin Thébault <[email protected]>
Closes openzfs#17282
(cherry picked from commit 63de2d2)
Don't use KSM on the FreeBSD VMs and optimize KSM settings for
Linux to have faster run times.

Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#17247
(cherry picked from commit ba17ced)
### Background

Various admin operations will be invoked by some userspace task, but the
work will be done on a separate kernel thread at a later time. Snapshots
are an example, which are triggered through zfs_ioc_snapshot() ->
dsl_dataset_snapshot(), but the actual work is from a task dispatched to
dp_sync_taskq.

Many such tasks end up in dsl_enforce_ds_ss_limits(), where various
limits and permissions are enforced. Among other things, it is necessary
to ensure that the invoking task (that is, the user) has permission to
do things. We can't simply check if the running task has permission; it
is a privileged kernel thread, which can do anything.

However, in the general case it's not safe to simply query the task for
its permissions at the check time, as the task may not exist any more,
or its permissions may have changed since it was first invoked. So
instead, we capture the permissions by saving CRED() in the user task,
and then using it for the check through the secpolicy_* functions.

### Current implementation

The current code calls CRED() to get the credential, which gets a
pointer to the cred_t inside the current task and passes it to the
worker task. However, it doesn't take a reference to the cred_t, and so
expects that it won't change, and that the task continues to exist. In
practice that is always the case, because we don't let the calling task
return from the kernel until the work is done.

For Linux, we also take a reference to the current task, because the
Linux credential APIs for the most part do not check an arbitrary
credential, but rather, query what a task can do. See
secpolicy_zfs_proc(). Again, we don't take a reference on the task, just
a pointer to it.

### Changes

We change to calling crhold() on the task credential, and crfree() when
we're done with it. This ensures it stays alive and unchanged for the
duration of the call.

On the Linux side, we change the main policy checking function
priv_policy_ns() to use override_creds()/revert_creds() if necessary to
make the provided credential active in the current task, allowing the
standard task-permission APIs to do the needed check. Since the task
pointer is no longer required, this lets us entirely remove
secpolicy_zfs_proc() and the need to carry a task pointer around as
well.

Sponsored-by: https://despairlabs.com/sponsor/

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Kyle Evans <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit c8fa39b)
This change goes through and quotes variables where appropriate to
avoid issues with incorrect splitting. The performance tests ran into
an issue with $SUDO_COMMAND splitting incorrectly because it was not
quoted. This change fixes that issue and hopefully gets ahead of any
other similar problems.

Reviewed by: John Wren Kennedy <[email protected]>
Reviewed-by: Tony Nguyen <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Aleksandr Liber <[email protected]>
Closes openzfs#17235
(cherry picked from commit aa46cc9)
When multiple snapshots prevent the destruction/rollback of the
respective dataset/snapshot/volume via zfs destroy or zfs rollback,
the error message does not list the blocking snapshots sorted
according to their order of creation. This causes inconvenience and can
lead to confusion, and also creates a contrast with a returned message
from zfs list -t snap function.

Closes: openzfs#12751

Signed-off-by: Artem-OSSRevival <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit 27f3d94)
Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20.
So the test, which expects two 2 resilver starts will see them.

Logfile of the seen failures before this fix:
log: NOTE: expected 2 resilver start(s) after offline/online, found 1
log: expected 2 resilver start(s) after offline/online, found 1

The test time decreases also from around 00:42 to 00:24 seconds.

Reviewed-by: Tony Hutter <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#16822
Closes openzfs#17279
(cherry picked from commit 3b18877)
It's possible for two spares to get attached to a single failed vdev.
This happens when you have a failed disk that is spared, and then you
replace the failed disk with a new disk, but during the resilver
the new disk fails, and ZED kicks in a spare for the failed new
disk.  This commit checks for that condition and disallows it.

Reviewed-by: Akash B <[email protected]>
Reviewed-by: Ameer Hamza <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes: openzfs#16547
Closes: openzfs#17231
(cherry picked from commit f40ab9e)
zpool_status_003 and zpool_status_004_pos use 'dd' to trigger a read of
a file without specifying 'of=/dev/null'.  This spams the ZTS logs
with ~20MB of garbage data. This commit adds 'of=/dev/null'.

Signed-off-by: Tony Hutter <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
(cherry picked from commit a6cca8a)
`S_IFMT` is declared in `sys/stat.h`, but we cannot include this header
because it redeclares the `statx` function with different argument
types. Therefore, we define `S_IFMT` ourselves, in the same way as the
other definitions.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: José Luis Salvador Rufo <[email protected]>
Closes openzfs#17293
Closes openzfs#17294
(cherry picked from commit 634c172)
We should not clear scn_state and notify waiters until we call
vdev_dtl_reassess(), otherwise following offline/detach request
may fail with "no valid replicas".

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
(cherry picked from commit f86d9af)
After more CI runs and code reading after openzfs#17259 I've found that
online starts resilver via async mechanism, which does not provide
wait primitives at this time.  Restore some delays to restore CI
until this is properly fixed.

Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
(cherry picked from commit f85c96e)
…nzfs#17284)

txg_wait_synced_sig() is "wait for txg, unless a signal arrives". We
expect that future development will require similar "wait unless X"
behaviour.

This generalises the API as txg_wait_synced_flags(), where the provided
flags describe the events that should cause the call to return.

Instead of a boolean, the return is now an error code, which the caller
can use to know which event caused the call to return.

The existing call to txg_wait_synced_sig() is now
txg_wait_synced_flags(TXG_WAIT_SIGNAL).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.

Signed-off-by: Rob Norris <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
(cherry picked from commit a7de203)
With certain combinations of target ARC states balance and ghost
hit rates it was possible to get the fractions outside of allowed
range.  This patch limits maximum balance adjustment speed, which
should make it impossible, and also asserts it.

Fixes openzfs#17210
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
(cherry picked from commit b1ccab1)
behlendorf and others added 22 commits June 17, 2025 10:50
Fedora 40 has gone EOL as of May 2025, retire the CI builder.

Reviewed-by: Tino Reichardt <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Brian Behlendorf <[email protected]>
Closes openzfs#17408
After openzfs#17401 the Linux build produces some stack related warnings.

Silence them with the `STACK_FRAME_NON_STANDARD` macro.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Tino Reichardt <[email protected]>
Signed-off-by: Attila Fülöp <[email protected]>
Co-authored-by: Brian Behlendorf <[email protected]>
Closes openzfs#17410
The io_uring interface is available as a Technology Preview.
Details: https://access.redhat.com/solutions/4723221

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tino Reichardt <[email protected]>
Closes openzfs#17447
Having high-refcount dedup entries for zero blocks is inefficient
when they could be recorded as a holes instead.  Normally, zero
compression is not done if compression is disabled to not confuse
naive benchmarks.  But with dedup enabled, it is expected that the
write will be skipped anyway, so we are just optimizing the way it
is skipped.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17435
It is not right, but there are few examples when TX is aborted
after being assigned in case of error.  To handle it better on
production systems add extra cleanup steps.

While here, replace couple dmu_tx_abort() in simple cases.

Reviewed-by: Rob Norris <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Igor Kozhukhov <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17438
Looking on txg_wait_synced(, 0) I've noticed that it always syncs
5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE.  But in case
of dmu_offset_next() we do not care about deferred frees. And even
concurrent TXGs we might not need sync all 3 if the dnode was not
dirtied in last few TXGs.

This patch makes dmu_offset_next() to sync one TXG at a time until
the dnode is clean, but no more than 3 TXG_CONCURRENT_STATES times.
My tests with random simultaneous writes and seeks over many files
on HDD pool show 7-14% performance increase.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Rob Norris <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17434
Previous dmu_tx_count_clone() was broken, stating that cloning is
similar to free.  While they might be from some points, cloning
is not net-free.  It will likely consume space and memory, and
unlike free it will do it no matter whether the destination has
the blocks or not (usually not, so previous code did nothing).

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17431
Fairly coarse, but if it returns while the pool suspends, it must be
with an error.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17420
The superblock pointer will always be set, as will z_log, so remove code
supporting cases that can't occur (on Linux at least).

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17420
If the pool is suspended, we'll just block in zil_commit(). If the
system is shutting down, blocking wouldn't help anyone. So, we should
keep this test for now, but at least return an error for anyone who is
actually interested.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17420
If the kernel will honour our error returns, use them. If not, fool it
by setting a writeback error on the superblock, if available.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Paul Dagnelie <[email protected]>
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17420
If a write is split across mutliple itxs, we only want the callback on
the last one, otherwise it will be called for every itx associated with
this single write, which makes it very hard to know what to clean up.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17445
zfs_putpages() would put the entire range of pages onto the ZIL, then
return VM_PAGER_OK for each page to the kernel. However, an associated
zil_commit() or txg sync had not happened at this point, so the write
may not actually be on disk.

So, we rework it to use a ZIL commit callback, and do the post-write
work of undirtying the page and signaling completion there. We return
VM_PAGER_PEND to the kernel instead so it knows that we will take care
of it.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Mark Johnston <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17445
- Set/remove "Work in Progress"/"Code Review Needed" for drafts.
 - Remove "Accepted", "Inactive", "Revision Needed" and "Stale" on
pushes and reopens.

I hope this reduce chances of PRs being forgotten after requested
modifications done due to stale labels.  It is better to have no
labels than incorrect ones saying there is nothing to look at.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#16721
This also includes removing L2 vdevs asynchronously.

This commit also guarantees that spa_load_guid is unique.

The zpool reguid feature introduced the spa_load_guid, which is a
transient value used for runtime identification purposes in the ARC.
This value is not the same as the spa's persistent pool guid.

However, the value is seeded from spa_generate_load_guid() which
does not check for uniqueness against the spa_load_guid from other
pools.  Although extremely rare, you can end up with two different
pools sharing the same spa_load_guid value! So we guarantee that
the value is always unique and additionally not still in use by an
async arc flush task.

Sponsored-by: Klara, Inc.
Sponsored-by: Wasabi Technology, Inc.
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Allan Jude <[email protected]>
Signed-off-by: Don Brady <[email protected]>
Closes openzfs#16215
On systems with enormous amounts of memory, the single arc_evict thread
can become a bottleneck if reads and writes are stuck behind it, waiting
for old data to be evicted before new data can take its place.

This commit adds support for evicting from multiple ARC lists in
parallel, by farming the evict work out to some number of threads and
then accumulating their results.

A new tuneable, zfs_arc_evict_threads, sets the number of threads. By
default, it will scale based on the number of CPUs.

Sponsored-by: Expensify, Inc.
Sponsored-by: Klara, Inc.
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Youzhong Yang <[email protected]>
Signed-off-by: Allan Jude <[email protected]>
Signed-off-by: Mateusz Piotrowski <[email protected]>
Signed-off-by: Alexander Stetsenko <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Co-authored-by: Rob Norris <[email protected]>
Co-authored-by: Mateusz Piotrowski <[email protected]>
Co-authored-by: Alexander Stetsenko <[email protected]>
Closes openzfs#16486
It's been many years, we can probably do without.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: George Melikov <[email protected]>
Reviewed-by: Pavel Snajdr <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17376
It makes no sense to limit read size below the block size, since
DMU will any way consume resources for the whole block, while the
current zfs_vnops_read_chunk_size is only 1MB, which is smaller
that maximum block size of 16MB.  Plus in case of misaligned
Uncached I/O the buffer may get evicted between the chunks,
requiring repeating I/Os.

On 64-bit platforms increase zfs_vnops_read_chunk_size to 32MB.
It allows to less depend on speculative prefetcher if application
requests specific size, first not waiting for prefetcher to start
and later not prefetching more than needed.

Also while there, we don't need to align reads to the chunk size,
but only to a block size, which is smaller and so more forgiving.

My profiles show ~4% of CPU time saving when reading 16MB blocks.

Reviewed-by: Brian Behlendorf <[email protected]>
Reviewed by: Igor Kozhukhov <[email protected]>
Signed-off-by:	Alexander Motin <[email protected]>
Sponsored by:	iXsystems, Inc.
Closes openzfs#17415
These are only required to support these ioctls on Linux <4.5. Since
4.18 is our cutoff, we don't need this code anymore.

Also removing related test things that will never match again.

Sponsored-by: https://despairlabs.com/sponsor/
Reviewed-by: Alexander Motin <[email protected]>
Reviewed-by: Tony Hutter <[email protected]>
Signed-off-by: Rob Norris <[email protected]>
Closes openzfs#17308
Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Germano Massullo <[email protected]>
Closes openzfs#17461
Update the META file to reflect compatibility with the 6.15
kernel.

Reviewed-by: Brian Behlendorf <[email protected]>
Signed-off-by: Tony Hutter <[email protected]>
Closes openzfs#17393
Signed-off-by: Rob Norris <[email protected]>
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Jun 17, 2025
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jun 19, 2025
@behlendorf
Copy link
Contributor Author

Merged and tagged 2.3.3 tagged.

https://github.com/openzfs/zfs/releases/tag/zfs-2.3.3

@behlendorf behlendorf closed this Jun 19, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.