-
Notifications
You must be signed in to change notification settings - Fork 1.9k
zfs-2.3.3 patchset #17468
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Closed
Closed
zfs-2.3.3 patchset #17468
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Previous implementation of zap_leaf_array_free() put chunks on the free list in reverse order. Also zap_leaf_transfer_entry() and zap_entry_remove() were freeing name and value arrays in reverse order. Together this created a mess in the free list, making following allocations much more fragmented than necessary. This patch re-implements zap_leaf_array_free() to keep existing chunks order, and implements non-destructive zap_leaf_array_copy() to be used in zap_leaf_transfer_entry() to allow properly ordered freeing name and value arrays there and in zap_entry_remove(). With this change test of some writes and deletes shows percent of non-contiguous chunks in DDT reducing from 61% and 47% to 0% and 17% for arrays and frees respectively. Sure some explicit sorting could do even better, especially for ZAPs with variable-size arrays, but it would also cost much more, while this should be very cheap. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16766 (cherry picked from commit 9a81484)
Since embedded blocks introduction 11 years ago, their writing was blocked if dedup is enabled. After searching through the modern code I see no reason for this restriction to exist. Same time embedded blocks are dramatically cheaper. Even regular write of so small blocks would likely be cheaper than deduplication, even if the last is successful, not mentioning otherwise. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17113 (cherry picked from commit 09f4dd0)
It seems `fio` in `ddt_dedup_vdev_limit` overwhelms the system with the amount of dirty data caused by DDT updates within one TXG due to tiny 1KB records used, while I see no reason for this test to extend the TXGs beyond default. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Allan Jude <[email protected]> Reviewed-by: @ImAwsumm Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 21850f5)
dbuf_prefetch_impl() should look on level of current indirect, not the target prefetch level. dbuf_prefetch_indirect_done() should call dnode_level_is_l2cacheable() if we have dpa_dnode to pass it. It should fix some both false positive and negative L2ARC caching. While there, fix redacted feature activation assertions. One was always true, while another could give false positive if dpa_dnode is NULL. George Amanakis <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17204 (cherry picked from commit a497c5f)
…nzfs#17228) Various tools will display draid vdev names with parameters embedded in them, but would not accept them as valid vdev names when looking them up, making it difficult to build pipelines involving draid vdevs. This commit makes it so that if a full draid name is offered for match, it gets truncated at the first ':' character. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 131df3b)
- Fix VERIFY3B() when given non-boolean values. - Map EQUIV() into VERIFY3B(,==,) as equivalent. - Tune messages for better readability and to closer match source code for easier search. Unify user-space messages with kernel. - Tune printed types and remove %px outside of Linux kernel. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: @ImAwsumm Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 4866c2f)
The test writes 1M of 1KB blocks, which may produce up to 1GB of dirty data. On top of that ashift=12 likely produces additional 4GB of ZIO buffers during sync process. On top of that we likely need some page cache since the pool reside on files. And finally we need to cache the DDT. Not surprising that the test regularly ends up in OOMs, possibly depending on TXG size variations. Also replace fio with pretty strange parameter set with a set of dd writes and TXG commits, just as we neeed here. While here, remove compression. It has nothing to do here, but waste CI CPU time. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 1d8f625)
- Kill workload first for faster cleanup. - Use `zpool wait` for resilver instead of `sleep`. - Remove irrelevant workload from `online_offline_003_neg`. Reviewed-by: George Melikov <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes: openzfs#17259 (cherry picked from commit cb49e77)
Replace `sleep 15` with `zpool wait`, which should take much less than the 15 seconds. And considering it is called 16 times, this should save us up to 4 minutes total. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes: openzfs#17257 (cherry picked from commit 8f2c2de)
All defined variants: - freebsd13-4r, freebsd13-5r, freebsd14-1r, freebsd14-2r (RELEASE) - freebsd13-5s, freebsd14-2s (STABLE) - freebsd15-0c (CURRENT) Used for testing: - freebsd13-4r (RELEASE) - freebsd14-2s (STABLE) - freebsd15-0c (CURRENT) Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes: openzfs#17260 (cherry picked from commit 6afb405)
Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes: openzfs#17267 (cherry picked from commit 38c3a8b)
Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed by: Attila Fülöp <[email protected]> Signed-off-by: Artem-OSSRevival <[email protected]> Fixes: openzfs#14538 Closes: openzfs#17234 (cherry picked from commit 37a3e26)
Originally the Lustre ZFS OSD code was going to use zfs_uio_t structs for supporting Direct I/O with ZFS. However, this has changed to using abd_t structs instead. This exports the proper symbols that will be used by the Lustre ZFS OSD code. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Brian Atkinson <[email protected]> Closes openzfs#17256 (cherry picked from commit 7031a48)
This commit adds support for using llvm-libunwind for kernels built using llvm and clang. The two differences are that the largest register index is given by _LIBUNWIND_HIGHEST_DWARF_REGISTER, we need to check whether the register is a floating point register and the prototype for unw_regname takes the unwind cursor as the first argument. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Sebastian Pauka <[email protected]> Closes openzfs#17230 (cherry picked from commit 1b4826b)
Those tests are write-mostly at the nested pool. Considering we have 3 more layers of caching underneath, we can hint ZFS how to use the memory better by setting primarycache=metadata. While there, add missing zpool sync after rm in checkpoint_capacity before we could potentially see the freed space, would not there be a pool checkpoint. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 1ef706c)
Sometimes it fails unable to see any injected write errors. I guess writing 25KB of zeroes might be not enough to trigger errors with probability set to 10%. Lets try to write more. Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17270 (cherry picked from commit d947b9a)
Reviewed-by: George Melikov <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes openzfs#17278 (cherry picked from commit 88ec6c4)
Reviewed-by: George Melikov <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Alexander Ziaee <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Quentin Thébault <[email protected]> Closes openzfs#17282 (cherry picked from commit 63de2d2)
Don't use KSM on the FreeBSD VMs and optimize KSM settings for Linux to have faster run times. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes openzfs#17247 (cherry picked from commit ba17ced)
### Background Various admin operations will be invoked by some userspace task, but the work will be done on a separate kernel thread at a later time. Snapshots are an example, which are triggered through zfs_ioc_snapshot() -> dsl_dataset_snapshot(), but the actual work is from a task dispatched to dp_sync_taskq. Many such tasks end up in dsl_enforce_ds_ss_limits(), where various limits and permissions are enforced. Among other things, it is necessary to ensure that the invoking task (that is, the user) has permission to do things. We can't simply check if the running task has permission; it is a privileged kernel thread, which can do anything. However, in the general case it's not safe to simply query the task for its permissions at the check time, as the task may not exist any more, or its permissions may have changed since it was first invoked. So instead, we capture the permissions by saving CRED() in the user task, and then using it for the check through the secpolicy_* functions. ### Current implementation The current code calls CRED() to get the credential, which gets a pointer to the cred_t inside the current task and passes it to the worker task. However, it doesn't take a reference to the cred_t, and so expects that it won't change, and that the task continues to exist. In practice that is always the case, because we don't let the calling task return from the kernel until the work is done. For Linux, we also take a reference to the current task, because the Linux credential APIs for the most part do not check an arbitrary credential, but rather, query what a task can do. See secpolicy_zfs_proc(). Again, we don't take a reference on the task, just a pointer to it. ### Changes We change to calling crhold() on the task credential, and crfree() when we're done with it. This ensures it stays alive and unchanged for the duration of the call. On the Linux side, we change the main policy checking function priv_policy_ns() to use override_creds()/revert_creds() if necessary to make the provided credential active in the current task, allowing the standard task-permission APIs to do the needed check. Since the task pointer is no longer required, this lets us entirely remove secpolicy_zfs_proc() and the need to carry a task pointer around as well. Sponsored-by: https://despairlabs.com/sponsor/ Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Kyle Evans <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit c8fa39b)
This change goes through and quotes variables where appropriate to avoid issues with incorrect splitting. The performance tests ran into an issue with $SUDO_COMMAND splitting incorrectly because it was not quoted. This change fixes that issue and hopefully gets ahead of any other similar problems. Reviewed by: John Wren Kennedy <[email protected]> Reviewed-by: Tony Nguyen <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Aleksandr Liber <[email protected]> Closes openzfs#17235 (cherry picked from commit aa46cc9)
When multiple snapshots prevent the destruction/rollback of the respective dataset/snapshot/volume via zfs destroy or zfs rollback, the error message does not list the blocking snapshots sorted according to their order of creation. This causes inconvenience and can lead to confusion, and also creates a contrast with a returned message from zfs list -t snap function. Closes: openzfs#12751 Signed-off-by: Artem-OSSRevival <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit 27f3d94)
Decrease the RESILVER_MIN_TIME_MS variable from 50 to 20. So the test, which expects two 2 resilver starts will see them. Logfile of the seen failures before this fix: log: NOTE: expected 2 resilver start(s) after offline/online, found 1 log: expected 2 resilver start(s) after offline/online, found 1 The test time decreases also from around 00:42 to 00:24 seconds. Reviewed-by: Tony Hutter <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes openzfs#16822 Closes openzfs#17279 (cherry picked from commit 3b18877)
It's possible for two spares to get attached to a single failed vdev. This happens when you have a failed disk that is spared, and then you replace the failed disk with a new disk, but during the resilver the new disk fails, and ZED kicks in a spare for the failed new disk. This commit checks for that condition and disallows it. Reviewed-by: Akash B <[email protected]> Reviewed-by: Ameer Hamza <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes: openzfs#16547 Closes: openzfs#17231 (cherry picked from commit f40ab9e)
zpool_status_003 and zpool_status_004_pos use 'dd' to trigger a read of a file without specifying 'of=/dev/null'. This spams the ZTS logs with ~20MB of garbage data. This commit adds 'of=/dev/null'. Signed-off-by: Tony Hutter <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Alexander Motin <[email protected]> (cherry picked from commit a6cca8a)
`S_IFMT` is declared in `sys/stat.h`, but we cannot include this header because it redeclares the `statx` function with different argument types. Therefore, we define `S_IFMT` ourselves, in the same way as the other definitions. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: José Luis Salvador Rufo <[email protected]> Closes openzfs#17293 Closes openzfs#17294 (cherry picked from commit 634c172)
We should not clear scn_state and notify waiters until we call vdev_dtl_reassess(), otherwise following offline/detach request may fail with "no valid replicas". Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. (cherry picked from commit f86d9af)
After more CI runs and code reading after openzfs#17259 I've found that online starts resilver via async mechanism, which does not provide wait primitives at this time. Restore some delays to restore CI until this is properly fixed. Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. (cherry picked from commit f85c96e)
…nzfs#17284) txg_wait_synced_sig() is "wait for txg, unless a signal arrives". We expect that future development will require similar "wait unless X" behaviour. This generalises the API as txg_wait_synced_flags(), where the provided flags describe the events that should cause the call to return. Instead of a boolean, the return is now an error code, which the caller can use to know which event caused the call to return. The existing call to txg_wait_synced_sig() is now txg_wait_synced_flags(TXG_WAIT_SIGNAL). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Signed-off-by: Rob Norris <[email protected]> Reviewed-by: Allan Jude <[email protected]> Reviewed-by: Alexander Motin <[email protected]> (cherry picked from commit a7de203)
With certain combinations of target ARC states balance and ghost hit rates it was possible to get the fractions outside of allowed range. This patch limits maximum balance adjustment speed, which should make it impossible, and also asserts it. Fixes openzfs#17210 Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Tony Hutter <[email protected]> (cherry picked from commit b1ccab1)
Fedora 40 has gone EOL as of May 2025, retire the CI builder. Reviewed-by: Tino Reichardt <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Brian Behlendorf <[email protected]> Closes openzfs#17408
After openzfs#17401 the Linux build produces some stack related warnings. Silence them with the `STACK_FRAME_NON_STANDARD` macro. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Tino Reichardt <[email protected]> Signed-off-by: Attila Fülöp <[email protected]> Co-authored-by: Brian Behlendorf <[email protected]> Closes openzfs#17410
The io_uring interface is available as a Technology Preview. Details: https://access.redhat.com/solutions/4723221 Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tino Reichardt <[email protected]> Closes openzfs#17447
Having high-refcount dedup entries for zero blocks is inefficient when they could be recorded as a holes instead. Normally, zero compression is not done if compression is disabled to not confuse naive benchmarks. But with dedup enabled, it is expected that the write will be skipped anyway, so we are just optimizing the way it is skipped. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: George Melikov <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17435
It is not right, but there are few examples when TX is aborted after being assigned in case of error. To handle it better on production systems add extra cleanup steps. While here, replace couple dmu_tx_abort() in simple cases. Reviewed-by: Rob Norris <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Igor Kozhukhov <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17438
Looking on txg_wait_synced(, 0) I've noticed that it always syncs 5 TXGs: 3 TXG_CONCURRENT_STATES + 2 TXG_DEFER_SIZE. But in case of dmu_offset_next() we do not care about deferred frees. And even concurrent TXGs we might not need sync all 3 if the dnode was not dirtied in last few TXGs. This patch makes dmu_offset_next() to sync one TXG at a time until the dnode is clean, but no more than 3 TXG_CONCURRENT_STATES times. My tests with random simultaneous writes and seeks over many files on HDD pool show 7-14% performance increase. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Rob Norris <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17434
Previous dmu_tx_count_clone() was broken, stating that cloning is similar to free. While they might be from some points, cloning is not net-free. It will likely consume space and memory, and unlike free it will do it no matter whether the destination has the blocks or not (usually not, so previous code did nothing). Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17431
Fairly coarse, but if it returns while the pool suspends, it must be with an error. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17420
The superblock pointer will always be set, as will z_log, so remove code supporting cases that can't occur (on Linux at least). Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17420
If the pool is suspended, we'll just block in zil_commit(). If the system is shutting down, blocking wouldn't help anyone. So, we should keep this test for now, but at least return an error for anyone who is actually interested. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17420
If the kernel will honour our error returns, use them. If not, fool it by setting a writeback error on the superblock, if available. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Paul Dagnelie <[email protected]> Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17420
If a write is split across mutliple itxs, we only want the callback on the last one, otherwise it will be called for every itx associated with this single write, which makes it very hard to know what to clean up. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Mark Johnston <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17445
zfs_putpages() would put the entire range of pages onto the ZIL, then return VM_PAGER_OK for each page to the kernel. However, an associated zil_commit() or txg sync had not happened at this point, so the write may not actually be on disk. So, we rework it to use a ZIL commit callback, and do the post-write work of undirtying the page and signaling completion there. We return VM_PAGER_PEND to the kernel instead so it knows that we will take care of it. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Mark Johnston <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17445
- Set/remove "Work in Progress"/"Code Review Needed" for drafts. - Remove "Accepted", "Inactive", "Revision Needed" and "Stale" on pushes and reopens. I hope this reduce chances of PRs being forgotten after requested modifications done due to stale labels. It is better to have no labels than incorrect ones saying there is nothing to look at. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#16721
This also includes removing L2 vdevs asynchronously. This commit also guarantees that spa_load_guid is unique. The zpool reguid feature introduced the spa_load_guid, which is a transient value used for runtime identification purposes in the ARC. This value is not the same as the spa's persistent pool guid. However, the value is seeded from spa_generate_load_guid() which does not check for uniqueness against the spa_load_guid from other pools. Although extremely rare, you can end up with two different pools sharing the same spa_load_guid value! So we guarantee that the value is always unique and additionally not still in use by an async arc flush task. Sponsored-by: Klara, Inc. Sponsored-by: Wasabi Technology, Inc. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Allan Jude <[email protected]> Signed-off-by: Don Brady <[email protected]> Closes openzfs#16215
On systems with enormous amounts of memory, the single arc_evict thread can become a bottleneck if reads and writes are stuck behind it, waiting for old data to be evicted before new data can take its place. This commit adds support for evicting from multiple ARC lists in parallel, by farming the evict work out to some number of threads and then accumulating their results. A new tuneable, zfs_arc_evict_threads, sets the number of threads. By default, it will scale based on the number of CPUs. Sponsored-by: Expensify, Inc. Sponsored-by: Klara, Inc. Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Youzhong Yang <[email protected]> Signed-off-by: Allan Jude <[email protected]> Signed-off-by: Mateusz Piotrowski <[email protected]> Signed-off-by: Alexander Stetsenko <[email protected]> Signed-off-by: Rob Norris <[email protected]> Co-authored-by: Rob Norris <[email protected]> Co-authored-by: Mateusz Piotrowski <[email protected]> Co-authored-by: Alexander Stetsenko <[email protected]> Closes openzfs#16486
It's been many years, we can probably do without. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Brian Behlendorf <[email protected]> Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: George Melikov <[email protected]> Reviewed-by: Pavel Snajdr <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17376
It makes no sense to limit read size below the block size, since DMU will any way consume resources for the whole block, while the current zfs_vnops_read_chunk_size is only 1MB, which is smaller that maximum block size of 16MB. Plus in case of misaligned Uncached I/O the buffer may get evicted between the chunks, requiring repeating I/Os. On 64-bit platforms increase zfs_vnops_read_chunk_size to 32MB. It allows to less depend on speculative prefetcher if application requests specific size, first not waiting for prefetcher to start and later not prefetching more than needed. Also while there, we don't need to align reads to the chunk size, but only to a block size, which is smaller and so more forgiving. My profiles show ~4% of CPU time saving when reading 16MB blocks. Reviewed-by: Brian Behlendorf <[email protected]> Reviewed by: Igor Kozhukhov <[email protected]> Signed-off-by: Alexander Motin <[email protected]> Sponsored by: iXsystems, Inc. Closes openzfs#17415
These are only required to support these ioctls on Linux <4.5. Since 4.18 is our cutoff, we don't need this code anymore. Also removing related test things that will never match again. Sponsored-by: https://despairlabs.com/sponsor/ Reviewed-by: Alexander Motin <[email protected]> Reviewed-by: Tony Hutter <[email protected]> Signed-off-by: Rob Norris <[email protected]> Closes openzfs#17308
Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Germano Massullo <[email protected]> Closes openzfs#17461
Update the META file to reflect compatibility with the 6.15 kernel. Reviewed-by: Brian Behlendorf <[email protected]> Signed-off-by: Tony Hutter <[email protected]> Closes openzfs#17393
Signed-off-by: Rob Norris <[email protected]>
amotin
approved these changes
Jun 18, 2025
Merged and tagged 2.3.3 tagged. https://github.com/openzfs/zfs/releases/tag/zfs-2.3.3 |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Motivation and Context
Proposed zfs-2.3.3 patchset based on the commits in the zfs-2.3.3-staging branch.
Description
Bump META file to support 6.15 kernel, zfs send/recv encryption fixes, misc fixes.
How Has This Been Tested?
Previously tested by CI, well be retested in this PR.
Types of changes
Checklist:
Signed-off-by
.