Skip to content

Clarify and restrict dmu_tx_assign() errors #17463

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 23, 2025

Conversation

behlendorf
Copy link
Contributor

Motivation and Context

ztest could ASSERT in ztest_tx_assign() when an IO error was encountered. This occurs because it only checks for ERESTART and ENOSPC. It does not consider either the over quota or IO error cases. Let's fix that.

Rather than attempting to enumerate each possible error code all tx->tx_err's are converted to EIO. I couldn't find any cases where the additional detail was actually useful to the caller, so I opted to simply the interface. The intent is to make it easier for callers to reason about each possible error and handle it appropriately.

Description

There are three possible cases where dmu_tx_assign() may encounter a fatal error. When there is a true lack of free space (ENOSPC), when there is a lack of quota space (EDQUOT), or when data required to perform the transaction cannot be read from disk (EIO). See the dmu_tx_check_ioerr() function for additional details on the motivation for checking for I/O errors early.

Prior to this change dmu_tx_assign() would return the contents of tx->tx_err which covered a wide range of possible error codes (EIO, ECKSUM, ESRCH, etc). In practice, none of the callers could do anything useful with this level of detail and simply returned the error.

Therefore, this change converts all tx->tx_err errors to EIO, adds ASSERTs to dmu_tx_assign() to cover the only possible errors, and clarifies the function comment to include EIO as a possible fatal error.

How Has This Been Tested?

Locally compiled. It still need a full run through the test suite by the CI.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@behlendorf behlendorf added the Component: Test Suite Indicates an issue with the test framework or a test case label Jun 16, 2025
Copy link
Member

@robn robn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems right. I dunno if it's worth noting that the call to dsl_dir_tempreserve_space() in dmu_tx_try_assign() is the only place ENOSPC and EDQUOT can come from. It's easy to figure out by implication I guess, and the asserts in dsl_dir_tempreserve_space() won't let anything else happen, and if it changed in the future, the other asserts would catch it all. So now I've said it all, I guess not :)

There are three possible cases where dmu_tx_assign() may
encounter a fatal error.  When there is a true lack of free
space (ENOSPC), when there is a lack of quota space (EDQUOT),
or when data required to perform the transaction cannot be
read from disk (EIO).  See the dmu_tx_check_ioerr() function
for additional details of on the motivation for check for
I/O error early.

Prior to this change dmu_tx_assign() would return the
contents of tx->tx_err which covered a wide range of possible
error codes (EIO, ECKSUM, ESRCH, etc).  In practice, none
of the callers could do anything useful with this level of
detail and simply returned the error.

Therefore, this change converts all tx->tx_err errors to EIO,
adds ASSERTs to dmu_tx_assign() to cover the only possible
errors, and clarifies the function comment to include EIO as
a possible fatal error.

Signed-off-by: Brian D Behlendorf <[email protected]>
@behlendorf behlendorf force-pushed the ztest-dmu_tx_assign branch from d0d2e34 to 32498a3 Compare June 19, 2025 18:14
@behlendorf
Copy link
Contributor Author

the asserts in dsl_dir_tempreserve_space() won't let anything else happen

Yeah, I may have gotten a bit carried away. I went ahead and removed most of the asserts in dsl_dir_tempreserve_space() since as you said they'll get caught later. I opted to keep the one that verifies that arc_tempreserve_space() can only safely return 0, EAGAIN, or ERESTART as a form of documentation.

@amotin amotin added the Status: Code Review Needed Ready for review and testing label Jun 19, 2025
@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Jun 19, 2025
@amotin amotin merged commit 48ce292 into openzfs:master Jun 23, 2025
21 of 23 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Component: Test Suite Indicates an issue with the test framework or a test case Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants