Remove unnecessary unsafe functions#998
Conversation
WalkthroughThe recent updates to the Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
|
One option for fixing #993 |
tomtau
left a comment
There was a problem hiding this comment.
Thanks for a fix! some of the changes, however, look like that they are semver-breaking due to return type changes; would it be possible to rewrite it in a backwards compatible way?
| /// `input[start..end]` must be a valid subslice; that is, said indexing should not panic. | ||
| pub(crate) unsafe fn new_unchecked(input: &str, start: usize, end: usize) -> Span<'_> { | ||
| /// Creates a new `Span`. | ||
| pub fn new(input: &str, start: usize, end: usize) -> Span<'_> { |
There was a problem hiding this comment.
Can the visibility remain as pub(crate)?
There was a problem hiding this comment.
new_unchecked was pub(crate), new (deleted below) is pub.
Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
8c350a5 to
4f11bb7
Compare
There was a problem hiding this comment.
Review Status
Actionable comments generated: 1
Configuration used: CodeRabbit UI
Files selected for processing (8)
- pest/src/error.rs (1 hunks)
- pest/src/iterators/flat_pairs.rs (3 hunks)
- pest/src/iterators/pair.rs (2 hunks)
- pest/src/iterators/pairs.rs (3 hunks)
- pest/src/iterators/tokens.rs (3 hunks)
- pest/src/parser_state.rs (1 hunks)
- pest/src/position.rs (2 hunks)
- pest/src/span.rs (4 hunks)
Additional comments: 25
pest/src/iterators/tokens.rs (4)
- 31-31: The change to initialize
Tokenswith aqueuefield and remove safety comments aligns with the PR's objective to enhance safety by removing unnecessaryunsafeblocks. This modification ensures that theTokensstruct is initialized in a safer manner, contributing to the overall safety and maintainability of the library.- 45-45: The error message "💥 INVALID
TokensCREATED 💥" provides clear feedback in case of an invalidTokenscreation. However, consider adding more context to the error message, such as the invalidinput_pos, to aid in debugging.- 74-74: The use of
Position::new_internalinstead of an unsafe method for creatingToken::Startinstances is a positive change towards ensuring safety. This approach leverages Rust's type system and compiler checks to enforce safety constraints, aligning with the PR's goals.- 79-82: Similarly, replacing unsafe methods with
Position::new_internalfor creatingToken::Endinstances enhances safety. This change effectively utilizes Rust's safety features and contributes to the maintainability and robustness of the code.pest/src/iterators/flat_pairs.rs (3)
- 32-32: The change to make the
newfunction forFlatPairsnon-unsafe and the removal of safety comments are in line with the PR's objective to enhance safety by eliminating unnecessaryunsafeblocks. This modification ensures thatFlatPairsare created in a safer manner, contributing to the overall safety and maintainability of the library.- 114-119: The transition from using unsafe blocks to safe methods for creating
Pairinstances in theIteratorimplementation is a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.- 139-144: Similarly, the change in the
DoubleEndedIteratorimplementation to createPairinstances without unsafe blocks enhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.pest/src/iterators/pair.rs (2)
- 48-48: The adjustments to the
newfunction forPair, including the removal of safety comments and the transition to safer method calls, align with the PR's objective to enhance safety. This change ensures thatPairinstances are created in a safer manner, contributing to the overall safety and maintainability of the library.- 207-207: Replacing an unsafe call with
Span::new_internalforSpancreation within thePairimplementation is a positive change towards ensuring safety. This approach leverages Rust's safety features and contributes to the maintainability and robustness of the code.pest/src/span.rs (4)
- 31-31: Renaming
new_uncheckedtonew_internalin theSpanstruct and removing safety comments related to character boundary indices are in line with the PR's objective to enhance safety by eliminating unnecessaryunsafeblocks. This change ensures thatSpaninstances are created in a safer manner, contributing to the overall safety and maintainability of the library.- 137-137: The update to use
Position::new_internalfor creating theSpan's startPositionis a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.- 155-155: Similarly, updating the method to use
Position::new_internalfor creating theSpan's endPositionenhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.- 173-174: The changes in the
splitmethod to usePosition::new_internalfor both positions further emphasize the PR's commitment to safety. This modification ensures that positions are created safely, aligning with Rust's safety guarantees.pest/src/position.rs (2)
- 28-28: The method
new_internalis introduced as a safe alternative to creating aPositioninstance. It usesdebug_assert!to ensure the position is within bounds in debug mode. This is a positive change as it maintains safety guarantees while potentially catching errors during development.- 102-102: The change in the
spanmethod to usenew_internalinstead of an unsafe block is a significant improvement in safety. This adjustment ensures that the creation ofSpaninstances is done without bypassing Rust's safety checks, aligning with the PR's objective to remove unnecessaryunsafeusage.pest/src/iterators/pairs.rs (3)
- 208-208: Removing
unsafeblocks in theflattenmethod is a positive change, enhancing safety by relying on safe alternatives for iterator manipulation. This change aligns with the PR's goal of eliminating unnecessaryunsafeusage.- 350-355: The
peekmethod now creates aPairinstance without usingunsafeblocks, which is a commendable improvement. This change ensures that the method adheres to Rust's safety guarantees, contributing to the overall safety of the library.- 428-433: Similarly, the
next_backmethod's update to create aPairinstance withoutunsafeblocks is a significant safety enhancement. This modification further supports the PR's objective of reducing unnecessaryunsafeusage across the library.pest/src/error.rs (1)
- 493-493: The change from
Position::new(input, error_position).unwrap()toPosition::new_internal(input, error_position)is a positive step towards enhancing safety by avoiding potential panics. This aligns with Rust's safety principles and the PR's objectives.However, it's crucial to ensure that
Position::new_internalproperly handles all cases thatPosition::new(...).unwrap()did, especially regarding input validation and error handling. It would be beneficial to verify thatPosition::new_internaldoes not introduce any regressions or unexpected behavior, particularly in edge cases.pest/src/parser_state.rs (6)
- 466-466: The change from
Position::newtoPosition::new_internalwithin anErrblock is a positive step towards enhancing safety by avoiding unwrapping and directly handling potential errors internally. This aligns with the PR's objective of removing unnecessaryunsafeblocks and improving error handling.- 463-469: > 📝 NOTE
This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]
The file header correctly includes copyright and licensing information, which is essential for open-source projects to clearly state their licensing terms.
- 466-466: While the change to
Position::new_internalis approved, it's important to ensure that all other instances in the codebase wherePosition::newmight be used are also reviewed and updated if necessary. This is to maintain consistency and safety across the entire library.Verification successful
The findings from the executed script confirm the presence of multiple instances where
Position::newis used across various files in the library. This supports the original review comment's suggestion to review and potentially update these instances to maintain consistency and safety across the entire library. It is recommended to carefully evaluate each usage ofPosition::newto determine if it should be replaced withPosition::new_internalor another method, depending on the specific requirements and context of each case.* 466-466: The use of `Position::new_internal` here is a good practice as it avoids potential panics from unwrapping and leverages internal error handling mechanisms. This change enhances the safety and robustness of the error handling process within the parser state. * 466-466: The stack manipulation methods (`stack_push`, `stack_pop`, etc.) are well-implemented, providing a clear and efficient way to manage parser state. It's important to ensure that these methods are covered by comprehensive tests, especially for edge cases such as empty stacks or invalid indices. * 466-466: The handling of lookahead and atomicity within the parser state is sophisticated and allows for flexible parsing strategies. However, it's crucial to document these features thoroughly in the code comments or external documentation to aid in understanding and maintenance, especially for new contributors or users of the library.Scripts Executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Search for any remaining uses of Position::new that might need attention. rg "Position::new\(" --type rustLength of output: 5274
| variant, | ||
| // TODO(performance): Guarantee state.attempt_pos is a valid position | ||
| Position::new(input, state.attempt_pos).unwrap(), | ||
| Position::new_internal(input, state.attempt_pos), |
There was a problem hiding this comment.
Consider adding unit tests specifically targeting the error handling behavior of Position::new_internal to ensure its functionality aligns with expectations and to catch any potential edge cases.
Would you like me to help by drafting some unit tests for this change?
There was a problem hiding this comment.
the change is fine, but some function comments may remain, because the other functions could panic if invalid indices were provided (I haven't checked that in detail though).
If the other functions could panic, perhaps they could either be modified to use checked access or have that directly in their comments?
| /// | ||
| /// # Safety: | ||
| /// | ||
| /// `input[pos..]` must be a valid codepoint boundary (should not panic when indexing thus). |
There was a problem hiding this comment.
I think some of these comments may still be valid that the caller is responsible for providing that valid pos or start..end indices, because the access is sometimes done directly via self.input[pos] instead of self.input.get(pos)
|
I feel like this PR is not getting to the point. Would you prefer:
The implications of 1:
The implications of 2:
I would also appreciate clarity on:
Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API. Note: |
|
Thanks @djkoloski , that helps.
Yes, I think that option 2 is fine if those remain internal (from a quick look I wasn't sure if those pub methods are reachable from outside).
Maybe not at this moment, but good to consider for 3.X. Right now, we could separate them for internal API without breaking changes, but it may seem inconsistent with external API.
It doesn't, at least not consistently, but it should. Anyway, I think we can merge this PR and open an issue for documenting panics. |
Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
Summary by CodeRabbit
unsafeblocks and comments across various components.PositionandSpanstruct creations for increased code safety and readability.