Remove unnecessary unsafe functions by djkoloski · Pull Request #998 · pest-parser/pest

djkoloski · 2024-03-21T16:46:19Z

Fundamentally, pest never does anything unsafe. All of the UTF-8 slicing uses indexing and is therefore checked. There's no need to provide the internal guarantee that all pest positions lie on UTF-8 boundaries when it provides no performance benefit.
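The claim that slicing is checked can be illustrated with the standard library alone. Indexing a `&str` with a range that is out of bounds, or that ends inside a multi-byte character, panics rather than yielding invalid UTF-8. Below is a minimal sketch. The helper name `slice_panics` is hypothetical and not part of pest.

```rust
use std::panic;

/// Returns true if `&input[start..end]` panics (out of bounds, reversed,
/// or not on a UTF-8 char boundary), false if the slice succeeds.
/// Hypothetical helper for illustration; not part of pest.
pub fn slice_panics(input: &str, start: usize, end: usize) -> bool {
    // catch_unwind turns the indexing panic into an Err value.
    panic::catch_unwind(|| input[start..end].len()).is_err()
}
```

In `"héllo"`, the `é` occupies bytes 1..3. So `slice_panics("héllo", 0, 2)` returns `true`, and `slice_panics("héllo", 0, 3)` returns `false`. The default panic hook still prints a message to stderr during the failing call.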

Summary by CodeRabbit

Refactor
- Improved error handling mechanisms for better stability.
- Enhanced safety by removing unnecessary unsafe blocks and comments across various components.
- Streamlined Position and Span struct creations for increased code safety and readability.

coderabbitai · 2024-03-21T16:46:29Z

Walkthrough

The recent updates to the pest library involve significant improvements in error handling and safety. The changes include eliminating unsafe code blocks and refining the creation of Position and Span objects for better reliability. These modifications enhance the overall safety and maintainability of the codebase, making it more robust and error-resistant.

Changes

| Files | Change Summary |
|---|---|
| `error.rs`, `parser_state.rs` | Replaced direct `Position::new` with `new_internal` for improved error handling. |
| `iterators/flat_pairs.rs` | Removed `unsafe` and safety comments in `FlatPairs`. |
| `iterators/pair.rs` | Updated safety comments and calls in `Pair` for safer `Span` creation. |
| `iterators/pairs.rs` | Eliminated `unsafe` blocks in `flatten`, `peek`, and `next_back`. |
| `iterators/tokens.rs` | Updated struct initialization and error handling in `Tokens`. |
| `position.rs`, `span.rs` | Refactored `Position` and `Span` creation to use `new_internal`, removing `unsafe` usage. |

djkoloski · 2024-03-21T16:47:26Z

One option for fixing #993

tomtau

Thanks for the fix! Some of the changes, however, look like they are semver-breaking due to return type changes. Would it be possible to rewrite them in a backwards-compatible way?

tomtau · 2024-03-22T00:34:03Z

```diff
-    /// `input[start..end]` must be a valid subslice; that is, said indexing should not panic.
-    pub(crate) unsafe fn new_unchecked(input: &str, start: usize, end: usize) -> Span<'_> {
+    /// Creates a new `Span`.
+    pub fn new(input: &str, start: usize, end: usize) -> Span<'_> {
```


Can the visibility remain as pub(crate)?

new_unchecked was pub(crate), new (deleted below) is pub.
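Below is a minimal sketch of the arrangement being asked about. It uses a simplified `Span` whose fields and bodies are illustrative, not pest's source, and it assumes the existing public `Span::new` returns `Option<Span>`. The public constructor keeps its checked, `Option`-returning signature, so external callers see no semver-breaking change. The unchecked constructor stays `pub(crate)`.

```rust
/// Simplified stand-in for pest's `Span`; fields are illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Span<'i> {
    input: &'i str,
    start: usize,
    end: usize,
}

impl<'i> Span<'i> {
    /// Public, checked constructor. Keeping the `Option` return type
    /// avoids a signature change for downstream crates.
    pub fn new(input: &'i str, start: usize, end: usize) -> Option<Span<'i>> {
        // `str::get` returns None for out-of-bounds, reversed, or
        // non-char-boundary ranges.
        input.get(start..end).map(|_| Span { input, start, end })
    }

    /// Crate-internal constructor without the up-front check;
    /// `pub(crate)` keeps it out of the public API.
    pub(crate) fn new_internal(input: &'i str, start: usize, end: usize) -> Span<'i> {
        Span { input, start, end }
    }

    /// Slicing is still checked by `str` indexing, so an invalid span
    /// panics here instead of causing undefined behavior.
    pub fn as_str(&self) -> &'i str {
        &self.input[self.start..self.end]
    }
}
```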

tomtau · 2024-03-22T00:38:25Z

This test fails: https://github.com/pest-parser/pest/actions/runs/8378691318/job/22959618598#step:6:1


coderabbitai

Review Status

Actionable comments generated: 1

Configuration used: CodeRabbit UI

Commits

Files that changed from the base of the PR and between 9f9094e and 4f11bb7.

Files selected for processing (8)

pest/src/error.rs (1 hunks)
pest/src/iterators/flat_pairs.rs (3 hunks)
pest/src/iterators/pair.rs (2 hunks)
pest/src/iterators/pairs.rs (3 hunks)
pest/src/iterators/tokens.rs (3 hunks)
pest/src/parser_state.rs (1 hunks)
pest/src/position.rs (2 hunks)
pest/src/span.rs (4 hunks)

Additional comments: 25

pest/src/iterators/tokens.rs (4)

31-31: The change to initialize Tokens with a queue field and remove safety comments aligns with the PR's objective to enhance safety by removing unnecessary unsafe blocks. This modification ensures that the Tokens struct is initialized in a safer manner, contributing to the overall safety and maintainability of the library.

45-45: The error message "💥 INVALID Tokens CREATED 💥" provides clear feedback in case of an invalid Tokens creation. However, consider adding more context to the error message, such as the invalid input_pos, to aid in debugging.
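One way to act on this suggestion is a validation loop whose panic message names the offending `input_pos`. The sketch below makes two assumptions. First, the `Tok` enum and `validate_tokens` function are hypothetical simplifications, not pest's `QueueableToken` or `Tokens::new`. Second, the sketch always runs the check.

```rust
/// Hypothetical, simplified token type (not pest's `QueueableToken`).
#[derive(Debug, Clone, Copy)]
pub enum Tok {
    Start { input_pos: usize },
    End { input_pos: usize },
}

/// Panics with a message naming the offending token and position if any
/// position is out of bounds or not on a UTF-8 char boundary.
pub fn validate_tokens(input: &str, tokens: &[Tok]) {
    for (index, tok) in tokens.iter().enumerate() {
        let input_pos = match *tok {
            Tok::Start { input_pos } | Tok::End { input_pos } => input_pos,
        };
        assert!(
            input.get(input_pos..).is_some(),
            "💥 INVALID Tokens CREATED 💥: token #{} has input_pos {} \
             (input length {}), which is out of bounds or not on a char boundary",
            index,
            input_pos,
            input.len()
        );
    }
}
```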

74-74: The use of Position::new_internal instead of an unsafe method for creating Token::Start instances is a positive change towards ensuring safety. This approach leverages Rust's type system and compiler checks to enforce safety constraints, aligning with the PR's goals.

79-82: Similarly, replacing unsafe methods with Position::new_internal for creating Token::End instances enhances safety. This change effectively utilizes Rust's safety features and contributes to the maintainability and robustness of the code.

pest/src/iterators/flat_pairs.rs (3)

32-32: The change to make the new function for FlatPairs non-unsafe and the removal of safety comments are in line with the PR's objective to enhance safety by eliminating unnecessary unsafe blocks. This modification ensures that FlatPairs are created in a safer manner, contributing to the overall safety and maintainability of the library.

114-119: The transition from using unsafe blocks to safe methods for creating Pair instances in the Iterator implementation is a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.

139-144: Similarly, the change in the DoubleEndedIterator implementation to create Pair instances without unsafe blocks enhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.

pest/src/iterators/pair.rs (2)

48-48: The adjustments to the new function for Pair, including the removal of safety comments and the transition to safer method calls, align with the PR's objective to enhance safety. This change ensures that Pair instances are created in a safer manner, contributing to the overall safety and maintainability of the library.

207-207: Replacing an unsafe call with Span::new_internal for Span creation within the Pair implementation is a positive change towards ensuring safety. This approach leverages Rust's safety features and contributes to the maintainability and robustness of the code.

pest/src/span.rs (4)

31-31: Renaming new_unchecked to new_internal in the Span struct and removing safety comments related to character boundary indices are in line with the PR's objective to enhance safety by eliminating unnecessary unsafe blocks. This change ensures that Span instances are created in a safer manner, contributing to the overall safety and maintainability of the library.

137-137: The update to use Position::new_internal for creating the Span's start Position is a commendable change. It aligns with the PR's goals of enhancing safety and maintainability by leveraging Rust's safety features.

155-155: Similarly, updating the method to use Position::new_internal for creating the Span's end Position enhances safety. This approach utilizes Rust's type system and compiler checks to enforce safety constraints, contributing to the robustness of the code.

173-174: The changes in the split method to use Position::new_internal for both positions further emphasize the PR's commitment to safety. This modification ensures that positions are created safely, aligning with Rust's safety guarantees.

pest/src/position.rs (2)

28-28: The method new_internal is introduced as a safe alternative to creating a Position instance. It uses debug_assert! to ensure the position is within bounds in debug mode. This is a positive change as it maintains safety guarantees while potentially catching errors during development.

102-102: The change in the span method to use new_internal instead of an unsafe block is a significant improvement in safety. This adjustment ensures that the creation of Span instances is done without bypassing Rust's safety checks, aligning with the PR's objective to remove unnecessary unsafe usage.

pest/src/iterators/pairs.rs (3)

208-208: Removing unsafe blocks in the flatten method is a positive change, enhancing safety by relying on safe alternatives for iterator manipulation. This change aligns with the PR's goal of eliminating unnecessary unsafe usage.

350-355: The peek method now creates a Pair instance without using unsafe blocks, which is a commendable improvement. This change ensures that the method adheres to Rust's safety guarantees, contributing to the overall safety of the library.

428-433: Similarly, the next_back method's update to create a Pair instance without unsafe blocks is a significant safety enhancement. This modification further supports the PR's objective of reducing unnecessary unsafe usage across the library.

pest/src/error.rs (1)

493-493: The change from Position::new(input, error_position).unwrap() to Position::new_internal(input, error_position) is a positive step towards enhancing safety by avoiding potential panics. This aligns with Rust's safety principles and the PR's objectives.

However, it's crucial to ensure that Position::new_internal properly handles all cases that Position::new(...).unwrap() did, especially regarding input validation and error handling. It would be beneficial to verify that Position::new_internal does not introduce any regressions or unexpected behavior, particularly in edge cases.

pest/src/parser_state.rs (6)

466-466: The change from Position::new to Position::new_internal within an Err block is a positive step towards enhancing safety by avoiding unwrapping and directly handling potential errors internally. This aligns with the PR's objective of removing unnecessary unsafe blocks and improving error handling.

463-469: > 📝 NOTE

This review was outside the diff hunks, and no overlapping diff hunk was found. Original lines [1-1]

The file header correctly includes copyright and licensing information, which is essential for open-source projects to clearly state their licensing terms.

466-466: While the change to Position::new_internal is approved, it's important to ensure that all other instances in the codebase where Position::new might be used are also reviewed and updated if necessary. This is to maintain consistency and safety across the entire library.

Verification successful

The findings from the executed script confirm the presence of multiple instances where Position::new is used across various files in the library. This supports the original review comment's suggestion to review and potentially update these instances to maintain consistency and safety across the entire library. It is recommended to carefully evaluate each usage of Position::new to determine if it should be replaced with Position::new_internal or another method, depending on the specific requirements and context of each case.
Scripts Executed

The following scripts were executed for the analysis:

Script:

```bash
#!/bin/bash
# Search for any remaining uses of Position::new that might need attention.
rg "Position::new\(" --type rust
```

Length of output: 5274
* 466-466: The use of `Position::new_internal` here is a good practice as it avoids potential panics from unwrapping and leverages internal error handling mechanisms. This change enhances the safety and robustness of the error handling process within the parser state.
* 466-466: The stack manipulation methods (`stack_push`, `stack_pop`, etc.) are well-implemented, providing a clear and efficient way to manage parser state. It's important to ensure that these methods are covered by comprehensive tests, especially for edge cases such as empty stacks or invalid indices.
* 466-466: The handling of lookahead and atomicity within the parser state is sophisticated and allows for flexible parsing strategies. However, it's crucial to document these features thoroughly in the code comments or external documentation to aid in understanding and maintenance, especially for new contributors or users of the library.

coderabbitai · 2024-03-22T20:05:46Z

```diff
                 variant,
-                // TODO(performance): Guarantee state.attempt_pos is a valid position
-                Position::new(input, state.attempt_pos).unwrap(),
+                Position::new_internal(input, state.attempt_pos),
```


Consider adding unit tests specifically targeting the error handling behavior of Position::new_internal to ensure its functionality aligns with expectations and to catch any potential edge cases.

Would you like me to help by drafting some unit tests for this change?

tomtau

The change is fine, but some function comments may need to remain, because the other functions could panic if invalid indices were provided (I haven't checked that in detail, though).

If the other functions could panic, perhaps they could either be modified to use checked access or have that directly in their comments?

tomtau · 2024-03-23T02:30:17Z

```diff
-    ///
-    /// # Safety:
-    ///
-    /// `input[pos..]` must be a valid codepoint boundary (should not panic when indexing thus).
```


I think some of these comments may still be valid, namely that the caller is responsible for providing a valid `pos` or `start..end` indices, because the access is sometimes done directly via `self.input[pos]` instead of `self.input.get(pos)`.
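The distinction in this comment can be made concrete as follows. For a `&str`, `input[pos..]` panics when `pos` is out of bounds or inside a multi-byte character, while `input.get(pos..)` returns `None`. The minimal sketch below shows the two styles side by side using two hypothetical helpers (not pest functions). The panicking one is the kind whose contract would belong in a doc comment.

```rust
/// Panicking style: the caller is responsible for passing a valid `pos`.
///
/// # Panics
///
/// Panics if `pos > input.len()` or `pos` is not on a UTF-8 char boundary.
pub fn rest_strict(input: &str, pos: usize) -> &str {
    &input[pos..]
}

/// Checked style: invalid positions are reported as `None`.
pub fn rest_checked(input: &str, pos: usize) -> Option<&str> {
    input.get(pos..)
}
```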

djkoloski · 2024-03-23T16:41:30Z

I feel like this PR is not getting to the point. Would you prefer:

1. Pest keeps the type invariant that `Position` always lands on a UTF-8 codepoint boundary, or
2. Pest stops caring about whether `Position` lands on a codepoint boundary because all slicing and indexing operations are checked.

The implications of 1:

- All `Position`s must refer to a valid UTF-8 codepoint boundary. Similar invariants propagate into `Span`, `Pair`, `FlatPairs`, etc.
- Instead of removing `unsafe` from the `new_unchecked` functions, all uses of the unsafe functions are verified.
- Indexing and slicing operations using `Position` switch to unchecked versions, skipping bounds and codepoint boundary checking.
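For comparison, option 1 could look roughly like the sketch below, which uses a hypothetical, simplified `Position` rather than pest's code. The invariant is established once at construction. That invariant is what would make skipping the check with `str::get_unchecked` sound, and it is also why every use of an `unsafe` constructor would have to be audited.

```rust
/// Hypothetical `Position` with the invariant that `pos` is a UTF-8
/// char boundary of `input` (which also implies `pos <= input.len()`).
#[derive(Debug, Clone, Copy)]
pub struct Position<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> Position<'i> {
    /// Establishes the invariant; returns None if it does not hold.
    pub fn new(input: &'i str, pos: usize) -> Option<Position<'i>> {
        // `is_char_boundary` is false for pos > len and true for pos == len.
        if input.is_char_boundary(pos) {
            Some(Position { input, pos })
        } else {
            None
        }
    }

    /// # Safety
    ///
    /// `pos` must be a char boundary of `input`; every caller must uphold this.
    pub unsafe fn new_unchecked(input: &'i str, pos: usize) -> Position<'i> {
        Position { input, pos }
    }

    /// Remaining input, sliced without bounds or boundary checks.
    pub fn rest(&self) -> &'i str {
        // SAFETY: the type invariant guarantees `pos` is a char boundary.
        unsafe { self.input.get_unchecked(self.pos..) }
    }
}
```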

The implications of 2:

- All of the unsafe functions are turned safe.
- No more safety docs are required. They should be removed because safety docs are for unsafe code.

I would also appreciate clarity on:

- Whether Pest wants separate strict/checked APIs. Compare: `checked_pow` vs `strict_pow`. Strict APIs panic on invalid input, checked APIs return `None` on invalid input.
- Whether Pest documents panics in a `# Panics` section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.

Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API. Note: flat_pairs::new, pair::new, pairs::new, and tokens::new are all internal APIs (checked by enabling the unreachable_pub lint).
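The "panics eagerly in debug, panics lazily in release" behavior can be sketched with `debug_assert!`. The names below (`Position`, `new_internal`, `rest`) mirror the thread, but the bodies are a simplified assumption, not pest's implementation. In a debug build the internal constructor panics on an invalid position. In a release build construction succeeds and the first slice panics instead. Either way the slicing stays checked, so an invalid position cannot cause undefined behavior.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Position<'i> {
    input: &'i str,
    pos: usize,
}

impl<'i> Position<'i> {
    /// Checked external API: invalid positions yield None.
    pub fn new(input: &'i str, pos: usize) -> Option<Position<'i>> {
        input.get(pos..).map(|_| Position { input, pos })
    }

    /// Permissive internal API: only debug builds validate eagerly.
    pub(crate) fn new_internal(input: &'i str, pos: usize) -> Position<'i> {
        debug_assert!(
            input.get(pos..).is_some(),
            "position {} is out of bounds or not on a char boundary",
            pos
        );
        Position { input, pos }
    }

    /// Checked slicing: panics (never UB) if `pos` is invalid.
    pub fn rest(&self) -> &'i str {
        &self.input[self.pos..]
    }
}
```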

tomtau · 2024-03-24T01:17:52Z

Thanks @djkoloski , that helps.

> Right now, this PR implements option 2 with a permissive internal API (panics eagerly in debug, panics lazily in release) and a checked external API.

Yes, I think that option 2 is fine if those remain internal (from a quick look I wasn't sure if those pub methods are reachable from outside).

> Whether Pest wants separate strict/checked APIs. Compare: checked_pow vs strict_pow. Strict APIs panic on invalid input, checked APIs return None on invalid input.

Maybe not at this moment, but good to consider for 3.X. Right now, we could separate them for internal API without breaking changes, but it may seem inconsistent with external API.

> Whether Pest documents panics in a # Panics section following the standard library pattern. Note that unlike safety docs, panic docs are not required for soundness.

It doesn't, at least not consistently, but it should.

Anyway, I think we can merge this PR and open an issue for documenting panics.

djkoloski requested a review from a team as a code owner March 21, 2024 16:46

djkoloski requested review from NoahTheDuke and removed request for a team March 21, 2024 16:46

tomtau added the pr label Mar 22, 2024

tomtau reviewed Mar 22, 2024

djkoloski force-pushed the remove_unsafe branch from 8c350a5 to 4f11bb7 Compare March 22, 2024 20:02

coderabbitai Bot reviewed Mar 22, 2024

tomtau approved these changes Mar 23, 2024

tomtau linked an issue Mar 24, 2024 that may be closed by this pull request

Pairs can be made with mismatched input str and Vec<QueueableToken> using pest::state #993

Closed

tomtau merged commit 9d25248 into pest-parser:master Mar 24, 2024

tomtau mentioned this pull request Mar 24, 2024

Document panics #999

Open

3 participants
