Remove stringi dependency #936 #986
Thanks for starting on this! It's going to be a bit tricky to verify that we've preserved behaviour as much as possible, but once you've added a few tests, we can do a complete revdep check run to see if the behaviour of any CRAN packages changes.
R/extract.R
@@ -55,7 +55,9 @@ str_extract <- function(x, into, regex, convert = FALSE) {
     is_character(into)
   )
 
-  matches <- stringi::stri_match_first_regex(x, regex)[, -1, drop = FALSE]
+  matches <- lapply(regmatches(x, regexec(regex, x)),
Could you please pull this out into a separate function? It's going to need unit tests for edge cases that stri_match_first_regex() might handle, but the base methods do not. I don't know exactly what those are, but I'd start with tests around no matches, NA matches, ...
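To make the edge cases mentioned above concrete, here is a minimal sketch of how the base approach from the diff behaves on a non-matching string and an NA input. The input vector and the hard-coded group count are assumptions chosen for illustration; this is not code from the PR. As far as I can tell, elements without a match come back as zero-length vectors, so the result is a ragged list that has to be padded before it can become a matrix.

```r
x <- c("a-b", "xyz", NA)

# One list element per input: full match followed by the capture groups
m <- regmatches(x, regexec("(.)-(.)", x))

# Number of pieces found per input; unmatched inputs yield zero-length vectors
n_found <- lengths(m)

# Pad the empty elements with NA so every row has the same width.
# The group count is hard-coded here (an assumption for illustration).
n_groups <- 2L
padded <- lapply(m, function(v) {
  if (length(v) == 0) rep(NA_character_, n_groups + 1L) else v
})

# Bind into a matrix and drop the full-match column, as the stringi call did
groups <- do.call(rbind, padded)[, -1, drop = FALSE]
```

Note that the padding step needs to know the number of capture groups up front, which is the piece the base functions do not report when nothing matches.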
Pulled it out and added a test suite. I based the tests on stringi's and adjusted them to match. stringi has a lot of tests around capture groups and blanks; tidyr won't need those, because extract checks earlier that the number of capture groups is consistent with the into value.
Co-authored-by: Hadley Wickham
R/extract.R
str_match_first <- function(x, regex) {
  if (length(x) == 0) {
    # Can't determine number of matches
I think this needs a little extra thought (I added the branch just as a placeholder):
stringi::stri_match_first(character(), regex = "(.)-(.)")
#> [,1] [,2] [,3]
tidyr:::str_match_first(character(), regex = "(.)-(.)")
#> NULL
Hmmmm, also problematic when there are no matches:
str(stringi::stri_match_first("", regex = "(.)-(.)"))
#> chr [1, 1:3] NA NA NA
str(tidyr:::str_match_first("", regex = "(.)-(.)"))
#> chr[1, 0 ]
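One way to keep the column count stable in both of these cases is to read the group positions from `regexpr(perl = TRUE)`, which attaches `capture.start` and `capture.length` matrices with one column per capture group. The sketch below is only an illustration under that assumption: `match_first_groups()` is a hypothetical name, and this is neither tidyr's implementation nor the code referred to elsewhere in this thread. It also switches to PCRE, whose syntax differs in places from both ICU (stringi) and the default base engine.

```r
# Hypothetical helper: capture groups of the first match, as a character
# matrix with one row per input and one column per group.
match_first_groups <- function(x, pattern) {
  x <- as.character(x)
  loc <- regexpr(pattern, x, perl = TRUE)
  start <- attr(loc, "capture.start")
  len <- attr(loc, "capture.length")

  if (is.null(start)) {
    # Pattern has no capture groups, so there are no columns to fill
    return(matrix(NA_character_, nrow = length(x), ncol = 0))
  }

  # Start from all-NA so non-matching and NA inputs stay NA
  out <- matrix(NA_character_, nrow = length(x), ncol = ncol(start))
  for (j in seq_len(ncol(start))) {
    s <- start[, j]
    hit <- !is.na(x) & !is.na(s) & s > 0L
    out[hit, j] <- substring(x[hit], s[hit], s[hit] + len[hit, j] - 1L)
  }
  out
}

res <- match_first_groups(c("a-b", "", NA), "(.)-(.)")
empty <- match_first_groups(character(), "(.)-(.)")
```

Because the number of columns is taken from the pattern rather than from the matches, the no-match and zero-length inputs should keep the same width as a successful match.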
Just remembered I have some code lying around for exactly this problem 😀
Thanks @rjpat!
Closes #936
I have tried to replace the stringi functions entirely with their base equivalents, so that everything works without modifying any existing tests.