Skip to content

Conversation

rjpat
Copy link
Contributor

@rjpat rjpat commented Jun 21, 2020

Closes #936

Have tried to completely sub out the stringi functions with the equivalent base ones and work without modifying any existing tests.

Copy link
Member

@hadley hadley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for starting on this! It's going to be a bit tricky to verify that we've preserved behaviour as much as possible, but once you've added a few tests, we can do a complete revdep check run to see if the behaviour of any CRAN packages changes.

R/extract.R Outdated
@@ -55,7 +55,9 @@ str_extract <- function(x, into, regex, convert = FALSE) {
is_character(into)
)

matches <- stringi::stri_match_first_regex(x, regex)[, -1, drop = FALSE]
matches <- lapply(regmatches(x, regexec(regex, x)),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please pull this out into a separate function? It's going to need unit tests for edge cases that stri_match_first_regex() might handle, but the base methods do not. I don't know exactly what those are, but I'd start with tests around no matches, NA matches, ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulled it out and added a test suite. Based it off of stringi and adjusted to match. stringi had a lot of tests that worked around capture groups and blanks however tidyr won't need it as it checks earlier in extract to ensure the number of capture groups is consistent with the into value

R/extract.R Outdated

str_match_first <- function(x, regex) {
if (length(x) == 0) {
# Can't determine number of matches
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this needs a little extra thought (I added the branch just as a placeholder:

stringi::stri_match_first(character(), regex = "(.)-(.)")
#>      [,1] [,2] [,3]
tidyr:::str_match_first(character(), regex = "(.)-(.)")
#> NULL

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmmm, also problematic when there are no matches:

str(stringi::stri_match_first("", regex = "(.)-(.)"))
#>  chr [1, 1:3] NA NA NA
str(tidyr:::str_match_first("", regex = "(.)-(.)"))
#>  chr[1, 0 ]

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just remembered I have some code lying around for exactly this problem 😀

@hadley hadley merged commit f0bcfd9 into tidyverse:master Jul 21, 2020
@hadley
Copy link
Member

hadley commented Jul 21, 2020

Thanks @rjpat!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Drop stringi dependency
2 participants