I was somewhat surprised by the `pivot_longer()` results below. It seems to keep the row values close together (i.e. the original row 1 values became the new rows 1 and 2), when what I really wanted was to keep the column values together (i.e. the original column 1 values become the new rows 1 and 2).
This seems closely related to `names_vary` in `pivot_wider()`, but I don't think that name is quite right here.
I think a good name might be `cols_vary`, which pairs nicely with the `cols` argument. The current behavior would be `cols_vary = "fastest"` (i.e. it iterates through all the columns before moving on to the next row), and what I'm asking for would be `cols_vary = "slowest"`.
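To make the two proposed values concrete, here is a small base R sketch (not tidyr code) that enumerates the (row, column) pairs for the 2-row `start`/`end` example below. It relies on `expand.grid()` varying its first argument fastest; the `"fastest"`/`"slowest"` labels are only the proposed `cols_vary` values.

```r
cols <- c("start", "end")
rows <- 1:2

# cols_vary = "fastest": the column changes fastest, so every column of
# row 1 comes before any column of row 2
fastest <- expand.grid(col = cols, row = rows, stringsAsFactors = FALSE)

# cols_vary = "slowest": the column changes slowest, so each column's
# values stay together
slowest <- expand.grid(row = rows, col = cols, stringsAsFactors = FALSE)

fastest
slowest
```

`fastest` follows the order the current `pivot_longer()` call produces in the first example, and `slowest` follows the `gather()` order.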
library(tidyr)
df <- tibble(
start = as.Date(c("2019-01-01", "2019-01-02")),
end = as.Date(c("2019-01-03", "2019-01-04"))
)
df
#> # A tibble: 2 × 2
#> start end
#> <date> <date>
#> 1 2019-01-01 2019-01-03
#> 2 2019-01-02 2019-01-04
pivot_longer(df, c(start, end))
#> # A tibble: 4 × 2
#> name value
#> <chr> <date>
#> 1 start 2019-01-01
#> 2 end 2019-01-03
#> 3 start 2019-01-02
#> 4 end 2019-01-04
# I sort of expected this here:
pivot_longer(df, c(start, end)) %>%
dplyr::arrange(desc(name))
#> # A tibble: 4 × 2
#> name value
#> <chr> <date>
#> 1 start 2019-01-01
#> 2 start 2019-01-02
#> 3 end 2019-01-03
#> 4 end 2019-01-04
# This is what we get from gather
gather(df, "name", "value", start, end)
#> # A tibble: 4 × 2
#> name value
#> <chr> <date>
#> 1 start 2019-01-01
#> 2 start 2019-01-02
#> 3 end 2019-01-03
#> 4 end 2019-01-04
df <- tibble(
id = c(1L, 1L, 2L, 2L),
start = as.Date(c("2019-01-01")) + 0:3,
end = as.Date(c("2019-01-03")) + 0:3
)
df
#> # A tibble: 4 × 3
#> id start end
#> <int> <date> <date>
#> 1 1 2019-01-01 2019-01-03
#> 2 1 2019-01-02 2019-01-04
#> 3 2 2019-01-03 2019-01-05
#> 4 2 2019-01-04 2019-01-06
# Not this:
pivot_longer(df, c(start, end))
#> # A tibble: 8 × 3
#> id name value
#> <int> <chr> <date>
#> 1 1 start 2019-01-01
#> 2 1 end 2019-01-03
#> 3 1 start 2019-01-02
#> 4 1 end 2019-01-04
#> 5 2 start 2019-01-03
#> 6 2 end 2019-01-05
#> 7 2 start 2019-01-04
#> 8 2 end 2019-01-06
# I actually don't want this either, because I think all of the `id == 1` rows
# should be kept together
gather(df, "name", "value", start, end)
#> # A tibble: 8 × 3
#> id name value
#> <int> <chr> <date>
#> 1 1 start 2019-01-01
#> 2 1 start 2019-01-02
#> 3 2 start 2019-01-03
#> 4 2 start 2019-01-04
#> 5 1 end 2019-01-03
#> 6 1 end 2019-01-04
#> 7 2 end 2019-01-05
#> 8 2 end 2019-01-06
# This is what I really wanted, and is what `cols_vary = "slowest"` would give
pivot_longer(df, c(start, end)) %>%
dplyr::arrange(id, desc(name))
#> # A tibble: 8 × 3
#> id name value
#> <int> <chr> <date>
#> 1 1 start 2019-01-01
#> 2 1 start 2019-01-02
#> 3 1 end 2019-01-03
#> 4 1 end 2019-01-04
#> 5 2 start 2019-01-03
#> 6 2 start 2019-01-04
#> 7 2 end 2019-01-05
#> 8 2 end 2019-01-06
Implementation-wise, I think we need to skip the interleaving here:
Lines 259 to 265 in 48ba23d
out <- vec_c(!!!val_cols, .ptype = val_type)
# Interleave into correct order
# TODO somehow `t(matrix(x))` is _faster_ than `matrix(x, byrow = TRUE)`
# if this gets fixed in R this should use `byrow = TRUE` again
n_vals <- nrow(data) * length(val_cols)
idx <- t(matrix(seq_len(n_vals), ncol = length(val_cols)))
vals[[value]] <- vec_slice(out, as.integer(idx))
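A base R sketch of what that interleaving step does, with `unlist()` and `[` standing in for `vec_c()` and `vec_slice()`, and hypothetical labelled values in place of real columns. Dropping the `t(matrix(...))` index keeps the stacked, column-by-column order, which is what a `"slowest"` option would need.

```r
# Hypothetical stand-ins for the value columns of a 2-row data frame;
# the labels make the resulting order easy to read
val_cols <- list(start = c("s1", "s2"), end = c("e1", "e2"))
n_rows <- 2L

# Stack the columns end to end (analogue of vec_c(!!!val_cols))
out <- unlist(val_cols, use.names = FALSE)

# Current behaviour: an index that reads the stacked vector row by row,
# interleaving the columns ("fastest")
n_vals <- n_rows * length(val_cols)
idx <- t(matrix(seq_len(n_vals), ncol = length(val_cols)))
interleaved <- out[as.integer(idx)]

# Proposed "slowest" behaviour: skip the interleave, keep the stacked order
stacked <- out

interleaved
stacked
```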
And then maybe use `vec_rep_each()` here instead of `vec_rep()` (that feels very similar to how `names_vary` works):
Line 274 in 48ba23d
vec_rep(keys, vec_size(data)),
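Here base R `rep()` stands in for vctrs: for a plain vector, `rep(x, times = n)` behaves like `vec_rep(x, n)` and `rep(x, each = n)` like `vec_rep_each(x, n)`. `keys` is a hypothetical character vector rather than the spec data frame tidyr uses, so this is only a sketch of the proposed "slowest" order, not tidyr's implementation.

```r
# Hypothetical stand-in for `keys`, the spec's name column(s)
keys <- c("start", "end")
n_rows <- 2L

# Analogue of vec_rep(keys, n_rows): the whole vector is repeated, so the
# names alternate and line up with interleaved values ("fastest")
names_fastest <- rep(keys, times = n_rows)

# Analogue of vec_rep_each(keys, n_rows): each key is repeated in place, so
# the names line up with stacked, non-interleaved values ("slowest")
names_slowest <- rep(keys, each = n_rows)

# Both changes together on the first example data frame
df <- data.frame(
  start = as.Date(c("2019-01-01", "2019-01-02")),
  end = as.Date(c("2019-01-03", "2019-01-04"))
)
# Stack the value columns without interleaving
value <- do.call(c, unname(as.list(df[keys])))
long <- data.frame(
  name = rep(keys, each = nrow(df)),
  value = value,
  stringsAsFactors = FALSE
)
long
```

On that data frame, `long` has the same row order as the `gather()` output: both `start` rows, then both `end` rows.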