`pivot_longer()` should allow for varying the columns slower than the rows #1312

I was somewhat surprised by the `pivot_longer()` results below. It seems to keep the _row_ values close together (i.e. the original row 1 values become the new rows 1 and 2), when I really wanted to keep the _column_ values together (i.e. the original column 1 values become the new rows 1 and 2).

This seems closely related to `names_vary` in `pivot_wider()`, but I don't think that name is quite right here.

I think a good name might actually be `cols_vary`, with `cols_vary = "fastest"` describing the current behavior (i.e. it iterates through all the columns before moving on to the next row) and `cols_vary = "slowest"` being the new option. This goes nicely with the `cols` argument.
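To make the two orderings concrete, here is a rough base R sketch of which index varies fastest in each case. `long_order()` is a hypothetical helper written only for illustration, not part of tidyr, and it reads `"slowest"` as "all rows of one column before moving on to the next column". It ignores id columns entirely.

``` r
# Hypothetical helper (not a tidyr function): returns the (row, name) pairs
# in the order they would appear in the long output.
long_order <- function(n_rows, cols, cols_vary = c("fastest", "slowest")) {
  cols_vary <- match.arg(cols_vary)
  n_cols <- length(cols)

  if (cols_vary == "fastest") {
    # Column names cycle within each original row
    data.frame(
      row = rep(seq_len(n_rows), each = n_cols),
      name = rep(cols, times = n_rows)
    )
  } else {
    # Every row of one column is emitted before the next column starts
    data.frame(
      row = rep(seq_len(n_rows), times = n_cols),
      name = rep(cols, each = n_rows)
    )
  }
}

fastest <- long_order(2, c("start", "end"), "fastest")
slowest <- long_order(2, c("start", "end"), "slowest")
```

Under this reading, `fastest` has the same row and name order as the current `pivot_longer()` output for a two-row, two-column input, and `slowest` has the order that `gather()` produces.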

``` r
library(tidyr)

df <- tibble(
  start = as.Date(c("2019-01-01", "2019-01-02")),
  end = as.Date(c("2019-01-03", "2019-01-04"))
)
df
#> # A tibble: 2 × 2
#>   start      end       
#>   <date>     <date>    
#> 1 2019-01-01 2019-01-03
#> 2 2019-01-02 2019-01-04

pivot_longer(df, c(start, end))
#> # A tibble: 4 × 2
#>   name  value     
#>   <chr> <date>    
#> 1 start 2019-01-01
#> 2 end   2019-01-03
#> 3 start 2019-01-02
#> 4 end   2019-01-04

# I sort of expected this here:
pivot_longer(df, c(start, end)) %>%
  dplyr::arrange(desc(name))
#> # A tibble: 4 × 2
#>   name  value     
#>   <chr> <date>    
#> 1 start 2019-01-01
#> 2 start 2019-01-02
#> 3 end   2019-01-03
#> 4 end   2019-01-04

# This is what we get from gather
gather(df, "name", "value", start, end)
#> # A tibble: 4 × 2
#>   name  value     
#>   <chr> <date>    
#> 1 start 2019-01-01
#> 2 start 2019-01-02
#> 3 end   2019-01-03
#> 4 end   2019-01-04


df <- tibble(
  id = c(1L, 1L, 2L, 2L),
  start = as.Date(c("2019-01-01")) + 0:3,
  end = as.Date(c("2019-01-03")) + 0:3
)
df
#> # A tibble: 4 × 3
#>      id start      end       
#>   <int> <date>     <date>    
#> 1     1 2019-01-01 2019-01-03
#> 2     1 2019-01-02 2019-01-04
#> 3     2 2019-01-03 2019-01-05
#> 4     2 2019-01-04 2019-01-06

# Not this:
pivot_longer(df, c(start, end))
#> # A tibble: 8 × 3
#>      id name  value     
#>   <int> <chr> <date>    
#> 1     1 start 2019-01-01
#> 2     1 end   2019-01-03
#> 3     1 start 2019-01-02
#> 4     1 end   2019-01-04
#> 5     2 start 2019-01-03
#> 6     2 end   2019-01-05
#> 7     2 start 2019-01-04
#> 8     2 end   2019-01-06

# I actually don't want this either, because I think all of `id == 1` should
# be kept together
gather(df, "name", "value", start, end)
#> # A tibble: 8 × 3
#>      id name  value     
#>   <int> <chr> <date>    
#> 1     1 start 2019-01-01
#> 2     1 start 2019-01-02
#> 3     2 start 2019-01-03
#> 4     2 start 2019-01-04
#> 5     1 end   2019-01-03
#> 6     1 end   2019-01-04
#> 7     2 end   2019-01-05
#> 8     2 end   2019-01-06

# This is what I really wanted, and is what `cols_vary = "slowest"` would give
pivot_longer(df, c(start, end)) %>%
  dplyr::arrange(id, desc(name))
#> # A tibble: 8 × 3
#>      id name  value     
#>   <int> <chr> <date>    
#> 1     1 start 2019-01-01
#> 2     1 start 2019-01-02
#> 3     1 end   2019-01-03
#> 4     1 end   2019-01-04
#> 5     2 start 2019-01-03
#> 6     2 start 2019-01-04
#> 7     2 end   2019-01-05
#> 8     2 end   2019-01-06
```
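One caveat about the `arrange(id, desc(name))` workaround: it only works here because `"start"` happens to sort after `"end"` alphabetically. A slightly more general sketch, assuming the desired order is the order in which the columns were selected, sorts on the position of `name` within that selection. The `cols` vector below is just a local variable introduced for this example.

``` r
library(tidyr)

df <- tibble(
  id = c(1L, 1L, 2L, 2L),
  start = as.Date("2019-01-01") + 0:3,
  end = as.Date("2019-01-03") + 0:3
)

# The pivoted columns, in the order they should appear in the output
cols <- c("start", "end")

long <- pivot_longer(df, c(start, end))

# Sort by id, then by where `name` sits in `cols`. `order()` leaves remaining
# ties in their original order, so rows within a column keep their input order.
out <- long[order(long$id, match(long$name, cols)), ]
```

This gives the same row order as the last result in the reprex, without relying on how the column names sort.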

Implementation-wise, I think we need to _not_ interleave here:
https://github.com/tidyverse/tidyr/blob/48ba23db97ebeb78101a7542564d573106a7f682/R/pivot-long.R#L259-L265

And then maybe use `vec_rep_each()` instead of `vec_rep()` here (this feels very similar to how `names_vary` works):
https://github.com/tidyverse/tidyr/blob/48ba23db97ebeb78101a7542564d573106a7f682/R/pivot-long.R#L274



The interleaving code at the first link, for reference:

	out <- vec_c(!!!val_cols, .ptype = val_type)
	# Interleave into correct order
	# TODO somehow `t(matrix(x))` is _faster_ than `matrix(x, byrow = TRUE)`
	# if this gets fixed in R this should use `byrow = TRUE` again
	n_vals <- nrow(data) * length(val_cols)
	idx <- t(matrix(seq_len(n_vals), ncol = length(val_cols)))
	vals[[value]] <- vec_slice(out, as.integer(idx))
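A toy sketch of what that change might look like, using vctrs directly. The variables below are stand-ins invented for this example rather than the actual tidyr internals, and the id columns are left out.

``` r
library(vctrs)

# Toy stand-ins: two value columns taken from a two-row data frame
val_cols <- list(start = c("s1", "s2"), end = c("e1", "e2"))
keys <- names(val_cols)
n_rows <- 2L
n_cols <- length(val_cols)

# All values of `start`, followed by all values of `end`
out <- vec_c(!!!unname(val_cols))

# Current behavior: interleave the values so each original row stays together,
# and repeat the whole set of keys once per row
idx <- t(matrix(seq_len(n_rows * n_cols), ncol = n_cols))
fastest <- data.frame(
  name = vec_rep(keys, n_rows),
  value = vec_slice(out, as.integer(idx))
)

# Proposed alternative: skip the interleave and repeat each key n_rows times
slowest <- data.frame(
  name = vec_rep_each(keys, n_rows),
  value = out
)
```

Here `as.integer(idx)` is `1, 3, 2, 4`, which is what pulls `s1, e1, s2, e2` out of `out`. Note that skipping the interleave reproduces the `gather()` ordering from the first example; keeping each `id` together, as in the last example, would presumably need an extra step on top of this.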
