pivot_longer() should allow for varying the columns slower than the rows #1312

@DavisVaughan

I was somewhat surprised by the pivot_longer() results below. It seems to keep each original row's values close together (i.e. the original row 1 values became the new rows 1 and 2), when I really wanted to keep each column's values together (i.e. the original column 1 values became the new rows 1 and 2).

This seems closely related to the names_vary argument of pivot_wider(), but I don't think that name is quite right here.

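I think a good name might be cols_vary. With cols_vary = "fastest" (the current behavior), it iterates through all the columns before moving on to the next row; with cols_vary = "slowest", it would iterate through all the rows of one column before moving on to the next column. This pairs nicely with the cols argument.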
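To make the "fastest"/"slowest" naming concrete, here is a small base R sketch (not tidyr code) listing which source cell, as a (row, column) pair, lands in each row of the long output for a hypothetical 2-row, 2-column input. It relies on expand.grid() varying its first argument fastest. Columns 1 and 2 stand in for `start` and `end`, so "fastest" corresponds to the current pivot_longer() order and "slowest" to the gather() order in the examples that follow.

```r
# Sketch only: which wide cell (row, col) ends up in each long output row.
n <- 2L  # rows in the wide data
k <- 2L  # value columns being pivoted (think `start`, `end`)

# expand.grid() varies its FIRST argument fastest.
# "fastest": columns vary fastest, so each original row stays together.
fastest <- expand.grid(col = seq_len(k), row = seq_len(n))[c("row", "col")]

# "slowest": columns vary slowest, so each original column stays together.
slowest <- expand.grid(row = seq_len(n), col = seq_len(k))

fastest
#>   row col
#> 1   1   1
#> 2   1   2
#> 3   2   1
#> 4   2   2
slowest
#>   row col
#> 1   1   1
#> 2   2   1
#> 3   1   2
#> 4   2   2
```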

library(tidyr)

df <- tibble(
  start = as.Date(c("2019-01-01", "2019-01-02")),
  end = as.Date(c("2019-01-03", "2019-01-04"))
)
df
#> # A tibble: 2 × 2
#>   start      end       
#>   <date>     <date>    
#> 1 2019-01-01 2019-01-03
#> 2 2019-01-02 2019-01-04

pivot_longer(df, c(start, end))
#> # A tibble: 4 × 2
#>   name  value     
#>   <chr> <date>    
#> 1 start 2019-01-01
#> 2 end   2019-01-03
#> 3 start 2019-01-02
#> 4 end   2019-01-04

# I expected something more like this:
pivot_longer(df, c(start, end)) %>%
  dplyr::arrange(desc(name))
#> # A tibble: 4 × 2
#>   name  value     
#>   <chr> <date>    
#> 1 start 2019-01-01
#> 2 start 2019-01-02
#> 3 end   2019-01-03
#> 4 end   2019-01-04

# This is what we get from gather
gather(df, "name", "value", start, end)
#> # A tibble: 4 × 2
#>   name  value     
#>   <chr> <date>    
#> 1 start 2019-01-01
#> 2 start 2019-01-02
#> 3 end   2019-01-03
#> 4 end   2019-01-04


df <- tibble(
  id = c(1L, 1L, 2L, 2L),
  start = as.Date(c("2019-01-01")) + 0:3,
  end = as.Date(c("2019-01-03")) + 0:3
)
df
#> # A tibble: 4 × 3
#>      id start      end       
#>   <int> <date>     <date>    
#> 1     1 2019-01-01 2019-01-03
#> 2     1 2019-01-02 2019-01-04
#> 3     2 2019-01-03 2019-01-05
#> 4     2 2019-01-04 2019-01-06

# Not this:
pivot_longer(df, c(start, end))
#> # A tibble: 8 × 3
#>      id name  value     
#>   <int> <chr> <date>    
#> 1     1 start 2019-01-01
#> 2     1 end   2019-01-03
#> 3     1 start 2019-01-02
#> 4     1 end   2019-01-04
#> 5     2 start 2019-01-03
#> 6     2 end   2019-01-05
#> 7     2 start 2019-01-04
#> 8     2 end   2019-01-06

# I actually don't want this either, because I think all of the `id == 1` rows
# should be kept together
gather(df, "name", "value", start, end)
#> # A tibble: 8 × 3
#>      id name  value     
#>   <int> <chr> <date>    
#> 1     1 start 2019-01-01
#> 2     1 start 2019-01-02
#> 3     2 start 2019-01-03
#> 4     2 start 2019-01-04
#> 5     1 end   2019-01-03
#> 6     1 end   2019-01-04
#> 7     2 end   2019-01-05
#> 8     2 end   2019-01-06

# This is what I really wanted. `cols_vary = "slowest"` alone would give the
# `gather()` ordering above; a stable sort on `id` afterwards would then give this
pivot_longer(df, c(start, end)) %>%
  dplyr::arrange(id, desc(name))
#> # A tibble: 8 × 3
#>      id name  value     
#>   <int> <chr> <date>    
#> 1     1 start 2019-01-01
#> 2     1 start 2019-01-02
#> 3     1 end   2019-01-03
#> 4     1 end   2019-01-04
#> 5     2 start 2019-01-03
#> 6     2 start 2019-01-04
#> 7     2 end   2019-01-05
#> 8     2 end   2019-01-06
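Here is a base R sketch (deliberately avoiding any particular tidyr version) of how that layout can be reached: stack the value columns one after another, which is the column-major "slowest" order that gather() produces, and then sort on `id`. Because base R's order() leaves ties in their original order, each `id` block keeps `start` before `end` and keeps the original row order within each name.

```r
# Sketch: "slowest" stacking followed by a stable sort on `id`.
df <- data.frame(
  id = c(1L, 1L, 2L, 2L),
  start = as.Date("2019-01-01") + 0:3,
  end = as.Date("2019-01-03") + 0:3
)
cols <- c("start", "end")
n <- nrow(df)

# Column-major stacking: all of `start`, then all of `end` (the gather() order)
long <- data.frame(
  id = rep(df$id, times = length(cols)),
  name = rep(cols, each = n),
  value = do.call(c, unname(as.list(df[cols]))),
  stringsAsFactors = FALSE
)

# order() is stable, so ties on `id` keep their stacked order
long <- long[order(long$id), ]
rownames(long) <- NULL
long
```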

Implementation-wise, I think we need to skip the interleaving step here:

In tidyr/R/pivot-long.R, lines 259 to 265 (at commit 48ba23d):

out <- vec_c(!!!val_cols, .ptype = val_type)
# Interleave into correct order
# TODO somehow `t(matrix(x))` is _faster_ than `matrix(x, byrow = TRUE)`
# if this gets fixed in R this should use `byrow = TRUE` again
n_vals <- nrow(data) * length(val_cols)
idx <- t(matrix(seq_len(n_vals), ncol = length(val_cols)))
vals[[value]] <- vec_slice(out, as.integer(idx))
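A sketch of that change, written as a hypothetical helper `value_index()` (not part of tidyr) that computes the positions used to slice `out`. Since `vec_c(!!!val_cols)` concatenates the value columns one after another, `out` should already be in "slowest" order, so that case needs only the identity index, while "fastest" keeps the existing transpose trick.

```r
# Hypothetical helper (not tidyr's API): positions used to slice `out`, the
# column-by-column concatenation of the value columns, into long order.
value_index <- function(n_rows, n_cols, cols_vary = c("fastest", "slowest")) {
  cols_vary <- match.arg(cols_vary)
  n_vals <- n_rows * n_cols
  if (cols_vary == "fastest") {
    # Current behaviour: interleave so each original row stays together
    as.integer(t(matrix(seq_len(n_vals), ncol = n_cols)))
  } else {
    # Proposed: `out` is already column-major, so keep it as-is
    seq_len(n_vals)
  }
}

# Two rows of `start` followed by two rows of `end`, as vec_c() would give
out <- c("start_1", "start_2", "end_1", "end_2")
out[value_index(2, 2, "fastest")]  # start_1, end_1, start_2, end_2
out[value_index(2, 2, "slowest")]  # start_1, start_2, end_1, end_2
```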

And then perhaps use vec_rep_each() instead of vec_rep() here (that feels very similar to how names_vary works):

vec_rep(keys, vec_size(data)),
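The key repetition can be sketched with base R's rep(), whose `times` and `each` arguments behave like vctrs::vec_rep() and vctrs::vec_rep_each() for atomic vectors. The `ids` lines go beyond the issue text and are an assumption: if the keys switch from vec_rep() to vec_rep_each(), the rows of the non-pivoted columns would presumably need the mirror-image switch so each `id` still lines up with its values.

```r
# Base R analogues: rep(x, times = n) ~ vctrs::vec_rep(x, n),
#                   rep(x, each = n)  ~ vctrs::vec_rep_each(x, n)
keys <- c("start", "end")  # one key per pivoted column
ids <- c(1L, 2L)           # one value per original row (non-pivoted column)
n_rows <- length(ids)
n_keys <- length(keys)

# "fastest" (current): keys cycle within each original row
keys_fastest <- rep(keys, times = n_rows)  # start, end, start, end
ids_fastest  <- rep(ids, each = n_keys)    # 1, 1, 2, 2

# "slowest" (proposed): each key is held while the rows cycle
keys_slowest <- rep(keys, each = n_rows)   # start, start, end, end
ids_slowest  <- rep(ids, times = n_keys)   # 1, 2, 1, 2
```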

Labels: feature (a feature request or enhancement), pivoting ♻️ (pivot rectangular data to different "shapes")
