Skip to content

Feature request: Can pivot_wider() have an option to preserve non-id_cols column by chopping into list columns? #990

@yutannihilation

Description

@yutannihilation

(I asked this on RStudio Community first, but couldn't get a nice answer. So I thought this is worth a feature request.)

At first, honestly I didn't feel it's useful that pivot_wider() creates list columns when there are any duplicated values. But, now I find it very useful, in that it allows us to delay the decision of how to handle the duplication. This request is basically about extending it to non-id_cols column.

Suppose I have this data:

library(dplyr, warn.conflicts = FALSE)
library(tidyr)

d <- data.frame(
  id = rep(1:5, each = 3),
  update = c(as.Date("2020-07-01") + 1:15),
  key = rep(c("a", "b", "c"), 5),
  val = 1:15
)

d
#>    id     update key val
#> 1   1 2020-07-02   a   1
#> 2   1 2020-07-03   b   2
#> 3   1 2020-07-04   c   3
#> 4   2 2020-07-05   a   4
#> 5   2 2020-07-06   b   5
#> 6   2 2020-07-07   c   6
#> 7   3 2020-07-08   a   7
#> 8   3 2020-07-09   b   8
#> 9   3 2020-07-10   c   9
#> 10  4 2020-07-11   a  10
#> 11  4 2020-07-12   b  11
#> 12  4 2020-07-13   c  12
#> 13  5 2020-07-14   a  13
#> 14  5 2020-07-15   b  14
#> 15  5 2020-07-16   c  15

and want to

  • pivot key to columns and
  • get the latest update of each id

at the same time. Currently, I can achieve this by summarising update with mutate() beforehand:

d %>% 
  # summarise `update` by using `mutate()`
  group_by(id) %>% 
  mutate(last_update = max(update)) %>% 
  ungroup() %>% 
  # choose `id` and `last_update` as key
  pivot_wider(c(id, last_update),
    names_from = key,
    values_from = val
  )
#> # A tibble: 5 x 5
#>      id last_update     a     b     c
#>   <int> <date>      <int> <int> <int>
#> 1     1 2020-07-04      1     2     3
#> 2     2 2020-07-07      4     5     6
#> 3     3 2020-07-10      7     8     9
#> 4     4 2020-07-13     10    11    12
#> 5     5 2020-07-16     13    14    15

This is fine. But, I wander if it's possible that pivot_wider() chops non-id_cols column into a list column instead of just dropping. If it does, I can take max()s of them. Here's a pseudo-code; update isn't specified either on id_cols, names_from, or values_from), so it gets dropped, but I want to preserve the column.

d %>% 
  pivot_wider(id,
    names_from = key,
    values_from = val
  ) %>%
  rowwise() %>%
  mutate(last_update = max(update), .keep = "unused")

Or, more simply, this:

d %>% 
  pivot_wider(id,
    names_from = key,
    values_from = val,
    values_fn = max
  )

Metadata

Metadata

Assignees

No one assigned

    Labels

    featurea feature request or enhancementpivoting ♻️pivot rectangular data to different "shapes"

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions