Closed
Labels
bug (an unexpected problem or unintended behavior), grids #️⃣ (expanding, nesting, crossing, ...), group 👨👨👦👦, missing values 💀
Description
When running `complete()` on a grouped tibble where some of the completed variables are also grouping variables, the resulting tibble is incorrect.
Here’s a reprex. First, I’ll create a simple tibble with three factors (`gr1`, `gr2`, `splitgroup`) and one numeric variable (`x`). The factor `splitgroup` is identical to `gr1`, so grouping on either variable should result in identical output. However, it doesn’t (I’ll remove `splitgroup` from the output just so that it doesn’t affect the ordering of the columns). The two outputs don’t even have the same number of rows:
library(tidyverse)
#> ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1 ✔ purrr 0.2.4
#> ✔ tibble 1.4.1 ✔ dplyr 0.7.4.9000
#> ✔ tidyr 0.7.2.9000 ✔ stringr 1.2.0
#> ✔ readr 1.1.1 ✔ forcats 0.2.0
#> ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
# Example data
d = tibble(
  gr1 = factor(c("A", "B", "B")),
  gr2 = factor(c(1, 2, 2)),
  x = c(10, 20, 30),
  splitgroup = gr1
)
# Complete on grouping variable
d %>%
  group_by(gr1) %>%
  complete(gr1, gr2) %>%
  select(-splitgroup) # 10 rows
#> # A tibble: 10 x 3
#> # Groups: gr1 [2]
#> gr1 gr2 x
#> <fctr> <fctr> <dbl>
#> 1 A 1 10.0
#> 2 A 2 NA
#> 3 B 1 NA
#> 4 B 2 20.0
#> 5 B 2 30.0
#> 6 A 1 10.0
#> 7 A 2 NA
#> 8 B 1 NA
#> 9 B 2 20.0
#> 10 B 2 30.0
# Completing on non-grouping but identical variable (should give same output)
d %>%
  group_by(splitgroup) %>%
  complete(gr1, gr2) %>%
  ungroup() %>%
  select(-splitgroup) # 9 rows
#> # A tibble: 9 x 3
#> gr1 gr2 x
#> <fctr> <fctr> <dbl>
#> 1 A 1 10.0
#> 2 A 2 NA
#> 3 B 1 NA
#> 4 B 2 NA
#> 5 A 1 NA
#> 6 A 2 NA
#> 7 B 1 NA
#> 8 B 2 20.0
#> 9 B 2 30.0
# Alternative method to find the *expected* results
# (which are identical to the results from the
# `group_by(splitgroup)` approach)
d %>%
  split(.$gr1) %>%
  map_df(~complete(., gr1, gr2)) %>%
  select(-splitgroup) # 9 rows
#> # A tibble: 9 x 3
#> gr1 gr2 x
#> <fctr> <fctr> <dbl>
#> 1 A 1 10.0
#> 2 A 2 NA
#> 3 B 1 NA
#> 4 B 2 NA
#> 5 A 1 NA
#> 6 A 2 NA
#> 7 B 1 NA
#> 8 B 2 20.0
#> 9 B 2 30.0