complete() on grouping variables gives wrong output

When running `complete()` on a grouped tibble and some of the variables completed on are *also* grouping variables, the resulting tibble is incorrect.

Here’s a reprex. First, I’ll create a simple tibble with three factors (`gr1`, `gr2`, `splitgroup`) and one numeric variable (`x`). The factor `splitgroup` is identical to `gr1`, so grouping on either variable *should* result in identical output. However, it doesn’t (I’ll remove `splitgroup` from the output just so that it doesn’t affect the ordering of the columns). The two outputs don’t even have the same number of *rows*:

```r
library(tidyverse)
#> ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1          ✔ purrr   0.2.4     
#> ✔ tibble  1.4.1          ✔ dplyr   0.7.4.9000
#> ✔ tidyr   0.7.2.9000     ✔ stringr 1.2.0     
#> ✔ readr   1.1.1          ✔ forcats 0.2.0
#> ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

# Example data
d = tibble(
  gr1 = factor(c("A", "B", "B")),
  gr2 = factor(c(1, 2, 2)),
  x = c(10, 20, 30),
  splitgroup = gr1
)

# Complete on grouping variable
d %>% 
  group_by(gr1) %>% 
  complete(gr1, gr2) %>% 
  select(-splitgroup) # 10 rows
#> # A tibble: 10 x 3
#> # Groups:   gr1 [2]
#>    gr1    gr2        x
#>    <fctr> <fctr> <dbl>
#>  1 A      1       10.0
#>  2 A      2       NA  
#>  3 B      1       NA  
#>  4 B      2       20.0
#>  5 B      2       30.0
#>  6 A      1       10.0
#>  7 A      2       NA  
#>  8 B      1       NA  
#>  9 B      2       20.0
#> 10 B      2       30.0

# Completing on non-grouping but identical variable (should give same output)
d %>% 
  group_by(splitgroup) %>% 
  complete(gr1, gr2) %>% 
  ungroup %>% select(-splitgroup) # 9 rows
#> # A tibble: 9 x 3
#>   gr1    gr2        x
#>   <fctr> <fctr> <dbl>
#> 1 A      1       10.0
#> 2 A      2       NA  
#> 3 B      1       NA  
#> 4 B      2       NA  
#> 5 A      1       NA  
#> 6 A      2       NA  
#> 7 B      1       NA  
#> 8 B      2       20.0
#> 9 B      2       30.0

# Alternative method to find the *expected* results
# (which are identical to the results from the
# `group_by(splitgroup)` approach)
d %>% 
  split(.$gr1) %>% 
  map_df(~complete(., gr1, gr2)) %>% 
  select(-splitgroup) # 9 rows
#> # A tibble: 9 x 3
#>   gr1    gr2        x
#>   <fctr> <fctr> <dbl>
#> 1 A      1       10.0
#> 2 A      2       NA  
#> 3 B      1       NA  
#> 4 B      2       NA  
#> 5 A      1       NA  
#> 6 A      2       NA  
#> 7 B      1       NA  
#> 8 B      2       20.0
#> 9 B      2       30.0
```

complete() on grouping variables gives wrong output #396
