complete() on grouping variables gives wrong output #396

@huftis

When running complete() on a grouped tibble and some of the variables completed on are also grouping variables, the resulting tibble is incorrect.

Here’s a reprex. First, I’ll create a simple tibble with three factors (gr1, gr2, splitgroup) and one numeric variable (x). The factor splitgroup is identical to gr1, so grouping on either variable should result in identical output. However, it doesn’t (I’ll remove splitgroup from the output just so that it doesn’t affect the ordering of the columns). The two outputs don’t even have the same number of rows:

library(tidyverse)
#> ── Attaching packages ───────────────────────────────────────────── tidyverse 1.2.1 ──
#> ✔ ggplot2 2.2.1          ✔ purrr   0.2.4     
#> ✔ tibble  1.4.1          ✔ dplyr   0.7.4.9000
#> ✔ tidyr   0.7.2.9000     ✔ stringr 1.2.0     
#> ✔ readr   1.1.1          ✔ forcats 0.2.0
#> ── Conflicts ──────────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()

# Example data
d = tibble(
  gr1 = factor(c("A", "B", "B")),
  gr2 = factor(c(1, 2, 2)),
  x = c(10, 20, 30),
  splitgroup = gr1
)

# Complete on grouping variable
d %>% 
  group_by(gr1) %>% 
  complete(gr1, gr2) %>% 
  select(-splitgroup) # 10 rows
#> # A tibble: 10 x 3
#> # Groups:   gr1 [2]
#>    gr1    gr2        x
#>    <fctr> <fctr> <dbl>
#>  1 A      1       10.0
#>  2 A      2       NA  
#>  3 B      1       NA  
#>  4 B      2       20.0
#>  5 B      2       30.0
#>  6 A      1       10.0
#>  7 A      2       NA  
#>  8 B      1       NA  
#>  9 B      2       20.0
#> 10 B      2       30.0

# Completing on non-grouping but identical variable (should give same output)
d %>% 
  group_by(splitgroup) %>% 
  complete(gr1, gr2) %>% 
  ungroup %>% select(-splitgroup) # 9 rows
#> # A tibble: 9 x 3
#>   gr1    gr2        x
#>   <fctr> <fctr> <dbl>
#> 1 A      1       10.0
#> 2 A      2       NA  
#> 3 B      1       NA  
#> 4 B      2       NA  
#> 5 A      1       NA  
#> 6 A      2       NA  
#> 7 B      1       NA  
#> 8 B      2       20.0
#> 9 B      2       30.0

# Alternative method to find the *expected* results
# (which are identical to the results from the
# `group_by(splitgroup)` approach)
d %>% 
  split(.$gr1) %>% 
  map_df(~complete(., gr1, gr2)) %>% 
  select(-splitgroup) # 9 rows
#> # A tibble: 9 x 3
#>   gr1    gr2        x
#>   <fctr> <fctr> <dbl>
#> 1 A      1       10.0
#> 2 A      2       NA  
#> 3 B      1       NA  
#> 4 B      2       NA  
#> 5 A      1       NA  
#> 6 A      2       NA  
#> 7 B      1       NA  
#> 8 B      2       20.0
#> 9 B      2       30.0
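
As a cross-check that doesn't go through tidyr at all, the expected per-group completion can also be sketched in base R. This is only an illustration under one assumption: that completing within a group means crossing all factor levels of gr1 and gr2 (as the outputs above suggest) and left-joining that group's rows onto the grid. The helper `complete_one` is a hypothetical name, not part of any package, and the row order will differ from the tidyr output, so only the row and NA counts are compared.

```r
# Same example data as above, as a plain data frame (splitgroup omitted)
d <- data.frame(
  gr1 = factor(c("A", "B", "B")),
  gr2 = factor(c(1, 2, 2)),
  x = c(10, 20, 30)
)

# Hypothetical helper: complete a single group's rows on gr1 x gr2,
# using all factor levels (assumption: this mirrors per-group complete())
complete_one <- function(piece) {
  grid <- expand.grid(gr1 = levels(piece$gr1), gr2 = levels(piece$gr2))
  # Left join on the shared columns; unmatched combinations get x = NA
  merge(grid, piece, all.x = TRUE)
}

# Split by gr1, complete each piece, and stack the results
expected <- do.call(rbind, lapply(split(d, d$gr1), complete_one))

nrow(expected)           # 9 rows, matching the two 9-row outputs
sum(is.na(expected$x))   # 6 missing values
```

Group A contributes 4 rows and group B contributes 5 (the B/2 combination matches two original rows), which agrees with the 9-row results and not with the 10-row output from `group_by(gr1)`.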
