Collapse duplicate key groups, concatenating selected columns and replacing other divergences with NA
collapse_by_keys() enforces uniqueness of key combinations by collapsing (merging)
rows that share the same keys (supplied via ...). The output contains exactly one
row per unique key combination.
For each non-key column within a key group:
- If the column has a single distinct value within the group (optionally ignoring NA, according to na_rm), that value is kept.
- If the column has multiple distinct values:
  - If the column is selected by .concat, the distinct values are concatenated into a single character string using sep (after optional NA removal).
  - Otherwise, the column is set to NA for that collapsed row (a typed NA via vctrs::vec_cast()).
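The per-column rule above can be sketched for a single key group in base R. collapse_column() is a hypothetical helper written for illustration only; it is not part of the package. It uses base subsetting (x[NA_integer_]) instead of vctrs::vec_cast() to produce a typed NA, and it assumes that a group whose values are all NA (with na_rm = TRUE) collapses to NA, a case the rules above do not spell out.

```r
# Hypothetical helper (illustration only, not part of the package):
# collapse the values `x` of one non-key column within a single key group.
collapse_column <- function(x, concat = FALSE, sep = " ; ", na_rm = TRUE) {
  # Optionally drop NA before judging distinctness (mirrors na_rm)
  vals <- if (na_rm) x[!is.na(x)] else x
  distinct <- unique(vals)

  # One distinct value: keep it
  if (length(distinct) == 1L) return(distinct)

  # No non-NA values at all (na_rm = TRUE): typed NA (an assumption)
  if (length(distinct) == 0L) return(x[NA_integer_])

  # Several distinct values, column selected by .concat: join them as text
  if (concat) return(paste(distinct, collapse = sep))

  # Several distinct values, not concatenated: NA of the column's own type,
  # via base subsetting instead of vctrs::vec_cast()
  x[NA_integer_]
}
```

For instance, collapse_column(c("first", NA, "repeat"), concat = TRUE, sep = " | ") returns "first | repeat", while collapse_column(c(10, 10, 12)) returns NA_real_, matching the commentaire and value columns of the first group in the Examples.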
When warn = TRUE, the function emits a warning listing the non-key columns (excluding .concat)
that were divergent in at least one group and were therefore set to NA in the affected groups.
Usage
collapse_by_keys(
.data,
...,
.keys = NULL,
.concat = NULL,
sep = " ; ",
na_rm = TRUE,
warn = TRUE
)
Arguments
- .data
A data frame or tibble.
- ...
Key columns defining groups that must be unique in the output. Uses tidyeval.
- .keys
Optional alternative to ... for programmatic key selection. Accepts either (i) a character vector of column names or (ii) a tidyselect expression evaluated in .data. If supplied, .keys takes precedence over the columns passed in the dots (...).
- .concat
Optional tidyselect specification of non-key columns whose divergent values should be concatenated. If
NULL (default), no columns are concatenated.
- sep
String used to separate concatenated values.
- na_rm
Logical. If
TRUE, ignore NA values when assessing distinctness within a group and when concatenating. If FALSE, NA participates in distinctness (e.g., c("A", NA) is divergent).
- warn
Logical. If
TRUE, emit a warning listing columns (excluding keys and .concat) that were divergent in at least one group and were replaced by NA.
Details
Type behavior
Columns selected in .concat return a character result when concatenation is needed, which may change the type of those columns in the output.
For non-.concat columns, divergent groups are replaced by a typed NA using vctrs::vec_cast(NA, x), which preserves the column's type whenever possible.
The output is an ungrouped tibble (.groups = "drop").
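To make the whole procedure concrete, here is a minimal base-R sketch of the documented algorithm. collapse_sketch() is hypothetical and is not the package's implementation: it takes keys and .concat-style columns as character vectors (like .keys), identifies groups by pasting key values together, returns a plain data frame rather than a tibble, and uses x[NA_integer_] instead of vctrs::vec_cast() for the typed NA.

```r
# Hypothetical base-R sketch of collapse_by_keys(); not the package's code.
# `keys` and `concat` are character vectors of column names.
collapse_sketch <- function(data, keys, concat = character(), sep = " ; ",
                            na_rm = TRUE, warn = TRUE) {
  others <- setdiff(names(data), keys)

  # Integer group id per row, numbered in order of first appearance
  key_str <- do.call(paste, c(unname(as.list(data[keys])), sep = "\r"))
  gid <- match(key_str, unique(key_str))

  # One output row per key combination, taken from each group's first row
  out <- data[!duplicated(gid), keys, drop = FALSE]
  divergent <- character()

  for (col in others) {
    pieces <- lapply(split(data[[col]], gid), function(v) {
      vals <- if (na_rm) v[!is.na(v)] else v
      d <- unique(vals)
      if (length(d) == 1L) return(list(value = d, div = FALSE))
      # All values NA (with na_rm = TRUE): keep a typed NA (assumption)
      if (length(d) == 0L) return(list(value = v[NA_integer_], div = FALSE))
      if (col %in% concat) {
        return(list(value = paste(d, collapse = sep), div = FALSE))
      }
      # Divergent, not concatenated: typed NA via base subsetting
      list(value = v[NA_integer_], div = TRUE)
    })
    out[[col]] <- do.call(c, unname(lapply(pieces, `[[`, "value")))
    if (any(vapply(pieces, `[[`, logical(1), "div"))) {
      divergent <- c(divergent, col)
    }
  }

  if (warn && length(divergent) > 0L) {
    warning("Divergent columns replaced by NA: ",
            paste(divergent, collapse = ", "), call. = FALSE)
  }
  rownames(out) <- NULL
  out
}
```

Applied to the df from the Examples (as a plain data frame) with the same keys, concat = c("commentaire", "source_info") and sep = " | ", this sketch yields the same two rows and the same warning, "Divergent columns replaced by NA: result, value".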
Examples
library(dplyr)
library(tibble)
df <- tibble(
exam.num_collec = c(1, 1, 1, 2, 2),
mat.matrice = c("SER", "SER", "SER", "PLAS", "PLAS"),
spe.denomination = c("E. coli", "E. coli", "E. coli", "S. aureus", "S. aureus"),
result = c("POS", "NEG", "POS", "NEG", "NEG"),
value = c(10, 10, 12, 5, 5),
commentaire = c("first", NA, "repeat", "ok", "ok"),
source_info = c("labA", "labA", "labB", "labC", NA),
flag = c(TRUE, TRUE, TRUE, FALSE, FALSE)
)
# Collapse by keys:
# - concatenate selected text columns if needed
# - replace other divergent columns by NA
out <- df %>%
collapse_by_keys(
exam.num_collec, mat.matrice, spe.denomination,
.concat = c(commentaire, source_info),
sep = " | ",
na_rm = TRUE,
warn = TRUE
)
#> Warning: Divergent columns replaced by NA: result, value
out
#> # A tibble: 2 × 8
#> exam.num_collec mat.matrice spe.denomination result value commentaire
#> <dbl> <chr> <chr> <chr> <dbl> <chr>
#> 1 1 SER E. coli NA NA first | repeat
#> 2 2 PLAS S. aureus NEG 5 ok
#> # ℹ 2 more variables: source_info <chr>, flag <lgl>