Drop duplicated rows

Description

Drop duplicated rows from a LazyFrame, optionally comparing only a subset of columns.

Usage

<LazyFrame>$unique(subset = NULL, ..., keep = "any", maintain_order = FALSE)

Arguments

subset A character vector with the names of the column(s) used to identify duplicates. If NULL (default), all columns are used.
... Not used.
keep Which of the duplicate rows to keep:
  • "any" (default): No guarantee is given about which of the duplicate rows is kept. This allows more optimizations.
  • "first": Keep the first occurrence of each duplicated row.
  • "last": Keep the last occurrence of each duplicated row.
  • "none": Drop every row that has a duplicate, so only rows that occur exactly once remain.
maintain_order Keep the same order as the original data. Setting this to TRUE makes the operation more expensive to compute and prevents the query from running on the streaming engine.

Value

LazyFrame

Examples

library("polars")

df = pl$LazyFrame(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
df$collect()$height
#> [1] 100
df$unique()$collect()$height
#> [1] 68
df$unique(subset = "x")$collect()$height
#> [1] 10
df$unique(keep = "last")
#> polars LazyFrame
#>  $explain(): Show the optimized query plan.
#> 
#> Naive plan:
#> UNIQUE[maintain_order: false, keep_strategy: Last] BY None
#>   DF ["x", "y"]; PROJECT */2 COLUMNS; SELECTION: None
# keep only the rows that have no duplicates
df$unique(keep = "none")
#> polars LazyFrame
#>  $explain(): Show the optimized query plan.
#> 
#> Naive plan:
#> UNIQUE[maintain_order: false, keep_strategy: None] BY None
#>   DF ["x", "y"]; PROJECT */2 COLUMNS; SELECTION: None
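Because the data above is sampled at random, it does not show which rows each keep strategy retains. The following is a minimal sketch on a small fixed table (the column values and the helper name kept_y are made up for illustration). It assumes that as.data.frame() is available for the collected DataFrame, and it sets maintain_order = TRUE so that the result order is predictable.

```r
library("polars")

# Small fixed table: x = 1 and x = 3 are duplicated, x = 2 occurs once.
lf = pl$LazyFrame(
  x = c(1, 1, 2, 3, 3),
  y = c("a", "b", "c", "d", "e")
)

# Hypothetical helper (not part of polars): deduplicate on "x" with the
# given strategy and return the surviving values of "y".
kept_y = function(keep) {
  out = lf$unique(subset = "x", keep = keep, maintain_order = TRUE)$collect()
  as.data.frame(out)$y
}

kept_y("first")
#> [1] "a" "c" "d"
kept_y("last")
#> [1] "b" "c" "e"
kept_y("none")
#> [1] "c"
```

With keep = "any" the result would also have three rows, but which of "a"/"b" and "d"/"e" survives is not guaranteed.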