Drop duplicated rows

Description

Drop duplicated rows from this LazyFrame.

Usage

<LazyFrame>$unique(subset = NULL, ..., keep = "any", maintain_order = FALSE)

Arguments

`subset`	A character vector with the names of the column(s) to use to identify duplicates. If `NULL` (default), use all columns.
`...`	Not used.
`keep`	Which of the duplicate rows to keep: `"any"` (default): No guarantee about which row is kept, which allows more optimizations. `"first"`: Keep the first occurrence of each duplicated row. `"last"`: Keep the last occurrence of each duplicated row. `"none"`: Drop every row that has a duplicate.
`maintain_order`	Keep the same row order as the original data. Setting this to `TRUE` is more expensive to compute and prevents the query from running on the streaming engine.

Value

LazyFrame

Examples

library("polars")

# x and y are random draws, so the counts below vary between runs
df = pl$LazyFrame(
  x = sample(10, 100, replace = TRUE),
  y = sample(10, 100, replace = TRUE)
)
df$collect()$height

#> [1] 100

df$unique()$collect()$height

#> [1] 68

df$unique(subset = "x")$collect()$height

#> [1] 10

df$unique(keep = "last")

#> polars LazyFrame
#>  $explain(): Show the optimized query plan.
#> 
#> Naive plan:
#> UNIQUE[maintain_order: false, keep_strategy: Last] BY None
#>   DF ["x", "y"]; PROJECT */2 COLUMNS; SELECTION: None

# keep only rows that have no duplicates
df$unique(keep = "none")

#> polars LazyFrame
#>  $explain(): Show the optimized query plan.
#> 
#> Naive plan:
#> UNIQUE[maintain_order: false, keep_strategy: None] BY None
#>   DF ["x", "y"]; PROJECT */2 COLUMNS; SELECTION: None
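
The examples above use random data and print only the query plan for `keep = "last"` and `keep = "none"`. As a minimal sketch of how the `keep` strategies differ once collected, the following uses a small fixed LazyFrame. The column names `id` and `val` and the helper `collect_val()` are hypothetical, introduced only for illustration, and the sketch assumes `as.data.frame()` is available for a collected polars DataFrame.

```r
library("polars")

# Small, deterministic input: id 1 and id 3 each appear twice, id 2 once.
lf = pl$LazyFrame(
  id = c(1L, 1L, 2L, 3L, 3L),
  val = c("a", "b", "c", "d", "e")
)

# Hypothetical helper: deduplicate on `id` with the given strategy,
# collect, and return the surviving `val` entries as a sorted vector.
collect_val = function(keep) {
  out = lf$unique(subset = "id", keep = keep, maintain_order = TRUE)$collect()
  sort(as.data.frame(out)$val)
}

kept_first = collect_val("first") # first row of each id group
kept_last = collect_val("last")   # last row of each id group
kept_none = collect_val("none")   # only ids that occur exactly once

print(kept_first)
print(kept_last)
print(kept_none)
```

With `keep = "any"`, either row of a duplicated `id` may be returned, so this sketch does not rely on its result.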