Drop duplicate rows

Description

Drop duplicate rows

Usage

<LazyFrame>$unique(
  ...,
  keep = c("any", "none", "first", "last"),
  maintain_order = FALSE,
  subset = deprecated()
)

Arguments

... <dynamic-dots> Column names or selectors to consider when identifying duplicates. If empty (default), all columns are used (same as passing the selector cs$all()).
keep Which of the duplicate rows to keep. Must be one of:
  • “any”: gives no guarantee about which of the duplicate rows is kept. This allows more optimizations.
  • “none”: drop every row that has a duplicate.
  • “first”: keep the first row of each group of duplicates.
  • “last”: keep the last row of each group of duplicates.
maintain_order Keep the same order as the original data. This is more expensive to compute. Setting this to TRUE prevents the query from running on the streaming engine.
subset [Deprecated] Replaced by ... in 1.1.0.

Value

A polars LazyFrame

Examples

library("polars")

lf <- pl$LazyFrame(
  foo = c(1, 2, 3, 1),
  bar = c("a", "a", "a", "a"),
  ham = c("b", "b", "b", "b"),
)
lf$unique(maintain_order = TRUE)$collect()
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ a   ┆ b   │
#> │ 2.0 ┆ a   ┆ b   │
#> │ 3.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘
lf$unique(c("bar", "ham"), maintain_order = TRUE)$collect()
#> shape: (1, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 1.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘
lf$unique(keep = "last", maintain_order = TRUE)$collect()
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ foo ┆ bar ┆ ham │
#> │ --- ┆ --- ┆ --- │
#> │ f64 ┆ str ┆ str │
#> ╞═════╪═════╪═════╡
#> │ 2.0 ┆ a   ┆ b   │
#> │ 3.0 ┆ a   ┆ b   │
#> │ 1.0 ┆ a   ┆ b   │
#> └─────┴─────┴─────┘
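
As a further sketch, the three deterministic keep options can be compared on a single column. The frame below and its column names (id, val) are made up for illustration and are not part of the original examples. maintain_order = TRUE is set so that the row order of each result is predictable.

```r
library("polars")

# Hypothetical data: `id` contains duplicates, `val` tells the rows apart.
lf2 <- pl$LazyFrame(
  id = c(1, 1, 2, 3, 3),
  val = c("a", "b", "c", "d", "e")
)

# Only `id` is considered when looking for duplicates.
# "first" keeps the first row of each group of duplicates.
first_kept <- lf2$unique("id", keep = "first", maintain_order = TRUE)$collect()

# "last" keeps the last row of each group of duplicates.
last_kept <- lf2$unique("id", keep = "last", maintain_order = TRUE)$collect()

# "none" drops every row whose `id` occurs more than once.
none_kept <- lf2$unique("id", keep = "none", maintain_order = TRUE)$collect()
```

With this data, first_kept should hold the val values a, c, d, last_kept should hold b, c, e, and none_kept should hold only c, because id = 2 is the only value that occurs exactly once.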