polars.LazyFrame.unique

LazyFrame.unique(maintain_order: bool = True, subset: str | list[str] | None = None, keep: UniqueKeepStrategy = 'first') → LDF

Drop duplicate rows from this LazyFrame.

Note that this fails if the LazyFrame, or the subset when one is given, contains a column of type List.

Parameters:
maintain_order

Keep the same order as the original LazyFrame. This requires more work to compute.

subset

Column name or list of column names used to compare rows. If None (the default), all columns are compared.

keep : {‘first’, ‘last’}

Which of the duplicate rows to keep: the first or the last occurrence. Defaults to ‘first’.

Returns:
LazyFrame with unique rows

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 1],
...         "bar": ["a", "a", "a", "a"],
...         "ham": ["b", "b", "b", "b"],
...     }
... ).lazy()
>>> df.unique().collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ a   ┆ b   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ a   ┆ b   │
└─────┴─────┴─────┘
>>> df.unique(subset=["bar", "ham"]).collect()
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘
>>> df.unique(keep="last").collect()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ a   ┆ b   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ a   ┆ b   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘