polars.DataFrame.unique

DataFrame.unique(*, maintain_order: bool = True, subset: str | Sequence[str] | None = None, keep: UniqueKeepStrategy = 'any') → Self

Drop duplicate rows from this dataframe.

Parameters:
maintain_order

Keep the same order as the original DataFrame. This is more expensive to compute. Setting this to True prevents the operation from running on the streaming engine.

subset

Column name(s) to consider when identifying duplicates. If set to None (default), use all columns.

keep : {‘first’, ‘last’, ‘any’, ‘none’}

Which of the duplicate rows to keep.

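  • ‘any’: No guarantee is given about which of the duplicate rows is kept.

    This allows more optimizations.

  • ‘none’: Drop every row that has a duplicate; keep only rows that occur exactly once.

  • ‘first’: Keep the first occurrence of each duplicated row.

  • ‘last’: Keep the last occurrence of each duplicated row.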

Returns:
DataFrame with unique rows.

Warning

This method will fail if there is a column of type List in the DataFrame or subset.

Examples

>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 1],
...         "bar": ["a", "a", "a", "a"],
...         "ham": ["b", "b", "b", "b"],
...     }
... )
>>> df.unique()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
│ 2   ┆ a   ┆ b   │
│ 3   ┆ a   ┆ b   │
└─────┴─────┴─────┘
>>> df.unique(subset=["bar", "ham"])
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘
>>> df.unique(keep="last")
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ a   ┆ b   │
│ 3   ┆ a   ┆ b   │
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘