polars.DataFrame.unique
- DataFrame.unique(*, maintain_order: bool = True, subset: str | Sequence[str] | None = None, keep: UniqueKeepStrategy = 'any') → Self
Drop duplicate rows from this dataframe.
- Parameters:
- maintain_order
Keep the same order as the original DataFrame. This is more expensive to compute. Setting this to True prevents the operation from running on the streaming engine.
- subset
Column name(s) to consider when identifying duplicates. If set to None (default), use all columns.
- keep : {‘first’, ‘last’, ‘any’, ‘none’}
Which of the duplicate rows to keep.
- ‘any’: No guarantee of which of the duplicate rows is kept. This allows more optimizations.
- ‘none’: Don’t keep any row that has duplicates.
- ‘first’: Keep the first row of each group of duplicates.
- ‘last’: Keep the last row of each group of duplicates.
- Returns:
- DataFrame with unique rows.
Warning
This method will fail if there is a column of type List in the DataFrame or subset.
Examples
>>> import polars as pl
>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 1],
...         "bar": ["a", "a", "a", "a"],
...         "ham": ["b", "b", "b", "b"],
...     }
... )
>>> df.unique()
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
│ 2   ┆ a   ┆ b   │
│ 3   ┆ a   ┆ b   │
└─────┴─────┴─────┘
>>> df.unique(subset=["bar", "ham"])
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘
>>> df.unique(keep="last")
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str │
╞═════╪═════╪═════╡
│ 2   ┆ a   ┆ b   │
│ 3   ┆ a   ┆ b   │
│ 1   ┆ a   ┆ b   │
└─────┴─────┴─────┘