polars.approx_n_unique

polars.approx_n_unique(*columns: str) → Expr

Approximate count of unique values.

This function is syntactic sugar for pl.col(columns).approx_n_unique(). It uses the HyperLogLog++ algorithm for cardinality estimation, so the result is an estimate rather than an exact count.

Parameters:
columns
    One or more column names.

Examples

>>> df = pl.DataFrame(
...     {
...         "a": [1, 8, 1],
...         "b": [4, 5, 2],
...         "c": ["foo", "bar", "foo"],
...     }
... )
>>> df.select(pl.approx_n_unique("a"))
shape: (1, 1)
┌─────┐
│ a   │
│ --- │
│ u32 │
╞═════╡
│ 2   │
└─────┘
>>> df.select(pl.approx_n_unique("b", "c"))
shape: (1, 2)
┌─────┬─────┐
│ b   ┆ c   │
│ --- ┆ --- │
│ u32 ┆ u32 │
╞═════╪═════╡
│ 3   ┆ 2   │
└─────┴─────┘