polars.Expr.over#

Expr.over(expr: IntoExpr | Iterable[IntoExpr], *more_exprs: IntoExpr) Self[source]#

Compute expressions over the given groups.

This expression is similar to performing a groupby aggregation and joining the result back into the original dataframe.

The outcome is similar to how window functions work in PostgreSQL.

Parameters:
expr

Column(s) to group by. Accepts expression input. Strings are parsed as column names.

*more_exprs

Additional columns to group by, specified as positional arguments.

Examples

Pass the name of a column to compute the expression over that column.

>>> df = pl.DataFrame(
...     {
...         "a": ["a", "a", "b", "b", "b"],
...         "b": [1, 2, 3, 5, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... )
>>> df.with_columns(pl.col("c").max().over("a").suffix("_max"))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 5     │
│ b   ┆ 3   ┆ 3   ┆ 3     │
│ b   ┆ 5   ┆ 2   ┆ 3     │
│ b   ┆ 3   ┆ 1   ┆ 3     │
└─────┴─────┴─────┴───────┘

Expression input is supported.

>>> df.with_columns(pl.col("c").max().over(pl.col("b") // 2).suffix("_max"))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_max │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 4     │
│ b   ┆ 5   ┆ 2   ┆ 2     │
│ b   ┆ 3   ┆ 1   ┆ 4     │
└─────┴─────┴─────┴───────┘

Group by multiple columns by passing a list of column names or expressions.

>>> df.with_columns(pl.col("c").min().over(["a", "b"]).suffix("_min"))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_min │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 1     │
│ b   ┆ 5   ┆ 2   ┆ 2     │
│ b   ┆ 3   ┆ 1   ┆ 1     │
└─────┴─────┴─────┴───────┘

Or use positional arguments to group by multiple columns in the same way.

>>> df.with_columns(pl.col("c").min().over("a", pl.col("b") % 2).suffix("_min"))
shape: (5, 4)
┌─────┬─────┬─────┬───────┐
│ a   ┆ b   ┆ c   ┆ c_min │
│ --- ┆ --- ┆ --- ┆ ---   │
│ str ┆ i64 ┆ i64 ┆ i64   │
╞═════╪═════╪═════╪═══════╡
│ a   ┆ 1   ┆ 5   ┆ 5     │
│ a   ┆ 2   ┆ 4   ┆ 4     │
│ b   ┆ 3   ┆ 3   ┆ 1     │
│ b   ┆ 5   ┆ 2   ┆ 1     │
│ b   ┆ 3   ┆ 1   ┆ 1     │
└─────┴─────┴─────┴───────┘