polars.lazyframe.group_by.LazyGroupBy.agg

LazyGroupBy.agg(
*aggs: IntoExpr | Iterable[IntoExpr],
**named_aggs: IntoExpr,
) → LazyFrame

Compute aggregations for each group of a group by operation.

Parameters:
*aggs

Aggregations to compute for each group of the group by operation, specified as positional arguments. Accepts expression input. Strings are parsed as column names.

**named_aggs

Additional aggregations, specified as keyword arguments. The resulting columns will be renamed to the keyword used.

Examples

Pass column expressions without an aggregation function to collect each group's values into a list.

>>> ldf = pl.DataFrame(
...     {
...         "a": ["a", "b", "a", "b", "c"],
...         "b": [1, 2, 1, 3, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... ).lazy()
>>> ldf.group_by("a").agg(
...     [pl.col("b"), pl.col("c")]
... ).collect()  
shape: (3, 3)
┌─────┬───────────┬───────────┐
│ a   ┆ b         ┆ c         │
│ --- ┆ ---       ┆ ---       │
│ str ┆ list[i64] ┆ list[i64] │
╞═════╪═══════════╪═══════════╡
│ a   ┆ [1, 1]    ┆ [5, 3]    │
│ b   ┆ [2, 3]    ┆ [4, 2]    │
│ c   ┆ [3]       ┆ [1]       │
└─────┴───────────┴───────────┘

Compute the sum of a column for each group.

>>> ldf.group_by("a").agg(
...     pl.col("b").sum()
... ).collect()  
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 2   │
│ b   ┆ 5   │
│ c   ┆ 3   │
└─────┴─────┘

Compute multiple aggregates at once by passing a list of expressions.

>>> ldf.group_by("a").agg(
...     [pl.sum("b"), pl.mean("c")]
... ).collect()  
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ f64 │
╞═════╪═════╪═════╡
│ c   ┆ 3   ┆ 1.0 │
│ a   ┆ 2   ┆ 4.0 │
│ b   ┆ 5   ┆ 3.0 │
└─────┴─────┴─────┘

Or use positional arguments to compute multiple aggregations in the same way.

>>> ldf.group_by("a").agg(
...     pl.sum("b").name.suffix("_sum"),
...     (pl.col("c") ** 2).mean().name.suffix("_mean_squared"),
... ).collect()  
shape: (3, 3)
┌─────┬───────┬────────────────┐
│ a   ┆ b_sum ┆ c_mean_squared │
│ --- ┆ ---   ┆ ---            │
│ str ┆ i64   ┆ f64            │
╞═════╪═══════╪════════════════╡
│ a   ┆ 2     ┆ 17.0           │
│ c   ┆ 3     ┆ 1.0            │
│ b   ┆ 5     ┆ 10.0           │
└─────┴───────┴────────────────┘

Use keyword arguments to name the resulting columns directly.

>>> ldf.group_by("a").agg(
...     b_sum=pl.sum("b"),
...     c_mean_squared=(pl.col("c") ** 2).mean(),
... ).collect()  
shape: (3, 3)
┌─────┬───────┬────────────────┐
│ a   ┆ b_sum ┆ c_mean_squared │
│ --- ┆ ---   ┆ ---            │
│ str ┆ i64   ┆ f64            │
╞═════╪═══════╪════════════════╡
│ a   ┆ 2     ┆ 17.0           │
│ c   ┆ 3     ┆ 1.0            │
│ b   ┆ 5     ┆ 10.0           │
└─────┴───────┴────────────────┘