polars.DataFrame.groupby

DataFrame.groupby(by: Union[str, Expr, Sequence[str | Expr]], maintain_order: bool = False) -> GroupBy[DF]

Start a groupby operation.

Parameters:
by

Column(s) to group by.

maintain_order

Ensure that the order of the groups is consistent with the input data. This is more expensive than a default groupby. Note that this only works with expression-based aggregations.

Examples

Below we group by column “a”, and we sum column “b”.

>>> df = pl.DataFrame(
...     {
...         "a": ["a", "b", "a", "b", "b", "c"],
...         "b": [1, 2, 3, 4, 5, 6],
...         "c": [6, 5, 4, 3, 2, 1],
...     }
... )
>>> df.groupby("a").agg(pl.col("b").sum()).sort(by="a")
shape: (3, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════╪═════╡
│ a   ┆ 4   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ b   ┆ 11  │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ c   ┆ 6   │
└─────┴─────┘

We can also loop over the grouped DataFrame; each iteration yields one group as a DataFrame:

>>> for sub_df in df.groupby("a"):
...     print(sub_df)
...
shape: (3, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ b   ┆ 2   ┆ 5   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ b   ┆ 4   ┆ 3   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ b   ┆ 5   ┆ 2   │
└─────┴─────┴─────┘
shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ c   ┆ 6   ┆ 1   │
└─────┴─────┴─────┘