Group a DataFrame

Grouping does not modify the data; it only stores information about the group structure. This structure can then be used by several methods ($agg(), $filter(), etc.).


<DataFrame>$group_by(..., maintain_order = polars_options()$maintain_order)


... Column(s) to group by. Accepts expression input. Character values are parsed as column names.
maintain_order Ensure that the order of the groups is consistent with the input data. This is slower than a default group by. Setting this to TRUE prevents the query from running on the streaming engine. The default value can be changed with options(polars.maintain_order = TRUE).


Within each group, the order of the rows is always preserved, regardless of the maintain_order argument.
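The two ordering guarantees can be illustrated with a small sketch. It assumes the polars package is attached; the column names `key` and `val` are hypothetical and chosen only for this illustration. `maintain_order = TRUE` fixes the order of the groups, while the order of rows inside each group follows the input either way.

```r
library(polars)

# Hypothetical data: "y" appears before "x" in the input.
df = pl$DataFrame(
  key = c("y", "x", "y", "x"),
  val = c(1, 2, 3, 4)
)

# Groups are returned in order of first appearance ("y", then "x").
# Collecting `val` without an aggregation keeps the input row order
# inside each group.
out = df$group_by("key", maintain_order = TRUE)$agg(pl$col("val"))

# Convert to a base R data.frame for inspection.
res = as.data.frame(out)
res$key
unlist(res$val)
```

Without `maintain_order = TRUE`, the two groups may come back in either order, but the values collected for each key would still follow the input order.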


A GroupBy object (a DataFrame with special group-by methods such as $agg()).
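As a rough sketch of what this means in practice, the grouped object can be stored in a variable and aggregated later, while the original DataFrame stays unchanged. The example assumes the polars package is attached; the variable names `gb` and `out` are illustrative, and `$sort()` is used only to make the printed order reproducible.

```r
library(polars)

df = pl$DataFrame(
  a = c("a", "b", "a"),
  b = c(1, 2, 3)
)

# Grouping returns a separate object; `df` itself is not modified.
gb = df$group_by("a")
df$shape

# The stored group structure is used once an aggregation is requested.
out = gb$agg(pl$col("b")$sum())$sort("a")
res = as.data.frame(out)
res
```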

df = pl$DataFrame(
  a = c("a", "b", "a", "b", "c"),
  b = c(1, 2, 1, 3, 3),
  c = c(5, 4, 3, 2, 1)
)

# Group by one column and sum `b` within each group.
df$group_by("a")$agg(pl$col("b")$sum())
#> shape: (3, 2)
#> ┌─────┬─────┐
#> │ a   ┆ b   │
#> │ --- ┆ --- │
#> │ str ┆ f64 │
#> ╞═════╪═════╡
#> │ c   ┆ 3.0 │
#> │ b   ┆ 5.0 │
#> │ a   ┆ 2.0 │
#> └─────┴─────┘
# Set `maintain_order = TRUE` to ensure the order of the groups is consistent with the input.
df$group_by("a", maintain_order = TRUE)$agg(pl$col("c"))
#> shape: (3, 2)
#> ┌─────┬────────────┐
#> │ a   ┆ c          │
#> │ --- ┆ ---        │
#> │ str ┆ list[f64]  │
#> ╞═════╪════════════╡
#> │ a   ┆ [5.0, 3.0] │
#> │ b   ┆ [4.0, 2.0] │
#> │ c   ┆ [1.0]      │
#> └─────┴────────────┘
# Group by multiple columns by passing a character vector of column names.
df$group_by(c("a", "b"))$agg(pl$max("c"))
#> shape: (4, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ b   ┆ 2.0 ┆ 4.0 │
#> │ b   ┆ 3.0 ┆ 2.0 │
#> │ a   ┆ 1.0 ┆ 5.0 │
#> │ c   ┆ 3.0 ┆ 1.0 │
#> └─────┴─────┴─────┘
# Or pass several arguments to group by multiple columns in the same way.
# Expressions are also accepted.
df$group_by("a", pl$col("b") %/% 2)$agg(
  pl$col("c")$mean()
)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ a   ┆ b   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ c   ┆ 1.0 ┆ 1.0 │
#> │ a   ┆ 0.0 ┆ 4.0 │
#> │ b   ┆ 1.0 ┆ 3.0 │
#> └─────┴─────┴─────┘
# The columns will be renamed to the argument names.
df$group_by(d = "a", e = pl$col("b") %/% 2)$agg(
  pl$col("c")$mean()
)
#> shape: (3, 3)
#> ┌─────┬─────┬─────┐
#> │ d   ┆ e   ┆ c   │
#> │ --- ┆ --- ┆ --- │
#> │ str ┆ f64 ┆ f64 │
#> ╞═════╪═════╪═════╡
#> │ b   ┆ 1.0 ┆ 3.0 │
#> │ a   ┆ 0.0 ┆ 4.0 │
#> │ c   ┆ 1.0 ┆ 1.0 │
#> └─────┴─────┴─────┘
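Several outputs above list the groups in a different order from the input because `maintain_order` was not set. When only a reproducible order of the result is needed, one possible alternative is to sort after aggregating. This is a minimal sketch, assuming the polars package is attached; `$sort()` is a general DataFrame method and is not specific to group-by.

```r
library(polars)

df = pl$DataFrame(
  a = c("a", "b", "a", "b", "c"),
  b = c(1, 2, 1, 3, 3),
  c = c(5, 4, 3, 2, 1)
)

# Aggregate with the default (unordered) group by, then sort by the key
# so the rows come back in a fixed order.
out = df$group_by("a")$agg(pl$col("b")$sum())$sort("a")
res = as.data.frame(out)
res
```

Sorting orders the groups by key value, whereas `maintain_order = TRUE` keeps them in order of first appearance in the input, so the two approaches are not interchangeable when the input order itself matters.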