polars.DataFrame.partition_by#

DataFrame.partition_by(by: str | Iterable[str], *more_by: str, maintain_order: bool = True, as_dict: Literal[False] = False) list[Self][source]#
DataFrame.partition_by(by: str | Iterable[str], *more_by: str, maintain_order: bool = True, as_dict: Literal[True]) dict[Any, Self]

Group by the given columns and return the groups as separate dataframes.

Parameters:
by

Name of the column(s) to group by.

*more_by

Additional names of columns to group by, specified as positional arguments.

maintain_order

Ensure that the order of the groups is consistent with the input data. This is slower than a default partition by operation.

as_dict

Return a dictionary instead of a list. The dictionary keys are the distinct group values that identify that group.

Examples

Pass a single column name to partition by that column.

>>> df = pl.DataFrame(
...     {
...         "a": ["a", "b", "a", "b", "c"],
...         "b": [1, 2, 1, 3, 3],
...         "c": [5, 4, 3, 2, 1],
...     }
... )
>>> df.partition_by("a")  
[shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ a   ┆ 1   ┆ 3   │
└─────┴─────┴─────┘, shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
└─────┴─────┴─────┘, shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘]

Partition by multiple columns by either passing a list of column names, or by specifying each column name as a positional argument.

>>> df.partition_by("a", "b")  
[shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ a   ┆ 1   ┆ 3   │
└─────┴─────┴─────┘, shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ b   ┆ 2   ┆ 4   │
└─────┴─────┴─────┘, shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ b   ┆ 3   ┆ 2   │
└─────┴─────┴─────┘, shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘]

Return the partitions as a dictionary by specifying as_dict=True.

>>> df.partition_by("a", as_dict=True)  
{'a': shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ a   ┆ 1   ┆ 5   │
│ a   ┆ 1   ┆ 3   │
└─────┴─────┴─────┘, 'b': shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ b   ┆ 2   ┆ 4   │
│ b   ┆ 3   ┆ 2   │
└─────┴─────┴─────┘, 'c': shape: (1, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ c   ┆ 3   ┆ 1   │
└─────┴─────┴─────┘}