polars.LazyFrame.groupby_dynamic#

LazyFrame.groupby_dynamic(index_column: str, *, every: str | timedelta, period: str | timedelta | None = None, offset: str | timedelta | None = None, truncate: bool = True, include_boundaries: bool = False, closed: ClosedWindow = 'left', by: str | Sequence[str] | Expr | Sequence[Expr] | None = None, start_by: StartBy = 'window') LazyGroupBy[LDF][source]#

Group based on a time value (or index value of type Int32, Int64).

Time windows are calculated and rows are assigned to windows. Different from a normal groupby is that a row can be member of multiple groups. The time/index window could be seen as a rolling window, with a window size determined by dates/times/values instead of slots in the DataFrame.

A window is defined by:

  • every: interval of the window

  • period: length of the window

  • offset: offset of the window

The every, period and offset arguments are created with the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

Or combine them: “3d12h4m25s” # 3 days, 12 hours, 4 minutes, and 25 seconds

In case of a groupby_dynamic on an integer column, the windows are defined by:

  • “1i” # length 1

  • “10i” # length 10

Parameters:
index_column

Column used to group based on the time window. Often to type Date/Datetime This column must be sorted in ascending order. If not the output will not make sense.

In case of a dynamic groupby on indices, dtype needs to be one of {Int32, Int64}. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.

every

interval of the window

period

length of the window, if None it is equal to ‘every’

offset

offset of the window if None and period is None it will be equal to negative every

truncate

truncate the time value to the window lower bound

include_boundaries

Add the lower and upper bound of the window to the “_lower_bound” and “_upper_bound” columns. This will impact performance because it’s harder to parallelize

closed{‘right’, ‘left’, ‘both’, ‘none’}

Define whether the temporal window interval is closed or not.

by

Also group by this column/these columns

start_by{‘window’, ‘datapoint’, ‘monday’}

The strategy to determine the start of the first window by. * ‘window’: Truncate the start of the window with the ‘every’ argument. * ‘datapoint’: Start from the first encountered data point. * ‘monday’: Start the window on the monday before the first data point.

See also

groupby_rolling

Examples

>>> from datetime import datetime
>>> # create an example dataframe
>>> df = pl.DataFrame(
...     {
...         "time": pl.date_range(
...             low=datetime(2021, 12, 16),
...             high=datetime(2021, 12, 16, 3),
...             interval="30m",
...         ),
...         "n": range(7),
...     }
... )
>>> df
shape: (7, 2)
┌─────────────────────┬─────┐
│ time                ┆ n   │
│ ---                 ┆ --- │
│ datetime[μs]        ┆ i64 │
╞═════════════════════╪═════╡
│ 2021-12-16 00:00:00 ┆ 0   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2021-12-16 00:30:00 ┆ 1   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2021-12-16 01:00:00 ┆ 2   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2021-12-16 01:30:00 ┆ 3   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2021-12-16 02:00:00 ┆ 4   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2021-12-16 02:30:00 ┆ 5   │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ 2021-12-16 03:00:00 ┆ 6   │
└─────────────────────┴─────┘

Group by windows of 1 hour starting at 2021-12-16 00:00:00.

>>> (
...     df.lazy()
...     .groupby_dynamic("time", every="1h", closed="right")
...     .agg(
...         [
...             pl.col("time").min().alias("time_min"),
...             pl.col("time").max().alias("time_max"),
...         ]
...     )
... ).collect()
shape: (4, 3)
┌─────────────────────┬─────────────────────┬─────────────────────┐
│ time                ┆ time_min            ┆ time_max            │
│ ---                 ┆ ---                 ┆ ---                 │
│ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        │
╞═════════════════════╪═════════════════════╪═════════════════════╡
│ 2021-12-15 23:00:00 ┆ 2021-12-16 00:00:00 ┆ 2021-12-16 00:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 00:00:00 ┆ 2021-12-16 00:30:00 ┆ 2021-12-16 01:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:00:00 ┆ 2021-12-16 01:30:00 ┆ 2021-12-16 02:00:00 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:00:00 ┆ 2021-12-16 02:30:00 ┆ 2021-12-16 03:00:00 │
└─────────────────────┴─────────────────────┴─────────────────────┘

The window boundaries can also be added to the aggregation result

>>> (
...     df.lazy()
...     .groupby_dynamic(
...         "time", every="1h", include_boundaries=True, closed="right"
...     )
...     .agg([pl.col("time").count().alias("time_count")])
... ).collect()
shape: (4, 4)
┌─────────────────────┬─────────────────────┬─────────────────────┬────────────┐
│ _lower_boundary     ┆ _upper_boundary     ┆ time                ┆ time_count │
│ ---                 ┆ ---                 ┆ ---                 ┆ ---        │
│ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ u32        │
╞═════════════════════╪═════════════════════╪═════════════════════╪════════════╡
│ 2021-12-15 23:00:00 ┆ 2021-12-16 00:00:00 ┆ 2021-12-15 23:00:00 ┆ 1          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 00:00:00 ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 00:00:00 ┆ 2          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 2          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 2          │
└─────────────────────┴─────────────────────┴─────────────────────┴────────────┘

When closed=”left”, should not include right end of interval [lower_bound, upper_bound)

>>> (
...     df.lazy()
...     .groupby_dynamic("time", every="1h", closed="left")
...     .agg(
...         [
...             pl.col("time").count().alias("time_count"),
...             pl.col("time").list().alias("time_agg_list"),
...         ]
...     )
... ).collect()
shape: (4, 3)
┌─────────────────────┬────────────┬─────────────────────────────────────┐
│ time                ┆ time_count ┆ time_agg_list                       │
│ ---                 ┆ ---        ┆ ---                                 │
│ datetime[μs]        ┆ u32        ┆ list[datetime[μs]]                  │
╞═════════════════════╪════════════╪═════════════════════════════════════╡
│ 2021-12-16 00:00:00 ┆ 2          ┆ [2021-12-16 00:00:00, 2021-12-16... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:00:00 ┆ 2          ┆ [2021-12-16 01:00:00, 2021-12-16... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:00:00 ┆ 2          ┆ [2021-12-16 02:00:00, 2021-12-16... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 03:00:00 ┆ 1          ┆ [2021-12-16 03:00:00]               │
└─────────────────────┴────────────┴─────────────────────────────────────┘

When closed=”both” the time values at the window boundaries belong to 2 groups.

>>> (
...     df.lazy()
...     .groupby_dynamic("time", every="1h", closed="both")
...     .agg([pl.col("time").count().alias("time_count")])
... ).collect()
shape: (5, 2)
┌─────────────────────┬────────────┐
│ time                ┆ time_count │
│ ---                 ┆ ---        │
│ datetime[μs]        ┆ u32        │
╞═════════════════════╪════════════╡
│ 2021-12-15 23:00:00 ┆ 1          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 00:00:00 ┆ 3          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:00:00 ┆ 3          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:00:00 ┆ 3          │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-16 03:00:00 ┆ 1          │
└─────────────────────┴────────────┘

Dynamic groupbys can also be combined with grouping on normal keys

>>> df = pl.DataFrame(
...     {
...         "time": pl.date_range(
...             low=datetime(2021, 12, 16),
...             high=datetime(2021, 12, 16, 3),
...             interval="30m",
...         ),
...         "groups": ["a", "a", "a", "b", "b", "a", "a"],
...     }
... )
>>> df
shape: (7, 2)
┌─────────────────────┬────────┐
│ time                ┆ groups │
│ ---                 ┆ ---    │
│ datetime[μs]        ┆ str    │
╞═════════════════════╪════════╡
│ 2021-12-16 00:00:00 ┆ a      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 00:30:00 ┆ a      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:00:00 ┆ a      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:30:00 ┆ b      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:00:00 ┆ b      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:30:00 ┆ a      │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 03:00:00 ┆ a      │
└─────────────────────┴────────┘
>>> (
...     df.lazy()
...     .groupby_dynamic(
...         "time",
...         every="1h",
...         closed="both",
...         by="groups",
...         include_boundaries=True,
...     )
...     .agg([pl.col("time").count().alias("time_count")])
... ).collect()
shape: (7, 5)
┌────────┬─────────────────────┬─────────────────────┬─────────────────────┬────────────┐
│ groups ┆ _lower_boundary     ┆ _upper_boundary     ┆ time                ┆ time_count │
│ ---    ┆ ---                 ┆ ---                 ┆ ---                 ┆ ---        │
│ str    ┆ datetime[μs]        ┆ datetime[μs]        ┆ datetime[μs]        ┆ u32        │
╞════════╪═════════════════════╪═════════════════════╪═════════════════════╪════════════╡
│ a      ┆ 2021-12-15 23:00:00 ┆ 2021-12-16 00:00:00 ┆ 2021-12-15 23:00:00 ┆ 1          │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a      ┆ 2021-12-16 00:00:00 ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 00:00:00 ┆ 3          │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a      ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 1          │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a      ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 2          │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ a      ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 04:00:00 ┆ 2021-12-16 03:00:00 ┆ 1          │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b      ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 2          │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ b      ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 1          │
└────────┴─────────────────────┴─────────────────────┴─────────────────────┴────────────┘

Dynamic groupby on an index column

>>> df = pl.DataFrame(
...     {
...         "idx": pl.arange(0, 6, eager=True),
...         "A": ["A", "A", "B", "B", "B", "C"],
...     }
... )
>>> (
...     df.lazy()
...     .groupby_dynamic(
...         "idx",
...         every="2i",
...         period="3i",
...         include_boundaries=True,
...         closed="right",
...     )
...     .agg(pl.col("A").list().alias("A_agg_list"))
... ).collect()
shape: (3, 4)
┌─────────────────┬─────────────────┬─────┬─────────────────┐
│ _lower_boundary ┆ _upper_boundary ┆ idx ┆ A_agg_list      │
│ ---             ┆ ---             ┆ --- ┆ ---             │
│ i64             ┆ i64             ┆ i64 ┆ list[str]       │
╞═════════════════╪═════════════════╪═════╪═════════════════╡
│ 0               ┆ 3               ┆ 0   ┆ ["A", "B", "B"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2               ┆ 5               ┆ 2   ┆ ["B", "B", "C"] │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 4               ┆ 7               ┆ 4   ┆ ["C"]           │
└─────────────────┴─────────────────┴─────┴─────────────────┘