polars.DataFrame.groupby_rolling

DataFrame.groupby_rolling(index_column: str, *, period: str | timedelta, offset: str | timedelta | None = None, closed: ClosedWindow = 'right', by: str | Sequence[str] | Expr | Sequence[Expr] | None = None) → RollingGroupBy[DF]

Create rolling groups based on a time column.

Also works for index values of type Int32 or Int64.

Unlike groupby_dynamic, the windows are determined by the individual values of the index column rather than by constant intervals. For constant intervals, use groupby_dynamic.

The period and offset arguments are created either from a timedelta, or by using the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

Or combine them: “3d12h4m25s” # 3 days, 12 hours, 4 minutes, and 25 seconds
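To make the string language concrete, the following is a minimal plain-Python sketch of how a combined string could be reduced to a single length. It is not the library's parser: the helper name duration_to_nanoseconds is hypothetical, it assumes a day is exactly 24 hours and a week exactly 7 days, and it rejects "mo", "y" and "i" because those units have no fixed length in nanoseconds.

```python
import re

# Nanoseconds per unit, for the fixed-length units only.
# Assumption: a day is exactly 24 hours and a week is exactly 7 days.
_NS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
    "w": 7 * 86_400 * 1_000_000_000,
}

# Two-letter units come first so that "ms" and "mo" are not read as "m".
_TOKEN = re.compile(r"(\d+)(ns|us|ms|mo|s|m|h|d|w|y|i)")


def duration_to_nanoseconds(text: str) -> int:
    """Sum the fixed-length parts of a duration string such as "3d12h4m25s"."""
    if not re.fullmatch(rf"(?:{_TOKEN.pattern})+", text):
        raise ValueError(f"not a valid duration string: {text!r}")
    total = 0
    for count, unit in _TOKEN.findall(text):
        if unit not in _NS_PER_UNIT:
            # "mo", "y" and "i" cannot be expressed as a fixed number of nanoseconds.
            raise ValueError(f"unit {unit!r} has no fixed length")
        total += int(count) * _NS_PER_UNIT[unit]
    return total


# 3 days, 12 hours, 4 minutes and 25 seconds is 302665 seconds.
print(duration_to_nanoseconds("3d12h4m25s") == 302_665 * 10**9)
```

Working in integer nanoseconds rather than datetime.timedelta keeps the "ns" unit exact, since timedelta only resolves microseconds.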

In case of a groupby_rolling on an integer column, the windows are defined by:

  • “1i” # length 1

  • “10i” # length 10
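As an illustration of what a window of length "2i" means on an integer column, here is a plain-Python sketch that does not call the library. It assumes the default settings (window (t - period, t], closed on the right) and that the window is measured in index values, not in row positions; rolling_counts is a hypothetical helper.

```python
def rolling_counts(index: list[int], period: int) -> list[int]:
    """For each index value t, count the index values inside (t - period, t]."""
    return [sum(1 for v in index if t - period < v <= t) for t in index]


# Sorted integer index with gaps: the gaps change how many rows each window holds.
index = [0, 1, 2, 5, 6]
counts = rolling_counts(index, period=2)  # corresponds to period="2i"
print(counts)  # [1, 2, 2, 1, 2]
```

The row at index 5 gets a window containing only itself, because no other index value lies in (3, 5].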

Parameters:
index_column

Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; otherwise the output will not make sense.

In case of a rolling groupby on indices, dtype needs to be one of {Int32, Int64}. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.

period

Length of the window.

offset

Offset of the window. Defaults to -period.

closed {‘right’, ‘left’, ‘both’, ‘none’}

Define which sides of the temporal window interval are closed (inclusive).
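The interaction of period, offset and closed can be sketched in plain Python on an integer index. This is an illustration, not the library's code: window_members is a hypothetical helper, and it assumes that the window for a row with index value t starts at t + offset and spans period, which reduces to the interval from t - period to t under the documented default offset.

```python
from typing import Optional


def window_members(index, t, period, offset: Optional[int] = None, closed="right"):
    """Return the index values that fall in the window anchored at t."""
    if offset is None:
        offset = -period  # documented default
    lower = t + offset
    upper = lower + period
    include_lower = closed in ("left", "both")
    include_upper = closed in ("right", "both")
    members = []
    for v in index:
        above = v >= lower if include_lower else v > lower
        below = v <= upper if include_upper else v < upper
        if above and below:
            members.append(v)
    return members


index = [1, 2, 3, 4, 5]
for closed in ("right", "left", "both", "none"):
    print(closed, window_members(index, t=4, period=2, closed=closed))
# right [3, 4]
# left [2, 3]
# both [2, 3, 4]
# none [3]
```

With closed='left' or closed='none' the row at t itself is excluded from its own window, which is worth keeping in mind when aggregating.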

by

Also group by this column or these columns.
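One way to picture the effect of by is that the rolling windows are evaluated separately inside each group. The plain-Python sketch below illustrates that reading on an integer index with the default right-closed window; rolling_sum_by and the sample rows are hypothetical and not part of the library.

```python
from collections import defaultdict


def rolling_sum_by(rows, period):
    """rows: (key, index, value) tuples, with index ascending within each key.

    Returns (key, index, rolling_sum) tuples, where the window
    (index - period, index] only sees rows that share the same key.
    """
    groups = defaultdict(list)
    for key, idx, value in rows:
        groups[key].append((idx, value))
    out = []
    for key, items in groups.items():
        for t, _ in items:
            total = sum(v for i, v in items if t - period < i <= t)
            out.append((key, t, total))
    return out


rows = [("x", 1, 10), ("x", 2, 20), ("y", 2, 1), ("y", 3, 2)]
print(rolling_sum_by(rows, period=2))
# [('x', 1, 10), ('x', 2, 30), ('y', 2, 1), ('y', 3, 3)]
```

The row ("y", 2, 1) does not pick up the value 20 from ("x", 2, 20), even though both share index 2, because the two rows belong to different groups.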

See also

groupby_dynamic

Examples

>>> import polars as pl
>>> dates = [
...     "2020-01-01 13:45:48",
...     "2020-01-01 16:42:13",
...     "2020-01-01 16:45:09",
...     "2020-01-02 18:12:48",
...     "2020-01-03 19:45:32",
...     "2020-01-08 23:16:43",
... ]
>>> df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).with_column(
...     pl.col("dt").str.strptime(pl.Datetime)
... )
>>> out = df.groupby_rolling(index_column="dt", period="2d").agg(
...     [
...         pl.sum("a").alias("sum_a"),
...         pl.min("a").alias("min_a"),
...         pl.max("a").alias("max_a"),
...     ]
... )
>>> assert out["sum_a"].to_list() == [3, 10, 15, 24, 11, 1]
>>> assert out["max_a"].to_list() == [3, 7, 7, 9, 9, 1]
>>> assert out["min_a"].to_list() == [3, 3, 3, 3, 2, 1]
>>> out
shape: (6, 4)
┌─────────────────────┬───────┬───────┬───────┐
│ dt                  ┆ sum_a ┆ min_a ┆ max_a │
│ ---                 ┆ ---   ┆ ---   ┆ ---   │
│ datetime[μs]        ┆ i64   ┆ i64   ┆ i64   │
╞═════════════════════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3     ┆ 3     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:42:13 ┆ 10    ┆ 3     ┆ 7     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:45:09 ┆ 15    ┆ 3     ┆ 7     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-02 18:12:48 ┆ 24    ┆ 3     ┆ 9     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-03 19:45:32 ┆ 11    ┆ 2     ┆ 9     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-08 23:16:43 ┆ 1     ┆ 1     ┆ 1     │
└─────────────────────┴───────┴───────┴───────┘
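The aggregates in the example above can be cross-checked without the library. The following plain-Python sketch recomputes them under the assumption that, with period="2d" and the default offset and closed='right', each row at time t aggregates the rows whose timestamp lies in (t - 2 days, t].

```python
from datetime import datetime, timedelta

dates = [
    "2020-01-01 13:45:48",
    "2020-01-01 16:42:13",
    "2020-01-01 16:45:09",
    "2020-01-02 18:12:48",
    "2020-01-03 19:45:32",
    "2020-01-08 23:16:43",
]
values = [3, 7, 5, 9, 2, 1]
stamps = [datetime.strptime(d, "%Y-%m-%d %H:%M:%S") for d in dates]
period = timedelta(days=2)

# One window per row: every value whose timestamp is in (t - period, t].
windows = [
    [v for s, v in zip(stamps, values) if t - period < s <= t]
    for t in stamps
]
sums = [sum(w) for w in windows]
mins = [min(w) for w in windows]
maxs = [max(w) for w in windows]

print(sums)  # [3, 10, 15, 24, 11, 1]
print(mins)  # [3, 3, 3, 3, 2, 1]
print(maxs)  # [3, 7, 7, 9, 9, 1]
```

For instance, the row at 2020-01-03 19:45:32 only sees the values 9 and 2, because the three rows from 2020-01-01 are all earlier than 2020-01-01 19:45:32.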