polars.LazyFrame.groupby_rolling

LazyFrame.groupby_rolling(index_column: str, period: str, offset: Optional[str] = None, closed: str = 'right', by: Optional[Union[str, List[str], polars.internals.expr.Expr, List[polars.internals.expr.Expr]]] = None) → polars.internals.lazy_frame.LazyGroupBy[polars.internals.lazy_frame.LDF]

Create rolling groups based on a time column (or an index column of type Int32 or Int64).

In contrast to groupby_dynamic, the windows are determined by the individual values in the index column and are not of constant size. For constant-size intervals, use groupby_dynamic.

See also

groupby_dynamic

The period and offset arguments are created with the following string language:

  • 1ns (1 nanosecond)

  • 1us (1 microsecond)

  • 1ms (1 millisecond)

  • 1s (1 second)

  • 1m (1 minute)

  • 1h (1 hour)

  • 1d (1 day)

  • 1w (1 week)

  • 1mo (1 calendar month)

  • 1y (1 calendar year)

  • 1i (1 index count)

Or combine them: “3d12h4m25s” # 3 days, 12 hours, 4 minutes, and 25 seconds
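
For example, a combined duration string can be passed directly as the window length. As an illustrative sketch only, assuming a frame df with a sorted Datetime column "dt" and a numeric column "a":

>>> # each window covers the 3 days and 12 hours up to and including the current row
>>> df.groupby_rolling(index_column="dt", period="3d12h").agg(pl.sum("a"))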

In case of a groupby_rolling on an integer column, the windows are defined by:

  • “1i” # length 1

  • “10i” # length 10
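
As a hedged sketch (the frame and column names are made up for illustration), a rolling groupby over a sorted Int64 index column could look like this:

>>> df_idx = pl.DataFrame({"idx": [1, 2, 3, 4, 5], "a": [1, 2, 3, 4, 5]})
>>> out_idx = df_idx.groupby_rolling(index_column="idx", period="3i").agg(
...     pl.sum("a").alias("sum_a")
... )
>>> # with the defaults offset="-3i" and closed="right", each window spans
>>> # (idx - 3, idx], so sum_a should come out as [1, 3, 6, 9, 12]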

Parameters
index_column

Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; if it is not, the output will not make sense.

In case of a rolling groupby on indices, the dtype needs to be one of {Int32, Int64}. Note that Int32 gets temporarily cast to Int64, so if performance matters, use an Int64 column.

period

Length of the window.

offset

Offset of the window. Default is -period.

closed

Defines if the window interval is closed or not. Any of {“left”, “right”, “both”, “none”}.

by

Also group by this column/these columns

Examples

>>> dates = [
...     "2020-01-01 13:45:48",
...     "2020-01-01 16:42:13",
...     "2020-01-01 16:45:09",
...     "2020-01-02 18:12:48",
...     "2020-01-03 19:45:32",
...     "2020-01-08 23:16:43",
... ]
>>> df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).with_column(
...     pl.col("dt").str.strptime(pl.Datetime)
... )
>>> out = df.groupby_rolling(index_column="dt", period="2d").agg(
...     [
...         pl.sum("a").alias("sum_a"),
...         pl.max("a").alias("max_a"),
...         pl.min("a").alias("min_a"),
...     ]
... )
>>> assert out["sum_a"].to_list() == [3, 10, 15, 24, 11, 1]
>>> assert out["max_a"].to_list() == [3, 7, 7, 9, 9, 1]
>>> assert out["min_a"].to_list() == [3, 3, 3, 3, 2, 1]
>>> out
shape: (6, 4)
┌─────────────────────┬───────┬───────┬───────┐
│ dt                  ┆ sum_a ┆ max_a ┆ min_a │
│ ---                 ┆ ---   ┆ ---   ┆ ---   │
│ datetime[ms]        ┆ i64   ┆ i64   ┆ i64   │
╞═════════════════════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3     ┆ 3     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:42:13 ┆ 10    ┆ 7     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:45:09 ┆ 15    ┆ 7     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-02 18:12:48 ┆ 24    ┆ 9     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-03 19:45:32 ┆ 11    ┆ 9     ┆ 2     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-08 23:16:43 ┆ 1     ┆ 1     ┆ 1     │
└─────────────────────┴───────┴───────┴───────┘
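
A further sketch (not part of the original example) showing the by parameter; the "group" column is made up for illustration, and within each group the "dt" column must still be sorted in ascending order:

>>> df2 = pl.DataFrame(
...     {
...         "dt": dates,
...         "a": [3, 7, 5, 9, 2, 1],
...         "group": ["x", "x", "x", "y", "y", "y"],
...     }
... ).with_column(pl.col("dt").str.strptime(pl.Datetime))
>>> out_by = df2.groupby_rolling(index_column="dt", period="2d", by="group").agg(
...     pl.sum("a").alias("sum_a")
... )

The rolling windows are then computed independently per group, so rows from group "y" never fall inside a window of group "x".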