Create rolling groups based on a date/time or integer column
Description
If you have a time series <t_0, t_1, …, t_n>, then by default the windows created will be:
- (t_0 - period, t_0]
- (t_1 - period, t_1]
- …
- (t_n - period, t_n]
whereas if you pass a non-default offset, then the windows will be:
- (t_0 + offset, t_0 + offset + period]
- (t_1 + offset, t_1 + offset + period]
- …
- (t_n + offset, t_n + offset + period]
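The window bounds above can be worked out by hand. The following is a minimal base R sketch, not Polars code: it assumes timestamps in UTC (so that a calendar day is exactly 24 hours), a period of 2 days, and an illustrative non-default offset of 1 day. The names `default_windows` and `shifted_windows` are hypothetical.

```r
# Index values t_0, ..., t_n (a few timestamps from the Examples section, in UTC).
t = as.POSIXct(
  c("2020-01-01 13:45:48", "2020-01-02 18:12:48", "2020-01-08 23:16:43"),
  tz = "UTC"
)
period = as.difftime(2, units = "days") # stands in for period = "2d"

# Default offset (-period): each window is (t_i - period, t_i].
default_windows = data.frame(lower = t - period, upper = t)

# Assumed non-default offset of 1 day:
# each window is (t_i + offset, t_i + offset + period].
offset = as.difftime(1, units = "days")
shifted_windows = data.frame(lower = t + offset, upper = t + offset + period)

print(default_windows)
print(shifted_windows)
```

In both cases the window has the same length (`period`); only its position relative to t_i changes.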
Usage
<LazyFrame>$rolling(
index_column,
...,
period,
offset = NULL,
closed = "right",
group_by = NULL
)
Arguments
| Argument | Description |
|---|---|
| index_column | Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order (or, if group_by is specified, it must be sorted in ascending order within each group). For a rolling group-by on integer indices, the dtype must be either Int32 or Int64. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column. |
| ... | Ignored. |
| period | A character representing the length of the window, must be non-negative. See the Polars duration string language section for details. |
| offset | A character representing the offset of the window, or NULL (default). If NULL, -period is used. See the Polars duration string language section for details. |
| closed | Define which sides of the temporal interval are closed (inclusive). This can be one of "left", "right", "both" or "none". |
| group_by | Also group by this column/these columns. |
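To illustrate how `closed` and `group_by` combine, here is a small sketch using only the arguments listed in the Usage section. The data, the column names `g`, `idx` and `a`, and the choice of `closed = "both"` are assumptions made for illustration. The index is sorted within each group rather than globally, which is what the `index_column` description asks for when `group_by` is given.

```r
library("polars")

# Hypothetical data: two groups, each with its own ascending integer index.
lf = pl$LazyFrame(
  g = c("x", "x", "x", "y", "y"),
  idx = c(1L, 2L, 3L, 1L, 2L),
  a = c(3, 7, 5, 9, 2)
)

# Windows are computed separately per group; closed = "both" makes
# both ends of each window inclusive.
out = lf$rolling(
  index_column = "idx",
  period = "2i",
  closed = "both",
  group_by = "g"
)$agg(
  sum_a = pl$sum("a")
)$collect()
out
```

The result should contain one row per input row, with the group column alongside the index and the aggregate.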
Details
For a rolling operation on an integer column, the window length is given in index units using the suffix i:
- "1i" # length 1
- "10i" # length 10
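A rolling operation over an integer index might look like the sketch below. The column names and values are made up for illustration, and the cast to Int64 follows the performance note in the `index_column` description. With the default `offset` and `closed`, the window for each row is (idx - 2, idx], so the row with idx = 3 would aggregate the rows with idx 2 and 3.

```r
library("polars")

# Hypothetical data: an integer index cast to Int64 and flagged as sorted.
lf = pl$LazyFrame(idx = 1:6, a = c(3, 7, 5, 9, 2, 1))$with_columns(
  pl$col("idx")$cast(pl$Int64)$set_sorted()
)

# "2i" is a window of length 2 in index units.
out = lf$rolling(index_column = "idx", period = "2i")$agg(
  sum_a = pl$sum("a")
)$collect()
out
```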
Value
A LazyGroupBy object
Polars duration string language
The Polars duration string language is a simple representation of durations. It is used in many Polars functions that accept durations.
A duration string is an integer followed by one of the following units:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 calendar day)
- 1w (1 calendar week)
- 1mo (1 calendar month)
- 1q (1 calendar quarter)
- 1y (1 calendar year)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours later, due to daylight saving time). The same applies to "calendar week", "calendar month", "calendar quarter", and "calendar year".
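To make the format concrete, the sketch below splits a combined duration string into its value/unit pairs using base R only. `parse_duration()` is a hypothetical helper written for this illustration. It is not part of Polars and does not reproduce how Polars parses durations. It handles only unsigned strings built from the units listed above.

```r
# Hypothetical helper: split a duration string into (value, unit) pairs.
parse_duration = function(x) {
  # Two-letter units come first so that "ms" and "mo" are not read as "m".
  pattern = "([0-9]+)(ns|us|ms|mo|s|m|h|d|w|q|y)"
  tokens = regmatches(x, gregexpr(pattern, x, perl = TRUE))[[1]]
  # The tokens must cover the whole string, otherwise the input is malformed.
  if (length(tokens) == 0 || paste(tokens, collapse = "") != x) {
    stop("not a valid duration string: ", x)
  }
  data.frame(
    value = as.integer(sub(pattern, "\\1", tokens, perl = TRUE)),
    unit = sub(pattern, "\\2", tokens, perl = TRUE),
    stringsAsFactors = FALSE
  )
}

parts = parse_duration("3d12h4m25s")
print(parts)
```

Each row of `parts` holds one component of the combined duration, in the order it appears in the string.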
See Also
- <LazyFrame>$group_by_dynamic()
Examples
library("polars")

# Timestamps in ascending order, as required for the index column.
dates = c(
  "2020-01-01 13:45:48",
  "2020-01-01 16:42:13",
  "2020-01-01 16:45:09",
  "2020-01-02 18:12:48",
  "2020-01-03 19:45:32",
  "2020-01-08 23:16:43"
)

# Parse the strings to Datetime and flag the column as sorted.
df = pl$LazyFrame(dt = dates, a = c(3, 7, 5, 9, 2, 1))$with_columns(
  pl$col("dt")$str$strptime(pl$Datetime())$set_sorted()
)

# Aggregate over the window (dt - 2 days, dt] for each row.
df$rolling(index_column = "dt", period = "2d")$agg(
  sum_a = pl$sum("a"),
  min_a = pl$min("a"),
  max_a = pl$max("a")
)$collect()
#> shape: (6, 4)
#> ┌─────────────────────┬───────┬───────┬───────┐
#> │ dt                  ┆ sum_a ┆ min_a ┆ max_a │
#> │ ---                 ┆ ---   ┆ ---   ┆ ---   │
#> │ datetime[μs]        ┆ f64   ┆ f64   ┆ f64   │
#> ╞═════════════════════╪═══════╪═══════╪═══════╡
#> │ 2020-01-01 13:45:48 ┆ 3.0   ┆ 3.0   ┆ 3.0   │
#> │ 2020-01-01 16:42:13 ┆ 10.0  ┆ 3.0   ┆ 7.0   │
#> │ 2020-01-01 16:45:09 ┆ 15.0  ┆ 3.0   ┆ 7.0   │
#> │ 2020-01-02 18:12:48 ┆ 24.0  ┆ 3.0   ┆ 9.0   │
#> │ 2020-01-03 19:45:32 ┆ 11.0  ┆ 2.0   ┆ 9.0   │
#> │ 2020-01-08 23:16:43 ┆ 1.0   ┆ 1.0   ┆ 1.0   │
#> └─────────────────────┴───────┴───────┴───────┘