Time Series
For time-series resampling, Polars offers a powerful API to resample data. Pandas is well known for its resampling functionality via df.resample.
Polars makes a distinction between:
- upsampling
- downsampling
Upsampling
An upsample operation is actually nothing more than left joining a date range with your dataset and filling the blanks.
Polars provides wrapper methods for this operation. Later on we'll discuss an example.
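To make the "left join plus fill" description concrete, here is a minimal pure-Python sketch that does not use Polars at all. It builds a regular daily date range, left-joins a sparse set of observations onto it, and fills the blanks by carrying the last value forward. The helper name `upsample_forward_fill` and the forward-fill strategy are assumptions for illustration, not the Polars wrapper methods.

```python
from datetime import date, timedelta


def upsample_forward_fill(observations, start, end, step=timedelta(days=1)):
    """Hypothetical helper: left-join a regular date range onto `observations`
    (a dict mapping date -> value) and forward-fill the gaps."""
    # 1. Build the regular date range; it is the left side of the join.
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += step

    # 2. Left join: every date in the range is kept, missing dates get None.
    joined = [(d, observations.get(d)) for d in dates]

    # 3. Fill the blanks by carrying the last seen value forward.
    filled = []
    last = None
    for d, value in joined:
        if value is not None:
            last = value
        filled.append((d, last))
    return filled


obs = {date(2021, 1, 1): 10, date(2021, 1, 4): 40}
for d, v in upsample_forward_fill(obs, date(2021, 1, 1), date(2021, 1, 5)):
    print(d, v)
```

Here 2021-01-02 and 2021-01-03 receive the value from 2021-01-01. A different fill strategy (a constant, interpolation) would only change step 3.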
Downsampling
Downsampling is more interesting: here you deal with date intervals, window durations, aggregations, etc.
Polars views downsampling as a special case of the groupby operation and therefore has two extra entrances in the expression API within the groupby context:
- groupby_dynamic
- groupby_rolling
Calling either of these functions gives you complete access to the expression API and its performance!
Let's go through some examples and see what that means.
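The idea that downsampling is a groupby can be shown without Polars: compute a group key by truncating each date to the first day of its month, then aggregate each group. The sketch below is a hypothetical illustration in plain Python (the name `downsample_monthly_sum` is made up) and says nothing about how Polars implements it.

```python
from collections import defaultdict
from datetime import date, timedelta


def downsample_monthly_sum(rows):
    """Hypothetical helper: group (date, value) rows by calendar month and sum each group."""
    groups = defaultdict(int)
    for day, value in rows:
        # The group key is the first day of the month the row falls into.
        key = day.replace(day=1)
        groups[key] += value
    # Return the groups in chronological order.
    return sorted(groups.items())


# One row per day for January and February 2021, each with value 1,
# so every monthly sum is simply the number of days in that month.
rows = [(date(2021, 1, 1) + timedelta(days=i), 1) for i in range(59)]
print(downsample_monthly_sum(rows))
# [(datetime.date(2021, 1, 1), 31), (datetime.date(2021, 2, 1), 28)]
```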
Groupby Dynamic
In the snippet below we create a date range with every day ("1d") in 2021 and turn this into a DataFrame.
Then we create dynamic windows that start every month ("1mo") and have a window length of 1 month. Dynamic windows don't have a size that's fixed by the number of rows in a DataFrame; instead, they are fixed by a temporal unit. This can be a day ("1d"), 3 weeks ("3w") or 5 nanoseconds ("5ns") ... you get the idea.
Values that fall within a dynamic window are assigned to that window's group and can be aggregated with the powerful expression API.
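One way to read these duration strings is as a number followed by a unit. The hypothetical parser below is not Polars' parser, and its unit table is an assumption covering the units used in this section plus a few obvious ones. It makes one point explicit: "1d" or "5ns" map to a fixed number of nanoseconds, while "1mo" does not, because months differ in length.

```python
import re

# Nanoseconds per fixed-length unit. This table is an illustrative assumption;
# calendar units such as "mo" are deliberately left out.
NS_PER_UNIT = {
    "ns": 1,
    "us": 1_000,
    "ms": 1_000_000,
    "s": 1_000_000_000,
    "m": 60 * 1_000_000_000,
    "h": 3_600 * 1_000_000_000,
    "d": 86_400 * 1_000_000_000,
    "w": 7 * 86_400 * 1_000_000_000,
}


def parse_fixed_duration(text):
    """Hypothetical helper: convert "3w" or "1h30m" to a number of nanoseconds."""
    parts = re.findall(r"(\d+)([a-z]+)", text)
    # Reject input containing anything besides number/unit pairs, e.g. "1d!" or "abc".
    if not parts or "".join(n + u for n, u in parts) != text:
        raise ValueError(f"not a duration string: {text!r}")
    total = 0
    for number, unit in parts:
        if unit not in NS_PER_UNIT:
            # A month has no fixed length (January: 31 days, February 2021: 28).
            raise ValueError(f"unknown or calendar-dependent unit {unit!r}")
        total += int(number) * NS_PER_UNIT[unit]
    return total


print(parse_fixed_duration("1d"))   # 86400000000000
print(parse_fixed_duration("30m"))  # 1800000000000
```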
Below we show an example where we use groupby_dynamic to compute:
- the number of days until the end of the month (for the first three days of each month)
- the number of days in a month
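import polars as pl
from datetime import datetime

# one row per day in 2021
df = pl.date_range(low=datetime(2021, 1, 1), high=datetime(2021, 12, 31), interval="1d", name="time").to_frame()
out = (
    # windows that start every month and last one month
    df.groupby_dynamic("time", every="1mo", period="1mo", closed="left")
    .agg(
        [
            # 0-based row count within the month, reversed: days left until month end; keep the first 3
            pl.col("time").cumcount().reverse().head(3).alias("day/eom"),
            # days between the last and first day of the window, plus one
            ((pl.col("time") - pl.col("time").first()).last().dt.days() + 1).alias("days_in_month"),
        ]
    )
    # one row per value in the "day/eom" list
    .explode("day/eom")
)
print(out)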
shape: (36, 3)
┌─────────────────────┬─────────┬───────────────┐
│ time ┆ day/eom ┆ days_in_month │
│ --- ┆ --- ┆ --- │
│ datetime[ns] ┆ u32 ┆ i64 │
╞═════════════════════╪═════════╪═══════════════╡
│ 2021-01-01 00:00:00 ┆ 30 ┆ 31 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-01-01 00:00:00 ┆ 29 ┆ 31 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-01-01 00:00:00 ┆ 28 ┆ 31 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-02-01 00:00:00 ┆ 27 ┆ 28 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ ... ┆ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-11-01 00:00:00 ┆ 27 ┆ 30 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-01 00:00:00 ┆ 30 ┆ 31 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-01 00:00:00 ┆ 29 ┆ 31 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2021-12-01 00:00:00 ┆ 28 ┆ 31 │
└─────────────────────┴─────────┴───────────────┘
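The numbers in this table can be cross-checked with Python's standard `calendar` module. The helper below (its name is made up for this sketch) recomputes the first three "days until end of month" values and the length of the month; for 2021 it gives 30, 29, 28 and 31 for January and 27, 26, 25 and 28 for February, consistent with the rows shown.

```python
import calendar


def days_to_eom(year, month, n=3):
    """Hypothetical helper: days left until month end for days 1..n, and month length."""
    days_in_month = calendar.monthrange(year, month)[1]
    # Day d (1-based) of the month has days_in_month - d days left after it.
    return [days_in_month - d for d in range(1, n + 1)], days_in_month


for month in (1, 2, 11, 12):
    print(month, days_to_eom(2021, month))
```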
A dynamic window is defined by:
- every: the interval at which a new window starts
- period: the duration (length) of each window
- offset: can be used to shift the start of the windows
Because every does not have to equal period, we can create groups in a very flexible way: windows may overlap or leave gaps between them.
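To see how the three parameters interact, the sketch below lists window boundaries on a simplified axis where time is an integer number of days. It assumes half-open windows `[start, start + period)` that begin at `offset` and then every `every` days; the function name and the integer axis are simplifications for illustration, not Polars internals.

```python
def window_bounds(every, period, offset=0, n_windows=3):
    """Hypothetical helper: (start, end) of the first windows [start, end), in days."""
    return [
        (offset + i * every, offset + i * every + period)
        for i in range(n_windows)
    ]


print(window_bounds(every=1, period=1))            # [(0, 1), (1, 2), (2, 3)]  adjacent
print(window_bounds(every=1, period=2))            # [(0, 2), (1, 3), (2, 4)]  overlapping
print(window_bounds(every=2, period=1))            # [(0, 1), (2, 3), (4, 5)]  gaps
print(window_bounds(every=1, period=1, offset=1))  # [(1, 2), (2, 3), (3, 4)]  shifted
```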
Let's see how the windows for some parameter combinations would look. Let's start out boring. 🥱
- every: 1 day -> "1d"
- period: 1 day -> "1d"
this creates adjacent windows of the same size
|--|
   |--|
      |--|
- every: 1 day -> "1d"
- period: 2 days -> "2d"
these windows have an overlap of 1 day
|-----|
   |-----|
      |-----|
- every: 2 days -> "2d"
- period: 1 day -> "1d"
this leaves gaps between the windows; data points that fall in these gaps will not be a member of any group
|--|
      |--|
            |--|
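Which windows a data point lands in follows directly from these configurations. Using integer days and half-open windows `[start, start + period)` starting at day 0 (assumptions for illustration; `memberships` is a hypothetical helper), the sketch below counts how many windows contain each of the days 0 to 5: with overlap most days land in two windows, and with gaps every other day lands in none.

```python
def memberships(days, every, period):
    """Hypothetical helper: for each day, the starts of the windows that contain it."""
    # Windows start at day 0 and then every `every` days, up to the last day.
    starts = range(0, max(days) + 1, every)
    return {d: [s for s in starts if s <= d < s + period] for d in days}


days = range(6)
for every, period in [(1, 1), (1, 2), (2, 1)]:
    counts = [len(found) for found in memberships(days, every, period).values()]
    print(f"every={every}d period={period}d ->", counts)
# every=1d period=1d -> [1, 1, 1, 1, 1, 1]
# every=1d period=2d -> [1, 2, 2, 2, 2, 2]
# every=2d period=1d -> [1, 0, 1, 0, 1, 0]
```

In the overlapping case day 0 falls into a single window only because no window starts before day 0 in this sketch.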
Rolling GroupBy
The rolling groupby is another entrance to the groupby context. But unlike groupby_dynamic, its windows are not fixed by the parameters every and period. In a rolling groupby the windows are not fixed at all! They are determined by the values in the index_column.
So imagine having a time column with the values {2021-01-01, 2021-01-05} and a period="5d"; this would create the following windows:
2021-01-01     2021-01-06
|--------------|
            2021-01-05     2021-01-10
            |--------------|
Because the windows of a rolling groupby are always determined by the values in the DataFrame column, the number of groups is always equal to the number of rows in the original DataFrame.
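A rolling groupby can be sketched in the same spirit: each row opens its own window, so there is exactly one group per row. The sketch assumes windows `[t, t + period)` as drawn in the diagram above and adds a third, made-up timestamp; the helper name is hypothetical, and this is not a statement about where Polars places its rolling windows by default.

```python
from datetime import date, timedelta


def rolling_windows(times, period):
    """Hypothetical helper: one window [t, t + period) per row, with its member rows."""
    windows = []
    for t in times:
        members = [u for u in times if t <= u < t + period]
        windows.append((t, t + period, members))
    return windows


times = [date(2021, 1, 1), date(2021, 1, 5), date(2021, 1, 20)]
windows = rolling_windows(times, timedelta(days=5))
for start, end, members in windows:
    print(start, end, len(members))
# 2021-01-01 2021-01-06 2
# 2021-01-05 2021-01-10 1
# 2021-01-20 2021-01-25 1
```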
Combining Groupby and Dynamic / Rolling
Rolling and dynamic groupby operations can be combined with normal groupby operations.
Below is an example with a dynamic groupby, where the by argument computes the windows separately for each group.
import polars as pl
from datetime import datetime

df = pl.DataFrame(
    {
        # half-hourly timestamps from 00:00 to 03:00 (7 values)
        "time": pl.date_range(
            low=datetime(2021, 12, 16),
            high=datetime(2021, 12, 16, 3),
            interval="30m",
        ),
        "groups": ["a", "a", "a", "b", "b", "a", "a"],
    }
)
print(df)
shape: (7, 2)
┌─────────────────────┬────────┐
│ time ┆ groups │
│ --- ┆ --- │
│ datetime[ns] ┆ str │
╞═════════════════════╪════════╡
│ 2021-12-16 00:00:00 ┆ a │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 00:30:00 ┆ a │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:00:00 ┆ a │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 01:30:00 ┆ b │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:00:00 ┆ b │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 02:30:00 ┆ a │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ 2021-12-16 03:00:00 ┆ a │
└─────────────────────┴────────┘
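out = df.groupby_dynamic(
    "time",
    every="1h",
    # rows exactly on either window boundary belong to the window
    closed="both",
    # compute the windows separately for each value of "groups"
    by="groups",
    # add the _lower_boundary and _upper_boundary columns
    include_boundaries=True,
).agg([pl.count()])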
print(out)
shape: (7, 5)
┌────────┬─────────────────────┬─────────────────────┬─────────────────────┬───────┐
│ groups ┆ _lower_boundary ┆ _upper_boundary ┆ time ┆ count │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ datetime[ns] ┆ datetime[ns] ┆ datetime[ns] ┆ u32 │
╞════════╪═════════════════════╪═════════════════════╪═════════════════════╪═══════╡
│ a ┆ 2021-12-15 23:00:00 ┆ 2021-12-16 00:00:00 ┆ 2021-12-16 00:00:00 ┆ 1 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ a ┆ 2021-12-16 00:00:00 ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 00:00:00 ┆ 3 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ a ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 1 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ a ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 2 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ a ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 04:00:00 ┆ 2021-12-16 03:00:00 ┆ 1 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ b ┆ 2021-12-16 01:00:00 ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 01:00:00 ┆ 2 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ b ┆ 2021-12-16 02:00:00 ┆ 2021-12-16 03:00:00 ┆ 2021-12-16 02:00:00 ┆ 1 │
└────────┴─────────────────────┴─────────────────────┴─────────────────────┴───────┘
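The counts in this output can be reproduced by hand. The plain-Python sketch below rebuilds the example data, uses hourly windows closed on both sides, and counts the rows per group and window. The lower boundaries are copied from the table rather than derived, since Polars' rule for placing the first window is not reproduced here; empty windows never appear because only windows that receive at least one row are counted.

```python
from collections import Counter
from datetime import datetime, timedelta

# The example data: half-hourly timestamps, each with a group label.
start = datetime(2021, 12, 16)
times = [start + timedelta(minutes=30 * i) for i in range(7)]
groups = ["a", "a", "a", "b", "b", "a", "a"]

# Assumption: hourly windows whose lower boundaries are taken from the output
# table (23:00 on the previous day up to 03:00).
every = timedelta(hours=1)
lower_bounds = [datetime(2021, 12, 15, 23) + h * every for h in range(5)]

counts = Counter()
for t, g in zip(times, groups):
    for lower in lower_bounds:
        # closed="both": a row on either boundary belongs to the window.
        if lower <= t <= lower + every:
            counts[(g, lower)] += 1

for (g, lower), n in sorted(counts.items()):
    print(g, lower, lower + every, n)
```

Note that the row at 01:00 in group a is counted twice: with closed="both", a timestamp on a boundary belongs to both adjacent windows, which is why the counts add up to more than the 7 input rows.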
Upsample
This content is under construction.