Interface GroupByOps<T>

GroupBy operations that can be applied to a DataFrame or LazyFrame.

interface GroupByOps<T> {
    groupByDynamic(
        options: {
            by?: ColumnsOrExpr;
            closed?: "none" | "left" | "right" | "both";
            every: string;
            includeBoundaries?: boolean;
            indexColumn: string;
            label?: string;
            offset?: string;
            period?: string;
            startBy?: StartBy;
        },
    ): T;
    groupByRolling(
        opts: {
            by?: ColumnsOrExpr;
            closed?: "none" | "left" | "right" | "both";
            indexColumn: ColumnsOrExpr;
            offset?: string;
            period: string;
        },
    ): T;
}

Type Parameters

Hierarchy

GroupByOps
- pl.DataFrame
- LazyDataFrame

Index

Methods

groupByDynamic groupByRolling

Methods

groupByDynamic

groupByDynamic(
    options: {
        by?: ColumnsOrExpr;
        closed?: "none" | "left" | "right" | "both";
        every: string;
        includeBoundaries?: boolean;
        indexColumn: string;
        label?: string;
        offset?: string;
        period?: string;
        startBy?: StartBy;
    },
): T
Groups rows based on a time value (or an index value of type Int32 or Int64). Time windows are calculated and rows are assigned to them. Unlike a normal groupby, a row can be a member of multiple groups. The time/index window can be seen as a rolling window whose size is determined by dates/times/values instead of by slots in the DataFrame.

A window is defined by:
- every: interval of the window
- period: length of the window
- offset: offset of the window
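
To make the interplay of every, period and offset concrete, here is a minimal sketch on plain integers. It is a simplified model, not the library's implementation: it assumes windows start at `offset + k * every`, span `[start, start + period)`, and it ignores `startBy` and `closed`. The helper name `windowsContaining` is hypothetical.

```typescript
// Simplified model (an assumption, not polars internals):
// window k starts at offset + k * every and covers [start, start + period).
function windowsContaining(
  value: number,
  every: number,
  period: number,
  offset = 0,
): Array<[number, number]> {
  const result: Array<[number, number]> = [];
  // Largest k whose window start is <= value.
  const kMax = Math.floor((value - offset) / every);
  for (let k = kMax; ; k--) {
    const start = offset + k * every;
    // Stop once the window ends at or before the value.
    if (start + period <= value) break;
    result.unshift([start, start + period]);
  }
  return result;
}

// period > every: windows overlap, so one row belongs to several groups.
const overlapping = windowsContaining(7, 5, 10);
// period === every: windows tile the axis, so each row belongs to exactly one group.
const tiled = windowsContaining(7, 5, 5);
```

In this model `overlapping` holds the two windows `[0, 10)` and `[5, 15)`, while `tiled` holds only `[5, 10)`.
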
The every, period and offset arguments are created with the following string language:
- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 day)
- 1w (1 week)
- 1mo (1 calendar month)
- 1y (1 calendar year)
- 1i (1 index count)
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
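
As an illustration of the string language above, the following sketch splits a duration string into its parts. It is not how the library parses these strings; `parseDuration`, `DurationUnit` and `DurationPart` are hypothetical names, and the sketch does not handle a leading minus sign.

```typescript
type DurationUnit = "ns" | "us" | "ms" | "s" | "m" | "h" | "d" | "w" | "mo" | "y" | "i";

interface DurationPart {
  value: number;
  unit: DurationUnit;
}

function parseDuration(text: string): DurationPart[] {
  // Two-letter units are listed first so "ms" and "mo" are not read as "m".
  const token = /(\d+)(ns|us|ms|mo|s|m|h|d|w|y|i)/y;
  const parts: DurationPart[] = [];
  let pos = 0;
  while (pos < text.length) {
    token.lastIndex = pos; // sticky flag: match must start exactly at pos
    const match = token.exec(text);
    if (match === null) {
      throw new Error(`Invalid duration "${text}" at position ${pos}`);
    }
    parts.push({ value: Number(match[1]), unit: match[2] as DurationUnit });
    pos = token.lastIndex;
  }
  if (parts.length === 0) throw new Error("Empty duration string");
  return parts;
}

// Yields 3 d, 12 h, 4 m, 25 s, in that order.
const combined = parseDuration("3d12h4m25s");
```
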

In case of a groupByDynamic on an integer column, the windows are defined by:
- "1i" # length 1
- "10i" # length 10
Parameters
- options: {
      by?: ColumnsOrExpr;
      closed?: "none" | "left" | "right" | "both";
      every: string;
      includeBoundaries?: boolean;
      indexColumn: string;
      label?: string;
      offset?: string;
      period?: string;
      startBy?: StartBy;
  }
  - Optional by?: ColumnsOrExpr
    Also group by this column/these columns.
  - Optional closed?: "none" | "left" | "right" | "both"
    Defines whether the window interval is closed or not. Any of {"left", "right", "both", "none"}.
  - every: string
    Interval of the window.
  - Optional includeBoundaries?: boolean
    Add the lower and upper bound of the window to the "_lower_bound" and "_upper_bound" columns. This impacts performance because it is harder to parallelize.
  - indexColumn: string
    Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; if it is not, the output will not make sense.

    In case of a dynamic groupby on indices, the dtype needs to be one of {Int32, Int64}. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.
  - Optional label?: string
    Defines which label to use for the window. Any of {'left', 'right', 'datapoint'}.
  - Optional offset?: string
    Offset of the window. If it is not set and period is not set either, it equals negative every.
  - Optional period?: string
    Length of the window. If it is not set, it equals every.
  - Optional startBy?: StartBy
    The strategy used to determine the start of the first window. Any of {'window', 'datapoint', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'}.
Returns T
- Defined in polars/shared_traits.ts:1461
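
The effect of the `closed` option can be sketched as a membership test on a single window. This is an illustrative helper under the usual interval conventions, not library code; `inWindow` and the `Closed` alias are hypothetical names.

```typescript
type Closed = "none" | "left" | "right" | "both";

// Returns true when x falls inside the window [lower, upper] under the given closure.
function inWindow(x: number, lower: number, upper: number, closed: Closed): boolean {
  const aboveLower = closed === "left" || closed === "both" ? x >= lower : x > lower;
  const belowUpper = closed === "right" || closed === "both" ? x <= upper : x < upper;
  return aboveLower && belowUpper;
}

// A value sitting exactly on the lower bound is only included
// when the left side of the interval is closed.
const onLowerLeft = inWindow(0, 0, 10, "left");   // true
const onLowerRight = inWindow(0, 0, 10, "right"); // false
```
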

groupByRolling

groupByRolling(
    opts: {
        by?: ColumnsOrExpr;
        closed?: "none" | "left" | "right" | "both";
        indexColumn: ColumnsOrExpr;
        offset?: string;
        period: string;
    },
): T

Create rolling groups based on a time column (or index value of type Int32, Int64).

Unlike groupByDynamic, the windows are determined by the individual values and are not of constant intervals. For constant intervals, use groupByDynamic.

The period and offset arguments are created with the following string language:

- 1ns (1 nanosecond)
- 1us (1 microsecond)
- 1ms (1 millisecond)
- 1s (1 second)
- 1m (1 minute)
- 1h (1 hour)
- 1d (1 day)
- 1w (1 week)
- 1mo (1 calendar month)
- 1y (1 calendar year)
- 1i (1 index count)

Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds

In case of a groupByRolling on an integer column, the windows are defined by:

- "1i" # length 1
- "10i" # length 10

Parameters

opts: {
    by?: ColumnsOrExpr;
    closed?: "none" | "left" | "right" | "both";
    indexColumn: ColumnsOrExpr;
    offset?: string;
    period: string;
}
- Optional by?: ColumnsOrExpr
  Also group by this column/these columns.
- Optional closed?: "none" | "left" | "right" | "both"
  Defines whether the window interval is closed or not. Any of {"left", "right", "both", "none"}.
- indexColumn: ColumnsOrExpr
  Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; if it is not, the output will not make sense.

  In case of a rolling groupby on indices, the dtype needs to be one of {Int32, Int64}. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.
- Optional offset?: string
  Offset of the window. Default is -period.
- period: string
  Length of the window.

Returns T

Example


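> const pl = require("nodejs-polars");
> const dates = [
...     "2020-01-01 13:45:48",
...     "2020-01-01 16:42:13",
...     "2020-01-01 16:45:09",
...     "2020-01-02 18:12:48",
...     "2020-01-03 19:45:32",
...     "2020-01-08 23:16:43",
... ];
> // Parse the string column into datetimes so it can serve as the index column.
> const df = pl.DataFrame({ dt: dates, a: [3, 7, 5, 9, 2, 1] }).withColumn(
...     pl.col("dt").str.strptime(pl.Datetime)
... );
> // For each row, aggregate "a" over the 2 days leading up to that row's "dt".
> const out = df.groupByRolling({ indexColumn: "dt", period: "2d" }).agg(
...     [
...         pl.col("a").sum().alias("sum_a"),
...         pl.col("a").max().alias("max_a"),
...         pl.col("a").min().alias("min_a"),
...     ]
... );
> // Arrays must be compared element-wise; === on two arrays is always false.
> assert.deepStrictEqual(out.getColumn("sum_a").toArray(), [3, 10, 15, 24, 11, 1]);
> assert.deepStrictEqual(out.getColumn("max_a").toArray(), [3, 7, 7, 9, 9, 1]);
> assert.deepStrictEqual(out.getColumn("min_a").toArray(), [3, 3, 3, 3, 2, 1]);
> out
shape: (6, 4)
┌─────────────────────┬───────┬───────┬───────┐
│ dt                  ┆ sum_a ┆ max_a ┆ min_a │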
│ ---                 ┆ ---   ┆ ---   ┆ ---   │
│ datetime[ms]        ┆ i64   ┆ i64   ┆ i64   │
╞═════════════════════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3     ┆ 3     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:42:13 ┆ 10    ┆ 7     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:45:09 ┆ 15    ┆ 7     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-02 18:12:48 ┆ 24    ┆ 9     ┆ 3     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-03 19:45:32 ┆ 11    ┆ 9     ┆ 2     │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-08 23:16:43 ┆ 1     ┆ 1     ┆ 1     │
└─────────────────────┴───────┴───────┴───────┘
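
The sums in the table above can be checked by hand with a small library-free sketch. It assumes each row's window is `(t - period, t]`, meaning an offset of -period with the window closed on the right; the closure is an assumption, since this page does not state the default for `closed`. The helper `rollingSums` is a hypothetical name, and the input is assumed to be sorted in ascending order.

```typescript
// For each row i, sum the values of all rows j <= i whose time lies in (t_i - period, t_i].
function rollingSums(times: number[], values: number[], period: number): number[] {
  return times.map((t, i) => {
    let sum = 0;
    // Walk backwards while earlier rows are still inside the window.
    for (let j = i; j >= 0 && times[j] > t - period; j--) {
      sum += values[j];
    }
    return sum;
  });
}

const exampleDates = [
  "2020-01-01 13:45:48",
  "2020-01-01 16:42:13",
  "2020-01-01 16:45:09",
  "2020-01-02 18:12:48",
  "2020-01-03 19:45:32",
  "2020-01-08 23:16:43",
];
// Convert to epoch milliseconds (interpreted as UTC for this sketch).
const exampleTimes = exampleDates.map((d) => Date.parse(d.replace(" ", "T") + "Z"));
const twoDaysMs = 2 * 24 * 60 * 60 * 1000;

const sums = rollingSums(exampleTimes, [3, 7, 5, 9, 2, 1], twoDaysMs);
// sums is [3, 10, 15, 24, 11, 1], matching the sum_a column.
```
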
