nodejs-polars

    Interface GroupByOps<T>

    GroupBy operations that can be applied to a DataFrame or LazyFrame.

    interface GroupByOps<T> {
        groupByDynamic(
            options: {
                by?: ColumnsOrExpr;
                closed?: "none" | "left" | "right" | "both";
                every: string;
                includeBoundaries?: boolean;
                indexColumn: string;
                label?: string;
                offset?: string;
                period?: string;
                startBy?: StartBy;
            },
        ): T;
        groupByRolling(
            opts: {
                by?: ColumnsOrExpr;
                closed?: "none" | "left" | "right" | "both";
                indexColumn: ColumnsOrExpr;
                offset?: string;
                period: string;
            },
        ): T;
    }

    Type Parameters

    • T

    Methods

    • groupByDynamic: Groups rows based on a time value (or an index value of type Int32 or Int64). Time windows are calculated and rows are assigned to them. Unlike a normal groupBy, a row can be a member of multiple groups. The time/index window can be seen as a rolling window whose size is determined by dates, times, or values rather than by a number of rows in the DataFrame.

      A window is defined by:

      • every: interval of the window
      • period: length of the window
      • offset: offset of the window

      The every, period and offset arguments are written in the following string language:

      • 1ns (1 nanosecond)
      • 1us (1 microsecond)
      • 1ms (1 millisecond)
      • 1s (1 second)
      • 1m (1 minute)
      • 1h (1 hour)
      • 1d (1 day)
      • 1w (1 week)
      • 1mo (1 calendar month)
      • 1y (1 calendar year)
      • 1i (1 index count)

      Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
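
      To make the string language above concrete, here is a minimal sketch of a parser that splits a duration string into (count, unit) pairs. `parseDuration` is a hypothetical helper written for illustration, not part of nodejs-polars, and accepting a leading "-" for negative offsets is an assumption. Units are not converted to a fixed length, because "mo" and "y" are calendar units.

```typescript
// Units from the list above, longest first so "ms" and "mo" win over "m".
const UNITS = ["ns", "us", "ms", "mo", "s", "m", "h", "d", "w", "y", "i"] as const;
type Unit = (typeof UNITS)[number];

interface DurationPart {
  count: number;
  unit: Unit;
}

interface ParsedDuration {
  negative: boolean;
  parts: DurationPart[];
}

// Hypothetical helper (not a nodejs-polars API): split a duration string
// such as "3d12h4m25s" into its (count, unit) components.
function parseDuration(text: string): ParsedDuration {
  // Assumption: a leading "-" marks a negative duration (e.g. an offset).
  const negative = text.startsWith("-");
  const body = negative ? text.slice(1) : text;
  // Sticky regex: each token must start exactly where the previous one ended.
  const token = new RegExp(`(\\d+)(${UNITS.join("|")})`, "y");
  const parts: DurationPart[] = [];
  let pos = 0;
  while (pos < body.length) {
    token.lastIndex = pos;
    const m = token.exec(body);
    if (m === null) {
      throw new Error(`invalid duration at position ${pos}: "${text}"`);
    }
    parts.push({ count: Number(m[1]), unit: m[2] as Unit });
    pos = token.lastIndex;
  }
  if (parts.length === 0) {
    throw new Error(`empty duration: "${text}"`);
  }
  return { negative, parts };
}

// Usage: yields the parts 3 d, 12 h, 4 m, 25 s.
console.log(parseDuration("3d12h4m25s").parts);
```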

      In case of a groupByDynamic on an integer column, the windows are defined by:

      • "1i" # length 1
      • "10i" # length 10
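
      The following sketch illustrates how every, period and offset interact on an integer index, and why a row can land in more than one group. It is a simplified model, not the library's implementation. `dynamicWindows` is a hypothetical helper, the windows are assumed to be left-closed [lower, upper), and the first window is assumed to start at the smallest index value truncated down to a multiple of every and then shifted by offset. Options such as startBy and label are ignored.

```typescript
interface IndexWindow {
  lower: number;
  upper: number;
  rows: number[];
}

// Hypothetical helper (not a nodejs-polars API): assign integer index values
// to windows that start every `every` steps and span `period` steps.
function dynamicWindows(
  index: number[],
  every: number,
  period: number,
  offset: number,
): IndexWindow[] {
  if (index.length === 0) return [];
  const first = Math.min(...index);
  const last = Math.max(...index);
  const windows: IndexWindow[] = [];
  // Assumption: truncate the first value to a multiple of `every`, then shift.
  let lower = Math.floor(first / every) * every + offset;
  for (; lower <= last; lower += every) {
    const upper = lower + period;
    // Assumption: left-closed interval [lower, upper).
    const rows = index.filter((v) => v >= lower && v < upper);
    if (rows.length > 0) windows.push({ lower, upper, rows });
  }
  return windows;
}

// With period > every the windows overlap, so 2 and 4 appear twice.
const overlapping = dynamicWindows([0, 1, 2, 3, 4, 5], 2, 3, 0);
console.log(overlapping.map((w) => w.rows)); // [[0, 1, 2], [2, 3, 4], [4, 5]]
```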

      Parameters

      • options: {
            by?: ColumnsOrExpr;
            closed?: "none" | "left" | "right" | "both";
            every: string;
            includeBoundaries?: boolean;
            indexColumn: string;
            label?: string;
            offset?: string;
            period?: string;
            startBy?: StartBy;
        }
        • Optional by?: ColumnsOrExpr

          Also group by this column or these columns.

        • Optional closed?: "none" | "left" | "right" | "both"

          Defines which sides of the window interval are closed (inclusive). Any of {"left", "right", "both", "none"}.

        • every: string

          interval of the window

        • Optional includeBoundaries?: boolean

          Adds the lower and upper bounds of the window as the "_lower_bound" and "_upper_bound" columns. This will impact performance because it is harder to parallelize.

        • indexColumn: string

          Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; if it is not, the output will not make sense.

          In case of a dynamic groupby on indices, dtype needs to be one of {Int32, Int64}. Note that
          Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.
          
        • Optional label?: string

          Defines which label to use for the window. Any of {'left', 'right', 'datapoint'}.

        • Optional offset?: string

          Offset of the window. If omitted and period is also omitted, it defaults to negative every.

        • Optional period?: string

          Length of the window. If omitted, it is equal to every.

        • Optional startBy?: StartBy

          The strategy to determine the start of the first window by. Any of {'window', 'datapoint', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'}

      Returns T
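
      As an illustration of the closed option, the sketch below tests whether a value falls inside a window under each setting, using the usual interval-notation reading ("left" is [lower, upper), "right" is (lower, upper], and so on). `inWindow` is a hypothetical helper, and treating the option exactly this way is an assumption rather than a statement about the library's internals.

```typescript
type Closed = "none" | "left" | "right" | "both";

// Hypothetical helper (not a nodejs-polars API): is `value` inside the
// window bounded by `lower` and `upper` for the given `closed` setting?
function inWindow(value: number, lower: number, upper: number, closed: Closed): boolean {
  // The lower bound is inclusive for "left" and "both".
  const aboveLower = closed === "left" || closed === "both" ? value >= lower : value > lower;
  // The upper bound is inclusive for "right" and "both".
  const belowUpper = closed === "right" || closed === "both" ? value <= upper : value < upper;
  return aboveLower && belowUpper;
}

console.log(inWindow(0, 0, 10, "left")); // true: lower bound included
console.log(inWindow(10, 0, 10, "left")); // false: upper bound excluded
```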

    • groupByRolling: Creates rolling groups based on a time column (or an index value of type Int32 or Int64).

      Unlike groupByDynamic, the windows are determined by the individual values and are not of constant intervals. For constant intervals, use groupByDynamic.

      The period and offset arguments are written in the following string language:

      • 1ns (1 nanosecond)
      • 1us (1 microsecond)
      • 1ms (1 millisecond)
      • 1s (1 second)
      • 1m (1 minute)
      • 1h (1 hour)
      • 1d (1 day)
      • 1w (1 week)
      • 1mo (1 calendar month)
      • 1y (1 calendar year)
      • 1i (1 index count)

      Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds

      In case of a groupByRolling on an integer column, the windows are defined by:

      • "1i" # length 1
      • "10i" # length 10

      Parameters

      • opts: {
            by?: ColumnsOrExpr;
            closed?: "none" | "left" | "right" | "both";
            indexColumn: ColumnsOrExpr;
            offset?: string;
            period: string;
        }
        • Optional by?: ColumnsOrExpr

          Also group by this column or these columns.

        • Optional closed?: "none" | "left" | "right" | "both"

          Defines which sides of the window interval are closed (inclusive). Any of {"left", "right", "both", "none"}.

        • indexColumn: ColumnsOrExpr

          Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; if it is not, the output will not make sense.

          In case of a rolling groupby on indices, dtype needs to be one of {Int32, Int64}. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.

        • Optional offset?: string

          Offset of the window. Default is -period.

        • period: string

          length of the window

      Returns T


      Example

      >const pl = require("nodejs-polars");
      >const assert = require("node:assert");
      >const dates = [
      ... "2020-01-01 13:45:48",
      ... "2020-01-01 16:42:13",
      ... "2020-01-01 16:45:09",
      ... "2020-01-02 18:12:48",
      ... "2020-01-03 19:45:32",
      ... "2020-01-08 23:16:43",
      ... ];
      >// Parse the string column into datetimes so it can serve as the index column.
      >const df = pl.DataFrame({ dt: dates, a: [3, 7, 5, 9, 2, 1] }).withColumn(
      ... pl.col("dt").str.strptime(pl.Datetime),
      ... );
      >// Aggregate over a 2-day window ending at each row.
      >const out = df.groupByRolling({ indexColumn: "dt", period: "2d" }).agg(
      ... [
      ... pl.col("a").sum().alias("sum_a"),
      ... pl.col("a").min().alias("min_a"),
      ... pl.col("a").max().alias("max_a"),
      ... ],
      ... );
      >// Arrays must be compared by value; === only compares references.
      >assert.deepStrictEqual(out.getColumn("sum_a").toArray(), [3, 10, 15, 24, 11, 1]);
      >assert.deepStrictEqual(out.getColumn("max_a").toArray(), [3, 7, 7, 9, 9, 1]);
      >assert.deepStrictEqual(out.getColumn("min_a").toArray(), [3, 3, 3, 3, 2, 1]);
      >out
      shape: (6, 4)
      ┌─────────────────────┬───────┬───────┬───────┐
      │ dt                  ┆ sum_a ┆ min_a ┆ max_a │
      │ ---                 ┆ ---   ┆ ---   ┆ ---   │
      │ datetime[ms]        ┆ i64   ┆ i64   ┆ i64   │
      ╞═════════════════════╪═══════╪═══════╪═══════╡
      │ 2020-01-01 13:45:48 ┆ 3     ┆ 3     ┆ 3     │
      ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
      │ 2020-01-01 16:42:13 ┆ 10    ┆ 3     ┆ 7     │
      ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
      │ 2020-01-01 16:45:09 ┆ 15    ┆ 3     ┆ 7     │
      ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
      │ 2020-01-02 18:12:48 ┆ 24    ┆ 3     ┆ 9     │
      ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
      │ 2020-01-03 19:45:32 ┆ 11    ┆ 2     ┆ 9     │
      ├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
      │ 2020-01-08 23:16:43 ┆ 1     ┆ 1     ┆ 1     │
      └─────────────────────┴───────┴───────┴───────┘
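
      To show where the numbers in the example come from, the sketch below recomputes the rolling sum, min and max in plain TypeScript, without the library. It assumes that each row's window is the right-closed interval (t - period, t] ending at that row's timestamp t, which is consistent with the asserted results above. `rollingStats` is a hypothetical helper, and the timestamps are treated as UTC for simplicity.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

interface Row {
  dt: string;
  a: number;
}

interface Stats {
  sum: number;
  min: number;
  max: number;
}

// Same data as in the example, written as ISO 8601 timestamps.
const exampleRows: Row[] = [
  { dt: "2020-01-01T13:45:48Z", a: 3 },
  { dt: "2020-01-01T16:42:13Z", a: 7 },
  { dt: "2020-01-01T16:45:09Z", a: 5 },
  { dt: "2020-01-02T18:12:48Z", a: 9 },
  { dt: "2020-01-03T19:45:32Z", a: 2 },
  { dt: "2020-01-08T23:16:43Z", a: 1 },
];

// Hypothetical helper (not a nodejs-polars API): one window per row,
// assumed to be (t - period, t] where t is that row's timestamp.
function rollingStats(rows: Row[], periodMs: number): Stats[] {
  const points = rows.map((r) => ({ t: Date.parse(r.dt), a: r.a }));
  return points.map(({ t: upper }) => {
    const lower = upper - periodMs;
    // The row itself always satisfies t <= upper, so `values` is never empty.
    const values = points.filter((p) => p.t > lower && p.t <= upper).map((p) => p.a);
    return {
      sum: values.reduce((acc, v) => acc + v, 0),
      min: Math.min(...values),
      max: Math.max(...values),
    };
  });
}

const stats = rollingStats(exampleRows, 2 * DAY_MS);
console.log(stats.map((s) => s.sum)); // [3, 10, 15, 24, 11, 1]
```

      Because each window is anchored to a row rather than to a fixed grid, there is exactly one output row per input row, which is the main difference from groupByDynamic.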