Interface GroupBy

Starts a new GroupBy operation.

interface GroupBy {
    [inspect](): string;
    agg(...columns: pl.Expr[]): pl.DataFrame<any>;
    agg(columns: Record<string, (keyof Expr) | (keyof Expr)[]>): pl.DataFrame<any>;
    aggList(): pl.DataFrame<any>;
    count(): pl.DataFrame<any>;
    first(): pl.DataFrame<any>;
    groups(): pl.DataFrame<any>;
    head(n?: number): pl.DataFrame<any>;
    last(): pl.DataFrame<any>;
    len(): pl.DataFrame<any>;
    max(): pl.DataFrame<any>;
    mean(): pl.DataFrame<any>;
    median(): pl.DataFrame<any>;
    min(): pl.DataFrame<any>;
    nUnique(): pl.DataFrame<any>;
    pivot(__namedParameters: {
        pivotCol: string;
        valuesCol: string;
    }): PivotOps;
    pivot(pivotCol: string, valuesCol: string): PivotOps;
    quantile(quantile: number): pl.DataFrame<any>;
    sum(): pl.DataFrame<any>;
    tail(n?: number): pl.DataFrame<any>;
    toString(): string;
}

Methods

  • Apply multiple aggregations to columns. This can be combined with the complete lazy API and is considered idiomatic polars.


    Parameters

    • Rest ...columns: pl.Expr[]

      Aggregations to apply, given either as expressions or as a map of 'col' -> 'agg':

      • using lazy API (recommended): [col('foo').sum(), col('bar').min()]
      • using multiple aggs per column: {'foo': ['sum', 'nUnique'], 'bar': ['min'] }
      • using single agg per column: {'foo': ['sum'], 'bar': 'min' }

    Returns pl.DataFrame<any>

    // use lazy api rest parameter style
    > df.groupBy('foo', 'bar')
    > .agg(pl.sum('ham'), col('spam').tail(4).sum())

    // use lazy api array style
    > df.groupBy('foo', 'bar')
    > .agg([pl.sum('ham'), col('spam').tail(4).sum()])

    // use a mapping
    > df.groupBy('foo', 'bar')
    > .agg({'spam': ['sum', 'min']})
  • Parameters

    • columns: Record<string, (keyof Expr) | (keyof Expr)[]>

    Returns pl.DataFrame<any>
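
The mapping form accepts either a single aggregation name or a list of names per column. The plain-TypeScript sketch below shows one way to read such a mapping: each (column, aggregation) pair corresponds conceptually to an expression such as col('foo').sum(). The names `AggSpec` and `expandAggSpec` are hypothetical and are not part of nodejs-polars; this is an illustration of the two accepted shapes, not the library's implementation.

```typescript
// Hypothetical type mirroring the shape of the mapping overload:
// column name -> one aggregation name or several.
type AggSpec = Record<string, string | string[]>;

// Hypothetical helper (not a nodejs-polars function): flattens the mapping
// into explicit [column, aggregation] pairs.
function expandAggSpec(spec: AggSpec): Array<[string, string]> {
  const pairs: Array<[string, string]> = [];
  for (const [column, aggs] of Object.entries(spec)) {
    // A bare string means a single aggregation; wrap it so both shapes
    // go through the same loop.
    const names = Array.isArray(aggs) ? aggs : [aggs];
    for (const name of names) {
      pairs.push([column, name]);
    }
  }
  return pairs;
}

// 'foo' uses the list form, 'bar' uses the single-string form.
const pairs = expandAggSpec({ foo: ["sum", "min"], bar: "min" });
console.log(pairs);
```

Under this reading, {'bar': 'min'} and {'bar': ['min']} request the same aggregation.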

  • Return first n rows of each group.

    Parameters

    • Optional n: number

      Number of rows to select from each group

    Returns pl.DataFrame<any>

    > df = pl.DataFrame({
    > "letters": ["c", "c", "a", "c", "a", "b"],
    > "nrs": [1, 2, 3, 4, 5, 6]
    > })
    > df
    shape: (6, 2)
    ╭─────────┬─────╮
    │ letters ┆ nrs │
    │ ---     ┆ --- │
    │ str     ┆ i64 │
    ╞═════════╪═════╡
    │ "c"     ┆ 1   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "c"     ┆ 2   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "a"     ┆ 3   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "c"     ┆ 4   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "a"     ┆ 5   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "b"     ┆ 6   │
    ╰─────────┴─────╯
    > df.groupBy("letters")
    > .head(2)
    > .sort("letters");
    shape: (5, 2)
    ╭─────────┬─────╮
    │ letters ┆ nrs │
    │ ---     ┆ --- │
    │ str     ┆ i64 │
    ╞═════════╪═════╡
    │ "a"     ┆ 3   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "a"     ┆ 5   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "b"     ┆ 6   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "c"     ┆ 1   │
    ├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
    │ "c"     ┆ 2   │
    ╰─────────┴─────╯
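
To make the semantics of the example concrete, here is a plain-TypeScript sketch that reproduces the same result without polars. The `Row` type and the `headPerGroup` function are hypothetical names introduced for illustration; the sketch only models "keep the first n rows per key, then sort by key" and makes no claim about how the library does it internally.

```typescript
type Row = { letters: string; nrs: number };

// Hypothetical helper (not a nodejs-polars function): keeps the first n rows
// of each group, then returns the groups ordered by key.
function headPerGroup(rows: Row[], n: number): Row[] {
  const groups = new Map<string, Row[]>();
  for (const row of rows) {
    const bucket = groups.get(row.letters) ?? [];
    // Rows beyond the first n for a given key are dropped.
    if (bucket.length < n) bucket.push(row);
    groups.set(row.letters, bucket);
  }
  // Sorting the keys mirrors the .sort("letters") call in the example.
  const out: Row[] = [];
  for (const key of Array.from(groups.keys()).sort()) {
    out.push(...(groups.get(key) ?? []));
  }
  return out;
}

// Same data as the example above.
const rows: Row[] = [
  { letters: "c", nrs: 1 },
  { letters: "c", nrs: 2 },
  { letters: "a", nrs: 3 },
  { letters: "c", nrs: 4 },
  { letters: "a", nrs: 5 },
  { letters: "b", nrs: 6 },
];

console.log(headPerGroup(rows, 2));
```

Group "c" has three rows, so its third row (nrs 4) is dropped, which is why the result has five rows rather than six.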
  • Do a pivot operation based on the group key, a pivot column and an aggregation function on the values column.

    Parameters

    • __namedParameters: {
          pivotCol: string;
          valuesCol: string;
      }
      • pivotCol: string
      • valuesCol: string

    Returns PivotOps

  • Parameters

    • pivotCol: string
    • valuesCol: string

    Returns PivotOps
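
The description above involves three roles: the group key, the pivot column, and the values column. The plain-TypeScript sketch below models how they interact. Each distinct value of the pivot column becomes an output column, and the values that land in the same (group, pivot value) cell are combined by an aggregation. The names `InputRow` and `pivotSum` are hypothetical, and the choice of a sum is an assumption made for illustration; in the real API the aggregation is presumably selected on the returned PivotOps.

```typescript
// Hypothetical row shape: one field per role named in the description.
type InputRow = { group: string; pivotCol: string; valuesCol: number };

// Hypothetical helper (not a nodejs-polars function): pivots long rows into
// one wide record per group key, summing values that share a cell.
function pivotSum(rows: InputRow[]): Record<string, Record<string, number>> {
  const out: Record<string, Record<string, number>> = {};
  for (const row of rows) {
    const wide = out[row.group] ?? {};
    // Each distinct pivotCol value becomes a column of the wide record.
    wide[row.pivotCol] = (wide[row.pivotCol] ?? 0) + row.valuesCol;
    out[row.group] = wide;
  }
  return out;
}

const wideTable = pivotSum([
  { group: "a", pivotCol: "x", valuesCol: 1 },
  { group: "a", pivotCol: "y", valuesCol: 2 },
  { group: "a", pivotCol: "x", valuesCol: 3 },
  { group: "b", pivotCol: "x", valuesCol: 4 },
]);

console.log(wideTable);
```

In this sketch, group "b" never sees the pivot value "y", so that cell is simply absent; how a real pivot fills missing cells is not covered here.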