Window functions

Polars supports window functions, inspired by PostgreSQL. Pandas users may recognize them as a groupby.transform(aggregation).
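For readers coming from Pandas, the comparison can be made concrete with a minimal sketch. It assumes pandas is installed; the frame name pdf and the column name fruit_sum_A are chosen here for illustration only.

```python
import pandas as pd

pdf = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
    }
)

# transform("sum") aggregates "A" within each "fruits" group and broadcasts
# the result back to every row, so the output keeps the input's length.
pdf["fruit_sum_A"] = pdf.groupby("fruits")["A"].transform("sum")
print(pdf)
```

Each row receives the total of its own group, which is the behaviour a window function provides.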

Polars window functions are more elegant than the Pandas transform: in a single expression context we can apply multiple functions over multiple columns.

import polars as pl

dataset = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)

q = dataset.lazy().with_columns(
    [
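        # sum of "A" within each "fruits" group, stored as a new column
        pl.sum("A").over("fruits").alias("fruit_sum_A"),
        # first value of "B" within each "fruits" group
        pl.first("B").over("fruits").alias("fruit_first_B"),
        # maximum of "B" within each "cars" group
        pl.max("B").over("cars").alias("cars_max_B"),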
    ]
)

df = q.collect()
print(df)
shape: (5, 7)
┌─────┬────────┬─────┬────────┬─────────────┬───────────────┬────────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ fruit_sum_A ┆ fruit_first_B ┆ cars_max_B │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---         ┆ ---           ┆ ---        │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64         ┆ i64           ┆ i64        │
╞═════╪════════╪═════╪════════╪═════════════╪═══════════════╪════════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 8           ┆ 5             ┆ 5          │
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 8           ┆ 5             ┆ 4          │
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ 7           ┆ 3             ┆ 5          │
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 7           ┆ 3             ┆ 5          │
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ 8           ┆ 5             ┆ 5          │
└─────┴────────┴─────┴────────┴─────────────┴───────────────┴────────────┘
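To make the semantics of the fruit_sum_A column explicit, the following is a plain-Python sketch of what a windowed sum computes conceptually: aggregate per group, then broadcast the aggregate back to every row of that group. It is not how Polars implements over internally; the variable names are illustrative.

```python
from collections import defaultdict

A = [1, 2, 3, 4, 5]
fruits = ["banana", "banana", "apple", "apple", "banana"]

# Step 1: aggregate "A" within each "fruits" group.
group_sums = defaultdict(int)
for value, key in zip(A, fruits):
    group_sums[key] += value

# Step 2: broadcast each group's aggregate back to the rows of that group,
# so the result has one value per input row.
fruit_sum_A = [group_sums[key] for key in fruits]
print(fruit_sum_A)  # [8, 8, 7, 7, 8]
```

The result matches the fruit_sum_A column in the table: the row count is preserved, unlike a plain group-by aggregation, which would return one row per group.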