Window functions
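Polars supports window functions, inspired by PostgreSQL. Pandas users may recognize them as a groupby.transform(aggregation).
Polars window functions are more elegant than the Pandas transform function. We can apply multiple functions over multiple columns in a single expression!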
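For readers coming from Pandas, the comparison can be made concrete with a minimal sketch of the groupby.transform approach. It assumes pandas is installed and uses the same small dataset as the Polars example in this section; the variable name `pdf` is only illustrative. Note that each new column needs its own groupby call and assignment.

```python
import pandas as pd

pdf = pd.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)

# transform() computes the aggregate per group and broadcasts it back
# to every row of that group, so the frame keeps its original length.
pdf["fruit_sum_A"] = pdf.groupby("fruits")["A"].transform("sum")
pdf["fruit_first_B"] = pdf.groupby("fruits")["B"].transform("first")
pdf["cars_max_B"] = pdf.groupby("cars")["B"].transform("max")

print(pdf)
```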
import polars as pl

dataset = pl.DataFrame(
    {
        "A": [1, 2, 3, 4, 5],
        "fruits": ["banana", "banana", "apple", "apple", "banana"],
        "B": [5, 4, 3, 2, 1],
        "cars": ["beetle", "audi", "beetle", "beetle", "beetle"],
    }
)

q = dataset.lazy().with_columns(
    [
        # Sum "A" within each "fruits" group and store the result as a new column
        pl.sum("A").over("fruits").alias("fruit_sum_A"),
        # First value of "B" within each "fruits" group
        pl.first("B").over("fruits").alias("fruit_first_B"),
        # Maximum of "B" within each "cars" group
        pl.max("B").over("cars").alias("cars_max_B"),
    ]
)

df = q.collect()
print(df)
shape: (5, 7)
┌─────┬────────┬─────┬────────┬─────────────┬───────────────┬────────────┐
│ A   ┆ fruits ┆ B   ┆ cars   ┆ fruit_sum_A ┆ fruit_first_B ┆ cars_max_B │
│ --- ┆ ---    ┆ --- ┆ ---    ┆ ---         ┆ ---           ┆ ---        │
│ i64 ┆ str    ┆ i64 ┆ str    ┆ i64         ┆ i64           ┆ i64        │
╞═════╪════════╪═════╪════════╪═════════════╪═══════════════╪════════════╡
│ 1   ┆ banana ┆ 5   ┆ beetle ┆ 8           ┆ 5             ┆ 5          │
│ 2   ┆ banana ┆ 4   ┆ audi   ┆ 8           ┆ 5             ┆ 4          │
│ 3   ┆ apple  ┆ 3   ┆ beetle ┆ 7           ┆ 3             ┆ 5          │
│ 4   ┆ apple  ┆ 2   ┆ beetle ┆ 7           ┆ 3             ┆ 5          │
│ 5   ┆ banana ┆ 1   ┆ beetle ┆ 8           ┆ 5             ┆ 5          │
└─────┴────────┴─────┴────────┴─────────────┴───────────────┴────────────┘