Expression contexts
You cannot use an expression just anywhere. An expression needs a context; the available contexts are:
- selection:
df.select([..])
- groupby aggregation:
df.groupby(..).agg([..])
- hstack / add columns:
df.with_columns([..])
Syntactic sugar
The reason for such a context is that you are actually using the Polars lazy API, even when you work in eager mode. For instance, this snippet:
df.groupby("foo").agg([pl.col("bar").sum()])
eager.groupby(["foo"])?.agg(&[("bar", &["sum"])])?;
actually desugars to:
(df.lazy().groupby("foo").agg([pl.col("bar").sum()])).collect()
eager.lazy().groupby(["foo"]).agg([col("bar").sum()]).collect()?;
This allows Polars to push the expression into the query engine, do optimizations, and cache intermediate results.
Rust differs from Python somewhat in this respect. Where Python's eager mode is little more than a thin veneer over the lazy API, Rust's eager mode is closer to an implementation detail and isn't really recommended for end-user use. It is possible that the eager API in Rust will be made private sometime in the future. Therefore, for the remainder of this document, assume that the Rust examples use the lazy API.
Select context
In the select context the selection applies expressions over columns. The expressions in this context must produce Series that are all the same length or have a length of 1.
A Series of length 1 will be broadcast to match the height of the DataFrame.
Note that a select may produce new columns that are aggregations, combinations of expressions, or literals.
Selection context
out = df.select(
    [
        pl.sum("nrs"),
        pl.col("names").sort(),
        pl.col("names").first().alias("first name"),
        (pl.mean("nrs") * 10).alias("10xnrs"),
    ]
)
print(out)
let out = df
    .clone()
    .lazy()
    .select([
        sum("nrs"),
        col("names").sort(false),
        col("names").first().alias("first name"),
        mean("nrs").mul(lit(10)).alias("10xnrs"),
    ])
    .collect()?;
println!("{}", out);
shape: (5, 4)
┌─────┬───────┬────────────┬────────┐
│ nrs ┆ names ┆ first name ┆ 10xnrs │
│ --- ┆ ---   ┆ ---        ┆ ---    │
│ i64 ┆ str   ┆ str        ┆ f64    │
╞═════╪═══════╪════════════╪════════╡
│ 11  ┆ null  ┆ foo        ┆ 27.5   │
│ 11  ┆ egg   ┆ foo        ┆ 27.5   │
│ 11  ┆ foo   ┆ foo        ┆ 27.5   │
│ 11  ┆ ham   ┆ foo        ┆ 27.5   │
│ 11  ┆ spam  ┆ foo        ┆ 27.5   │
└─────┴───────┴────────────┴────────┘
Add columns
Adding columns to a DataFrame using with_columns also happens in the selection context.
df = df.with_columns(
    [
        pl.sum("nrs").alias("nrs_sum"),
        pl.col("random").count().alias("count"),
    ]
)
print(df)
let out = df
    .clone()
    .lazy()
    .with_columns([
        sum("nrs").alias("nrs_sum"),
        col("random").count().alias("count"),
    ])
    .collect()?;
println!("{}", out);
shape: (5, 6)
┌──────┬───────┬──────────┬────────┬─────────┬───────┐
│ nrs  ┆ names ┆ random   ┆ groups ┆ nrs_sum ┆ count │
│ ---  ┆ ---   ┆ ---      ┆ ---    ┆ ---     ┆ ---   │
│ i64  ┆ str   ┆ f64      ┆ str    ┆ i64     ┆ u32   │
╞══════╪═══════╪══════════╪════════╪═════════╪═══════╡
│ 1    ┆ foo   ┆ 0.154163 ┆ A      ┆ 11      ┆ 5     │
│ 2    ┆ ham   ┆ 0.74005  ┆ A      ┆ 11      ┆ 5     │
│ 3    ┆ spam  ┆ 0.263315 ┆ B      ┆ 11      ┆ 5     │
│ null ┆ egg   ┆ 0.533739 ┆ C      ┆ 11      ┆ 5     │
│ 5    ┆ null  ┆ 0.014575 ┆ B      ┆ 11      ┆ 5     │
└──────┴───────┴──────────┴────────┴─────────┴───────┘
Groupby context
In the groupby context, expressions work on groups and thus may yield results of any length (a group may have many members).
out = df.groupby("groups").agg(
    [
        pl.sum("nrs"),  # sum nrs by groups
        pl.col("random").count().alias("count"),  # count group members
        # sum random where name != null
        pl.col("random").filter(pl.col("names").is_not_null()).sum().suffix("_sum"),
        pl.col("names").reverse().alias("reversed names"),
    ]
)
print(out)
let out = df
    .lazy()
    .groupby([col("groups")])
    .agg([
        sum("nrs"), // sum nrs by groups
        col("random").count().alias("count"), // count group members
        // sum random where name != null
        col("random")
            .filter(col("names").is_not_null())
            .sum()
            .suffix("_sum"),
        col("names").reverse().alias("reversed names"),
    ])
    .collect()?;
println!("{}", out);
shape: (3, 5)
┌────────┬──────┬───────┬────────────┬────────────────┐
│ groups ┆ nrs  ┆ count ┆ random_sum ┆ reversed names │
│ ---    ┆ ---  ┆ ---   ┆ ---        ┆ ---            │
│ str    ┆ i64  ┆ u32   ┆ f64        ┆ list[str]      │
╞════════╪══════╪═══════╪════════════╪════════════════╡
│ C      ┆ null ┆ 1     ┆ 0.533739   ┆ ["egg"]        │
│ A      ┆ 3    ┆ 2     ┆ 0.894213   ┆ ["ham", "foo"] │
│ B      ┆ 8    ┆ 2     ┆ 0.263315   ┆ [null, "spam"] │
└────────┴──────┴───────┴────────────┴────────────────┘
Besides the standard groupby, groupby_dynamic and groupby_rolling are also entrances to the groupby context.