# Folds

`Polars` provides expressions/methods for horizontal aggregations like `sum`,`min`, `mean`, etc. However, when you need a more complex aggregation the default methods `Polars` supplies may not be sufficient. That's when `folds` come in handy.

The `fold` expression operates on columns for maximum speed. It utilizes the data layout very efficiently and often has vectorized execution.

### Manual sum

Let's start with an example by implementing the `sum` operation ourselves, with a `fold`.

``````df = pl.DataFrame(
{
"a": [1, 2, 3],
"b": [10, 20, 30],
}
)

out = df.select(
pl.fold(acc=pl.lit(0), function=lambda acc, x: acc + x, exprs=pl.all()).alias(
"sum"
),
)
print(out)
``````

``````let df = df!(
"a" => &[1, 2, 3],
"b" => &[10, 20, 30],
)?;

let out = df
.lazy()
.select([fold_exprs(lit(0), |acc, x| Ok(Some(acc + x)), [col("*")]).alias("sum")])
.collect()?;
println!("{}", out);
``````

``````shape: (3, 1)
┌─────┐
│ sum │
│ --- │
│ i64 │
╞═════╡
│ 11  │
│ 22  │
│ 33  │
└─────┘
``````

The snippet above recursively applies the function `f(acc, x) -> acc` to an accumulator `acc` and a new column `x`. The function operates on columns individually and can take advantage of cache efficiency and vectorization.

### Conditional

In the case where you'd want to apply a condition/predicate on all columns in a `DataFrame` a `fold` operation can be a very concise way to express this.

``````df = pl.DataFrame(
{
"a": [1, 2, 3],
"b": [0, 1, 2],
}
)

out = df.filter(
pl.fold(
acc=pl.lit(True),
function=lambda acc, x: acc & x,
exprs=pl.col("*") > 1,
)
)
print(out)
``````

``````let df = df!(
"a" => &[1, 2, 3],
"b" => &[0, 1, 2],
)?;

let out = df
.lazy()
.filter(fold_exprs(
lit(true),
|acc, x| Some(acc.bitand(&x)),
[col("*").gt(1)],
))
.collect()?;
println!("{}", out);
``````

``````shape: (1, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 3   ┆ 2   │
└─────┴─────┘
``````

In the snippet we filter all rows where each column value is `> 1`.

### Folds and string data

Folds could be used to concatenate string data. However, due to the materialization of intermediate columns, this operation will have squared complexity.

Therefore, we recommend using the `concat_str` expression for this.

``````df = pl.DataFrame(
{
"a": ["a", "b", "c"],
"b": [1, 2, 3],
}
)

out = df.select(pl.concat_str(["a", "b"]))
print(out)
``````

``````let df = df!(
"a" => &["a", "b", "c"],
"b" => &[1, 2, 3],
)?;

let out = df
.lazy()
.select([concat_str([col("a"), col("b")], "")])
.collect()?;
println!("{:?}", out);
``````

``````shape: (3, 1)
┌─────┐
│ a   │
│ --- │
│ str │
╞═════╡
│ a1  │
│ b2  │
│ c3  │
└─────┘
``````