
Lists

An expression context we haven't discussed yet is the List context. It simply means that we can apply any expression to the elements of a list.

Row-wise computations

This context is ideal for row-wise computations.

Polars expressions work on columns, which are guaranteed to hold homogeneous data. Rows in a DataFrame carry no such guarantee, because each row spans columns that may have different types. Luckily, there is a data type that guarantees homogeneous values within each row: the pl.List data type.

Let's say we have the following data:

DataFrame

grades = pl.DataFrame(
    {
        "student": ["bas", "laura", "tim", "jenny"],
        "arithmetic": [10, 5, 6, 8],
        "biology": [4, 6, 2, 7],
        "geography": [8, 4, 9, 7],
    }
)
print(grades)

DataFrame

let grades = df!(
    "student" => &["bas", "laura", "tim", "jenny"],
    "arithmetic" => &[10, 5, 6, 8],
    "biology" => &[4, 6, 2, 7],
    "geography" => &[8, 4, 9, 7],
)?;
println!("{}", grades);

shape: (4, 4)
┌─────────┬────────────┬─────────┬───────────┐
│ student ┆ arithmetic ┆ biology ┆ geography │
│ ---     ┆ ---        ┆ ---     ┆ ---       │
│ str     ┆ i64        ┆ i64     ┆ i64       │
╞═════════╪════════════╪═════════╪═══════════╡
│ bas     ┆ 10         ┆ 4       ┆ 8         │
│ laura   ┆ 5          ┆ 6       ┆ 4         │
│ tim     ┆ 6          ┆ 2       ┆ 9         │
│ jenny   ┆ 8          ┆ 7       ┆ 7         │
└─────────┴────────────┴─────────┴───────────┘

To compute the rank across all the columns except "student", we can first collect those columns into a single column of the list data type:

concat_list

out = grades.select([pl.concat_list(pl.all().exclude("student")).alias("all_grades")])
print(out)

concat_lst

let out = grades
    .clone()
    .lazy()
    .select([concat_lst([all().exclude(["student"])]).alias("all_grades")])
    .collect()?;
println!("{}", out);

shape: (4, 1)
┌────────────┐
│ all_grades │
│ ---        │
│ list[i64]  │
╞════════════╡
│ [10, 4, 8] │
│ [5, 6, 4]  │
│ [6, 2, 9]  │
│ [8, 7, 7]  │
└────────────┘
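
As a rough mental model of the result above (plain Python for illustration only, not how Polars implements concat_list), the three grade columns can be zipped together so that each row becomes one list. The `columns` dictionary and the `all_grades` variable below are illustrative names that simply mirror the example data.

```python
# Plain-Python mental model of collecting columns into per-row lists.
# This only illustrates the shape of the result; it is not Polars' implementation.
columns = {
    "arithmetic": [10, 5, 6, 8],
    "biology": [4, 6, 2, 7],
    "geography": [8, 4, 9, 7],
}

# zip(*...) walks all columns in lockstep and yields one tuple per row.
all_grades = [list(row) for row in zip(*columns.values())]
print(all_grades)
```

Every inner list holds values of a single type, which is what allows an expression to be evaluated over it.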

Running Polars expressions on list elements

We can run any Polars expression on the elements of a list with the arr.eval expression (arr().eval in Rust). These expressions run entirely on Polars' query engine and can be evaluated in parallel, so they are fast.

Let's expand the example from above with something a little more interesting. Pandas lets you compute ranks as percentages through the pct keyword argument of rank. Polars doesn't provide such a keyword argument, but because expressions are so versatile we can build our own percentage rank expression: the rank of each element divided by the number of elements. Let's try that!

Note that inside this context we must refer to the list's elements explicitly. When we apply expressions over list elements, we use pl.element() to select the elements of the list being evaluated.

arr.eval

# the percentage rank expression
rank_pct = pl.element().rank(descending=True) / pl.element().count()

out = grades.with_columns(
    # create the list of homogeneous data
    pl.concat_list(pl.all().exclude("student")).alias("all_grades")
).select(
    [
        # select all columns except the intermediate list
        pl.all().exclude("all_grades"),
        # compute the rank by calling `arr.eval`
        pl.col("all_grades").arr.eval(rank_pct, parallel=True).alias("grades_rank"),
    ]
)
print(out)

arr · Available on feature rank · Available on feature list_eval

// the percentage rank expression
let rank_opts = RankOptions {
    method: RankMethod::Average,
    descending: true,
};
let rank_pct = col("").rank(rank_opts) / col("").count().cast(DataType::Float32);

let grades = grades
    .clone()
    .lazy()
    .with_columns(
        // create the list of homogeneous data
        [concat_lst([all().exclude(["student"])]).alias("all_grades")],
    )
    .select([
        // select all columns except the intermediate list
        all().exclude(["all_grades"]),
        // compute the rank by calling `arr.eval`
        col("all_grades")
            .arr()
            .eval(rank_pct, true)
            .alias("grades_rank"),
    ])
    .collect()?;
println!("{}", grades);

shape: (4, 5)
┌─────────┬────────────┬─────────┬───────────┬────────────────────────────────┐
│ student ┆ arithmetic ┆ biology ┆ geography ┆ grades_rank                    │
│ ---     ┆ ---        ┆ ---     ┆ ---       ┆ ---                            │
│ str     ┆ i64        ┆ i64     ┆ i64       ┆ list[f64]                      │
╞═════════╪════════════╪═════════╪═══════════╪════════════════════════════════╡
│ bas     ┆ 10         ┆ 4       ┆ 8         ┆ [0.333333, 1.0, 0.666667]      │
│ laura   ┆ 5          ┆ 6       ┆ 4         ┆ [0.666667, 0.333333, 1.0]      │
│ tim     ┆ 6          ┆ 2       ┆ 9         ┆ [0.666667, 1.0, 0.333333]      │
│ jenny   ┆ 8          ┆ 7       ┆ 7         ┆ [0.333333, 0.833333, 0.833333] │
└─────────┴────────────┴─────────┴───────────┴────────────────────────────────┘
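
To check a row of this output by hand, the sketch below reproduces the arithmetic in plain Python. It assumes the average tie-breaking method (the one named in the Rust snippet via RankMethod::Average, and the one consistent with the tied values in jenny's row). The helper `rank_pct_row` is a hypothetical name that only mirrors the expression; it is not part of Polars.

```python
def rank_pct_row(values):
    """Descending average rank of each value, divided by the number of values."""
    n = len(values)
    result = []
    for v in values:
        greater = sum(1 for other in values if other > v)
        ties = sum(1 for other in values if other == v)
        # Tied values occupy positions greater+1 .. greater+ties; use their mean.
        rank = greater + (ties + 1) / 2
        result.append(rank / n)
    return result


# jenny's grades: arithmetic 8, biology 7, geography 7
print([round(x, 6) for x in rank_pct_row([8, 7, 7])])
# prints [0.333333, 0.833333, 0.833333]
```

The two grades of 7 tie for positions 2 and 3, so each receives rank 2.5, and 2.5 / 3 gives the 0.833333 shown in the table.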

Note that this approach works for any expression or operation you want to apply row-wise.