Functions

Polars expressions have a large number of built-in functions. These allow you to create complex queries without the need for user-defined functions. There are too many to go through here, but we will cover some of the more popular use cases. If you want to view all the functions, go to the API Reference for your programming language.

In the examples below we will use the following DataFrame:

Python Rust

DataFrame

import numpy as np
import polars as pl

df = pl.DataFrame(
    {
        "nrs": [1, 2, 3, None, 5],
        "names": ["foo", "ham", "spam", "egg", "spam"],
        "random": np.random.rand(5),
        "groups": ["A", "A", "B", "C", "B"],
    }
)
print(df)

DataFrame

use rand::{thread_rng, Rng};

let mut arr = [0f64; 5];
thread_rng().fill(&mut arr);

let df = df! (
    "nrs" => &[Some(1), Some(2), Some(3), None, Some(5)],
    "names" => &["foo", "ham", "spam", "egg", "spam"],
    "random" => &arr,
    "groups" => &["A", "A", "B", "C", "B"],
)?;

println!("{}", &df);

shape: (5, 4)
┌──────┬───────┬──────────┬────────┐
│ nrs  ┆ names ┆ random   ┆ groups │
│ ---  ┆ ---   ┆ ---      ┆ ---    │
│ i64  ┆ str   ┆ f64      ┆ str    │
╞══════╪═══════╪══════════╪════════╡
│ 1    ┆ foo   ┆ 0.154163 ┆ A      │
│ 2    ┆ ham   ┆ 0.74005  ┆ A      │
│ 3    ┆ spam  ┆ 0.263315 ┆ B      │
│ null ┆ egg   ┆ 0.533739 ┆ C      │
│ 5    ┆ spam  ┆ 0.014575 ┆ B      │
└──────┴───────┴──────────┴────────┘

Column naming

By default, the result of an expression keeps the name of the original column. In the example below we apply an expression to the nrs column. Note that the output column is still named nrs.

Python Rust

df_samename = df.select(pl.col("nrs") + 5)
print(df_samename)

let df_samename = df.clone().lazy().select([col("nrs") + lit(5)]).collect()?;
println!("{}", &df_samename);

shape: (5, 1)
┌──────┐
│ nrs  │
│ ---  │
│ i64  │
╞══════╡
│ 6    │
│ 7    │
│ 8    │
│ null │
│ 10   │
└──────┘

This becomes a problem when you use the same column more than once in a single query, because the output column names are then duplicated. For example, the following query fails.

Python Rust

try:
    df_samename2 = df.select(pl.col("nrs") + 5, pl.col("nrs") - 5)
    print(df_samename2)
except Exception as e:
    print(e)

let df_samename2 = df
    .clone()
    .lazy()
    .select([col("nrs") + lit(5), col("nrs") - lit(5)])
    .collect();
match df_samename2 {
    Ok(df) => println!("{}", &df),
    Err(e) => println!("{:?}", &e),
};

the name: 'nrs' is duplicate

It's possible that multiple expressions are returning the same default column name. If this is the case, try renaming the columns with `.alias("new_name")` to avoid duplicate column names.

You can change the output name of an expression by using the alias function.

Python Rust

alias

df_alias = df.select(
    (pl.col("nrs") + 5).alias("nrs + 5"),
    (pl.col("nrs") - 5).alias("nrs - 5"),
)
print(df_alias)

alias

let df_alias = df
    .clone()
    .lazy()
    .select([
        (col("nrs") + lit(5)).alias("nrs + 5"),
        (col("nrs") - lit(5)).alias("nrs - 5"),
    ])
    .collect()?;
println!("{}", &df_alias);

shape: (5, 2)
┌─────────┬─────────┐
│ nrs + 5 ┆ nrs - 5 │
│ ---     ┆ ---     │
│ i64     ┆ i64     │
╞═════════╪═════════╡
│ 6       ┆ -4      │
│ 7       ┆ -3      │
│ 8       ┆ -2      │
│ null    ┆ null    │
│ 10      ┆ 0       │
└─────────┴─────────┘

When an expression covers multiple columns, for example when using all() or col("*"), you can apply a mapping function with name.map to turn each original column name into something else. Adding a suffix (name.suffix()) or a prefix (name.prefix()) is also built in.

Python

name.prefix name.suffix name.map

Count unique values

There are two ways to count unique values in Polars: an exact method and an approximation. The approximation uses the HyperLogLog++ algorithm to estimate the cardinality and is especially useful for very large datasets where an approximate count is good enough.

Python Rust

n_unique · approx_n_unique

df_alias = df.select(
    pl.col("names").n_unique().alias("unique"),
    pl.approx_n_unique("names").alias("unique_approx"),
)
print(df_alias)

n_unique · approx_n_unique

let df_alias = df
    .clone()
    .lazy()
    .select([
        col("names").n_unique().alias("unique"),
        // Following query shows there isn't anything in Rust API
        // https://docs.rs/polars/latest/polars/?search=approx_n_unique
        // col("names").approx_n_unique().alias("unique_approx"),
    ])
    .collect()?;
println!("{}", &df_alias);

shape: (1, 2)
┌────────┬───────────────┐
│ unique ┆ unique_approx │
│ ---    ┆ ---           │
│ u32    ┆ u32           │
╞════════╪═══════════════╡
│ 4      ┆ 4             │
└────────┴───────────────┘

Conditionals

Polars supports if-else style conditions in expressions through the when, then, otherwise syntax. The predicate goes in the when clause. For each row where the predicate evaluates to true, the then expression is applied; for all other rows, including rows where the predicate is null, the otherwise expression is applied.

Python Rust

when

df_conditional = df.select(
    pl.col("nrs"),
    pl.when(pl.col("nrs") > 2)
    .then(pl.lit(True))
    .otherwise(pl.lit(False))
    .alias("conditional"),
)
print(df_conditional)

when

let df_conditional = df
    .clone()
    .lazy()
    .select([
        col("nrs"),
        when(col("nrs").gt(2))
            .then(lit(true))
            .otherwise(lit(false))
            .alias("conditional"),
    ])
    .collect()?;
println!("{}", &df_conditional);

shape: (5, 2)
┌──────┬─────────────┐
│ nrs  ┆ conditional │
│ ---  ┆ ---         │
│ i64  ┆ bool        │
╞══════╪═════════════╡
│ 1    ┆ false       │
│ 2    ┆ false       │
│ 3    ┆ true        │
│ null ┆ false       │
│ 5    ┆ true        │
└──────┴─────────────┘