Polars has a powerful concept called expressions, which is central to its fast performance.
Expressions are at the core of many data science operations:
- taking a sample of rows from a column
- multiplying values in a column
- extracting a column of years from dates
- converting a column of strings to lowercase
- and so on!
Expressions are also used within other operations:
- taking the mean of a group in a group-by operation
- calculating the size of groups in a group-by operation
- taking the sum horizontally across columns
Polars performs these core data transformations very quickly by:
- applying automatic query optimization to each expression
- automatically parallelizing expressions across many columns
A Polars expression is a mapping from a Series to a Series (mathematically, Fn(Series) -> Series). Because an expression takes a Series as input and produces a Series as output, it is straightforward to chain a sequence of expressions (similar to method chaining in pandas).
For example, a single expression can say:
- Select column "foo"
- Then sort the column in ascending order (not reversed)
- Then take the first two values of the sorted output
The power of expressions is that every expression produces a new expression, so expressions can be piped together. You run an expression by passing it to one of Polars' execution contexts.
Multiple expressions can be run together by passing them to the same context.
All expressions passed to a context are run in parallel, because separate Polars expressions are embarrassingly parallel. Within a single expression, there may be additional parallelization.
This is the tip of the iceberg in terms of possible expressions. There are many more, and they can be combined in a variety of ways. This page is intended to familiarize you with the concept of expressions; the section on expressions dives deeper.