Polars has a powerful concept called expressions that is central to its very fast performance.
Expressions are at the core of many data science operations:
- taking a sample of rows from a column
- multiplying values in a column
- extracting a column of years from dates
- converting a column of strings to lowercase
- and so on!
However, expressions are also used within other operations:
- taking the mean of each group in a group-by operation
- calculating the size of groups in a group-by operation
- taking the sum horizontally across columns
Polars performs these core data transformations very quickly through:
- automatic query optimization on each expression
- automatic parallelization of expressions on many columns
Polars expressions are a mapping from a series to a series (or mathematically Fn(Series) -> Series). Because an expression takes a Series as input and returns a Series as output, it is straightforward to apply a sequence of expressions (similar to method chaining).
This has all been a bit abstract, so let's start with some examples.