Other optimizations

The Other optimizations page is under construction.

Besides predicate and projection pushdown, Polars does other optimizations.

One important topic is optional caching and parallelization. It's easy to imagine having two different DataFrame computations that lead to a scan of the same file. Polars may cache the scanned file to prevent scanning the same file twice. However, if you want to, you may override this behavior and force Polars to read the same file. This could be faster because the scan can be done in parallel.

Join parallelization

If we look at the previous query, we see that the join operation has as input a computation path with data/reddit.csv as root and one path with data/runescape.csv as root. Polars can observe that there are no dependencies between the two DataFrames and will read both files in parallel. If other operations are done before the join (e.g. groupby, filters, etc.) they are also executed in parallel.

Simplify expressions

Some other optimizations that are done are expression simplifications. The impact of these optimizations is less than that of predicate and projection pushdown, but they likely add up. You can track this issue to see the latest status of those.