This book is an introduction to the
Polars DataFrame library. Its goal is to
explain the inner workings of
Polars by going through examples and comparing it to other
solutions. Some design choices are introduced here, and the optimal use of
Polars is described.
Although Polars is written entirely in Rust (no
runtime overhead!) and uses Arrow, in a native
Rust implementation, at its foundation, the
examples presented in this guide will mostly use its higher-level language
bindings. These bindings are merely a thin wrapper and offer no more
functionality than the core library does.
The goal of
Polars is to be a lightning-fast DataFrame library that utilizes all
available cores on your machine.
Polars is semi-lazy. It allows you to do most of your work eagerly, similar to
pandas, but it also provides a powerful expression syntax that is optimized and executed on Polars' query engine.
Polars also supports full lazy query execution that allows for more query optimization.
Polars keeps track of your query in a logical plan. This
plan is optimized and reordered before it is run. When a result is requested,
Polars distributes the available work to different executors that use the algorithms available
in the eager API to produce the result. Because the whole query context is known to
the optimizer and executors of the logical plan, processes that depend on separate data
sources can be parallelized on the fly.
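
To make "optimized and reordered" more concrete, here is a toy sketch in plain Python. It is not how Polars is implemented: the `Scan`, `Filter` and `Select` classes, the `optimize` and `execute` helpers and the sample rows are all hypothetical, and the optimizer only handles a plan with a single filter and a single selection. It shows the general idea of folding later steps into the scan so that less data is materialized.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical data source: a list of row dicts standing in for a file.
ROWS = [
    {"id": 1, "city": "Oslo", "temp": 3.5},
    {"id": 2, "city": "Rome", "temp": 18.0},
    {"id": 3, "city": "Cairo", "temp": 29.5},
]

@dataclass
class Scan:
    columns: Optional[list] = None        # None means "keep every column"
    predicate: Optional[Callable] = None  # None means "keep every row"

@dataclass
class Filter:
    predicate: Callable

@dataclass
class Select:
    columns: list

def optimize(plan):
    """Fold one Filter and one Select into the Scan that starts the plan."""
    scan = Scan(columns=plan[0].columns, predicate=plan[0].predicate)
    for node in plan[1:]:
        if isinstance(node, Filter):
            scan.predicate = node.predicate
        elif isinstance(node, Select):
            scan.columns = node.columns
    return [scan]

def execute(plan):
    """Run the plan; also count how many values the scan materializes."""
    rows, cells = [], 0
    for node in plan:
        if isinstance(node, Scan):
            for raw in ROWS:
                if node.predicate is not None and not node.predicate(raw):
                    continue  # row is dropped while scanning
                cols = node.columns or list(raw)
                rows.append({c: raw[c] for c in cols})
                cells += len(cols)
        elif isinstance(node, Filter):
            rows = [r for r in rows if node.predicate(r)]
        elif isinstance(node, Select):
            rows = [{c: r[c] for c in node.columns} for r in rows]
    return rows, cells

naive_plan = [Scan(), Filter(lambda r: r["temp"] > 10), Select(["city"])]

naive_rows, naive_cells = execute(naive_plan)
fast_rows, fast_cells = execute(optimize(naive_plan))

print(naive_rows == fast_rows)  # same answer either way
print(naive_cells, fast_cells)  # the optimized plan materializes fewer values
```

The optimized plan returns the same rows, but the scan only materializes the values that survive the filter and the selection.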
Polars is very fast, and is in fact one of the best-performing solutions available. See the results in h2oai's db-benchmark. The image below shows the biggest datasets yielding a result.
Below is a concise list of the features that allow
Polars to meet its goals:
- Copy-on-write (COW) semantics
  - "Free" clones
  - Cheap appends
- Appending without clones
- Column oriented data storage
  - No block manager (i.e. predictable performance)
- Missing values indicated with bitmask
  - NaN are different from missing
  - Bitmask optimizations
- Efficient algorithms
- Query optimizations
  - Predicate pushdown
    - Filtering at scan level
  - Projection pushdown
    - Projection at scan level
  - Simplify expressions
  - Parallel execution of physical plan
- SIMD vectorization
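
The "missing values indicated with bitmask" item in the list above can be illustrated with a small plain-Python sketch. It is an assumption-laden toy, not Polars' actual storage code: a Python integer stands in for the packed validity bits, and the helper names `build_column`, `is_valid` and `null_count` are hypothetical. A set bit means the slot holds a value, which matches the convention of Arrow validity bitmaps.

```python
import math

def build_column(values):
    """Split a list into a dense value buffer plus a validity bitmask.

    Bit i of the mask is 1 when slot i holds a value, 0 when it is missing.
    """
    data, mask = [], 0
    for i, v in enumerate(values):
        if v is None:
            data.append(0.0)  # placeholder, ignored because the bit is 0
        else:
            data.append(float(v))
            mask |= 1 << i
    return data, mask

def is_valid(mask, i):
    """Return True when slot i holds a value."""
    return (mask >> i) & 1 == 1

def null_count(mask, length):
    """Count missing slots by counting set bits, without touching the data."""
    return length - bin(mask).count("1")

values = [1.5, None, float("nan"), 4.0]
data, mask = build_column(values)

print(bin(mask))                      # one bit per slot, lowest bit is slot 0
print(null_count(mask, len(values)))  # only the None slot is missing
print(is_valid(mask, 2), math.isnan(data[2]))  # NaN is a value, not a missing slot
```

Because missingness lives in the mask rather than in the values, NaN stays an ordinary floating-point value, and questions such as "how many values are missing?" can be answered from the mask alone.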
Polars is proudly powered by