Polars
Blazingly Fast DataFrame Library
Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:
- Fast: Polars is written from the ground up, designed close to the machine and without external dependencies.
- I/O: First-class support for all common data storage layers: local, cloud storage, and databases.
- Easy to use: Write your queries the way they were intended. Internally, Polars uses its query optimizer to determine the most efficient way to execute them.
- Out of Core: Polars supports out-of-core data transformation with its streaming API, allowing you to process your results without requiring all your data to be in memory at the same time.
- Parallel: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
- Vectorized Query Engine: Polars uses Apache Arrow, a columnar data format, to process your queries in a vectorized manner. It uses SIMD to optimize CPU usage.
About this guide
The Polars user guide is intended to live alongside the API documentation. Its purpose is to explain to (new) users how to use Polars and to provide meaningful examples. The guide is split into two parts:
- Getting Started: A 10-minute helicopter view of the library and its primary function.
- User Guide: A detailed explanation of how the library is set up and how to use it most effectively.
If you are looking for details on a specific level / object, it is probably best to go to the API documentation: Python | Rust.
Performance
Polars is very fast, and in fact is one of the best performing solutions available.
See the results in h2oai's db-benchmark, revived by the DuckDB project.
Polars TPC-H benchmark results are now available on the official website.
Example
scan_csv · filter · group_by · collect
import polars as pl

q = (
    pl.scan_csv("docs/data/iris.csv")
    .filter(pl.col("sepal_length") > 5)
    .group_by("species")
    .agg(pl.all().sum())
)

df = q.collect()
LazyCsvReader · filter · group_by · collect
Available on feature csv · Available on feature streaming
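use polars::prelude::*;

// Lazily scan the CSV; nothing is read until `collect` is called.
let q = LazyCsvReader::new("docs/data/iris.csv")
    .has_header(true)
    .finish()?
    .filter(col("sepal_length").gt(lit(5)))
    .group_by(vec![col("species")])
    .agg([col("*").sum()]);

// `collect` returns a `PolarsResult<DataFrame>`, so propagate errors with `?`.
let df = q.collect()?;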
Community
Polars has a very active community with frequent releases (approximately weekly).
Contribute
Thanks for taking the time to contribute! We appreciate all contributions, from reporting bugs to implementing new features. If you're unclear on how to proceed, read our contribution guide or contact us on Discord.
License
This project is licensed under the terms of the MIT license.