Polars

logo

Blazingly Fast DataFrame Library

Polars is a highly performant DataFrame library for manipulating structured data. The core is written in Rust, but the library is also available in Python. Its key features are:

Fast: Polars is written from the ground up, designed close to the machine and without external dependencies.
I/O: First class support for all common data storage layers: local, cloud storage & databases.
Easy to use: Write your queries the way they were intended. Polars, internally, will determine the most efficient way to execute using its query optimizer.
Out of Core: Polars supports out of core data transformation with its streaming API. Allowing you to process your results without requiring all your data to be in memory at the same time
Parallel: Polars fully utilises the power of your machine by dividing the workload among the available CPU cores without any additional configuration.
Vectorized Query Engine: Polars uses Apache Arrow, a columnar data format, to process your queries in a vectorized manner. It uses SIMD to optimize CPU usage.

About this guide

The Polars user guide is intended to live alongside the API documentation. Its purpose is to explain (new) users how to use Polars and to provide meaningful examples. The guide is split into two parts:

Getting Started: A 10 minute helicopter view of the library and its primary function.
User Guide: A detailed explanation of how the library is setup and how to use it most effectively.

If you are looking for details on a specific level / object, it is probably best to go the API documentation: Python | Rust.

Performance

Polars is very fast, and in fact is one of the best performing solutions available. See the results in h2oai's db-benchmark, revived by the DuckDB project.

Polars TPCH Benchmark results are now available on the official website.

Example

Python Rust

scan_csv · filter · group_by · collect

import polars as pl

q = (
    pl.scan_csv("docs/data/iris.csv")
    .filter(pl.col("sepal_length") > 5)
    .group_by("species")
    .agg(pl.all().sum())
)

df = q.collect()

LazyCsvReader · filter · group_by · collect · Available on feature csv · Available on feature streaming

use polars::prelude::*;

let q = LazyCsvReader::new("docs/data/iris.csv")
    .has_header(true)
    .finish()?
    .filter(col("sepal_length").gt(lit(5)))
    .group_by(vec![col("species")])
    .agg([col("*").sum()]);

let df = q.collect();

Community

Polars has a very active community with frequent releases (approximately weekly). Below are some of the top contributors to the project:

Contribute

Thanks for taking the time to contribute! We appreciate all contributions, from reporting bugs to implementing new features. If you're unclear on how to proceed read our contribution guide or contact us on discord.

License

This project is licensed under the terms of the MIT license.