Lazy API

With the lazy API, Polars doesn't run each query line by line. Instead, it processes the full query end to end. To get the most out of Polars, it is important that you use the lazy API because:

  • the lazy API allows the Polars query optimizer to optimize your query automatically
  • the lazy API allows you to work with larger-than-memory datasets using streaming
  • the lazy API can catch schema errors before processing the data

The pages in this section cover:

Dataset

To demonstrate the capabilities of the lazy API, we'll explore a medium-large dataset of usernames.

The Reddit usernames dataset contains over 69 million rows with data on Reddit users.

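import polars as pl

# DATA_DIR comes from this project's own paths module
from ..paths import DATA_DIR

# Read only the first 10 rows to preview the dataset
dataset = pl.read_csv(f"{DATA_DIR}/reddit.csv", n_rows=10)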
{{#include ../outputs/lazy_api/dataset.txt}}