Lazy API
With the lazy API, Polars doesn't execute each operation line by line; instead, it processes the full query end-to-end. To get the most out of Polars, it is important to use the lazy API because:
- the lazy API allows Polars to optimize the query automatically with its query optimizer
- the lazy API allows you to work with larger-than-memory datasets using streaming
- the lazy API can catch schema errors before processing the data
The pages in this section cover:
- Using the lazy API
- Schema in the lazy API
- Understanding the query plan
- Executing lazy queries
- Streaming larger-than-memory datasets
Dataset
To demonstrate the capabilities of lazy Polars, we'll explore a medium-large dataset of usernames.
The Reddit usernames dataset contains over 69 million rows with data on Reddit users.
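```python
import polars as pl

DATA_DIR = "data"  # hypothetical: directory holding reddit.csv (the original imports it from a docs-internal `..paths` module)
# Eagerly read just the first 10 rows for a quick preview.
dataset = pl.read_csv(f"{DATA_DIR}/reddit.csv", n_rows=10)
print(dataset)
```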