Using the lazy API

Here we see how to use the lazy API starting from either a file or an existing DataFrame.

Using the lazy API from a file

Ideally we use the lazy API directly from a file, because the query optimizer can then reduce the amount of data we read from the file, for example by pushing filters and column selections down into the scan.

We create a lazy query from the Reddit CSV data and apply some transformations.

By starting the query with pl.scan_csv we are using the lazy API.

import polars as pl

from ..paths import DATA_DIR

q1 = (
    pl.scan_csv(f"{DATA_DIR}/reddit.csv")
    .with_columns(pl.col("name").str.to_uppercase())
    .filter(pl.col("comment_karma") > 0)
)

A pl.scan_* function is available for a number of file types, including CSV, Parquet, IPC and newline-delimited JSON.

In this query we tell Polars that we want to:

  • load data from the Reddit CSV file
  • convert the name column to uppercase
  • apply a filter to the comment_karma column

The lazy query is not executed at this point; Polars has only recorded the steps as a query plan. See the page on executing lazy queries for how to run it.

Using the lazy API from a DataFrame

An alternative way to access the lazy API is to call .lazy on a DataFrame that has already been created in memory.

q3 = pl.DataFrame({"foo": ["a", "b", "c"], "bar": [0, 1, 2]}).lazy()

By calling .lazy we convert the DataFrame to a LazyFrame.