Using the lazy API
Here we see how to use the lazy API starting from either a file or an existing DataFrame.
Using the lazy API from a file
Ideally, we use the lazy API right from a file, because the query optimizer can then help reduce the amount of data we read from the file.
We create a lazy query from the Reddit CSV data and apply some transformations.
By starting the query with pl.scan_csv we are using the lazy API.
import polars as pl
from ..paths import DATA_DIR
q1 = (
    pl.scan_csv(f"{DATA_DIR}/reddit.csv")
    .with_columns(pl.col("name").str.to_uppercase())
    .filter(pl.col("comment_karma") > 0)
)
A pl.scan_ function is available for a number of file types, including CSV, Parquet, IPC and newline-delimited JSON.
In this query we tell Polars that we want to:
- load data from the Reddit CSV file
- convert the name column to uppercase
- apply a filter to the comment_karma column
The lazy query is not executed at this point. See the page on executing lazy queries for more on running them.
Using the lazy API from a DataFrame
An alternative way to access the lazy API is to call .lazy on a DataFrame that has already been created in memory.
q3 = pl.DataFrame({"foo": ["a", "b", "c"], "bar": [0, 1, 2]}).lazy()
By calling .lazy we convert the DataFrame to a LazyFrame.