Polars SQL

Starting the SQL Context

You can query a Polars LazyFrame with SQL. The first step is to initialize a SQL context, and register a LazyFrame with it.

Let's load some data and initialize the SQL context:

import polars as pl

# convert 'pokemon' into a Lazyframe by calling the .lazy() method
pokemon = pl.read_csv(
    "https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv"
).lazy()

# initialize the SQL context and register the lazyframe
sql = pl.SQLContext()
sql.register("pokemon", pokemon)

Polars supports a single SQL context per thread, and the registered dataframe should be a LazyFrame. You can call the register function multiple time for each of your LazyFrame.

Running your SQL queries 🚀🚀

You run your SQL queries with SQLContext.query.

out = sql.query(
    """
SELECT 
    "Type 1",
    COUNT(DISTINCT "Type 2") AS count_type_2,
    AVG(Attack) AS avg_attack_by_type,
    MAX(Speed) AS max_speed
FROM pokemon
GROUP BY "Type 1"
"""
)
shape: (15, 4)
┌──────────┬──────────────┬────────────────────┬───────────┐
│ Type 1   ┆ count_type_2 ┆ avg_attack_by_type ┆ max_speed │
│ ---      ┆ ---          ┆ ---                ┆ ---       │
│ str      ┆ u32          ┆ f64                ┆ i64       │
╞══════════╪══════════════╪════════════════════╪═══════════╡
│ Electric ┆ 3            ┆ 62.0               ┆ 140       │
│ Normal   ┆ 3            ┆ 70.625             ┆ 121       │
│ Ground   ┆ 2            ┆ 81.875             ┆ 120       │
│ …        ┆ …            ┆ …                  ┆ …         │
│ Ghost    ┆ 1            ┆ 53.75              ┆ 130       │
│ Rock     ┆ 3            ┆ 87.5               ┆ 150       │
│ Fairy    ┆ 1            ┆ 57.5               ┆ 60        │
└──────────┴──────────────┴────────────────────┴───────────┘