Introduction
While Polars does support writing queries in SQL, it's recommended that users familiarize themselves with the expression syntax for more readable and expressive code. As a primarily DataFrame library, new features will typically be added to the expression API first. However, if you already have an existing SQL codebase or prefer to use SQL, Polars also offers support for SQL queries.
Note
In Polars, there is no separate SQL engine because Polars translates SQL queries into expressions, which are then executed using its built-in execution engine. This approach ensures that Polars maintains its performance and scalability advantages as a native DataFrame library while still providing users with the ability to work with SQL queries.
Context
Polars uses the SQLContext
to manage SQL queries . The context contains a dictionary mapping DataFrames
and LazyFrames
names to their corresponding datasets1. The example below starts a SQLContext
:
ctx = pl.SQLContext()
Register Dataframes
There are 2 ways to register DataFrames in the SQLContext
:
- register all
LazyFrames
andDataFrames
in the global namespace - register them one by one
df = pl.DataFrame({"a": [1, 2, 3]})
lf = pl.LazyFrame({"b": [4, 5, 6]})
# Register all dataframes in the global namespace: registers both df and lf
ctx = pl.SQLContext(register_globals=True)
# Other option: register dataframe df as "df" and lazyframe lf as "lf"
ctx = pl.SQLContext(df=df, lf=lf)
We can also register Pandas DataFrames by converting them to Polars first.
import pandas as pd
df_pandas = pd.DataFrame({"c": [7, 8, 9]})
ctx = pl.SQLContext(df_pandas=pl.from_pandas(df_pandas))
Note
Converting a Pandas DataFrame backed by Numpy to Polars triggers a conversion to the Arrow format. This conversion has a computation cost. Converting a Pandas DataFrame backed by Arrow on the other hand will be free or almost free.
Once the SQLContext
is initialized, we can register additional Dataframes or unregister existing Dataframes with:
register
register_globals
register_many
unregister
Execute queries and collect results
SQL queries are always executed in lazy mode to benefit from lazy optimizations, so we have 2 options to collect the result:
- Set the parameter
eager_execution
to True inSQLContext
. With this parameter, Polars will automatically collect SQL results - Set the parameter
eager
to True when executing a query withexecute
, or collect the result withcollect
.
We execute SQL queries by calling execute
on a SQLContext
.
# For local files use scan_csv instead
pokemon = pl.read_csv(
"https://gist.githubusercontent.com/ritchie46/cac6b337ea52281aa23c049250a4ff03/raw/89a957ff3919d90e6ef2d34235e6bf22304f3366/pokemon.csv"
)
ctx = pl.SQLContext(register_globals=True, eager_execution=True)
df_small = ctx.execute("SELECT * from pokemon LIMIT 5")
print(df_small)
shape: (5, 13)
┌─────┬───────────────────────┬────────┬────────┬───┬─────────┬───────┬────────────┬───────────┐
│ # ┆ Name ┆ Type 1 ┆ Type 2 ┆ … ┆ Sp. Def ┆ Speed ┆ Generation ┆ Legendary │
│ --- ┆ --- ┆ --- ┆ --- ┆ ┆ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ str ┆ str ┆ ┆ i64 ┆ i64 ┆ i64 ┆ bool │
╞═════╪═══════════════════════╪════════╪════════╪═══╪═════════╪═══════╪════════════╪═══════════╡
│ 1 ┆ Bulbasaur ┆ Grass ┆ Poison ┆ … ┆ 65 ┆ 45 ┆ 1 ┆ false │
│ 2 ┆ Ivysaur ┆ Grass ┆ Poison ┆ … ┆ 80 ┆ 60 ┆ 1 ┆ false │
│ 3 ┆ Venusaur ┆ Grass ┆ Poison ┆ … ┆ 100 ┆ 80 ┆ 1 ┆ false │
│ 3 ┆ VenusaurMega Venusaur ┆ Grass ┆ Poison ┆ … ┆ 120 ┆ 80 ┆ 1 ┆ false │
│ 4 ┆ Charmander ┆ Fire ┆ null ┆ … ┆ 50 ┆ 65 ┆ 1 ┆ false │
└─────┴───────────────────────┴────────┴────────┴───┴─────────┴───────┴────────────┴───────────┘
Compatibility
Polars does not support the full SQL language, in Polars you are allowed to:
- Write a
CREATE
statementsCREATE TABLE xxx AS ...
- Write a
SELECT
statements with all generic elements (GROUP BY
,WHERE
,ORDER
,LIMIT
,JOIN
, ...) - Write Common Table Expressions (CTE's) (
WITH tablename AS
) - Show an overview of all tables
SHOW TABLES
The following is not yet supported:
INSERT
,UPDATE
orDELETE
statements- Table aliasing (e.g.
SELECT p.Name from pokemon AS p
) - Meta queries such as
ANALYZE
,EXPLAIN
In the upcoming sections we will cover each of the statements in more details.
-
Additionally it also tracks the common table expressions as well. ↩