polars.read_sql(sql: list[str] | str, connection_uri: str, partition_on: Optional[str] = None, partition_range: Optional[tuple[int, int]] = None, partition_num: Optional[int] = None, protocol: Optional[str] = None) -> DataFrame

Read a SQL query into a DataFrame.

Reading a SQL query is supported for the following data sources:

  • PostgreSQL

  • MySQL

  • SQLite

  • Redshift (through the PostgreSQL protocol)

  • ClickHouse (through the MySQL protocol)

If a database source is not supported, an alternative is to first load the SQL query with pandas and then convert the result into a polars DataFrame:

>>> import pandas as pd
>>> df = pl.from_pandas(pd.read_sql(sql, engine))  

Parameters

sql
    Raw SQL query / queries.

connection_uri
    Connectorx connection uri, for example

    • "postgresql://username:password@server:port/database"

partition_on
    The column on which to partition the result.

partition_range
    The value range of the partition column.

partition_num
    How many partitions to generate.

protocol
    Backend-specific transfer protocol directive; see the connectorx documentation for details.

Notes

Make sure connectorx>=0.2.2 is installed.


Examples

Read a DataFrame from a SQL query using a single thread:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_sql(query, uri)  

Read a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL on the partition column:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_sql(
...     query, uri, partition_on="partition_col", partition_num=10
... )  
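Conceptually, automatic partitioning splits the value range of the partition column into contiguous slices and issues one filtered query per slice. A rough sketch of that idea in plain Python (the actual query rewriting is performed by connectorx and may differ in detail; `partition_queries` is a hypothetical helper, not part of the polars API):

```python
def partition_queries(sql: str, column: str, lo: int, hi: int, n: int) -> list[str]:
    """Split the range [lo, hi] into n contiguous slices and emit one
    filtered query per slice (illustrative sketch only)."""
    step = (hi - lo) / n
    edges = [lo + round(i * step) for i in range(n)] + [hi]
    queries = []
    for i in range(n):
        # The last slice is closed on the right so the full range is covered.
        op = "<=" if i == n - 1 else "<"
        queries.append(f"{sql} WHERE {column} >= {edges[i]} AND {column} {op} {edges[i + 1]}")
    return queries

qs = partition_queries("SELECT * FROM lineitem", "partition_col", 0, 100, 4)
```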

Read a DataFrame in parallel using 2 threads by explicitly providing two SQL queries:

>>> uri = "postgresql://username:password@server:port/database"
>>> queries = [
...     "SELECT * FROM lineitem WHERE partition_col <= 10",
...     "SELECT * FROM lineitem WHERE partition_col > 10",
... ]
>>> pl.read_sql(queries, uri)