polars.read_database#

polars.read_database(query: list[str] | str, connection_uri: str, *, partition_on: str | None = None, partition_range: tuple[int, int] | None = None, partition_num: int | None = None, protocol: str | None = None, engine: DbReadEngine = 'connectorx') DataFrame[source]#

Read a SQL query into a DataFrame.

Parameters:
query

Raw SQL query (or queries).

connection_uri

A connectorx compatible connection uri, for example

  • “postgresql://username:password@server:port/database”

partition_on

The column on which to partition the result.

partition_range

The value range of the partition column.

partition_num

How many partitions to generate.

protocol

Backend-specific transfer protocol directive; see connectorx documentation for details.

engine{‘connectorx’, ‘adbc’}

Select the engine used for reading the data.

Notes

Make sure to install connectorx>=0.3.1. Read the documentation here.

Examples

Read a DataFrame from a SQL query using a single thread:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database(query, uri)  

Read a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL on the partition column:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_database(
...     query, uri, partition_on="partition_col", partition_num=10
... )  

Read a DataFrame in parallel using 2 threads by explicitly providing two SQL queries:

>>> uri = "postgresql://username:password@server:port/database"
>>> queries = [
...     "SELECT * FROM lineitem WHERE partition_col <= 10",
...     "SELECT * FROM lineitem WHERE partition_col > 10",
... ]
>>> pl.read_database(queries, uri)