polars.read_sql

polars.read_sql(sql: Union[List[str], str], connection_uri: str, partition_on: Optional[str] = None, partition_range: Optional[Tuple[int, int]] = None, partition_num: Optional[int] = None, protocol: Optional[str] = None) polars.internals.frame.DataFrame

Read a SQL query into a DataFrame. Make sure to install connectorx>=0.2

# Sources Supports reading a sql query from the following data sources:

  • Postgres

  • Mysql

  • Sqlite

  • Redshift (through postgres protocol)

  • Clickhouse (through mysql protocol)

## Source not supported? If a database source is not supported, pandas can be used to load the query:

>>> import pandas as pd
>>> df = pl.from_pandas(pd.read_sql(sql, engine))  
Parameters
sql

raw sql query.

connection_uri
connectorx connection uri:
  • “postgresql://username:password@server:port/database”

partition_on

the column on which to partition the result.

partition_range

the value range of the partition column.

partition_num

how many partitions to generate.

protocol

backend-specific transfer protocol directive; see connectorx documentation for details.

Examples

## Single threaded Read a DataFrame from a SQL query using a single thread:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_sql(query, uri)  

## Using 10 threads Read a DataFrame in parallel using 10 threads by automatically partitioning the provided SQL on the partition column:

>>> uri = "postgresql://username:password@server:port/database"
>>> query = "SELECT * FROM lineitem"
>>> pl.read_sql(
...     query, uri, partition_on="partition_col", partition_num=10
... )  

## Using Read a DataFrame in parallel using 2 threads by explicitly providing two SQL queries:

>>> uri = "postgresql://username:password@server:port/database"
>>> queries = [
...     "SELECT * FROM lineitem WHERE partition_col <= 10",
...     "SELECT * FROM lineitem WHERE partition_col > 10",
... ]
>>> pl.read_sql(uri, queries)