Skip to content

Databases

Read from a database

Polars can read from a database using the pl.read_database_uri and pl.read_database functions.

Difference between read_database_uri and read_database

Use pl.read_database_uri if you want to specify the database connection with a connection string called a uri. For example, the following snippet shows a query to read all columns from the foo table in a Postgres database where we use the uri to connect:

read_database_uri

import polars as pl

uri = "postgresql://username:password@server:port/database"
query = "SELECT * FROM foo"

pl.read_database_uri(query=query, uri=uri)

On the other hand, use pl.read_database if you want to connect via a connection engine created with a library like SQLAlchemy.

read_database

import polars as pl
from sqlalchemy import create_engine

conn = create_engine(f"sqlite:///test.db")

query = "SELECT * FROM foo"

pl.read_database(query=query, connection=conn.connect())

Note that pl.read_database_uri is likely to be faster than pl.read_database if you are using a SQLAlchemy or DBAPI2 connection as these connections may load the data row-wise into Python before copying the data again to the column-wise Apache Arrow format.

Engines

Polars doesn't manage connections and data transfer from databases by itself. Instead, external libraries (known as engines) handle this.

When using pl.read_database, you specify the engine when you create the connection object. When using pl.read_database_uri, you can specify one of two engines to read from the database:

Both engines have native support for Apache Arrow and so can read data directly into a Polars DataFrame without copying the data.

ConnectorX

ConnectorX is the default engine and supports numerous databases including Postgres, Mysql, SQL Server and Redshift. ConnectorX is written in Rust and stores data in Arrow format to allow for zero-copy to Polars.

To read from one of the supported databases with ConnectorX you need to activate the additional dependency ConnectorX when installing Polars or install it manually with

$ pip install connectorx

ADBC

ADBC (Arrow Database Connectivity) is an engine supported by the Apache Arrow project. ADBC aims to be both an API standard for connecting to databases and libraries implementing this standard in a range of languages.

It is still early days for ADBC so support for different databases is limited. At present, drivers for ADBC are only available for Postgres, SQLite and Snowflake. To install ADBC, you need to install the driver for your database. For example, to install the driver for SQLite, you run:

$ pip install adbc-driver-sqlite

As ADBC is not the default engine, you must specify the engine as an argument to pl.read_database_uri.

read_database_uri

uri = "postgresql://username:password@server:port/database"
query = "SELECT * FROM foo"

pl.read_database_uri(query=query, uri=uri, engine="adbc")

Write to a database

We can write to a database with Polars using the pl.write_database function.

Engines

As with reading from a database above, Polars uses an engine to write to a database. The currently supported engines are:

  • SQLAlchemy and
  • Arrow Database Connectivity (ADBC)

SQLAlchemy

With the default engine SQLAlchemy you can write to any database supported by SQLAlchemy. To use this engine you need to install SQLAlchemy and Pandas

$ pip install SQLAlchemy pandas

In this example, we write the DataFrame to a table called records in the database

write_database

uri = "postgresql://username:password@server:port/database"
df = pl.DataFrame({"foo": [1, 2, 3]})

df.write_database(table_name="records",  connection=uri)

In the SQLAlchemy approach, Polars converts the DataFrame to a Pandas DataFrame backed by PyArrow and then uses SQLAlchemy methods on a Pandas DataFrame to write to the database.

ADBC

ADBC can also be used to write to a database. Writing is supported for the same databases that support reading with ADBC. As shown above, you need to install the appropriate ADBC driver for your database.

write_database

uri = "postgresql://username:password@server:port/database"
df = pl.DataFrame({"foo": [1, 2, 3]})

df.write_database(table_name="records", connection=uri, engine="adbc")