polars.DataFrame.write_delta#

DataFrame.write_delta(target: str | Path | deltalake.DeltaTable, *, mode: Literal['error', 'append', 'overwrite', 'ignore'] = 'error', overwrite_schema: bool = False, storage_options: dict[str, str] | None = None, delta_write_options: dict[str, Any] | None = None) None[source]#

Write DataFrame as a Delta Lake table.

Note: Some Polars data types, such as Null, Categorical and Time, are not supported by the Delta protocol specification.

Parameters:
target

URI of a table or a DeltaTable object.

mode{‘error’, ‘append’, ‘overwrite’, ‘ignore’}

How to handle existing data.

  • If ‘error’, throw an error if the table already exists (default).

  • If ‘append’, will add new data.

  • If ‘overwrite’, will replace the table with new data.

  • If ‘ignore’, will not write anything if the table already exists.

overwrite_schema

If True, allows updating the schema of the table.

storage_options

Extra options for the storage backends supported by deltalake. For cloud storage, this may include configuration for authentication, etc.

  • See a list of supported storage options for S3 here.

  • See a list of supported storage options for GCS here.

  • See a list of supported storage options for Azure here.

delta_write_options

Additional keyword arguments used when writing a Delta Lake table. See a list of supported write options here; a partitioning example is shown at the end of the Examples section below.

Examples

Instantiate a basic dataframe:

>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 4, 5],
...         "bar": [6, 7, 8, 9, 10],
...         "ham": ["a", "b", "c", "d", "e"],
...     }
... )

Write DataFrame as a Delta Lake table on the local filesystem.

>>> table_path = "/path/to/delta-table/"
>>> df.write_delta(table_path)  

Append data to an existing Delta Lake table on the local filesystem. Note: this will fail if the schema of the new data does not match the schema of the existing table.

>>> df.write_delta(table_path, mode="append")  
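
Write only if a table does not already exist at the target path. A minimal sketch of the documented mode=”ignore” behaviour, reusing table_path from above; the call is silently skipped if the table is already present:

>>> df.write_delta(table_path, mode="ignore")  # no-op if the table exists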

Overwrite a Delta Lake table as a new version. Note: if the schema of the new and old data is the same, setting overwrite_schema is not required.

>>> existing_table_path = "/path/to/delta-table/"
>>> df.write_delta(
...     existing_table_path, mode="overwrite", overwrite_schema=True
... )  

Write DataFrame as a Delta Lake table on a cloud object store such as S3.

>>> table_path = "s3://bucket/prefix/to/delta-table/"
>>> df.write_delta(
...     table_path,
...     storage_options={
...         "AWS_REGION": "THE_AWS_REGION",
...         "AWS_ACCESS_KEY_ID": "THE_AWS_ACCESS_KEY_ID",
...         "AWS_SECRET_ACCESS_KEY": "THE_AWS_SECRET_ACCESS_KEY",
...     },
... )
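
Pass extra options to the underlying deltalake writer via delta_write_options. This is a sketch assuming the installed deltalake version supports a partition_by write option; the "ham" column of the DataFrame above is used as the partition column, and the local path is illustrative:

>>> table_path = "/path/to/partitioned-delta-table/"
>>> df.write_delta(
...     table_path,
...     delta_write_options={"partition_by": ["ham"]},
... )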