polars.DataFrame.write_parquet

DataFrame.write_parquet(file: str | Path | BytesIO, *, compression: ParquetCompression = 'zstd', compression_level: int | None = None, statistics: bool = False, row_group_size: int | None = None, use_pyarrow: bool = False, pyarrow_options: dict[str, object] | None = None) → None

Write to an Apache Parquet file.

Parameters:
file

File path or BytesIO buffer to which the data will be written.

compression{‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}

Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for more backwards compatibility guarantees when dealing with older parquet readers.

compression_level

The level of compression to use. A higher level yields smaller files on disk at the cost of slower writes. Supported ranges are listed below, followed by a short sketch.

  • “gzip” : min-level: 0, max-level: 10.

  • “brotli” : min-level: 0, max-level: 11.

  • “zstd” : min-level: 1, max-level: 22.
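
For instance, a higher zstd level can be passed to trade write speed for a smaller file (a minimal sketch; the file name, data, and level are illustrative):

>>> import polars as pl
>>> df = pl.DataFrame({"a": [1, 2, 3]})
>>> df.write_parquet("compressed.parquet", compression="zstd", compression_level=10)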

statistics

Write statistics to the parquet headers. This requires extra compute.

row_group_size

Size of the row groups in number of rows. Defaults to 512^2 (262,144) rows.
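
As a sketch of how these two options combine (the column, path, and group size are illustrative, not recommendations), a taller frame can be split into fixed-size row groups with statistics enabled:

>>> import polars as pl
>>> tall = pl.DataFrame({"x": range(1_000_000)})
>>> tall.write_parquet(
...     "stats.parquet",
...     statistics=True,
...     row_group_size=100_000,  # ten row groups of 100,000 rows each
... )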

use_pyarrow

Use the C++ (pyarrow) parquet implementation instead of the Rust implementation. At the moment the C++ implementation supports more features.

pyarrow_options

Arguments passed to pyarrow.parquet.write_table.
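
For example, options understood by pyarrow.parquet.write_table, such as coerce_timestamps, can be forwarded this way (a hedged sketch; the data and file name are illustrative):

>>> from datetime import datetime
>>> import polars as pl
>>> df = pl.DataFrame({"ts": [datetime(2023, 1, 1), datetime(2023, 1, 2)]})
>>> df.write_parquet(
...     "pyarrow_file.parquet",
...     use_pyarrow=True,
...     pyarrow_options={"coerce_timestamps": "ms"},
... )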

Examples

>>> import pathlib
>>> import polars as pl
>>>
>>> df = pl.DataFrame(
...     {
...         "foo": [1, 2, 3, 4, 5],
...         "bar": [6, 7, 8, 9, 10],
...         "ham": ["a", "b", "c", "d", "e"],
...     }
... )
>>> dirpath = pathlib.Path(".")  # directory to write into
>>> path: pathlib.Path = dirpath / "new_file.parquet"
>>> df.write_parquet(path)
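
The written file can be read back with pl.read_parquet as a quick round-trip check (a minimal sketch):

>>> df2 = pl.read_parquet(path)
>>> df2.shape
(5, 3)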