polars.DataFrame.write_parquet#

DataFrame.write_parquet(file: str | Path | BytesIO, *, compression: ParquetCompression = 'zstd', compression_level: int | None = None, statistics: bool = False, row_group_size: int | None = None, use_pyarrow: bool = False, pyarrow_options: dict[str, object] | None = None) None[source]#

Write to Apache Parquet file.

Parameters:
file

File path to which the file should be written.

compression{‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}

Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for more backwards compatibility guarantees when you deal with older parquet readers. Method “uncompressed” is not supported by pyarrow.

compression_level

The level of compression to use. Higher compression means smaller files on disk.

  • “gzip” : min-level: 0, max-level: 10.

  • “brotli” : min-level: 0, max-level: 11.

  • “zstd” : min-level: 1, max-level: 22.

statistics

Write statistics to the parquet headers. This requires extra compute.

row_group_size

Size of the row groups in number of rows. If None (default), the chunks of the DataFrame are used. Writing in smaller chunks may reduce memory pressure and improve writing speeds. This argument has no effect if ‘pyarrow’ is used.

use_pyarrow

Use C++ parquet implementation vs rust parquet implementation. At the moment C++ supports more features.

pyarrow_options

Arguments passed to pyarrow.parquet.write_table.