DataFrame.write_parquet(file: str | Path | BytesIO, *, compression: ParquetCompression = 'zstd', compression_level: int | None = None, statistics: bool = False, row_group_size: int | None = None, use_pyarrow: bool = False, pyarrow_options: dict[str, object] | None = None) -> None

Write to Apache Parquet file.


file : str | Path | BytesIO

File path or writable in-memory buffer to which the file should be written.

compression : {‘lz4’, ‘uncompressed’, ‘snappy’, ‘gzip’, ‘lzo’, ‘brotli’, ‘zstd’}

Choose “zstd” for good compression performance. Choose “lz4” for fast compression/decompression. Choose “snappy” for better backwards compatibility with older Parquet readers. Method “uncompressed” is not supported by pyarrow.


compression_level : int | None

The level of compression to use. Higher compression means smaller files on disk.

  • “gzip” : min-level: 0, max-level: 10.

  • “brotli” : min-level: 0, max-level: 11.

  • “zstd” : min-level: 1, max-level: 22.


statistics : bool

Write statistics to the parquet headers. This requires extra compute.


row_group_size : int | None

Size of the row groups in number of rows. If None (default), the chunks of the DataFrame are used. Writing in smaller chunks may reduce memory pressure and improve writing speeds. This argument has no effect if ‘pyarrow’ is used.


use_pyarrow : bool

Use the C++ Parquet implementation (via pyarrow) instead of the Rust Parquet implementation. At the moment the C++ implementation supports more features.


pyarrow_options : dict[str, object] | None

Arguments passed to pyarrow.parquet.write_table.