Input/output#

Avro#

read_avro(source, *[, columns, n_rows])

Read into a DataFrame from Apache Avro format.

DataFrame.write_avro(file[, compression, name])

Write to an Apache Avro file.

Clipboard#

read_clipboard([separator])

Read text from the clipboard and pass it to read_csv.

DataFrame.write_clipboard(*[, separator])

Copy the DataFrame in CSV format to the system clipboard using write_csv.

CSV#

read_csv(source, *[, has_header, columns, ...])

Read a CSV file into a DataFrame.

read_csv_batched(source, *[, has_header, ...])

Read a CSV file in batches.

scan_csv(source, *[, has_header, separator, ...])

Lazily read from a CSV file or multiple files via glob patterns.

DataFrame.write_csv([file, include_bom, ...])

Write to a comma-separated values (CSV) file.

LazyFrame.sink_csv(path, *[, include_bom, ...])

Evaluate the query in streaming mode and write to a CSV file.

BatchedCsvReader.next_batches(n)

Read n batches from the reader.

Database#

read_database(query, connection, *[, ...])

Read the results of a SQL query into a DataFrame, given a connection object.

read_database_uri(query, uri, *[, ...])

Read the results of a SQL query into a DataFrame, given a URI.

DataFrame.write_database(table_name, ...[, ...])

Write a Polars DataFrame to a database.

Delta Lake#

read_delta(source, *[, version, columns, ...])

Read into a DataFrame from a Delta Lake table.

scan_delta(source, *[, version, ...])

Lazily read from a Delta Lake table.

DataFrame.write_delta(target, *[, mode, ...])

Write the DataFrame as a Delta Lake table.

Excel / ODS#

read_excel(source, *[, sheet_id, ...])

Read Excel spreadsheet data into a DataFrame.

read_ods(source, *[, sheet_id, sheet_name, ...])

Read OpenOffice (ODS) spreadsheet data into a DataFrame.

DataFrame.write_excel([workbook, worksheet, ...])

Write frame data to a table in an Excel workbook/worksheet.

Feather / IPC#

read_ipc(source, *[, columns, n_rows, ...])

Read into a DataFrame from an Arrow IPC (Feather v2) file.

read_ipc_schema(source)

Get the schema of an IPC file without reading data.

read_ipc_stream(source, *[, columns, ...])

Read into a DataFrame from an Arrow IPC record batch stream.

scan_ipc(source, *[, n_rows, cache, ...])

Lazily read from an Arrow IPC (Feather v2) file or multiple files via glob patterns.

DataFrame.write_ipc(file[, compression, future])

Write to an Arrow IPC binary stream or Feather file.

DataFrame.write_ipc_stream(file[, compression])

Write to an Arrow IPC record batch stream.

LazyFrame.sink_ipc(path, *[, compression, ...])

Evaluate the query in streaming mode and write to an IPC file.

Iceberg#

scan_iceberg(source, *[, storage_options])

Lazily read from an Apache Iceberg table.

JSON#

read_json(source, *[, schema, ...])

Read into a DataFrame from a JSON file.

read_ndjson(source, *[, schema, ...])

Read into a DataFrame from a newline-delimited JSON file.

scan_ndjson(source, *[, schema, ...])

Lazily read from a newline-delimited JSON file or multiple files via glob patterns.

DataFrame.write_json([file, pretty, ...])

Serialize to JSON representation.

DataFrame.write_ndjson([file])

Serialize to newline-delimited JSON representation.

LazyFrame.sink_ndjson(path, *[, ...])

Evaluate the query in streaming mode and write to an NDJSON file.

Parquet#

read_parquet(source, *[, columns, n_rows, ...])

Read into a DataFrame from a Parquet file.

read_parquet_schema(source)

Get the schema of a Parquet file without reading data.

scan_parquet(source, *[, n_rows, ...])

Lazily read from a local or cloud-hosted Parquet file (or files).

DataFrame.write_parquet(file, *[, ...])

Write to an Apache Parquet file.

LazyFrame.sink_parquet(path, *[, ...])

Evaluate the query in streaming mode and write to a Parquet file.

PyArrow Datasets#

Connect to pyarrow datasets.

scan_pyarrow_dataset(source, *[, ...])

Scan a pyarrow dataset.