Input/output#

CSV#

read_csv(file[, has_header, columns, ...])

Read a CSV file into a DataFrame.

scan_csv(file[, has_header, sep, ...])

Lazily read from a CSV file or multiple files via glob patterns.

DataFrame.write_csv()

Write to a comma-separated values (CSV) file.

Feather/IPC#

read_ipc(file[, columns, n_rows, ...])

Read into a DataFrame from an Arrow IPC (Feather v2) file.

scan_ipc(file[, n_rows, cache, rechunk, ...])

Lazily read from an Arrow IPC (Feather v2) file or multiple files via glob patterns.

read_ipc_schema(file)

Get the schema of an IPC file without reading its data.

DataFrame.write_ipc(file[, compression])

Write to an Arrow IPC binary stream or a Feather file.

Parquet#

read_parquet(source[, columns, n_rows, ...])

Read into a DataFrame from a Parquet file.

scan_parquet(file[, n_rows, cache, ...])

Lazily read from a Parquet file or multiple files via glob patterns.

read_parquet_schema(file)

Get the schema of a Parquet file without reading its data.

DataFrame.write_parquet(file, *[, ...])

Write to an Apache Parquet file.

SQL#

read_sql(sql, connection_uri[, ...])

Read a SQL query into a DataFrame.

JSON#

read_json(file[, json_lines])

Read into a DataFrame from a JSON file.

read_ndjson(file)

Read into a DataFrame from a newline-delimited JSON file.

scan_ndjson(file[, infer_schema_length, ...])

Lazily read from a newline-delimited JSON file.

DataFrame.write_json()

Serialize to JSON representation.

DataFrame.write_ndjson()

Serialize to newline-delimited JSON representation.

Avro#

read_avro(file[, columns, n_rows])

Read into a DataFrame from Apache Avro format.

DataFrame.write_avro(file[, compression])

Write to an Apache Avro file.

Excel#

read_excel()

Read an Excel (XLSX) sheet into a DataFrame.

Datasets#

Connect to pyarrow datasets.

scan_ds(ds)

Scan a pyarrow dataset.