Path to a file.
Optional options: Partial<ScanCsvOptions>.
Indicate whether the first row of the dataset is a header. If set to false, column names will be set to column_x, x being an enumeration over every column in the dataset.
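The column_x naming scheme used when the header row is disabled can be sketched with a hypothetical helper (not part of the polars API):

```typescript
// Hypothetical helper illustrating the default naming scheme applied
// when the first row is not treated as a header: column_1, column_2, ...
function defaultColumnNames(width: number): string[] {
  return Array.from({ length: width }, (_, i) => `column_${i + 1}`);
}

// A 3-column dataset without a header row gets these names:
console.log(defaultColumnNames(3)); // [ 'column_1', 'column_2', 'column_3' ]
```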
Character to use as delimiter in the file.
Character that indicates the start of a comment line, for instance '#'.
Character used for CSV quoting, default = '"'. Set to null to turn off special handling and escaping of quotes.
Start reading after skipRows rows.
Values to interpret as null values. You can provide:
- string -> all values encountered equal to this string will be null.
- Array<string> -> a null value per column.
- Record<string, string> -> an object or map that maps column name to a null value string, e.g. {"column_1": "0"}.
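The three accepted shapes can be illustrated with a small sketch (a hypothetical resolver, not the library's implementation), assuming the record form matches columns by name and the array form by position:

```typescript
type NullValues = string | string[] | Record<string, string>;

// Hypothetical resolver: given the nullValues option, decide whether a raw
// cell in a given column should be interpreted as null.
function isNullValue(
  spec: NullValues,
  cell: string,
  columnName: string,
  columnIndex: number,
): boolean {
  if (typeof spec === "string") return cell === spec;          // one string for all columns
  if (Array.isArray(spec)) return cell === spec[columnIndex];  // one null value per column
  return spec[columnName] === cell;                            // per-column mapping
}

console.log(isNullValue("NA", "NA", "a", 0));                    // true
console.log(isNullValue(["-", "NA"], "NA", "b", 1));             // true
console.log(isNullValue({ column_1: "0" }, "0", "column_1", 0)); // true
```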
Try to keep reading lines if some lines yield errors.
Cache the result after reading.
Maximum number of lines to read to infer schema. If set to 0, all columns will be read as pl.Utf8. If set to null, a full table scan will be done (slow).
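The inference behavior can be sketched as follows (simplified, hypothetical logic; the real parser handles many more types and edge cases):

```typescript
// Simplified sketch of schema inference: look at up to `inferSchemaLength`
// rows and pick Int64 / Float64 / Utf8 per column; 0 means skip inference
// and read everything as Utf8.
type Dtype = "Int64" | "Float64" | "Utf8";

function inferDtypes(rows: string[][], inferSchemaLength: number): Dtype[] {
  const width = rows[0]?.length ?? 0;
  if (inferSchemaLength === 0) return Array(width).fill("Utf8");
  const sample = rows.slice(0, inferSchemaLength);
  return Array.from({ length: width }, (_, col) => {
    const values = sample.map((r) => r[col]);
    if (values.every((v) => /^-?\d+$/.test(v))) return "Int64";
    if (values.every((v) => /^-?\d+(\.\d+)?$/.test(v))) return "Float64";
    return "Utf8";
  });
}

console.log(inferDtypes([["1", "1.5", "x"], ["2", "3", "y"]], 100));
// [ 'Int64', 'Float64', 'Utf8' ]
```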
After n rows are read from the CSV, it stops reading. During multi-threaded parsing, an upper bound of n rows cannot be guaranteed.
Make sure that all columns are contiguous in memory by aggregating the chunks into a single array.
Reduce memory usage at the expense of performance.
Lazily read from a CSV file or multiple files via glob patterns.
This allows the query optimizer to push down predicates and projections to the scan level, thereby potentially reducing memory overhead.