Readable stream containing CSV data.
Optional
options: Partial<ReadCsvOptions>
Maximum number of lines to read to infer schema. If set to 0, all columns will be read as pl.Utf8. If set to null, a full table scan will be done (slow).
Number of lines to read into the buffer at once. Modify this to change performance.
Indicates whether the first row of the dataset is a header. If set to false, the columns will be named column_x, with x being an enumeration over every column in the dataset.
Try to keep reading lines if some lines yield errors.
After n rows are read from the CSV, it stops reading. During multi-threaded parsing, an upper bound of n rows cannot be guaranteed.
Start reading after the startRows position.
Indices of columns to select. Note that column indices start at zero.
Character to use as delimiter in the file.
Columns to select.
Make sure that all columns are contiguous in memory by aggregating the chunks into a single array.
Allowed encodings: utf8, utf8-lossy. Lossy means that invalid UTF-8 values are replaced with the � character.
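To illustrate what "lossy" replacement means, here is a minimal sketch using Node's built-in TextDecoder. It is not how the library decodes input internally. It only shows the general behaviour of substituting the � (U+FFFD) character for invalid bytes, on the assumption that utf8-lossy behaves analogously.

```javascript
// Bytes for "a", one invalid UTF-8 byte (0xFF), then "b".
const bytes = Uint8Array.from([0x61, 0xff, 0x62]);

// Non-fatal decoding (the default) substitutes U+FFFD for invalid sequences.
const lossy = new TextDecoder("utf-8").decode(bytes);
console.log(lossy === "a\uFFFDb");

// Fatal decoding throws on the same input instead of substituting.
let strictFailed = false;
try {
  new TextDecoder("utf-8", { fatal: true }).decode(bytes);
} catch (err) {
  strictFailed = true;
}
console.log(strictFailed);
```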
Number of threads to use in CSV parsing. Defaults to the number of physical CPUs on your system.
Overwrite the dtypes during inference.
Reduce memory usage at the expense of performance.
Character that indicates the start of a comment line, for instance '#'.
Character that is used for CSV quoting, default = ''. Set to null to turn off special handling and escaping of quotes.
Values to interpret as null values. You can provide a:
- string -> all values encountered equal to this string will be null
- Array<string> -> A null value per column.
- Record<string,string> -> An object or map that maps column name to a null value string. Ex. {"column_1": 0}
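The three accepted shapes can be read as three lookup rules. The sketch below uses a hypothetical helper, isNullToken, which is not part of the library. It only illustrates the semantics described above, and it assumes the Array form is matched by column position.

```javascript
// Hypothetical helper (not a nodejs-polars function): decide whether a raw
// CSV field counts as null under one of the three accepted shapes.
function isNullToken(nullValues, columnName, columnIndex, rawValue) {
  if (typeof nullValues === "string") {
    // One token applied to every column.
    return rawValue === nullValues;
  }
  if (Array.isArray(nullValues)) {
    // One token per column, matched by position (an assumption here).
    return rawValue === nullValues[columnIndex];
  }
  if (nullValues !== null && typeof nullValues === "object") {
    // Token looked up by column name; columns without an entry never match.
    const hasEntry = Object.prototype.hasOwnProperty.call(nullValues, columnName);
    return hasEntry && rawValue === nullValues[columnName];
  }
  return false;
}

console.log(isNullToken("NA", "a", 0, "NA"));         // single string
console.log(isNullToken(["NA", "-"], "b", 1, "-"));   // per-column array
console.log(isNullToken({ a: "NA" }, "b", 1, "NA"));  // per-name map
```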
Whether to attempt to parse dates.
Promise
>>> const Stream = require("stream");
>>> const pl = require("nodejs-polars");
>>> const readStream = new Stream.Readable({read(){}});
>>> readStream.push(`a,b\n`);
>>> readStream.push(`1,2\n`);
>>> readStream.push(`2,2\n`);
>>> readStream.push(`3,2\n`);
>>> readStream.push(`4,2\n`);
>>> readStream.push(null);
>>> pl.readCSVStream(readStream).then(df => console.log(df));
shape: (4, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3   ┆ 2   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 4   ┆ 2   │
└─────┴─────┘
Read a stream into a DataFrame.
Warning: this is much slower than scanCSV or readCSV.
This will consume the entire stream into a single buffer and then call readCSV.
Only use it when you must consume from a stream, or when performance is not a major consideration.
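The buffering step described here can be pictured with a small sketch that uses only Node's standard library. collectStream is a hypothetical name and this is not the library's actual implementation. It only shows why the whole stream must be held in memory before any parsing can begin.

```javascript
const { Readable } = require("stream");

// Collect every chunk of a readable stream into one Buffer.
async function collectStream(stream) {
  const chunks = [];
  for await (const chunk of stream) {
    // Chunks may arrive as strings or Buffers depending on the stream setup.
    chunks.push(typeof chunk === "string" ? Buffer.from(chunk) : chunk);
  }
  return Buffer.concat(chunks);
}

// Usage: push a small CSV in pieces, then signal end-of-stream with null.
const csvStream = new Readable({ read() {} });
csvStream.push("a,b\n");
csvStream.push("1,2\n");
csvStream.push(null);

collectStream(csvStream).then((buf) => {
  // Only at this point is the full CSV text available to a parser.
  console.log(buf.toString("utf8"));
});
```

Because nothing can be parsed until the last chunk arrives, memory use grows with the size of the stream, which is consistent with the warning above.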