New LazyFrame from CSV
Description
Read a file from path into a polars LazyFrame.
Usage
pl_scan_csv(
source,
...,
has_header = TRUE,
separator = ",",
comment_prefix = NULL,
quote_char = "\"",
skip_rows = 0,
dtypes = NULL,
null_values = NULL,
ignore_errors = FALSE,
cache = FALSE,
infer_schema_length = 100,
n_rows = NULL,
encoding = "utf8",
low_memory = FALSE,
rechunk = TRUE,
skip_rows_after_header = 0,
row_index_name = NULL,
row_index_offset = 0,
try_parse_dates = FALSE,
eol_char = "\n",
raise_if_empty = TRUE,
truncate_ragged_lines = FALSE,
reuse_downloaded = TRUE,
include_file_paths = NULL
)
Arguments
source
|
Path to a file or URL. Multiple paths can be given as long as all CSV files have the same schema. Multiple URLs are not supported. |
...
|
Ignored. |
has_header
|
Indicate if the first row of the dataset is a header or not. If
FALSE, column names will be autogenerated in the following
format: "column_x", x being an enumeration over
every column in the dataset, starting at 1.
|
separator
|
Single byte character to use as separator in the file. |
comment_prefix
|
A string, which can be up to 5 symbols in length, used to indicate the
start of a comment line. For instance, it can be set to
# or //.
|
quote_char
|
Single byte character used for quoting. Set to NULL to turn
off special handling and escaping of quotes.
|
skip_rows
|
Start reading after a particular number of rows. The header will be parsed at this offset. |
dtypes
|
Named list mapping column names to dtypes, or dtypes to column names. This list
is used while reading to overwrite dtypes. Supported types so far are:
|
null_values
|
Values to interpret as NA values. Can be:
|
ignore_errors
|
Keep reading the file even if some lines yield errors. You can also use
infer_schema_length = 0 to read all columns as UTF-8 strings and
check which values might cause an issue.
|
cache
|
Cache the result after reading. |
infer_schema_length
|
Maximum number of rows to read to infer the column types. If set to 0,
all columns will be read as UTF-8. If NULL, a full table
scan will be done (slow).
|
n_rows
|
Maximum number of rows to read. |
encoding
|
Either "utf8" or "utf8-lossy". Lossy means
that invalid UTF-8 values are replaced with "?" characters.
|
low_memory
|
Reduce memory usage (will yield a lower performance). |
rechunk
|
Reallocate to contiguous memory when all chunks / files are parsed. |
skip_rows_after_header
|
Parse the first row as headers, and then skip this number of rows. |
row_index_name
|
If not NULL, this will insert a row index column with the
given name into the DataFrame.
|
row_index_offset
|
Offset to start the row index column (only used if the name is set). |
try_parse_dates
|
Try to automatically parse dates. Most ISO8601-like formats can be
inferred, as well as a handful of others. If this does not succeed, the
column remains of data type pl$String.
|
eol_char
|
Single byte end of line character (default: "\n"). When encountering a file with
Windows line endings ("\r\n"), one can go with the default "\n". The extra
"\r" will be removed when processed.
|
raise_if_empty
|
If FALSE, parsing an empty file returns an empty DataFrame
or LazyFrame.
|
truncate_ragged_lines
|
Truncate lines that are longer than the schema. |
reuse_downloaded
|
If TRUE (default) and a URL was provided, cache the
downloaded files for the session so they can be reused.
|
include_file_paths
|
Include the path of the source file(s) as a column with this name. |
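As a rough illustration of how a few of these arguments combine, the sketch below scans a headerless, semicolon-separated file and adds a row index. The file contents and the names path, lf, df, and "idx" are made up for the example.

```r
library("polars")

# Hypothetical input: three rows, no header line, ";" as separator.
path = tempfile(fileext = ".csv")
writeLines(c("1;a", "2;b", "3;c"), path)

lf = pl$scan_csv(
  path,
  has_header = FALSE,      # column names are autogenerated
  separator = ";",
  row_index_name = "idx",  # add a row index column named "idx"
  row_index_offset = 1     # start the index at 1 instead of 0
)

df = lf$collect()
df$columns
```

With has_header = FALSE, the data columns should be named column_1 and column_2, and the row index column is placed in front of them.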
Value
LazyFrame
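Because the return value is a LazyFrame, further operations can be chained before any rows are materialized. The following is a minimal sketch of that pattern, assuming the iris dataset is written to a temporary file; the filter threshold is arbitrary.

```r
library("polars")

path = tempfile(fileext = ".csv")
write.csv(iris, path, row.names = FALSE)

# No rows are materialized here: this only builds a lazy query.
lf = pl$scan_csv(path)$filter(pl$col("Sepal.Length") > 7)

# collect() runs the query and returns a DataFrame.
df = lf$collect()
df$height
```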
Examples
library("polars")
my_file = tempfile()
write.csv(iris, my_file)
lazy_frame = pl$scan_csv(my_file)
lazy_frame$collect()
#> shape: (150, 6)
#> ┌─────┬──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
#> │ ┆ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species │
#> │ --- ┆ --- ┆ --- ┆ --- ┆ --- ┆ --- │
#> │ i64 ┆ f64 ┆ f64 ┆ f64 ┆ f64 ┆ str │
#> ╞═════╪══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
#> │ 1 ┆ 5.1 ┆ 3.5 ┆ 1.4 ┆ 0.2 ┆ setosa │
#> │ 2 ┆ 4.9 ┆ 3.0 ┆ 1.4 ┆ 0.2 ┆ setosa │
#> │ 3 ┆ 4.7 ┆ 3.2 ┆ 1.3 ┆ 0.2 ┆ setosa │
#> │ 4 ┆ 4.6 ┆ 3.1 ┆ 1.5 ┆ 0.2 ┆ setosa │
#> │ 5 ┆ 5.0 ┆ 3.6 ┆ 1.4 ┆ 0.2 ┆ setosa │
#> │ … ┆ … ┆ … ┆ … ┆ … ┆ … │
#> │ 146 ┆ 6.7 ┆ 3.0 ┆ 5.2 ┆ 2.3 ┆ virginica │
#> │ 147 ┆ 6.3 ┆ 2.5 ┆ 5.0 ┆ 1.9 ┆ virginica │
#> │ 148 ┆ 6.5 ┆ 3.0 ┆ 5.2 ┆ 2.0 ┆ virginica │
#> │ 149 ┆ 6.2 ┆ 3.4 ┆ 5.4 ┆ 2.3 ┆ virginica │
#> │ 150 ┆ 5.9 ┆ 3.0 ┆ 5.1 ┆ 1.8 ┆ virginica │
#> └─────┴──────────────┴─────────────┴──────────────┴─────────────┴───────────┘
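The infer_schema_length = 0 approach described under ignore_errors can be sketched as follows. The file contents are invented for the example, and the base R check for non-numeric entries is only one possible way to find problematic values.

```r
library("polars")

# Hypothetical file in which one entry of `value` is not a number.
path = tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,oops", "3,30"), path)

# infer_schema_length = 0 skips type inference: every column is read as a string.
raw = as.data.frame(pl$scan_csv(path, infer_schema_length = 0)$collect())

# Keep the rows whose `value` entry cannot be converted to a number.
bad = raw[is.na(suppressWarnings(as.numeric(raw$value))), ]
bad
```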