New LazyFrame from CSV

Description

Lazily read a CSV file from a path into a polars LazyFrame. The file is only parsed when the query is executed, for example with `$collect()`.
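Because the scan is lazy, filters and column selections can be added to the query before any data is read, and only the final result is materialized by `$collect()`. A minimal sketch, assuming the polars R package is installed; `$filter()`, `$select()`, `pl$col()` and `$to_data_frame()` are general polars R methods used here for illustration and are not described on this page.

```r
library("polars")

my_file = tempfile(fileext = ".csv")
write.csv(iris, my_file, row.names = FALSE)

# Nothing is parsed yet: this only builds a query plan
lazy_frame = pl$scan_csv(my_file)$
  filter(pl$col("Species") == "setosa")$
  select("Sepal.Length", "Species")

# The CSV file is read when the query is executed
df = lazy_frame$collect()$to_data_frame()
head(df)

unlink(my_file)
```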

Usage

pl_scan_csv(
  source,
  ...,
  has_header = TRUE,
  separator = ",",
  comment_prefix = NULL,
  quote_char = "\"",
  skip_rows = 0,
  dtypes = NULL,
  null_values = NULL,
  ignore_errors = FALSE,
  cache = FALSE,
  infer_schema_length = 100,
  n_rows = NULL,
  encoding = "utf8",
  low_memory = FALSE,
  rechunk = TRUE,
  skip_rows_after_header = 0,
  row_index_name = NULL,
  row_index_offset = 0,
  try_parse_dates = FALSE,
  eol_char = "\n",
  raise_if_empty = TRUE,
  truncate_ragged_lines = FALSE,
  reuse_downloaded = TRUE,
  include_file_paths = NULL
)

Arguments

`source`	Path to a file or URL. Multiple paths can be provided as long as all CSV files have the same schema. Multiple URLs are not supported.
`…`	Ignored.
`has_header`	Indicate whether the first row of the dataset is a header. If `FALSE`, column names are autogenerated in the format `"column_x"`, where `x` enumerates the columns starting at 1.
`separator`	Single byte character to use as separator in the file.
`comment_prefix`	A string of up to 5 characters that marks the start of a comment line, for instance `#` or `//`.
`quote_char`	Single byte character used for quoting. Set to `NULL` to turn off special handling and escaping of quotes.
`skip_rows`	Start reading after a particular number of rows. The header will be parsed at this offset.
`dtypes`	Named list used to override column data types while reading. Each element maps either a column name to a data type, or a data type to column names. Supported types so far are: "Boolean" or "logical" for DataType::Boolean, "Categorical" or "factor" for DataType::Categorical, "Float32" or "double" for DataType::Float32, "Float64" or "float64" for DataType::Float64, "Int32" or "integer" for DataType::Int32, "Int64" or "integer64" for DataType::Int64, and "String" or "character" for DataType::String.
`null_values`	Values to interpret as `NA`. Either a character vector, in which case every value matching one of its elements becomes `NA` in any column, or a named list mapping column names to the null values for that column.
`ignore_errors`	Keep reading the file even if some lines yield errors. To find out which values cause an issue, you can also set `infer_schema_length = 0`, which reads all columns as strings.
`cache`	Cache the result after reading.
`infer_schema_length`	Maximum number of rows to read to infer the column types. If set to 0, all columns are read as strings (`pl$String`). If `NULL`, the full table is scanned, which is slow.
`n_rows`	Maximum number of rows to read.
`encoding`	Either `“utf8”` or `“utf8-lossy”`. Lossy means that invalid UTF8 values are replaced with "?" characters.
`low_memory`	Reduce memory usage at the cost of performance.
`rechunk`	Reallocate to contiguous memory when all chunks / files are parsed.
`skip_rows_after_header`	Number of rows to skip after the header row has been parsed.
`row_index_name`	If not `NULL`, insert a row index column with this name into the result.
`row_index_offset`	Offset to start the row index column (only used if the name is set).
`try_parse_dates`	Try to automatically parse dates. Most ISO8601-like formats can be inferred, as well as a handful of others. If this does not succeed, the column remains of data type `pl$String`.
`eol_char`	Single byte end-of-line character (default: `"\n"`). For files with Windows line endings (`"\r\n"`), the default `"\n"` can be kept; the extra `"\r"` is removed during parsing.
`raise_if_empty`	If `TRUE` (default), an error is raised when the source contains no data. If `FALSE`, parsing an empty file returns an empty DataFrame or LazyFrame instead.
`truncate_ragged_lines`	Truncate lines that are longer than the schema.
`reuse_downloaded`	If `TRUE` (default) and a URL was provided, keep the downloaded file cached for the rest of the session so it can be reused.
`include_file_paths`	Include the path of the source file(s) as a column with this name.
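The `source` and `include_file_paths` arguments can be combined to scan several files with the same schema and record where each row came from. A minimal sketch, assuming the polars R package is installed and that several paths are passed as a character vector; the column name `source_file` is an arbitrary choice, and `$to_data_frame()` only converts the result to a base R data frame for inspection.

```r
library("polars")

# Two small files with the same schema (columns a and b)
f1 = tempfile(fileext = ".csv")
f2 = tempfile(fileext = ".csv")
writeLines(c("a,b", "1,x", "2,y"), f1)
writeLines(c("a,b", "3,z"), f2)

# Assumption: several paths are given as a character vector.
# include_file_paths adds a column holding the path of each row's file.
df = pl$scan_csv(c(f1, f2), include_file_paths = "source_file")$
  collect()$
  to_data_frame()
df

unlink(c(f1, f2))
```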
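The layout arguments `skip_rows`, `comment_prefix`, `has_header` and `eol_char` describe how the raw lines are turned into rows. A minimal sketch, assuming the polars R package is installed; the file contents are made up for illustration, and `$to_data_frame()` converts each result to a base R data frame.

```r
library("polars")

my_file = tempfile(fileext = ".csv")
# One line of metadata, a header, and a comment line between the data rows
writeLines(
  c("exported on 2024-01-01", "a,b", "1,x", "# checked by hand", "2,y"),
  my_file
)

# Skip the metadata line, then parse the header; drop lines starting with "#"
df = pl$scan_csv(my_file, skip_rows = 1, comment_prefix = "#")$
  collect()$
  to_data_frame()

# Without a header line, column names are autogenerated
writeLines(c("1,x", "2,y"), my_file)
no_header = pl$scan_csv(my_file, has_header = FALSE)$collect()$to_data_frame()
names(no_header)  # "column_1" "column_2"

# Windows line endings are handled with the default eol_char = "\n"
writeBin(charToRaw("a,b\r\n1,x\r\n2,y\r\n"), my_file)
crlf = pl$scan_csv(my_file)$collect()$to_data_frame()

unlink(my_file)
```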
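The row-selection arguments `skip_rows_after_header` and `n_rows`, and the row index arguments `row_index_name` and `row_index_offset`, control which rows are kept and how they are numbered. A minimal sketch, assuming the polars R package is installed; the index column name `id` is an arbitrary choice.

```r
library("polars")

my_file = tempfile(fileext = ".csv")
writeLines(c("value", "a", "b", "c", "d", "e"), my_file)

# Parse the header, then skip the first data row ("a")
skipped = pl$scan_csv(my_file, skip_rows_after_header = 1)$
  collect()$
  to_data_frame()

# Keep at most 3 rows and prepend an "id" column counting from 1 instead of 0
indexed = pl$scan_csv(
  my_file,
  n_rows = 3,
  row_index_name = "id",
  row_index_offset = 1
)$collect()$to_data_frame()
indexed

unlink(my_file)
```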
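The `dtypes` and `infer_schema_length` arguments decide which data types the columns receive. A minimal sketch, assuming the polars R package is installed; the identifier column with leading zeros is a made-up case where type inference is not what you want.

```r
library("polars")

my_file = tempfile(fileext = ".csv")
writeLines(c("id,score", "001,1.5", "002,2.5"), my_file)

# Inferred types: "id" becomes an integer column and loses its leading zeros
inferred = pl$scan_csv(my_file)$collect()$to_data_frame()

# Override the type of "id" while reading
typed = pl$scan_csv(my_file, dtypes = list(id = "String"))$
  collect()$
  to_data_frame()

# Skip inference entirely: every column is read as a string
as_text = pl$scan_csv(my_file, infer_schema_length = 0)$
  collect()$
  to_data_frame()

unlink(my_file)
```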
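The two forms accepted by `null_values` differ in scope: a character vector applies to every column, while a named list applies only to the listed columns. A minimal sketch, assuming the polars R package is installed; the marker string `"missing"` is made up for illustration.

```r
library("polars")

my_file = tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "missing,b", "3,missing"), my_file)

# Character vector: "missing" becomes NA in every column
all_cols = pl$scan_csv(my_file, null_values = "missing")$
  collect()$
  to_data_frame()

# Named list: "missing" becomes NA in column x only
x_only = pl$scan_csv(my_file, null_values = list(x = "missing"))$
  collect()$
  to_data_frame()

unlink(my_file)
```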
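With `try_parse_dates = TRUE`, ISO 8601 dates are parsed while reading instead of staying as strings. A minimal sketch, assuming the polars R package is installed and that polars `Date` columns convert to R `Date` vectors via `$to_data_frame()`.

```r
library("polars")

my_file = tempfile(fileext = ".csv")
writeLines(c("day,value", "2024-01-15,1", "2024-02-01,2"), my_file)

# Default: the "day" column stays a string column
plain = pl$scan_csv(my_file)$collect()$to_data_frame()

# Try to parse date-like strings while reading
parsed = pl$scan_csv(my_file, try_parse_dates = TRUE)$
  collect()$
  to_data_frame()

unlink(my_file)
```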

Value

LazyFrame

Examples

library("polars")

my_file = tempfile()
write.csv(iris, my_file)
lazy_frame = pl$scan_csv(my_file)
lazy_frame$collect()

#> shape: (150, 6)
#> ┌─────┬──────────────┬─────────────┬──────────────┬─────────────┬───────────┐
#> │     ┆ Sepal.Length ┆ Sepal.Width ┆ Petal.Length ┆ Petal.Width ┆ Species   │
#> │ --- ┆ ---          ┆ ---         ┆ ---          ┆ ---         ┆ ---       │
#> │ i64 ┆ f64          ┆ f64         ┆ f64          ┆ f64         ┆ str       │
#> ╞═════╪══════════════╪═════════════╪══════════════╪═════════════╪═══════════╡
#> │ 1   ┆ 5.1          ┆ 3.5         ┆ 1.4          ┆ 0.2         ┆ setosa    │
#> │ 2   ┆ 4.9          ┆ 3.0         ┆ 1.4          ┆ 0.2         ┆ setosa    │
#> │ 3   ┆ 4.7          ┆ 3.2         ┆ 1.3          ┆ 0.2         ┆ setosa    │
#> │ 4   ┆ 4.6          ┆ 3.1         ┆ 1.5          ┆ 0.2         ┆ setosa    │
#> │ 5   ┆ 5.0          ┆ 3.6         ┆ 1.4          ┆ 0.2         ┆ setosa    │
#> │ …   ┆ …            ┆ …           ┆ …            ┆ …           ┆ …         │
#> │ 146 ┆ 6.7          ┆ 3.0         ┆ 5.2          ┆ 2.3         ┆ virginica │
#> │ 147 ┆ 6.3          ┆ 2.5         ┆ 5.0          ┆ 1.9         ┆ virginica │
#> │ 148 ┆ 6.5          ┆ 3.0         ┆ 5.2          ┆ 2.0         ┆ virginica │
#> │ 149 ┆ 6.2          ┆ 3.4         ┆ 5.4          ┆ 2.3         ┆ virginica │
#> │ 150 ┆ 5.9          ┆ 3.0         ┆ 5.1          ┆ 1.8         ┆ virginica │
#> └─────┴──────────────┴─────────────┴──────────────┴─────────────┴───────────┘

unlink(my_file)