New DataFrame from CSV

Description

New DataFrame from CSV

Usage

pl_read_csv(
  source,
  ...,
  has_header = TRUE,
  separator = ",",
  comment_prefix = NULL,
  quote_char = "\"",
  skip_rows = 0,
  dtypes = NULL,
  null_values = NULL,
  ignore_errors = FALSE,
  cache = FALSE,
  infer_schema_length = 100,
  n_rows = NULL,
  encoding = "utf8",
  low_memory = FALSE,
  rechunk = TRUE,
  skip_rows_after_header = 0,
  row_index_name = NULL,
  row_index_offset = 0,
  try_parse_dates = FALSE,
  eol_char = "\n",
  raise_if_empty = TRUE,
  truncate_ragged_lines = FALSE,
  reuse_downloaded = TRUE,
  include_file_paths = NULL
)

Arguments

source Path to a file or URL. Multiple paths can be provided as long as all CSV files have the same schema. Providing several URLs is not supported.
... Ignored.
has_header Indicate whether the first row of the dataset is a header. If FALSE, column names will be autogenerated in the format "column_x", where x enumerates the columns in the dataset starting at 1.
separator Single byte character to use as separator in the file.
comment_prefix A string, up to 5 symbols in length, used to indicate the start of a comment line. For instance, it can be set to # or //.
quote_char Single byte character used for quoting. Set to NULL to turn off special handling and escaping of quotes.
skip_rows Start reading after a particular number of rows. The header will be parsed at this offset.
dtypes Named list mapping column names to dtypes, or dtypes to column names. This list is used while reading to override the inferred dtypes. Supported types so far are:
  • "Boolean" or "logical" for DataType::Boolean,
  • "Categorical" or "factor" for DataType::Categorical,
  • "Float32" or "double" for DataType::Float32,
  • "Float64" or "float64" for DataType::Float64,
  • "Int32" or "integer" for DataType::Int32,
  • "Int64" or "integer64" for DataType::Int64,
  • "String" or "character" for DataType::String.
null_values Values to interpret as NA values. Can be:
  • a character vector: all values that match one of the values in this vector will be NA;
  • a named list with column names and null values.
ignore_errors Keep reading the file even if some lines yield errors. You can also use infer_schema_length = 0 to read all columns as UTF8 to check which values might cause an issue.
cache Cache the result after reading.
infer_schema_length Maximum number of rows to read to infer the column types. If set to 0, all columns will be read as UTF-8. If NULL, a full table scan will be done (slow).
n_rows Maximum number of rows to read.
encoding Either “utf8” or “utf8-lossy”. Lossy means that invalid UTF8 values are replaced with "?" characters.
low_memory Reduce memory usage (will yield a lower performance).
rechunk Reallocate to contiguous memory when all chunks / files are parsed.
skip_rows_after_header Parse the first row as headers, and then skip this number of rows.
row_index_name If not NULL, this will insert a row index column with the given name into the DataFrame.
row_index_offset Offset to start the row index column (only used if the name is set).
try_parse_dates Try to automatically parse dates. Most ISO8601-like formats can be inferred, as well as a handful of others. If this does not succeed, the column remains of data type pl$String.
eol_char Single byte end of line character (default: "\n"). When encountering a file with Windows line endings ("\r\n"), one can go with the default "\n". The extra "\r" will be removed when processed.
raise_if_empty If FALSE, parsing an empty file returns an empty DataFrame or LazyFrame.
truncate_ragged_lines Truncate lines that are longer than the schema.
reuse_downloaded If TRUE (default) and a URL was provided, cache the downloaded files for the duration of the session so they can be reused easily.
include_file_paths Include the path of the source file(s) as a column with this name.
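
As an illustration of how several of the arguments above combine, here is a minimal sketch. It assumes the function is called as pl$read_csv() after attaching the polars R package; the temporary file and its contents are made up for the example.

```r
library(polars)

# Hypothetical input: no header row, ";" as separator, one comment line.
path_opts <- tempfile(fileext = ".csv")
writeLines(c(
  "# exported by a hypothetical tool",
  "1;a",
  "2;b",
  "3;c"
), path_opts)

df_opts <- pl$read_csv(
  path_opts,
  has_header = FALSE,      # column names are autogenerated
  separator = ";",
  comment_prefix = "#",    # lines starting with "#" are skipped
  row_index_name = "idx",  # adds a row index column named "idx"
  row_index_offset = 1,    # the row index starts at 1
  n_rows = 2               # read at most two data rows
)
df_opts
```

With has_header = FALSE the data columns should be named following the "column_x" pattern described under has_header, and the row index column is added alongside them.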

Value

DataFrame
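
Examples

The following is a small sketch rather than an official example. It assumes the polars R package is attached and that the function is called as pl$read_csv(); the file contents and the choice of dtype override are illustrative only.

```r
library(polars)

# Write a small CSV to a temporary file (made-up example data).
path_basic <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,score",
  "1,a,3.5",
  "2,b,NA",
  "3,c,4.0"
), path_basic)

df_basic <- pl$read_csv(
  path_basic,
  null_values = "NA",          # treat the string "NA" as a missing value
  dtypes = list(id = "Int32")  # override the inferred dtype of "id"
)
df_basic

# Convert to a base R data.frame for inspection.
as.data.frame(df_basic)
```

Passing infer_schema_length = 0 instead of dtypes reads every column as a string, which can help locate values that fail to parse.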