Convert a String column into a Date/Datetime/Time column.
Description
Similar to the strptime() function.
Usage
<Expr>$str$strptime(
dtype,
format = NULL,
...,
strict = TRUE,
exact = TRUE,
cache = TRUE,
ambiguous = c("raise", "earliest", "latest", "null")
)
Arguments
dtype
|
The data type to convert into. Can be either pl$Date,
pl$Datetime, or pl$Time.
|
format
|
Format to use for conversion. Refer to the chrono crate documentation
for the full specification. Example: "%Y-%m-%d %H:%M:%S". If NULL
(default), the format is inferred from the data. Note that the time
zone name specifier %Z is not supported: time zone names are simply
ignored. Numeric time zone offsets such as %z or %:z are supported.
|
...
|
These dots are for future extensions and must be empty. |
strict
|
If TRUE (default), raise an error if any string cannot be parsed.
If FALSE, strings that cannot be parsed become polars nulls.
|
exact
|
If TRUE (default), require an exact format match. If
FALSE, allow the format to match anywhere in the target
string. Conversion to the Time type is always exact. Note that
exact = FALSE introduces a performance penalty; cleaning
your data beforehand will almost certainly be faster.
|
cache
|
Use a cache of unique, converted dates to apply the datetime conversion. |
ambiguous
|
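Determine how to deal with ambiguous datetimes. Character vector or
expression containing one of the following:
- "raise" (default): throw an error
- "earliest": use the earliest datetime
- "latest": use the latest datetime
- "null": return a null value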
|
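As a rough illustration of the ambiguous argument, the sketch below parses a wall-clock time that occurs twice when clocks are set back. The Europe/Brussels time zone, the sample strings, and the column aliases are illustrative assumptions and do not come from this page.

```r
library("polars")

# 02:30 on 2020-10-25 occurs twice in Europe/Brussels (clocks go back at 03:00),
# so the second string is ambiguous; the first one is not.
df <- pl$DataFrame(x = c("2020-10-25 01:30:00", "2020-10-25 02:30:00"))

# Assumed target type: a time-zone-aware Datetime
tz_dtype <- pl$Datetime(time_unit = "us", time_zone = "Europe/Brussels")
fmt <- "%Y-%m-%d %H:%M:%S"

res <- df$select(
  # Pick the first occurrence of the ambiguous wall-clock time
  pl$col("x")$str$strptime(tz_dtype, fmt, ambiguous = "earliest")$alias("earliest"),
  # Pick the second occurrence
  pl$col("x")$str$strptime(tz_dtype, fmt, ambiguous = "latest")$alias("latest"),
  # Replace ambiguous values with null instead of choosing
  pl$col("x")$str$strptime(tz_dtype, fmt, ambiguous = "null")$alias("as_null")
)
res
```

With the default ambiguous = "raise", the same input would be expected to stop with an error at the second row.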
Details
When parsing a Datetime, the column precision is inferred from the
format string, if given, e.g. "%F %T%.3f" =>
pl$Datetime("ms"). If no fractional second component is
found, the default is "us" (microsecond).
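The inference described above can be explored with a sketch like the following. It uses <Expr>$str$to_datetime() (listed under See Also) and leaves the time unit unset, on the assumption that an explicitly supplied unit would take precedence over inference. The sample strings and column names are made up for illustration.

```r
library("polars")

df <- pl$DataFrame(
  with_fraction = c("2020-01-01 01:00:00.123", "2020-01-01 02:00:00.456"),
  no_fraction = c("2020-01-01 01:00:00", "2020-01-01 02:00:00")
)

res <- df$select(
  # "%.3f" is a three-digit fractional second, which should map to "ms"
  pl$col("with_fraction")$str$to_datetime("%F %T%.3f"),
  # No fractional component in the format: the default unit should apply
  pl$col("no_fraction")$str$to_datetime("%F %T")
)

# The printed header shows the time unit chosen for each column
res
```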
Value
A polars expression
See Also
- <Expr>$str$to_date()
- <Expr>$str$to_time()
- <Expr>$str$to_datetime()
Examples
library("polars")
# Dealing with a consistent format
df <- pl$DataFrame(x = c("2020-01-01 01:00Z", "2020-01-01 02:00Z"))
df$select(pl$col("x")$str$strptime(pl$Datetime(), "%Y-%m-%d %H:%M%#z"))
#> shape: (2, 1)
#> ┌─────────────────────────┐
#> │ x                       │
#> │ ---                     │
#> │ datetime[μs, UTC]       │
#> ╞═════════════════════════╡
#> │ 2020-01-01 01:00:00 UTC │
#> │ 2020-01-01 02:00:00 UTC │
#> └─────────────────────────┘
# Dealing with different formats.
df <- pl$DataFrame(
  date = c(
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul 8 00:34:60 2001"
  )
)
df$select(
  pl$coalesce(
    pl$col("date")$str$strptime(pl$Date, "%F", strict = FALSE),
    pl$col("date")$str$strptime(pl$Date, "%F %T", strict = FALSE),
    pl$col("date")$str$strptime(pl$Date, "%D", strict = FALSE),
    pl$col("date")$str$strptime(pl$Date, "%c", strict = FALSE)
  )
)
#> shape: (4, 1)
#> ┌────────────┐
#> │ date       │
#> │ ---        │
#> │ date       │
#> ╞════════════╡
#> │ 2021-04-22 │
#> │ 2022-01-04 │
#> │ 2022-01-31 │
#> │ 2001-07-08 │
#> └────────────┘
# Ignore invalid time
df <- pl$DataFrame(
  x = c(
    "2023-01-01 11:22:33 -0100",
    "2023-01-01 11:22:33 +0300",
    "invalid time"
  )
)
df$select(pl$col("x")$str$strptime(
  pl$Datetime("ns"),
  format = "%Y-%m-%d %H:%M:%S %z",
  strict = FALSE
))
#> shape: (3, 1)
#> ┌─────────────────────────┐
#> │ x                       │
#> │ ---                     │
#> │ datetime[ns, UTC]       │
#> ╞═════════════════════════╡
#> │ 2023-01-01 12:22:33 UTC │
#> │ 2023-01-01 08:22:33 UTC │
#> │ null                    │
#> └─────────────────────────┘
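A further sketch, not part of the original examples, contrasts exact = TRUE with exact = FALSE when a date is embedded in surrounding text. The input strings and the column aliases are made up for illustration.

```r
library("polars")

# Dates embedded in surrounding text
df <- pl$DataFrame(x = c("2022-01-16", "foo2022-01-18", "b2022-01-19ar"))

res <- df$select(
  # exact = TRUE (default): strings with extra text do not match the format;
  # strict = FALSE turns those failures into null instead of an error
  pl$col("x")$str$strptime(pl$Date, "%Y-%m-%d", strict = FALSE)$alias("exact"),
  # exact = FALSE: the format may match anywhere in the string
  pl$col("x")$str$strptime(
    pl$Date, "%Y-%m-%d",
    strict = FALSE, exact = FALSE
  )$alias("loose")
)
res
```

As the exact argument notes, the non-exact path is slower, so stripping the surrounding text before parsing is usually the better choice for large columns.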