Convert a String Series into a Date/Datetime/Time Series

Description

Similar to the strptime() function.

Usage

series_str_strptime(
  dtype,
  format = NULL,
  ...,
  strict = TRUE,
  exact = TRUE,
  cache = TRUE,
  ambiguous = c("raise", "earliest", "latest", "null")
)

Arguments

dtype The data type to convert into. Can be either pl$Date, pl$Datetime, or pl$Time.
format Format to use for conversion. Refer to the chrono crate documentation for the full specification. Example: “%Y-%m-%d %H:%M:%S”. If NULL (default), the format is inferred from the data. Note that the time zone name specifier %Z is not supported: time zone names in the input are ignored. Numeric time zone offsets such as %z or %:z are supported.
... These dots are for future extensions and must be empty.
strict If TRUE (default), raise an error if a single string cannot be parsed. If FALSE, produce a polars null.
exact If TRUE (default), require an exact format match. If FALSE, allow the format to match anywhere in the target string. Note that exact = FALSE introduces a performance penalty; cleaning your data beforehand will almost certainly be faster.
cache If TRUE (default), use a cache of unique, converted dates to apply the datetime conversion.
ambiguous Determine how to deal with ambiguous datetimes. Character vector or Series containing one of the following:
  • “raise” (default): Throw an error
  • “earliest”: Use the earliest datetime
  • “latest”: Use the latest datetime
  • “null”: Return a null value
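
The sketch below illustrates how strict, exact, and ambiguous might be used. The input strings and the Europe/Brussels time zone are illustrative assumptions, not part of the original documentation; 02:30 on 2020-10-25 is assumed to be ambiguous there because clocks were set back one hour that night.

```r
library(polars)

# strict: one valid date and one value that cannot be parsed
s <- as_polars_series(c("2021-04-22", "not a date"))

# strict = TRUE (default) raises an error; capture it instead of stopping
strict_result <- tryCatch(
  s$str$strptime(pl$Date, "%F"),
  error = function(e) "parse error"
)

# strict = FALSE replaces the unparseable value with a polars null
lenient <- s$str$strptime(pl$Date, "%F", strict = FALSE)

# exact = FALSE lets the format match anywhere inside the string
embedded <- as_polars_series("report dated 2021-04-22 (final)")
found <- embedded$str$strptime(pl$Date, "%Y-%m-%d", exact = FALSE)

# ambiguous: a wall-clock time that occurs twice during the DST change
dst <- as_polars_series("2020-10-25 02:30:00")
brussels <- pl$Datetime(time_unit = "us", time_zone = "Europe/Brussels")
earliest <- dst$str$strptime(
  brussels, "%Y-%m-%d %H:%M:%S", ambiguous = "earliest"
)
latest <- dst$str$strptime(
  brussels, "%Y-%m-%d %H:%M:%S", ambiguous = "latest"
)

print(lenient)
print(found)
print(earliest)
print(latest)
```

Printing earliest and latest side by side shows which of the two candidate instants each option selects.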

Details

When parsing a Datetime, the column precision is inferred from the format string, if given, e.g.: “%F %T%.3f” => pl$Datetime("ms"). If no fractional second component is found, the default is “us” (microsecond).
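
As a minimal sketch of how precision can be inspected, the example below parses a string with a three-digit fractional second, once with pl$Datetime() and once with an explicitly requested millisecond unit. The input string is an illustrative assumption. The time unit actually chosen appears in the dtype line of each printed Series, so the output is the place to check how your installed version handles the first call.

```r
library(polars)

# A timestamp with millisecond-level fractional seconds (illustrative input)
s <- as_polars_series("2020-01-01 01:00:00.123")

# Time unit left to pl$Datetime(); check the printed dtype for the unit used
inferred <- s$str$strptime(pl$Datetime(), "%F %T%.3f")

# Time unit requested explicitly, which does not rely on inference
explicit_ms <- s$str$strptime(pl$Datetime("ms"), "%F %T%.3f")

print(inferred)
print(explicit_ms)
```

Passing the time unit explicitly is a reasonable choice when downstream code depends on a specific precision.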

Value

A polars Series

Examples

library("polars")

s1 <- as_polars_series(c("2020-01-01 01:00Z", "2020-01-01 02:00Z"))

s1$str$strptime(pl$Datetime(), "%Y-%m-%d %H:%M%#z")
#> shape: (2, 1)
#> ┌─────────────────────────┐
#> │                         │
#> │ ---                     │
#> │ datetime[μs, UTC]       │
#> ╞═════════════════════════╡
#> │ 2020-01-01 01:00:00 UTC │
#> │ 2020-01-01 02:00:00 UTC │
#> └─────────────────────────┘
# Auto infer format
s1$str$strptime(pl$Datetime())
#> shape: (2, 1)
#> ┌─────────────────────────┐
#> │                         │
#> │ ---                     │
#> │ datetime[μs, UTC]       │
#> ╞═════════════════════════╡
#> │ 2020-01-01 01:00:00 UTC │
#> │ 2020-01-01 02:00:00 UTC │
#> └─────────────────────────┘
# A datetime string with a UTC offset is converted to UTC
s2 <- as_polars_series(c("2020-01-01T01:00:00+09:00"))
s2$str$strptime(pl$Datetime())
#> shape: (1, 1)
#> ┌─────────────────────────┐
#> │                         │
#> │ ---                     │
#> │ datetime[μs, UTC]       │
#> ╞═════════════════════════╡
#> │ 2019-12-31 16:00:00 UTC │
#> └─────────────────────────┘
# Dealing with different formats.
s3 <- as_polars_series(
  c(
    "2021-04-22",
    "2022-01-04 00:00:00",
    "01/31/22",
    "Sun Jul  8 00:34:60 2001"
  )
)

pl$select(pl$coalesce(
  s3$str$strptime(pl$Date, "%F", strict = FALSE),
  s3$str$strptime(pl$Date, "%F %T", strict = FALSE),
  s3$str$strptime(pl$Date, "%D", strict = FALSE),
  s3$str$strptime(pl$Date, "%c", strict = FALSE),
))$to_series()
#> shape: (4, 1)
#> ┌────────────┐
#> │            │
#> │ ---        │
#> │ date       │
#> ╞════════════╡
#> │ 2021-04-22 │
#> │ 2022-01-04 │
#> │ 2022-01-31 │
#> │ 2001-07-08 │
#> └────────────┘