Timestamp parsing

Polars offers 4 time datatypes:

  • pl.Date, to be used for date objects: the number of days since the UNIX epoch as a 32 bit signed integer.
  • pl.Datetime, to be used for datetime objects: the time elapsed since the UNIX epoch as a 64 bit signed integer, stored in a fixed time unit (nanoseconds, microseconds or milliseconds).
  • pl.Time, encoded as the number of nanoseconds since midnight.
  • pl.Duration, to be used for timedelta objects: the difference between two Date, Datetime or Time values as a 64 bit signed integer offering microsecond resolution.
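
To make these encodings concrete, the sketch below computes the same kind of integer representations in plain Python. It uses only the standard library rather than Polars itself, and it assumes nanosecond units for Datetime and Time and microsecond units for Duration; all variable names are illustrative.

```python
from datetime import date, datetime, time, timedelta, timezone

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1, tzinfo=timezone.utc)

# Date-style encoding: whole days since the UNIX epoch
days_since_epoch = (date(2020, 1, 2) - EPOCH_DATE).days

# Datetime-style encoding: elapsed time since the epoch (assumed unit: nanoseconds)
delta = datetime(2020, 1, 2, 12, 30, tzinfo=timezone.utc) - EPOCH_DATETIME
ns_since_epoch = (delta.days * 86_400 + delta.seconds) * 1_000_000_000 + delta.microseconds * 1_000

# Time-style encoding: nanoseconds since midnight
t = time(12, 30)
ns_since_midnight = (t.hour * 3600 + t.minute * 60 + t.second) * 1_000_000_000 + t.microsecond * 1_000

# Duration-style encoding: a difference between two datetimes (assumed unit: microseconds)
duration = datetime(2020, 1, 4) - datetime(2020, 1, 2)
us_duration = duration // timedelta(microseconds=1)

print(days_since_epoch, ns_since_epoch, ns_since_midnight, us_duration)
```

All four values fit comfortably in the 32 bit or 64 bit signed integers named above, which is why these types can be stored and compared as plain integers.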

Polars string (pl.Utf8) columns can be parsed into any of them. You can let Polars try to guess the format of the date[time], or explicitly provide a format rule (fmt).

For instance (check this link for a comprehensive list):

  • "%Y-%m-%d" for "2020-12-31"
  • "%Y/%B/%d" for "2020/December/31"
  • "%B %y" for "December 20"
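
The three patterns above can be tried out quickly with Python's standard-library datetime.strptime, used here as a stand-in because it understands the same directives for these cases. This is only a sketch: Polars uses its own parser, so behaviour may differ in edge cases (for example when the pattern contains no day), and the examples list is purely illustrative.

```python
from datetime import datetime

# (format rule, matching input string) pairs taken from the list above
examples = [
    ("%Y-%m-%d", "2020-12-31"),
    ("%Y/%B/%d", "2020/December/31"),
    ("%B %y", "December 20"),
]

# Parse each string with its format rule
parsed = {text: datetime.strptime(text, fmt) for fmt, text in examples}

for text, value in parsed.items():
    # Python fills in a missing day with the first of the month
    print(f"{text!r} -> {value.date()}")
```

A mismatch between the string and the format rule raises a ValueError in Python, which is a cheap way to check a pattern before applying it to a whole column.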

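Below is a quick example:

import polars as pl

# The dates start out as plain strings
dataset = pl.DataFrame({"date": ["2020-01-02", "2020-01-03", "2020-01-04"], "index": [1, 2, 3]})

# Parse the string column into pl.Date using an explicit format rule
q = dataset.lazy().with_columns(pl.col("date").str.strptime(pl.Date, "%Y-%m-%d"))

# Run the lazy query and display the result
df = q.collect()
print(df)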

returning:

shape: (3, 2)
┌────────────┬───────┐
│ date       ┆ index │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2020-01-02 ┆ 1     │
│ 2020-01-03 ┆ 2     │
│ 2020-01-04 ┆ 3     │
└────────────┴───────┘

All datetime functionality is exposed through the dt namespace.