Timestamp parsing
Polars offers 4 time datatypes:
pl.Date, to be used for date objects: the number of days since the UNIX epoch as a 32 bit signed integer.
pl.Datetime, to be used for datetime objects: the number of time units since the UNIX epoch as a 64 bit signed integer. The unit is microseconds by default; milliseconds and nanoseconds can also be selected.
pl.Time, encoded as the number of nanoseconds since midnight.
pl.Duration, to be used for timedelta objects: the difference between Date, Datetime or Time as a 64 bit signed integer, with microsecond resolution by default.
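To make the integer encodings described above concrete, here is a minimal sketch in plain Python (standard library only, not Polars itself). It computes the numbers that would back a date, a time of day, and a duration under the definitions given above; the sample values are arbitrary, and the microsecond unit for the duration is an assumption of this sketch.

```python
from datetime import date, datetime, time, timedelta

EPOCH = date(1970, 1, 1)

# Date: whole days elapsed since the UNIX epoch.
days_since_epoch = (date(2020, 1, 2) - EPOCH).days

# Time: nanoseconds elapsed since midnight.
t = time(12, 30, 0)
seconds = t.hour * 3600 + t.minute * 60 + t.second
ns_since_midnight = seconds * 1_000_000_000 + t.microsecond * 1_000

# Duration: the difference between two datetimes, counted here in microseconds
# (an assumed unit for this sketch).
delta = datetime(2020, 1, 3) - datetime(2020, 1, 2)
duration_us = delta // timedelta(microseconds=1)

print(days_since_epoch, ns_since_midnight, duration_us)
```

Each value is a single integer, which is why these datatypes can be stored and compared as cheaply as ordinary integer columns.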
Polars string (pl.Utf8) columns can be parsed as any of them. You can let Polars try to guess the format of the date[time], or explicitly provide a fmt rule.
For instance (check this link for a comprehensive list):
"%Y-%m-%d" for "2020-12-31"
"%Y/%B/%d" for "2020/December/31"
"%B %y" for "December 20"
Below is a quick example:
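import polars as pl

# The "date" column starts out as strings (pl.Utf8).
dataset = pl.DataFrame({"date": ["2020-01-02", "2020-01-03", "2020-01-04"], "index": [1, 2, 3]})
# Parse the strings into pl.Date using an explicit format rule.
q = dataset.lazy().with_columns(pl.col("date").str.strptime(pl.Date, "%Y-%m-%d"))
df = q.collect()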
returning:
shape: (3, 2)
┌────────────┬───────┐
│ date       ┆ index │
│ ---        ┆ ---   │
│ date       ┆ i64   │
╞════════════╪═══════╡
│ 2020-01-02 ┆ 1     │
│ 2020-01-03 ┆ 2     │
│ 2020-01-04 ┆ 3     │
└────────────┴───────┘
All datetime functionality is available in the dt namespace.