Parsing

Polars has native support for parsing time series data and doing more sophisticated operations such as temporal grouping and resampling.

Datatypes

Polars has the following datetime datatypes:

  • Date: Date representation e.g. 2014-07-08. It is internally represented as days since the Unix epoch, encoded as a 32-bit signed integer.
  • Datetime: Datetime representation e.g. 2014-07-08 07:00:00. It is internally represented as a 64-bit integer counting time units since the Unix epoch; the unit can be ns, us, or ms.
  • Duration: A time delta type that is created when subtracting Date/Datetime. Similar to timedelta in Python.
  • Time: Time representation, internally represented as nanoseconds since midnight.

Parsing dates from a file

When loading from a CSV file, Polars attempts to parse dates and times if the try_parse_dates flag is set to True:

read_csv

df = pl.read_csv("docs/data/apple_stock.csv", try_parse_dates=True)
print(df)

CsvReader · Available on feature csv

let df = CsvReader::from_path("docs/data/apple_stock.csv")
    .unwrap()
    .with_try_parse_dates(true)
    .finish()
    .unwrap();
println!("{}", &df);

shape: (100, 2)
┌────────────┬────────┐
│ Date       ┆ Close  │
│ ---        ┆ ---    │
│ date       ┆ f64    │
╞════════════╪════════╡
│ 1981-02-23 ┆ 24.62  │
│ 1981-05-06 ┆ 27.38  │
│ 1981-05-18 ┆ 28.0   │
│ 1981-09-25 ┆ 14.25  │
│ …          ┆ …      │
│ 2012-12-04 ┆ 575.85 │
│ 2013-07-05 ┆ 417.42 │
│ 2013-11-07 ┆ 512.49 │
│ 2014-02-25 ┆ 522.06 │
└────────────┴────────┘

Binary formats such as Parquet, on the other hand, carry a schema, which Polars respects.

Casting strings to dates

You can also cast a column of datetimes encoded as strings to a datetime type by calling the str.strptime method and passing the format of the date string:

read_csv · strptime

df = pl.read_csv("docs/data/apple_stock.csv", try_parse_dates=False)

df = df.with_columns(pl.col("Date").str.strptime(pl.Date, format="%Y-%m-%d"))
print(df)

CsvReader · Available on feature csv

let df = CsvReader::from_path("docs/data/apple_stock.csv")
    .unwrap()
    .with_try_parse_dates(false)
    .finish()
    .unwrap();
let df = df
    .clone()
    .lazy()
    .with_columns([col("Date")
        .str()
        .strptime(DataType::Date, StrptimeOptions::default())])
    .collect()?;
println!("{}", &df);

shape: (100, 2)
┌────────────┬────────┐
│ Date       ┆ Close  │
│ ---        ┆ ---    │
│ date       ┆ f64    │
╞════════════╪════════╡
│ 1981-02-23 ┆ 24.62  │
│ 1981-05-06 ┆ 27.38  │
│ 1981-05-18 ┆ 28.0   │
│ 1981-09-25 ┆ 14.25  │
│ …          ┆ …      │
│ 2012-12-04 ┆ 575.85 │
│ 2013-07-05 ┆ 417.42 │
│ 2013-11-07 ┆ 512.49 │
│ 2014-02-25 ┆ 522.06 │
└────────────┴────────┘

The strptime format specifiers are listed in the chrono crate's strftime documentation.

Extracting date features from a date column

You can extract date features such as the year or day from a date column using the .dt namespace:

year

df_with_year = df.with_columns(pl.col("Date").dt.year().alias("year"))
print(df_with_year)

let df_with_year = df
    .clone()
    .lazy()
    .with_columns([col("Date").dt().year().alias("year")])
    .collect()?;
println!("{}", &df_with_year);

shape: (100, 3)
┌────────────┬────────┬──────┐
│ Date       ┆ Close  ┆ year │
│ ---        ┆ ---    ┆ ---  │
│ date       ┆ f64    ┆ i32  │
╞════════════╪════════╪══════╡
│ 1981-02-23 ┆ 24.62  ┆ 1981 │
│ 1981-05-06 ┆ 27.38  ┆ 1981 │
│ 1981-05-18 ┆ 28.0   ┆ 1981 │
│ 1981-09-25 ┆ 14.25  ┆ 1981 │
│ …          ┆ …      ┆ …    │
│ 2012-12-04 ┆ 575.85 ┆ 2012 │
│ 2013-07-05 ┆ 417.42 ┆ 2013 │
│ 2013-11-07 ┆ 512.49 ┆ 2013 │
│ 2014-02-25 ┆ 522.06 ┆ 2014 │
└────────────┴────────┴──────┘

Mixed offsets

If you have mixed offsets (say, due to crossing daylight saving time), you can parse them with a format that includes %z, which yields datetimes in UTC, and then convert to your time zone:

strptime · convert_time_zone · Available on feature timezone

data = [
    "2021-03-27T00:00:00+0100",
    "2021-03-28T00:00:00+0100",
    "2021-03-29T00:00:00+0200",
    "2021-03-30T00:00:00+0200",
]
mixed_parsed = (
    pl.Series(data)
    .str.strptime(pl.Datetime, format="%Y-%m-%dT%H:%M:%S%z")
    .dt.convert_time_zone("Europe/Brussels")
)
print(mixed_parsed)

let data = [
    "2021-03-27T00:00:00+0100",
    "2021-03-28T00:00:00+0100",
    "2021-03-29T00:00:00+0200",
    "2021-03-30T00:00:00+0200",
];
let q = col("date")
    .str()
    .strptime(
        DataType::Datetime(TimeUnit::Microseconds, None),
        StrptimeOptions {
            format: Some("%Y-%m-%dT%H:%M:%S%z".to_string()),
            ..Default::default()
        },
    )
    .dt()
    .convert_time_zone("Europe/Brussels".to_string());
let mixed_parsed = df!("date" => &data)?.lazy().select([q]).collect()?;

println!("{}", &mixed_parsed);

shape: (4,)
Series: '' [datetime[μs, Europe/Brussels]]
[
    2021-03-27 00:00:00 CET
    2021-03-28 00:00:00 CET
    2021-03-29 00:00:00 CEST
    2021-03-30 00:00:00 CEST
]