Parsing
Polars has native support for parsing time series data and doing more sophisticated operations such as temporal grouping and resampling.
Datatypes
Polars has the following datetime datatypes:
Date: Date representation, e.g. 2014-07-08. It is internally represented as days since the UNIX epoch, encoded as a 32-bit signed integer.
Datetime: Datetime representation, e.g. 2014-07-08 07:00:00. It is internally represented as a 64-bit integer counting time units since the UNIX epoch; the unit can be ns, us, or ms.
Duration: A time delta type that is created when subtracting Date/Datetime values. Similar to timedelta in Python.
Time: Time representation, internally represented as nanoseconds since midnight.
Parsing dates from a file
When loading from a CSV file, Polars attempts to parse dates and times if the try_parse_dates flag is set to True:
import polars as pl

df = pl.read_csv("docs/data/apple_stock.csv", try_parse_dates=True)
print(df)
CsvReader · Available on feature csv
let df = CsvReader::from_path("docs/data/apple_stock.csv")
    .unwrap()
    .with_try_parse_dates(true)
    .finish()
    .unwrap();
println!("{}", &df);
shape: (100, 2)
┌────────────┬────────┐
│ Date       ┆ Close  │
│ ---        ┆ ---    │
│ date       ┆ f64    │
╞════════════╪════════╡
│ 1981-02-23 ┆ 24.62  │
│ 1981-05-06 ┆ 27.38  │
│ 1981-05-18 ┆ 28.0   │
│ 1981-09-25 ┆ 14.25  │
│ …          ┆ …      │
│ 2012-12-04 ┆ 575.85 │
│ 2013-07-05 ┆ 417.42 │
│ 2013-11-07 ┆ 512.49 │
│ 2014-02-25 ┆ 522.06 │
└────────────┴────────┘
Binary formats such as Parquet, on the other hand, carry a schema that Polars respects, so date and datetime columns keep their stored types when read.
Casting strings to dates
You can also convert a column of dates or datetimes encoded as strings to a temporal type. You do this by calling the str.strptime method on the string column and passing the target datatype and the format of the date string:
df = pl.read_csv("docs/data/apple_stock.csv", try_parse_dates=False)
df = df.with_columns(pl.col("Date").str.strptime(pl.Date, format="%Y-%m-%d"))
print(df)
CsvReader · Available on feature csv
let df = CsvReader::from_path("docs/data/apple_stock.csv")
    .unwrap()
    .with_try_parse_dates(false)
    .finish()
    .unwrap();
let df = df
    .clone()
    .lazy()
    .with_columns([col("Date")
        .str()
        .strptime(DataType::Date, StrptimeOptions::default())])
    .collect()?;
println!("{}", &df);
shape: (100, 2)
┌────────────┬────────┐
│ Date       ┆ Close  │
│ ---        ┆ ---    │
│ date       ┆ f64    │
╞════════════╪════════╡
│ 1981-02-23 ┆ 24.62  │
│ 1981-05-06 ┆ 27.38  │
│ 1981-05-18 ┆ 28.0   │
│ 1981-09-25 ┆ 14.25  │
│ …          ┆ …      │
│ 2012-12-04 ┆ 575.85 │
│ 2013-07-05 ┆ 417.42 │
│ 2013-11-07 ┆ 512.49 │
│ 2014-02-25 ┆ 522.06 │
└────────────┴────────┘
The strptime format specifiers follow the strftime syntax of the Rust chrono crate and are listed in its documentation.
Extracting date features from a date column
You can extract date features such as the year or day from a date column using the .dt namespace:
df_with_year = df.with_columns(pl.col("Date").dt.year().alias("year"))
print(df_with_year)
let df_with_year = df
    .clone()
    .lazy()
    .with_columns([col("Date").dt().year().alias("year")])
    .collect()?;
println!("{}", &df_with_year);
shape: (100, 3)
┌────────────┬────────┬──────┐
│ Date       ┆ Close  ┆ year │
│ ---        ┆ ---    ┆ ---  │
│ date       ┆ f64    ┆ i32  │
╞════════════╪════════╪══════╡
│ 1981-02-23 ┆ 24.62  ┆ 1981 │
│ 1981-05-06 ┆ 27.38  ┆ 1981 │
│ 1981-05-18 ┆ 28.0   ┆ 1981 │
│ 1981-09-25 ┆ 14.25  ┆ 1981 │
│ …          ┆ …      ┆ …    │
│ 2012-12-04 ┆ 575.85 ┆ 2012 │
│ 2013-07-05 ┆ 417.42 ┆ 2013 │
│ 2013-11-07 ┆ 512.49 ┆ 2013 │
│ 2014-02-25 ┆ 522.06 ┆ 2014 │
└────────────┴────────┴──────┘
Mixed offsets
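If you have mixed offsets (say, due to crossing daylight saving time), include the %z offset directive in the format so that every value is parsed relative to UTC (older Polars versions additionally required utc=True), and then convert to your time zone: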
strptime · convert_time_zone · Available on feature timezone
data = [
    "2021-03-27T00:00:00+0100",
    "2021-03-28T00:00:00+0100",
    "2021-03-29T00:00:00+0200",
    "2021-03-30T00:00:00+0200",
]
mixed_parsed = (
    pl.Series(data)
    .str.strptime(pl.Datetime, format="%Y-%m-%dT%H:%M:%S%z")
    .dt.convert_time_zone("Europe/Brussels")
)
print(mixed_parsed)
let data = [
    "2021-03-27T00:00:00+0100",
    "2021-03-28T00:00:00+0100",
    "2021-03-29T00:00:00+0200",
    "2021-03-30T00:00:00+0200",
];
let q = col("date")
    .str()
    .strptime(
        DataType::Datetime(TimeUnit::Microseconds, None),
        StrptimeOptions {
            format: Some("%Y-%m-%dT%H:%M:%S%z".to_string()),
            ..Default::default()
        },
    )
    .dt()
    .convert_time_zone("Europe/Brussels".to_string());
let mixed_parsed = df!("date" => &data)?.lazy().select([q]).collect()?;
println!("{}", &mixed_parsed);
shape: (4,)
Series: '' [datetime[μs, Europe/Brussels]]
[
2021-03-27 00:00:00 CET
2021-03-28 00:00:00 CET
2021-03-29 00:00:00 CEST
2021-03-30 00:00:00 CEST
]