Filtering
Filtering date columns works in the same way as with other types of columns using the .filter
method.
Polars uses Python's native datetime
, date
and timedelta
for equality comparisons between the datatypes pl.Datetime
, pl.Date
and pl.Duration
.
In the following example we use a time series of Apple stock prices.
import polars as pl
from datetime import datetime
df = pl.read_csv("docs/data/apple_stock.csv", try_parse_dates=True)
print(df)
CsvReader
· Available on feature csv
let df = CsvReader::from_path("docs/data/apple_stock.csv")
.unwrap()
.with_try_parse_dates(true)
.finish()
.unwrap();
println!("{}", &df);
shape: (100, 2)
┌────────────┬────────┐
│ Date ┆ Close │
│ --- ┆ --- │
│ date ┆ f64 │
╞════════════╪════════╡
│ 1981-02-23 ┆ 24.62 │
│ 1981-05-06 ┆ 27.38 │
│ 1981-05-18 ┆ 28.0 │
│ 1981-09-25 ┆ 14.25 │
│ … ┆ … │
│ 2012-12-04 ┆ 575.85 │
│ 2013-07-05 ┆ 417.42 │
│ 2013-11-07 ┆ 512.49 │
│ 2014-02-25 ┆ 522.06 │
└────────────┴────────┘
Filtering by single dates
We can filter by a single date by casting the desired date string to a Date
object
in a filter expression:
shape: (1, 2)
┌────────────┬───────┐
│ Date ┆ Close │
│ --- ┆ --- │
│ date ┆ f64 │
╞════════════╪═══════╡
│ 1995-10-16 ┆ 36.13 │
└────────────┴───────┘
Note we are using the lowercase datetime
method rather than the uppercase Datetime
data type.
Filtering by a date range
We can filter by a range of dates using the is_between
method in a filter expression with the start and end dates:
filtered_range_df = df.filter(
pl.col("Date").is_between(datetime(1995, 7, 1), datetime(1995, 11, 1)),
)
print(filtered_range_df)
let filtered_range_df = df
.clone()
.lazy()
.filter(
col("Date")
.gt(lit(NaiveDate::from_ymd_opt(1995, 7, 1).unwrap()))
.and(col("Date").lt(lit(NaiveDate::from_ymd_opt(1995, 11, 1).unwrap()))),
)
.collect()?;
println!("{}", &filtered_range_df);
shape: (2, 2)
┌────────────┬───────┐
│ Date ┆ Close │
│ --- ┆ --- │
│ date ┆ f64 │
╞════════════╪═══════╡
│ 1995-07-06 ┆ 47.0 │
│ 1995-10-16 ┆ 36.13 │
└────────────┴───────┘
Filtering with negative dates
Say you are working with an archeologist and are dealing in negative dates.
Polars can parse and store them just fine, but the Python datetime
library
does not. So for filtering, you should use attributes in the .dt
namespace:
ts = pl.Series(["-1300-05-23", "-1400-03-02"]).str.strptime(pl.Date)
negative_dates_df = pl.DataFrame({"ts": ts, "values": [3, 4]})
negative_dates_filtered_df = negative_dates_df.filter(pl.col("ts").dt.year() < -1300)
print(negative_dates_filtered_df)
let negative_dates_df = df!(
"ts"=> &["-1300-05-23", "-1400-03-02"],
"values"=> &[3, 4])?
.lazy()
.with_column(
col("ts")
.str()
.strptime(DataType::Date, StrptimeOptions::default()),
)
.collect()?;
let negative_dates_filtered_df = negative_dates_df
.clone()
.lazy()
.filter(col("ts").dt().year().lt(-1300))
.collect()?;
println!("{}", &negative_dates_filtered_df);
shape: (1, 2)
┌─────────────┬────────┐
│ ts ┆ values │
│ --- ┆ --- │
│ date ┆ i64 │
╞═════════════╪════════╡
│ -1400-03-02 ┆ 4 │
└─────────────┴────────┘