Data structures
The core base data structures provided by Polars are Series
and DataFrames
.
Series
Series are a 1-dimensional data structure. Within a series all elements have the same Data Type .
The snippet below shows how to create a simple named Series
object.
shape: (5,)
Series: 'a' [i64]
[
1
2
3
4
5
]
DataFrame
A DataFrame
is a 2-dimensional data structure that is backed by a Series
, and it can be seen as an abstraction of a collection (e.g. list) of Series
. Operations that can be executed on a DataFrame
are very similar to what is done in a SQL
like query. You can GROUP BY
, JOIN
, PIVOT
, but also define custom functions.
from datetime import datetime
df = pl.DataFrame(
{
"integer": [1, 2, 3, 4, 5],
"date": [
datetime(2022, 1, 1),
datetime(2022, 1, 2),
datetime(2022, 1, 3),
datetime(2022, 1, 4),
datetime(2022, 1, 5),
],
"float": [4.0, 5.0, 6.0, 7.0, 8.0],
}
)
print(df)
use chrono::prelude::*;
let df: DataFrame = df!(
"integer" => &[1, 2, 3, 4, 5],
"date" => &[
NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
NaiveDate::from_ymd_opt(2022, 1, 5).unwrap().and_hms_opt(0, 0, 0).unwrap()
],
"float" => &[4.0, 5.0, 6.0, 7.0, 8.0],
)
.unwrap();
println!("{}", df);
shape: (5, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date ┆ float │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ f64 │
╞═════════╪═════════════════════╪═══════╡
│ 1 ┆ 2022-01-01 00:00:00 ┆ 4.0 │
│ 2 ┆ 2022-01-02 00:00:00 ┆ 5.0 │
│ 3 ┆ 2022-01-03 00:00:00 ┆ 6.0 │
│ 4 ┆ 2022-01-04 00:00:00 ┆ 7.0 │
│ 5 ┆ 2022-01-05 00:00:00 ┆ 8.0 │
└─────────┴─────────────────────┴───────┘
Viewing data
This part focuses on viewing data in a DataFrame
. We will use the DataFrame
from the previous example as a starting point.
Head
The head
function shows by default the first 5 rows of a DataFrame
. You can specify the number of rows you want to see (e.g. df.head(10)
).
shape: (3, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date ┆ float │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ f64 │
╞═════════╪═════════════════════╪═══════╡
│ 1 ┆ 2022-01-01 00:00:00 ┆ 4.0 │
│ 2 ┆ 2022-01-02 00:00:00 ┆ 5.0 │
│ 3 ┆ 2022-01-03 00:00:00 ┆ 6.0 │
└─────────┴─────────────────────┴───────┘
Tail
The tail
function shows the last 5 rows of a DataFrame
. You can also specify the number of rows you want to see, similar to head
.
shape: (3, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date ┆ float │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ f64 │
╞═════════╪═════════════════════╪═══════╡
│ 3 ┆ 2022-01-03 00:00:00 ┆ 6.0 │
│ 4 ┆ 2022-01-04 00:00:00 ┆ 7.0 │
│ 5 ┆ 2022-01-05 00:00:00 ┆ 8.0 │
└─────────┴─────────────────────┴───────┘
Sample
If you want to get an impression of the data of your DataFrame
, you can also use sample
. With sample
you get an n number of random rows from the DataFrame
.
shape: (2, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date ┆ float │
│ --- ┆ --- ┆ --- │
│ i64 ┆ datetime[μs] ┆ f64 │
╞═════════╪═════════════════════╪═══════╡
│ 3 ┆ 2022-01-03 00:00:00 ┆ 6.0 │
│ 2 ┆ 2022-01-02 00:00:00 ┆ 5.0 │
└─────────┴─────────────────────┴───────┘
Describe
Describe
returns summary statistics of your DataFrame
. It will provide several quick statistics if possible.
print(df.describe())
describe
· Available on feature describe
println!("{:?}", df.describe(None));
shape: (9, 4)
┌────────────┬──────────┬─────────────────────┬──────────┐
│ describe ┆ integer ┆ date ┆ float │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ str ┆ f64 │
╞════════════╪══════════╪═════════════════════╪══════════╡
│ count ┆ 5.0 ┆ 5 ┆ 5.0 │
│ null_count ┆ 0.0 ┆ 0 ┆ 0.0 │
│ mean ┆ 3.0 ┆ null ┆ 6.0 │
│ std ┆ 1.581139 ┆ null ┆ 1.581139 │
│ min ┆ 1.0 ┆ 2022-01-01 00:00:00 ┆ 4.0 │
│ 25% ┆ 2.0 ┆ null ┆ 5.0 │
│ 50% ┆ 3.0 ┆ null ┆ 6.0 │
│ 75% ┆ 4.0 ┆ null ┆ 7.0 │
│ max ┆ 5.0 ┆ 2022-01-05 00:00:00 ┆ 8.0 │
└────────────┴──────────┴─────────────────────┴──────────┘