Skip to content

Data structures

The core base data structures provided by Polars are Series and DataFrames.

Series

Series are a 1-dimensional data structure. Within a series all elements have the same Data Type . The snippet below shows how to create a simple named Series object.

Series

import polars as pl

s = pl.Series("a", [1, 2, 3, 4, 5])
print(s)

Series

use polars::prelude::*;

let s = Series::new("a", [1, 2, 3, 4, 5]);
println!("{}", s);

shape: (5,)
Series: 'a' [i64]
[
    1
    2
    3
    4
    5
]

DataFrame

A DataFrame is a 2-dimensional data structure that is backed by a Series, and it can be seen as an abstraction of a collection (e.g. list) of Series. Operations that can be executed on a DataFrame are very similar to what is done in a SQL like query. You can GROUP BY, JOIN, PIVOT, but also define custom functions.

DataFrame

from datetime import datetime

df = pl.DataFrame(
    {
        "integer": [1, 2, 3, 4, 5],
        "date": [
            datetime(2022, 1, 1),
            datetime(2022, 1, 2),
            datetime(2022, 1, 3),
            datetime(2022, 1, 4),
            datetime(2022, 1, 5),
        ],
        "float": [4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

print(df)

DataFrame

use chrono::prelude::*;

let df: DataFrame = df!(
    "integer" => &[1, 2, 3, 4, 5],
    "date" => &[
        NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 5).unwrap().and_hms_opt(0, 0, 0).unwrap()
    ],
    "float" => &[4.0, 5.0, 6.0, 7.0, 8.0],
)
.unwrap();

println!("{}", df);

shape: (5, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 1       ┆ 2022-01-01 00:00:00 ┆ 4.0   │
│ 2       ┆ 2022-01-02 00:00:00 ┆ 5.0   │
│ 3       ┆ 2022-01-03 00:00:00 ┆ 6.0   │
│ 4       ┆ 2022-01-04 00:00:00 ┆ 7.0   │
│ 5       ┆ 2022-01-05 00:00:00 ┆ 8.0   │
└─────────┴─────────────────────┴───────┘

Viewing data

This part focuses on viewing data in a DataFrame. We will use the DataFrame from the previous example as a starting point.

The head function shows by default the first 5 rows of a DataFrame. You can specify the number of rows you want to see (e.g. df.head(10)).

head

print(df.head(3))

head

println!("{}", df.head(Some(3)));

shape: (3, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 1       ┆ 2022-01-01 00:00:00 ┆ 4.0   │
│ 2       ┆ 2022-01-02 00:00:00 ┆ 5.0   │
│ 3       ┆ 2022-01-03 00:00:00 ┆ 6.0   │
└─────────┴─────────────────────┴───────┘

Tail

The tail function shows the last 5 rows of a DataFrame. You can also specify the number of rows you want to see, similar to head.

tail

print(df.tail(3))

tail

println!("{}", df.tail(Some(3)));

shape: (3, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 3       ┆ 2022-01-03 00:00:00 ┆ 6.0   │
│ 4       ┆ 2022-01-04 00:00:00 ┆ 7.0   │
│ 5       ┆ 2022-01-05 00:00:00 ┆ 8.0   │
└─────────┴─────────────────────┴───────┘

Sample

If you want to get an impression of the data of your DataFrame, you can also use sample. With sample you get an n number of random rows from the DataFrame.

sample

print(df.sample(2))

sample_n

println!("{}", df.sample_n(2, false, true, None)?);

shape: (2, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 3       ┆ 2022-01-03 00:00:00 ┆ 6.0   │
│ 2       ┆ 2022-01-02 00:00:00 ┆ 5.0   │
└─────────┴─────────────────────┴───────┘

Describe

Describe returns summary statistics of your DataFrame. It will provide several quick statistics if possible.

describe

print(df.describe())

describe · Available on feature describe

println!("{:?}", df.describe(None));

shape: (9, 4)
┌────────────┬──────────┬─────────────────────┬──────────┐
│ describe   ┆ integer  ┆ date                ┆ float    │
│ ---        ┆ ---      ┆ ---                 ┆ ---      │
│ str        ┆ f64      ┆ str                 ┆ f64      │
╞════════════╪══════════╪═════════════════════╪══════════╡
│ count      ┆ 5.0      ┆ 5                   ┆ 5.0      │
│ null_count ┆ 0.0      ┆ 0                   ┆ 0.0      │
│ mean       ┆ 3.0      ┆ null                ┆ 6.0      │
│ std        ┆ 1.581139 ┆ null                ┆ 1.581139 │
│ min        ┆ 1.0      ┆ 2022-01-01 00:00:00 ┆ 4.0      │
│ 25%        ┆ 2.0      ┆ null                ┆ 5.0      │
│ 50%        ┆ 3.0      ┆ null                ┆ 6.0      │
│ 75%        ┆ 4.0      ┆ null                ┆ 7.0      │
│ max        ┆ 5.0      ┆ 2022-01-05 00:00:00 ┆ 8.0      │
└────────────┴──────────┴─────────────────────┴──────────┘