Series & DataFrames

The core data structures provided by Polars are Series and DataFrames.

Series

A Series is a 1-dimensional data structure. Within a Series, all elements have the same data type (e.g. int, string). The snippet below shows how to create a simple named Series object. A later section of this getting started guide covers reading data from external sources (e.g. files, databases); for now, let's keep it simple.

Python Rust

Series

import polars as pl

s = pl.Series("a", [1, 2, 3, 4, 5])
print(s)

Series

use polars::prelude::*;

let s = Series::new("a", [1, 2, 3, 4, 5]);
println!("{}", s);

shape: (5,)
Series: 'a' [i64]
[
    1
    2
    3
    4
    5
]

Methods

Although it is more common to work directly on a DataFrame object, Series implement a number of base methods that make it easy to perform transformations. Below are some examples of common operations. They are for illustration only and show a small subset of what is available.

Aggregations

Series support all basic aggregations out of the box (e.g. min, max, mean, mode, ...).

Python Rust

min · max

s = pl.Series("a", [1, 2, 3, 4, 5])
print(s.min())
print(s.max())

min · max

let s = Series::new("a", [1, 2, 3, 4, 5]);
// The generic parameter tells the type system which numeric type to return
println!("{}", s.min::<u64>().unwrap());
println!("{}", s.max::<u64>().unwrap());

1
5

String

A number of string operations are available in the StringNamespace. These only work on Series with the DataType Utf8 (displayed as str in the output below).

Python Rust

replace

s = pl.Series("a", ["polar", "bear", "arctic", "polar fox", "polar bear"])
s2 = s.str.replace("polar", "pola")
print(s2)

// This operation is not directly available on the Series object yet, only on the DataFrame

shape: (5,)
Series: 'a' [str]
[
    "pola"
    "bear"
    "arctic"
    "pola fox"
    "pola bear"
]

Datetime

Similar to strings, there is a separate namespace for datetime-related operations in the DateLikeNameSpace. These only work on Series with DataTypes related to dates.

Python Rust

day

from datetime import datetime

start = datetime(2001, 1, 1)
stop = datetime(2001, 1, 9)
s = pl.date_range(start, stop, interval="2d", eager=True)
# dt.day() returns a new Series holding the day of the month; s itself is unchanged
days = s.dt.day()
print(s)

// This operation is not directly available on the Series object yet, only on the DataFrame

shape: (5,)
Series: 'date' [datetime[μs]]
[
    2001-01-01 00:00:00
    2001-01-03 00:00:00
    2001-01-05 00:00:00
    2001-01-07 00:00:00
    2001-01-09 00:00:00
]

DataFrame

A DataFrame is a 2-dimensional data structure that is backed by Series, and it can be seen as an abstraction over a collection (e.g. a list) of Series. The operations that can be executed on a DataFrame are very similar to those in a SQL-like query. You can GROUP BY, JOIN, PIVOT, but also define custom functions. The next pages cover how to perform these transformations.

Python Rust

DataFrame

from datetime import datetime

df = pl.DataFrame(
    {
        "integer": [1, 2, 3, 4, 5],
        "date": [
            datetime(2022, 1, 1),
            datetime(2022, 1, 2),
            datetime(2022, 1, 3),
            datetime(2022, 1, 4),
            datetime(2022, 1, 5),
        ],
        "float": [4.0, 5.0, 6.0, 7.0, 8.0],
    }
)

print(df)

DataFrame

use chrono::prelude::*;

let df: DataFrame = df!(
    "integer" => &[1, 2, 3, 4, 5],
    "date" => &[
        NaiveDate::from_ymd_opt(2022, 1, 1).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 2).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 3).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 4).unwrap().and_hms_opt(0, 0, 0).unwrap(),
        NaiveDate::from_ymd_opt(2022, 1, 5).unwrap().and_hms_opt(0, 0, 0).unwrap()
    ],
    "float" => &[4.0, 5.0, 6.0, 7.0, 8.0],
)
.unwrap();

println!("{}", df);

shape: (5, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 1       ┆ 2022-01-01 00:00:00 ┆ 4.0   │
│ 2       ┆ 2022-01-02 00:00:00 ┆ 5.0   │
│ 3       ┆ 2022-01-03 00:00:00 ┆ 6.0   │
│ 4       ┆ 2022-01-04 00:00:00 ┆ 7.0   │
│ 5       ┆ 2022-01-05 00:00:00 ┆ 8.0   │
└─────────┴─────────────────────┴───────┘

Viewing data

This part focuses on viewing data in a DataFrame. We will use the DataFrame from the previous example as a starting point.

Head

The head function shows the first 5 rows of a DataFrame by default. You can specify the number of rows you want to see (e.g. df.head(10)).

Python Rust

head

print(df.head(3))

head

println!("{}", df.head(Some(3)));

shape: (3, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 1       ┆ 2022-01-01 00:00:00 ┆ 4.0   │
│ 2       ┆ 2022-01-02 00:00:00 ┆ 5.0   │
│ 3       ┆ 2022-01-03 00:00:00 ┆ 6.0   │
└─────────┴─────────────────────┴───────┘

Tail

The tail function shows the last 5 rows of a DataFrame by default. You can also specify the number of rows you want to see, similar to head.

Python Rust

tail

print(df.tail(3))

tail

println!("{}", df.tail(Some(3)));

shape: (3, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 3       ┆ 2022-01-03 00:00:00 ┆ 6.0   │
│ 4       ┆ 2022-01-04 00:00:00 ┆ 7.0   │
│ 5       ┆ 2022-01-05 00:00:00 ┆ 8.0   │
└─────────┴─────────────────────┴───────┘

Sample

To get an impression of the data in your DataFrame, you can also use sample, which returns n random rows from the DataFrame.

Python Rust

sample

print(df.sample(2))

sample_n

println!("{}", df.sample_n(2, false, true, None)?);

shape: (2, 3)
┌─────────┬─────────────────────┬───────┐
│ integer ┆ date                ┆ float │
│ ---     ┆ ---                 ┆ ---   │
│ i64     ┆ datetime[μs]        ┆ f64   │
╞═════════╪═════════════════════╪═══════╡
│ 5       ┆ 2022-01-05 00:00:00 ┆ 8.0   │
│ 4       ┆ 2022-01-04 00:00:00 ┆ 7.0   │
└─────────┴─────────────────────┴───────┘

Describe

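Describe returns summary statistics of your DataFrame: count, null count, mean, standard deviation, minimum, quartiles and maximum for each column. Statistics that do not apply to a column's data type are reported as null.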

Python Rust

describe

print(df.describe())

describe · Available on feature describe

println!("{:?}", df.describe(None));

shape: (9, 4)
┌────────────┬──────────┬─────────────────────┬──────────┐
│ describe   ┆ integer  ┆ date                ┆ float    │
│ ---        ┆ ---      ┆ ---                 ┆ ---      │
│ str        ┆ f64      ┆ str                 ┆ f64      │
╞════════════╪══════════╪═════════════════════╪══════════╡
│ count      ┆ 5.0      ┆ 5                   ┆ 5.0      │
│ null_count ┆ 0.0      ┆ 0                   ┆ 0.0      │
│ mean       ┆ 3.0      ┆ null                ┆ 6.0      │
│ std        ┆ 1.581139 ┆ null                ┆ 1.581139 │
│ min        ┆ 1.0      ┆ 2022-01-01 00:00:00 ┆ 4.0      │
│ 25%        ┆ 2.0      ┆ null                ┆ 5.0      │
│ 50%        ┆ 3.0      ┆ null                ┆ 6.0      │
│ 75%        ┆ 4.0      ┆ null                ┆ 7.0      │
│ max        ┆ 5.0      ┆ 2022-01-05 00:00:00 ┆ 8.0      │
└────────────┴──────────┴─────────────────────┴──────────┘