
Casting

Casting converts the underlying DataType of a column to a new one. Polars uses Arrow to manage the data in memory and relies on the compute kernels in the Rust implementation to do the conversion. Casting is available with the cast() method.

The cast method includes a strict parameter that determines how Polars behaves when it encounters a value that can't be converted from the source DataType to the target DataType. By default, strict=True, which means that Polars raises an error that reports the failed conversion and lists the values that couldn't be cast. If strict=False, any values that can't be converted to the target DataType are silently converted to null instead.
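The effect of the strict parameter can be modelled in a few lines of plain Python. This is a conceptual sketch, not Polars' implementation: cast_values and parse_float are hypothetical helpers, and the converter is assumed to signal a failed conversion by returning None.

```python
from typing import Callable, List, Optional, Sequence, TypeVar

S = TypeVar("S")
T = TypeVar("T")


def cast_values(
    values: Sequence[Optional[S]],
    convert: Callable[[S], Optional[T]],
    strict: bool = True,
) -> List[Optional[T]]:
    # Nulls (None) stay null; every other value goes through `convert`,
    # which returns None when the value cannot be converted.
    converted = [None if v is None else convert(v) for v in values]
    # A value failed if it was non-null before the cast but null after it.
    failed = [v for v, c in zip(values, converted) if v is not None and c is None]
    if strict and failed:
        # Strict mode: reject the whole cast and report the offending values.
        raise ValueError(
            f"conversion failed for {len(failed)} out of {len(values)} values: {failed}"
        )
    # Non-strict mode: failed values are silently left as None (null).
    return converted


def parse_float(text: str) -> Optional[float]:
    # Hypothetical converter: Python's float() parser stands in for Polars' own.
    try:
        return float(text)
    except ValueError:
        return None


values = ["4.0", "not_a_number", "6.0"]
print(cast_values(values, parse_float, strict=False))  # [4.0, None, 6.0]
```

Calling cast_values(values, parse_float) with the default strict=True raises a ValueError naming "not_a_number", loosely mirroring the error Polars reports.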

Numerics

Let's take a look at the following DataFrame which contains both integers and floating point numbers.

Python · DataFrame

import polars as pl

df = pl.DataFrame(
    {
        "integers": [1, 2, 3, 4, 5],
        "big_integers": [1, 10000002, 3, 10000004, 10000005],
        "floats": [4.0, 5.0, 6.0, 7.0, 8.0],
        "floats_with_decimal": [4.532, 5.5, 6.5, 7.5, 8.5],
    }
)

print(df)

Rust · DataFrame

use polars::prelude::*;

let df = df! (
    "integers"=> &[1, 2, 3, 4, 5],
    "big_integers"=> &[1, 10000002, 3, 10000004, 10000005],
    "floats"=> &[4.0, 5.0, 6.0, 7.0, 8.0],
    "floats_with_decimal"=> &[4.532, 5.5, 6.5, 7.5, 8.5],
)?;

println!("{}", &df);

shape: (5, 4)
┌──────────┬──────────────┬────────┬─────────────────────┐
│ integers ┆ big_integers ┆ floats ┆ floats_with_decimal │
│ ---      ┆ ---          ┆ ---    ┆ ---                 │
│ i64      ┆ i64          ┆ f64    ┆ f64                 │
╞══════════╪══════════════╪════════╪═════════════════════╡
│ 1        ┆ 1            ┆ 4.0    ┆ 4.532               │
│ 2        ┆ 10000002     ┆ 5.0    ┆ 5.5                 │
│ 3        ┆ 3            ┆ 6.0    ┆ 6.5                 │
│ 4        ┆ 10000004     ┆ 7.0    ┆ 7.5                 │
│ 5        ┆ 10000005     ┆ 8.0    ┆ 8.5                 │
└──────────┴──────────────┴────────┴─────────────────────┘

To cast floats to integers, or integers to floats, use the cast() method.

Python · cast

out = df.select(
    pl.col("integers").cast(pl.Float32).alias("integers_as_floats"),
    pl.col("floats").cast(pl.Int32).alias("floats_as_integers"),
    pl.col("floats_with_decimal")
    .cast(pl.Int32)
    .alias("floats_with_decimal_as_integers"),
)
print(out)

Rust · cast

let out = df
    .clone()
    .lazy()
    .select([
        col("integers")
            .cast(DataType::Float32)
            .alias("integers_as_floats"),
        col("floats")
            .cast(DataType::Int32)
            .alias("floats_as_integers"),
        col("floats_with_decimal")
            .cast(DataType::Int32)
            .alias("floats_with_decimal_as_integers"),
    ])
    .collect()?;
println!("{}", &out);

shape: (5, 3)
┌────────────────────┬────────────────────┬─────────────────────────────────┐
│ integers_as_floats ┆ floats_as_integers ┆ floats_with_decimal_as_integers │
│ ---                ┆ ---                ┆ ---                             │
│ f32                ┆ i32                ┆ i32                             │
╞════════════════════╪════════════════════╪═════════════════════════════════╡
│ 1.0                ┆ 4                  ┆ 4                               │
│ 2.0                ┆ 5                  ┆ 5                               │
│ 3.0                ┆ 6                  ┆ 6                               │
│ 4.0                ┆ 7                  ┆ 7                               │
│ 5.0                ┆ 8                  ┆ 8                               │
└────────────────────┴────────────────────┴─────────────────────────────────┘

Note that floating point values with a fractional part are truncated toward zero when cast to an integer: 4.532 becomes 4 and 5.5 becomes 5.
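For positive values such as those in floats_with_decimal, truncating toward zero and rounding down give the same result; the two only differ for negative values. Casting a float to an integer in Polars truncates toward zero. The plain-Python sketch below (an illustration, not Polars code) contrasts the two behaviours: int() truncates toward zero, while math.floor() rounds down.

```python
import math

values = [4.532, 5.5, -4.532, -5.5]

# int() drops the fractional part, i.e. truncates toward zero.
truncated = [int(v) for v in values]

# math.floor() rounds toward negative infinity instead.
floored = [math.floor(v) for v in values]

print(truncated)  # [4, 5, -4, -5]
print(floored)    # [4, 5, -5, -6]
```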

Downcast

Casting can also reduce the memory footprint by lowering the number of bits allocated to each element. For example, the code below casts Int64 to Int16 and Float64 to Float32 to lower memory usage.
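The saving can be estimated from the fixed width of each type: Int64 and Float64 values take 8 bytes each, Float32 values 4 bytes and Int16 values 2 bytes. The sketch below computes these widths with Python's struct module (standard sizes) for a 5-row column. It only counts the values buffer and ignores validity bitmaps and other overhead, so it approximates, rather than reproduces, what Polars actually allocates.

```python
import struct

# Standard-size ('<') struct format codes for fixed-width numeric types.
FORMATS = {"Int64": "<q", "Int16": "<h", "Float64": "<d", "Float32": "<f"}

n_rows = 5  # the example DataFrame has five rows

# Bytes needed for the values of one column of each type.
column_bytes = {name: struct.calcsize(fmt) * n_rows for name, fmt in FORMATS.items()}
print(column_bytes)  # {'Int64': 40, 'Int16': 10, 'Float64': 40, 'Float32': 20}
```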

Python · cast

out = df.select(
    pl.col("integers").cast(pl.Int16).alias("integers_smallfootprint"),
    pl.col("floats").cast(pl.Float32).alias("floats_smallfootprint"),
)
print(out)

Rust · cast

let out = df
    .clone()
    .lazy()
    .select([
        col("integers")
            .cast(DataType::Int16)
            .alias("integers_smallfootprint"),
        col("floats")
            .cast(DataType::Float32)
            .alias("floats_smallfootprint"),
    ])
    .collect();
match out {
    Ok(out) => println!("{}", &out),
    Err(e) => println!("{:?}", e),
};

shape: (5, 2)
┌─────────────────────────┬───────────────────────┐
│ integers_smallfootprint ┆ floats_smallfootprint │
│ ---                     ┆ ---                   │
│ i16                     ┆ f32                   │
╞═════════════════════════╪═══════════════════════╡
│ 1                       ┆ 4.0                   │
│ 2                       ┆ 5.0                   │
│ 3                       ┆ 6.0                   │
│ 4                       ┆ 7.0                   │
│ 5                       ┆ 8.0                   │
└─────────────────────────┴───────────────────────┘

Overflow

When downcasting, make sure the chosen number of bits (such as 64, 32, or 16) is enough to hold both the largest and the smallest value in the column. For example, a 32-bit signed integer (Int32) can hold integers from -2147483648 to +2147483647, while Int8 only covers integers from -128 to 127. Casting to a DataType that is too small for some of the values makes Polars raise a ComputeError when strict=True (the default), because those values cannot be represented in the target type.
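These ranges follow from the two's-complement formula: an n-bit signed integer holds values from -2^(n-1) to 2^(n-1) - 1. The plain-Python sketch below (signed_range and out_of_range are hypothetical helpers, not part of Polars) computes the bounds and lists the values of big_integers that would not fit in Int8, which are the same values reported in the error message shown further down.

```python
from typing import List, Tuple


def signed_range(bits: int) -> Tuple[int, int]:
    """Smallest and largest value of a two's-complement signed integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def out_of_range(values: List[int], bits: int) -> List[int]:
    """Values that cannot be represented by a signed integer with `bits` bits."""
    lo, hi = signed_range(bits)
    return [v for v in values if not lo <= v <= hi]


big_integers = [1, 10000002, 3, 10000004, 10000005]

print(signed_range(32))               # (-2147483648, 2147483647)
print(signed_range(8))                # (-128, 127)
print(out_of_range(big_integers, 8))  # [10000002, 10000004, 10000005]
print(out_of_range(big_integers, 32)) # []
```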

Python · cast

try:
    out = df.select(pl.col("big_integers").cast(pl.Int8))
    print(out)
except Exception as e:
    print(e)

Rust · cast

let out = df
    .clone()
    .lazy()
    .select([col("big_integers").strict_cast(DataType::Int8)])
    .collect();
match out {
    Ok(out) => println!("{}", &out),
    Err(e) => println!("{:?}", e),
};

conversion from `i64` to `i8` failed in column 'big_integers' for 3 out of 5 values: [10000002, 10000004, 10000005]

Setting the strict parameter to False instead converts overflowing values to null. In the Rust API, the same behaviour is obtained by using cast instead of strict_cast.

Python · cast

out = df.select(pl.col("big_integers").cast(pl.Int8, strict=False))
print(out)

Rust · cast

let out = df
    .clone()
    .lazy()
    .select([col("big_integers").cast(DataType::Int8)])
    .collect();
match out {
    Ok(out) => println!("{}", &out),
    Err(e) => println!("{:?}", e),
};

shape: (5, 1)
┌──────────────┐
│ big_integers │
│ ---          │
│ i8           │
╞══════════════╡
│ 1            │
│ null         │
│ 3            │
│ null         │
│ null         │
└──────────────┘

Strings

Strings can be cast to numerical data types and vice versa:

Python · cast

df = pl.DataFrame(
    {
        "integers": [1, 2, 3, 4, 5],
        "float": [4.0, 5.03, 6.0, 7.0, 8.0],
        "floats_as_string": ["4.0", "5.0", "6.0", "7.0", "8.0"],
    }
)

out = df.select(
    pl.col("integers").cast(pl.String),
    pl.col("float").cast(pl.String),
    pl.col("floats_as_string").cast(pl.Float64),
)
print(out)

Rust · cast

let df = df! (
        "integers" => &[1, 2, 3, 4, 5],
        "float" => &[4.0, 5.03, 6.0, 7.0, 8.0],
        "floats_as_string" => &["4.0", "5.0", "6.0", "7.0", "8.0"],
)?;

let out = df
    .clone()
    .lazy()
    .select([
        col("integers").cast(DataType::String),
        col("float").cast(DataType::String),
        col("floats_as_string").cast(DataType::Float64),
    ])
    .collect()?;
println!("{}", &out);

shape: (5, 3)
┌──────────┬───────┬──────────────────┐
│ integers ┆ float ┆ floats_as_string │
│ ---      ┆ ---   ┆ ---              │
│ str      ┆ str   ┆ f64              │
╞══════════╪═══════╪══════════════════╡
│ 1        ┆ 4.0   ┆ 4.0              │
│ 2        ┆ 5.03  ┆ 5.0              │
│ 3        ┆ 6.0   ┆ 6.0              │
│ 4        ┆ 7.0   ┆ 7.0              │
│ 5        ┆ 8.0   ┆ 8.0              │
└──────────┴───────┴──────────────────┘

If the column contains a value that cannot be parsed as a number, Polars raises a ComputeError detailing the failed conversion. Setting strict=False converts such values to null instead.

Python · cast

df = pl.DataFrame({"strings_not_float": ["4.0", "not_a_number", "6.0", "7.0", "8.0"]})
try:
    out = df.select(pl.col("strings_not_float").cast(pl.Float64))
    print(out)
except Exception as e:
    print(e)

Rust · cast

let df = df! ("strings_not_float"=> ["4.0", "not_a_number", "6.0", "7.0", "8.0"])?;

let out = df
    .clone()
    .lazy()
    .select([col("strings_not_float").strict_cast(DataType::Float64)])
    .collect();
match out {
    Ok(out) => println!("{}", &out),
    Err(e) => println!("{:?}", e),
};

conversion from `str` to `f64` failed in column 'strings_not_float' for 1 out of 5 values: ["not_a_number"]

Booleans

Booleans can be expressed as either 1 (True) or 0 (False), and casting between a numerical DataType and Boolean works in both directions. When a number is cast to a boolean, 0 becomes false and any other value, including negative numbers, becomes true. Keep in mind, however, that casting a String to a Boolean is not permitted.
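The rule visible in the output below (zero maps to false, every other number to true, and true/false map back to 1/0) can be written in plain Python. This is an illustration of the rule only, not Polars code.

```python
integers = [-1, 0, 2, 3, 4]
floats = [0.0, 1.0, 2.0, 3.0, 4.0]
bools = [True, False, True, False, True]

# Number -> boolean: only zero maps to False.
integers_as_bool = [v != 0 for v in integers]
floats_as_bool = [v != 0.0 for v in floats]

# Boolean -> number: True -> 1, False -> 0.
bools_as_int = [int(b) for b in bools]

print(integers_as_bool)  # [True, False, True, True, True]
print(floats_as_bool)    # [False, True, True, True, True]
print(bools_as_int)      # [1, 0, 1, 0, 1]
```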

Python · cast

df = pl.DataFrame(
    {
        "integers": [-1, 0, 2, 3, 4],
        "floats": [0.0, 1.0, 2.0, 3.0, 4.0],
        "bools": [True, False, True, False, True],
    }
)

out = df.select(pl.col("integers").cast(pl.Boolean), pl.col("floats").cast(pl.Boolean))
print(out)

Rust · cast

let df = df! (
        "integers"=> &[-1, 0, 2, 3, 4],
        "floats"=> &[0.0, 1.0, 2.0, 3.0, 4.0],
        "bools"=> &[true, false, true, false, true],
)?;

let out = df
    .clone()
    .lazy()
    .select([
        col("integers").cast(DataType::Boolean),
        col("floats").cast(DataType::Boolean),
    ])
    .collect()?;
println!("{}", &out);

shape: (5, 2)
┌──────────┬────────┐
│ integers ┆ floats │
│ ---      ┆ ---    │
│ bool     ┆ bool   │
╞══════════╪════════╡
│ true     ┆ false  │
│ false    ┆ true   │
│ true     ┆ true   │
│ true     ┆ true   │
│ true     ┆ true   │
└──────────┴────────┘

Dates

Temporal data types are stored as integers: a Date as the number of days since the Unix epoch (1970-01-01), and a Datetime as the number of time units since the epoch (microseconds by default, as in the example below). Therefore, casting between the numerical types and the temporal data types is allowed.
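The integers in the output below can be checked with Python's standard datetime module, assuming naive (time-zone-free) values as in the example. The sketch reproduces the first row: days since the epoch for the Date, microseconds since the epoch for the microsecond Datetime.

```python
from datetime import date, datetime, timedelta

EPOCH_DATE = date(1970, 1, 1)
EPOCH_DATETIME = datetime(1970, 1, 1)  # naive, i.e. no time zone

# Date -> Int64: number of whole days since the epoch.
days = (date(2022, 1, 1) - EPOCH_DATE).days

# Datetime[μs] -> Int64: number of microseconds since the epoch.
micros = (datetime(2022, 1, 1) - EPOCH_DATETIME) // timedelta(microseconds=1)

print(days)    # 18993
print(micros)  # 1640995200000000
```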

Python · cast

from datetime import date, datetime

df = pl.DataFrame(
    {
        "date": pl.date_range(date(2022, 1, 1), date(2022, 1, 5), eager=True),
        "datetime": pl.datetime_range(
            datetime(2022, 1, 1), datetime(2022, 1, 5), eager=True
        ),
    }
)

out = df.select(pl.col("date").cast(pl.Int64), pl.col("datetime").cast(pl.Int64))
print(out)

Rust · cast

use chrono::prelude::*;

let date = polars::time::date_range(
    "date",
    NaiveDate::from_ymd_opt(2022, 1, 1)
        .unwrap()
        .and_hms_opt(0, 0, 0)
        .unwrap(),
    NaiveDate::from_ymd_opt(2022, 1, 5)
        .unwrap()
        .and_hms_opt(0, 0, 0)
        .unwrap(),
    Duration::parse("1d"),
    ClosedWindow::Both,
    TimeUnit::Milliseconds,
    None,
)?
.cast(&DataType::Date)?;

let datetime = polars::time::date_range(
    "datetime",
    NaiveDate::from_ymd_opt(2022, 1, 1)
        .unwrap()
        .and_hms_opt(0, 0, 0)
        .unwrap(),
    NaiveDate::from_ymd_opt(2022, 1, 5)
        .unwrap()
        .and_hms_opt(0, 0, 0)
        .unwrap(),
    Duration::parse("1d"),
    ClosedWindow::Both,
    TimeUnit::Microseconds, // microseconds, so the Int64 values match the output below
    None,
)?;

let df = df! (
    "date" => date,
    "datetime" => datetime,
)?;

let out = df
    .clone()
    .lazy()
    .select([
        col("date").cast(DataType::Int64),
        col("datetime").cast(DataType::Int64),
    ])
    .collect()?;
println!("{}", &out);

shape: (5, 2)
┌───────┬──────────────────┐
│ date  ┆ datetime         │
│ ---   ┆ ---              │
│ i64   ┆ i64              │
╞═══════╪══════════════════╡
│ 18993 ┆ 1640995200000000 │
│ 18994 ┆ 1641081600000000 │
│ 18995 ┆ 1641168000000000 │
│ 18996 ┆ 1641254400000000 │
│ 18997 ┆ 1641340800000000 │
└───────┴──────────────────┘

To convert between strings and Dates/Datetimes, use dt.to_string to format temporal values as strings, and str.to_date or str.to_datetime to parse strings. Polars uses the chrono format syntax for these format strings. str.to_datetime also offers additional options for time zone handling; refer to the API documentation for further information.
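In the %Y-%m-%d pattern used below, %Y is a four-digit year and %m and %d are the zero-padded month and day. Python's datetime.strftime and datetime.strptime interpret these three specifiers the same way, so the round trip can be sketched with the standard library. chrono and Python agree on common specifiers like these but not on every directive, so check chrono's documentation before reusing a less common format string.

```python
from datetime import date, datetime

fmt = "%Y-%m-%d"

# Date -> string, analogous to dt.to_string("%Y-%m-%d").
as_string = date(2022, 1, 1).strftime(fmt)

# String -> datetime, analogous to str.to_datetime("%Y-%m-%d"); the time defaults to midnight.
parsed = datetime.strptime("2022-01-05", fmt)

print(as_string)  # 2022-01-01
print(parsed)     # 2022-01-05 00:00:00
```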

Python · dt.to_string · str.to_datetime

df = pl.DataFrame(
    {
        "date": pl.date_range(date(2022, 1, 1), date(2022, 1, 5), eager=True),
        "string": [
            "2022-01-01",
            "2022-01-02",
            "2022-01-03",
            "2022-01-04",
            "2022-01-05",
        ],
    }
)

out = df.select(
    pl.col("date").dt.to_string("%Y-%m-%d"),
    pl.col("string").str.to_datetime("%Y-%m-%d"),
)
print(out)

Rust · dt.to_string · str.to_datetime · Available on feature temporal · Available on feature dtype-date

let date = polars::time::date_range(
    "date",
    NaiveDate::from_ymd_opt(2022, 1, 1)
        .unwrap()
        .and_hms_opt(0, 0, 0)
        .unwrap(),
    NaiveDate::from_ymd_opt(2022, 1, 5)
        .unwrap()
        .and_hms_opt(0, 0, 0)
        .unwrap(),
    Duration::parse("1d"),
    ClosedWindow::Both,
    TimeUnit::Milliseconds,
    None,
)?;

let df = df! (
        "date" => date,
        "string" => &[
            "2022-01-01",
            "2022-01-02",
            "2022-01-03",
            "2022-01-04",
            "2022-01-05",
        ],
)?;

let out = df
    .clone()
    .lazy()
    .select([
        col("date").dt().to_string("%Y-%m-%d"),
        col("string").str().to_datetime(
            Some(TimeUnit::Microseconds),
            None,
            StrptimeOptions::default(),
            lit("raise"),
        ),
    ])
    .collect()?;
println!("{}", &out);

shape: (5, 2)
┌────────────┬─────────────────────┐
│ date       ┆ string              │
│ ---        ┆ ---                 │
│ str        ┆ datetime[μs]        │
╞════════════╪═════════════════════╡
│ 2022-01-01 ┆ 2022-01-01 00:00:00 │
│ 2022-01-02 ┆ 2022-01-02 00:00:00 │
│ 2022-01-03 ┆ 2022-01-03 00:00:00 │
│ 2022-01-04 ┆ 2022-01-04 00:00:00 │
│ 2022-01-05 ┆ 2022-01-05 00:00:00 │
└────────────┴─────────────────────┘