polars.Expr.str.len_bytes

Expr.str.len_bytes() → Expr

Return the length of each string as the number of bytes.

Returns:
Expr

Expression of data type UInt32.

See also

len_chars

Notes

When working with non-ASCII text, the length in bytes is not the same as the length in characters; use len_chars() if you need the character count. Note that len_bytes() is much faster: it runs in O(1) per string, whereas len_chars() is O(n) in the length of the string.
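The difference between the two counts comes from UTF-8, which encodes each character in one to four bytes. The following is a minimal plain-Python sketch of that relationship, not Polars code; the helper name `utf8_lengths` is hypothetical and used only for illustration, and Python's `len()` on a `str` counts code points.

```python
# Illustration only: plain Python, no Polars required.
# UTF-8 stores ASCII characters in 1 byte and other characters in 2 to 4 bytes.

def utf8_lengths(s: str) -> tuple[int, int]:
    """Return (UTF-8 byte length, code point count) for a string."""
    return len(s.encode("utf-8")), len(s)


for s in ["Café", "345", "東京"]:
    n_bytes, n_chars = utf8_lengths(s)
    print(f"{s!r}: {n_bytes} bytes, {n_chars} chars")
```

For pure ASCII input such as "345" the two numbers are equal, so either method gives the same result there.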

Examples

>>> import polars as pl
>>> df = pl.DataFrame({"a": ["Café", "345", "東京", None]})
>>> df.with_columns(
...     pl.col("a").str.len_bytes().alias("n_bytes"),
...     pl.col("a").str.len_chars().alias("n_chars"),
... )
shape: (4, 3)
┌──────┬─────────┬─────────┐
│ a    ┆ n_bytes ┆ n_chars │
│ ---  ┆ ---     ┆ ---     │
│ str  ┆ u32     ┆ u32     │
╞══════╪═════════╪═════════╡
│ Café ┆ 5       ┆ 4       │
│ 345  ┆ 3       ┆ 3       │
│ 東京 ┆ 6       ┆ 2       │
│ null ┆ null    ┆ null    │
└──────┴─────────┴─────────┘