polars.Expr.str.len_chars

Expr.str.len_chars() → Expr

Return the length of each string as the number of characters.

Returns:
Expr

Expression of data type UInt32.

See also

len_bytes

Notes

When working with ASCII text, use len_bytes() instead. It produces the same output with much better performance: len_bytes() runs in O(1) per string, while len_chars() runs in O(n), where n is the length of the string.
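The cost difference can be illustrated with a simplified pure-Python model. It assumes, as in Arrow-style string layouts, that each string's byte length is stored alongside its data. This is not Polars' actual implementation, and the helpers `build_column`, `byte_len` and `char_len` are hypothetical names chosen for illustration.

```python
from typing import List, Tuple


def build_column(values: List[str]) -> Tuple[bytes, List[int]]:
    """Hypothetical helper: concatenate UTF-8 encodings and record offsets.

    Simplified model (an assumption, not Polars internals): one contiguous
    byte buffer, where string i occupies buffer[offsets[i]:offsets[i + 1]].
    """
    buffer = bytearray()
    offsets = [0]
    for value in values:
        buffer.extend(value.encode("utf-8"))
        offsets.append(len(buffer))
    return bytes(buffer), offsets


def byte_len(offsets: List[int], i: int) -> int:
    # O(1): the byte length is just the difference of two stored offsets.
    return offsets[i + 1] - offsets[i]


def char_len(buffer: bytes, offsets: List[int], i: int) -> int:
    # O(n): every byte must be inspected. UTF-8 continuation bytes have the
    # bit pattern 10xxxxxx, so only count bytes that start a character.
    return sum(1 for b in buffer[offsets[i]:offsets[i + 1]] if (b & 0xC0) != 0x80)


buffer, offsets = build_column(["Café", "345", "東京"])
print([byte_len(offsets, i) for i in range(3)])          # [5, 3, 6]
print([char_len(buffer, offsets, i) for i in range(3)])  # [4, 3, 2]
```

For the ASCII string "345" both counts agree, which is why len_bytes() is a safe substitute for ASCII-only data. Null handling is omitted from this sketch.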

A character is defined as a Unicode scalar value. In UTF-8, an ASCII character is represented by a single byte, and any other character by 2 to 4 bytes.
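Counting scalar values matters for text containing combining marks or characters outside the Basic Multilingual Plane. The plain-Python sketch below uses `len()`, which counts code points (equal to Unicode scalar values for well-formed `str` data), as a stand-in for len_chars(), and the UTF-8 encoded length as a stand-in for len_bytes(). The sample strings are illustrative assumptions.

```python
# Illustrative samples; expected (chars, bytes) noted for each.
samples = {
    "ascii": "345",               # (3, 3)
    "precomposed": "Caf\u00e9",   # é as one scalar value U+00E9 -> (4, 5)
    "combining": "Cafe\u0301",    # e + U+0301 COMBINING ACUTE ACCENT -> (5, 6)
    "emoji": "\U0001F600",        # one scalar value outside the BMP -> (1, 4)
}

# len() counts code points; encoding to UTF-8 gives the byte count.
counts = {name: (len(s), len(s.encode("utf-8"))) for name, s in samples.items()}

for name, (n_chars, n_bytes) in counts.items():
    print(f"{name:<12} chars={n_chars} bytes={n_bytes}")
```

The "combining" sample renders like "Café" but counts as 5 characters, because the combining accent is a separate scalar value. Under the definition above, len_chars() would be expected to count it the same way.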

Examples

>>> df = pl.DataFrame({"a": ["Café", "345", "東京", None]})
>>> df.with_columns(
...     pl.col("a").str.len_chars().alias("n_chars"),
...     pl.col("a").str.len_bytes().alias("n_bytes"),
... )
shape: (4, 3)
┌──────┬─────────┬─────────┐
│ a    ┆ n_chars ┆ n_bytes │
│ ---  ┆ ---     ┆ ---     │
│ str  ┆ u32     ┆ u32     │
╞══════╪═════════╪═════════╡
│ Café ┆ 4       ┆ 5       │
│ 345  ┆ 3       ┆ 3       │
│ 東京 ┆ 2       ┆ 6       │
│ null ┆ null    ┆ null    │
└──────┴─────────┴─────────┘