Returns the Unicode normal form of the string values
Description
Normalization follows the forms described in Unicode Standard Annex #15: https://www.unicode.org/reports/tr15/
Usage
<Expr>$str$normalize(form = c("NFC", "NFKC", "NFD", "NFKD"))
Arguments
form
Unicode form to use. Must be one of: "NFC", "NFKC", "NFD", "NFKD".
Value
A polars expression
Examples
library("polars")
# The second string is written with fullwidth Latin letters (3 bytes each in UTF-8)
df <- pl$DataFrame(text = c("01²", "ＫＡＤＯＫＡＷＡ"))
new <- df$with_columns(
  nfc = pl$col("text")$str$normalize("NFC"),
  nfkc = pl$col("text")$str$normalize("NFKC"),
)
new
#> shape: (2, 3)
#> ┌──────────────────┬──────────────────┬──────────┐
#> │ text             ┆ nfc              ┆ nfkc     │
#> │ ---              ┆ ---              ┆ ---      │
#> │ str              ┆ str              ┆ str      │
#> ╞══════════════════╪══════════════════╪══════════╡
#> │ 01²              ┆ 01²              ┆ 012      │
#> │ ＫＡＤＯＫＡＷＡ ┆ ＫＡＤＯＫＡＷＡ ┆ KADOKAWA │
#> └──────────────────┴──────────────────┴──────────┘
new$select(pl$all()$str$len_bytes())
#> shape: (2, 3)
#> ┌──────┬─────┬──────┐
#> │ text ┆ nfc ┆ nfkc │
#> │ ---  ┆ --- ┆ ---  │
#> │ u32  ┆ u32 ┆ u32  │
#> ╞══════╪═════╪══════╡
#> │ 4    ┆ 4   ┆ 3    │
#> │ 24   ┆ 24  ┆ 8    │
#> └──────┴─────┴──────┘
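The example above exercises only the composed forms. As a minimal sketch of how the decomposed form "NFD" differs from "NFC", the following uses two illustrative strings that are not part of the original example: a precomposed "é" (U+00E9) and an "e" followed by a combining acute accent (U+0301). The column names `nfc` and `nfd` are arbitrary choices for this sketch.

```r
library("polars")

# "\u00e9" is the precomposed letter; "e\u0301" is "e" plus a combining accent.
# Both render as "é" but are different code point sequences.
df <- pl$DataFrame(text = c("\u00e9", "e\u0301"))

res <- df$with_columns(
  # NFC composes: both rows become the single code point U+00E9
  nfc = pl$col("text")$str$normalize("NFC"),
  # NFD decomposes: both rows become "e" followed by U+0301
  nfd = pl$col("text")$str$normalize("NFD")
)

# Byte lengths make the otherwise invisible difference observable
res$select(pl$all()$str$len_bytes())
```

After normalization the two rows hold identical strings within each column: 2 bytes each under "NFC" and 3 bytes each under "NFD", whereas the raw `text` column holds 2 and 3 bytes respectively. Normalizing both sides to the same form before comparing or joining on string keys avoids mismatches between strings that look identical.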