Strings

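Thanks to its Arrow backend, Polars string operations are much faster than the same operations performed with NumPy or Pandas. In the latter, strings are stored as Python objects. While traversing an np.array or a pd.Series, the CPU has to follow every string pointer and jump to many random memory locations, which is very cache-inefficient. In Polars (via the Arrow data structure), strings are contiguous in memory, so traversing them is cache-friendly and predictable for the CPU.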
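To make the idea of "contiguous in memory" concrete, here is a minimal sketch in plain Python of an offsets-plus-data-buffer layout. It is a simplified illustration only and does not reproduce Polars' or Arrow's actual internals; the names `data`, `offsets`, and `get` are made up for this example.

```python
# Simplified illustration of a contiguous string column.
# This is NOT Polars' real implementation; names are hypothetical.
words = ["All", "that", "glitters"]

# One contiguous byte buffer holding every value back to back.
data = "".join(words).encode("utf-8")

# offsets[i] and offsets[i + 1] delimit the bytes of value i.
offsets = [0]
for w in words:
    offsets.append(offsets[-1] + len(w.encode("utf-8")))


def get(i: int) -> str:
    """Slice value i out of the shared buffer."""
    return data[offsets[i]:offsets[i + 1]].decode("utf-8")


# Byte lengths come from the offsets alone, without touching the string data.
lengths = [offsets[i + 1] - offsets[i] for i in range(len(words))]

print(data)     # the single shared buffer
print(offsets)  # where each value starts and ends
print(lengths)
```

Because every value lives in one buffer, a scan walks memory front to back instead of chasing one pointer per string, which is the access pattern the paragraph above describes.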

The string processing functions available in Polars can be found in the `str` namespace.

Here are a few examples. To compute the length of each string:

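import polars as pl

df = pl.DataFrame({"shakespeare": "All that glitters is not gold".split(" ")})

# Count the characters in each value (older Polars releases spelled this str.lengths()).
df = df.with_columns(pl.col("shakespeare").str.len_chars().alias("letter_count"))
print(df)

Output:

shape: (6, 2)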
┌─────────────┬──────────────┐
│ shakespeare ┆ letter_count │
│ ---         ┆ ---          │
│ str         ┆ u32          │
╞═════════════╪══════════════╡
│ All         ┆ 3            │
│ that        ┆ 4            │
│ glitters    ┆ 8            │
│ is          ┆ 2            │
│ not         ┆ 3            │
│ gold        ┆ 4            │
└─────────────┴──────────────┘

The following regular expression pattern filters the articles "the" and "a" out of a sentence:

import polars as pl

df = pl.DataFrame({"a": "The man that ate a whole cake".split(" ")})

# (?i) makes the match case-insensitive; ~ negates the mask so matching rows are dropped.
df = df.filter(~pl.col("a").str.contains(r"(?i)^the$|^a$"))
print(df)

Output:

shape: (5, 1)
┌───────┐
│ a     │
│ ---   │
│ str   │
╞═══════╡
│ man   │
│ that  │
│ ate   │
│ whole │
│ cake  │
└───────┘
