Extract the target capture group from provided patterns
Description
For each string, extract the target capture group of the provided regular expression pattern. Strings that do not match the pattern yield null.
Usage
<Expr>$str$extract(pattern, group_index = 1L)
Arguments
pattern
A valid regular expression pattern containing at least one capture group, compatible with the regex crate.
group_index
Index of the targeted capture group. Group 0 means the whole pattern; the first group begins at index 1. Defaults to the first capture group.
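The difference between group 0 and group 1 can be illustrated with a minimal sketch. The column names and URLs here are hypothetical and chosen only for illustration; the calls mirror those used in the Examples section.

```r
library("polars")

# Hypothetical data: one URL that matches the pattern, one that does not
df <- pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?candidate=messi",
    "http://vote.com/ballon_dor?error=404"
  )
)

out <- df$select(
  # group 0: the text matched by the whole pattern
  whole_match = pl$col("url")$str$extract(r"(candidate=(\w+))", group_index = 0),
  # group 1: only the text captured by the first pair of parentheses
  first_group = pl$col("url")$str$extract(r"(candidate=(\w+))", group_index = 1)
)
out
# whole_match: "candidate=messi", null
# first_group: "messi", null
```

The second row is null in both columns because the pattern does not match that URL at all.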
Details
To modify regular expression behaviour (such as multi-line matching) with flags, use the inline (?iLmsuxU) syntax. See the multi-line example in the Examples section.
See the regex crate’s section on grouping and flags for additional information about the use of inline expression modifiers.
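As a further illustration of inline flags, the sketch below uses the case-insensitive flag `(?i)`, which the regex crate supports. The data and column names are made up for this example.

```r
library("polars")

# Hypothetical log-like strings with inconsistent capitalisation
df <- pl$DataFrame(
  text = c("Error: disk full", "error: timeout", "ok")
)

out <- df$select(
  # (?i) makes the match case-insensitive for the rest of the pattern
  with_i_flag = pl$col("text")$str$extract(r"((?i)error: (\w+))", 1),
  # without the flag, only the lowercase "error" matches
  without_flag = pl$col("text")$str$extract(r"(error: (\w+))", 1)
)
out
# with_i_flag:  "disk", "timeout", null
# without_flag: null, "timeout", null
```

Placing the flag at the start of the pattern applies it to everything that follows, so no separate argument is needed on the R side.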
Value
A polars expression
Examples
library("polars")
df <- pl$DataFrame(
  url = c(
    "http://vote.com/ballon_dor?error=404&ref=unknown",
    "http://vote.com/ballon_dor?ref=polars&candidate=messi",
    "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars"
  )
)
df$select(
  extracted = pl$col("url")$str$extract(r"(candidate=(\w+))", 1),
  referer = pl$col("url")$str$extract(r"(ref=(\w+))", 1),
  error = pl$col("url")$str$extract(r"(error=(\w+))", 1)
)
#> shape: (3, 3)
#> ┌───────────┬─────────┬───────┐
#> │ extracted ┆ referer ┆ error │
#> │ ---       ┆ ---     ┆ ---   │
#> │ str       ┆ str     ┆ str   │
#> ╞═══════════╪═════════╪═══════╡
#> │ null      ┆ unknown ┆ 404   │
#> │ messi     ┆ polars  ┆ null  │
#> │ ronaldo   ┆ polars  ┆ null  │
#> └───────────┴─────────┴───────┘
# Using the multi-line mode flag `(?m)`
df <- pl$DataFrame(
  lines = c("I Like\nThose\nOdds", "This is\nThe Way")
)
df$with_columns(
  with_m_flag = pl$col("lines")$str$extract(r"((?m)^(T\w+))", 1),
  without_flag = pl$col("lines")$str$extract(r"(^(T\w+))", 1),
)
#> shape: (2, 3)
#> ┌─────────┬─────────────┬──────────────┐
#> │ lines   ┆ with_m_flag ┆ without_flag │
#> │ ---     ┆ ---         ┆ ---          │
#> │ str     ┆ str         ┆ str          │
#> ╞═════════╪═════════════╪══════════════╡
#> │ I Like  ┆ Those       ┆ null         │
#> │ Those   ┆             ┆              │
#> │ Odds    ┆             ┆              │
#> │ This is ┆ This        ┆ This         │
#> │ The Way ┆             ┆              │
#> └─────────┴─────────────┴──────────────┘