Interface StringNamespace

Namespace containing string functions for expressions.

interface StringNamespace {
    concat(delimiter, ignoreNulls?): pl.Expr;
    contains(pat): pl.Expr;
    decode(encoding, strict?): pl.Expr;
    decode(options): pl.Expr;
    encode(encoding): pl.Expr;
    extract(pat, groupIndex): pl.Expr;
    jsonDecode(dtype?, inferSchemaLength?): pl.Expr;
    jsonExtract(dtype?, inferSchemaLength?): pl.Expr;
    jsonPathMatch(pat): pl.Expr;
    lengths(): pl.Expr;
    lstrip(): pl.Expr;
    padEnd(length, fillChar): pl.Expr;
    padStart(length, fillChar): pl.Expr;
    replace(pat, val): pl.Expr;
    replaceAll(pat, val): pl.Expr;
    rstrip(): pl.Expr;
    slice(start, length?): pl.Expr;
    split(by, options?): pl.Expr;
    strip(): pl.Expr;
    strptime(datatype, fmt?): pl.Expr;
    strptime(datatype, fmt?): pl.Expr;
    toLowerCase(): pl.Expr;
    toUpperCase(): pl.Expr;
    zFill(length): pl.Expr;
}

Hierarchy

  • StringFunctions<pl.Expr>
    • StringNamespace

Methods

  • Vertically concatenate the values in the Series into a single string value.

    Parameters

    • delimiter: string
    • Optional ignoreNulls: boolean

    Returns pl.Expr

    Example

    >>> df = pl.DataFrame({"foo": [1, null, 2]})
    >>> df = df.select(pl.col("foo").str.concat("-"))
    >>> df
    shape: (1, 1)
    ┌──────────┐
    │ foo      │
    │ ---      │
    │ str      │
    ╞══════════╡
    │ 1-null-2 │
    └──────────┘
  • Check whether the strings in the Series contain a regex pattern.

    Parameters

    • pat: string | RegExp

    Returns pl.Expr

  • Decodes a value using the provided encoding

    Parameters

    • encoding: "base64" | "hex"

      hex | base64

    • Optional strict: boolean

      how to handle invalid inputs

      - true: method will throw error if unable to decode a value
      - false: unhandled values will be replaced with `null`

    Returns pl.Expr

    Example

    >>> df = pl.DataFrame({"strings": ["666f6f", "626172", null]})
    >>> df.select(pl.col("strings").str.decode("hex"))
    shape: (3, 1)
    ┌─────────┐
    │ strings │
    │ ---     │
    │ str     │
    ╞═════════╡
    │ foo     │
    ├╌╌╌╌╌╌╌╌╌┤
    │ bar     │
    ├╌╌╌╌╌╌╌╌╌┤
    │ null    │
    └─────────┘
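
The effect of `strict` described above can be modelled in plain TypeScript. This is only a sketch of the documented behaviour, not the library's implementation: `decodeHex` is a hypothetical helper, it handles the "hex" encoding only, and it assumes single-byte (ASCII) characters.

```typescript
// Hypothetical helper modelling the documented `strict` behaviour for "hex".
// Assumption: every decoded byte is a single ASCII character.
function decodeHex(value: string | null, strict: boolean): string | null {
  if (value === null) return null; // nulls pass through unchanged
  const valid = value.length % 2 === 0 && /^[0-9a-fA-F]*$/.test(value);
  if (!valid) {
    // strict: throw on undecodable input; non-strict: replace it with null
    if (strict) throw new Error(`unable to decode "${value}" as hex`);
    return null;
  }
  let out = "";
  for (let i = 0; i < value.length; i += 2) {
    out += String.fromCharCode(parseInt(value.slice(i, i + 2), 16));
  }
  return out;
}

// "zz" is not valid hex, so in non-strict mode it becomes null.
const decoded = ["666f6f", "626172", null, "zz"].map((v) => decodeHex(v, false));
console.log(decoded);
```

With `strict` set to true, the same `"zz"` input makes the sketch throw instead of producing null.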
  • Parameters

    • options: {
          encoding: "base64" | "hex";
          strict?: boolean;
      }
      • encoding: "base64" | "hex"
      • Optional strict?: boolean

    Returns pl.Expr

  • Encodes a value using the provided encoding

    Parameters

    • encoding: "base64" | "hex"

      hex | base64

    Returns pl.Expr

    Example

    >>> df = pl.DataFrame({"strings": ["foo", "bar", null]})
    >>> df.select(pl.col("strings").str.encode("hex"))
    shape: (3, 1)
    ┌─────────┐
    │ strings │
    │ ---     │
    │ str     │
    ╞═════════╡
    │ 666f6f  │
    ├╌╌╌╌╌╌╌╌╌┤
    │ 626172  │
    ├╌╌╌╌╌╌╌╌╌┤
    │ null    │
    └─────────┘
  • Extract the target capture group from provided patterns.

    Parameters

    • pat: any
    • groupIndex: number

      Index of the targeted capture group. Group 0 means the whole pattern; the first group begins at index 1. Defaults to the first capture group.

    Returns pl.Expr

    Utf8 array. Contains null if the original value is null or the regex captures nothing.

    Example

    > df = pl.DataFrame({
    ... 'a': [
    ... 'http://vote.com/ballon_dor?candidate=messi&ref=polars',
    ... 'http://vote.com/ballon_dor?candidat=jorginho&ref=polars',
    ... 'http://vote.com/ballon_dor?candidate=ronaldo&ref=polars'
    ... ]})
    > df.select(pl.col('a').str.extract(/candidate=(\w+)/, 1))
    shape: (3, 1)
    ┌─────────┐
    │ a       │
    │ ---     │
    │ str     │
    ╞═════════╡
    │ messi   │
    ├╌╌╌╌╌╌╌╌╌┤
    │ null    │
    ├╌╌╌╌╌╌╌╌╌┤
    │ ronaldo │
    └─────────┘
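
The group-index and null rules above can be illustrated with a plain TypeScript sketch. `extractGroup` is a hypothetical helper built on JavaScript's `RegExp`, so it only approximates the semantics; the library's own regex engine may differ in syntax details.

```typescript
// Hypothetical helper: returns the requested capture group, or null when the
// input is null, the pattern does not match, or the group captured nothing.
function extractGroup(
  value: string | null,
  pat: RegExp,
  groupIndex: number,
): string | null {
  if (value === null) return null;
  const m = value.match(pat);
  if (m === null) return null;
  const group = m[groupIndex];
  return group === undefined ? null : group;
}

const urls = [
  "http://vote.com/ballon_dor?candidate=messi&ref=polars",
  "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
  "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
];

// Group 1 is the first capture group; the second URL does not match at all.
const candidates = urls.map((u) => extractGroup(u, /candidate=(\w+)/, 1));
console.log(candidates);

// Group 0 is the whole match.
const whole = extractGroup(urls[0], /candidate=(\w+)/, 0);
console.log(whole);
```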
  • Parse string values as JSON. Throws an error if an invalid JSON string is encountered.

    Parameters

    • Optional dtype: DataType
    • Optional inferSchemaLength: number

    Returns pl.Expr

    DF with struct

    Params

    Not implemented at the moment.

    Example

    >>> df = pl.DataFrame( {json: ['{"a":1, "b": true}', null, '{"a":2, "b": false}']} )
    >>> df.select(pl.col("json").str.jsonDecode())
    shape: (3, 1)
    ┌─────────────┐
    │ json        │
    │ ---         │
    │ struct[2]   │
    ╞═════════════╡
    │ {1,true}    │
    │ {null,null} │
    │ {2,false}   │
    └─────────────┘

    See Also

    jsonPathMatch: Extract the first match of a JSON string with the provided JSONPath expression.
  • Parse string values as JSON. Throws an error if an invalid JSON string is encountered.

    Parameters

    • Optional dtype: DataType
    • Optional inferSchemaLength: number

    Returns pl.Expr

    DF with struct

    Params

    Not implemented at the moment.

    Deprecated

    Since 0.8.4. Use jsonDecode instead.

    Example

    >>> df = pl.DataFrame( {json: ['{"a":1, "b": true}', null, '{"a":2, "b": false}']} )
    >>> df.select(pl.col("json").str.jsonExtract())
    shape: (3, 1)
    ┌─────────────┐
    │ json        │
    │ ---         │
    │ struct[2]   │
    ╞═════════════╡
    │ {1,true}    │
    │ {null,null} │
    │ {2,false}   │
    └─────────────┘

    See Also

    jsonPathMatch: Extract the first match of a JSON string with the provided JSONPath expression.
  • Extract the first match of a JSON string with the provided JSONPath expression. Throws an error if an invalid JSON string is encountered. All return values are cast to Utf8 regardless of the original value.

    Parameters

    • pat: string

    Returns pl.Expr

    Utf8 array. Contains null if the original value is null or the JSONPath expression returns nothing.

    See

    https://goessner.net/articles/JsonPath/

    Example

    >>> df = pl.DataFrame({
    ... 'json_val': [
    ... '{"a":"1"}',
    ... null,
    ... '{"a":2}',
    ... '{"a":2.1}',
    ... '{"a":true}'
    ... ]
    ... })
    >>> df.select(pl.col('json_val').str.jsonPathMatch('$.a'))
    shape: (5,)
    Series: 'json_val' [str]
    [
    "1"
    null
    "2"
    "2.1"
    "true"
    ]
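
The cast-to-string rule above can be sketched in plain TypeScript. This is a deliberately narrow model, not the library's implementation: `firstMatchForKey` is a hypothetical helper that takes a top-level key (so `"a"` stands in for the JSONPath `$.a`) rather than a full JSONPath expression.

```typescript
// Hypothetical helper covering only a single top-level key lookup ("$.<key>").
// Every matched value is returned as a string, whatever its JSON type.
function firstMatchForKey(value: string | null, key: string): string | null {
  if (value === null) return null; // null input stays null
  const parsed: unknown = JSON.parse(value); // throws on invalid JSON
  if (typeof parsed !== "object" || parsed === null) return null;
  const field = (parsed as Record<string, unknown>)[key];
  if (field === undefined || field === null) return null; // path matched nothing
  return typeof field === "string" ? field : JSON.stringify(field);
}

const jsonVals = ['{"a":"1"}', null, '{"a":2}', '{"a":2.1}', '{"a":true}'];
const matches = jsonVals.map((v) => firstMatchForKey(v, "a"));
console.log(matches);
```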
  • Append fillChar to a string until the given length is reached. If the string is already at least that long, it is left unmodified.

    Parameters

    • length: number

      of the final string

    • fillChar: string

      that will fill the string.

    Returns pl.Expr

    Note

    If a string longer than 1 character is provided, only the first character will be used.

    Example

    > df = pl.DataFrame({
    ... 'foo': [
    ... "a",
    ... "b",
    ... "LONG_WORD",
    ... "cow"
    ... ]})
    > df.select(pl.col('foo').str.padEnd(3, "_"))
    shape: (4, 1)
    ┌───────────┐
    │ foo       │
    │ ---       │
    │ str       │
    ╞═══════════╡
    │ a__       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ b__       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ LONG_WORD │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ cow       │
    └───────────┘
  • Prepend fillChar to a string until the given length is reached. If the string is already at least that long, it is left unmodified.

    Parameters

    • length: number

      of the final string

    • fillChar: string

      that will fill the string.

    Returns pl.Expr

    Note

    If a string longer than 1 character is provided only the first character will be used

    Example

    > df = pl.DataFrame({
    ... 'foo': [
    ... "a",
    ... "b",
    ... "LONG_WORD",
    ... "cow"
    ... ]})
    > df.select(pl.col('foo').str.padStart(3, "_"))
    shape: (4, 1)
    ┌───────────┐
    │ foo       │
    │ ---       │
    │ str       │
    ╞═══════════╡
    │ __a       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ __b       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ LONG_WORD │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ cow       │
    └───────────┘
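
The padding rules for padEnd and padStart (fill up to `length`, leave longer strings alone, use only the first character of `fillChar`) can be sketched in plain TypeScript. `pad` is a hypothetical helper written for illustration and is not part of the library.

```typescript
// Hypothetical helper modelling the documented padding rules.
function pad(
  value: string,
  length: number,
  fillChar: string,
  side: "start" | "end",
): string {
  const ch = fillChar.charAt(0); // only the first character is used
  if (ch === "" || value.length >= length) return value; // nothing to add
  let fill = "";
  for (let i = value.length; i < length; i++) fill += ch;
  return side === "start" ? fill + value : value + fill;
}

const words = ["a", "b", "LONG_WORD", "cow"];
const padded = {
  end: words.map((w) => pad(w, 3, "_", "end")),
  start: words.map((w) => pad(w, 3, "_", "start")),
};
console.log(padded);
```

Passing a longer fill string such as `"_-"` gives the same result as `"_"`, since only its first character is used.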
  • Replace first regex match with a string value.

    Parameters

    • pat: string | RegExp
    • val: string

    Returns pl.Expr

  • Replace all regex matches with a string value.

    Parameters

    • pat: string | RegExp
    • val: string

    Returns pl.Expr

  • Create subslices of the string values of a Utf8 Series.

    Parameters

    • start: number | pl.Expr

      Start of the slice (negative indexing may be used).

    • Optional length: number | pl.Expr

      Optional length of the slice.

    Returns pl.Expr
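
How a negative `start` and an optional `length` interact can be sketched in plain TypeScript. `sliceStr` is a hypothetical helper; it counts UTF-16 code units, clamps out-of-range values, and assumes a non-negative `length`, all of which are assumptions and may not match the library's handling of multi-byte characters or edge cases.

```typescript
// Hypothetical helper: a negative start counts back from the end of the string;
// when length is omitted, the slice runs to the end.
function sliceStr(value: string, start: number, length?: number): string {
  const begin =
    start < 0 ? Math.max(value.length + start, 0) : Math.min(start, value.length);
  const end =
    length === undefined ? value.length : Math.min(begin + length, value.length);
  return value.substring(begin, end);
}

const tail = sliceStr("pear", -3); // last three characters
const middle = sliceStr("pear", 1, 2); // two characters starting at index 1
console.log(tail, middle);
```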

  • Split a string into substrings using the specified separator and return them as a Series.

    Parameters

    • by: string
    • Optional options: boolean | {
          inclusive?: boolean;
      }

    Returns pl.Expr

  • Parse a Series of dtype Utf8 to a Date/Datetime Series.

    Parameters

    • datatype: _Date

      Date or Datetime.

    • Optional fmt: string

      Formatting syntax.

    Returns pl.Expr

  • Parameters

    • datatype: _Datetime
    • Optional fmt: string

    Returns pl.Expr

  • Prepend "0" to a string until the given length is reached. If the string is already at least that long, it is left unmodified.

    Parameters

    • length: number | pl.Expr

      of the final string

    Returns pl.Expr

    See

    padStart

    Example

    > df = pl.DataFrame({
    ... 'foo': [
    ... "a",
    ... "b",
    ... "LONG_WORD",
    ... "cow"
    ... ]})
    > df.select(pl.col('foo').str.zFill(3))
    shape: (4, 1)
    ┌───────────┐
    │ foo       │
    │ ---       │
    │ str       │
    ╞═══════════╡
    │ 00a       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ 00b       │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ LONG_WORD │
    ├╌╌╌╌╌╌╌╌╌╌╌┤
    │ cow       │
    └───────────┘