nodejs-polars

    Interface StringNamespace

    String functions for Lazy dataframes

    interface StringNamespace {
        concat(delimiter: string, ignoreNulls?: boolean): pl.Expr;
        contains(
            pat: string | RegExp | pl.Expr,
            literal?: boolean,
            strict?: boolean,
        ): pl.Expr;
        decode(encoding: "base64" | "hex", strict?: boolean): pl.Expr;
        decode(options: { encoding: "base64" | "hex"; strict?: boolean }): pl.Expr;
        encode(encoding: "base64" | "hex"): pl.Expr;
        endsWith(suffix: string | pl.Expr): pl.Expr;
        extract(pattern: string | RegExp | pl.Expr, groupIndex: number): pl.Expr;
        jsonDecode(dtype?: DataType, inferSchemaLength?: number): pl.Expr;
        jsonPathMatch(pat: string): pl.Expr;
        lengths(): pl.Expr;
        lstrip(): pl.Expr;
        padEnd(length: number, fillChar: string): pl.Expr;
        padStart(length: number, fillChar: string): pl.Expr;
        replace(
            pattern: string | RegExp | pl.Expr,
            value: string | pl.Expr,
            literal?: boolean,
            n?: number,
        ): pl.Expr;
        replaceAll(
            pattern: string | RegExp | pl.Expr,
            value: string | pl.Expr,
            literal?: boolean,
        ): pl.Expr;
        rstrip(): pl.Expr;
        slice(start: number | pl.Expr, length?: number | pl.Expr): pl.Expr;
        split(by: string, options?: boolean | { inclusive?: boolean }): pl.Expr;
        startsWith(prefix: string | pl.Expr): pl.Expr;
        strip(): pl.Expr;
        stripChars(prefix: string | pl.Expr): pl.Expr;
        stripCharsEnd(prefix: string | pl.Expr): pl.Expr;
        stripCharsStart(prefix: string | pl.Expr): pl.Expr;
        strptime(datatype: Date, fmt?: string): pl.Expr;
        strptime(datatype: Datetime, fmt?: string): pl.Expr;
        strptime(
            datatype: (
                timeUnit?: TimeUnit | "ms" | "ns" | "us",
                timeZone?: undefined | null | string,
            ) => Datetime,
            fmt?: string,
        ): pl.Expr;
        toLowerCase(): pl.Expr;
        toUpperCase(): pl.Expr;
        zFill(length: number | pl.Expr): pl.Expr;
    }

    Hierarchy

    • StringFunctions<pl.Expr>
      • StringNamespace
    Methods

    • Vertically concat the values in the Expression to a single string value.

      Parameters

      • delimiter: string
      • Optional ignoreNulls: boolean

      Returns pl.Expr

      >>> df = pl.DataFrame({"foo": [1, null, 2]})
      >>> df = df.select(pl.col("foo").str.concat("-"))
      >>> df
      shape: (1, 1)
      ┌──────────┐
      │ foo      │
      │ ---      │
      │ str      │
      ╞══════════╡
      │ 1-null-2 │
      └──────────┘
    • Check if strings in Expression contain a substring that matches a pattern.

      Parameters

      • pat: string | RegExp | pl.Expr

        A valid regular expression pattern, compatible with the regex crate.

      • Optional literal: boolean

        Treat `pat` as a literal string, not as a regular expression.

      • Optional strict: boolean

        Raise an error if the underlying pattern is not a valid regex; otherwise mask out with a null value.

      Returns pl.Expr

      Boolean mask

      const df = pl.DataFrame({"txt": ["Crab", "cat and dog", "rab$bit", null]})
      df.select(
      ... pl.col("txt"),
      ... pl.col("txt").str.contains("cat|bit").alias("regex"),
      ... pl.col("txt").str.contains("rab$", true).alias("literal"),
      ... )
      shape: (4, 3)
      ┌─────────────┬───────┬─────────┐
      │ txt         ┆ regex ┆ literal │
      │ ---         ┆ ---   ┆ ---     │
      │ str         ┆ bool  ┆ bool    │
      ╞═════════════╪═══════╪═════════╡
      │ Crab        ┆ false ┆ false   │
      │ cat and dog ┆ true  ┆ false   │
      │ rab$bit     ┆ true  ┆ true    │
      │ null        ┆ null  ┆ null    │
      └─────────────┴───────┴─────────┘
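
      The difference between regex and literal matching can be mimicked in plain TypeScript. This is only a sketch of the semantics, not nodejs-polars code: the helper name `containsSketch` is hypothetical, and it relies on JavaScript's `RegExp`, whose syntax overlaps with but is not identical to the Rust regex crate.

```typescript
// Hypothetical helper that models the documented behavior on a plain array.
// Null inputs stay null, mirroring the null row in the example table.
function containsSketch(
  values: (string | null)[],
  pat: string,
  literal = false,
): (boolean | null)[] {
  // Compile once when the pattern is treated as a regular expression.
  const re = literal ? null : new RegExp(pat);
  return values.map((v) => {
    if (v === null) return null;
    return re === null ? v.includes(pat) : re.test(v);
  });
}

const txt: (string | null)[] = ["Crab", "cat and dog", "rab$bit", null];
const regexMask = containsSketch(txt, "cat|bit");
const literalMask = containsSketch(txt, "rab$", true);
// regexMask   -> [false, true, true, null]
// literalMask -> [false, false, true, null]
```

      In literal mode the `$` in `rab$` is an ordinary character, so only `rab$bit` matches.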
    • Decodes a value in Expression using the provided encoding

      Parameters

      • encoding: "base64" | "hex"

        hex | base64

      • Optional strict: boolean

        how to handle invalid inputs

        - true: method will throw error if unable to decode a value
        - false: unhandled values will be replaced with `null`
        

      Returns pl.Expr

      >>> df = pl.DataFrame({"strings": ["666f6f", "626172", null]})
      >>> df.select(col("strings").str.decode("hex"))
      shape: (3, 1)
      ┌─────────┐
      │ strings │
      │ ---     │
      │ str     │
      ╞═════════╡
      │ foo     │
      ├╌╌╌╌╌╌╌╌╌┤
      │ bar     │
      ├╌╌╌╌╌╌╌╌╌┤
      │ null    │
      └─────────┘
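
      The effect of `strict` on invalid input can be illustrated with a small plain-TypeScript model. It is a sketch under stated assumptions rather than the library's implementation: `decodeHexSketch` is a hypothetical name, and the decoded bytes are assumed to be plain ASCII, as in the example.

```typescript
// Hypothetical helper: decodes hex text to a string, one value at a time.
// Assumes the decoded bytes are plain ASCII.
function decodeHexSketch(value: string | null, strict = false): string | null {
  if (value === null) return null;
  const valid = value.length % 2 === 0 && /^[0-9a-fA-F]*$/.test(value);
  if (!valid) {
    if (strict) throw new Error(`cannot decode "${value}" as hex`);
    return null; // non-strict mode: invalid input becomes null
  }
  let out = "";
  for (let i = 0; i < value.length; i += 2) {
    out += String.fromCharCode(parseInt(value.slice(i, i + 2), 16));
  }
  return out;
}

const hexInputs: (string | null)[] = ["666f6f", "626172", null];
const decoded = hexInputs.map((v) => decodeHexSketch(v));
// decoded -> ["foo", "bar", null]
const lenient = decodeHexSketch("zz");
// lenient -> null, whereas decodeHexSketch("zz", true) throws
```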
    • Decodes a value using the provided encoding

      Parameters

      • options: { encoding: "base64" | "hex"; strict?: boolean }

      Returns pl.Expr

      > df = pl.DataFrame({"strings": ["666f6f", "626172", null]})
      > df.select(col("strings").str.decode({encoding: "hex"}))
      shape: (3, 1)
      ┌─────────┐
      │ strings │
      │ ---     │
      │ str     │
      ╞═════════╡
      │ foo     │
      ├╌╌╌╌╌╌╌╌╌┤
      │ bar     │
      ├╌╌╌╌╌╌╌╌╌┤
      │ null    │
      └─────────┘
    • Encodes a value in Expression using the provided encoding

      Parameters

      • encoding: "base64" | "hex"

        hex | base64

      Returns pl.Expr

      >>> df = pl.DataFrame({"strings": ["foo", "bar", null]})
      >>> df.select(col("strings").str.encode("hex"))
      shape: (3, 1)
      ┌─────────┐
      │ strings │
      │ ---     │
      │ str     │
      ╞═════════╡
      │ 666f6f  │
      ├╌╌╌╌╌╌╌╌╌┤
      │ 626172  │
      ├╌╌╌╌╌╌╌╌╌┤
      │ null    │
      └─────────┘
    • Check if string values in Expression end with a substring.

      Parameters

      • suffix: string | pl.Expr

        Suffix substring or expression

      Returns pl.Expr

      >>> df = pl.DataFrame({"fruits": ["apple", "mango", null]})
      >>> df.withColumns(
      ... pl.col("fruits").str.endsWith("go").alias("has_suffix"),
      ... )
      shape: (3, 2)
      ┌────────┬────────────┐
      │ fruits ┆ has_suffix │
      │ ---    ┆ ---        │
      │ str    ┆ bool       │
      ╞════════╪════════════╡
      │ apple  ┆ false      │
      │ mango  ┆ true       │
      │ null   ┆ null       │
      └────────┴────────────┘

      >>> df = pl.DataFrame(
      ... {"fruits": ["apple", "mango", "banana"], "suffix": ["le", "go", "nu"]}
      ... )
      >>> df.withColumns(
      ... pl.col("fruits").str.endsWith(pl.col("suffix")).alias("has_suffix"),
      ... )
      shape: (3, 3)
      ┌────────┬────────┬────────────┐
      │ fruits ┆ suffix ┆ has_suffix │
      │ ---    ┆ ---    ┆ ---        │
      │ str    ┆ str    ┆ bool       │
      ╞════════╪════════╪════════════╡
      │ apple  ┆ le     ┆ true       │
      │ mango  ┆ go     ┆ true       │
      │ banana ┆ nu     ┆ false      │
      └────────┴────────┴────────────┘

      Using `endsWith` as a filter condition:

      >>> df.filter(pl.col("fruits").str.endsWith("go"))
      shape: (1, 2)
      ┌────────┬────────┐
      │ fruits ┆ suffix │
      │ ---    ┆ ---    │
      │ str    ┆ str    │
      ╞════════╪════════╡
      │ mango  ┆ go     │
      └────────┴────────┘
    • Extract the target capture group from provided patterns.

      Parameters

      • pattern: string | RegExp | pl.Expr

        A valid regex pattern

      • groupIndex: number

        Index of the targeted capture group. Group 0 means the whole pattern; the first group begins at index 1. Defaults to the first capture group.

      Returns pl.Expr

      Utf8 array. Contains null if the original value is null or the regex captures nothing.

      > df = pl.DataFrame({
      ... 'a': [
      ... 'http://vote.com/ballon_dor?candidate=messi&ref=polars',
      ... 'http://vote.com/ballon_dor?candidat=jorginho&ref=polars',
      ... 'http://vote.com/ballon_dor?candidate=ronaldo&ref=polars'
      ... ]})
      > df.select(pl.col('a').str.extract(/candidate=(\w+)/, 1))
      shape: (3, 1)
      ┌─────────┐
      │ a       │
      │ ---     │
      │ str     │
      ╞═════════╡
      │ messi   │
      ├╌╌╌╌╌╌╌╌╌┤
      │ null    │
      ├╌╌╌╌╌╌╌╌╌┤
      │ ronaldo │
      └─────────┘
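
      The meaning of `groupIndex` can be sketched with JavaScript's own `RegExp`. This is an illustration in plain TypeScript, not the library's code: `extractSketch` is a hypothetical helper, and JavaScript regex syntax is only assumed to agree with the regex crate for this simple pattern.

```typescript
// Hypothetical helper modelling capture-group extraction on one value.
function extractSketch(
  value: string | null,
  pattern: RegExp,
  groupIndex = 1,
): string | null {
  if (value === null) return null;
  const m = pattern.exec(value);
  // No match, or the requested group is absent: return null.
  return m?.[groupIndex] ?? null;
}

const urls = [
  "http://vote.com/ballon_dor?candidate=messi&ref=polars",
  "http://vote.com/ballon_dor?candidat=jorginho&ref=polars",
  "http://vote.com/ballon_dor?candidate=ronaldo&ref=polars",
];
const candidates = urls.map((u) => extractSketch(u, /candidate=(\w+)/, 1));
// candidates -> ["messi", null, "ronaldo"]
const whole = extractSketch("?candidate=messi&ref=polars", /candidate=(\w+)/, 0);
// whole -> "candidate=messi" (group 0 is the whole match)
```

      The second URL yields null because it spells the key `candidat`, so the pattern never matches.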
    • Parse string values in Expression as JSON. Throws an error if it encounters an invalid JSON string.

      Parameters

      • Optional dtype: DataType
      • Optional inferSchemaLength: number

      Returns pl.Expr

      DF with struct

      Not implemented ATM

      >>> df = pl.DataFrame( {json: ['{"a":1, "b": true}', null, '{"a":2, "b": false}']} )
      >>> df.select(pl.col("json").str.jsonDecode())
      shape: (3, 1)
      ┌─────────────┐
      │ json        │
      │ ---         │
      │ struct[2]   │
      ╞═════════════╡
      │ {1,true}    │
      │ {null,null} │
      │ {2,false}   │
      └─────────────┘
      See Also
      ----------
      jsonPathMatch : Extract the first match of json string with provided JSONPath expression
    • Extract the first match of the JSON string in Expression with the provided JSONPath expression. Throws an error if it encounters an invalid JSON string. All return values are cast to Utf8 regardless of the original value.

      Parameters

      • pat: string

      Returns pl.Expr

      Utf8 array. Contains null if the original value is null or the JSONPath returns nothing.

      >>> df = pl.DataFrame({
      ... 'json_val': [
      ... '{"a":"1"}',
      ... null,
      ... '{"a":2}',
      ... '{"a":2.1}',
      ... '{"a":true}'
      ... ]
      ... })
      >>> df.select(pl.col('json_val').str.jsonPathMatch('$.a'))
      shape: (5, 1)
      ┌──────────┐
      │ json_val │
      │ ---      │
      │ str      │
      ╞══════════╡
      │ 1        │
      │ null     │
      │ 2        │
      │ 2.1      │
      │ true     │
      └──────────┘
    • Get number of chars of the string values in Expression.

      df = pl.DataFrame({"a": ["Café", "345", "東京", null]})
      df.withColumns(
      pl.col("a").str.lengths().alias("n_chars"),
      )
      shape: (4, 3)
      ┌──────┬─────────┬─────────┐
      │ a    ┆ n_chars ┆ n_bytes │
      │ ---  ┆ ---     ┆ ---     │
      │ str  ┆ u32     ┆ u32     │
      ╞══════╪═════════╪═════════╡
      │ Café ┆ 4       ┆ 5       │
      │ 345  ┆ 3       ┆ 3       │
      │ 東京   ┆ 2       ┆ 6       │
      │ null ┆ null    ┆ null    │
      └──────┴─────────┴─────────┘

      Returns pl.Expr

    • Add a trailing fillChar to a string until the given length is reached. If the string is longer than or equal to the given length, no modifications are made.

      Parameters

      • length: number

        of the final string

      • fillChar: string

        that will fill the string.

      Returns pl.Expr

      If a string longer than 1 character is provided only the first character will be used

      > df = pl.DataFrame({
      ... 'foo': [
      ... "a",
      ... "b",
      ... "LONG_WORD",
      ... "cow"
      ... ]})
      > df.select(pl.col('foo').str.padEnd(3, "_"))
      shape: (4, 1)
      ┌───────────┐
      │ foo       │
      │ ---       │
      │ str       │
      ╞═══════════╡
      │ a__       │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ b__       │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ LONG_WORD │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ cow       │
      └───────────┘
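
      The two rules stated for this method (only the first character of `fillChar` is used, and values already long enough are left alone) can be modelled in a few lines of plain TypeScript. `padEndSketch` is a hypothetical helper for illustration only; falling back to a space when `fillChar` is empty is an assumption of this sketch, not documented behavior.

```typescript
// Hypothetical helper modelling the padding rule for a single value.
function padEndSketch(value: string, length: number, fillChar: string): string {
  // Only the first character of fillChar is used.
  // Assumption: an empty fillChar falls back to a space.
  const fill = Array.from(fillChar)[0] ?? " ";
  const missing = length - Array.from(value).length;
  // Values already at or beyond the target length are left untouched.
  return missing > 0 ? value + fill.repeat(missing) : value;
}

const padded = ["a", "b", "LONG_WORD", "cow"].map((s) => padEndSketch(s, 3, "_"));
// padded -> ["a__", "b__", "LONG_WORD", "cow"]
const firstCharOnly = padEndSketch("a", 3, "_*");
// firstCharOnly -> "a__"
```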
    • Add a leading fillChar to a string in Expression until the given length is reached. If the string is longer than or equal to the given length, no modifications are made.

      Parameters

      • length: number

        of the final string

      • fillChar: string

        that will fill the string. If a string longer than 1 character is provided only the first character will be used

      Returns pl.Expr

      > df = pl.DataFrame({
      ... 'foo': [
      ... "a",
      ... "b",
      ... "LONG_WORD",
      ... "cow"
      ... ]})
      > df.select(pl.col('foo').str.padStart(3, "_"))
      shape: (4, 1)
      ┌───────────┐
      │ foo       │
      │ ---       │
      │ str       │
      ╞═══════════╡
      │ __a       │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ __b       │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ LONG_WORD │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ cow       │
      └───────────┘
    • Replace first match with a string value in Expression.

      Parameters

      • pattern: string | RegExp | pl.Expr

        A valid regex pattern, string or expression

      • value: string | pl.Expr

        Substring or expression to replace.

      • Optional literal: boolean

        Treat pattern as a literal string.

      • Optional n: number

      Returns pl.Expr

      pattern as expression is not yet supported by polars

      df = pl.DataFrame({"cost": ["#12.34", "#56.78"], "text": ["123abc", "abc456"]})
      df = df.withColumns(
      pl.col("cost").str.replace(/#(\d+)/, "$$$1"),
      pl.col("text").str.replace("ab", "-"),
      pl.col("text").str.replace("abc", pl.col("cost")).alias("expr")
      );
      shape: (2, 3)
      ┌────────┬───────┬───────────┐
      │ cost   ┆ text  ┆ expr      │
      │ ---    ┆ ---   ┆ ---       │
      │ str    ┆ str   ┆ str       │
      ╞════════╪═══════╪═══════════╡
      │ $12.34 ┆ 123-c ┆ 123#12.34 │
      │ $56.78 ┆ -c456 ┆ #56.78456 │
      └────────┴───────┴───────────┘
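
      The first-match behavior and the `$$` / `$1` replacement syntax used in the example happen to work the same way in JavaScript's `String.prototype.replace`, so they can be tried without a DataFrame. The helper `replaceFirstSketch` below is hypothetical and only models the semantics; the way it handles `literal` (no regex syntax, no `$` expansion) is an assumption of this sketch.

```typescript
// Hypothetical helper: replaces only the first match in one value.
function replaceFirstSketch(
  value: string | null,
  pattern: string | RegExp,
  replacement: string,
  literal = false,
): string | null {
  if (value === null) return null;
  if (typeof pattern === "string") {
    if (literal) {
      // Literal mode: no regex syntax and no `$` expansion in the replacement.
      return value.replace(pattern, () => replacement);
    }
    return value.replace(new RegExp(pattern), replacement);
  }
  // Drop a global flag, if present, so only the first match is replaced.
  const flags = pattern.flags.replace("g", "");
  return value.replace(new RegExp(pattern.source, flags), replacement);
}

const cost = ["#12.34", "#56.78"].map((v) => replaceFirstSketch(v, /#(\d+)/, "$$$1"));
// cost -> ["$12.34", "$56.78"]  ("$$" is a literal "$", "$1" is capture group 1)
const text = ["123abc", "abc456"].map((v) => replaceFirstSketch(v, "ab", "-"));
// text -> ["123-c", "-c456"]
```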
    • Replace all regex matches with a string value in Expression.

      Parameters

      • pattern: string | RegExp | pl.Expr

        A valid regex pattern, string or expression

      • value: string | pl.Expr

        Substring or expression to replace.

      • Optional literal: boolean

        Treat pattern as a literal string.

      Returns pl.Expr

      pattern as expression is not yet supported by polars

      df = pl.DataFrame({"weather": ["Rainy", "Sunny", "Cloudy", "Snowy"], "text": ["abcabc", "123a123", null, null]})
      df = df.withColumns(
      pl.col("weather").str.replaceAll(/foggy|rainy/i, "Sunny"),
      pl.col("text").str.replaceAll("a", "-")
      )
      shape: (4, 2)
      ┌─────────┬─────────┐
      │ weather ┆ text    │
      │ ---     ┆ ---     │
      │ str     ┆ str     │
      ╞═════════╪═════════╡
      │ Sunny   ┆ -bc-bc  │
      │ Sunny   ┆ 123-123 │
      │ Cloudy  ┆ null    │
      │ Snowy   ┆ null    │
      └─────────┴─────────┘
    • Create subslices of the string values of a Utf8 Series.

      Parameters

      • start: number | pl.Expr

        Start of the slice (negative indexing may be used).

      • Optional length: number | pl.Expr

        Optional length of the slice.

      Returns pl.Expr

    • Split a string into substrings using the specified separator and return them as a Series.

      Parameters

      • by: string
      • Optional options: boolean | { inclusive?: boolean }

      Returns pl.Expr
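
      This entry gives no example, so here is a rough plain-TypeScript model of what an `inclusive` split might look like. It is a sketch, not the library's code: `splitSketch` is a hypothetical helper, and it assumes that `inclusive` keeps the separator at the end of each piece, which the entry does not spell out.

```typescript
// Hypothetical helper; assumes `inclusive` keeps the separator at the end
// of each piece.
function splitSketch(value: string, by: string, inclusive = false): string[] {
  if (by === "") return [value]; // guard so the loop below always advances
  if (!inclusive) return value.split(by);
  const parts: string[] = [];
  let start = 0;
  let idx = value.indexOf(by, start);
  while (idx !== -1) {
    parts.push(value.slice(start, idx + by.length));
    start = idx + by.length;
    idx = value.indexOf(by, start);
  }
  // Keep whatever follows the last separator, if anything.
  if (start < value.length) parts.push(value.slice(start));
  return parts;
}

const plainParts = splitSketch("foo_bar_baz", "_");
// plainParts -> ["foo", "bar", "baz"]
const inclusiveParts = splitSketch("foo_bar_baz", "_", true);
// inclusiveParts -> ["foo_", "bar_", "baz"]
```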

    • Check if string values start with a substring.

      Parameters

      • prefix: string | pl.Expr

        Prefix substring or expression

      Returns pl.Expr

      >>> df = pl.DataFrame({"fruits": ["apple", "mango", null]})
      >>> df.withColumns(
      ... pl.col("fruits").str.startsWith("app").alias("has_prefix"),
      ... )
      shape: (3, 2)
      ┌────────┬────────────┐
      │ fruits ┆ has_prefix │
      │ ---    ┆ ---        │
      │ str    ┆ bool       │
      ╞════════╪════════════╡
      │ apple  ┆ true       │
      │ mango  ┆ false      │
      │ null   ┆ null       │
      └────────┴────────────┘

      >>> df = pl.DataFrame(
      ... {"fruits": ["apple", "mango", "banana"], "prefix": ["app", "na", "ba"]}
      ... )
      >>> df.withColumns(
      ... pl.col("fruits").str.startsWith(pl.col("prefix")).alias("has_prefix"),
      ... )
      shape: (3, 3)
      ┌────────┬────────┬────────────┐
      │ fruits ┆ prefix ┆ has_prefix │
      │ ---    ┆ ---    ┆ ---        │
      │ str    ┆ str    ┆ bool       │
      ╞════════╪════════╪════════════╡
      │ apple  ┆ app    ┆ true       │
      │ mango  ┆ na     ┆ false      │
      │ banana ┆ ba     ┆ true       │
      └────────┴────────┴────────────┘

      Using `startsWith` as a filter condition:

      >>> df.filter(pl.col("fruits").str.startsWith("app"))
      shape: (1, 2)
      ┌────────┬────────┐
      │ fruits ┆ prefix │
      │ ---    ┆ ---    │
      │ str    ┆ str    │
      ╞════════╪════════╡
      │ apple  ┆ app    │
      └────────┴────────┘
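    • Remove the given characters from the start and end of each string. When no characters are given, leading and trailing whitespace is removed.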

      Parameters

      • prefix: string | pl.Expr

        Prefix substring or expression (null means whitespace)

      Returns pl.Expr

      >>> df = pl.DataFrame({
      os: [
      "#Kali-Linux###",
      "$$$Debian-Linux$",
      null,
      "Ubuntu-Linux ",
      " Mac-Sierra",
      ],
      chars: ["#", "$", " ", " ", null],
      })
      >>> df.select(col("os").str.stripChars(col("chars")).as("os"))
      shape: (5, 1)
      ┌──────────────┐
      │ os           │
      │ ---          │
      │ str          │
      ╞══════════════╡
      │ Kali-Linux   │
      │ Debian-Linux │
      │ null         │
      │ Ubuntu-Linux │
      │ Mac-Sierra   │
      └──────────────┘
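
      The row-by-row behavior in the example, where each value is stripped with its own character set and null means whitespace, can be modelled in plain TypeScript. `stripCharsSketch` is a hypothetical helper written for illustration; it is a sketch of the semantics shown in the table, not the library's implementation.

```typescript
// Hypothetical helper: strips any of the characters in `chars` from both
// ends of `value`. A null `chars` means "strip whitespace".
function stripCharsSketch(value: string | null, chars: string | null): string | null {
  if (value === null) return null;
  if (chars === null) return value.trim();
  let start = 0;
  let end = value.length;
  while (start < end && chars.includes(value.charAt(start))) start++;
  while (end > start && chars.includes(value.charAt(end - 1))) end--;
  return value.slice(start, end);
}

const os: (string | null)[] = [
  "#Kali-Linux###",
  "$$$Debian-Linux$",
  null,
  "Ubuntu-Linux ",
  " Mac-Sierra",
];
const stripSets: (string | null)[] = ["#", "$", " ", " ", null];
const stripped = os.map((v, i) => stripCharsSketch(v, stripSets[i] ?? null));
// stripped -> ["Kali-Linux", "Debian-Linux", null, "Ubuntu-Linux", "Mac-Sierra"]
```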
    • Parse a Series of dtype Utf8 to a Date/Datetime Series.

      Parameters

      • datatype: Date

        Date or Datetime.

      • Optional fmt: string

        Formatting syntax.

      Returns pl.Expr

    • Parse a Series of dtype Utf8 to a Date/Datetime Series.

      Parameters

      • datatype: Datetime

        Date or Datetime.

      • Optional fmt: string

        Formatting syntax.

      Returns pl.Expr

    • Parse a Series of dtype Utf8 to a Date/Datetime Series.

      Parameters

      • datatype: (
            timeUnit?: TimeUnit | "ms" | "ns" | "us",
            timeZone?: undefined | null | string,
        ) => Datetime

        Date or Datetime.

          • (
                timeUnit?: TimeUnit | "ms" | "ns" | "us",
                timeZone?: undefined | null | string,
            ): Datetime
          • Calendar date and time type

            Parameters

            • Optional timeUnit: TimeUnit | "ms" | "ns" | "us"

              any of 'ms' | 'ns' | 'us'

            • timeZone: undefined | null | string = null

              Timezone string as defined by Intl.DateTimeFormat, for example America/New_York.

            Returns Datetime

      • Optional fmt: string

        Formatting syntax.

      Returns pl.Expr

    • Add leading "0" characters to a string until the given length is reached. If the string is longer than or equal to the given length, no modifications are made.

      Parameters

      • length: number | pl.Expr

        of the final string

      Returns pl.Expr

      > df = pl.DataFrame({
      ... 'foo': [
      ... "a",
      ... "b",
      ... "LONG_WORD",
      ... "cow"
      ... ]})
      > df.select(pl.col('foo').str.zFill(3))
      shape: (4, 1)
      ┌───────────┐
      │ foo       │
      │ ---       │
      │ str       │
      ╞═══════════╡
      │ 00a       │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ 00b       │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ LONG_WORD │
      ├╌╌╌╌╌╌╌╌╌╌╌┤
      │ cow       │
      └───────────┘