> const df: pl.DataFrame = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
> df.schema
// {
// foo: Float64;
// bar: Float64;
// ham: Utf8;
// }
Write the DataFrame to disk in Avro format.
File path to which the file should be written, or writable.
Optional options: WriteAvroOptions. Options for writing Avro files.
  Optional compression?: "uncompressed" | "snappy" | "deflate"
Write DataFrame to comma-separated values file (csv).
If no destination is specified, it will return a new string containing the CSV contents.
file or stream to write to
Optional options: CsvWriterOptions. Options for writing CSV.
  Optional batchSize?: number
  Optional dateFormat?: string
  Optional datetimeFormat?: string
  Optional floatPrecision?: number
  Optional includeBom?: boolean
  Optional includeHeader?: boolean
  Optional lineTerminator?: string
  Optional maintainOrder?: boolean
  Optional nullValue?: string
  Optional quoteChar?: string
  Optional separator?: string
  Optional timeFormat?: string
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> df.writeCSV();
foo,bar,ham
1,6,a
2,7,b
3,8,c
// using a file path
> df.head(1).writeCSV("./foo.csv")
// foo.csv
foo,bar,ham
1,6,a
// using a write stream
> const writeStream = new Stream.Writable({
... write(chunk, encoding, callback) {
... console.log("writeStream: %O", chunk.toString());
... callback(null);
... }
... });
> df.head(1).writeCSV(writeStream, {includeHeader: false});
writeStream: '1,6,a'
Write to Arrow IPC feather file, either to a file path or to a write stream.
File path to which the file should be written, or writable.
Optional options: WriteIPCOptions. Options for DataFrame.writeIPC.
  Optional compression?: "uncompressed" | "lz4" | "zstd"
Write to Arrow IPC stream file, either to a file path or to a write stream.
File path to which the file should be written, or writable.
Optional options: WriteIPCOptions. Options for DataFrame.writeIPC.
  Optional compression?: "uncompressed" | "lz4" | "zstd"
Write DataFrame to JSON string, file, or write stream.
file or write stream
Optional options: { format: "lines" | "json" }. "json" or "lines".
> const df = pl.DataFrame({
... foo: [1,2,3],
... bar: ['a','b','c']
... })
> df.writeJSON({format:"json"})
`[ {"foo":1.0,"bar":"a"}, {"foo":2.0,"bar":"b"}, {"foo":3.0,"bar":"c"}]`
> df.writeJSON({format:"lines"})
`{"foo":1.0,"bar":"a"}
{"foo":2.0,"bar":"b"}
{"foo":3.0,"bar":"c"}`
// writing to a file
> df.writeJSON("/path/to/file.json", {format:'lines'})
Write the DataFrame to disk in Parquet format.
File path to which the file should be written, or writable.
Optional options: WriteParquetOptions. Options for DataFrame.writeParquet.
  Optional compression?: "uncompressed" | "snappy" | "gzip" | "lzo" | "brotli" | "lz4" | "zstd"
Sample from this DataFrame by setting either n or frac.
Optional n: number. Number of samples; must be less than self.len().
Optional frac: number. Fraction between 0.0 and 1.0.
Optional withReplacement: boolean. Sample with replacement.
Optional seed: number | bigint. Seed initialization. If not provided, a random seed will be used.
Summary statistics for a DataFrame.
Only summarizes numeric datatypes at the moment and returns nulls for non-numeric datatypes.
Example
> const df = pl.DataFrame({
... 'a': [1.0, 2.8, 3.0],
... 'b': [4, 5, 6],
... "c": [true, false, true]
... });
... df.describe()
shape: (5, 4)
╭──────────┬───────┬─────┬──────╮
│ describe ┆ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 │
╞══════════╪═══════╪═════╪══════╡
│ "mean" ┆ 2.267 ┆ 5 ┆ null │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "std" ┆ 1.102 ┆ 1 ┆ null │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "min" ┆ 1 ┆ 4 ┆ 0.0 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "max" ┆ 3 ┆ 6 ┆ 1 │
├╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┤
│ "median" ┆ 2.8 ┆ 5 ┆ null │
╰──────────┴───────┴─────┴──────╯
Remove a column from the DataFrame and return the result as a new DataFrame.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ['a', 'b', 'c'],
... "apple": ['a', 'b', 'c']
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// apple: pl.Series<Utf8, "apple">;
// }>
> const df2 = df.drop(['ham', 'apple']);
// df2: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// }>
> console.log(df2.toString());
shape: (3, 2)
╭─────┬─────╮
│ foo ┆ bar │
│ --- ┆ --- │
│ i64 ┆ f64 │
╞═════╪═════╡
│ 1 ┆ 6 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 │
╰─────┴─────╯
Return a new DataFrame where the null values are dropped.
This method drops a row if any single value of that row is null.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, null, 8],
... "ham": ['a', 'b', 'c']
... });
> console.log(df.dropNulls().toString());
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
└─────┴─────┴─────┘
Explode DataFrame to long format by exploding a column with Lists.
column or columns to explode
> const df = pl.DataFrame({
... "letters": ["c", "c", "a", "c", "a", "b"],
... "nrs": [[1, 2], [1, 3], [4, 3], [5, 5, 5], [6], [2, 1, 2]]
... });
> console.log(df.toString());
shape: (6, 2)
╭─────────┬────────────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ list [i64] │
╞═════════╪════════════╡
│ "c" ┆ [1, 2] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "c" ┆ [1, 3] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "a" ┆ [4, 3] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "c" ┆ [5, 5, 5] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "a" ┆ [6] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ "b" ┆ [2, 1, 2] │
╰─────────┴────────────╯
> df.explode("nrs")
shape: (13, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ ... ┆ ... │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 6 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 2 │
╰─────────┴─────╯
Extend the memory backed by this DataFrame with the values from other.
Different from vstack, which adds the chunks from other to the chunks of this DataFrame,
extend appends the data from other to the underlying memory locations and thus may cause a reallocation.
If this does not cause a reallocation, the resulting data structure will not have any extra chunks and thus will yield faster queries.
Prefer extend over vstack when you want to do a query after a single append. For instance during
online operations where you add n rows and rerun a query.
Prefer vstack over extend when you want to append many times before doing a query. For instance,
when you read in multiple files and want to store them in a single DataFrame.
In the latter case, finish the sequence of vstack operations with a rechunk.
Fill null/missing values by a filling strategy.
One of: "backward" | "forward" | "mean" | "min" | "max" | "zero" | "one"
Returns a DataFrame with nulls replaced according to the filling strategy.
Filter the rows in the DataFrame based on a predicate expression.
Expression that evaluates to a boolean Series.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
// Filter on one condition
> df.filter(pl.col("foo").lt(3))
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ b │
└─────┴─────┴─────┘
// Filter on multiple conditions
> df.filter(
... pl.col("foo").lt(3)
... .and(pl.col("ham").eq(pl.lit("a")))
... )
shape: (1, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ a │
└─────┴─────┴─────┘
Find the index of a column by name.
Name of the column to find.
Apply a horizontal reduction on a DataFrame.
This can be used to effectively determine aggregations on a row level, and can be applied to any DataType that can be supercasted (cast to a similar parent type).
An example of the supercast rules when applying an arithmetic operation on two DataTypes: Int8 + Utf8 = Utf8; Float32 + Int64 = Float32; Float32 + Float64 = Float64.
function that takes two Series and returns a Series.
Series
> // A horizontal sum operation
> let df = pl.DataFrame({
... "a": [2, 1, 3],
... "b": [1, 2, 3],
... "c": [1.0, 2.0, 3.0]
... });
> df.fold((s1, s2) => s1.plus(s2))
Series: 'a' [f64]
[
4
5
9
]
> // A horizontal minimum operation
> df = pl.DataFrame({
... "a": [2, 1, 3],
... "b": [1, 2, 3],
... "c": [1.0, 2.0, 3.0]
... });
> df.fold((s1, s2) => s1.zipWith(s1.lt(s2), s2))
Series: 'a' [f64]
[
1
1
3
]
> // A horizontal string concatenation
> df = pl.DataFrame({
... "a": ["foo", "bar", 2],
... "b": [1, 2, 3],
... "c": [1.0, 2.0, 3.0]
... })
> df.fold((s1, s2) => s1.plus(s2))
Series: 'a' [str]
[
"foo11"
"bar22"
"233"
]
Check if DataFrame is equal to other.
Get a single column as Series by name.
> const df = pl.DataFrame({
... foo: [1, 2, 3],
... bar: [6, null, 8],
... ham: ["a", "b", "c"],
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
> const column = df.getColumn("foo");
// column: pl.Series<Float64, "foo">
> const df = pl.DataFrame({
... foo: [1, 2, 3],
... bar: [6, null, 8],
... ham: ["a", "b", "c"],
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
> const columns = df.getColumns();
// columns: (pl.Series<Float64, "foo"> | pl.Series<Float64, "bar"> | pl.Series<Utf8, "ham">)[]
Groups based on a time value (or index value of type Int32, Int64). Time windows are calculated and rows are assigned to windows. Different from a normal groupBy, a row can be a member of multiple groups. The time/index window could be seen as a rolling window, with a window size determined by dates/times/values instead of slots in the DataFrame.
A window is defined by:
The every, period and offset arguments are created with
the following string language: 1ns (nanosecond), 1us (microsecond), 1ms (millisecond), 1s (second), 1m (minute), 1h (hour), 1d (calendar day), 1w (calendar week), 1mo (calendar month), 1q (calendar quarter), 1y (calendar year), 1i (index count).
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
In case of a groupByDynamic on an integer column, the windows are defined by: "1i" (length 1), "10i" (length 10).
Optional by?: ColumnsOrExpr. Also group by this column/these columns.
Optional closed?: "none" | "left" | "right" | "both". Defines if the window interval is closed or not. Any of {"left", "right", "both", "none"}.
interval of the window
Optional includeBoundaries?: boolean. Add the lower and upper bound of the window to the "_lower_bound" and "_upper_bound" columns. This will impact performance because it's harder to parallelize.
Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; if not, the output will not make sense.
In case of a dynamic groupby on indices, dtype needs to be one of {Int32, Int64}. Note that
Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.
Optional label?: string. Define which label to use for the window: any of {'left', 'right', 'datapoint'}.
Optional offset?: string. Offset of the window; if None and period is None, it will be equal to negative every.
Optional period?: string. Length of the window; if None, it is equal to every.
Optional startBy?: StartBy. The strategy to determine the start of the first window. Any of {'window', 'datapoint', 'monday', 'tuesday', 'wednesday', 'thursday', 'friday', 'saturday', 'sunday'}.
Create rolling groups based on a time column (or index value of type Int32, Int64).
Different from a dynamic groupBy, the windows are determined by the individual values and are not of constant intervals. For constant intervals use groupByDynamic.
The period and offset arguments are created with
the following string language: 1ns (nanosecond), 1us (microsecond), 1ms (millisecond), 1s (second), 1m (minute), 1h (hour), 1d (calendar day), 1w (calendar week), 1mo (calendar month), 1q (calendar quarter), 1y (calendar year), 1i (index count).
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
In case of a groupByRolling on an integer column, the windows are defined by: "1i" (length 1), "10i" (length 10).
Optional by?: ColumnsOrExpr. Also group by this column/these columns.
Optional closed?: "none" | "left" | "right" | "both". Defines if the window interval is closed or not. Any of {"left", "right", "both", "none"}.
Column used to group based on the time window. Often of type Date/Datetime. This column must be sorted in ascending order; if not, the output will not make sense.
In case of a rolling groupby on indices, dtype needs to be one of {Int32, Int64}. Note that Int32 gets temporarily cast to Int64, so if performance matters use an Int64 column.
Optional offset?: string. Offset of the window. Default is -period.
length of the window
> const dates = [
...   "2020-01-01 13:45:48",
...   "2020-01-01 16:42:13",
...   "2020-01-01 16:45:09",
...   "2020-01-02 18:12:48",
...   "2020-01-03 19:45:32",
...   "2020-01-08 23:16:43",
... ];
> const df = pl.DataFrame({"dt": dates, "a": [3, 7, 5, 9, 2, 1]}).withColumn(
...   pl.col("dt").str.strptime(pl.Datetime)
... );
> const out = df.groupByRolling({indexColumn: "dt", period: "2d"}).agg(
...   [
...     pl.sum("a").alias("sum_a"),
...     pl.min("a").alias("min_a"),
...     pl.max("a").alias("max_a"),
...   ]
... );
// out["sum_a"].toArray() => [3, 10, 15, 24, 11, 1]
// out["max_a"].toArray() => [3, 7, 7, 9, 9, 1]
// out["min_a"].toArray() => [3, 3, 3, 3, 2, 1]
> out
shape: (6, 4)
┌─────────────────────┬───────┬───────┬───────┐
│ dt                  ┆ sum_a ┆ max_a ┆ min_a │
│ --- ┆ --- ┆ --- ┆ --- │
│ datetime[ms] ┆ i64 ┆ i64 ┆ i64 │
╞═════════════════════╪═══════╪═══════╪═══════╡
│ 2020-01-01 13:45:48 ┆ 3 ┆ 3 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:42:13 ┆ 10 ┆ 7 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-01 16:45:09 ┆ 15 ┆ 7 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-02 18:12:48 ┆ 24 ┆ 9 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-03 19:45:32 ┆ 11 ┆ 9 ┆ 2 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2020-01-08 23:16:43 ┆ 1 ┆ 1 ┆ 1 │
└─────────────────────┴───────┴───────┴───────┘
Get first N rows as DataFrame.
Optionallength: numberLength of the head.
> const df = pl.DataFrame({
... "foo": [1, 2, 3, 4, 5],
... "bar": [6, 7, 8, 9, 10],
... "ham": ['a', 'b', 'c', 'd','e']
... });
> df.head(3)
shape: (3, 3)
╭─────┬─────┬─────╮
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
╰─────┴─────┴─────╯
Return a new DataFrame grown horizontally by stacking multiple Series to it.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
> const x = pl.Series("apple", [10, 20, 30])
// x: pl.Series<Float64, "apple">
> df.hstack([x])
// pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// apple: pl.Series<Float64, "apple">;
// }>
shape: (3, 4)
╭─────┬─────┬─────┬───────╮
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str ┆ i64 │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6 ┆ "a" ┆ 10 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" ┆ 20 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" ┆ 30 │
╰─────┴─────┴─────┴───────╯
Check if the DataFrame is empty.
SQL-like joins.
DataFrame to join with.
options for same named column join
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ['a', 'b', 'c']
... });
> const otherDF = pl.DataFrame({
... "apple": ['x', 'y', 'z'],
... "ham": ['a', 'b', 'd']
... });
> df.join(otherDF, {on: 'ham'})
shape: (2, 4)
╭─────┬─────┬─────┬───────╮
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ str │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6 ┆ "a" ┆ "x" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" ┆ "y" │
╰─────┴─────┴─────┴───────╯
SQL-like joins with different names for left and right DataFrames.
DataFrame to join with.
options for differently named column join
Optional coalesce?: boolean. Coalescing behavior (merging of join columns).
Optional how?: Exclude<JoinType, "cross">. Join strategy.
Name(s) of the left join column(s).
Name(s) of the right join column(s).
Optional suffix?: string. Suffix to append to columns with a duplicate name.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ['a', 'b', 'c']
... });
> const otherDF = pl.DataFrame({
... "apple": ['x', 'y', 'z'],
... "ham": ['a', 'b', 'd']
... });
> df.join(otherDF, {leftOn: 'ham', rightOn: 'ham'})
shape: (2, 4)
╭─────┬─────┬─────┬───────╮
│ foo ┆ bar ┆ ham ┆ apple │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str ┆ str │
╞═════╪═════╪═════╪═══════╡
│ 1 ┆ 6 ┆ "a" ┆ "x" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" ┆ "y" │
╰─────┴─────┴─────┴───────╯
SQL-like cross joins.
DataFrame to join with.
options for cross join
Optional coalesce?: boolean. Coalescing behavior (merging of join columns).
Join strategy
Optional suffix?: string. Suffix to append to columns with a duplicate name.
> const df = pl.DataFrame({
... "foo": [1, 2],
... "bar": [6.0, 7.0],
... "ham": ['a', 'b']
... });
> const otherDF = pl.DataFrame({
... "apple": ['x', 'y'],
... "ham": ['a', 'b']
... });
> df.join(otherDF, {how: 'cross'})
shape: (4, 5)
╭─────┬─────┬─────┬───────┬───────────╮
│ foo ┆ bar ┆ ham ┆ apple ┆ ham_right │
│ --- ┆ --- ┆ --- ┆ --- ┆ --- │
│ f64 ┆ f64 ┆ str ┆ str ┆ str │
╞═════╪═════╪═════╪═══════╪═══════════╡
│ 1.0 ┆ 6.0 ┆ a ┆ x ┆ a │
│ 1.0 ┆ 6.0 ┆ a ┆ y ┆ b │
│ 2.0 ┆ 7.0 ┆ b ┆ x ┆ a │
│ 2.0 ┆ 7.0 ┆ b ┆ y ┆ b │
╰─────┴─────┴─────┴───────┴───────────╯
Perform an asof join. This is similar to a left-join except that we match on nearest key rather than equal keys.
Both DataFrames must be sorted by the asofJoin key.
For each row in the left DataFrame:
A "backward" search selects the last row in the right DataFrame whose 'on' key is less than or equal to the left's key.
A "forward" search selects the first row in the right DataFrame whose 'on' key is greater than or equal to the left's key.
A "nearest" search selects the last row in the right DataFrame whose value is nearest to the left's key. String keys are not currently supported for a nearest search.
The default is "backward".
DataFrame to join with.
Optional allowParallel?: boolean. Allow the physical plan to optionally evaluate the computation of both DataFrames up to the join in parallel.
Optional by?: string | string[]
Optional byLeft?: string | string[]. Join on these columns before doing the asof join.
Optional byRight?: string | string[]. Join on these columns before doing the asof join.
Optional checkSortedness?: boolean. Check the sortedness of the asof keys. If the keys are not sorted Polars will error, or in case of a 'by' argument raise a warning. This might become a hard error in the future.
Optional forceParallel?: boolean. Force the physical plan to evaluate the computation of both DataFrames up to the join in parallel.
Optional leftOn?: string. Join column of the left DataFrame.
Optional on?: string. Join column of both DataFrames. If set, leftOn and rightOn should be undefined.
Optional rightOn?: string. Join column of the right DataFrame.
Optional strategy?: "backward" | "forward" | "nearest". One of 'forward', 'backward', 'nearest'.
Optional suffix?: string. Suffix to append to columns with a duplicate name.
Optional tolerance?: string | number. Numeric tolerance. By setting this, the join will only be done if the near keys are within this distance. If the asof join is done on columns of dtype "Date" or "Datetime", you can use a duration string (e.g. "1d", "2h30m") instead.
> const gdp = pl.DataFrame({
... date: [
... new Date('2016-01-01'),
... new Date('2017-01-01'),
... new Date('2018-01-01'),
... new Date('2019-01-01'),
... ], // note record date: Jan 1st (sorted!)
... gdp: [4164, 4411, 4566, 4696],
... })
> const population = pl.DataFrame({
... date: [
... new Date('2016-05-12'),
... new Date('2017-05-12'),
... new Date('2018-05-12'),
... new Date('2019-05-12'),
... ], // note record date: May 12th (sorted!)
... "population": [82.19, 82.66, 83.12, 83.52],
... })
> population.joinAsof(
... gdp,
... {leftOn:"date", rightOn:"date", strategy:"backward"}
... )
shape: (4, 3)
┌─────────────────────┬────────────┬──────┐
│ date ┆ population ┆ gdp │
│ --- ┆ --- ┆ --- │
│ datetime[μs] ┆ f64 ┆ i64 │
╞═════════════════════╪════════════╪══════╡
│ 2016-05-12 00:00:00 ┆ 82.19 ┆ 4164 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2017-05-12 00:00:00 ┆ 82.66 ┆ 4411 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2018-05-12 00:00:00 ┆ 83.12 ┆ 4566 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2019-05-12 00:00:00 ┆ 83.52 ┆ 4696 │
└─────────────────────┴────────────┴──────┘
Get number of chunks used by the ChunkedArrays of this DataFrame.
Create a spreadsheet-style pivot table as a DataFrame.
The existing column(s) of values which will be moved under the new columns from index. If an
aggregation is specified, these are the values on which the aggregation will be computed.
If None, all remaining columns not specified in on and index will be used.
At least one of index and values must be specified.
Optional aggregateFunc?: pl.Expr | "mean" | "min" | "max" | "count" | "first" | "last" | "median" | "sum". Any of "sum", "max", "min", "mean", "median", "first", "last", "count". Defaults to "first".
The column(s) that remain from the input to the output. The output DataFrame will have one row
for each unique combination of the index's values.
If None, all remaining columns not specified on on and values will be used. At least one
of index and values must be specified.
Optional maintainOrder?: boolean. Sort the grouped keys so that the output order is predictable.
The column(s) whose values will be used as the new columns of the output DataFrame.
Optional separator?: string. Used as separator/delimiter in generated column names.
Optional sortColumns?: boolean. Sort the transposed columns by name. Default is by order of discovery.
> const df = pl.DataFrame(
... {
... "foo": ["one", "one", "one", "two", "two", "two"],
... "bar": ["A", "B", "C", "A", "B", "C"],
... "baz": [1, 2, 3, 4, 5, 6],
... }
... );
> df.pivot("baz", {index:"foo", on:"bar"});
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ foo ┆ A ┆ B ┆ C │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ f64 ┆ f64 ┆ f64 │
╞═════╪═════╪═════╪═════╡
│ one ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ two ┆ 4 ┆ 5 ┆ 6 │
└─────┴─────┴─────┴─────┘
Rename column names.
Key value pairs that map from old name to new name.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
> df.rename({"foo": "apple"});
╭───────┬─────┬─────╮
│ apple ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
╰───────┴─────┴─────╯
Replace a column at an index location.
Warning: TypeScript cannot encode type mutation, so the static type of the DataFrame will be incorrect; cast the type of the DataFrame manually.
> const df: pl.DataFrame = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
> const x = pl.Series("apple", [10, 20, 30]);
// x: pl.Series<Float64, "apple">
> df.replaceAtIdx(0, x);
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">; <- notice how the type is still the same!
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
shape: (3, 3)
╭───────┬─────┬─────╮
│ apple ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═══════╪═════╪═════╡
│ 10 ┆ 6 ┆ "a" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 20 ┆ 7 ┆ "b" │
├╌╌╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 30 ┆ 8 ┆ "c" │
╰───────┴─────┴─────╯
Convert columnar data to rows, as an array of arrays.
Select columns from this DataFrame.
Column or columns to select.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
// df: pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// bar: pl.Series<Float64, "bar">;
// ham: pl.Series<Utf8, "ham">;
// }>
> df.select('foo');
// pl.DataFrame<{
// foo: pl.Series<Float64, "foo">;
// }>
shape: (3, 1)
┌─────┐
│ foo │
│ --- │
│ i64 │
╞═════╡
│ 1 │
├╌╌╌╌╌┤
│ 2 │
├╌╌╌╌╌┤
│ 3 │
└─────┘
Shift the values by a given period and fill the parts that will be empty due to this operation
with nulls.
Number of places to shift (may be negative).
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> df.shift(1);
shape: (3, 3)
┌──────┬──────┬──────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ null ┆ null ┆ null │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
└──────┴──────┴──────┘
> df.shift(-1)
shape: (3, 3)
┌──────┬──────┬──────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞══════╪══════╪══════╡
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ null ┆ null ┆ null │
└──────┴──────┴──────┘
Shift the values by a given period and fill the parts that will be empty due to this operation
with the result of the fill_value expression.
Number of places to shift (may be negative).
fill null values with this value.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6, 7, 8],
... "ham": ['a', 'b', 'c']
... });
> df.shiftAndFill({n:1, fill_value:0});
shape: (3, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 0 ┆ 0 ┆ "0" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
└─────┴─────┴─────┘
Slice this DataFrame along the rows.
Length of the slice
Offset index.
> const df = pl.DataFrame({
... "foo": [1, 2, 3],
... "bar": [6.0, 7.0, 8.0],
... "ham": ['a', 'b', 'c']
... });
> df.slice(1, 2); // Alternatively `df.slice({offset:1, length:2})`
shape: (2, 3)
┌─────┬─────┬─────┐
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ f64 ┆ str │
╞═════╪═════╪═════╡
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
└─────┴─────┴─────┘
Sort the DataFrame by column.
Column(s) to sort by. Accepts expression input, including selectors. Strings are parsed as column names.
Optional descending: boolean. Sort in descending order. When sorting by multiple columns, can be specified per column by passing a sequence of booleans.
Optional nullsLast: boolean. Place null values last; can specify a single boolean applying to all columns or a sequence of booleans for per-column control.
Optional maintainOrder: boolean. Whether the order should be maintained if elements are equal.
> const df = pl.DataFrame({
... "letters": ["c", "c", "a", "c", "a", "b"],
... "nrs": [1, 2, 3, 4, 5, 6]
... });
> console.log(df.toString());
shape: (6, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "c" ┆ 1 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 4 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 6 │
╰─────────┴─────╯
> df.groupBy("letters")
... .tail(2)
... .sort("letters")
shape: (5, 2)
╭─────────┬─────╮
│ letters ┆ nrs │
│ --- ┆ --- │
│ str ┆ i64 │
╞═════════╪═════╡
│ "a" ┆ 3 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "a" ┆ 5 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "b" ┆ 6 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 2 │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌┤
│ "c" ┆ 4 │
╰─────────┴─────╯
Converts DataFrame object into a TabularDataResource.
Converts DataFrame object into HTML.
Optional index: number
Returns a string representation of an object.
Transpose a DataFrame over the diagonal.
Optional options: {
  Optional columnNames?: Iterable<string, any, any>. Optional generator/iterator that yields column names. Will be used to replace the columns in the DataFrame.
  Optional headerName?: string. If includeHeader is set, this determines the name of the column that will be inserted.
  Optional includeHeader?: boolean. If set, the column names will be added as the first column.
}
> const df = pl.DataFrame({"a": [1, 2, 3], "b": [1, 2, 3]});
> df.transpose({includeHeader:true})
shape: (2, 4)
┌────────┬──────────┬──────────┬──────────┐
│ column ┆ column_0 ┆ column_1 ┆ column_2 │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞════════╪══════════╪══════════╪══════════╡
│ a ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌┤
│ b ┆ 1 ┆ 2 ┆ 3 │
└────────┴──────────┴──────────┴──────────┘
// replace the auto generated column names with a list
> df.transpose({includeHeader:false, columnNames:["a", "b", "c"]})
shape: (2, 3)
┌─────┬─────┬─────┐
│ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 1 ┆ 2 ┆ 3 │
└─────┴─────┴─────┘
// Include the header as a separate column
> df.transpose({
... includeHeader:true,
... headerName:"foo",
... columnNames:["a", "b", "c"]
... })
shape: (2, 4)
┌─────┬─────┬─────┬─────┐
│ foo ┆ a ┆ b ┆ c │
│ --- ┆ --- ┆ --- ┆ --- │
│ str ┆ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╪═════╡
│ a ┆ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ b ┆ 1 ┆ 2 ┆ 3 │
└─────┴─────┴─────┴─────┘
// Replace the auto generated column with column names from a generator function
> function *namesGenerator() {
...   const baseName = "my_column_";
...   let count = 0;
...   while (true) {
...     yield `${baseName}${count}`;
...     count++;
...   }
... }
> df.transpose({includeHeader:false, columnNames:namesGenerator})
shape: (2, 3)
┌─────────────┬─────────────┬─────────────┐
│ my_column_0 ┆ my_column_1 ┆ my_column_2 │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════════════╪═════════════╪═════════════╡
│ 1 ┆ 2 ┆ 3 │
├╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 1 ┆ 2 ┆ 3 │
└─────────────┴─────────────┴─────────────┘
Decompose a struct into its fields. The fields will be inserted into the DataFrame at the
location of the struct type.
Names of the struct columns that will be decomposed by its fields
> const df = pl.DataFrame({
... "int": [1, 2],
... "str": ["a", "b"],
... "bool": [true, null],
... "list": [[1, 2], [3]],
... })
... .toStruct("my_struct")
... .toFrame();
> df
shape: (2, 1)
┌─────────────────────────────┐
│ my_struct │
│ --- │
│ struct[4]{'int',...,'list'} │
╞═════════════════════════════╡
│ {1,"a",true,[1, 2]} │
├╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ {2,"b",null,[3]} │
└─────────────────────────────┘
> df.unnest("my_struct")
shape: (2, 4)
┌─────┬─────┬──────┬────────────┐
│ int ┆ str ┆ bool ┆ list │
│ --- ┆ --- ┆ --- ┆ --- │
│ i64 ┆ str ┆ bool ┆ list [i64] │
╞═════╪═════╪══════╪════════════╡
│ 1 ┆ a ┆ true ┆ [1, 2] │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌┤
│ 2 ┆ b ┆ null ┆ [3] │
└─────┴─────┴──────┴────────────┘
Unpivot a DataFrame from wide to long format.
Columns to use as identifier variables.
Values to use as value variables.
Optional options: { valueName?: string | null; variableName?: string | null }
  Optional valueName?: string | null. Name to give to the value column. Defaults to "value".
  Optional variableName?: string | null. Name to give to the variable column. Defaults to "variable".
> const df1 = pl.DataFrame({
... 'id': [1],
... 'asset_key_1': ['123'],
... 'asset_key_2': ['456'],
... 'asset_key_3': ['abc'],
... });
> df1.unpivot('id', ['asset_key_1', 'asset_key_2', 'asset_key_3']);
shape: (3, 3)
┌─────┬─────────────┬───────┐
│ id ┆ variable ┆ value │
│ --- ┆ --- ┆ --- │
│ f64 ┆ str ┆ str │
╞═════╪═════════════╪═══════╡
│ 1 ┆ asset_key_1 ┆ 123 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 1 ┆ asset_key_2 ┆ 456 │
├╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ 1 ┆ asset_key_3 ┆ abc │
└─────┴─────────────┴───────┘
Upsample a DataFrame at a regular frequency.
The every and offset arguments are created with the following string language: 1ns (nanosecond), 1us (microsecond), 1ms (millisecond), 1s (second), 1m (minute), 1h (hour), 1d (calendar day), 1w (calendar week), 1mo (calendar month), 1q (calendar quarter), 1y (calendar year), 1i (index count).
Or combine them: "3d12h4m25s" # 3 days, 12 hours, 4 minutes, and 25 seconds
By "calendar day", we mean the corresponding time on the next day (which may not be 24 hours, due to daylight savings). Similarly for "calendar week", "calendar month", "calendar quarter", and "calendar year".
Time column will be used to determine a date range. Note that this column has to be sorted for the output to make sense.
Interval will start 'every' duration.
Optionalby: string | string[]First group by these columns and then upsample for every group.
OptionalmaintainOrder: booleanKeep the ordering predictable. This is slower.
DataFrame
Result will be sorted by timeColumn (but note that if by columns are passed, it will only be sorted within each by group).
Upsample a DataFrame by a certain interval.
> const df = pl.DataFrame({
...   "date": [
...     new Date(2024, 1, 1),
...     new Date(2024, 3, 1),
...     new Date(2024, 4, 1),
...     new Date(2024, 5, 1),
...   ],
...   "groups": ["A", "B", "A", "B"],
...   "values": [0, 1, 2, 3],
... })
...   .withColumn(pl.col("date").cast(pl.Date).alias("date"))
...   .sort("date");
> df.upsample({timeColumn: "date", every: "1mo", by: "groups", maintainOrder: true})
...   .select(pl.col("*").forwardFill());
shape: (7, 3)
┌────────────┬────────┬────────┐
│ date       ┆ groups ┆ values │
│ ---        ┆ ---    ┆ ---    │
│ date       ┆ str    ┆ f64    │
╞════════════╪════════╪════════╡
│ 2024-02-01 ┆ A      ┆ 0.0    │
│ 2024-03-01 ┆ A      ┆ 0.0    │
│ 2024-04-01 ┆ A      ┆ 0.0    │
│ 2024-05-01 ┆ A      ┆ 2.0    │
│ 2024-04-01 ┆ B      ┆ 1.0    │
│ 2024-05-01 ┆ B      ┆ 1.0    │
│ 2024-06-01 ┆ B      ┆ 3.0    │
└────────────┴────────┴────────┘
Grow this DataFrame vertically by stacking a DataFrame to it.
> const df1 = pl.DataFrame({
... "foo": [1, 2],
... "bar": [6, 7],
... "ham": ['a', 'b']
... });
> const df2 = pl.DataFrame({
... "foo": [3, 4],
... "bar": [8 , 9],
... "ham": ['c', 'd']
... });
> df1.vstack(df2);
shape: (4, 3)
╭─────┬─────┬─────╮
│ foo ┆ bar ┆ ham │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ str │
╞═════╪═════╪═════╡
│ 1 ┆ 6 ┆ "a" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 2 ┆ 7 ┆ "b" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 3 ┆ 8 ┆ "c" │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4 ┆ 9 ┆ "d" │
╰─────┴─────┴─────╯
Return a new DataFrame with the column added or replaced.
Series, where the name of the Series refers to the column in the DataFrame.
A DataFrame is a two-dimensional data structure that represents data as a table with rows and columns.
Param: data
Object, Array, or Series Two-dimensional data in various forms. object must contain Arrays. Array may contain Series or other Arrays.
Param: columns
Array of strings, default undefined. Column labels to use for the resulting DataFrame. If specified, overrides any labels already present in the data. Must match data dimensions.
Param: orient
'col' | 'row', default undefined. Whether to interpret two-dimensional data as columns or as rows. If None, the orientation is inferred by matching the columns and data dimensions. If this does not yield conclusive results, column orientation is used.
Example
Constructing a DataFrame from an object:
Notice that the dtype is automatically inferred:
In order to specify dtypes for your columns, initialize the DataFrame with a list of Series instead:
Constructing a DataFrame from a list of lists, row orientation inferred: