polars.DataFrame

class polars.DataFrame(data: Optional[Union[Dict[str, Sequence[Any]], Sequence[Any], numpy.ndarray, pyarrow.lib.Table, pandas.core.frame.DataFrame, polars.internals.series.Series]] = None, columns: Optional[Union[List[str], Sequence[str], Dict[str, Type[polars.datatypes.DataType]], Sequence[Tuple[str, Type[polars.datatypes.DataType]]]]] = None, orient: Optional[str] = None)

A DataFrame is a two-dimensional data structure that represents data as a table with rows and columns.

Parameters
data : dict, Sequence, ndarray, Series, or pandas.DataFrame

Two-dimensional data in various forms. A dict must contain Sequences; a Sequence may contain Series or other Sequences.

columns : Sequence of str or (str, DataType) pairs, default None

Column labels to use for resulting DataFrame. If specified, overrides any labels already present in the data. Must match data dimensions.

orient : {'col', 'row'}, default None

Whether to interpret two-dimensional data as columns or as rows. If None, the orientation is inferred by matching the columns and data dimensions. If this does not yield conclusive results, column orientation is used.

Examples

Constructing a DataFrame from a dictionary:

>>> data = {"a": [1, 2], "b": [3, 4]}
>>> df = pl.DataFrame(data)
>>> df
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 4   │
└─────┴─────┘

Notice that the dtype is automatically inferred as a polars Int64:

>>> df.dtypes
[<class 'polars.datatypes.Int64'>, <class 'polars.datatypes.Int64'>]

To specify dtypes for your columns, initialize the DataFrame with a list of typed Series, or set the columns parameter to a list of (name, dtype) pairs:

>>> data = [
...     pl.Series("col1", [1, 2], dtype=pl.Float32),
...     pl.Series("col2", [3, 4], dtype=pl.Int64),
... ]
>>> df2 = pl.DataFrame(data)
>>> df2
shape: (2, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ f32  ┆ i64  │
╞══════╪══════╡
│ 1.0  ┆ 3    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2.0  ┆ 4    │
└──────┴──────┘

Or, equivalently (and also compatible with all of the other valid data parameter types):

>>> df3 = pl.DataFrame(data, columns=[("col1", pl.Float32), ("col2", pl.Int64)])
>>> df3
shape: (2, 2)
┌──────┬──────┐
│ col1 ┆ col2 │
│ ---  ┆ ---  │
│ f32  ┆ i64  │
╞══════╪══════╡
│ 1.0  ┆ 3    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌┤
│ 2.0  ┆ 4    │
└──────┴──────┘

Constructing a DataFrame from a numpy ndarray, specifying column names:

>>> import numpy as np
>>> data = np.array([(1, 2), (3, 4)], dtype=np.int64)
>>> df4 = pl.DataFrame(data, columns=["a", "b"], orient="col")
>>> df4
shape: (2, 2)
┌─────┬─────┐
│ a   ┆ b   │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═════╪═════╡
│ 1   ┆ 3   │
├╌╌╌╌╌┼╌╌╌╌╌┤
│ 2   ┆ 4   │
└─────┴─────┘

Constructing a DataFrame from a list of lists, row orientation inferred:

>>> data = [[1, 2, 3], [4, 5, 6]]
>>> df4 = pl.DataFrame(data, columns=["a", "b", "c"])
>>> df4
shape: (2, 3)
┌─────┬─────┬─────┐
│ a   ┆ b   ┆ c   │
│ --- ┆ --- ┆ --- │
│ i64 ┆ i64 ┆ i64 │
╞═════╪═════╪═════╡
│ 1   ┆ 2   ┆ 3   │
├╌╌╌╌╌┼╌╌╌╌╌┼╌╌╌╌╌┤
│ 4   ┆ 5   ┆ 6   │
└─────┴─────┴─────┘
Attributes
columns

Get or set column names.

dtypes

Get dtypes of columns in DataFrame.

height

Get the height of the DataFrame.

schema

Get a dict[column name, DataType].

shape

Get the shape of the DataFrame.

width

Get the width of the DataFrame.

Methods

apply(f[, return_dtype, inference_size])

Apply a custom function over the rows of the DataFrame.

clone()

Very cheap deep clone.

describe()

Summary statistics for a DataFrame.

distinct([maintain_order, subset, keep])

Deprecated since version 0.13.13.

drop(name)

Remove column from DataFrame and return as new.

drop_in_place(name)

Drop in place.

drop_nulls([subset])

Return a new DataFrame where the null values are dropped.

estimated_size()

Return an estimate of the total (heap) allocated size of the DataFrame in bytes.

explode(columns)

Explode DataFrame to long format by exploding a column with Lists.

extend(other)

Extend the memory backed by this DataFrame with the values from other.

fill_nan(fill_value)

Fill floating point NaN values by an Expression evaluation.

fill_null(strategy)

Fill null values using a filling strategy, literal, or Expr.

filter(predicate)

Filter the rows in the DataFrame based on a predicate expression.

find_idx_by_name(name)

Find the index of a column by name.

fold(operation)

Apply a horizontal reduction on a DataFrame.

frame_equal(other[, null_equal])

Check if DataFrame is equal to other.

get_column(name)

Get a single column as Series by name.

get_columns()

Get the DataFrame as a List of Series.

groupby(by[, maintain_order])

Start a groupby operation.

groupby_dynamic(index_column, every[, ...])

Groups based on a time value (or index value of type Int32, Int64).

groupby_rolling(index_column, period[, ...])

Create rolling groups based on a time column (or index value of type Int32, Int64).

hash_rows([k0, k1, k2, k3])

Hash and combine the rows in this DataFrame.

head([length])

Get first N rows as DataFrame.

hstack(columns[, in_place])

Return a new DataFrame grown horizontally by stacking multiple Series to it.

insert_at_idx(index, series)

Insert a Series at a certain column index.

interpolate()

Interpolate intermediate values.

is_duplicated()

Get a mask of all duplicated rows in this DataFrame.

is_empty()

Check if the DataFrame is empty.

is_unique()

Get a mask of all unique rows in this DataFrame.

join(df[, left_on, right_on, on, how, ...])

SQL-like joins.

join_asof(df[, left_on, right_on, on, ...])

Perform an asof join.

lazy()

Start a lazy query from this point.

limit([length])

Get first N rows as DataFrame.

max()

Aggregate the columns of this DataFrame to their maximum value.

mean()

Aggregate the columns of this DataFrame to their mean value.

median()

Aggregate the columns of this DataFrame to their median value.

melt([id_vars, value_vars, variable_name, ...])

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

min()

Aggregate the columns of this DataFrame to their minimum value.

n_chunks()

Get number of chunks used by the ChunkedArrays of this DataFrame.

null_count()

Create a new DataFrame that shows the null counts per column.

partition_by()

Split into multiple DataFrames partitioned by groups.

pipe(func, *args, **kwargs)

Apply a function on Self.

pivot(values, index, columns[, ...])

Create a spreadsheet-style pivot table as a DataFrame.

product()

Aggregate the columns of this DataFrame to their product value.

quantile(quantile[, interpolation])

Aggregate the columns of this DataFrame to their quantile value.

rechunk()

Rechunk the data in this DataFrame to a contiguous allocation.

rename(mapping)

Rename column names.

replace(column, new_col)

Replace a column by a new Series.

replace_at_idx(index, series)

Replace a column at an index location.

row(index)

Get a row as tuple.

rows()

Convert columnar data to rows as python tuples.

sample([n, frac, with_replacement, shuffle, ...])

Sample from this DataFrame by setting either n or frac.

select(exprs)

Select columns from this DataFrame.

select_at_idx(idx)

Select column at index location.

shift(periods)

Shift the values by a given period and fill the parts that will be empty due to this operation with Nones.

shift_and_fill(periods, fill_value)

Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fill_value expression.

shrink_to_fit()

Shrink memory usage of this DataFrame to fit the exact capacity needed to hold the data.

slice(offset, length)

Slice this DataFrame over the rows direction.

sort()

Sort the DataFrame by column.

std()

Aggregate the columns of this DataFrame to their standard deviation value.

sum()

Aggregate the columns of this DataFrame to their sum value.

tail([length])

Get last N rows as DataFrame.

to_arrow()

Collect the underlying arrow arrays in an Arrow Table.

to_avro(file[, compression])

Deprecated since version 0.13.12.

to_csv([file, has_header, sep])

Deprecated since version 0.13.12.

to_dict()

Convert DataFrame to a dictionary mapping column name to values.

to_dicts()

Convert every row to a dictionary.

to_dummies()

Get one hot encoded dummy variables.

to_ipc(file[, compression])

Deprecated since version 0.13.12.

to_json()

Deprecated since version 0.13.12.

to_numpy()

Convert DataFrame to a 2d numpy array.

to_pandas(*args[, date_as_object])

Cast to a pandas DataFrame.

to_parquet(file[, compression, statistics, ...])

Deprecated since version 0.13.12.

to_series([index])

Select column as Series at index location.

to_struct(name)

Convert a DataFrame to a Series of type Struct.

transpose([include_header, header_name, ...])

Transpose a DataFrame over the diagonal.

unique([maintain_order, subset, keep])

Drop duplicate rows from this DataFrame.

unnest(names)

Decompose a struct into its fields.

upsample(time_column, every[, offset, by, ...])

Upsample a DataFrame at a regular frequency.

var()

Aggregate the columns of this DataFrame to their variance value.

vstack()

Grow this DataFrame vertically by stacking a DataFrame to it.

with_column(column)

Return a new DataFrame with the column added or replaced.

with_columns(exprs)

Add or overwrite multiple columns in a DataFrame.

with_row_count([name, offset])

Add a column at index 0 that counts the rows.

write_avro(file[, compression])

Write to Apache Avro file.

write_csv([file, has_header, sep, quote])

Write DataFrame to a comma-separated values (CSV) file.

write_ipc(file[, compression])

Write to Arrow IPC binary stream, or a feather file.

write_json()

Serialize to JSON representation.

write_parquet(file[, compression, ...])

Write the DataFrame to disk in Parquet format.
