DataFrame

Constructor

DataFrame([data, columns, orient])

A DataFrame is a two-dimensional data structure that represents data as a table with rows and columns.

Attributes

DataFrame.shape

Get the shape of the DataFrame as a (height, width) tuple.

DataFrame.height

Get the height (number of rows) of the DataFrame.

DataFrame.width

Get the width (number of columns) of the DataFrame.

DataFrame.columns

Get or set column names.

DataFrame.dtypes

Get dtypes of columns in DataFrame.

DataFrame.schema

Get the schema as a dict mapping each column name to its DataType.

Conversion

DataFrame.to_arrow()

Collect the underlying arrow arrays in an Arrow Table.

DataFrame.to_avro(file[, compression])

Deprecated since version 0.13.12: use DataFrame.write_avro instead.

DataFrame.to_json()

Deprecated since version 0.13.12: use DataFrame.write_json instead.

DataFrame.to_pandas(*args[, date_as_object])

Cast to a pandas DataFrame.

DataFrame.to_csv([file, has_header, sep])

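Deprecated since version 0.13.12: use DataFrame.write_csv instead.

DataFrame.to_ipc(file[, compression])

Deprecated since version 0.13.12: use DataFrame.write_ipc instead.

DataFrame.to_parquet(file[, compression, ...])

Deprecated since version 0.13.12: use DataFrame.write_parquet instead.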

DataFrame.to_numpy()

Convert DataFrame to a 2d numpy array.

DataFrame.to_dict()

Convert DataFrame to a dictionary mapping column name to values.

DataFrame.to_dicts()

Convert every row to a dictionary.

DataFrame.to_struct(name)

Convert a DataFrame to a Series of type Struct.

Aggregation

DataFrame.max()

Aggregate the columns of this DataFrame to their maximum value.

DataFrame.min()

Aggregate the columns of this DataFrame to their minimum value.

DataFrame.sum()

Aggregate the columns of this DataFrame to their sum value.

DataFrame.mean()

Aggregate the columns of this DataFrame to their mean value.

DataFrame.std()

Aggregate the columns of this DataFrame to their standard deviation value.

DataFrame.var()

Aggregate the columns of this DataFrame to their variance value.

DataFrame.median()

Aggregate the columns of this DataFrame to their median value.

DataFrame.quantile(quantile[, interpolation])

Aggregate the columns of this DataFrame to their quantile value.

DataFrame.product()

Aggregate the columns of this DataFrame to their product value.

Descriptive stats

DataFrame.describe()

Summary statistics for a DataFrame.

DataFrame.estimated_size()

Returns an estimation of the total (heap) allocated size of the DataFrame in bytes.

DataFrame.is_duplicated()

Get a mask of all duplicated rows in this DataFrame.

DataFrame.is_unique()

Get a mask of all unique rows in this DataFrame.

DataFrame.n_chunks()

Get number of chunks used by the ChunkedArrays of this DataFrame.

DataFrame.null_count()

Create a new DataFrame that shows the null counts per column.

DataFrame.is_empty()

Check if the DataFrame is empty.

Computations

DataFrame.hash_rows([k0, k1, k2, k3])

Hash and combine the rows in this DataFrame.

DataFrame.fold(operation)

Apply a horizontal reduction on a DataFrame.

Manipulation/selection

DataFrame.rename(mapping)

Rename column names.

DataFrame.with_row_count([name, offset])

Add a column at index 0 that counts the rows.

DataFrame.insert_at_idx(index, series)

Insert a Series at a certain column index.

DataFrame.filter(predicate)

Filter the rows in the DataFrame based on a predicate expression.

DataFrame.find_idx_by_name(name)

Find the index of a column by name.

DataFrame.select_at_idx(idx)

Select column at index location.

DataFrame.replace_at_idx(index, series)

Replace a column at an index location.

DataFrame.sort()

Sort the DataFrame by column.

DataFrame.replace(column, new_col)

Replace a column by a new Series.

DataFrame.slice(offset, length)

Slice this DataFrame over the rows direction.

DataFrame.limit([length])

Get first N rows as DataFrame.

DataFrame.head([length])

Get first N rows as DataFrame.

DataFrame.tail([length])

Get last N rows as DataFrame.

DataFrame.drop_nulls([subset])

Return a new DataFrame where the null values are dropped.

DataFrame.drop(name)

Remove a column from the DataFrame and return the result as a new DataFrame.

DataFrame.drop_in_place(name)

Drop a column in place and return the dropped Series.

DataFrame.to_series([index])

Select column as Series at index location.

DataFrame.clone()

Very cheap deep clone.

DataFrame.get_columns()

Get the DataFrame as a list of Series.

DataFrame.get_column(name)

Get a single column as Series by name.

DataFrame.fill_null(strategy)

Fill null values using a filling strategy, literal, or Expr.

DataFrame.fill_nan(fill_value)

Fill floating point NaN values by an Expression evaluation.

DataFrame.explode(columns)

Explode DataFrame to long format by exploding a column with Lists.

DataFrame.pivot(values, index, columns[, ...])

Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.melt([id_vars, value_vars, ...])

Unpivot a DataFrame from wide to long format, optionally leaving identifiers set.

DataFrame.shift(periods)

Shift the values by a given period and fill the positions left empty by this operation with None.

DataFrame.shift_and_fill(periods, fill_value)

Shift the values by a given period and fill the positions left empty by this operation with the result of the fill_value expression.

DataFrame.with_column(column)

Return a new DataFrame with the column added or replaced.

DataFrame.hstack(columns[, in_place])

Return a new DataFrame grown horizontally by stacking multiple Series to it.

DataFrame.vstack()

Grow this DataFrame vertically by stacking a DataFrame to it.

DataFrame.extend(other)

Extend the memory backed by this DataFrame with the values from other.

DataFrame.groupby(by[, maintain_order])

Start a groupby operation.

DataFrame.groupby_dynamic(index_column, every)

Group based on a time column (or an index column of type Int32 or Int64).

DataFrame.groupby_rolling(index_column, period)

Create rolling groups based on a time column (or an index column of type Int32 or Int64).

DataFrame.select(exprs)

Select columns from this DataFrame.

DataFrame.with_columns(exprs)

Add or overwrite multiple columns in a DataFrame.

DataFrame.sample([n, frac, ...])

Sample from this DataFrame by setting either n or frac.

DataFrame.row(index)

Get a row as a tuple.

DataFrame.rows()

Convert columnar data to rows as Python tuples.

DataFrame.to_dummies()

Get one-hot encoded dummy variables.

DataFrame.distinct([maintain_order, subset, ...])

Deprecated since version 0.13.13: use DataFrame.unique instead.

DataFrame.unique([maintain_order, subset, keep])

Drop duplicate rows from this DataFrame.

DataFrame.shrink_to_fit()

Shrink memory usage of this DataFrame to fit the exact capacity needed to hold the data.

DataFrame.rechunk()

Rechunk the data in this DataFrame to a contiguous allocation.

DataFrame.pipe(func, *args, **kwargs)

Apply a function to this DataFrame, passing the DataFrame as the first argument.

DataFrame.join(df[, left_on, right_on, on, ...])

SQL-like joins.

DataFrame.join_asof(df[, left_on, right_on, ...])

Perform an asof join.

DataFrame.interpolate()

Interpolate intermediate values.

DataFrame.transpose([include_header, ...])

Transpose a DataFrame over the diagonal.

DataFrame.partition_by()

Split into multiple DataFrames partitioned by groups.

DataFrame.upsample(time_column, every[, ...])

Upsample a DataFrame at a regular frequency.

DataFrame.unnest(names)

Decompose a struct into its fields.

Apply

DataFrame.apply(f[, return_dtype, ...])

Apply a custom function over the rows of the DataFrame.

Various

DataFrame.frame_equal(other[, null_equal])

Check if DataFrame is equal to other.

DataFrame.lazy()

Start a lazy query from this point.

GroupBy

This namespace becomes available by calling DataFrame.groupby(..).

GroupBy.agg(column_to_agg)

Use multiple aggregations on columns.

GroupBy.apply(f)

Apply a function over the groups as a sub-DataFrame.

GroupBy.head([n])

Return first n rows of each group.

GroupBy.tail([n])

Return last n rows of each group.

GroupBy.get_group(group_value)

Select a single group as a new DataFrame.

GroupBy.groups()

Return a DataFrame with the groupby keys and the group indexes aggregated as lists.

GroupBy.pivot(pivot_column, values_column)

Do a pivot operation based on the group key, a pivot column and an aggregation function on the values column.

GroupBy.first()

Aggregate the first values in the group.

GroupBy.last()

Aggregate the last values in the group.

GroupBy.sum()

Reduce the groups to the sum.

GroupBy.min()

Reduce the groups to the minimum value.

GroupBy.max()

Reduce the groups to the maximum value.

GroupBy.count()

Count the number of values in each group.

GroupBy.mean()

Reduce the groups to the mean values.

GroupBy.n_unique()

Count the unique values per group.

GroupBy.quantile(quantile[, interpolation])

Compute the quantile per group.

GroupBy.median()

Return the median per group.

GroupBy.agg_list()

Aggregate the groups into Series.

Pivot

This namespace becomes available by calling DataFrame.groupby(..).pivot(..).

Note that this API is deprecated in favor of `DataFrame.pivot`.

PivotOps.first()

Get the first value per group.

PivotOps.last()

Get the last value per group.

PivotOps.sum()

Get the sum per group.

PivotOps.min()

Get the minimum value per group.

PivotOps.max()

Get the maximum value per group.

PivotOps.mean()

Get the mean value per group.

PivotOps.count()

Count the values per group.

PivotOps.median()

Get the median value per group.