DataFrame#

Constructor#

DataFrame([data, columns, orient])

Two-dimensional data structure representing data as a table with rows and columns.

Attributes#

DataFrame.columns

Get or set column names.

DataFrame.dtypes

Get dtypes of columns in DataFrame.

DataFrame.height

Get the height of the DataFrame.

DataFrame.schema

Get the schema as a dict mapping each column name to its DataType.

DataFrame.shape

Get the shape of the DataFrame.

DataFrame.width

Get the width of the DataFrame.

Conversion#

DataFrame.to_arrow()

Collect the underlying arrow arrays in an Arrow Table.

DataFrame.to_dict()

Convert DataFrame to a dictionary mapping column name to values.

DataFrame.to_dicts()

Convert every row to a dictionary.

DataFrame.to_numpy()

Convert DataFrame to a 2D NumPy array.

DataFrame.to_pandas(*args[, date_as_object])

Cast to a pandas DataFrame.

DataFrame.to_struct(name)

Convert a DataFrame to a Series of type Struct.

Aggregation#

DataFrame.max()

Aggregate the columns of this DataFrame to their maximum value.

DataFrame.mean()

Aggregate the columns of this DataFrame to their mean value.

DataFrame.median()

Aggregate the columns of this DataFrame to their median value.

DataFrame.min()

Aggregate the columns of this DataFrame to their minimum value.

DataFrame.product()

Aggregate the columns of this DataFrame to their product value.

DataFrame.quantile(quantile[, interpolation])

Aggregate the columns of this DataFrame to their quantile value.

DataFrame.std([ddof])

Aggregate the columns of this DataFrame to their standard deviation value.

DataFrame.sum()

Aggregate the columns of this DataFrame to their sum value.

DataFrame.var([ddof])

Aggregate the columns of this DataFrame to their variance value.

Descriptive stats#

DataFrame.describe()

Summary statistics for a DataFrame.

DataFrame.estimated_size([unit])

Return an estimation of the total (heap) allocated size of the DataFrame.

DataFrame.is_duplicated()

Get a mask of all duplicated rows in this DataFrame.

DataFrame.is_empty()

Check if the DataFrame is empty.

DataFrame.is_unique()

Get a mask of all unique rows in this DataFrame.

DataFrame.n_chunks()

Get number of chunks used by the ChunkedArrays of this DataFrame.

DataFrame.null_count()

Create a new DataFrame that shows the null counts per column.

Computations#

DataFrame.fold(operation)

Apply a horizontal reduction on a DataFrame.

DataFrame.hash_rows([seed, seed_1, seed_2, ...])

Hash and combine the rows in this DataFrame.

Manipulation/selection#

DataFrame.cleared()

Create an empty copy of the current DataFrame.

DataFrame.clone()

Create a cheap deep copy (clone) of this DataFrame.

DataFrame.drop(name)

Remove a column from the DataFrame and return the result as a new DataFrame.

DataFrame.drop_in_place(name)

Drop a single column in place and return the dropped column.

DataFrame.drop_nulls([subset])

Return a new DataFrame where the null values are dropped.

DataFrame.explode(columns)

Explode DataFrame to long format by exploding a column with Lists.

DataFrame.extend(other)

Extend the memory backed by this DataFrame with the values from other.

DataFrame.fill_nan(fill_value)

Fill floating-point NaN values with the result of an Expression evaluation.

DataFrame.fill_null([value, strategy, ...])

Fill null values using the specified value or strategy.

DataFrame.filter(predicate)

Filter the rows in the DataFrame based on a predicate expression.

DataFrame.find_idx_by_name(name)

Find the index of a column by name.

DataFrame.get_column(name)

Get a single column as Series by name.

DataFrame.get_columns()

Get the DataFrame as a List of Series.

DataFrame.groupby(by[, maintain_order])

Start a groupby operation.

DataFrame.groupby_dynamic(index_column, every)

Group based on a time value (or index value of type Int32, Int64).

DataFrame.groupby_rolling(index_column, period)

Create rolling groups based on a time column.

DataFrame.head([n])

Get the first n rows.

DataFrame.hstack(columns[, in_place])

Return a new DataFrame grown horizontally by stacking multiple Series to it.

DataFrame.insert_at_idx(index, series)

Insert a Series at a certain column index.

DataFrame.interpolate()

Interpolate intermediate values.

DataFrame.join(other[, left_on, right_on, ...])

Join in SQL-like fashion.

DataFrame.join_asof(other[, left_on, ...])

Perform an asof join.

DataFrame.limit([n])

Get the first n rows (alias for DataFrame.head).

DataFrame.melt([id_vars, value_vars, ...])

Unpivot a DataFrame from wide to long format.

DataFrame.partition_by()

Split into multiple DataFrames partitioned by groups.

DataFrame.pipe(func, *args, **kwargs)

Apply a function to this DataFrame.

DataFrame.pivot(values, index, columns[, ...])

Create a spreadsheet-style pivot table as a DataFrame.

DataFrame.rechunk()

Rechunk the data in this DataFrame to a contiguous allocation.

DataFrame.rename(mapping)

Rename column names.

DataFrame.replace(column, new_col)

Replace a column by a new Series.

DataFrame.replace_at_idx(index, series)

Replace a column at an index location.

DataFrame.reverse()

Reverse the DataFrame.

DataFrame.row([index, by_predicate])

Get a row as tuple, either by index or by predicate.

DataFrame.rows()

Convert columnar data to rows as python tuples.

DataFrame.sample([n, frac, ...])

Sample from this DataFrame.

DataFrame.select(exprs)

Select columns from this DataFrame.

DataFrame.shift(periods)

Shift values by the given period.

DataFrame.shift_and_fill(periods, fill_value)

Shift the values by a given period and fill the resulting null values.

DataFrame.shrink_to_fit([in_place])

Shrink DataFrame memory usage.

DataFrame.slice(offset[, length])

Get a slice of this DataFrame.

DataFrame.sort(by[, reverse, nulls_last])

Sort the DataFrame by column.

DataFrame.tail([n])

Get the last n rows.

DataFrame.take_every(n)

Take every nth row in the DataFrame and return as a new DataFrame.

DataFrame.to_dummies(*[, columns])

Get one hot encoded dummy variables.

DataFrame.to_series([index])

Select column as Series at index location.

DataFrame.transpose([include_header, ...])

Transpose a DataFrame over the diagonal.

DataFrame.unique([maintain_order, subset, keep])

Drop duplicate rows from this DataFrame.

DataFrame.unnest(names)

Decompose a struct into its fields.

DataFrame.unstack(step[, how, columns, ...])

Unstack a long table to a wide form without doing an aggregation.

DataFrame.upsample(time_column, every[, ...])

Upsample a DataFrame at a regular frequency.

DataFrame.vstack(df[, in_place])

Grow this DataFrame vertically by stacking a DataFrame to it.

DataFrame.with_column(column)

Return a new DataFrame with the column added or replaced.

DataFrame.with_columns([exprs])

Add or overwrite multiple columns in a DataFrame.

DataFrame.with_row_count([name, offset])

Add a column at index 0 that counts the rows.

Apply#

DataFrame.apply(f[, return_dtype, ...])

Apply a custom/user-defined function (UDF) over the rows of the DataFrame.

Various#

DataFrame.frame_equal(other[, null_equal])

Check if DataFrame is equal to other.

DataFrame.lazy()

Start a lazy query from this point.

GroupBy#

This namespace becomes available by calling DataFrame.groupby(..).

GroupBy.agg(aggs)

Use multiple aggregations on columns.

GroupBy.agg_list()

Aggregate the groups into Series.

GroupBy.apply(f)

Apply a custom/user-defined function (UDF) over the groups as a sub-DataFrame.

GroupBy.count()

Count the number of values in each group.

GroupBy.first()

Aggregate the first values in the group.

GroupBy.head([n])

Get the first n rows of each group.

GroupBy.last()

Aggregate the last values in the group.

GroupBy.max()

Reduce the groups to the maximal value.

GroupBy.mean()

Reduce the groups to the mean values.

GroupBy.median()

Return the median per group.

GroupBy.min()

Reduce the groups to the minimal value.

GroupBy.n_unique()

Count the unique values per group.

GroupBy.pivot(pivot_column, values_column)

Do a pivot operation.

GroupBy.quantile(quantile[, interpolation])

Compute the quantile per group.

GroupBy.sum()

Reduce the groups to the sum.

GroupBy.tail([n])

Get the last n rows of each group.

Pivot#

This namespace becomes available by calling DataFrame.groupby(..).pivot.

Note that this API is deprecated in favor of `DataFrame.pivot`.

PivotOps.count()

Count the values per group.

PivotOps.first()

Get the first value per group.

PivotOps.last()

Get the last value per group.

PivotOps.max()

Get the maximal value per group.

PivotOps.mean()

Get the mean value per group.

PivotOps.median()

Get the median value per group.

PivotOps.min()

Get the minimal value per group.

PivotOps.sum()

Get the sum per group.