DataFrame¶
Constructor¶
|
A DataFrame is a two-dimensional data structure that represents data as a table with rows and columns. |
Attributes¶
Get the shape of the DataFrame. |
|
Get the height of the DataFrame. |
|
Get the width of the DataFrame. |
|
Get or set column names. |
|
Get dtypes of columns in DataFrame. |
|
Get a dict[column name, DataType] |
Conversion¶
Collect the underlying arrow arrays in an Arrow Table. |
|
|
Deprecated since version 0.13.12. |
Deprecated since version 0.13.12. |
|
|
Cast to a pandas DataFrame. |
|
Deprecated since version 0.13.12. |
|
Deprecated since version 0.13.12. |
|
Deprecated since version 0.13.12. |
Convert DataFrame to a 2d numpy array. |
|
Convert DataFrame to a dictionary mapping column name to values. |
|
Convert every row to a dictionary. |
|
|
Convert a |
Aggregation¶
Aggregate the columns of this DataFrame to their maximum value. |
|
Aggregate the columns of this DataFrame to their minimum value. |
|
Aggregate the columns of this DataFrame to their sum value. |
|
Aggregate the columns of this DataFrame to their mean value. |
|
Aggregate the columns of this DataFrame to their standard deviation value. |
|
Aggregate the columns of this DataFrame to their variance value. |
|
Aggregate the columns of this DataFrame to their median value. |
|
|
Aggregate the columns of this DataFrame to their quantile value. |
Aggregate the columns of this DataFrame to their product value.
Descriptive stats¶
Summary statistics for a DataFrame. |
|
Returns an estimation of the total (heap) allocated size of the DataFrame in bytes. |
|
Get a mask of all duplicated rows in this DataFrame. |
|
Get a mask of all unique rows in this DataFrame. |
|
Get number of chunks used by the ChunkedArrays of this DataFrame. |
|
Create a new DataFrame that shows the null counts per column. |
|
Check if the DataFrame is empty.
Computations¶
|
Hash and combine the rows in this DataFrame. |
|
Apply a horizontal reduction on a DataFrame. |
Manipulation/selection¶
|
Rename column names. |
|
Add a column at index 0 that counts the rows. |
|
Insert a Series at a certain column index. |
|
Filter the rows in the DataFrame based on a predicate expression. |
Find the index of a column by name. |
|
Select column at index location. |
|
|
Replace a column at an index location. |
Sort the DataFrame by column. |
|
|
Replace a column by a new Series. |
|
Slice this DataFrame over the rows direction. |
|
Get first N rows as DataFrame. |
|
Get first N rows as DataFrame. |
|
Get last N rows as DataFrame. |
|
Return a new DataFrame where the null values are dropped. |
|
Remove column from DataFrame and return as new. |
|
Drop in place. |
|
Select column as Series at index location. |
Very cheap deep clone. |
|
Get the DataFrame as a List of Series. |
|
|
Get a single column as Series by name. |
|
Fill null values using a filling strategy, literal, or Expr. |
|
Fill floating point NaN values by an Expression evaluation. |
|
Explode DataFrame to long format by exploding a column with Lists. |
|
Create a spreadsheet-style pivot table as a DataFrame. |
|
Unpivot a DataFrame from wide to long format, optionally leaving identifiers set. |
|
Shift the values by a given period and fill the parts that will be empty due to this operation with Nones. |
|
Shift the values by a given period and fill the parts that will be empty due to this operation with the result of the fill_value expression. |
|
Return a new DataFrame with the column added or replaced. |
|
Return a new DataFrame grown horizontally by stacking multiple Series to it. |
Grow this DataFrame vertically by stacking a DataFrame to it. |
|
|
Extend the memory backed by this DataFrame with the values from other. |
|
Start a groupby operation. |
|
Groups based on a time value (or index value of type Int32, Int64). |
|
Create rolling groups based on a time column (or index value of type Int32, Int64). |
|
Select columns from this DataFrame. |
|
Add or overwrite multiple columns in a DataFrame. |
|
Sample from this DataFrame by setting either n or frac. |
|
Get a row as tuple. |
Convert columnar data to rows as python tuples. |
|
Get one hot encoded dummy variables. |
|
|
Deprecated since version 0.13.13. |
|
Drop duplicate rows from this DataFrame. |
Shrink memory usage of this DataFrame to fit the exact capacity needed to hold the data. |
|
Rechunk the data in this DataFrame to a contiguous allocation. |
|
|
Apply a function on Self. |
|
SQL like joins. |
|
Perform an asof join. |
Interpolate intermediate values. |
|
|
Transpose a DataFrame over the diagonal. |
Split into multiple DataFrames partitioned by groups. |
|
|
Upsample a DataFrame at a regular frequency. |
|
Decompose a struct into its fields. |
Apply¶
|
Apply a custom function over the rows of the DataFrame. |
Various¶
|
Check if DataFrame is equal to other. |
Start a lazy query from this point. |
GroupBy¶
This namespace becomes available by calling DataFrame.groupby(..).
|
Use multiple aggregations on columns. |
Apply a function over the groups as a sub-DataFrame. |
|
|
Return first n rows of each group. |
|
Return last n rows of each group. |
|
Select a single group as a new DataFrame. |
Return a DataFrame with: |
|
|
Do a pivot operation based on the group key, a pivot column and an aggregation function on the values column. |
Aggregate the first values in the group. |
|
Aggregate the last values in the group. |
|
Reduce the groups to the sum. |
|
Reduce the groups to the minimal value. |
|
Reduce the groups to the maximal value. |
|
Count the number of values in each group. |
|
Reduce the groups to the mean values. |
|
Count the unique values per group. |
|
|
Compute the quantile per group. |
Return the median per group. |
|
Aggregate the groups into Series. |
Pivot¶
This namespace becomes available by calling DataFrame.groupby(..).pivot.
Note that this API is deprecated in favor of `DataFrame.pivot`.
Get the first value per group. |
|
Get the last value per group. |
|
Get the sum per group. |
|
Get the minimal value per group. |
|
Get the maximal value per group. |
|
Get the mean value per group. |
|
Count the values per group. |
|
Get the median value per group. |