Pandas DataFrame

From NovaOrdis Knowledge Base



Revision as of 02:48, 15 October 2023

External

Internal

Overview

A DataFrame is a two-dimensional data structure made of rows and columns, where the columns can have different types. A useful mental model for a DataFrame is a dict-like container for Series objects, where each column is a Series.
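The dict-of-Series mental model can be illustrated with a minimal sketch. The column names and values below are made up for illustration:

```python
import pandas as pd

# Two Series that share the same default index become the columns of a DataFrame.
distance = pd.Series([1, 2, 5], name="distance")
strength = pd.Series([0.98, 0.97, 0.88], name="strength")
df = pd.DataFrame({"distance": distance, "strength": strength})

# Like a dict, the DataFrame exposes its keys, which are the column names ...
column_names = list(df.keys())

# ... and each key maps to a Series. Columns may have different dtypes.
column_types = {name: type(df[name]).__name__ for name in column_names}
column_dtypes = {name: str(df[name].dtype) for name in column_names}

print(column_names)
print(column_types)
print(column_dtypes)
```

Here one column holds integers and the other holds floats, which a plain two-dimensional array of a single type could not represent.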

Axes

The DataFrame has two axes: "axis 0", which runs along the DataFrame's rows, pointing downwards, and "axis 1", which runs along the column headers, pointing from left to right:

Panda DataFrame Axes.png

A DataFrame shares the same direction for "axis 0" with its Series.
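One way to see the two axes in practice is to aggregate along each of them. This is a small sketch with made-up values, using sum() as an arbitrary example of an operation that takes an axis argument:

```python
import pandas as pd

df = pd.DataFrame({
    "distance": [1, 2, 5],
    "strength": [0.5, 0.25, 0.25],
})

# axis=0 runs down the rows: the result has one entry per column.
per_column = df.sum(axis=0)

# axis=1 runs across the columns: the result has one entry per row.
per_row = df.sum(axis=1)

print(per_column)
print(per_row)
```

The result of the axis=0 aggregation is indexed by the column names, while the result of the axis=1 aggregation is indexed by the row labels.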

Shape

shape is a property of the DataFrame that holds a tuple with the dimensionality of the DataFrame: (rows, columns).
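As a quick illustration, the sketch below reads shape for the same sample data used in the "Create a DataFrame" section of this page:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5, 8, 10, 25],
    'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02]
})

# shape is a property, not a method, so it is read without parentheses.
rows, columns = df.shape
print(df.shape)  # (6, 2)
```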

Index

By default, the DataFrame gets a RangeIndex.

However, the index of the DataFrame can be replaced with set_index().
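A minimal sketch of both cases, assuming made-up sample data and choosing the "distance" column as the new index purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "distance": [1, 2, 5],
    "strength": [0.98, 0.97, 0.88],
})

# Without an explicit index, pandas assigns a RangeIndex: 0, 1, 2, ...
default_index = df.index
print(default_index)

# set_index() returns a new DataFrame by default; df itself is left unchanged.
indexed = df.set_index("distance")
print(indexed.index)
```

Note that the column used as the index is no longer a regular column of the returned DataFrame.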

Create a DataFrame

Create a DataFrame Programmatically

import pandas as pd
df = pd.DataFrame({
    'distance': [1, 2, 5, 8, 10, 25],
    'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02]
})

Shows up as:

Panda DataFrame.png

Create a DataFrame from a CSV File

import pandas as pd
df = pd.read_csv("./analysis.csv")

If the CSV file contains columns that need to be handled as time series, see:

Load a Time Series

Accessing Elements of a DataFrame

Accessing a Column

An individual column can be accessed with the [] operator, by specifying the column name. The result is a Series:

df = ...
df['Date']
type(df['Date']) # displays pandas.core.series.Series

iloc[]

A property that allows integer-based access (indexing). The location is specified as a 0-based index position. The property accepts a wide variety of arguments.

Used in the following situations:

Extract a Series from the DataFrame

iloc[] can be used to extract a Series from the DataFrame. The first argument is a slice that selects the rows (: selects all rows, extracting the entire series), and the second argument specifies the column position in the DataFrame. The resulting Series keeps the DataFrame's index, which is a RangeIndex by default:

df = ...
# extract a series corresponding to DataFrame column 0
s = df.iloc[:,0]
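Beyond extracting a column, iloc[] accepts several other argument forms. The sketch below shows a few common ones on made-up data; it is not an exhaustive list:

```python
import pandas as pd

df = pd.DataFrame({
    "distance": [1, 2, 5, 8],
    "strength": [0.98, 0.97, 0.88, 0.45],
})

first_row = df.iloc[0]        # a single position: row 0, returned as a Series
cell = df.iloc[1, 1]          # row 1, column 1: a scalar value
top_rows = df.iloc[0:2]       # a slice: rows 0 and 1 (the end position is excluded)
block = df.iloc[[0, 3], [0]]  # lists of positions: rows 0 and 3, column 0
```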

loc[]

A property that allows label-based access (indexing).
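A minimal sketch of label-based access, assuming a DataFrame with made-up string labels "a", "b", "c" as its index:

```python
import pandas as pd

df = pd.DataFrame(
    {"distance": [1, 2, 5], "strength": [0.98, 0.97, 0.88]},
    index=["a", "b", "c"],
)

# A single row label and a single column label select one value.
value = df.loc["b", "strength"]

# A label slice includes both endpoints, unlike an integer slice in iloc[].
subset = df.loc["a":"b", ["distance"]]

print(value)
print(subset)
```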

squeeze()

[]

Operations on DataFrames