Pandas DataFrame
Revision as of 02:48, 15 October 2023
External
- https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame
Internal
Overview
A DataFrame is a two-dimensional data structure made of rows and columns, where the columns can have different types. A useful mental model for a DataFrame is a dict-like container of Series objects, where each column is a Series.
Axes
The DataFrame has two axes. "Axis 0" runs downwards along the DataFrame's rows and represents the rows. "Axis 1" runs from left to right along the column headers and represents the columns.
A DataFrame shares the same direction for "axis 0" with its Series.
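The direction of each axis is easiest to see with an aggregation. The following is a minimal sketch, assuming a small made-up DataFrame; the variable names are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5],
    'strength': [0.98, 0.97, 0.88],
})

# axis=0 moves down the rows, so the result has one entry per column
column_totals = df.sum(axis=0)

# axis=1 moves across the columns, so the result has one entry per row
row_totals = df.sum(axis=1)

print(column_totals)
print(row_totals)
```

Aggregating along axis 0 collapses the rows and leaves one value per column, while aggregating along axis 1 collapses the columns and leaves one value per row.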
Shape
shape is a property of the DataFrame. It holds a tuple with the DataFrame's dimensionality: (number of rows, number of columns).
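As a short illustration, assuming the same sample data used in the creation example on this page, the tuple can be read directly or unpacked:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5, 8, 10, 25],
    'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02],
})

# shape is an attribute, not a method: no parentheses
print(df.shape)  # (6, 2)

# the tuple unpacks as (rows, columns)
rows, columns = df.shape
```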
Index
By default, the DataFrame gets a RangeIndex.
However, the index of the DataFrame can be replaced with set_index().
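A minimal sketch of both cases, assuming a made-up DataFrame whose 'distance' column is promoted to the index:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5, 8, 10, 25],
    'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02],
})

# without an explicit index, the DataFrame gets a RangeIndex
print(df.index)  # RangeIndex(start=0, stop=6, step=1)

# set_index() returns a new DataFrame; df itself is left unchanged
indexed = df.set_index('distance')
print(indexed.index)
```

By default, set_index() moves the chosen column out of the columns and into the index, so indexed has a single 'strength' column.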
Create a DataFrame
Create a DataFrame Programmatically
import pandas as pd
df = pd.DataFrame({
'distance': [1, 2, 5, 8, 10, 25],
'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02]
})
Shows up as:
Create a DataFrame from a CSV File
import pandas as pd
df = pd.read_csv("./analysis.csv")
If the CSV file contains columns that need to be handled as time series, see:
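One possible approach, sketched here under assumptions, is to ask read_csv() to parse the date column with its parse_dates parameter and then use that column as the index. The CSV content and the column names 'Date' and 'strength' are hypothetical, and an in-memory buffer stands in for a file on disk:

```python
import io

import pandas as pd

# hypothetical CSV content, used instead of a real file
csv_text = "Date,strength\n2023-10-01,0.98\n2023-10-02,0.97\n"

# parse_dates converts the listed columns to datetime values
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

# using the parsed column as the index yields a DatetimeIndex
ts = df.set_index('Date')
print(ts.index)
```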
Accessing Elements of a DataFrame
Accessing a Column
An individual column can be accessed with the [] operator, by specifying the column name. The result is a Series:
df = ...
df['Date']
type(df['Date']) # displays pandas.core.series.Series
iloc[]
A property that allows integer-based access (indexing). The location is specified as a 0-based index position. The property accepts a wide variety of arguments.
Used in the following situations:
Extract a Series from the DataFrame
iloc[] can be used to extract a Series from the DataFrame. The first argument selects the rows by position, and the slice : selects all of them, extracting the entire column. The second argument specifies the position of the column in the DataFrame. The resulting Series keeps the DataFrame's index, which is a RangeIndex if the DataFrame uses the default index:
df = ...
# extract a series corresponding to DataFrame column 0
s = df.iloc[:,0]
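The following sketch, which assumes a small made-up DataFrame, shows a few of the argument forms iloc[] accepts and which index the extracted Series carries:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5],
    'strength': [0.98, 0.97, 0.88],
})

# all rows of the column at position 0
s = df.iloc[:, 0]

# rows at positions 0 and 1 only (the slice end is exclusive)
first_two = df.iloc[0:2, 0]

# a single cell: row position 2, column position 1
cell = df.iloc[2, 1]

# the extracted Series carries the DataFrame's own index
relabeled = df.set_index('distance').iloc[:, 0]
print(relabeled.index)
```

With the default index, s has a RangeIndex. After set_index('distance'), the extracted Series is indexed by the distance values instead.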
loc[]
A property that allows label-based access (indexing).
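A minimal sketch of label-based access, assuming a made-up DataFrame with the string row labels 'a', 'b' and 'c':

```python
import pandas as pd

df = pd.DataFrame(
    {'distance': [1, 2, 5], 'strength': [0.98, 0.97, 0.88]},
    index=['a', 'b', 'c'],
)

# a single row label returns that row as a Series
row = df.loc['b']

# a row label and a column label return a single value
value = df.loc['b', 'strength']

# label slices include both endpoints, unlike positional slices
subset = df.loc['a':'b', ['distance']]
print(subset)
```

Unlike iloc[], which counts positions, loc[] looks up the labels stored in the index and in the column headers.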