Pandas DataFrame
Revision as of 01:44, 16 October 2023
External
- https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame
Internal
Overview
A DataFrame is a two-dimensional data structure made of rows and columns, where the columns may have different types. A useful mental model for a DataFrame is a dict-like container of Series objects, where each column is a Series. Documentation sometimes refers to columns as "features" or "variables". The values stored in a row are referred to as a "record".
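The dict-of-Series model can be checked with a small sketch. The data below is illustrative only and reuses the column names from the example further down this page:

```python
import pandas as pd

# Illustrative data, not taken from a real dataset.
df = pd.DataFrame({
    'distance': [1, 2, 5],
    'strength': [0.98, 0.97, 0.88],
})

# Iterating over a DataFrame yields its column labels, as a dict yields its keys.
labels = list(df)

# Looking up a label returns the column as a Series.
column_types = {label: type(df[label]).__name__ for label in labels}

# Each column carries its own dtype.
dtypes = df.dtypes.astype(str).to_dict()

print(labels)
print(column_types)
print(dtypes)
```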
Axes
The DataFrame has two axes: "axis 0", which runs along the rows, pointing downwards, and "axis 1", which runs along the column headers, pointing from left to right.
A DataFrame shares the direction of "axis 0" with the axis of each of its Series.
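One way to see the two directions is to aggregate along each axis. This is a minimal sketch with made-up values; sum() is used only because its result is easy to verify by hand:

```python
import pandas as pd

# Made-up values chosen so the sums are exact.
df = pd.DataFrame({
    'distance': [1, 2, 5],
    'strength': [0.5, 0.25, 0.25],
})

# axis=0 moves down the rows, so there is one result per column.
per_column = df.sum(axis=0)

# axis=1 moves across the columns, so there is one result per row.
per_row = df.sum(axis=1)

print(per_column)
print(per_row)
```

The result of the axis=0 aggregation is indexed by the column labels, while the result of the axis=1 aggregation is indexed by the row labels.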
The DataFrame axes property returns a list of two Index instances, the first for the rows and the second for the columns:
assert len(df.axes) == 2
print(df.axes)
[RangeIndex(start=0, stop=6, step=1), Index(['distance', 'strength'], dtype='object')]
Shape
The DataFrame shape property is a tuple that gives the dimensionality of the DataFrame: (rows, columns).
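A short sketch, using the same six-row example frame as the rest of this page:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5, 8, 10, 25],
    'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02],
})

# shape is a plain tuple, so it can be unpacked directly.
rows, columns = df.shape

print(df.shape)  # (6, 2)
```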
Index
By default, the DataFrame gets a RangeIndex.
However, the index can be replaced with set_index(), which returns a new DataFrame that uses one or more existing columns as its index.
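A minimal sketch of both cases. Using 'distance' as the index is an arbitrary choice made for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5],
    'strength': [0.98, 0.97, 0.88],
})

# Without an explicit index, the rows are labeled 0, 1, 2, ...
print(df.index)  # RangeIndex(start=0, stop=3, step=1)

# set_index() returns a new DataFrame; df itself keeps its RangeIndex.
by_distance = df.set_index('distance')

print(by_distance.index)
print(by_distance.columns)
```

The column used as the index no longer appears among the columns of the new DataFrame.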
The columns are described by a generic Index, available through the columns property:
Index(['distance', 'strength'], dtype='object')
This kind of index is associated with the column axis of a DataFrame.
Also see: Pandas Concepts | Index
Create a DataFrame
Create a DataFrame Programmatically
import pandas as pd
df = pd.DataFrame({
'distance': [1, 2, 5, 8, 10, 25],
'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02]
})
Shows up as:
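A plain-text version of the table can be produced with to_string(). This is a sketch; for a small frame like this one it should match what print(df) displays, and the exact spacing depends on the pandas display settings:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5, 8, 10, 25],
    'strength': [0.98, 0.97, 0.88, 0.45, 0.20, 0.02]
})

# One header line with the column labels, then one line per row,
# each prefixed with its RangeIndex label.
text = df.to_string()
print(text)
```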
Create a DataFrame from a CSV File
import pandas as pd
df = pd.read_csv("./analysis.csv")
If the CSV file contains columns that need to be handled as time series, see:
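As a rough sketch of one common approach, read_csv() accepts a parse_dates argument naming the columns to convert to timestamps. The CSV content and the column names 'Date' and 'Close' below are hypothetical, and an in-memory buffer stands in for a file on disk:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a file such as ./analysis.csv.
csv_text = "Date,Close\n2023-10-13,101.5\n2023-10-16,102.0\n"

# parse_dates asks read_csv() to convert the listed columns to datetimes.
df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'])

print(df.dtypes)
```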
Accessing Elements of a DataFrame
Accessing a Column
An individual column can be accessed with the [] operator, by specifying the column name. The result is a Series:
df = ...
df['Date']
type(df['Date']) # displays pandas.core.series.Series
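A self-contained variant of the same access, with illustrative data. It also shows, as a side note, that passing a list of names to [] gives a DataFrame rather than a Series:

```python
import pandas as pd

df = pd.DataFrame({
    'distance': [1, 2, 5],
    'strength': [0.98, 0.97, 0.88],
})

# A single label returns a Series named after the column.
column = df['distance']

# A list of labels returns a DataFrame, even when the list has one element.
sub_frame = df[['distance']]

print(type(column).__name__, column.name)
print(type(sub_frame).__name__, sub_frame.shape)
```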
iloc[]
A property that allows integer-based access (indexing). The location is specified as a 0-based index position. The property accepts a single integer, a list of integers, a slice or a boolean array, for the rows and optionally for the columns.
Used in the following situations:
Extract a Series from the DataFrame
iloc[] can be used to extract a Series from the DataFrame. The first argument selects the rows, with the slice : selecting all of them, and the second argument is the 0-based position of the column in the DataFrame. The Series keeps the row index of the DataFrame, which is a RangeIndex by default:
df = ...
# extract a series corresponding to DataFrame column 0
s = df.iloc[:,0]
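A runnable sketch of a few iloc[] forms. The row labels 'a', 'b', 'c' are hypothetical and are there only to show that iloc[] works on positions, while the extracted Series still carries the frame's row labels:

```python
import pandas as pd

df = pd.DataFrame(
    {'distance': [1, 2, 5], 'strength': [0.98, 0.97, 0.88]},
    index=['a', 'b', 'c'],  # hypothetical row labels
)

s = df.iloc[:, 0]        # all rows of column 0, as a Series
row = df.iloc[0]         # first row, as a Series indexed by column labels
cell = df.iloc[1, 1]     # a single value: second row, second column
block = df.iloc[0:2, :]  # first two rows; the slice end is excluded

print(s)
print(cell)
```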
loc[]
A property that allows label-based access (indexing).
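A minimal sketch of label-based access, assuming a frame with hypothetical string row labels. Unlike positional slices, label slices include both endpoints:

```python
import pandas as pd

df = pd.DataFrame(
    {'distance': [1, 2, 5], 'strength': [0.98, 0.97, 0.88]},
    index=['a', 'b', 'c'],  # hypothetical row labels
)

row = df.loc['b']                      # one row, as a Series
cell = df.loc['b', 'strength']         # a single value by row and column label
block = df.loc['a':'b', ['strength']]  # rows 'a' through 'b' inclusive

print(row)
print(cell)
print(block)
```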