Pandas DataFrame
External
- https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame
Internal
Overview
A DataFrame is a two-dimensional, labeled data structure whose columns can hold different types. Both axes are labeled: rows by the index and columns by the column labels.
It can be thought of as a dict-like container for Series objects, where each column is a Series. The dimensionality of the DataFrame is given by its shape property.
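A minimal sketch of the dict-like view, assuming an illustrative DataFrame (the column names name, count and ratio are made up). Each column is a Series, columns may differ in dtype, and membership tests and iteration work on the column labels, much like dict keys.

```python
import pandas as pd

# Illustrative data: the column names and values are made up for this sketch.
df = pd.DataFrame({
    "name": ["a", "b", "c"],   # string column
    "count": [1, 2, 3],        # integer column
    "ratio": [0.5, 1.5, 2.5],  # float column
})

# Each column is a Series, and columns may have different dtypes.
col = df["count"]
print(type(col).__name__)   # Series
print(df.dtypes)

# Dict-like behaviour: "in" checks column labels, and iterating
# over the DataFrame yields the column labels.
print("ratio" in df)        # True
print(list(df))             # ['name', 'count', 'ratio']
```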
Shape
shape is a property of the DataFrame that returns its dimensionality as a tuple: (rows, columns).
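A short sketch of reading shape, using a made-up 4x2 DataFrame. The tuple can be unpacked directly, and a single column (a Series) has a one-element shape.

```python
import pandas as pd

# Illustrative DataFrame with 4 rows and 2 columns.
df = pd.DataFrame({"x": [1, 2, 3, 4], "y": [10, 20, 30, 40]})

rows, cols = df.shape   # shape is a (rows, columns) tuple
print(df.shape)         # (4, 2)
print(rows, cols)       # 4 2

# A single column (a Series) has a one-element shape tuple.
print(df["x"].shape)    # (4,)
```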
Index
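By default, a DataFrame gets a RangeIndex, i.e. integer labels 0 to n-1 for n rows. The index can be replaced with set_index(), which builds a new index from one or more existing columns and, by default, returns a new DataFrame rather than modifying the original.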
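A minimal sketch of replacing the default index, assuming illustrative Date and Close columns. It shows that set_index() moves the chosen column into the index and leaves the original DataFrame untouched.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2024-01-01", "2024-01-02", "2024-01-03"],  # illustrative values
    "Close": [10.5, 11.0, 10.8],
})
print(type(df.index).__name__)   # RangeIndex

# set_index() returns a new DataFrame by default; df itself is unchanged.
indexed = df.set_index("Date")
print(indexed.index.tolist())    # ['2024-01-01', '2024-01-02', '2024-01-03']
print(list(indexed.columns))     # ['Close'] ("Date" is now the index)
print(type(df.index).__name__)   # RangeIndex
```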
Create a DataFrame
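Two common ways to build a DataFrame in memory, sketched with made-up data: from a dict of columns, where the keys become column labels, and from a list of rows with the column labels passed explicitly.

```python
import pandas as pd

# From a dict: keys become column labels, values become column data.
df_from_dict = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

# From a list of rows, with the column labels given explicitly.
df_from_rows = pd.DataFrame([[1, "x"], [2, "y"]], columns=["a", "b"])

print(df_from_dict.shape, df_from_rows.shape)   # (2, 2) (2, 2)
print(list(df_from_rows.columns))               # ['a', 'b']
```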
Create a DataFrame from a CSV File
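A minimal sketch of loading CSV data with pandas.read_csv(). To keep it self-contained, inline text wrapped in io.StringIO stands in for a file; with a real file you would pass its path instead. The Date and Close columns are illustrative. By default the first line is used as the header, and parse_dates converts the named column to datetime values.

```python
import io
import pandas as pd

# Inline CSV text stands in for a file on disk (illustrative data).
csv_text = """Date,Close
2024-01-01,10.5
2024-01-02,11.0
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)           # (2, 2)
print(list(df.columns))   # ['Date', 'Close']

# parse_dates converts the named column to datetime values.
df_dates = pd.read_csv(io.StringIO(csv_text), parse_dates=["Date"])
print(df_dates["Date"].dtype)
```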
Accessing Elements of a DataFrame
Accessing a Column
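An individual column can be accessed with the [] operator by specifying the column label (usually the column name). The result is a Series. An integer key such as df[0] is also treated as a label, not as a 0-based position: it only works when a column is actually labeled 0 (for example, after reading a CSV with header=None). For positional access, use iloc[]:
import pandas as pd
df = pd.DataFrame({'Date': ['2024-01-01', '2024-01-02'], 'Close': [10.5, 11.0]})  # illustrative data
df['Date']        # column by label, returns a Series
df.iloc[:, 0]     # column by 0-based position, returns a Series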
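A sketch of why an integer key in [] is a label lookup, using illustrative CSV text. With header=None, read_csv() labels the columns 0, 1, ..., so df[0] finds the column labeled 0. On a DataFrame with named columns, the same key matches no label and raises KeyError, so positional access has to go through iloc[].

```python
import io
import pandas as pd

# With header=None, read_csv() labels the columns 0, 1, ...
csv_text = "2024-01-01,10.5\n2024-01-02,11.0\n"
df = pd.read_csv(io.StringIO(csv_text), header=None)
print(list(df.columns))   # [0, 1]
first = df[0]             # label lookup: the column labeled 0

# With named columns, an integer key is not a positional index.
named = pd.DataFrame({"Date": ["2024-01-01"], "Close": [10.5]})
try:
    named[0]
except KeyError:
    print("KeyError: no column labeled 0")
by_position = named.iloc[:, 0]   # positional access goes through iloc[]
```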
iloc[]
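iloc[] is a property that provides integer-position-based indexing. Positions are 0-based, and negative positions count from the end. It accepts an integer, a list or array of integers, a slice of integers, a boolean array, a callable, or a tuple of these to select rows and columns together.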
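A sketch of the main argument forms iloc[] accepts, on a made-up DataFrame with string row labels (chosen so that positions and labels are clearly different). Note that an integer slice excludes its end position, as in plain Python.

```python
import pandas as pd

# Illustrative DataFrame with a non-default (string) row index.
df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],
)

print(df.iloc[0]["b"])                   # 10: an integer selects one row
print(df.iloc[[0, 2]].index.tolist())    # ['w', 'y']: list of positions
print(df.iloc[1:3].index.tolist())       # ['x', 'y']: slice end excluded
print(df.iloc[1, 1])                     # 20: (row position, column position)
mask = [True, False, True, False]
print(df.iloc[mask].index.tolist())      # ['w', 'y']: boolean list
print(df.iloc[lambda d: [0, 3]].index.tolist())  # ['w', 'z']: callable
print(df.iloc[-1]["a"])                  # 4: negative position from the end
```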
It is used in the following situations:
Extract a Series from the DataFrame
iloc[] can be used to extract a column of the DataFrame as a Series. The first argument selects the rows, where the slice : selects all of them, and the second argument is the 0-based position of the column. The resulting Series keeps the DataFrame's row index (a RangeIndex only if the DataFrame uses the default index) and is named after the column label:
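import pandas as pd
df = pd.DataFrame({'Date': ['2024-01-01', '2024-01-02'], 'Close': [10.5, 11.0]})  # illustrative data
# extract the Series at column position 0, all rows
s = df.iloc[:, 0]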
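A sketch showing that the extracted Series inherits the DataFrame's row index rather than always getting a RangeIndex. The data is illustrative: Date is moved into the index first, so column position 0 is Close.

```python
import pandas as pd

df = pd.DataFrame(
    {"Date": ["2024-01-01", "2024-01-02"], "Close": [10.5, 11.0]}  # illustrative
).set_index("Date")

s = df.iloc[:, 0]               # all rows, column position 0 ("Close")
print(s.name)                   # Close: named after the column label
print(s.index.tolist())         # ['2024-01-01', '2024-01-02']: row index kept

# With the default RangeIndex, the extracted Series has a RangeIndex too.
s2 = df.reset_index().iloc[:, 0]
print(type(s2.index).__name__)  # RangeIndex
```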
loc[]
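loc[] is a property that provides label-based indexing: rows are selected by index label and columns by column label. Unlike Python slicing, a label slice such as df.loc['x':'z'] includes both endpoints.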
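A sketch of common loc[] selections on a made-up DataFrame with string row labels, including the inclusive label slice and a boolean mask built from a column.

```python
import pandas as pd

df = pd.DataFrame(
    {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]},
    index=["w", "x", "y", "z"],   # illustrative string labels
)

print(df.loc["x", "b"])                     # 20: (row label, column label)
print(df.loc["x":"z"].index.tolist())       # ['x', 'y', 'z']: end included
print(df.loc[:, "a"].tolist())              # [1, 2, 3, 4]: column by label
print(df.loc[df["a"] > 2].index.tolist())   # ['y', 'z']: boolean mask
```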