Pandas DataFrame: Difference between revisions
Revision as of 18:44, 8 October 2023
External
- https://pandas.pydata.org/docs/user_guide/dsintro.html#dataframe
- https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.html#pandas.DataFrame
Internal
Overview
A DataFrame is a two-dimensional data structure with columns of potentially different types. The data structure also contains labeled axes, for both rows and columns.
A DataFrame can be thought of as a dict-like container for Series objects, where each column is a Series. The dimensionality of the DataFrame is given by its shape property.
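The dict-like view described above can be sketched as follows. This is a minimal illustration; the column names and values are hypothetical sample data, not taken from any real dataset.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.5, 2.5, 3.5]})

# Indexing by column label works like a dict lookup and returns a Series
name_col = df["name"]
score_col = df["score"]

# Each column carries its own dtype
print(type(score_col).__name__)
print(score_col.dtype)
```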
Shape
shape is a property of the DataFrame that holds a tuple giving its dimensionality: (rows, columns).
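As a short sketch, assuming a small hypothetical DataFrame with three rows and two columns, the tuple can be read or unpacked directly:

```python
import pandas as pd

# Hypothetical sample data: 3 rows, 2 columns
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is an attribute, not a method, so no parentheses are used
rows, cols = df.shape
print(rows, cols)
```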
Create a DataFrame
Create a DataFrame from a CSV File
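One common way to do this is pandas.read_csv(). The sketch below reads from an in-memory string through io.StringIO so that it runs without a file on disk; the CSV content is hypothetical, and in practice a file path would be passed instead.

```python
import io

import pandas as pd

# Hypothetical CSV content; the first line is used as the header row
csv_text = "id,label\n1,x\n2,y\n"

# read_csv accepts a path or any file-like object
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.shape)
```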
Accessing Elements of a DataFrame
iloc[]
A property that allows integer-based access (indexing). The location is specified as a 0-based index position. The property accepts a wide variety of arguments.
Used in the following situations:
Extract a Series from the DataFrame
iloc[] can be used to extract a Series from the DataFrame. The first argument is a slice selecting the rows (: selects all rows), and the second argument is the integer position of the column in the DataFrame. The resulting Series keeps the DataFrame's row index, which is a default RangeIndex only if the DataFrame itself has one:
df = ...
# extract a series corresponding to DataFrame column 0
s = df.iloc[:,0]
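A fuller, runnable sketch of the same extraction is shown here, using hypothetical sample data. It also checks which index the extracted Series carries: the Series takes its index from the DataFrame's rows, so a RangeIndex appears only when the DataFrame was built with the default index.

```python
import pandas as pd

# DataFrame with the default index (hypothetical sample data)
df_default = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]})
s_default = df_default.iloc[:, 0]

# DataFrame with explicit row labels
df_labeled = pd.DataFrame({"a": [10, 20, 30]}, index=["x", "y", "z"])
s_labeled = df_labeled.iloc[:, 0]

# A narrower row slice extracts only part of the column (end is exclusive)
s_part = df_labeled.iloc[0:2, 0]

print(type(s_default.index).__name__)
print(list(s_labeled.index))
```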
loc[]
A property that allows label-based access (indexing).
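A minimal sketch of label-based access, assuming a hypothetical DataFrame with string row labels. Unlike iloc[], a label slice in loc[] includes its end label.

```python
import pandas as pd

# Hypothetical sample data with explicit row labels
df = pd.DataFrame({"a": [10, 20, 30], "b": [1, 2, 3]}, index=["x", "y", "z"])

# Single cell, addressed by row label and column label
value = df.loc["y", "a"]

# Entire column as a Series
col = df.loc[:, "b"]

# Row slice by label; both "x" and "y" are included
rows = df.loc["x":"y"]
```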