Pandas DataFrame: Difference between revisions

From NovaOrdis Knowledge Base
Revision as of 20:36, 8 October 2023

External

Internal

Overview

A DataFrame is a two-dimensional data structure with columns of potentially different types. The data structure also contains labeled axes, for both rows and columns.

A DataFrame can be thought of as a dict-like container for Series objects, where each column is a Series. The dimensionality of the DataFrame is given by its shape property.
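As a minimal sketch of the dict-like behavior, assuming pandas is imported as pd and using hypothetical sample data (the column names and values are illustrative only):

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "Date": ["2023-10-06", "2023-10-07", "2023-10-08"],
    "Price": [10.5, 11.0, 10.8],
})

# Iterating over a DataFrame yields its column labels, as with a dict.
labels = list(df)

# Each column is a Series, and columns may have different dtypes.
date_col = df["Date"]
price_col = df["Price"]
print(type(price_col).__name__)  # Series
print(price_col.dtype)           # float64
```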

Shape

shape is a property of the DataFrame that holds a tuple describing its dimensionality: (rows, columns).
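For illustration, a small sketch with a hypothetical three-row, two-column DataFrame:

```python
import pandas as pd

# Hypothetical data: 3 rows, 2 columns.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4.0, 5.0, 6.0]})

# shape is a property, not a method, so there are no parentheses.
rows, columns = df.shape
print(df.shape)  # (3, 2)
```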

Index

By default, the DataFrame gets a RangeIndex.

However, the index of the DataFrame can be replaced with set_index(), which uses one or more existing columns as the new index.
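A short sketch of both cases, using hypothetical data. Note that set_index() returns a new DataFrame by default and leaves the original unchanged:

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({
    "Date": ["2023-10-06", "2023-10-07", "2023-10-08"],
    "Price": [10.5, 11.0, 10.8],
})

# Without an explicit index, the rows are labeled by a RangeIndex.
default_index = df.index

# Use the 'Date' column as the index; the result is a new DataFrame
# in which 'Date' is no longer a regular column.
by_date = df.set_index("Date")
print(by_date.index.name)  # Date
```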

Create a DataFrame

Create a DataFrame from a CSV File

Accessing Elements of a DataFrame

Accessing a Column

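An individual column can be accessed with the [] operator by specifying the column label. An integer such as 0 is treated as a label, not as a position, so df[0] works only when a column is actually labeled 0, as happens when the DataFrame is created without column names. For positional access, use iloc[]. The result is a Series:

df = ...
df['Date']  # the column labeled 'Date'
df[0]       # the column labeled 0; raises KeyError if no column has that label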
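A minimal sketch of how [] resolves its argument, using hypothetical data: the key is matched against the column labels, so an integer key succeeds only when a column carries that integer as its label.

```python
import pandas as pd

# Hypothetical DataFrame with named columns.
named = pd.DataFrame({"Date": ["2023-10-08"], "Price": [10.8]})
s = named["Date"]  # Series selected by column label

try:
    named[0]  # no column is labeled 0, so the lookup fails
    integer_key_worked = True
except KeyError:
    integer_key_worked = False

# Without explicit column names, columns are labeled 0, 1, ...
unnamed = pd.DataFrame([["2023-10-08", 10.8]])
first = unnamed[0]  # works here because 0 is a column label
```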


iloc[]

A property that allows integer-based access (indexing). The location is specified as a 0-based index position. The property accepts a wide variety of arguments.

Used in the following situations:

Extract a Series from the DataFrame

iloc[] can be used to extract a Series from the DataFrame. The first argument selects rows by position, with : selecting all rows, and the second argument specifies the position of the column in the DataFrame. The resulting Series keeps the row index of the DataFrame, so it has a RangeIndex only if the DataFrame itself has the default index:

df = ...
# extract a series corresponding to DataFrame column 0
s = df.iloc[:,0]
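The following sketch, built on hypothetical data with a non-default index, shows a few common argument forms and confirms that the extracted Series carries the row labels of the DataFrame:

```python
import pandas as pd

# Hypothetical data with string row labels instead of the default RangeIndex.
df = pd.DataFrame(
    {"Price": [10.5, 11.0, 10.8]},
    index=["2023-10-06", "2023-10-07", "2023-10-08"],
)

s = df.iloc[:, 0]           # all rows, column at position 0 -> Series
first_two = df.iloc[:2, 0]  # rows at positions 0 and 1 (end is exclusive)
cell = df.iloc[1, 0]        # a single value: row position 1, column position 0

print(list(s.index))  # ['2023-10-06', '2023-10-07', '2023-10-08']
```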

loc[]

A property that allows label-based access (indexing).
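As a hedged illustration of label-based access, assuming a hypothetical DataFrame indexed by date strings:

```python
import pandas as pd

# Hypothetical data; row labels are date strings.
df = pd.DataFrame(
    {"Price": [10.5, 11.0, 10.8], "Volume": [100, 150, 120]},
    index=["2023-10-06", "2023-10-07", "2023-10-08"],
)

row = df.loc["2023-10-07"]            # one row, returned as a Series
cell = df.loc["2023-10-07", "Price"]  # one value, by row and column label

# Unlike positional slices, label slices include both endpoints.
window = df.loc["2023-10-06":"2023-10-07", "Price"]
print(len(window))  # 2
```

In contrast to iloc[], the arguments here are the labels themselves, so the same code keeps working if the rows are reordered.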

squeeze()

[]

Operations on DataFrames