Time Series Processing with Pandas

From NovaOrdis Knowledge Base

Revision as of 19:07, 8 October 2023

Overview

This article provides hints on how time series can be processed with Pandas.

Load a Time Series

Assuming the data comes from a CSV file whose first column, labeled "date", contains timestamp-formatted strings, and the second column contains values corresponding to those timestamps, this is how the data is loaded and turned into a Pandas Series.

The content of the CSV file should be similar to:

date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
2023-10-04, 123
2023-10-05, 122
2023-10-06, 119
2023-10-07, 117
2023-10-08, 130
2023-10-09, 132

Create a DataFrame by reading the CSV with the read_csv() function. To have the "date" column loaded as a datetime type rather than as plain strings, pass its name to the parse_dates parameter:

import pandas as pd

df = pd.read_csv("./timeseries.csv", parse_dates=["date"])
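A minimal, self-contained sketch of the load step, assuming content like the sample above. It reads from an in-memory io.StringIO buffer instead of ./timeseries.csv so that it runs without a file on disk. Because the sample has a space after each comma, the sketch also passes skipinitialspace=True, which is an addition to the original call: without it, the second column would be named " value", with a leading space.

```python
import io

import pandas as pd

# In-memory stand-in for ./timeseries.csv, mirroring the sample content above.
csv_text = """date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
"""

# parse_dates converts the "date" column to a datetime type while loading.
# skipinitialspace drops the blank after each comma, so the header is "value".
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    skipinitialspace=True,
)

# Inspect the resulting column types.
print(df.dtypes)
```

Printing df.dtypes is a quick way to confirm that "date" was parsed as a datetime column and not left as strings.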

The DataFrame has a default integer index, and we replace it with the content of the "date" column, turning it into a time index:

df = df.set_index(['date'])
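The effect of replacing the index can be checked with a short sketch like the one below. It assumes a small in-memory sample written without spaces after the commas, so the data column is named exactly "value". The names series and window are illustrative and do not come from the original article.

```python
import io

import pandas as pd

# Hypothetical in-memory sample; the header has no spaces after the comma.
csv_text = """date,value
2023-10-01,133
2023-10-02,135
2023-10-03,139
2023-10-04,123
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df = df.set_index("date")

# After set_index, the index is a DatetimeIndex instead of the default RangeIndex.
print(type(df.index).__name__)

# Selecting the single data column yields a Series indexed by time.
series = df["value"]

# Label-based slicing with date strings includes both endpoints.
window = series.loc["2023-10-02":"2023-10-03"]
print(window)
```

With a time index in place, rows can be selected by date labels with .loc, which is usually more convenient than filtering on a separate "date" column.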