Time Series Processing with Pandas
Revision as of 19:07, 8 October 2023
Internal
Overview
This article provides hints on how time series can be processed with Pandas.
Load a Time Series
Assume the data comes from a CSV file whose first column, labeled "date", contains timestamp-formatted strings, and whose second column contains the values corresponding to those timestamps. The steps below load the data into a Pandas DataFrame and give it a time index.
The content of the CSV file should be similar to:
date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
2023-10-04, 123
2023-10-05, 122
2023-10-06, 119
2023-10-07, 117
2023-10-08, 130
2023-10-09, 132
Create a DataFrame by reading the CSV with the read_csv() function. While loading it, we treat the "date" column as a datetime type by passing the column name to the parse_dates parameter:
import pandas as pd
df = pd.read_csv("./timeseries.csv", parse_dates=["date"])
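To confirm that the "date" column was parsed, the column dtype can be inspected. The following is a minimal sketch, not part of the original steps: it assumes the sample data is held in an in-memory string (the hypothetical csv_text variable) instead of ./timeseries.csv, and it passes skipinitialspace=True on the assumption that the file has a space after each comma, as in the sample above.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the first rows of the sample CSV shown above.
csv_text = """date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
"""

# skipinitialspace=True drops the blank after each comma, so the second
# column is named "value" rather than " value".
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    skipinitialspace=True,
)

# The "date" column should report a datetime64 dtype, not object.
print(df.dtypes)
```

If the column still shows as object, the strings were most likely not in a format Pandas could parse as dates.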
The DataFrame has a default integer index (a RangeIndex). We replace it with the content of the "date" column, turning it into a time index:
df = df.set_index(['date'])
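Once the index holds timestamps, rows can be selected by date. The sketch below illustrates this under stated assumptions: the sample data is read from a hypothetical in-memory string (csv_text) without spaces after the commas, and the names series and selected are chosen here for illustration only.

```python
import io

import pandas as pd

# Hypothetical in-memory copy of the sample data (no spaces after commas).
csv_text = """date,value
2023-10-01,133
2023-10-02,135
2023-10-03,139
2023-10-04,123
2023-10-05,122
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df = df.set_index("date")

# After set_index, the index is a DatetimeIndex instead of a RangeIndex.
print(type(df.index).__name__)

# Selecting the single value column gives a Series indexed by time.
series = df["value"]

# Label-based slicing with date strings; both endpoints are included.
selected = series.loc["2023-10-02":"2023-10-04"]
print(selected)
```

Passing a plain string to set_index behaves the same as the one-element list used above; the list form becomes necessary only when building a multi-level index from several columns.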