Time Series Processing with Pandas

From NovaOrdis Knowledge Base
Revision as of 19:50, 8 October 2023 by Ovidiu

Internal

TODO

TO PROCESS:

Overview

This article provides hints on how time series can be processed with Pandas.

A time series is a sequence of data points indexed in time order. In Pandas, the time index is a DatetimeIndex object that holds the timestamp of each data point. A time index enables time-aware operations such as resampling, rolling-window calculations and filtering by date.
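The three operations mentioned above can be sketched as follows. This is a minimal illustration, not a recipe: the sample values and the variable names (s, two_day_mean, rolling_mean, above_130) are assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample data: six daily readings starting on 2023-10-01.
index = pd.date_range("2023-10-01", periods=6, freq="D")
s = pd.Series([133, 135, 139, 123, 122, 119], index=index)

# Resampling: group the daily values into 2-day bins and average each bin.
two_day_mean = s.resample("2D").mean()

# Rolling: mean over a sliding window of 3 consecutive observations.
# The first two results are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

# Filtering: keep only the observations whose value exceeds 130.
above_130 = s[s > 130]

print(two_day_mean)
print(rolling_mean)
print(above_130)
```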

Load a Time Series

Assume the data comes from a CSV file whose first column, labeled "date", contains timestamp-formatted strings, and whose second column contains the values corresponding to those timestamps. The steps below load the data and turn it into a Pandas Series.

The content of the CSV file should be similar to:

date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
2023-10-04, 123
2023-10-05, 122
2023-10-06, 119
2023-10-07, 117
2023-10-08, 130
2023-10-09, 132

Create a DataFrame by reading the CSV file with the read_csv() function. The parse_dates parameter lists the columns to parse as datetime values, so passing "date" converts that column from strings to a datetime type during loading:

import pandas as pd
df = pd.read_csv("./timeseries.csv", parse_dates=["date"])
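To check what read_csv() produced, it helps to inspect the column names and dtypes. The sketch below reads a shortened copy of the sample from an in-memory string instead of ./timeseries.csv. The names csv_text, df_raw and df_clean are made up for the example. Because the sample file has a space after each comma, the second column is read with the label " value" (leading space). Passing skipinitialspace=True is one optional way to strip it, not something the steps in this article rely on.

```python
import io
import pandas as pd

# Stand-in for ./timeseries.csv, shortened to three rows (assumption for the example).
csv_text = """date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
"""

# Same call as in the article: the space after each comma stays in the header label.
df_raw = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
print(list(df_raw.columns))

# Optional variant: skipinitialspace=True drops the spaces that follow the delimiter.
df_clean = pd.read_csv(
    io.StringIO(csv_text), parse_dates=["date"], skipinitialspace=True
)
print(list(df_clean.columns))

# "date" should be a datetime column and the value column an integer column.
print(df_clean.dtypes)
```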

The DataFrame has a default integer index. We replace it with the content of the "date" column, which turns it into a time index:

df = df.set_index(['date'])

Note that setting the index to the "date" column changes the shape of the DataFrame: it goes from (9, 2) to (9, 1), because "date" becomes the index and only a single column remains.
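The shape change can be verified directly. The sketch below uses a hypothetical three-row DataFrame built in memory rather than the nine-row file, so the shapes are (3, 2) and (3, 1). The names shape_before and shape_after are illustrative.

```python
import pandas as pd

# Hypothetical three-row stand-in for the DataFrame loaded from the CSV file.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2023-10-01", "2023-10-02", "2023-10-03"]),
        "value": [133, 135, 139],
    }
)

shape_before = df.shape   # two columns: "date" and "value"
df = df.set_index("date")
shape_after = df.shape    # "date" moved to the index, one column left

print(shape_before, shape_after)
print(type(df.index).__name__)  # the new index is a DatetimeIndex
```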

Then we extract the "value" column as a time series, since the DataFrame index is already a time index. The "value" column is the only column in the DataFrame after we replaced the index:

s = df.iloc[:,0]

The result is a time series:

date
2023-10-01    133
2023-10-02    135
2023-10-03    139
2023-10-04    123
2023-10-05    122
2023-10-06    119
2023-10-07    117
2023-10-08    130
2023-10-09    132
Name:  value, dtype: int64
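Selecting by position with iloc is only one way to obtain the Series. As a hedged alternative, the column can also be selected by label or by squeezing the single-column DataFrame. The sketch assumes the column is labeled exactly "value", with no leading space, and builds a small hypothetical DataFrame in memory.

```python
import pandas as pd

# Hypothetical single-column DataFrame with a time index named "date".
df = pd.DataFrame(
    {"value": [133, 135, 139]},
    index=pd.DatetimeIndex(["2023-10-01", "2023-10-02", "2023-10-03"], name="date"),
)

by_position = df.iloc[:, 0]       # first column by position, as in the article
by_name = df["value"]             # same column selected by its label
squeezed = df.squeeze("columns")  # collapses a one-column DataFrame into a Series

# All three are the same Series and keep the DataFrame's time index.
print(by_position.equals(by_name), by_name.equals(squeezed))
print(type(by_name.index).__name__)
```

Selecting by label is more robust than selecting by position if columns are later added or reordered.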

Filter the Series

loc[]
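A minimal sketch of label-based filtering with loc[] on a time-indexed Series. It rebuilds the nine sample values in memory. The variable names (window, one_day, from_7th) are assumptions for the example.

```python
import pandas as pd

# The nine sample values from the CSV file, indexed by day.
index = pd.date_range("2023-10-01", periods=9, freq="D", name="date")
s = pd.Series([133, 135, 139, 123, 122, 119, 117, 130, 132], index=index, name="value")

# Date-range slice: with label-based slicing, both endpoints are included.
window = s.loc["2023-10-03":"2023-10-05"]

# A single Timestamp label returns the scalar stored at that point in time.
one_day = s.loc[pd.Timestamp("2023-10-08")]

# Open-ended slice: everything from 2023-10-07 to the end of the Series.
from_7th = s.loc["2023-10-07":]

print(window)
print(one_day)
print(from_7th)
```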