Time Series Processing with Pandas
Overview
This article provides hints on how time series can be processed with Pandas.
A time series is a sequence of data points indexed in time order. In Pandas, the time index is a DatetimeIndex object that holds the timestamp of each data point. This time index enables operations such as resampling, rolling-window computations and filtering by date.
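As a rough illustration of the three operations mentioned above, here is a minimal sketch on a small daily series. The sample values, the 3-day bin size, the window length and the variable names are assumptions chosen for the example only.

```python
import pandas as pd

# Hypothetical sample data: nine daily values on a DatetimeIndex.
idx = pd.date_range("2023-10-01", periods=9, freq="D")
s = pd.Series([133, 135, 139, 123, 122, 119, 117, 130, 132], index=idx, name="value")

# Resampling: aggregate the daily values into 3-day bins and average each bin.
three_day_mean = s.resample("3D").mean()

# Rolling: mean over a sliding window of 3 consecutive observations.
# The first two entries are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

# Filtering: select a date range by label; both endpoints are included.
sliced = s.loc["2023-10-03":"2023-10-05"]

print(three_day_mean)
print(rolling_mean)
print(sliced)
```

All three calls rely on the index being a DatetimeIndex; on a plain integer index, resample() raises an error and date-string slicing does not work.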
Load a Time Series
Assuming the data comes from a CSV file whose first column, labeled "date", contains timestamp-formatted strings, and the second column contains values corresponding to those timestamps, this is how the data is loaded and turned into a Pandas Series.
The content of the CSV file should be similar to:
date,value
2023-10-01,133
2023-10-02,135
2023-10-03,139
2023-10-04,123
2023-10-05,122
2023-10-06,119
2023-10-07,117
2023-10-08,130
2023-10-09,132
Create a DataFrame by reading the CSV with the read_csv() function. While loading, we have Pandas parse the "date" column as a datetime type by passing its name to the parse_dates parameter:
import pandas as pd
df = pd.read_csv("./timeseries.csv", parse_dates=["date"])
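To confirm that parse_dates had the intended effect, it can help to inspect the column dtypes after loading. The sketch below uses an in-memory string as a stand-in for the CSV file so that it runs on its own; the shortened three-row content and the absence of spaces after the commas are assumptions made for the example.

```python
import io

import pandas as pd

# In-memory stand-in for ./timeseries.csv (assumed: no spaces after the commas).
csv_text = "date,value\n2023-10-01,133\n2023-10-02,135\n2023-10-03,139\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# The "date" column should now be datetime64, not object (plain strings).
date_is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])

print(df.dtypes)
print(date_is_datetime)
```

If the "date" column is reported as object, the strings were not parsed, which usually points to a mismatched column name or an unexpected date format in the file.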
The DataFrame has a default integer index, and we replace it with the content of the "date" column, turning it into a time index.
df = df.set_index(['date'])
Note that setting the index to the "date" column changes the shape of the DataFrame: it goes from a (9, 2) DataFrame to a (9, 1) DataFrame with a single column, because "date" is no longer a regular column.
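The shape change can be checked directly. This is a small sketch that again assumes an in-memory copy of the sample data (without spaces after the commas) in place of the file; the variable names are illustrative.

```python
import io

import pandas as pd

# In-memory stand-in for the nine-row sample file (assumed layout).
csv_text = (
    "date,value\n"
    "2023-10-01,133\n2023-10-02,135\n2023-10-03,139\n"
    "2023-10-04,123\n2023-10-05,122\n2023-10-06,119\n"
    "2023-10-07,117\n2023-10-08,130\n2023-10-09,132\n"
)

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
shape_before = df.shape  # two columns: "date" and "value"

# Move "date" out of the columns and into the index.
df = df.set_index("date")
shape_after = df.shape  # one column left: "value"

# The new index is a DatetimeIndex because "date" was parsed as datetime.
is_time_index = isinstance(df.index, pd.DatetimeIndex)

print(shape_before, shape_after, is_time_index)
```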
Then we extract the "value" column as a time series, since the DataFrame index is already a time index. The "value" column is the only column in the DataFrame after we replaced the index:
s = df.iloc[:,0]
The result is a time series:
date
2023-10-01    133
2023-10-02    135
2023-10-03    139
2023-10-04    123
2023-10-05    122
2023-10-06    119
2023-10-07    117
2023-10-08    130
2023-10-09    132
Name: value, dtype: int64
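As a possible shortcut, the parsing and indexing steps can be combined into a single read_csv() call with index_col, and the column can be selected by name instead of by position. This is a sketch of an alternative, not a replacement for the steps above; it assumes the header is exactly "date,value" with no spaces, and uses an in-memory string in place of the file.

```python
import io

import pandas as pd

# In-memory stand-in for the sample file (assumed: header is exactly "date,value").
csv_text = (
    "date,value\n"
    "2023-10-01,133\n2023-10-02,135\n2023-10-03,139\n"
    "2023-10-04,123\n2023-10-05,122\n2023-10-06,119\n"
    "2023-10-07,117\n2023-10-08,130\n2023-10-09,132\n"
)

# index_col makes "date" the index; parse_dates=True parses the index as dates.
df = pd.read_csv(io.StringIO(csv_text), index_col="date", parse_dates=True)

# Select the series by column name; equivalent here to df.iloc[:, 0].
s = df["value"]

print(s)
```

Selecting by name is more robust if further columns are added to the file later, while positional selection with iloc keeps working even when the column name carries stray whitespace.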