Time Series Processing with Pandas
Internal
TODO
TO PROCESS:
- https://pandas.pydata.org/docs/getting_started/intro_tutorials/09_timeseries.html
- https://saturncloud.io/blog/how-to-filter-pandas-dataframe-by-time-index/
- https://pandas.pydata.org/docs/reference/api/pandas.Timestamp.html#pandas.Timestamp
- https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-overview
- https://pandas.pydata.org/docs/user_guide/timeseries.html#timeseries-datetimeindex
Overview
This article provides hints on how time series can be processed with Pandas.
A time series is a sequence of data points indexed in time order. In Pandas, the time index is a DatetimeIndex object that contains the timestamp corresponding to each data point. This time index allows for operations such as resampling, rolling and filtering.
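As a minimal sketch of two of the operations a time index enables, assuming a hypothetical daily series with made-up values:

```python
import pandas as pd

# Hypothetical daily series; the values are made up for illustration.
idx = pd.date_range("2023-10-01", periods=6, freq="D")
s = pd.Series([10, 12, 11, 15, 14, 18], index=idx)

# Resampling: aggregate into 2-day bins and sum each bin.
two_day_sum = s.resample("2D").sum()

# Rolling: mean over a sliding window of 3 observations.
# The first two results are NaN because the window is not yet full.
rolling_mean = s.rolling(window=3).mean()

print(two_day_sum)
print(rolling_mean)
```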
Import Pandas
import pandas as pd
Synthetic Time Series
df = pd.DataFrame({
    'date': [
        pd.to_datetime('2023-10-01'),
        pd.to_datetime('2023-10-05'),
        pd.to_datetime('2023-10-15')],
    'value': [10, 15, 22]
})
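Even though to_datetime() returns Timestamp instances, the df['date'] Series has the dtype datetime64[ns]. This is because Pandas stores a datetime column as an array of datetime64 values and wraps each element in a Timestamp only when the element is accessed.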
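The difference between the column dtype and the element type can be checked directly. A small sketch, using a shortened version of the synthetic data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": [pd.to_datetime("2023-10-01"), pd.to_datetime("2023-10-05")],
    "value": [10, 15],
})

# The column as a whole reports a datetime64 dtype.
print(df["date"].dtype)

# An individual element comes back as a Timestamp object.
first = df["date"].iloc[0]
print(type(first))
```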
Load a Time Series
Assuming the data comes from a CSV file whose first column, labeled "date", contains timestamp-formatted strings, and the second column contains values corresponding to those timestamps, this is how the data is loaded and turned into a Pandas Series.
The content of the CSV file should be similar to:
date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
2023-10-04, 123
2023-10-05, 122
2023-10-06, 119
2023-10-07, 117
2023-10-08, 130
2023-10-09, 132
Create a DataFrame by reading the CSV with the read_csv() function.
Parse Timestamps while Loading
While loading the file, we tell Pandas to treat the "date" column as a datetime type by listing it in the parse_dates parameter:
df = pd.read_csv("./timeseries.csv", parse_dates=["date"])
The date is expected in YYYY-MM-DD format, for example "2023-12-31". To handle custom date or time formats, see:
To verify that the date column was correctly parsed, display df['date']. It should have a datetime64[ns] dtype.
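The check can also be done programmatically. The sketch below reads a few rows from an in-memory string instead of timeseries.csv, and assumes there are no spaces after the commas:

```python
import io
import pandas as pd

# In-memory stand-in for timeseries.csv (assumption: no spaces after commas).
csv_text = "date,value\n2023-10-01,133\n2023-10-02,135\n2023-10-03,139\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# True if the column was parsed as datetimes rather than left as strings.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["date"])
print(df["date"].dtype, is_datetime)
```

If the column was left as strings, its dtype would not be a datetime64 type and is_datetime would be False.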
Parse Timestamps after Loading
Not tested yet.
Alternatively, the column carrying timestamps can be converted to datetime after loading:
df['date'] = pd.to_datetime(df['date'])
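When the strings are not in YYYY-MM-DD form, to_datetime() accepts a format argument. A minimal sketch, assuming hypothetical day/month/year strings:

```python
import pandas as pd

# Hypothetical raw data with day/month/year strings.
df = pd.DataFrame({"date": ["01/10/2023", "05/10/2023"], "value": [10, 15]})

# An explicit format removes the day-first vs. month-first ambiguity.
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")

print(df["date"])
```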
Reset the Index
The DataFrame has a default integer index. We replace it with the content of the "date" column, turning it into a time index:
df = df.set_index('date')
Note that setting the index to the "date" column changes the shape of the DataFrame: the number of columns is reduced by one, because the "date" column becomes the index. Also, set_index() does not modify the original DataFrame. It returns a new one. To replace the index in place, pass inplace=True. The drop=True argument, which removes the column from the DataFrame once it becomes the index, is already the default. For more details, see
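The difference between the two calling styles can be sketched as follows, using a two-row stand-in for the loaded data:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2023-10-01", "2023-10-02"]),
    "value": [133, 135],
})

# Default: a new DataFrame is returned, df keeps its integer index.
indexed = df.set_index("date")

# In place: df_copy itself is modified and the call returns None.
df_copy = df.copy()
result = df_copy.set_index("date", inplace=True)

print(list(df.columns), list(indexed.columns), result)
```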
Then we extract the "value" column as a time series, since the DataFrame index is already a time index. The "value" column is the only column in the DataFrame after we replaced the index:
s = df.iloc[:,0]
Get the Interesting Series
The result is a time series:
date
2023-10-01    133
2023-10-02    135
2023-10-03    139
2023-10-04    123
2023-10-05    122
2023-10-06    119
2023-10-07    117
2023-10-08    130
2023-10-09    132
Name: value, dtype: int64
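Because the index is a time index, the series can be filtered by date. A short sketch that rebuilds the series shown above with date_range() rather than loading it from the CSV file:

```python
import pandas as pd

# Rebuild the series shown above without reading the CSV file.
idx = pd.date_range("2023-10-01", periods=9, freq="D", name="date")
s = pd.Series([133, 135, 139, 123, 122, 119, 117, 130, 132],
              index=idx, name="value")

# Label-based slicing on a time index includes both endpoints.
first_days = s.loc["2023-10-02":"2023-10-04"]

# Boolean filtering on the index.
after_6th = s[s.index > pd.Timestamp("2023-10-06")]

print(first_days)
print(after_6th)
```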
Transform the Elements of the Series
If the elements of the series need transformation, use the methods described here: