Revision as of 21:42, 20 October 2023

Internal

TODO

TO PROCESS:

Overview

This article provides hints on how time series can be processed with Pandas.

A time series is a sequence of data points indexed in time order. In Pandas, the time index is a DatetimeIndex object that holds the timestamp of each data point. This time index enables operations such as resampling, rolling windows and filtering by date.

Import Pandas

import pandas as pd

Synthetic Time Series
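
A minimal sketch of building a synthetic time series, assuming daily frequency and random integer values. The date range, the seed and the value bounds are illustrative choices, not something prescribed by this article:

```python
import numpy as np
import pandas as pd

# Nine consecutive daily timestamps (illustrative range).
index = pd.date_range(start="2023-10-01", periods=9, freq="D")

# Seeded generator so the synthetic values are reproducible.
rng = np.random.default_rng(seed=0)
values = rng.integers(low=100, high=150, size=len(index))

# A Series whose index is a DatetimeIndex is a time series.
synthetic = pd.Series(values, index=index, name="value")
print(synthetic)
```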

Load a Time Series

Assuming the data comes from a CSV file whose first column, labeled "date", contains timestamp-formatted strings, and the second column contains values corresponding to those timestamps, this is how the data is loaded and turned into a Pandas Series.

The content of the CSV file should be similar to:

date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
2023-10-04, 123
2023-10-05, 122
2023-10-06, 119
2023-10-07, 117
2023-10-08, 130
2023-10-09, 132

Create a DataFrame by reading the CSV file with the read_csv() function. To have the "date" column parsed as a datetime type while loading, pass its name to the parse_dates parameter:

df = pd.read_csv("./timeseries.csv", parse_dates=["date"])

The date is expected in YYYY-MM-DD format, for example "2023-12-31". To handle custom date or time formats, see:

read_csv() Custom Date Format
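
As one possible approach, sketched here under assumptions and not necessarily the one described on the linked page, the column can be loaded as plain strings and converted afterwards with pd.to_datetime() and an explicit format. The day-first layout and the inline sample data are hypothetical:

```python
import io
import pandas as pd

# Hypothetical CSV content with day/month/year dates.
csv_text = """date,value
31/12/2023,133
01/01/2024,135
"""

# Load the dates as strings, then convert them with an explicit format.
df = pd.read_csv(io.StringIO(csv_text))
df["date"] = pd.to_datetime(df["date"], format="%d/%m/%Y")
print(df["date"])
```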

To verify that the date column was correctly parsed, display df['date']. It should have a datetime64[ns] dtype.
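
The load-and-verify step can be sketched as follows, using an in-memory string as a stand-in for the CSV file. The skipinitialspace=True argument is an addition not used in this article: it strips the blank that follows each comma in the sample file, so the second column is named "value" instead of " value":

```python
import io
import pandas as pd

# In-memory stand-in for ./timeseries.csv (first rows of the sample above).
csv_text = """date, value
2023-10-01, 133
2023-10-02, 135
2023-10-03, 139
"""

# parse_dates converts the "date" column to a datetime type while loading.
df = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["date"],
    skipinitialspace=True,
)

# The "date" column should report a datetime64 dtype.
print(df.dtypes)
```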

The DataFrame has a default integer index, and we replace it with the content of the "date" column, turning it into a time index:

df = df.set_index('date')

For more details on set_index(), see:

DataFrame | set_index()

Note that setting the index to the "date" column reduces the number of columns by one, because "date" now serves as the index. Also, set_index() does not modify the original DataFrame: it returns a new DataFrame with the replaced index. To replace the index in place, pass inplace=True. The drop=True argument, which removes the column once it becomes the index, is already the default.
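
A short sketch of both behaviors, using a small hand-built DataFrame in place of the loaded CSV:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2023-10-01", "2023-10-02", "2023-10-03"]),
    "value": [133, 135, 139],
})

# set_index() returns a new DataFrame; df keeps its integer index.
indexed = df.set_index("date")
print(df.shape)       # two columns: "date" and "value"
print(indexed.shape)  # one column: "value"

# In-place variant: modifies df itself and returns None.
df.set_index("date", inplace=True)
print(df.index)
```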

Then we extract the "value" column as a time series, since the DataFrame index is already a time index. The "value" column is the only column in the DataFrame after we replaced the index:

s = df.iloc[:,0]

The result is a time series:

date
2023-10-01    133
2023-10-02    135
2023-10-03    139
2023-10-04    123
2023-10-05    122
2023-10-06    119
2023-10-07    117
2023-10-08    130
2023-10-09    132
Name:  value, dtype: int64
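
Selecting by position with iloc avoids depending on the exact column label. When the label is known, selecting by name yields the same Series. A minimal sketch, assuming the column is labeled exactly "value":

```python
import pandas as pd

df = pd.DataFrame(
    {"value": [133, 135, 139]},
    index=pd.DatetimeIndex(
        ["2023-10-01", "2023-10-02", "2023-10-03"], name="date"
    ),
)

by_position = df.iloc[:, 0]  # first (and only) column, selected by position
by_label = df["value"]       # same column, selected by label

print(by_position.equals(by_label))
```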

Transform the Elements of the Series

If the elements of the series need transformation, use the methods described here:

Series Transformation

Filter the Series

`loc[]`

Creates a new, filtered time series:

s = s.loc['2023-09-17':'2023-10-05']
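
With a sorted time index, label-based slicing includes both endpoints, and the endpoints do not have to be present in the index. A sketch using the sample values from this article, rebuilt by hand:

```python
import pandas as pd

s = pd.Series(
    [133, 135, 139, 123, 122, 119, 117, 130, 132],
    index=pd.date_range("2023-10-01", periods=9, freq="D", name="date"),
    name="value",
)

# The start label lies before the first timestamp, so the slice begins at
# 2023-10-01; the end label 2023-10-05 is included in the result.
filtered = s.loc["2023-09-17":"2023-10-05"]
print(filtered)
```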

Resample a Time Series with Another Frequency

TODO: https://pandas.pydata.org/docs/getting_started/intro_tutorials/09_timeseries.html#resample-a-time-series-to-another-frequency
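
A minimal sketch of resampling, assuming the daily sample series from this article and an arbitrarily chosen target frequency of three days, aggregated with the mean:

```python
import pandas as pd

s = pd.Series(
    [133, 135, 139, 123, 122, 119, 117, 130, 132],
    index=pd.date_range("2023-10-01", periods=9, freq="D", name="date"),
    name="value",
)

# Group the daily points into consecutive 3-day bins and average each bin.
resampled = s.resample("3D").mean()
print(resampled)
```

Other aggregations, such as sum(), max() or min(), can be applied to the resampled object in the same way.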

Time Series Processing with Pandas: Difference between revisions