Pandas Series
Latest revision as of 19:43, 20 May 2024
External
- https://pandas.pydata.org/docs/user_guide/dsintro.html#series
- https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series
- https://www.geeksforgeeks.org/python-pandas-series/
Internal
Overview
A Series is a one-dimensional array of values, where each value has a label. The labels are referred to as "axis labels" and they are managed by the series's index. By default, in the absence of an explicit specification, a series gets a monotonic integer range index, starting at 0 with a step of 1, which allows retrieving data with 0-based integer indexes (see Accessing Elements of a Series below).
Every series has a name and a data type, which are both reported when the series is printed.
A Series is implemented with a NumPy ndarray.
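The properties described above can be checked with a minimal sketch. The data and the name "example" are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical data and name, chosen only for illustration.
s = pd.Series([10, 20, 30], name="example")

# With no explicit index, pandas assigns a RangeIndex starting at 0 with step 1.
assert isinstance(s.index, pd.RangeIndex)
assert list(s.index) == [0, 1, 2]

# The name and the data type are available as attributes,
# and both appear in the last line of the printed series.
assert s.name == "example"
print(s.dtype)
print(s)

# The values can be obtained as a NumPy ndarray.
values = s.to_numpy()
print(type(values))
```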
Axis
The Series has just one axis, "axis 0", which is aligned alongside the Series values, pointing "downwards":
The Series axes property gives access to a one-element list containing the Series's Index:
assert len(s.axes) == 1
print(s.axes)
[RangeIndex(start=0, stop=6, step=1)]
Index
Also see:
RangeIndex
Time Series Index
A time series is a series whose index contains datetime objects. To create a time series, ensure that the method that creates the series converts the index values to datetime, as shown in the Create a Time Series from CSV section.
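As a sketch of what a datetime index looks like when built in memory rather than read from CSV, the following uses pd.to_datetime() on hypothetical date strings. The dates and values are assumptions made for illustration:

```python
import pandas as pd

# Hypothetical dates and values, for illustration only.
dates = pd.to_datetime(["2023-10-09", "2023-10-10", "2023-10-11"])
ts = pd.Series([1.0, 2.0, 3.0], index=dates)

# The index is a DatetimeIndex, which is what makes this a time series.
assert isinstance(ts.index, pd.DatetimeIndex)

# Date strings can then be used as labels with loc[].
value = ts.loc["2023-10-10"]
print(value)
```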
Name
A series has a name, accessible with .name.
Investigate a Series
The total number of elements of a series, also known as its size or length, can be obtained with the Series' size attribute, which returns the same value as the Python len() function applied to the series:
size = s.size
same_size = len(s)
assert size == same_size
Number of elements:
The value of the first index:
The value of the last index:
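The three items above have no example in the page. One plausible reading, sketched here on a hypothetical series, is to use size for the number of elements and positional access on the index for its first and last values:

```python
import pandas as pd

# Hypothetical series with string labels, for illustration only.
s = pd.Series([5, 6, 7], index=["a", "b", "c"])

n = s.size                # number of elements
first_label = s.index[0]  # the first index value
last_label = s.index[-1]  # the last index value

print(n, first_label, last_label)
```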
Create a Series
Create a Series Programmatically
A series can be created from an in-memory list:
import pandas as pd
a = ['a', 'b', 'c']
s = pd.Series(a)
A series can also be created from data stored externally.
From a DataFrame
Create a Series from CSV
Create a Time Series from CSV
Create a Series from JSON
Parse: https://pandas.pydata.org/docs/reference/api/pandas.read_json.html#pandas.read_json
Also see:
Accessing Elements of a Series
This is known as indexing or subset selection.
The Index Operator [...]
Do not attempt to access an element by position using the indexing operator [] and an integer. It may work, but positional access through [] has been deprecated; use iloc[] instead.
iloc[]
Access using integer positions.
s.iloc[0]
loc[]
Access using index values (labels).
s.loc[0]
s.loc['2023-10-10']
index[]
Returns the index value (label) at the given integer position, not the series element.
s.index[0]
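The difference between the three access forms is easiest to see on a series whose labels are not integers. A minimal sketch, with hypothetical labels and values:

```python
import pandas as pd

# Hypothetical series with string labels, for illustration only.
s = pd.Series([10, 20, 30], index=["x", "y", "z"])

by_position = s.iloc[0]  # the element at integer position 0
by_label = s.loc["y"]    # the element whose label is "y"
label_at_0 = s.index[0]  # the label stored at position 0, not an element

print(by_position, by_label, label_at_0)
```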
Operations on Series
Filtering
Index for Condition
Return the index values for which the series values meet a certain condition:
s.index[<condition>]
s.index[s == 0]
Will return:
DatetimeIndex(['2008-04-06', '2008-05-04', '2008-06-07', '2008-07-05', '2008-08-16', '2008-09-06', '2008-09-20', '2008-10-12', '2012-04-12'], dtype='datetime64[ns]', name='Date', freq=None)
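A small self-contained sketch of the same pattern, using a hypothetical three-element time series rather than the data that produced the output above:

```python
import pandas as pd

# Hypothetical dates and values, for illustration only.
dates = pd.to_datetime(["2008-04-06", "2008-05-04", "2008-06-07"])
s = pd.Series([0, 3, 0], index=dates)

# s == 0 is a boolean series; using it to index s.index keeps
# only the index values where the condition holds.
zero_dates = s.index[s == 0]
print(zero_dates)
```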
Dropping Values
Keep only the elements whose values make the expression evaluate to true:
s = s[<expression>]
Drop all zero values:
s = ...
s = s[s != 0]
Extract Values Between Certain Index Limits
loc[]
For a time series, use loc[] to apply a slice to the index values.
s = s.loc['2023-09-17':'2023-10-05']
s = s.loc['2023-09-17':]
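Unlike positional slicing, a label slice with loc[] includes both end points. The sketch below illustrates this on a hypothetical daily series generated with pd.date_range(); the date range and values are assumptions:

```python
import pandas as pd

# Hypothetical daily series covering 2023-09-15 through 2023-10-09.
dates = pd.date_range("2023-09-15", periods=25, freq="D")
s = pd.Series(range(25), index=dates)

# Both limits are included in the result.
window = s.loc["2023-09-17":"2023-10-05"]

# An open-ended slice runs to the end of the series.
tail = s.loc["2023-09-17":]

print(window.index[0], window.index[-1], len(window), len(tail))
```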
Transformation
This class of operations is referred to as transformations or conversions.
apply()
Each element of the series can be transformed by applying the function specified as argument to apply().
The function can be a named function or a lambda.
Note that apply() will not convert the elements in place; it creates a new series instead.
apply() a Named Function
For example, if the elements of the series are dollar values in the format "$1,234", to convert them to integers, use:
s = ...
def convert_dollar_str_to_int(s: str):
    return int(s[1:].replace(',',''))
s = s.apply(convert_dollar_str_to_int)
apply() a Lambda
s = ...
s.apply(lambda x: x * 1.1)
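The two forms can be combined into a runnable sketch. The input strings and the helper name dollar_str_to_int are hypothetical; the point is that each apply() call returns a new series and leaves the original untouched:

```python
import pandas as pd

# Hypothetical dollar amounts in the "$1,234" format.
prices = pd.Series(["$1,234", "$56", "$7,890"])


def dollar_str_to_int(text: str) -> int:
    # Drop the leading "$" and the thousands separators, then parse.
    return int(text[1:].replace(",", ""))


# apply() with a named function returns a new series of integers.
amounts = prices.apply(dollar_str_to_int)

# apply() with a lambda returns another new series, scaled by 1.1.
raised = amounts.apply(lambda x: x * 1.1)

# The original series still holds the unconverted strings.
print(prices.tolist())
print(amounts.tolist())
```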
Interpolation
Binary Operations with Series
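TO PROCESS: https://www.geeksforgeeks.org/python-pandas-series/
The series must be identically sampled, meaning they share the same index values, because binary operations align the operands on their index labels: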
sp500_perc_diff = fid_slf.sub(sp500_perf).div(sp500_perf).mul(100)
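The sketch below shows why identical sampling matters. It uses two hypothetical series, a and b, standing in for the ones above: where a label exists in only one operand, the result is NaN.

```python
import pandas as pd

# Hypothetical series; b has one date that a lacks.
a = pd.Series(
    [110.0, 210.0],
    index=pd.to_datetime(["2023-10-09", "2023-10-10"]),
)
b = pd.Series(
    [100.0, 200.0, 300.0],
    index=pd.to_datetime(["2023-10-09", "2023-10-10", "2023-10-11"]),
)

# Same chain of operations as above: percentage difference of a relative to b.
perc_diff = a.sub(b).div(b).mul(100)

# The date missing from a yields NaN in the result.
print(perc_diff)
print(perc_diff.isna().tolist())
```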