Pandas Series
Revision as of 00:20, 21 October 2023
External
- https://pandas.pydata.org/docs/user_guide/dsintro.html#series
- https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series
- https://www.geeksforgeeks.org/python-pandas-series/
Internal
Overview
A Series is a one-dimensional array of values, where each value has a label. The labels are referred to as "axis labels" and are managed by the series' index. By default, when no index is specified explicitly, a series gets a monotonic integer range index that starts at 0 and has a step of 1, which allows data to be retrieved with 0-based integer indexes (see Accessing Elements of a Series below).
Every series has a name and a data type, which are both reported when the series is printed.
A Series is typically backed by a NumPy ndarray.
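The defaults described above can be checked with a minimal sketch. The sample values and the name "counts" are hypothetical, chosen only for illustration:

```python
import pandas as pd

# Hypothetical sample data; no index is given, so the default one is used.
s = pd.Series([10, 20, 30], name='counts')

print(s.index)  # RangeIndex(start=0, stop=3, step=1)
print(s.name)   # counts
print(s)        # the printed form reports both the name and the dtype
```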
Axis
The Series has just one axis, "axis 0", which is aligned alongside the Series values, pointing "downwards":
The Series axes property gives access to a one-element list containing the Series's Index:
assert len(s.axes) == 1
print(s.axes)
[RangeIndex(start=0, stop=6, step=1)]
Index
Also see:
RangeIndex
Time Series Index
A time series is a series whose index contains datetime objects. To create a time series, ensure that the method that creates the series performs the datetime conversion, as shown in the Create a Time Series from CSV section.
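A time series can also be built in memory. The following is a minimal sketch, assuming hypothetical dates and values, that converts strings to datetime objects with pd.to_datetime() before using them as the index:

```python
import pandas as pd

# Hypothetical dates and values, for illustration only.
dates = pd.to_datetime(['2023-09-17', '2023-09-18', '2023-09-19'])
ts = pd.Series([1.0, 2.5, 0.0], index=dates)

# A datetime-based index is represented by a DatetimeIndex.
is_time_series = isinstance(ts.index, pd.DatetimeIndex)
print(is_time_series)
```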
Name
A series has a name, accessible with .name.
Investigate a Series
The total number of elements of a series, also known as its size or length, can be obtained with the Series size attribute, which returns the same value as the Python len() function applied to the series:
size = s.size
same_size = len(s)
assert size == same_size
Number of elements:
The value of the first index:
The value of the last index:
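One possible way to read these three quantities is sketched below, assuming a hypothetical series with string labels:

```python
import pandas as pd

# Hypothetical series with explicit string labels.
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

number_of_elements = s.size      # same as len(s)
first_index_value = s.index[0]   # label at the first position
last_index_value = s.index[-1]   # label at the last position

print(number_of_elements, first_index_value, last_index_value)
```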
Create a Series
Create a Series Programmatically
A series can be created from an in-memory list:
import pandas as pd
a = ['a', 'b', 'c']
s = pd.Series(a)
A series can also be created from data stored externally.
From a DataFrame
Create a Series from CSV
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv
To create a series from a CSV file:
import pandas as pd
# TODO
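Until the example above is filled in, here is a minimal sketch of one possible approach: read the CSV into a DataFrame with pd.read_csv() and select a single column, which yields a Series. The CSV content and the column names "id" and "value" are hypothetical, and an in-memory buffer stands in for a real file path:

```python
import io
import pandas as pd

# Hypothetical CSV content; a file path could be passed to read_csv() instead.
csv_text = "id,value\n1,10\n2,20\n3,30\n"

df = pd.read_csv(io.StringIO(csv_text))
# Selecting one column of a DataFrame returns a Series.
s = df['value']
print(s)
```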
Create a Time Series from CSV
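A minimal sketch of one way to do this, assuming a hypothetical CSV with a "Date" column and a "Value" column: the index_col and parse_dates arguments of pd.read_csv() make the reader convert the dates and use them as the index.

```python
import io
import pandas as pd

# Hypothetical CSV content; a file path could be passed to read_csv() instead.
csv_text = "Date,Value\n2023-09-17,5\n2023-09-18,0\n2023-09-19,7\n"

# parse_dates converts the "Date" column to datetime objects,
# index_col makes that column the index of the resulting DataFrame.
df = pd.read_csv(io.StringIO(csv_text), index_col='Date', parse_dates=['Date'])
ts = df['Value']
print(ts.index)
```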
Create a Series from JSON
Parse: https://pandas.pydata.org/docs/reference/api/pandas.read_json.html#pandas.read_json
Accessing Elements of a Series
This is known as indexing or subset selection.
The Index Operator [...]
Do not attempt to access an element using the indexing operator [] and an integer position. It may work, but treating integer keys as positions has been deprecated; use iloc[] instead.
iloc[]
Access by 0-based integer position.
s.iloc[0]
loc[]
Access by index label.
s.loc[0]
index[]
Return the index label found at the given integer position. This gives the label, not the series value.
s.index[0]
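The difference between the three forms is easier to see on a series whose labels are not integers. A minimal sketch, assuming hypothetical data:

```python
import pandas as pd

# Hypothetical series with string labels.
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])

by_position = s.iloc[0]   # value at integer position 0
by_label = s.loc['x']     # value stored under the label 'x'
first_label = s.index[0]  # the label itself, not the value

print(by_position, by_label, first_label)
```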
Operations on Series
Filtering
Index for Condition
Return the index values for which the series values meet a certain condition:
s.index[<condition>]
s.index[s == 0]
Will return:
DatetimeIndex(['2008-04-06', '2008-05-04', '2008-06-07', '2008-07-05', '2008-08-16', '2008-09-06', '2008-09-20', '2008-10-12', '2008-11-09', '2008-12-27', '2009-01-09', '2009-01-28', '2009-02-08', '2009-04-12', '2009-05-26', '2009-07-05', '2009-08-15', '2009-09-14', '2009-10-14', '2009-11-08', '2009-12-04', '2010-01-04', '2010-02-07', '2010-03-14', '2010-04-11', '2010-05-19', '2010-06-05', '2010-07-08', '2010-08-09', '2010-09-07', '2010-11-12', '2010-12-25', '2011-02-12', '2011-04-11', '2011-06-13', '2011-08-14', '2011-10-11', '2011-12-11', '2012-01-21', '2012-02-20', '2012-04-12'], dtype='datetime64[ns]', name='Date', freq=None)
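The same pattern on a small series, as a minimal sketch with hypothetical values: the boolean mask s == 0 selects the index entries whose values are zero.

```python
import pandas as pd

# Hypothetical three-element time series.
dates = pd.to_datetime(['2008-04-06', '2008-04-07', '2008-05-04'])
s = pd.Series([0, 3, 0], index=dates)

# Index entries for which the series value is 0.
zero_dates = s.index[s == 0]
print(zero_dates)
```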
Dropping Values
Keep only the elements whose values make the expression evaluate to true:
s = s[<expression>]
Drop all zero values:
s = ...
s = s[s != 0]
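A self-contained version of the same filter, sketched with hypothetical data. The surviving elements keep their original labels:

```python
import pandas as pd

# Hypothetical series containing some zero values.
s = pd.Series([0, 5, 0, 7], index=['a', 'b', 'c', 'd'])

# Keep only the elements for which the condition is true.
s = s[s != 0]
print(s)
```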
Extract Values Between Certain Index Limits
loc[]
For a time series:
s = s.loc['2023-09-17':'2023-10-05']
s = s.loc['2023-09-17':]
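A minimal sketch of both slices on a hypothetical daily time series. Note that label-based slices with loc[] include both endpoints, unlike Python's positional slices:

```python
import pandas as pd

# Hypothetical daily time series covering 2023-09-15 to 2023-10-10.
dates = pd.date_range('2023-09-15', '2023-10-10', freq='D')
s = pd.Series(range(len(dates)), index=dates)

# Both limits are included in the result.
window = s.loc['2023-09-17':'2023-10-05']
# Open-ended slice: everything from the start label onward.
tail = s.loc['2023-09-17':]

print(window.index[0], window.index[-1], len(tail))
```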
Transformation
This class of operations is referred to as transformations or conversions.
apply()
Each element of the series can be transformed by applying the function specified as argument to apply().
The function can be a named function or a lambda.
For example, if the elements of the series are dollar values in the format "$1,234", to convert them to integers, use:
s = ...
def convert_dollar_str_to_int(value: str) -> int:
    # Strip the leading "$" and the thousands separators, then convert.
    return int(value[1:].replace(',', ''))
s = s.apply(convert_dollar_str_to_int)
Note that apply() will not convert the elements in place; it will create a new series instead.
TODO lambda.
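The same conversion can be written with a lambda. A minimal sketch, assuming hypothetical dollar strings in the "$1,234" format:

```python
import pandas as pd

# Hypothetical dollar values stored as strings.
s = pd.Series(['$1,234', '$56', '$7,890,123'])

# Strip the leading "$" and the commas, then convert to int.
converted = s.apply(lambda v: int(v[1:].replace(',', '')))

print(converted.tolist())
```

The original series is left unchanged; the result is bound to a new name here to make that visible.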
Binary Operations
TO PROCESS: https://www.geeksforgeeks.org/python-pandas-series/