Pandas Series

From NovaOrdis Knowledge Base
Latest revision as of 19:43, 20 May 2024

External

Internal

Overview

A Series is a one-dimensional array of values, where each value has a label. The labels are referred to as "axis labels" and they are managed by the series's index. By default, in the absence of an explicit specification, a series gets a monotonic integer range index, starting at 0 with a step of 1, which allows retrieving data by 0-based integer position (see Accessing Elements of a Series below).

Every series has a name and a data type, which are both reported when the series is printed.

A Series is typically backed by a NumPy ndarray.

Axis

The Series has just one axis, "axis 0", which is aligned alongside the Series values, pointing "downwards":

[Figure: Panda Series Axis.png]

The Series axes property gives access to a one-element list containing the Series's Index:

assert len(s.axes) == 1
print(s.axes)

[RangeIndex(start=0, stop=6, step=1)]
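
As a minimal sketch, assuming a hypothetical six-element series built from an in-memory list (the values are illustrative only), the output above can be reproduced like this:

```python
import pandas as pd

# Hypothetical sample data; any six values would give the same index.
s = pd.Series([10, 20, 30, 40, 50, 60])

# A Series has exactly one axis, so axes is a one-element list.
axes = s.axes
assert len(axes) == 1

# The single element holds the same labels as the Series index.
assert axes[0].equals(s.index)
print(axes)  # [RangeIndex(start=0, stop=6, step=1)]
```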

Index

https://pandas.pydata.org/docs/reference/api/pandas.Series.index.html

Also see:

Pandas Concepts | Index

RangeIndex

RangeIndex

Time Series Index

An index that contains datetime objects turns the series into a time series: a time series is a series whose index consists of datetime objects. To create a time series, ensure that the method that creates the series performs the datetime conversion automatically, as shown in the Create a Time Series from CSV section.
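
For illustration, a time series can also be built directly in memory. The sketch below assumes hypothetical dates and values, and converts the date strings explicitly with pd.to_datetime() rather than relying on a reader to do it:

```python
import pandas as pd

# Hypothetical dates and values, for illustration only.
dates = pd.to_datetime(['2023-10-09', '2023-10-10', '2023-10-11'])
ts = pd.Series([1.5, 2.5, 3.5], index=dates)

# The index is a DatetimeIndex, which is what makes this a time series.
print(type(ts.index).__name__)  # DatetimeIndex

# Elements can be looked up by timestamp.
value = ts.loc[pd.Timestamp('2023-10-10')]
```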

Name

A series has a name, accessible with .name.
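
A short sketch of how the name behaves, assuming hypothetical names 'count' and 'total'. Note that a series created without an explicit name reports None:

```python
import pandas as pd

# 'count' is a hypothetical name chosen for this example.
s = pd.Series([1, 2, 3], name='count')
print(s.name)  # count

# A series created without a name has the name None.
unnamed = pd.Series([1, 2, 3])
print(unnamed.name)  # None

# rename() with a scalar returns a new series carrying the new name.
renamed = unnamed.rename('total')
```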

Investigate a Series

The total number of elements of a series, also known as its size or length, can be obtained with the Series' size attribute, which returns the same value as the Python len() function applied to the series:

size = s.size
same_size = len(s)
assert size == same_size

Number of elements:

The value of the first index:

The value of the last index:
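
One possible way to obtain the three items listed above is sketched here, assuming a hypothetical three-element series with string labels:

```python
import pandas as pd

# Hypothetical data with explicit labels.
s = pd.Series([5, 6, 7], index=['a', 'b', 'c'])

number_of_elements = s.size     # same value as len(s)
first_index_value = s.index[0]  # label of the first element
last_index_value = s.index[-1]  # label of the last element
```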

Create a Series

Create a Series Programmatically

A series can be created from an in-memory list:

import pandas as pd

a = ['a', 'b', 'c']
s = pd.Series(a)

A series can also be created from data stored externally.

From a DataFrame
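
A minimal sketch, assuming a hypothetical DataFrame with 'price' and 'qty' columns: selecting a single column with the indexing operator yields a Series that shares the DataFrame's index and takes the column label as its name.

```python
import pandas as pd

# Hypothetical DataFrame, for illustration only.
df = pd.DataFrame({'price': [1.0, 2.0], 'qty': [3, 4]})

# Selecting one column returns a Series.
s = df['price']
print(type(s).__name__)  # Series
print(s.name)            # price
```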

Create a Series from CSV

Pandas CSV | Create a Series from CSV

Create a Time Series from CSV

Pandas CSV | Create a Time Series from CSV

Create a Series from JSON

Parse: https://pandas.pydata.org/docs/reference/api/pandas.read_json.html#pandas.read_json

Also see:

datetime

Accessing Elements of a Series

This is known as indexing or subset selection.

The Index Operator [...]

Do not attempt to access an element by position using the indexing operator [] and an integer index. It may work, but this usage has been deprecated; use iloc instead.

iloc[]

Access elements by 0-based integer position, regardless of the index labels.

s.iloc[0]
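
A small sketch of positional access, assuming a hypothetical series with string labels so that positions and labels cannot be confused:

```python
import pandas as pd

# Hypothetical data; the labels are deliberately not integers.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

first = s.iloc[0]   # element at position 0
last = s.iloc[-1]   # negative positions count from the end
head = s.iloc[0:2]  # positional slices exclude the end position
```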

loc[]

Access using index values. Reconcile

s.loc[0]
s.loc['2023-10-10']
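
A sketch of label-based access, assuming a hypothetical series with string labels. Unlike positional slices, label slices with loc[] include both endpoints:

```python
import pandas as pd

# Hypothetical data, for illustration only.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

value = s.loc['b']       # the element labeled 'b'
subset = s.loc['a':'b']  # label slice, both endpoints included
```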

index[]

Access the index value (label) of the element at a given integer position.

s.index[0]
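
To illustrate, the sketch below assumes a hypothetical two-element time series. Indexing the index returns a label, which can then be passed to loc[] to retrieve the corresponding value:

```python
import pandas as pd

# Hypothetical time series, for illustration only.
dates = pd.to_datetime(['2023-10-10', '2023-10-11'])
s = pd.Series([10, 20], index=dates)

label = s.index[0]    # the label at position 0, a Timestamp
value = s.loc[label]  # the value stored under that label
```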

Operations on Series

Filtering

Index for Condition

Return the index values for which the series values meet a certain condition:

s.index[<condition>]
s.index[s == 0]

Will return:

DatetimeIndex(['2008-04-06', '2008-05-04', '2008-06-07', '2008-07-05',
               '2008-08-16', '2008-09-06', '2008-09-20', '2008-10-12',
               '2012-04-12'],
              dtype='datetime64[ns]', name='Date', freq=None)
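
The same pattern on a small, self-contained example. The dates and values below are hypothetical and are not the data that produced the output above:

```python
import pandas as pd

# Hypothetical time series with two zero values.
dates = pd.to_datetime(['2008-04-06', '2008-04-07', '2008-05-04'])
s = pd.Series([0, 3, 0], index=dates)

# The comparison yields a boolean series aligned with s.
mask = s == 0

# Indexing the index with the mask keeps only the matching labels.
zero_dates = s.index[mask]
```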

Dropping Values

Keep only the elements whose values make the expression evaluate to true:

s = s[<expression>]

Drop all zero values:

s = ...
s = s[s != 0]
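
A runnable sketch with hypothetical values. Note that the surviving elements keep their original labels; reset_index(drop=True) could be used if a fresh 0-based index is wanted:

```python
import pandas as pd

# Hypothetical data containing zeros.
s = pd.Series([0, 4, 0, 7])

# Keep only the non-zero elements.
s = s[s != 0]

# The original labels 1 and 3 are preserved.
print(list(s.index))  # [1, 3]

# Optionally renumber the remaining elements from 0.
renumbered = s.reset_index(drop=True)
```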

Extract Values Between Certain Index Limits

loc[]

For a time series, use loc[] to apply a slice to the index values.

s = s.loc['2023-09-17':'2023-10-05']
s = s.loc['2023-09-17':]
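
A self-contained sketch, assuming a hypothetical daily time series of five elements. With a sorted datetime index, both slice endpoints are included in the result:

```python
import pandas as pd

# Hypothetical daily series covering 2023-09-15 through 2023-09-19.
dates = pd.date_range('2023-09-15', periods=5, freq='D')
s = pd.Series(range(5), index=dates)

# Closed range: both 2023-09-16 and 2023-09-18 are included.
window = s.loc['2023-09-16':'2023-09-18']

# Open-ended range: everything from 2023-09-18 onward.
tail = s.loc['2023-09-18':]
```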

Transformation

This class of operations is referred to as transformations or conversions.

apply()

Each element of the series can be transformed by applying the function specified as argument to apply().

The function can be a named function or a lambda.

Note that apply() does not convert the elements in place; it returns a new series instead.

apply() a Named Function

For example, if the elements of the series are dollar values in the format "$1,234", to convert them to integers, use:

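s = ...
def convert_dollar_str_to_int(value: str) -> int:
    # Strip the leading '$' and the thousands separators, then parse the digits.
    return int(value[1:].replace(',', ''))
# apply() returns a new series, so reassign to keep the result.
s = s.apply(convert_dollar_str_to_int)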

apply() a Lambda

s = ...
s = s.apply(lambda x: x * 1.1)
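
The following sketch puts both forms together on hypothetical data. The names prices, amounts and scaled are illustrative, and the helper repeats the dollar conversion described above. It also shows that the original series is left unchanged:

```python
import pandas as pd

def convert_dollar_str_to_int(value: str) -> int:
    # Strip the leading '$' and the thousands separators.
    return int(value[1:].replace(',', ''))

# Hypothetical dollar strings.
prices = pd.Series(['$1,234', '$56'])

# Named function: returns a new series of integers.
amounts = prices.apply(convert_dollar_str_to_int)

# Lambda: returns another new series, scaled by 10%.
scaled = amounts.apply(lambda x: x * 1.1)

# The source series still holds the original strings.
print(prices.iloc[0])  # $1,234
```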

Interpolation

Time Series Resampling and Interpolation

Binary Operations with Series

TO PROCESS: https://www.geeksforgeeks.org/python-pandas-series/

Binary operations align the operands on their index labels, so the series must be identically sampled (share the same index):

sp500_perc_diff = fid_slf.sub(sp500_perf).div(sp500_perf).mul(100)
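
A sketch of why matching indexes matter, using hypothetical values and labels for fid_slf and sp500_perf. Labels present in only one operand produce NaN in the result:

```python
import pandas as pd

# Hypothetical, identically sampled series.
fid_slf = pd.Series([110.0, 90.0], index=['d1', 'd2'])
sp500_perf = pd.Series([100.0, 100.0], index=['d1', 'd2'])

# Percentage difference relative to sp500_perf, element by element.
sp500_perc_diff = fid_slf.sub(sp500_perf).div(sp500_perf).mul(100)

# A series with a different label does not line up with fid_slf:
# the result covers the union of labels and is NaN everywhere.
other = pd.Series([100.0], index=['d3'])
misaligned = fid_slf.sub(other)
```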

Using a matplotlib Plot with Pandas Series

Using a matplotlib Plot with Pandas Series