Pandas Series: Difference between revisions
Line 39: | Line 39: | ||
=Create a Series= | =Create a Series= | ||
== | ==<span id='From_a_in-Memory_List'></span>Create a Series Programmatically== | ||
A series can be created from an in-memory list: | A series can be created from an in-memory list: | ||
<syntaxhighlight lang='py'> | <syntaxhighlight lang='py'> | ||
Line 48: | Line 48: | ||
</syntaxhighlight> | </syntaxhighlight> | ||
A series can also be created from data stored externally. | A series can also be created from data stored externally. | ||
==From a DataFrame== | ==From a DataFrame== | ||
Revision as of 02:17, 15 October 2023
External
- https://pandas.pydata.org/docs/user_guide/dsintro.html#series
- https://pandas.pydata.org/docs/reference/api/pandas.Series.html#pandas.Series
- https://www.geeksforgeeks.org/python-pandas-series/
Internal
Overview
A Series is a one-dimensional array of values, where each value has a label. The labels are referred to as "axis labels" and they are managed by the series's index. By default, in absence of any explicit specification, a series gets a monotonic integer range index, starting with 0 and with the step 1, allowing retrieving data with 0-based integer indexes (see Accessing Elements of a Series below).
Every series has a name and a data type, which are both reported when the series is printed.
A Series is implemented with a numpy ndarray.
Index
RangeIndex
Time Series Index
An index that contains datetime turns the A time series is a series whose index has datetime objects. To create a time series, ensure that the method that creates the series performs the conversion automatically, as show in the Create a Time Series from CSV section.
Name
A series has a name, accessible with .name
.
Investigate a Series
The total number of elements of a series, also known as its size or length can be obtained with the Series' size
attribute, which returns the same value as the Python len()
function applied to the series:
size = s.size
same_size = len(s)
assert size == same_size
Number of elements:
The value of the first index:
The value of the last index:
Create a Series
Create a Series Programmatically
A series can be created from an in-memory list:
import pandas as pd
a = ['a', 'b', 'c']
s = pd.Series(a)
A series can also be created from data stored externally.
From a DataFrame
Create a Series from CSV
https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html#pandas.read_csv
To create a series from a CSV file:
import pandas as pd
# TODO
Create a Time Series from CSV
Create a Series from JSON
Parse: https://pandas.pydata.org/docs/reference/api/pandas.read_json.html#pandas.read_json
Also see:
Accessing Elements of a Series
This is known as indexing or subset selection.
Do not attempt to access an element using the indexing operator []
and a integral index. It may work, but the usage has been deprecated, use iloc
instead.
iloc[]
s.iloc[0]
Accessing the Index Value for an Element
s.index[0]
Operations on Series
Filtering
Transformation
This class of operations are referred to as transformations or conversions.
apply()
Each element of the series can be transformed by applying the function specified as argument to apply()
.
The function can a named function or a lambda.
For example, if the elements of the series are dollar values in the format "$1,234", to convert them to integers, use:
s = ...
def convert_dollar_str_to_int(s: str):
return int(s[1:].replace(',',''))
s = s.apply(convert_dollar_str_to_int)
Note that apply()
will not convert the elements in-place, it will create a new series instead.
TODO lambda.
Binary Operations
TO PROCESS: https://www.geeksforgeeks.org/python-pandas-series/