Pandas read csv Custom Date Format: Difference between revisions
Jump to navigation
Jump to search
(Created page with "=Internal= * Time Series Processing with Pandas =<tt>date_format</tt> Parameter= =<tt>date_parser</tt> Parameter=...") |
|||
(13 intermediate revisions by the same user not shown) | |||
Line 3: | Line 3: | ||
* [[Time_Series_Processing_with_Pandas#Custom_Date_Format|Time Series Processing with Pandas]] | * [[Time_Series_Processing_with_Pandas#Custom_Date_Format|Time Series Processing with Pandas]] | ||
= | =Overview= | ||
A CSV column can be parsed as date with: | |||
<syntaxhighlight lang='py'> | |||
df = pd.read_csv("./timeseries.csv", parse_dates=["date"]) | |||
</syntaxhighlight> | |||
This assumes a <code>YYYY-MM-DD</code> "2023-12-31" format. If the string format is different there are several options: | |||
=<tt>date_format</tt> Parameter= | |||
<syntaxhighlight lang='py'> | <syntaxhighlight lang='py'> | ||
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_format='%m/%Y/%d') | df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_format='%m/%Y/%d') | ||
</syntaxhighlight> | </syntaxhighlight> | ||
The | More details on format: {{Internal|Time,_Date,_Timestamp_in_Python#Format|<tt>datetime</tt> Format}} | ||
<font color=darkkhaki>The problem with that is that I don't get a series of datetimes, but a series of objects. Why?</font> | |||
=<tt>date_parser</tt> Parameter= | |||
For more complicated formats, the parsing function can be provided as a named function or a lambda, and that function can be passed to <code>read_csv</code> with the <code> date_parser</code> parameter. Do not use it, it will be deprecated for [https://github.com/pandas-dev/pandas/issues/50601 performance reasons]. | |||
<syntaxhighlight lang='py'> | <syntaxhighlight lang='py'> | ||
def parse_timestamp(s: str): | def parse_timestamp(s: str): | ||
from datetime import datetime | |||
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], | return datetime.strptime(s, "%m/%d/%Y") | ||
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_parser=parse_timestamp) | |||
</syntaxhighlight> | </syntaxhighlight> | ||
For more details on timestamp parsing see: {{Internal|Time,_Date,_Timestamp_in_Python#Time.2C_Date_and_Timestamp_Parsing|Time, Date, Timestamp in Python}} | For more details on timestamp parsing see: {{Internal|Time,_Date,_Timestamp_in_Python#Time.2C_Date_and_Timestamp_Parsing|Time, Date, Timestamp in Python}} |
Latest revision as of 02:19, 9 October 2023
Internal
Overview
A CSV column can be parsed as date with:
df = pd.read_csv("./timeseries.csv", parse_dates=["date"])
This assumes a YYYY-MM-DD
"2023-12-31" format. If the string format is different there are several options:
date_format Parameter
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_format='%m/%Y/%d')
More details on format:
The problem with that is that I don't get a series of datetimes, but a series of objects. Why?
date_parser Parameter
For more complicated formats, the parsing function can be provided as a named function or a lambda, and that function can be passed to read_csv
with the date_parser
parameter. Do not use it, it will be deprecated for performance reasons.
def parse_timestamp(s: str):
from datetime import datetime
return datetime.strptime(s, "%m/%d/%Y")
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_parser=parse_timestamp)
For more details on timestamp parsing see: