Pandas read csv Custom Date Format: Difference between revisions

From NovaOrdis Knowledge Base
Jump to navigation Jump to search
(Created page with "=Internal= * Time Series Processing with Pandas =<tt>date_format</tt> Parameter= =<tt>date_parser</tt> Parameter=...")
 
 
(13 intermediate revisions by the same user not shown)
Line 3: Line 3:
* [[Time_Series_Processing_with_Pandas#Custom_Date_Format|Time Series Processing with Pandas]]
* [[Time_Series_Processing_with_Pandas#Custom_Date_Format|Time Series Processing with Pandas]]


=<tt>date_format</tt> Parameter=
=Overview=
=<tt>date_parser</tt> Parameter=


A CSV column can be parsed as date with:
<syntaxhighlight lang='py'>
df = pd.read_csv("./timeseries.csv", parse_dates=["date"])
</syntaxhighlight>


==Custom Date ==
This assumes a <code>YYYY-MM-DD</code> "2023-12-31" format. If the string format is different there are several options:


This syntax assumes that the "date" column is encoded in the default Panda date format ('YYYY-MM-DD'). If that is not the case, the format can be specified with the <code>date_format</code> parameters, as shown below:
=<tt>date_format</tt> Parameter=
<syntaxhighlight lang='py'>
<syntaxhighlight lang='py'>
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_format='%m/%Y/%d')
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_format='%m/%Y/%d')
</syntaxhighlight>
</syntaxhighlight>
The common timestamp elements are '%Y-%m-%d %H:%M:%S'. <font color=darkkhaki>For more details on date format, see ?</font>
More details on format: {{Internal|Time,_Date,_Timestamp_in_Python#Format|<tt>datetime</tt> Format}}
 
<font color=darkkhaki>The problem with that is that I don't get a series of datetimes, but a series of objects. Why?</font>
 
=<tt>date_parser</tt> Parameter=
 
For more complicated formats, the parsing function can be provided as a named function or a lambda, and that function can be passed to <code>read_csv</code> with the <code> date_parser</code> parameter. Do not use it, it will be deprecated for [https://github.com/pandas-dev/pandas/issues/50601 performance reasons].


For more complicated formats, the parsing function can be provided as a named function or a lambda:
<syntaxhighlight lang='py'>
<syntaxhighlight lang='py'>
def parse_timestamp(s: str):
def parse_timestamp(s: str):  
  ???
    from datetime import datetime
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_format='%m/%Y/%d')
    return datetime.strptime(s, "%m/%d/%Y")
 
df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_parser=parse_timestamp)
</syntaxhighlight>
</syntaxhighlight>
For more details on timestamp parsing see: {{Internal|Time,_Date,_Timestamp_in_Python#Time.2C_Date_and_Timestamp_Parsing|Time, Date, Timestamp in Python}}
For more details on timestamp parsing see: {{Internal|Time,_Date,_Timestamp_in_Python#Time.2C_Date_and_Timestamp_Parsing|Time, Date, Timestamp in Python}}

Latest revision as of 02:19, 9 October 2023

Internal

Overview

A CSV column can be parsed as date with:

df = pd.read_csv("./timeseries.csv", parse_dates=["date"])

This assumes a YYYY-MM-DD "2023-12-31" format. If the string format is different there are several options:

date_format Parameter

df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_format='%m/%Y/%d')

More details on format:

datetime Format

The problem with that is that I don't get a series of datetimes, but a series of objects. Why?

date_parser Parameter

For more complicated formats, the parsing function can be provided as a named function or a lambda, and that function can be passed to read_csv with the date_parser parameter. Do not use it, it will be deprecated for performance reasons.

def parse_timestamp(s: str): 
    from datetime import datetime
    return datetime.strptime(s, "%m/%d/%Y")

df = pd.read_csv("./timeseries.csv", parse_dates=["date"], date_parser=parse_timestamp)

For more details on timestamp parsing see:

Time, Date, Timestamp in Python