-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
to_datetime gives todays date when coercing errors of time values #23447
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
I am guessing this is hitting our datutil parser as it will interpret these as times corresponding to the current date:
In general we try to stay consistent with dateutil, but I can see how this is a surprising (undocumented) behavior. Could use some discussion on whether we should just document this case (which I am in more favor of) or should coerce these to |
Thanks for the update. It's strange that it is only happening for certain rows. I've created a workaround on my end using an if statement to prevent the to.datetime from affecting rows without a date. |
Well the rows that have "time looking" values are getting parsed like this. As you can see, rows where the first value is < 23 (i.e an hour looking value) are getting parsed and the rest to NaT |
This now throws a warning making it clear that each row will be parsed by In [29]: import pandas as pd
...:
...: d = {'start_date': ['37:04.0', '55:20.0', '44:05.3', '43:41.7', '40:35.0', '26:14.4', '57:00.7', '18:53.8', '12:23.2', '48:33.2', '36:38.1', '29:35.7', '56:02.4', '27:41.4',
...: '11:49.9', '17:36.6', '50:44.2', '36:36.1', '21:14.7']}
...: data = pd.DataFrame(data=d)
...:
...: data['customer_created_at'] = pd.to_datetime(data['start_date'], errors='coerce')
...:
...: print (data['customer_created_at'])
<ipython-input-29-c6a01a69a58d>:6: UserWarning: Could not infer format, so each element will be parsed individually by `dateutil`. To ensure parsing is consistent and as-expected, please specify a format.
data['customer_created_at'] = pd.to_datetime(data['start_date'], errors='coerce')
0 NaT
1 NaT
2 NaT
3 NaT
4 NaT
5 NaT
6 NaT
7 2022-12-17 18:53:48
8 2022-12-17 12:23:12
9 NaT
10 NaT
11 NaT
12 NaT
13 NaT
14 2022-12-17 11:49:54
15 2022-12-17 17:36:36
16 NaT
17 NaT
18 2022-12-17 21:14:42
Name: customer_created_at, dtype: datetime64[ns] I think we can close then, but thanks for the report! |
Code Sample, a copy-pastable example if possible
Problem description
When coercing the list of time values I expect all of them to return NaT. However, a few of these come back with today's date. I tried yesterday and the returned value was:
Today I get:
Expected Output
All time values should return NaT when coerced
Output of
pd.show_versions()
pandas: 0.23.4
pytest: 3.5.1
pip: 18.0
setuptools: 39.1.0
Cython: 0.28.2
numpy: 1.14.3
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 6.4.0
sphinx: 1.7.4
patsy: 0.5.0
dateutil: 2.7.3
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.3
numexpr: 2.6.5
feather: None
matplotlib: 2.2.2
openpyxl: 2.5.3
xlrd: 1.1.0
xlwt: 1.2.0
xlsxwriter: 1.0.4
lxml: 4.2.1
bs4: 4.6.0
html5lib: 1.0.1
sqlalchemy: 1.2.7
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
None
The text was updated successfully, but these errors were encountered: