BUG: Fix possible wrong format guess if first element had May as month #58336

Closed
wants to merge 1 commit into from
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -356,6 +356,7 @@ Datetimelike
- Bug in :class:`Timestamp` constructor failing to raise when ``tz=None`` is explicitly specified in conjunction with timezone-aware ``tzinfo`` or data (:issue:`48688`)
- Bug in :func:`date_range` where the last valid timestamp would sometimes not be produced (:issue:`56134`)
- Bug in :func:`date_range` where using a negative frequency value would not include all points between the start and end values (:issue:`56382`)
- Bug in :func:`to_datetime` could guess the date format incorrectly if the first element had May as month (:issue:`58328`)
- Bug in :func:`tseries.api.guess_datetime_format` would fail to infer time format when "%Y" == "%H%M" (:issue:`57452`)

Timedelta
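The ambiguity behind the whatsnew entry can be reproduced with the standard library alone. The sketch below is not part of the PR; it assumes an English (or C) locale, where "May" is both the full and the abbreviated month name, and the helper `parses` is a hypothetical name introduced only for this illustration.

```python
from datetime import datetime

# Assumption: English/C locale, so "May" is spelled the same for %B and %b.
sample = "1 May 2023"
full = datetime.strptime(sample, "%d %B %Y")  # full month name directive
abbr = datetime.strptime(sample, "%d %b %Y")  # abbreviated month name directive
both_parse = full == abbr == datetime(2023, 5, 1)


def parses(text: str, fmt: str) -> bool:
    """Hypothetical helper: report whether text matches fmt under strptime."""
    try:
        datetime.strptime(text, fmt)
    except ValueError:
        return False
    return True


# A format guessed from "1 May 2023" cannot tell %B from %b, but other
# months can: "Apr" only matches the abbreviated directive.
apr_full = parses("1 Apr 2023", "%d %B %Y")
apr_abbr = parses("1 Apr 2023", "%d %b %Y")
```

Because a May value is consistent with either directive, a format guessed from it alone may not fit the rest of the array.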
11 changes: 11 additions & 0 deletions pandas/core/tools/datetimes.py
@@ -128,6 +128,17 @@ def _guess_datetime_format_for_array(arr, dayfirst: bool | None = False) -> str
    if (first_non_null := tslib.first_non_null(arr)) != -1:
        if type(first_non_nan_element := arr[first_non_null]) is str:  # noqa: E721
            # GH#32264 np.str_ object
            if "may" in first_non_nan_element.lower():
                # GH 58328
                first_non_nan_element = next(
                    (
                        x
                        for x in arr[first_non_null + 1 :]
                        if isinstance(x, str) and "may" not in x.lower()
                    ),
                    first_non_nan_element,
                )

            guessed_format = guess_datetime_format(
                first_non_nan_element, dayfirst=dayfirst
            )
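The selection logic in the hunk above can be sketched in isolation as follows. This is a simplified stand-in, not the pandas implementation: `pick_format_sample` is a hypothetical name, it works on a plain list, and it looks for the first string rather than calling `tslib.first_non_null`.

```python
def pick_format_sample(values):
    """Hypothetical helper: choose the string a format should be guessed from.

    Prefers a later string that does not contain "may"; falls back to the
    first string when every candidate contains it.
    """
    # Simplification: locate the first str instead of the first non-null value.
    first_idx = next(
        (i for i, v in enumerate(values) if isinstance(v, str)), -1
    )
    if first_idx == -1:
        return None

    first = values[first_idx]
    if "may" in first.lower():
        # next() with a default keeps the original element if nothing better exists.
        first = next(
            (
                v
                for v in values[first_idx + 1 :]
                if isinstance(v, str) and "may" not in v.lower()
            ),
            first,
        )
    return first


mixed = pick_format_sample(["1 May 2023", "1 Apr 2023", "1 Sep 2023"])
all_may = pick_format_sample(["1 May 2023", "2 May 2023", "3 May 2023"])
```

Here `mixed` is taken from the April entry, while `all_may` stays on the first element, which corresponds to the third parametrized test case keeping the format guessed from a May string.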
14 changes: 14 additions & 0 deletions pandas/tests/tools/test_to_datetime.py
@@ -2649,6 +2649,20 @@ def test_guess_datetime_format_for_array_all_nans(self):
        )
        assert format_for_string_of_nans is None

    @pytest.mark.parametrize(
        "test_list, expected_format",
        [
            (["1 May 2023", "1 Apr 2023", "1 Sep 2023"], "%d %b %Y"),
            (["1 May 2023", "1 April 2023", "1 September 2023"], "%d %B %Y"),
            (["1 May 2023", "2 May 2023", "3 May 2023"], "%d %B %Y"),
        ],
    )
    def test_guess_datetime_format_for_array_first_element_month_may(
        self, test_list, expected_format
    ):  # GH 58328
        test_array = np.array(test_list, dtype=object)
        assert tools._guess_datetime_format_for_array(test_array) == expected_format


class TestToDatetimeInferFormat:
    @pytest.mark.parametrize(