ENH: Add HalfYear offsets #60946

Merged on Mar 3, 2025 (5 commits)
Changes from 4 commits
1 change: 1 addition & 0 deletions doc/source/whatsnew/v3.0.0.rst
@@ -72,6 +72,7 @@ Other enhancements
- :meth:`pandas.concat` will raise a ``ValueError`` when ``ignore_index=True`` and ``keys`` is not ``None`` (:issue:`59274`)
- :py:class:`frozenset` elements in pandas objects are now natively printed (:issue:`60690`)
- Add ``"delete_rows"`` option to ``if_exists`` argument in :meth:`DataFrame.to_sql` deleting all records of the table before inserting data (:issue:`37210`).
- Added half-year offset classes :class:`HalfYearBegin`, :class:`HalfYearEnd`, :class:`BHalfYearBegin` and :class:`BHalfYearEnd` (:issue:`60928`)
- Errors occurring during SQL I/O will now throw a generic :class:`.DatabaseError` instead of the raw Exception type from the underlying driver manager library (:issue:`60748`)
- Implemented :meth:`Series.str.isascii` and :meth:`Series.str.isascii` (:issue:`59091`)
- Multiplying two :class:`DateOffset` objects will now raise a ``TypeError`` instead of a ``RecursionError`` (:issue:`59442`)
10 changes: 10 additions & 0 deletions pandas/_libs/tslibs/offsets.pyi
@@ -168,6 +168,16 @@ class BQuarterEnd(QuarterOffset): ...
class BQuarterBegin(QuarterOffset): ...
class QuarterEnd(QuarterOffset): ...
class QuarterBegin(QuarterOffset): ...

class HalfYearOffset(SingleConstructorOffset):
    def __init__(
        self, n: int = ..., normalize: bool = ..., startingMonth: int | None = ...
    ) -> None: ...

class BHalfYearEnd(HalfYearOffset): ...
class BHalfYearBegin(HalfYearOffset): ...
class HalfYearEnd(HalfYearOffset): ...
class HalfYearBegin(HalfYearOffset): ...
class MonthOffset(SingleConstructorOffset): ...
class MonthEnd(MonthOffset): ...
class MonthBegin(MonthOffset): ...
226 changes: 226 additions & 0 deletions pandas/_libs/tslibs/offsets.pyx
@@ -3011,6 +3011,228 @@ cdef class QuarterBegin(QuarterOffset):
    _day_opt = "start"


# ----------------------------------------------------------------------
# HalfYear-Based Offset Classes

cdef class HalfYearOffset(SingleConstructorOffset):
    _attributes = tuple(["n", "normalize", "startingMonth"])
    # TODO: Consider combining HalfYearOffset, QuarterOffset and YearOffset

    # FIXME(cython#4446): python annotation here gives compile-time errors
    # _default_starting_month: int
    # _from_name_starting_month: int
Member

Just curious if you've tried seeing if this is resolved.

cython/cython#4446 (comment)

Member Author

Unfortunately, this is not resolved yet. Uncommenting those lines still produces compile-time errors:

  pandas/_libs/tslibs/offsets.cpython-310-darwin.so.p/pandas/_libs/tslibs/offsets.pyx.c:98384:21: error: use of undeclared identifier '__pyx_base'
   98384 |   __Pyx_XDECREF_SET(__pyx_base._default_starting_month, ((PyObject*)__pyx_t_2));
         |                     ^
  pandas/_libs/tslibs/offsets.cpython-310-darwin.so.p/pandas/_libs/tslibs/offsets.pyx.c:98384:21: error: use of undeclared identifier '__pyx_base'; did you mean '__pyx_k_base'?
   98384 |   __Pyx_XDECREF_SET(__pyx_base._default_starting_month, ((PyObject*)__pyx_t_2));
         |                     ^~~~~~~~~~
         |                     __pyx_k_base

Member

I think the fix would be to annotate it as:

_default_starting_month: typing.ClassVar[int]

I'm wondering whether that works with Cython 3.0.

Member Author

Ah yes, that seems to have fixed it. Thanks!
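
The annotation suggested in this thread can be illustrated in plain Python. This is only a minimal sketch, not the Cython code from the PR: the class names `OffsetSketch` and `EndSketch` are hypothetical, and the example shows nothing about Cython's compile-time behaviour. It only shows that `typing.ClassVar[int]` declares a class-level attribute that subclasses override, while instances read it as a fallback.

```python
from typing import ClassVar, Optional, get_type_hints


class OffsetSketch:
    # Hypothetical stand-in for HalfYearOffset (plain Python, not Cython).
    # ClassVar marks these as class-level attributes, not per-instance fields.
    _default_starting_month: ClassVar[int]
    _from_name_starting_month: ClassVar[int]

    def __init__(self, starting_month: Optional[int] = None) -> None:
        # Fall back to the subclass-provided default, as __init__ in the diff does.
        if starting_month is None:
            starting_month = self._default_starting_month
        self.starting_month = starting_month


class EndSketch(OffsetSketch):
    # Values copied from HalfYearEnd in the diff.
    _default_starting_month = 6
    _from_name_starting_month = 12


# The annotations are visible on the base class.
hints = get_type_hints(OffsetSketch)
```

Here `EndSketch().starting_month` falls back to the class-level default, while `EndSketch(2).starting_month` keeps the explicit value.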


    cdef readonly:
        int startingMonth

    def __init__(self, n=1, normalize=False, startingMonth=None):
        BaseOffset.__init__(self, n, normalize)

        if startingMonth is None:
            startingMonth = self._default_starting_month
        self.startingMonth = startingMonth

    cpdef __setstate__(self, state):
        self.startingMonth = state.pop("startingMonth")
        self.n = state.pop("n")
        self.normalize = state.pop("normalize")

    @classmethod
    def _from_name(cls, suffix=None):
        kwargs = {}
        if suffix:
            kwargs["startingMonth"] = MONTH_TO_CAL_NUM[suffix]
        else:
            if cls._from_name_starting_month is not None:
                kwargs["startingMonth"] = cls._from_name_starting_month
        return cls(**kwargs)

    @property
    def rule_code(self) -> str:
        month = MONTH_ALIASES[self.startingMonth]
        return f"{self._prefix}-{month}"

    def is_on_offset(self, dt: datetime) -> bool:
        if self.normalize and not _is_normalized(dt):
            return False
        mod_month = (dt.month - self.startingMonth) % 6
        return mod_month == 0 and dt.day == self._get_offset_day(dt)

    @apply_wraps
    def _apply(self, other: datetime) -> datetime:
        # months_since: find the calendar half containing other.month,
        # e.g. if other.month == 8, the calendar half is [Jul, Aug, Sep, ..., Dec].
        # Then find the month in that half containing an is_on_offset date for
        # self. `months_since` is the number of months to shift other.month
        # to get to this on-offset month.
        months_since = other.month % 6 - self.startingMonth % 6
        hlvs = roll_qtrday(
            other, self.n, self.startingMonth, day_opt=self._day_opt, modby=6
        )
        months = hlvs * 6 - months_since
        return shift_month(other, months, self._day_opt)

    def _apply_array(self, dtarr: np.ndarray) -> np.ndarray:
        reso = get_unit_from_dtype(dtarr.dtype)
        shifted = shift_quarters(
            dtarr.view("i8"),
            self.n,
            self.startingMonth,
            self._day_opt,
            modby=6,
            reso=reso,
        )
        return shifted
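
The arithmetic in `is_on_offset` and `_apply` can be traced with a pure-Python sketch. It is not the pandas implementation: it covers only the `"end"` day option, ignores `normalize`, and the helpers `roll_halves` and `shift_to_month_end` are hypothetical stand-ins whose behaviour is an assumption about what `roll_qtrday` (with `modby=6`) and `shift_month` do.

```python
import calendar
from datetime import datetime


def month_end_day(year: int, month: int) -> int:
    # Last calendar day of the given month.
    return calendar.monthrange(year, month)[1]


def is_on_offset_end(dt: datetime, starting_month: int = 6) -> bool:
    # Same modular check as is_on_offset in the diff, for the "end" day option.
    mod_month = (dt.month - starting_month) % 6
    return mod_month == 0 and dt.day == month_end_day(dt.year, dt.month)


def roll_halves(dt: datetime, n: int, starting_month: int) -> int:
    # Assumption: mirrors roll_qtrday(..., modby=6). Adjust n by one when
    # dt is not yet at (n > 0) or already past (n <= 0) the anchor date.
    months_since = dt.month % 6 - starting_month % 6
    target_day = month_end_day(dt.year, dt.month)
    if n > 0:
        if months_since < 0 or (months_since == 0 and dt.day < target_day):
            n -= 1
    else:
        if months_since > 0 or (months_since == 0 and dt.day > target_day):
            n += 1
    return n


def shift_to_month_end(dt: datetime, months: int) -> datetime:
    # Assumption: stands in for shift_month(dt, months, "end").
    total = dt.year * 12 + (dt.month - 1) + months
    year, month0 = divmod(total, 12)
    month = month0 + 1
    return dt.replace(year=year, month=month, day=month_end_day(year, month))


def apply_half_year_end(dt: datetime, n: int = 1, starting_month: int = 6) -> datetime:
    # Same three steps as _apply in the diff.
    months_since = dt.month % 6 - starting_month % 6
    halves = roll_halves(dt, n, starting_month)
    return shift_to_month_end(dt, halves * 6 - months_since)


print(apply_half_year_end(datetime(2022, 1, 1)))  # 2022-06-30 00:00:00
```

For `datetime(2022, 1, 1)` with the default `starting_month=6`, `months_since` is 1, no roll adjustment applies, and the shift is `1 * 6 - 1 = 5` months, which agrees with the `HalfYearEnd` docstring example in this diff.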


cdef class BHalfYearEnd(HalfYearOffset):
    """
    DateOffset increments between the last business day of each half-year.

    startingMonth = 1 corresponds to dates like 1/31/2007, 7/31/2007, ...
    startingMonth = 2 corresponds to dates like 2/28/2007, 8/31/2007, ...
    startingMonth = 6 corresponds to dates like 6/30/2007, 12/31/2007, ...

    Attributes
    ----------
    n : int, default 1
        The number of half-years represented.
    normalize : bool, default False
        Normalize start/end dates to midnight before generating date range.
    startingMonth : int, default 6
        A specific integer for the month of the year from which we start half-years.

    See Also
    --------
    :class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.

    Examples
    --------
    >>> from pandas.tseries.offsets import BHalfYearEnd
    >>> ts = pd.Timestamp('2020-05-24 05:01:15')
    >>> ts + BHalfYearEnd()
    Timestamp('2020-06-30 05:01:15')
    >>> ts + BHalfYearEnd(2)
    Timestamp('2020-12-31 05:01:15')
    >>> ts + BHalfYearEnd(1, startingMonth=2)
    Timestamp('2020-08-31 05:01:15')
    >>> ts + BHalfYearEnd(startingMonth=2)
    Timestamp('2020-08-31 05:01:15')
    """
    _output_name = "BusinessHalfYearEnd"
    _default_starting_month = 6
    _from_name_starting_month = 12
    _prefix = "BHYE"
    _day_opt = "business_end"


cdef class BHalfYearBegin(HalfYearOffset):
    """
    DateOffset increments between the first business day of each half-year.

    startingMonth = 1 corresponds to dates like 1/01/2007, 7/01/2007, ...
    startingMonth = 2 corresponds to dates like 2/01/2007, 8/01/2007, ...
    startingMonth = 3 corresponds to dates like 3/01/2007, 9/01/2007, ...

    Attributes
    ----------
    n : int, default 1
        The number of half-years represented.
    normalize : bool, default False
        Normalize start/end dates to midnight before generating date range.
    startingMonth : int, default 1
        A specific integer for the month of the year from which we start half-years.

    See Also
    --------
    :class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.

    Examples
    --------
    >>> from pandas.tseries.offsets import BHalfYearBegin
    >>> ts = pd.Timestamp('2020-05-24 05:01:15')
    >>> ts + BHalfYearBegin()
    Timestamp('2020-07-01 05:01:15')
    >>> ts + BHalfYearBegin(2)
    Timestamp('2021-01-01 05:01:15')
    >>> ts + BHalfYearBegin(startingMonth=2)
    Timestamp('2020-08-03 05:01:15')
    >>> ts + BHalfYearBegin(-1)
    Timestamp('2020-01-01 05:01:15')
    """
    _output_name = "BusinessHalfYearBegin"
    _default_starting_month = 1
    _from_name_starting_month = 1
    _prefix = "BHYS"
    _day_opt = "business_start"


cdef class HalfYearEnd(HalfYearOffset):
    """
    DateOffset increments between half-year end dates.

    startingMonth = 1 corresponds to dates like 1/31/2007, 7/31/2007, ...
    startingMonth = 2 corresponds to dates like 2/28/2007, 8/31/2007, ...
    startingMonth = 6 corresponds to dates like 6/30/2007, 12/31/2007, ...

    Attributes
    ----------
    n : int, default 1
        The number of half-years represented.
    normalize : bool, default False
        Normalize start/end dates to midnight before generating date range.
    startingMonth : int, default 6
        A specific integer for the month of the year from which we start half-years.

    See Also
    --------
    :class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.

    Examples
    --------
    >>> ts = pd.Timestamp(2022, 1, 1)
    >>> ts + pd.offsets.HalfYearEnd()
    Timestamp('2022-06-30 00:00:00')
    """
    _default_starting_month = 6
    _from_name_starting_month = 12
    _prefix = "HYE"
    _day_opt = "end"


cdef class HalfYearBegin(HalfYearOffset):
    """
    DateOffset increments between half-year start dates.

    startingMonth = 1 corresponds to dates like 1/01/2007, 7/01/2007, ...
    startingMonth = 2 corresponds to dates like 2/01/2007, 8/01/2007, ...
    startingMonth = 3 corresponds to dates like 3/01/2007, 9/01/2007, ...

    Attributes
    ----------
    n : int, default 1
        The number of half-years represented.
    normalize : bool, default False
        Normalize start/end dates to midnight before generating date range.
    startingMonth : int, default 1
        A specific integer for the month of the year from which we start half-years.

    See Also
    --------
    :class:`~pandas.tseries.offsets.DateOffset` : Standard kind of date increment.

    Examples
    --------
    >>> ts = pd.Timestamp(2022, 2, 1)
    >>> ts + pd.offsets.HalfYearBegin()
    Timestamp('2022-07-01 00:00:00')
    """
    _default_starting_month = 1
    _from_name_starting_month = 1
    _prefix = "HYS"
    _day_opt = "start"
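
The `startingMonth` tables in the four docstrings follow from the `% 6` check in `is_on_offset`: each value selects two anchor months six months apart. The sketch below makes that mapping explicit using only the standard library. The function names are hypothetical, and it covers the calendar variants only (`HalfYearBegin`, `HalfYearEnd`), without the business-day adjustment of the `B*` classes.

```python
import calendar
from datetime import date
from typing import List, Tuple


def half_year_months(starting_month: int) -> Tuple[int, int]:
    # The two months m in 1..12 with (m - starting_month) % 6 == 0.
    first = (starting_month - 1) % 6 + 1
    return (first, first + 6)


def half_year_ends(year: int, starting_month: int = 6) -> List[date]:
    # Last calendar day of each anchor month (HalfYearEnd-style dates).
    return [
        date(year, month, calendar.monthrange(year, month)[1])
        for month in half_year_months(starting_month)
    ]


def half_year_begins(year: int, starting_month: int = 1) -> List[date]:
    # First calendar day of each anchor month (HalfYearBegin-style dates).
    return [date(year, month, 1) for month in half_year_months(starting_month)]


print(half_year_ends(2007, starting_month=2))
# [datetime.date(2007, 2, 28), datetime.date(2007, 8, 31)]
```

Because only the value modulo 6 matters, `half_year_months(12)` and `half_year_months(6)` return the same pair, which is consistent with `HalfYearEnd` using 6 as its constructor default and 12 as `_from_name_starting_month`.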


# ----------------------------------------------------------------------
# Month-Based Offset Classes

@@ -4823,6 +5045,8 @@ prefix_mapping = {
BusinessMonthEnd, # 'BME'
BQuarterEnd, # 'BQE'
BQuarterBegin, # 'BQS'
BHalfYearEnd, # 'BHYE'
BHalfYearBegin, # 'BHYS'
BusinessHour, # 'bh'
CustomBusinessDay, # 'C'
CustomBusinessMonthEnd, # 'CBME'
@@ -4839,6 +5063,8 @@ prefix_mapping = {
Micro, # 'us'
QuarterEnd, # 'QE'
QuarterBegin, # 'QS'
HalfYearEnd, # 'HYE'
HalfYearBegin, # 'HYS'
Milli, # 'ms'
Hour, # 'h'
Day, # 'D'
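
How the new prefixes relate to `rule_code` and `_from_name` can be sketched as a round trip between an alias string and a `(prefix, startingMonth)` pair. This is an illustration under assumptions, not the pandas parser: `PREFIX_DEFAULTS` and `parse_rule_code` are hypothetical names, the two month tables are rebuilt locally to stand in for `MONTH_ALIASES` and `MONTH_TO_CAL_NUM`, and the default months are the `_from_name_starting_month` values from the diff.

```python
from typing import Tuple

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]
# Local stand-ins for the month tables used in the diff.
MONTH_ALIASES = {number: name for number, name in enumerate(MONTHS, start=1)}
MONTH_TO_CAL_NUM = {name: number for number, name in MONTH_ALIASES.items()}

# Hypothetical table: prefix -> _from_name_starting_month (values from the diff).
PREFIX_DEFAULTS = {"HYE": 12, "HYS": 1, "BHYE": 12, "BHYS": 1}


def rule_code(prefix: str, starting_month: int) -> str:
    # Same format as HalfYearOffset.rule_code: "<prefix>-<MONTH>".
    return f"{prefix}-{MONTH_ALIASES[starting_month]}"


def parse_rule_code(code: str) -> Tuple[str, int]:
    # Split "HYE-JUN" into ("HYE", 6); a bare prefix uses the from-name default.
    prefix, _, suffix = code.partition("-")
    if prefix not in PREFIX_DEFAULTS:
        raise ValueError(f"unknown half-year prefix: {prefix!r}")
    if suffix:
        return prefix, MONTH_TO_CAL_NUM[suffix]
    return prefix, PREFIX_DEFAULTS[prefix]


print(rule_code(*parse_rule_code("HYE")))  # HYE-DEC
```

A bare `"HYE"` resolves to December here because the sketch applies the from-name default (12), whereas constructing `HalfYearEnd()` directly uses `_default_starting_month = 6`. Both anchor the same dates, since only the month modulo 6 matters.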