Commit a457506

PERF: avoid creating numpy array in groupby.first|last
1 parent 085af07 commit a457506

2 files changed: +18 -15 lines changed

doc/source/whatsnew/v1.1.0.rst

Lines changed: 2 additions & 0 deletions
@@ -606,6 +606,8 @@ Performance improvements
   sparse values from ``scipy.sparse`` matrices using the
   :meth:`DataFrame.sparse.from_spmatrix` constructor (:issue:`32821`,
   :issue:`32825`, :issue:`32826`, :issue:`32856`, :issue:`32858`).
+- Performance improvement for groupby methods :meth:`~pandas.core.groupby.groupby.Groupby.first`
+  and :meth:`~pandas.core.groupby.groupby.Groupby.last` (:issue:`xxxxx`)
 - Performance improvement in :func:`factorize` for nullable (integer and boolean) dtypes (:issue:`33064`).
 - Performance improvement in reductions (sum, prod, min, max) for nullable (integer and boolean) dtypes (:issue:`30982`, :issue:`33261`, :issue:`33442`).

pandas/core/groupby/groupby.py

Lines changed: 16 additions & 15 deletions
@@ -1504,32 +1504,33 @@ def func(self, numeric_only=numeric_only, min_count=min_count):
 
             return func
 
-        def first_compat(x, axis=0):
-            def first(x):
-                x = x.to_numpy()
-
-                x = x[notna(x)]
+        def first_compat(obj: FrameOrSeries, axis: int = 0):
+            def first(x: Series):
+                x = x.array[notna(x.array)]
                 if len(x) == 0:
                     return np.nan
                 return x[0]
 
-            if isinstance(x, DataFrame):
-                return x.apply(first, axis=axis)
+            if isinstance(obj, DataFrame):
+                return obj.apply(first, axis=axis)
+            elif isinstance(obj, Series):
+                return first(obj)
             else:
-                return first(x)
+                raise TypeError(type(obj))
 
-        def last_compat(x, axis=0):
-            def last(x):
-                x = x.to_numpy()
-                x = x[notna(x)]
+        def last_compat(obj: FrameOrSeries, axis: int = 0):
+            def last(x: Series):
+                x = x.array[notna(x.array)]
                 if len(x) == 0:
                     return np.nan
                 return x[-1]
 
-            if isinstance(x, DataFrame):
-                return x.apply(last, axis=axis)
+            if isinstance(obj, DataFrame):
+                return obj.apply(last, axis=axis)
+            elif isinstance(obj, Series):
+                return last(obj)
             else:
-                return last(x)
+                raise TypeError(type(obj))
 
         cls.sum = groupby_function("sum", "add", np.sum, min_count=0)
         cls.prod = groupby_function("prod", "prod", np.prod, min_count=0)
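
For readers who want to try the pattern outside of pandas internals, here is a minimal standalone sketch of the new `first_compat` logic. It is not the code from this commit: it uses only the public `pd.notna` and `Series.array` API, and the helper name `first_valid` is hypothetical. The point it illustrates is that the mask and the selection both operate on the Series' backing array, rather than on the result of `to_numpy()`.

```python
import numpy as np
import pandas as pd


def first_valid(x: pd.Series):
    """Return the first non-missing value of a Series, or NaN if there is none.

    Hypothetical helper mirroring the inner ``first`` function in the diff.
    """
    arr = x.array                 # backing array of the Series, no to_numpy() call
    arr = arr[pd.notna(arr)]      # keep only the non-missing entries
    if len(arr) == 0:
        return np.nan
    return arr[0]


def first_compat(obj, axis: int = 0):
    """Dispatch on the input type, raising for anything unexpected."""
    if isinstance(obj, pd.DataFrame):
        return obj.apply(first_valid, axis=axis)
    elif isinstance(obj, pd.Series):
        return first_valid(obj)
    else:
        raise TypeError(type(obj))


if __name__ == "__main__":
    print(first_compat(pd.Series([np.nan, 1.5, 2.0])))
    print(first_compat(pd.Series([pd.NA, 2, 3], dtype="Int64")))
    print(first_compat(pd.DataFrame({"a": [np.nan, 1.0], "b": [2.0, np.nan]})))
```

The explicit `elif`/`else` branches follow the diff: anything that is neither a `DataFrame` nor a `Series` now raises `TypeError` instead of being passed to the inner function.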
