Commit 46c615d

DEPR: inplace keyword for Categorical.set_ordered, setting .categories directly (#47834)
* DEPR: inplace keyword for Categorical.set_ordered, setting .categories directly
* update docs
* typo fixup
* suppress warning

Co-authored-by: Jeff Reback <[email protected]>
1 parent 74e08d7 commit 46c615d

12 files changed

+108
-55
lines changed

doc/source/user_guide/10min.rst

Lines changed: 3 additions & 3 deletions
@@ -680,12 +680,12 @@ Converting the raw grades to a categorical data type:
680680
df["grade"] = df["raw_grade"].astype("category")
681681
df["grade"]
682682
683-
Rename the categories to more meaningful names (assigning to
684-
:meth:`Series.cat.categories` is in place!):
683+
Rename the categories to more meaningful names:
685684

686685
.. ipython:: python
687686
688-
df["grade"].cat.categories = ["very good", "good", "very bad"]
687+
new_categories = ["very good", "good", "very bad"]
688+
df["grade"] = df["grade"].cat.rename_categories(new_categories)
689689
690690
Reorder the categories and simultaneously add the missing categories (methods under :meth:`Series.cat` return a new :class:`Series` by default):
691691
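
The documentation change above swaps direct assignment to `.cat.categories` for `rename_categories`. A minimal sketch of that migration follows; the sample data is an assumption chosen to mirror the user guide, not something taken from this commit.

```python
import pandas as pd

# Assumed sample data, modelled on the "10 minutes to pandas" guide.
df = pd.DataFrame({"raw_grade": ["a", "b", "b", "a", "a", "e"]})
df["grade"] = df["raw_grade"].astype("category")

# Deprecated by this commit:
#     df["grade"].cat.categories = ["very good", "good", "very bad"]
# Replacement: rename_categories returns a new Series, so assign it back.
new_categories = ["very good", "good", "very bad"]
df["grade"] = df["grade"].cat.rename_categories(new_categories)

# A dict-like renames only the listed categories and leaves the rest alone.
partially_renamed = df["grade"].cat.rename_categories({"very bad": "poor"})
```

Because the result is assigned back to the column, the rename no longer relies on mutating the underlying `Categorical` in place.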

doc/source/user_guide/categorical.rst

Lines changed: 8 additions & 9 deletions
@@ -334,18 +334,16 @@ It's also possible to pass in the categories in a specific order:
334334
Renaming categories
335335
~~~~~~~~~~~~~~~~~~~
336336

337-
Renaming categories is done by assigning new values to the
338-
``Series.cat.categories`` property or by using the
337+
Renaming categories is done by using the
339338
:meth:`~pandas.Categorical.rename_categories` method:
340339

341340

342341
.. ipython:: python
343342
344343
s = pd.Series(["a", "b", "c", "a"], dtype="category")
345344
s
346-
s.cat.categories = ["Group %s" % g for g in s.cat.categories]
347-
s
348-
s = s.cat.rename_categories([1, 2, 3])
345+
new_categories = ["Group %s" % g for g in s.cat.categories]
346+
s = s.cat.rename_categories(new_categories)
349347
s
350348
# You can also pass a dict-like object to map the renaming
351349
s = s.cat.rename_categories({1: "x", 2: "y", 3: "z"})
@@ -365,7 +363,7 @@ Categories must be unique or a ``ValueError`` is raised:
365363
.. ipython:: python
366364
367365
try:
368-
s.cat.categories = [1, 1, 1]
366+
s = s.cat.rename_categories([1, 1, 1])
369367
except ValueError as e:
370368
print("ValueError:", str(e))
371369
@@ -374,7 +372,7 @@ Categories must also not be ``NaN`` or a ``ValueError`` is raised:
374372
.. ipython:: python
375373
376374
try:
377-
s.cat.categories = [1, 2, np.nan]
375+
s = s.cat.rename_categories([1, 2, np.nan])
378376
except ValueError as e:
379377
print("ValueError:", str(e))
380378
@@ -702,7 +700,7 @@ of length "1".
702700
.. ipython:: python
703701
704702
df.iat[0, 0]
705-
df["cats"].cat.categories = ["x", "y", "z"]
703+
df["cats"] = df["cats"].cat.rename_categories(["x", "y", "z"])
706704
df.at["h", "cats"] # returns a string
707705
708706
.. note::
@@ -960,7 +958,7 @@ relevant columns back to ``category`` and assign the right categories and catego
960958
961959
s = pd.Series(pd.Categorical(["a", "b", "b", "a", "a", "d"]))
962960
# rename the categories
963-
s.cat.categories = ["very good", "good", "bad"]
961+
s = s.cat.rename_categories(["very good", "good", "bad"])
964962
# reorder the categories and add missing categories
965963
s = s.cat.set_categories(["very bad", "bad", "medium", "good", "very good"])
966964
df = pd.DataFrame({"cats": s, "vals": [1, 2, 3, 4, 5, 6]})
@@ -1164,6 +1162,7 @@ Constructing a ``Series`` from a ``Categorical`` will not copy the input
11641162
change the original ``Categorical``:
11651163

11661164
.. ipython:: python
1165+
:okwarning:
11671166
11681167
cat = pd.Categorical([1, 2, 3, 10], categories=[1, 2, 3, 4, 10])
11691168
s = pd.Series(cat, name="cat")
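
The updated user guide examples rely on `rename_categories` enforcing the same rules the setter did: new categories must be unique, must not contain `NaN`, and must match the old categories in number. The sketch below exercises those three cases; the helper `try_rename` is a hypothetical name introduced only for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series(["a", "b", "c", "a"], dtype="category")


def try_rename(series, new_categories):
    """Hypothetical helper: return the renamed Series, or the ValueError raised."""
    try:
        return series.cat.rename_categories(new_categories)
    except ValueError as err:
        return err


results = {
    "duplicates": try_rename(s, [1, 1, 1]),         # categories must be unique
    "contains_nan": try_rename(s, [1, 2, np.nan]),  # categories must not be NaN
    "wrong_length": try_rename(s, [1, 2]),          # count must match the old categories
    "valid": try_rename(s, [1, 2, 3]),
}

for name, outcome in results.items():
    print(name, "->", type(outcome).__name__)
```

A failed rename leaves `s` untouched, since `rename_categories` works on a copy unless told otherwise.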

doc/source/user_guide/io.rst

Lines changed: 2 additions & 1 deletion
@@ -558,7 +558,8 @@ This matches the behavior of :meth:`Categorical.set_categories`.
558558
df = pd.read_csv(StringIO(data), dtype="category")
559559
df.dtypes
560560
df["col3"]
561-
df["col3"].cat.categories = pd.to_numeric(df["col3"].cat.categories)
561+
new_categories = pd.to_numeric(df["col3"].cat.categories)
562+
df["col3"] = df["col3"].cat.rename_categories(new_categories)
562563
df["col3"]
563564
564565

doc/source/whatsnew/v0.19.0.rst

Lines changed: 1 addition & 0 deletions
@@ -271,6 +271,7 @@ Individual columns can be parsed as a ``Categorical`` using a dict specification
271271
such as :func:`to_datetime`.
272272

273273
.. ipython:: python
274+
:okwarning:
274275
275276
df = pd.read_csv(StringIO(data), dtype="category")
276277
df.dtypes

doc/source/whatsnew/v1.5.0.rst

Lines changed: 2 additions & 0 deletions
@@ -820,6 +820,8 @@ Other Deprecations
820820
- Deprecated :meth:`Series.rank` returning an empty result when the dtype is non-numeric and ``numeric_only=True`` is provided; this will raise a ``TypeError`` in a future version (:issue:`47500`)
821821
- Deprecated argument ``errors`` for :meth:`Series.mask`, :meth:`Series.where`, :meth:`DataFrame.mask`, and :meth:`DataFrame.where` as ``errors`` had no effect on this methods (:issue:`47728`)
822822
- Deprecated arguments ``*args`` and ``**kwargs`` in :class:`Rolling`, :class:`Expanding`, and :class:`ExponentialMovingWindow` ops. (:issue:`47836`)
823+
- Deprecated the ``inplace`` keyword in :meth:`Categorical.set_ordered`, :meth:`Categorical.as_ordered`, and :meth:`Categorical.as_unordered` (:issue:`37643`)
824+
- Deprecated setting a categorical's categories with ``cat.categories = ['a', 'b', 'c']``, use :meth:`Categorical.rename_categories` instead (:issue:`37643`)
823825
- Deprecated unused arguments ``encoding`` and ``verbose`` in :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` (:issue:`47912`)
824826
- Deprecated producing a single element when iterating over a :class:`DataFrameGroupBy` or a :class:`SeriesGroupBy` that has been grouped by a list of length 1; A tuple of length one will be returned instead (:issue:`42795`)
825827

pandas/core/arrays/categorical.py

Lines changed: 48 additions & 22 deletions
@@ -745,15 +745,14 @@ def categories(self) -> Index:
745745

746746
@categories.setter
747747
def categories(self, categories) -> None:
748-
new_dtype = CategoricalDtype(categories, ordered=self.ordered)
749-
if self.dtype.categories is not None and len(self.dtype.categories) != len(
750-
new_dtype.categories
751-
):
752-
raise ValueError(
753-
"new categories need to have the same number of "
754-
"items as the old categories!"
755-
)
756-
super().__init__(self._ndarray, new_dtype)
748+
warn(
749+
"Setting categories in-place is deprecated and will raise in a "
750+
"future version. Use rename_categories instead.",
751+
FutureWarning,
752+
stacklevel=find_stack_level(),
753+
)
754+
755+
self._set_categories(categories)
757756

758757
@property
759758
def ordered(self) -> Ordered:
@@ -814,7 +813,7 @@ def _set_categories(self, categories, fastpath=False):
814813
):
815814
raise ValueError(
816815
"new categories need to have the same number of "
817-
"items than the old categories!"
816+
"items as the old categories!"
818817
)
819818

820819
super().__init__(self._ndarray, new_dtype)
@@ -836,7 +835,9 @@ def _set_dtype(self, dtype: CategoricalDtype) -> Categorical:
836835
return type(self)(codes, dtype=dtype, fastpath=True)
837836

838837
@overload
839-
def set_ordered(self, value, *, inplace: Literal[False] = ...) -> Categorical:
838+
def set_ordered(
839+
self, value, *, inplace: NoDefault | Literal[False] = ...
840+
) -> Categorical:
840841
...
841842

842843
@overload
@@ -848,7 +849,9 @@ def set_ordered(self, value, *, inplace: bool) -> Categorical | None:
848849
...
849850

850851
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self", "value"])
851-
def set_ordered(self, value, inplace: bool = False) -> Categorical | None:
852+
def set_ordered(
853+
self, value, inplace: bool | NoDefault = no_default
854+
) -> Categorical | None:
852855
"""
853856
Set the ordered attribute to the boolean value.
854857
@@ -859,7 +862,22 @@ def set_ordered(self, value, inplace: bool = False) -> Categorical | None:
859862
inplace : bool, default False
860863
Whether or not to set the ordered attribute in-place or return
861864
a copy of this categorical with ordered set to the value.
865+
866+
.. deprecated:: 1.5.0
867+
862868
"""
869+
if inplace is not no_default:
870+
warn(
871+
"The `inplace` parameter in pandas.Categorical."
872+
"set_ordered is deprecated and will be removed in "
873+
"a future version. setting ordered-ness on categories will always "
874+
"return a new Categorical object.",
875+
FutureWarning,
876+
stacklevel=find_stack_level(),
877+
)
878+
else:
879+
inplace = False
880+
863881
inplace = validate_bool_kwarg(inplace, "inplace")
864882
new_dtype = CategoricalDtype(self.categories, ordered=value)
865883
cat = self if inplace else self.copy()
@@ -869,15 +887,15 @@ def set_ordered(self, value, inplace: bool = False) -> Categorical | None:
869887
return None
870888

871889
@overload
872-
def as_ordered(self, *, inplace: Literal[False] = ...) -> Categorical:
890+
def as_ordered(self, *, inplace: NoDefault | Literal[False] = ...) -> Categorical:
873891
...
874892

875893
@overload
876894
def as_ordered(self, *, inplace: Literal[True]) -> None:
877895
...
878896

879897
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
880-
def as_ordered(self, inplace: bool = False) -> Categorical | None:
898+
def as_ordered(self, inplace: bool | NoDefault = no_default) -> Categorical | None:
881899
"""
882900
Set the Categorical to be ordered.
883901
@@ -887,24 +905,29 @@ def as_ordered(self, inplace: bool = False) -> Categorical | None:
887905
Whether or not to set the ordered attribute in-place or return
888906
a copy of this categorical with ordered set to True.
889907
908+
.. deprecated:: 1.5.0
909+
890910
Returns
891911
-------
892912
Categorical or None
893913
Ordered Categorical or None if ``inplace=True``.
894914
"""
895-
inplace = validate_bool_kwarg(inplace, "inplace")
915+
if inplace is not no_default:
916+
inplace = validate_bool_kwarg(inplace, "inplace")
896917
return self.set_ordered(True, inplace=inplace)
897918

898919
@overload
899-
def as_unordered(self, *, inplace: Literal[False] = ...) -> Categorical:
920+
def as_unordered(self, *, inplace: NoDefault | Literal[False] = ...) -> Categorical:
900921
...
901922

902923
@overload
903924
def as_unordered(self, *, inplace: Literal[True]) -> None:
904925
...
905926

906927
@deprecate_nonkeyword_arguments(version=None, allowed_args=["self"])
907-
def as_unordered(self, inplace: bool = False) -> Categorical | None:
928+
def as_unordered(
929+
self, inplace: bool | NoDefault = no_default
930+
) -> Categorical | None:
908931
"""
909932
Set the Categorical to be unordered.
910933
@@ -914,12 +937,15 @@ def as_unordered(self, inplace: bool = False) -> Categorical | None:
914937
Whether or not to set the ordered attribute in-place or return
915938
a copy of this categorical with ordered set to False.
916939
940+
.. deprecated:: 1.5.0
941+
917942
Returns
918943
-------
919944
Categorical or None
920945
Unordered Categorical or None if ``inplace=True``.
921946
"""
922-
inplace = validate_bool_kwarg(inplace, "inplace")
947+
if inplace is not no_default:
948+
inplace = validate_bool_kwarg(inplace, "inplace")
923949
return self.set_ordered(False, inplace=inplace)
924950

925951
def set_categories(
@@ -1108,11 +1134,11 @@ def rename_categories(
11081134
cat = self if inplace else self.copy()
11091135

11101136
if is_dict_like(new_categories):
1111-
cat.categories = [new_categories.get(item, item) for item in cat.categories]
1137+
new_categories = [new_categories.get(item, item) for item in cat.categories]
11121138
elif callable(new_categories):
1113-
cat.categories = [new_categories(item) for item in cat.categories]
1114-
else:
1115-
cat.categories = new_categories
1139+
new_categories = [new_categories(item) for item in cat.categories]
1140+
1141+
cat._set_categories(new_categories)
11161142
if not inplace:
11171143
return cat
11181144
return None
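
The `set_ordered` change above uses a sentinel default so that a warning fires only when the caller actually passes `inplace`, even as `inplace=False`. The sketch below reproduces that pattern with the standard library only. `OrderedFlag`, `_NoDefault`, the shortened warning text, and the fixed `stacklevel=2` are all assumptions for illustration; pandas itself uses its own `no_default` object and `find_stack_level()`.

```python
import warnings
from enum import Enum


class _NoDefault(Enum):
    """Sentinel type: tells 'argument omitted' apart from any real value."""

    no_default = "NO_DEFAULT"


no_default = _NoDefault.no_default


class OrderedFlag:
    """Hypothetical stand-in for Categorical that holds only an `ordered` flag."""

    def __init__(self, ordered=False):
        self.ordered = ordered

    def set_ordered(self, value, inplace=no_default):
        if inplace is not no_default:
            # The keyword was passed explicitly, so warn about the deprecation.
            warnings.warn(
                "The `inplace` parameter is deprecated and will be removed "
                "in a future version.",
                FutureWarning,
                stacklevel=2,
            )
        else:
            # Keyword omitted: fall back to the old default without warning.
            inplace = False

        if not isinstance(inplace, bool):
            raise ValueError('For argument "inplace" expected type bool')

        target = self if inplace else OrderedFlag(self.ordered)
        target.ordered = bool(value)
        return None if inplace else target


flag = OrderedFlag()
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    new_flag = flag.set_ordered(True)               # keyword omitted: no warning
    result = flag.set_ordered(True, inplace=False)  # keyword passed: warns
```

In the diff, `as_ordered` and `as_unordered` forward the sentinel unchanged to `set_ordered`, which keeps the warning in one place and avoids warning callers who never passed `inplace`.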

pandas/io/stata.py

Lines changed: 2 additions & 3 deletions
@@ -1921,9 +1921,8 @@ def _do_convert_categoricals(
19211921
categories = list(vl.values())
19221922
try:
19231923
# Try to catch duplicate categories
1924-
# error: Incompatible types in assignment (expression has
1925-
# type "List[str]", variable has type "Index")
1926-
cat_data.categories = categories # type: ignore[assignment]
1924+
# TODO: if we get a non-copying rename_categories, use that
1925+
cat_data = cat_data.rename_categories(categories)
19271926
except ValueError as err:
19281927
vc = Series(categories).value_counts()
19291928
repeated_cats = list(vc.index[vc > 1])
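
The Stata reader change keeps the existing duplicate-detection logic but reaches it through `rename_categories` rather than the setter. A minimal sketch of that flow, assuming made-up value labels in place of the ones a real `.dta` file would supply:

```python
import pandas as pd

# Assumed stand-ins for the reader's data and value labels.
cat_data = pd.Categorical([1, 2, 1, 3])
categories = ["low", "high", "low"]  # "low" appears twice

repeated_cats = []
try:
    # rename_categories validates uniqueness and raises ValueError on duplicates.
    cat_data = cat_data.rename_categories(categories)
except ValueError:
    # Same approach as the reader: count the labels and report the repeats.
    vc = pd.Series(categories).value_counts()
    repeated_cats = list(vc.index[vc > 1])

print(repeated_cats)
```

As the `TODO` in the diff notes, `rename_categories` copies the data here, which the in-place setter did not.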

pandas/tests/arrays/categorical/test_analytics.py

Lines changed: 12 additions & 3 deletions
@@ -323,13 +323,22 @@ def test_validate_inplace_raises(self, value):
323323
f"received type {type(value).__name__}"
324324
)
325325
with pytest.raises(ValueError, match=msg):
326-
cat.set_ordered(value=True, inplace=value)
326+
with tm.assert_produces_warning(
327+
FutureWarning, match="Use rename_categories"
328+
):
329+
cat.set_ordered(value=True, inplace=value)
327330

328331
with pytest.raises(ValueError, match=msg):
329-
cat.as_ordered(inplace=value)
332+
with tm.assert_produces_warning(
333+
FutureWarning, match="Use rename_categories"
334+
):
335+
cat.as_ordered(inplace=value)
330336

331337
with pytest.raises(ValueError, match=msg):
332-
cat.as_unordered(inplace=value)
338+
with tm.assert_produces_warning(
339+
FutureWarning, match="Use rename_categories"
340+
):
341+
cat.as_unordered(inplace=value)
333342

334343
with pytest.raises(ValueError, match=msg):
335344
with tm.assert_produces_warning(FutureWarning):

pandas/tests/arrays/categorical/test_api.py

Lines changed: 13 additions & 5 deletions
@@ -34,22 +34,30 @@ def test_ordered_api(self):
3434
assert cat4.ordered
3535

3636
def test_set_ordered(self):
37-
37+
msg = (
38+
"The `inplace` parameter in pandas.Categorical.set_ordered is "
39+
"deprecated and will be removed in a future version. setting "
40+
"ordered-ness on categories will always return a new Categorical object"
41+
)
3842
cat = Categorical(["a", "b", "c", "a"], ordered=True)
3943
cat2 = cat.as_unordered()
4044
assert not cat2.ordered
4145
cat2 = cat.as_ordered()
4246
assert cat2.ordered
43-
cat2.as_unordered(inplace=True)
47+
with tm.assert_produces_warning(FutureWarning, match=msg):
48+
cat2.as_unordered(inplace=True)
4449
assert not cat2.ordered
45-
cat2.as_ordered(inplace=True)
50+
with tm.assert_produces_warning(FutureWarning, match=msg):
51+
cat2.as_ordered(inplace=True)
4652
assert cat2.ordered
4753

4854
assert cat2.set_ordered(True).ordered
4955
assert not cat2.set_ordered(False).ordered
50-
cat2.set_ordered(True, inplace=True)
56+
with tm.assert_produces_warning(FutureWarning, match=msg):
57+
cat2.set_ordered(True, inplace=True)
5158
assert cat2.ordered
52-
cat2.set_ordered(False, inplace=True)
59+
with tm.assert_produces_warning(FutureWarning, match=msg):
60+
cat2.set_ordered(False, inplace=True)
5361
assert not cat2.ordered
5462

5563
# removed in 0.19.0

pandas/tests/arrays/categorical/test_indexing.py

Lines changed: 5 additions & 3 deletions
@@ -194,7 +194,8 @@ def test_periodindex(self):
194194
def test_categories_assignments(self):
195195
cat = Categorical(["a", "b", "c", "a"])
196196
exp = np.array([1, 2, 3, 1], dtype=np.int64)
197-
cat.categories = [1, 2, 3]
197+
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
198+
cat.categories = [1, 2, 3]
198199
tm.assert_numpy_array_equal(cat.__array__(), exp)
199200
tm.assert_index_equal(cat.categories, Index([1, 2, 3]))
200201

@@ -216,8 +217,9 @@ def test_categories_assignments_wrong_length_raises(self, new_categories):
216217
"new categories need to have the same number of items "
217218
"as the old categories!"
218219
)
219-
with pytest.raises(ValueError, match=msg):
220-
cat.categories = new_categories
220+
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
221+
with pytest.raises(ValueError, match=msg):
222+
cat.categories = new_categories
221223

222224
# Combinations of sorted/unique:
223225
@pytest.mark.parametrize(

pandas/tests/series/accessors/test_cat_accessor.py

Lines changed: 8 additions & 4 deletions
@@ -110,7 +110,8 @@ def test_categorical_delegations(self):
110110
ser = Series(Categorical(["a", "b", "c", "a"], ordered=True))
111111
exp_categories = Index(["a", "b", "c"])
112112
tm.assert_index_equal(ser.cat.categories, exp_categories)
113-
ser.cat.categories = [1, 2, 3]
113+
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
114+
ser.cat.categories = [1, 2, 3]
114115
exp_categories = Index([1, 2, 3])
115116
tm.assert_index_equal(ser.cat.categories, exp_categories)
116117

@@ -120,7 +121,8 @@ def test_categorical_delegations(self):
120121
assert ser.cat.ordered
121122
ser = ser.cat.as_unordered()
122123
assert not ser.cat.ordered
123-
return_value = ser.cat.as_ordered(inplace=True)
124+
with tm.assert_produces_warning(FutureWarning, match="The `inplace`"):
125+
return_value = ser.cat.as_ordered(inplace=True)
124126
assert return_value is None
125127
assert ser.cat.ordered
126128

@@ -267,8 +269,10 @@ def test_set_categories_setitem(self):
267269
df = DataFrame({"Survived": [1, 0, 1], "Sex": [0, 1, 1]}, dtype="category")
268270

269271
# change the dtype in-place
270-
df["Survived"].cat.categories = ["No", "Yes"]
271-
df["Sex"].cat.categories = ["female", "male"]
272+
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
273+
df["Survived"].cat.categories = ["No", "Yes"]
274+
with tm.assert_produces_warning(FutureWarning, match="Use rename_categories"):
275+
df["Sex"].cat.categories = ["female", "male"]
272276

273277
# values should not be coerced to NaN
274278
assert list(df["Sex"]) == ["female", "male", "male"]
