BUG (string dtype): replace() value in string column with non-string should cast to object dtype instead of raising an error #60282

Closed
jorisvandenbossche opened this issue Nov 12, 2024 · 0 comments · Fixed by #60285
Labels
Bug, replace (replace method), Strings (String extension data type and string data)

Comments

@jorisvandenbossche
Member

For all other dtypes (I think; I only checked the one below), if the replacement value passed to replace() doesn't fit into the dtype of the calling Series, we "upcast" to object dtype and do the replacement anyway.

Simple example with an integer series:

>>> ser = pd.Series([1, 2])
>>> ser.replace(1, "str")
0    str
1      2
dtype: object
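
The same check can be run as a script. This is a minimal sketch, not an exhaustive survey of dtypes: it only covers int64 and float64, and the `examples` and `results` names are made up for illustration.

```python
import pandas as pd

# Two numeric Series; the replacement value "str" does not fit either dtype.
examples = {
    "int64": pd.Series([1, 2]),
    "float64": pd.Series([1.0, 2.0]),
}

# replace() is expected to upcast to object dtype rather than raise.
results = {name: ser.replace(1, "str") for name, ser in examples.items()}

for name, res in results.items():
    print(name, "->", res.dtype, res.tolist())
```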

However, with the future string dtype, replacing a value with a non-string currently does not cast to object dtype but raises instead:

>>> pd.options.future.infer_string = True
>>> ser = pd.Series(["a", "b"])
>>> ser.replace("a", 1)
...
File ~/scipy/repos/pandas/pandas/core/internals/blocks.py:713, in Block.replace(self, to_replace, value, inplace, mask)
    709 elif self._can_hold_element(value):
    710     # TODO(CoW): Maybe split here as well into columns where mask has True
    711     # and rest?
    712     blk = self._maybe_copy(inplace)
--> 713     putmask_inplace(blk.values, mask, value)
    714     return [blk]
    716 elif self.ndim == 1 or self.shape[0] == 1:
...

File ~/scipy/repos/pandas/pandas/core/arrays/string_.py:746, in __setitem__(self, key, value)
...
TypeError: Invalid value '1' for dtype 'str'. Value should be a string or missing value, got 'int' instead.

Making replace() strict (i.e. always preserving the dtype) in general is a much bigger topic, so I think for now the string dtype should just follow the current behaviour of upcasting to object dtype when needed.
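
Until the upcast happens automatically, user code can get the same result by casting to object dtype explicitly. The following is only a sketch of that workaround, not the fix itself: `replace_allowing_upcast` is a hypothetical helper name, and catching `TypeError` is an assumption based on the traceback above.

```python
import pandas as pd


def replace_allowing_upcast(ser, to_replace, value):
    """Hypothetical helper: replace, falling back to object dtype if needed."""
    try:
        # Works directly when replace() already upcasts on its own.
        return ser.replace(to_replace, value)
    except TypeError:
        # Assumed failure mode from the traceback above: the string array
        # rejects the non-string value, so upcast manually and retry.
        return ser.astype(object).replace(to_replace, value)


ser = pd.Series(["a", "b"])
result = replace_allowing_upcast(ser, "a", 1)
print(result.tolist(), result.dtype)
```

Note that catching `TypeError` this broadly could also mask unrelated errors, so this is a stopgap for user code rather than something to keep once replace() upcasts by itself.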

@jorisvandenbossche added the Bug, Strings (String extension data type and string data), and replace (replace method) labels on Nov 12, 2024
@jorisvandenbossche added this to the 2.3 milestone on Nov 12, 2024
@jorisvandenbossche changed the title from "BUG (string dtype): replace value in string column with non-string should cast to object dtype instead of raising an error" to "BUG (string dtype): replace() value in string column with non-string should cast to object dtype instead of raising an error" on Nov 12, 2024