-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REGR: Fix assignment bug for unary operators #39971
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
REGR: Fix assignment bug for unary operators #39971
Conversation
for a backport, let's keep this simple. |
@@ -317,3 +317,13 @@ def test_abs_nullable_int(any_signed_nullable_int_dtype, source, target): | |||
result = abs(s) | |||
expected = pd.array(target, dtype=dtype) | |||
tm.assert_extension_array_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize("op", ["__neg__", "__abs__"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we also test __invert__, since I believe has the same issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch, that required another change
this is not necessary, these copies are very cheap, semantics are way more important. |
@@ -317,3 +317,13 @@ def test_abs_nullable_int(any_signed_nullable_int_dtype, source, target): | |||
result = abs(s) | |||
expected = pd.array(target, dtype=dtype) | |||
tm.assert_extension_array_equal(result, expected) | |||
|
|||
|
|||
@pytest.mark.parametrize("op", ["__neg__", "__abs__", "__invert__"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you test for boolean as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, added
lgtm. waiting on CI. |
TestAccessor.test_from_coo also failing on master https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=54759&view=logs&j=4b4d380c-92b5-58cf-5b80-2c6c9b1d5805&t=8eab1181-1335-58ed-7ed5-a271d9d86005&l=80
|
will merge after the #39979 is merged. (otherwise may not be able to auto backport) |
@meeseeksdev backport 1.2.x |
) Co-authored-by: Daniel Saxton <[email protected]>
@@ -16,6 +16,7 @@ Fixed regressions | |||
~~~~~~~~~~~~~~~~~ | |||
|
|||
- Fixed regression in :meth:`~DataFrame.to_excel` raising ``KeyError`` when giving duplicate columns with ``columns`` attribute (:issue:`39695`) | |||
- Fixed regression in :class:`IntegerArray` unary ops propagating mask on assignment (:issue:`39943`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For future reference, I think we should generally say something like "nullable integer dtype" instead of "IntegerArray", as most users should never deal / know about IntegerArray
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Valid point, I can update this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@@ -316,13 +316,13 @@ def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False): | |||
super().__init__(values, mask, copy=copy) | |||
|
|||
def __neg__(self): | |||
return type(self)(-self._data, self._mask) | |||
return type(self)(-self._data, self._mask.copy()) | |||
|
|||
def __pos__(self): | |||
return self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should do the same for __pos__
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
probably not for a backport. but indeed this needs some discussion to ensure consistency across the codebase. see TimedeltaArray
def __pos__(self) -> TimedeltaArray:
return type(self)(self._data, freq=self.freq)
>>> import pandas as pd
>>> pd.__version__
'1.3.0.dev0+804.g3289f82975'
>>>
>>> arr = pd.timedelta_range(0, periods=10)._values
>>> arr
<TimedeltaArray>
['0 days', '1 days', '2 days', '3 days', '4 days', '5 days', '6 days',
'7 days', '8 days', '9 days']
Length: 10, dtype: timedelta64[ns]
>>>
>>> arr2 = +arr
>>>
>>> arr2 is arr
False
>>>
>>> arr2._data is arr._data
True
>>>
>>> arr2[5] = None
>>>
>>> arr
<TimedeltaArray>
['0 days', '1 days', '2 days', '3 days', '4 days', NaT, '6 days',
'7 days', '8 days', '9 days']
Length: 10, dtype: timedelta64[ns]
>>>
>>>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I see, so that's indeed something we should fix more generally. Is there already an issue about it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the context here I had been thinking that something like s1 = +s
should behave the same way as s1 = s
, but actually I think you are right. Likely +s
should return something new instead of simply a reference to the thing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there already an issue about it?
no. will need to create one, since the issue that this PR addresses is slightly different. #39943 is about different arrays sharing a mask, whereas returning self for __pos__, although maybe incorrect, does not create the inconsistencies when assigning null and non-null vales
|
||
@pytest.mark.parametrize("op", ["__neg__", "__abs__", "__invert__"]) | ||
@pytest.mark.parametrize( | ||
"values, dtype", [([1, 2, 3], "Int64"), ([True, False, True], "boolean")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can use the data
fixture here as used in the other tests above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is being addressed in #39916, so no action required here
If there was somehow a way to copy the mask only on write so we aren't always copying that may be better. @simonjayhawkins I put this in the
IntegerArray
tests even though we're technically checking aSeries
, let me know if you'd prefer I rewrite or move elsewhere.