Skip to content

REGR: Fix assignment bug for unary operators #39971

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Feb 23, 2021
Merged

REGR: Fix assignment bug for unary operators #39971

merged 6 commits into from
Feb 23, 2021

Conversation

dsaxton
Copy link
Member

@dsaxton dsaxton commented Feb 22, 2021

If there was somehow a way to copy the mask only on write so we aren't always copying that may be better. @simonjayhawkins I put this in the IntegerArray tests even though we're technically checking a Series, let me know if you'd prefer I rewrite or move elsewhere.

@simonjayhawkins simonjayhawkins added Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays labels Feb 22, 2021
@simonjayhawkins simonjayhawkins added this to the 1.2.3 milestone Feb 22, 2021
@simonjayhawkins
Copy link
Member

If there was somehow a way to copy the mask only on write so we aren't always copying that may be better.

for a backport, let's keep this simple.

@@ -317,3 +317,13 @@ def test_abs_nullable_int(any_signed_nullable_int_dtype, source, target):
result = abs(s)
expected = pd.array(target, dtype=dtype)
tm.assert_extension_array_equal(result, expected)


@pytest.mark.parametrize("op", ["__neg__", "__abs__"])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we also test __invert__, since I believe has the same issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, that required another change

@jreback
Copy link
Contributor

jreback commented Feb 22, 2021

If there was somehow a way to copy the mask only on write so we aren't always copying that may be better. @simonjayhawkins I put this in the IntegerArray tests even though we're technically checking a Series, let me know if you'd prefer I rewrite or move elsewhere.

this is not necessary, these copies are very cheap, semantics are way more important.

@@ -317,3 +317,13 @@ def test_abs_nullable_int(any_signed_nullable_int_dtype, source, target):
result = abs(s)
expected = pd.array(target, dtype=dtype)
tm.assert_extension_array_equal(result, expected)


@pytest.mark.parametrize("op", ["__neg__", "__abs__", "__invert__"])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you test for boolean as well

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, added

@jreback
Copy link
Contributor

jreback commented Feb 22, 2021

lgtm. waiting on CI.

@simonjayhawkins
Copy link
Member

simonjayhawkins commented Feb 23, 2021

TestAccessor.test_from_coo also failing on master https://dev.azure.com/pandas-dev/pandas/_build/results?buildId=54759&view=logs&j=4b4d380c-92b5-58cf-5b80-2c6c9b1d5805&t=8eab1181-1335-58ed-7ed5-a271d9d86005&l=80

2021-02-23T01:07:24.4100333Z 
2021-02-23T01:07:24.4100753Z ================================== FAILURES ===================================
2021-02-23T01:07:24.4101532Z _________________________ TestAccessor.test_from_coo __________________________
2021-02-23T01:07:24.4102065Z [gw0] win32 -- Python 3.7.9 C:\Miniconda\envs\pandas-dev\python.exe
2021-02-23T01:07:24.4102361Z 
2021-02-23T01:07:24.4103135Z self = <pandas.tests.arrays.sparse.test_array.TestAccessor object at 0x000001317E6BC888>
2021-02-23T01:07:24.4103462Z 
2021-02-23T01:07:24.4103797Z     @td.skip_if_no_scipy
2021-02-23T01:07:24.4104207Z     def test_from_coo(self):
2021-02-23T01:07:24.4104597Z         import scipy.sparse
2021-02-23T01:07:24.4104947Z     
2021-02-23T01:07:24.4105283Z         row = [0, 3, 1, 0]
2021-02-23T01:07:24.4105674Z         col = [0, 3, 1, 2]
2021-02-23T01:07:24.4106045Z         data = [4, 5, 7, 9]
2021-02-23T01:07:24.4106494Z         sp_array = scipy.sparse.coo_matrix((data, (row, col)))
2021-02-23T01:07:24.4106960Z         result = pd.Series.sparse.from_coo(sp_array)
2021-02-23T01:07:24.4107356Z     
2021-02-23T01:07:24.4107834Z         index = pd.MultiIndex.from_arrays([[0, 0, 1, 3], [0, 2, 1, 3]])
2021-02-23T01:07:24.4108389Z         expected = pd.Series([4, 9, 7, 5], index=index, dtype="Sparse[int]")
2021-02-23T01:07:24.4109198Z >       tm.assert_series_equal(result, expected)
2021-02-23T01:07:24.4109817Z E       AssertionError: Attributes of Series are different
2021-02-23T01:07:24.4110213Z E       
2021-02-23T01:07:24.4110548Z E       Attribute "dtype" are different
2021-02-23T01:07:24.4110965Z E       [left]:  Sparse[float64, nan]
2021-02-23T01:07:24.4111366Z E       [right]: Sparse[int32, 0]
2021-02-23T01:07:24.4111579Z 
2021-02-23T01:07:24.4112141Z pandas\tests\arrays\sparse\test_array.py:1193: AssertionError
2021-02-23T01:07:24.4112871Z ============================== warnings summary ===============================

@simonjayhawkins
Copy link
Member

will merge after the #39979 is merged. (otherwise may not be able to auto backport)

@simonjayhawkins simonjayhawkins merged commit d25cb7f into pandas-dev:master Feb 23, 2021
@simonjayhawkins
Copy link
Member

@meeseeksdev backport 1.2.x

@@ -16,6 +16,7 @@ Fixed regressions
~~~~~~~~~~~~~~~~~

- Fixed regression in :meth:`~DataFrame.to_excel` raising ``KeyError`` when giving duplicate columns with ``columns`` attribute (:issue:`39695`)
- Fixed regression in :class:`IntegerArray` unary ops propagating mask on assignment (:issue:`39943`)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For future reference, I think we should generally say something like "nullable integer dtype" instead of "IntegerArray", as most users should never deal / know about IntegerArray

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Valid point, I can update this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done. #40019. thanks @dsaxton

@@ -316,13 +316,13 @@ def __init__(self, values: np.ndarray, mask: np.ndarray, copy: bool = False):
super().__init__(values, mask, copy=copy)

def __neg__(self):
return type(self)(-self._data, self._mask)
return type(self)(-self._data, self._mask.copy())

def __pos__(self):
return self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should do the same for __pos__?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably not for a backport. but indeed this needs some discussion to ensure consistency across the codebase. see TimedeltaArray

    def __pos__(self) -> TimedeltaArray:
        return type(self)(self._data, freq=self.freq)
>>> import pandas as pd
>>> pd.__version__
'1.3.0.dev0+804.g3289f82975'
>>>
>>> arr = pd.timedelta_range(0, periods=10)._values
>>> arr
<TimedeltaArray>
['0 days', '1 days', '2 days', '3 days', '4 days', '5 days', '6 days',
 '7 days', '8 days', '9 days']
Length: 10, dtype: timedelta64[ns]
>>>
>>> arr2 = +arr
>>>
>>> arr2 is arr
False
>>>
>>> arr2._data is arr._data
True
>>>
>>> arr2[5] = None
>>>
>>> arr
<TimedeltaArray>
['0 days', '1 days', '2 days', '3 days', '4 days',      NaT, '6 days',
 '7 days', '8 days', '9 days']
Length: 10, dtype: timedelta64[ns]
>>>
>>>

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I see, so that's indeed something we should fix more generally. Is there already an issue about it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the context here I had been thinking that something like s1 = +s should behave the same way as s1 = s, but actually I think you are right. Likely +s should return something new instead of simply a reference to the thing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there already an issue about it?

no. will need to create one, since the issue that this PR addresses is slightly different. #39943 is about different arrays sharing a mask, whereas returning self for __pos__, although maybe incorrect, does not create the inconsistencies when assigning null and non-null vales


@pytest.mark.parametrize("op", ["__neg__", "__abs__", "__invert__"])
@pytest.mark.parametrize(
"values, dtype", [([1, 2, 3], "Int64"), ([True, False, True], "boolean")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can use the data fixture here as used in the other tests above

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is being addressed in #39916, so no action required here

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug NA - MaskedArrays Related to pd.NA and nullable extension arrays
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants