TST: base test for ExtensionArray.astype to its own type + copy keyword #35116

Merged
merged 33 commits into from
Sep 22, 2020

tomaszps
Contributor

@tomaszps tomaszps commented Jul 3, 2020

Tomasz Sakrejda added 10 commits July 2, 2020 16:41
- Issue is that .astype of self's type always returned a copy
- Change BooleanArray fix to have consistent test order
- Just check if dtype is same, then return self or self.copy()
- Make test order consistent w/ other classes
- Continue to make style consistent
- I turned off the test that was failing temporarily because of __eq__ so that I could continue with the other tests
@tomaszps
Contributor Author

tomaszps commented Jul 3, 2020

Notes: The test_string tests still fail. I have some ideas about how to fix them, but I'm not sure about them. I'll try to give it a go.

I'm also planning to squash and merge and clean up the commit names. If I should do that beforehand so other folks can see it, that's fine too. Would I just have to open another PR?

@tomaszps tomaszps changed the title from "Extension base test" to "TST: Extension base" Jul 3, 2020
@tomaszps
Contributor Author

tomaszps commented Jul 7, 2020

This branch is not building because of the "test_string" tests, and I'm not sure what folks want to do about them.

pandas/tests/extension/test_string.py::TestCasting::test_astype_own_type[True] FAILED [ 87%]
(assert_extension_array_equal is what fails, in the _testing module at line 1198.)

I believe it fails because StringArray does not have __eq__ implemented, which ExtensionArray requires.
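
For reference, the contract that `test_astype_own_type` exercises can be sketched without pandas. This is a minimal illustration, not the test added in this PR: `ToyArray` and `check_astype_own_type` are hypothetical names, and the assumed contract is that casting to the array's own dtype returns the same object when `copy=False` and an equal but distinct object when `copy=True`.

```python
class ToyArray:
    """Hypothetical stand-in for an ExtensionArray; not pandas code."""

    dtype = "toy"

    def __init__(self, values):
        self.values = list(values)

    def copy(self):
        return ToyArray(self.values)

    def astype(self, dtype, copy=True):
        if dtype == self.dtype:
            return self.copy() if copy else self
        raise TypeError(f"cannot cast to {dtype!r}")


def check_astype_own_type(arr, copy):
    # Casting to the array's own dtype must preserve the values ...
    result = arr.astype(arr.dtype, copy=copy)
    assert result.values == arr.values
    # ... and must hand back the very same object only when copy=False.
    assert (result is arr) is (not copy)


for copy in (True, False):
    check_astype_own_type(ToyArray([1, 2, 3]), copy)
```

The `[True]` and `[False]` suffixes in the failing test IDs correspond to the two values of `copy` in a parametrized test of this shape.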

@gfyoung gfyoung added the ExtensionArray (Extending pandas with custom dtypes or arrays.) and Testing (pandas testing functions or related to the test suite) labels Jul 9, 2020
@tomaszps
Contributor Author

I don't know what to do next. I can take a look at implementing __eq__, but I wanted an okay to move forward on that.

@TomAugspurger
Contributor

Sorry for the delay, prepping the release. No need to squash. We do that on merge.

There are some merge conflicts now. Can you fix those?

> I don't know what to do next. I can take a look at implementing __eq__, but I wanted an okay to move forward on that.

Let's keep each PR as small as possible.

@tomaszps
Contributor Author

Sounds good, I should get around to the merge conflicts tomorrow.

@tomaszps
Contributor Author

Fixed up the merge conflicts. There are still two tests that aren't passing due to the __eq__ issue.

pandas/tests/extension/test_string.py::TestCasting::test_astype_own_type[False] FAILED [ 90%]

@tomaszps
Contributor Author

tomaszps commented Sep 6, 2020

I temporarily disabled the failing test, in case that's why nobody was looking at this PR yet.

Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks for working on this!

Comment on lines 1066 to 1069
    if dtype == self.dtype and copy:
        return self.copy()
    elif dtype == self.dtype and not copy:
        return self
Member

Can you do this one using the same structure as the others (first only checking if dtype == self.dtype: and then if not copy: ... else: ...)?
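
The restructuring requested here can be sketched as follows. Both functions are hypothetical free-standing helpers (the PR edits `astype` methods instead), and `Box` is a made-up stand-in for an array. The point is only that the nested form compares the dtype once and behaves identically for the own-dtype case.

```python
class Box:
    """Hypothetical array-like with a dtype attribute; not pandas code."""

    def __init__(self, dtype, values):
        self.dtype = dtype
        self.values = values

    def copy(self):
        return Box(self.dtype, list(self.values))


def astype_flat(arr, dtype, copy=True):
    # Form under review: the dtype comparison appears in both branches.
    if dtype == arr.dtype and copy:
        return arr.copy()
    elif dtype == arr.dtype and not copy:
        return arr
    raise NotImplementedError("other dtypes are out of scope for this sketch")


def astype_nested(arr, dtype, copy=True):
    # Requested form: compare the dtype once, then branch on copy.
    if dtype == arr.dtype:
        if not copy:
            return arr
        else:
            return arr.copy()
    raise NotImplementedError("other dtypes are out of scope for this sketch")
```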

Contributor Author

Yep! Of course.

    pass
#
# class TestCasting(base.BaseCastingTests):
#     pass
Member

Can you re-enable this again? Then we can look into the failure (and it will need to work anyway, before we can merge this PR)

(__eq__ is certainly implemented for StringArray, but something else might be going wrong)

Contributor Author

No problem. (Sorry about that; from what I could see, a lack of __eq__ looked like the issue. I'll try taking another look at it.)

@jorisvandenbossche jorisvandenbossche changed the title from "TST: Extension base" to "TST: base test for ExtensionArray.astype to its own type + copy keyword" Sep 7, 2020
@tomaszps
Copy link
Contributor Author

tomaszps commented Sep 9, 2020

I think I found the problem, but I'm not sure which layer is the right one to fix. I'd appreciate your input.

assert_extension_array_equal() in /Users/tomasz/src/dev_pandas/pandas/_testing.py (around line 1196) does an astype with the base object class (on line 1194).

> /Users/tomasz/src/dev_pandas/pandas/core/arrays/string_.py(283)astype()
That method transforms the requested dtype into dtype('O') (line 271) and then checks whether it is an instance of StringDtype (line 272). My guess is that this check should return True?

As a result, execution falls through to the astype method of the base extension array in numpy_.py, which blows up when it tries to compare dtype('O') (I'm not sure what to call it; the pandas object dtype?) with self.dtype, which is StringDtype.

You should be able to replicate this easily enough by pulling the PR and running pytest pandas/tests/extension/ -k test_astype_own_type --tb=line --pdb (I suspect you know already, I just thought I'd make it convenient for you.)

        if not copy:
            return self
        elif copy:
            return self.copy()
    dtype = pandas_dtype(dtype)
    if isinstance(dtype, StringDtype):
        if copy:
Member

Here you can remove the new code. As you can see on this line and the two lines below, the logic is already present.

Contributor Author

Good point; must've not been paying attention.

        elif copy:
            return self.copy()
    else:
        return super().astype(dtype, copy)
Member

Is it possible to move this to the base class?

Contributor Author

Yep, tried it and it worked, seems logically consistent. Going to also look to see if I can drop the checks elsewhere by leaving it to the base class.

Contributor Author

Doesn't seem so, without doing more restructuring than I'm comfortable with ATM.

@@ -276,6 +276,15 @@ def __setitem__(self, key, value) -> None:

        self._ndarray[key] = value

    def astype(self, dtype, copy: bool = True) -> ArrayLike:
        if dtype == self.dtype:
Member

So the problem is that np.dtype(object) == ExtensionDtype raises an error, which is an error coming from numpy and not directly solvable in pandas.
So to work around it, can you use is_dtype_equal(dtype, self.dtype) instead, with is_dtype_equal imported from pandas.core.dtypes.common?
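
The idea behind this workaround can be sketched in plain Python: wrap the comparison so that a `==` which raises is treated as "not equal". Everything in the sketch is hypothetical. `StrictDtype` imitates a dtype whose `==` raises on foreign types, `OtherDtype` stands in for an unrelated dtype, and `safe_dtype_equal` is not the pandas function; how `is_dtype_equal` works internally is an assumption here.

```python
class StrictDtype:
    """Hypothetical dtype whose == raises when given an unrelated type."""

    def __init__(self, name):
        self.name = name

    def __eq__(self, other):
        if not isinstance(other, StrictDtype):
            raise TypeError("data type not understood")
        return self.name == other.name


class OtherDtype:
    """Hypothetical unrelated dtype, e.g. an extension dtype."""


def safe_dtype_equal(left, right):
    # Treat a comparison that raises as "the dtypes are not equal".
    try:
        return bool(left == right)
    except TypeError:
        return False
```

With a helper of this shape, an `astype` implementation can test for its own dtype without having to know which operand's `__eq__` might raise.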

Contributor Author

Fixed and moved checks to base class. This just calls super().astype now.

Tomasz Sakrejda added 2 commits September 9, 2020 11:54
- Remove corresponding check from numpy_.py
- Note that all tests now pass
Comment on lines 137 to 138
    elif copy:
        return self.copy()
Contributor Author

@jorisvandenbossche Is this implemented correctly? I'm concerned because of the way returning a copy is implemented on line 141, which seems to take into account dtype.context. Maybe the fix would be to remove lines 137-138 and let that case fall to 141-142?

Member

I think it is correct, yes. The first is_dtype_equal(dtype, self._dtype) will only catch decimal dtypes with an equal context, and then the if isinstance(dtype, type(self.dtype)): will work correctly in case the context differs.

Member

Ah, but we might need to pass through the copy

(anyway, this array implementation is only for testing, it's not actually used, so it's not super important)

Contributor Author

Easy enough to check, so I changed it to pass through the copy. It passes the test suite the same way, and seems right, so I'll go with that version.
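
The control flow discussed in this thread can be sketched as follows. This is an assumption-laden toy, not the test-suite decimal array: `ToyDecimalDtype` and `ToyDecimalArray` are hypothetical, and the context is reduced to a single `precision` number. An exactly equal dtype with `copy=False` returns the array itself; any dtype of the same class, including one with a different context, is rebuilt with `copy` passed through.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDecimalDtype:
    precision: int  # stands in for a decimal context


class ToyDecimalArray:
    def __init__(self, values, dtype, copy=False):
        self.values = list(values) if copy else values
        self.dtype = dtype

    def astype(self, dtype, copy=True):
        # Same dtype and same context: nothing to do unless a copy is wanted.
        if dtype == self.dtype and not copy:
            return self
        # Same dtype class (context may differ): rebuild, passing copy through.
        if isinstance(dtype, type(self.dtype)):
            return ToyDecimalArray(self.values, dtype, copy=copy)
        raise NotImplementedError("other dtypes are out of scope for this sketch")
```

In this sketch the `copy=True` case for an equal dtype needs no branch of its own, because it falls through to the constructor call that already honours `copy`.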

@tomaszps
Contributor Author

tomaszps commented Sep 9, 2020

Welp, ran the full test suite and saw things failing. I'll go in and start trying to fix them. Should've been doing that locally given that the builds were failing.

edit: Cool, looks like it was the same problem you mentioned with is_dtype_equal in the other checks as well.
edit2: Mostly. Ah well, far fewer errors anyway.

@tomaszps
Contributor Author

tomaszps commented Sep 9, 2020

Something I don't understand is going wrong with pytest's xfail marking: tests are still failing despite being marked. You can run pandas/tests/extension/test_numpy.py with my latest commit to see what I'm talking about.

(I was unfamiliar with this pytest feature, so I might be misinterpreting.)
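
For context, pytest documents that a test marked `xfail` with `strict=True` is reported as a failure when it unexpectedly passes, which is what an `[XPASS(strict)]` entry in the output denotes. The function here is a toy model of that rule in plain Python, not pytest code; `xfail_outcome` is a hypothetical name.

```python
def xfail_outcome(test_passed, strict):
    """Toy model of how an xfail-marked test is reported."""
    if not test_passed:
        return "xfailed"  # expected failure: does not fail the run
    if strict:
        return "failed"  # unexpected pass under strict mode: XPASS(strict)
    return "xpassed"  # unexpected pass: reported, but not a failure


# A test that starts passing while still carrying a strict xfail marker
# is counted as a failure.
print(xfail_outcome(test_passed=True, strict=True))
```

Under this reading, a change that makes a previously failing test pass would also need to remove (or relax) the corresponding xfail marker.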

@tomaszps
Contributor Author

tomaszps commented Sep 9, 2020

A copy/paste of the (failing) output of the entire test suite, for reference:

=========================================================== FAILURES ===========================================================
_____________________________________ TestGetitem.test_loc_iloc_frame_single_dtype[float] ______________________________________
[XPASS(strict)] GH#33125 astype doesn't recognize data.dtype
____________________________________ TestGroupby.test_groupby_extension_apply[object-float] ____________________________________
[XPASS(strict)] GH#33125 astype doesn't recognize data.dtype
_____________________________________________ TestPrinting.test_series_repr[float] _____________________________________________
[XPASS(strict)] GH#33125 PandasArray.astype does not recognize PandasDtype
____________________________________________ TestPrinting.test_series_repr[object] _____________________________________________
[XPASS(strict)] GH#33125 PandasArray.astype does not recognize PandasDtype
_____________________________________ TestReadHtml.test_banklist_url_positional_match[bs4] _____________________________________

self = <pandas.tests.io.test_html.TestReadHtml object at 0x7f8a138168e0>

    @tm.network
    def test_banklist_url_positional_match(self):
        url = "http://www.fdic.gov/bank/individual/failed/banklist.html"
        # Passing match argument as positional should cause a FutureWarning.
        with tm.assert_produces_warning(FutureWarning):
>           df1 = self.read_html(
                url, "First Federal Bank of Florida", attrs={"id": "table"}
            )

pandas/tests/io/test_html.py:129:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/_decorators.py:296: in wrapper
    return func(*args, **kwargs)
pandas/io/html.py:1086: in read_html
    return _parse(
pandas/io/html.py:898: in _parse
    tables = p.parse_tables()
pandas/io/html.py:217: in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
pandas/io/html.py:596: in _build_doc
    bdoc = self._setup_build_doc()
pandas/io/html.py:588: in _setup_build_doc
    raw_text = _read(self.io)
pandas/io/html.py:125: in _read
    with urlopen(obj) as url:
pandas/io/common.py:153: in urlopen
    return urllib.request.urlopen(*args, **kwargs)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:222: in urlopen
    return opener.open(url, data, timeout)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:531: in open
    response = meth(req, response)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:640: in http_response
    response = self.parent.error(
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:569: in error
    return self._call_chain(*args)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:502: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7f8a25a685b0>
req = <urllib.request.Request object at 0x7f8a1447bb80>, fp = <http.client.HTTPResponse object at 0x7f8a1447bd00>, code = 404
msg = 'Not Found', hdrs = <http.client.HTTPMessage object at 0x7f8a14485040>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 404: Not Found

../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:649: HTTPError
_____________________________________________ TestReadHtml.test_banklist_url[bs4] ______________________________________________

self = <pandas.tests.io.test_html.TestReadHtml object at 0x7f8a0aca8520>

    @tm.network
    def test_banklist_url(self):
        url = "http://www.fdic.gov/bank/individual/failed/banklist.html"
>       df1 = self.read_html(
            url, match="First Federal Bank of Florida", attrs={"id": "table"}
        )

pandas/tests/io/test_html.py:140:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/_decorators.py:296: in wrapper
    return func(*args, **kwargs)
pandas/io/html.py:1086: in read_html
    return _parse(
pandas/io/html.py:898: in _parse
    tables = p.parse_tables()
pandas/io/html.py:217: in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
pandas/io/html.py:596: in _build_doc
    bdoc = self._setup_build_doc()
pandas/io/html.py:588: in _setup_build_doc
    raw_text = _read(self.io)
pandas/io/html.py:125: in _read
    with urlopen(obj) as url:
pandas/io/common.py:153: in urlopen
    return urllib.request.urlopen(*args, **kwargs)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:222: in urlopen
    return opener.open(url, data, timeout)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:531: in open
    response = meth(req, response)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:640: in http_response
    response = self.parent.error(
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:569: in error
    return self._call_chain(*args)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:502: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7f8a25a685b0>
req = <urllib.request.Request object at 0x7f8a0aca88e0>, fp = <http.client.HTTPResponse object at 0x7f8a0aca8b50>, code = 404
msg = 'Not Found', hdrs = <http.client.HTTPMessage object at 0x7f8a0aca8d30>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 404: Not Found

../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:649: HTTPError
____________________________________ TestReadHtml.test_banklist_url_positional_match[lxml] _____________________________________

self = <pandas.tests.io.test_html.TestReadHtml object at 0x7f89fcb8e340>

    @tm.network
    def test_banklist_url_positional_match(self):
        url = "http://www.fdic.gov/bank/individual/failed/banklist.html"
        # Passing match argument as positional should cause a FutureWarning.
        with tm.assert_produces_warning(FutureWarning):
>           df1 = self.read_html(
                url, "First Federal Bank of Florida", attrs={"id": "table"}
            )

pandas/tests/io/test_html.py:129:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/_decorators.py:296: in wrapper
    return func(*args, **kwargs)
pandas/io/html.py:1086: in read_html
    return _parse(
pandas/io/html.py:898: in _parse
    tables = p.parse_tables()
pandas/io/html.py:217: in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
pandas/io/html.py:736: in _build_doc
    raise e
pandas/io/html.py:717: in _build_doc
    with urlopen(self.io) as f:
pandas/io/common.py:153: in urlopen
    return urllib.request.urlopen(*args, **kwargs)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:222: in urlopen
    return opener.open(url, data, timeout)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:531: in open
    response = meth(req, response)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:640: in http_response
    response = self.parent.error(
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:569: in error
    return self._call_chain(*args)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:502: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7f8a25a685b0>
req = <urllib.request.Request object at 0x7f89fdac5910>, fp = <http.client.HTTPResponse object at 0x7f89fdac57c0>, code = 404
msg = 'Not Found', hdrs = <http.client.HTTPMessage object at 0x7f89fdac5ac0>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 404: Not Found

../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:649: HTTPError
_____________________________________________ TestReadHtml.test_banklist_url[lxml] _____________________________________________

self = <pandas.tests.io.test_html.TestReadHtml object at 0x7f8a07710040>

    @tm.network
    def test_banklist_url(self):
        url = "http://www.fdic.gov/bank/individual/failed/banklist.html"
>       df1 = self.read_html(
            url, match="First Federal Bank of Florida", attrs={"id": "table"}
        )

pandas/tests/io/test_html.py:140:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
pandas/util/_decorators.py:296: in wrapper
    return func(*args, **kwargs)
pandas/io/html.py:1086: in read_html
    return _parse(
pandas/io/html.py:898: in _parse
    tables = p.parse_tables()
pandas/io/html.py:217: in parse_tables
    tables = self._parse_tables(self._build_doc(), self.match, self.attrs)
pandas/io/html.py:736: in _build_doc
    raise e
pandas/io/html.py:717: in _build_doc
    with urlopen(self.io) as f:
pandas/io/common.py:153: in urlopen
    return urllib.request.urlopen(*args, **kwargs)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:222: in urlopen
    return opener.open(url, data, timeout)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:531: in open
    response = meth(req, response)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:640: in http_response
    response = self.parent.error(
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:569: in error
    return self._call_chain(*args)
../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:502: in _call_chain
    result = func(*args)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <urllib.request.HTTPDefaultErrorHandler object at 0x7f8a25a685b0>
req = <urllib.request.Request object at 0x7f8a07710970>, fp = <http.client.HTTPResponse object at 0x7f8a07710940>, code = 404
msg = 'Not Found', hdrs = <http.client.HTTPMessage object at 0x7f8a07710c40>

    def http_error_default(self, req, fp, code, msg, hdrs):
>       raise HTTPError(req.full_url, code, msg, hdrs, fp)
E       urllib.error.HTTPError: HTTP Error 404: Not Found

../../miniconda3/envs/pandas-dev/lib/python3.8/urllib/request.py:649: HTTPError
____________________________________________ TestTSPlot.test_ts_plot_with_tz['UTC'] ____________________________________________

self = <pandas.tests.plotting.test_datetimelike.TestTSPlot object at 0x7f89dcd535b0>, tz_aware_fixture = 'UTC'

    @pytest.mark.slow
    def test_ts_plot_with_tz(self, tz_aware_fixture):
        # GH2877, GH17173, GH31205, GH31580
        tz = tz_aware_fixture
        index = date_range("1/1/2011", periods=2, freq="H", tz=tz)
        ts = Series([188.5, 328.25], index=index)
        with tm.assert_produces_warning(None):
            _check_plot_works(ts.plot)
            ax = ts.plot()
            xdata = list(ax.get_lines())[0].get_xdata()
            # Check first and last points' labels are correct
>           assert (xdata[0].hour, xdata[0].minute) == (0, 0)
E           AttributeError: 'numpy.datetime64' object has no attribute 'hour'

pandas/tests/plotting/test_datetimelike.py:57: AttributeError
======================================================= warnings summary =======================================================
pandas/tests/arrays/masked/test_arithmetic.py::test_array_scalar_like_equivalence[boolean-__pow__]
pandas/tests/extension/test_boolean.py::TestArithmeticOps::test_arith_series_with_scalar[__pow__]
pandas/tests/extension/test_boolean.py::TestArithmeticOps::test_arith_frame_with_scalar[__pow__]
  /Users/tomasz/src/dev_pandas/pandas/core/arrays/boolean.py:727: DeprecationWarning: In future, it will be an error for 'np.bool_' scalars to be interpreted as an index
    result = op(self._data, other)

pandas/tests/frame/test_api.py::TestDataFrameMisc::test_constructor_expanddim_lookup
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NAMES has been deprecated.
    value = getattr(object, key)

pandas/tests/frame/test_api.py::TestDataFrameMisc::test_constructor_expanddim_lookup
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/inspect.py:350: FutureWarning: _AXIS_NUMBERS has been deprecated.
    value = getattr(object, key)

pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-DatetimeIndex-datetime64[ns]-ctor0-0]
pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-DatetimeIndex-datetime64[ns]-ctor0-0]
pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-DatetimeIndex-datetime64[ns]-ctor0-1]
pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-DatetimeIndex-datetime64[ns]-ctor0-1]
pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-TimedeltaIndex-timedelta64[ns]-ctor1-0]
pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-TimedeltaIndex-timedelta64[ns]-ctor1-0]
pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-TimedeltaIndex-timedelta64[ns]-ctor1-1]
pandas/tests/indexes/test_index_new.py::TestIndexConstructorInference::test_constructor_infer_nat_dt_like[<NA>-TimedeltaIndex-timedelta64[ns]-ctor1-1]
  /Users/tomasz/src/dev_pandas/pandas/_testing.py:790: DeprecationWarning: elementwise comparison failed; this will raise an error in the future.
    diff = np.sum((left.values != right.values).astype(int)) * 100.0 / len(left)

pandas/tests/io/test_common.py: 4 tests with warnings
pandas/tests/io/excel/test_readers.py: 18 tests with warnings
pandas/tests/io/excel/test_writers.py: 1094 tests with warnings
pandas/tests/io/excel/test_xlrd.py: 8 tests with warnings
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/xlrd/xlsx.py:266: DeprecationWarning: This method will be removed in future versions.  Use 'tree.iter()' or 'list(tree.iter())' instead.
    for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():

pandas/tests/io/test_common.py: 2 tests with warnings
pandas/tests/io/excel/test_readers.py: 9 tests with warnings
pandas/tests/io/excel/test_writers.py: 547 tests with warnings
pandas/tests/io/excel/test_xlrd.py: 4 tests with warnings
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/xlrd/xlsx.py:312: DeprecationWarning: This method will be removed in future versions.  Use 'tree.iter()' or 'list(tree.iter())' instead.
    for elem in self.tree.iter() if Element_has_iter else self.tree.getiterator():

pandas/tests/io/test_fsspec.py::test_fastparquet_options
pandas/tests/io/test_fsspec.py::test_s3_parquet
pandas/tests/io/test_parquet.py::test_cross_engine_pa_fp
pandas/tests/io/test_parquet.py::test_cross_engine_pa_fp
pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_filter_row_groups
pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_partition_cols_supported
pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_partition_cols_string
pandas/tests/io/test_parquet.py::TestParquetFastParquet::test_partition_on_supported
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:1929: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
    iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])

pandas/tests/io/test_fsspec.py: 5 tests with warnings
pandas/tests/io/test_parquet.py: 14 tests with warnings
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/fastparquet/parquet_thrift/parquet/ttypes.py:975: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats
    iprot._fast_decode(self, iprot, [self.__class__, self.thrift_spec])

pandas/tests/io/test_orc.py: 4000 tests with warnings
  sys:1: DeprecationWarning: PY_SSIZE_T_CLEAN will be required for '#' formats

pandas/tests/io/excel/test_writers.py: 10 tests with warnings
  /Users/tomasz/src/dev_pandas/pandas/tests/io/excel/test_writers.py:1289: FutureWarning: inplace is deprecated and will be removed in a future version.
    expected.columns.set_levels(

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/s/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/s/meta/values/meta/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/s_ordered/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/s_ordered/meta/values/meta/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/si/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/si/meta/values/meta/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/si2/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/si2/meta/values/meta/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/df2/meta/values_block_2/meta/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/s2/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/s2/meta/values/meta/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/io/pytables/test_store.py::TestHDFStore::test_categorical_conversion
  /Users/tomasz/miniconda3/envs/pandas-dev/lib/python3.8/site-packages/tables/file.py:426: UserWarning: a closed node found in the registry: ``/df/meta/imgids/meta/_i_table``
    warnings.warn("a closed node found in the registry: "

pandas/tests/plotting/test_frame.py::TestDataFramePlots::test_mpl2_color_cycle_str
  /Users/tomasz/src/dev_pandas/pandas/plotting/_matplotlib/style.py:64: MatplotlibDeprecationWarning: Support for uppercase single-letter colors is deprecated since Matplotlib 3.1 and will be removed in 3.3; please use lowercase instead.
    [conv.to_rgba(c) for c in colors]

pandas/tests/plotting/test_frame.py::TestDataFramePlots::test_errorbar_plot
pandas/tests/plotting/test_frame.py::TestDataFramePlots::test_errorbar_plot
pandas/tests/plotting/test_frame.py::TestDataFramePlots::test_errorbar_plot
pandas/tests/plotting/test_frame.py::TestDataFramePlots::test_errorbar_timeseries
pandas/tests/plotting/test_frame.py::TestDataFramePlots::test_errorbar_timeseries
pandas/tests/plotting/test_frame.py::TestDataFramePlots::test_errorbar_timeseries
  /Users/tomasz/src/dev_pandas/pandas/plotting/_matplotlib/__init__.py:61: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
    plot_obj.generate()

pandas/tests/plotting/test_hist_method.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_hist_method.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_hist_method.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_series.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_series.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_series.py::TestSeriesPlots::test_hist_legacy
  /Users/tomasz/src/dev_pandas/pandas/tests/plotting/common.py:535: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
    kwargs.get("ax", fig.add_subplot(211))

pandas/tests/plotting/test_hist_method.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_hist_method.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_hist_method.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_series.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_series.py::TestSeriesPlots::test_hist_legacy
pandas/tests/plotting/test_series.py::TestSeriesPlots::test_hist_legacy
  /Users/tomasz/src/dev_pandas/pandas/tests/plotting/common.py:543: MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.  Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.
    kwargs["ax"] = fig.add_subplot(212)

pandas/tests/plotting/test_hist_method.py::TestSeriesPlots::test_hist_with_legend[b-2-expected_layout1]
  /Users/tomasz/src/dev_pandas/pandas/plotting/_matplotlib/hist.py:354: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
    axes = _grouped_hist(

pandas/tests/plotting/test_hist_method.py::TestDataFramePlots::test_tight_layout
pandas/tests/plotting/test_hist_method.py::TestDataFramePlots::test_hist_column_order_unchanged[None-expected0]
pandas/tests/plotting/test_hist_method.py::TestDataFramePlots::test_hist_column_order_unchanged[column1-expected1]
pandas/tests/plotting/test_hist_method.py::TestDataFramePlots::test_hist_with_legend[None-None]
  /Users/tomasz/src/dev_pandas/pandas/tests/plotting/common.py:545: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
    ret = f(**kwargs)

pandas/tests/plotting/test_hist_method.py::TestDataFramePlots::test_hist_subplot_xrot
pandas/tests/plotting/test_hist_method.py::TestDataFramePlots::test_hist_with_legend[None-c]
pandas/tests/plotting/test_hist_method.py::TestDataFramePlots::test_hist_with_legend[b-c]
  /Users/tomasz/src/dev_pandas/pandas/plotting/_matplotlib/hist.py:396: UserWarning: To output multiple subplots, the figure containing the passed axes is being cleared
    axes = _grouped_hist(

pandas/tests/plotting/test_misc.py::TestSeriesPlots::test_autocorrelation_plot
pandas/tests/plotting/test_misc.py::TestSeriesPlots::test_autocorrelation_plot
  /Users/tomasz/src/dev_pandas/pandas/plotting/_matplotlib/misc.py:443: UserWarning: Requested projection is different from current axis projection, creating new axis with requested projection.
    ax = plt.gca(xlim=(1, n), ylim=(-1.0, 1.0))

pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_andrews_curves
  /Users/tomasz/src/dev_pandas/pandas/plotting/_matplotlib/misc.py:263: UserWarning: Requested projection is different from current axis projection, creating new axis with requested projection.
    ax = plt.gca(xlim=(-np.pi, np.pi))

pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_radviz
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_radviz
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_radviz
pandas/tests/plotting/test_misc.py::TestDataFramePlots::test_radviz
  /Users/tomasz/src/dev_pandas/pandas/plotting/_matplotlib/misc.py:147: UserWarning: Requested projection is different from current axis projection, creating new axis with requested projection.
    ax = plt.gca(xlim=[-1, 1], ylim=[-1, 1])

pandas/tests/series/test_api.py::TestSeriesMisc::test_tab_complete_warning
  <string>:1: DeprecationWarning: The default dtype for empty Series will be 'object' instead of 'float64' in a future version. Specify a dtype explicitly to silence this warning.

-- Docs: https://docs.pytest.org/en/latest/warnings.html
=================================================== short test summary info ====================================================
FAILED pandas/tests/extension/test_numpy.py::TestGetitem::test_loc_iloc_frame_single_dtype[float]
FAILED pandas/tests/extension/test_numpy.py::TestGroupby::test_groupby_extension_apply[object-float]
FAILED pandas/tests/extension/test_numpy.py::TestPrinting::test_series_repr[float]
FAILED pandas/tests/extension/test_numpy.py::TestPrinting::test_series_repr[object]
FAILED pandas/tests/io/test_html.py::TestReadHtml::test_banklist_url_positional_match[bs4] - urllib.error.HTTPError: HTTP Err...
FAILED pandas/tests/io/test_html.py::TestReadHtml::test_banklist_url[bs4] - urllib.error.HTTPError: HTTP Error 404: Not Found
FAILED pandas/tests/io/test_html.py::TestReadHtml::test_banklist_url_positional_match[lxml] - urllib.error.HTTPError: HTTP Er...
FAILED pandas/tests/io/test_html.py::TestReadHtml::test_banklist_url[lxml] - urllib.error.HTTPError: HTTP Error 404: Not Found
FAILED pandas/tests/plotting/test_datetimelike.py::TestTSPlot::test_ts_plot_with_tz['UTC'] - AttributeError: 'numpy.datetime6...
============== 9 failed, 90776 passed, 1254 skipped, 1055 xfailed, 6 xpassed, 5790 warnings in 1620.23s (0:27:00) ==============

Member

@jorisvandenbossche jorisvandenbossche left a comment

Thanks for the updates!

The failures you mention, you can ignore I think, as they seem unrelated (eg the network failure). But the tests that were xfailed and are now passing need to be checked:

=================================== FAILURES ===================================
_____________ TestGetitem.test_loc_iloc_frame_single_dtype[float] ______________
[gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] GH#33125 astype doesn't recognize data.dtype
____________ TestGroupby.test_groupby_extension_apply[object-float] ____________
[gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] GH#33125 astype doesn't recognize data.dtype
_____________________ TestPrinting.test_series_repr[float] _____________________
[gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] GH#33125 PandasArray.astype does not recognize PandasDtype
____________________ TestPrinting.test_series_repr[object] _____________________
[gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
[XPASS(strict)] GH#33125 PandasArray.astype does not recognize PandasDtype
=============================== warnings summary ===============================

They are all in test_numpy.py, I think, and I checked the first one: it seems that now for float dtype this test is working correctly, so we no longer need to xfail it:

--- a/pandas/tests/extension/test_numpy.py
+++ b/pandas/tests/extension/test_numpy.py
@@ -177,7 +177,7 @@ class TestGetitem(BaseNumPyTests, base.BaseGetitemTests):
 
     def test_loc_iloc_frame_single_dtype(self, data, request):
         npdtype = data.dtype.numpy_dtype
-        if npdtype == object or npdtype == np.float64:
+        if npdtype == object:
             # GH#33125
             mark = pytest.mark.xfail(
                 reason="GH#33125 astype doesn't recognize data.dtype"
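
For readers less familiar with strict xfail, the four test_numpy.py failures above follow from how an xfail marker combines with the actual test result. The snippet below is a minimal pure-Python model of that rule, not pytest's implementation; the function name `report_outcome` is made up for illustration.

```python
def report_outcome(test_passed: bool, xfail: bool = False, strict: bool = False) -> str:
    """Model how an xfail marker changes the reported result of a test."""
    if not xfail:
        return "passed" if test_passed else "failed"
    if not test_passed:
        # The test failed, which is what the marker predicted.
        return "xfailed"
    # The test unexpectedly passed: a failure in strict mode, "xpassed" otherwise.
    return "failed (XPASS(strict))" if strict else "xpassed"


# Before the astype fix: the float case fails and is reported as xfailed.
print(report_outcome(test_passed=False, xfail=True, strict=True))
# After the fix: the same test passes, so the stale strict marker fails the run.
print(report_outcome(test_passed=True, xfail=True, strict=True))
# With the marker dropped for float, the test simply passes.
print(report_outcome(test_passed=True))
```

Under this model, narrowing the condition to `npdtype == object` removes the marker only for the case that now passes, and keeps it for the case that presumably still fails.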

Comment on lines 279 to 280
    def astype(self, dtype, copy: bool = True) -> ArrayLike:
        return super().astype(dtype, copy)
Member

Now that it is only a call to super(), you can leave it out entirely (so it will just use the inherited method).
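
A small illustration of why such an override is redundant, using a toy class hierarchy rather than the pandas classes (the names `Base`, `WithOverride` and `WithoutOverride` are hypothetical):

```python
class Base:
    def astype(self, dtype, copy=True):
        return ("Base.astype", dtype, copy)


class WithOverride(Base):
    # Redundant: only forwards its arguments to the parent implementation.
    def astype(self, dtype, copy=True):
        return super().astype(dtype, copy)


class WithoutOverride(Base):
    # No astype defined here; attribute lookup finds Base.astype instead.
    pass


# Both subclasses behave identically for the same arguments.
same = WithOverride().astype("float", copy=False) == WithoutOverride().astype("float", copy=False)
print(same)
```

Dropping the forwarding method leaves behavior unchanged and removes one place where the signature could drift from the parent's.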

Contributor Author

Yeah, I should've pulled that yesterday ¯_(ツ)_/¯

Comment on lines 137 to 138
    elif copy:
        return self.copy()
Member

I think it is correct, yes. The first check, is_dtype_equal(dtype, self._dtype), will only catch decimal dtypes with an equal context, and the following if isinstance(dtype, type(self.dtype)): branch will then work correctly in case the context differs.

Member

Ah, but we might need to pass the copy keyword through.

(anyway, this array implementation is only for testing, it's not actually used, so it's not super important)

            if not copy:
                return self
            elif copy:
                return self.copy()
        dtype = pandas_dtype(dtype)
        if isinstance(dtype, type(self.dtype)):
            return type(self)(self._data, context=dtype.context)
Member

Suggested change
-            return type(self)(self._data, context=dtype.context)
+            return type(self)(self._data, copy=copy, context=dtype.context)
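
The behavior discussed in this thread (return self when copy=False and the dtype is equal, return a copy otherwise, and pass copy through when only the context differs) can be sketched in plain Python. This is a toy model, not the test suite's decimal array: `ToyDtype` and `ToyArray` are hypothetical names, and an integer stands in for the decimal context.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyDtype:
    # Hypothetical parametrized dtype; `context` is an int for simplicity.
    context: int = 28


class ToyArray:
    def __init__(self, data, context=28, copy=False):
        # Copy the backing list only when asked to.
        self._data = list(data) if copy else data
        self._dtype = ToyDtype(context)

    @property
    def dtype(self):
        return self._dtype

    def copy(self):
        return type(self)(self._data, context=self.dtype.context, copy=True)

    def astype(self, dtype, copy=True):
        if dtype == self._dtype:
            # Same dtype and same context: honor `copy` directly.
            return self.copy() if copy else self
        if isinstance(dtype, type(self.dtype)):
            # Same dtype class, different context: rebuild and pass `copy` through.
            return type(self)(self._data, copy=copy, context=dtype.context)
        raise TypeError(f"cannot cast to {dtype!r} in this sketch")


arr = ToyArray([1, 2, 3])
print(arr.astype(arr.dtype, copy=False) is arr)  # same object, no copy made
print(arr.astype(arr.dtype, copy=True) is arr)   # a new object with copied data
```

A base test for this contract would assert the identity in the copy=False case and assert equal values but a distinct object in the copy=True case.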

@tomaszps
Contributor Author

tomaszps commented Sep 10, 2020

> Thanks for the updates!

You're welcome! Gotta do something to stay sane while applying for jobs, and I got tired of working on a data science thing.

> The failures you mention, you can ignore I think, as they seem unrelated (eg the network failure). But the tests that were xfailed and are now passing need to be checked:

Cool, makes sense.

> =================================== FAILURES ===================================
> _____________ TestGetitem.test_loc_iloc_frame_single_dtype[float] ______________
> [gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
> [XPASS(strict)] GH#33125 astype doesn't recognize data.dtype
> ____________ TestGroupby.test_groupby_extension_apply[object-float] ____________
> [gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
> [XPASS(strict)] GH#33125 astype doesn't recognize data.dtype
> _____________________ TestPrinting.test_series_repr[float] _____________________
> [gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
> [XPASS(strict)] GH#33125 PandasArray.astype does not recognize PandasDtype
> ____________________ TestPrinting.test_series_repr[object] _____________________
> [gw0] linux -- Python 3.7.9 /home/vsts/miniconda3/envs/pandas-dev/bin/python
> [XPASS(strict)] GH#33125 PandasArray.astype does not recognize PandasDtype
> =============================== warnings summary ===============================
>
> They are all in test_numpy.py, I think, and I checked the first one: it seems that now for float dtype this test is working correctly, so we no longer need to xfail it:

ok good, makes sense.

re: one of the tests-

Should the ValueError comment in this test be left alone? Not sure why it's there.

class TestGroupby(BaseNumPyTests, base.BaseGroupbyTests):
    @skip_nested
    def test_groupby_extension_apply(
        self, data_for_grouping, groupby_apply_op, request
    ):
        # ValueError: Names should be list-like for a MultiIndex
        a = "a"
        is_identity = groupby_apply_op(a) is a
        if data_for_grouping.dtype.numpy_dtype == np.float64 and is_identity:
            mark = pytest.mark.xfail(
                reason="GH#33125 astype doesn't recognize data.dtype"
            )
            request.node.add_marker(mark)
        super().test_groupby_extension_apply(data_for_grouping, groupby_apply_op)

Just running the full test suite now. Once that's done I'll push.

@tomaszps
Contributor Author

Figured out that the linting was what was causing the failure; should've looked more closely at the details. Fixed the issue.

@jorisvandenbossche jorisvandenbossche added this to the 1.2 milestone Sep 19, 2020
Member

@jorisvandenbossche jorisvandenbossche left a comment

Last comments!

tomaszps and others added 2 commits September 21, 2020 07:43
Co-authored-by: Joris Van den Bossche <[email protected]>
@jorisvandenbossche jorisvandenbossche merged commit ace1dd5 into pandas-dev:master Sep 22, 2020
@jorisvandenbossche
Member

@tomaszps Thanks a lot for the contribution!

kesmit13 pushed a commit to kesmit13/pandas that referenced this pull request Nov 2, 2020
Labels
ExtensionArray Extending pandas with custom dtypes or arrays. Testing pandas testing functions or related to the test suite
Development

Successfully merging this pull request may close these issues.

Add base test for ExtensionArray.astype and copy
4 participants