-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Remove possibly illegal test data #31146
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove possibly illegal test data #31146
Conversation
… tests Regain test coverage after the previous commit's removals
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm.
https://github.com/pandas-dev/pandas/pull/31146/checks?check_run_id=398501596
linting issue, ping on green.
Looks like you just need to remove the |
From the checks bot: Check import format using isort No other errors, so not sure what that's supposed to mean? The build bots fail with: Git checkout failed with exit code: 128 |
@rebecca-palmer you should install the pre-commit hooks to help you on PRs, they automatically do black,isort,mypy, do |
Thanks @rebecca-palmer |
assert not any(s.isna().any() for _, s in df.items()) | ||
|
||
@pytest.mark.slow | ||
def test_thousands_macau_index_col(self, datapath, request): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was there no replacement for this test? It highlight behaviour we where hoping to fix in #29622 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I seem to have misinterpreted this issue as "blank column names should not load as NaN", after observing that empty headers loaded as "Unnamed: n" while empty dtype=str body cells loaded as NaN. (I hence added "assert "Unnamed" in result.columns[-1]" to check for that.)
I now suspect this was because I was testing this in bs4 4.8.
wikipedia_states has empty body cells, so re-adding this check should be easy. (Possibly relevant to the "what caused this" discussion: it also has nested tables but they aren't the one being read.)
@meeseeksdev backport to 1.0.x |
Co-authored-by: rebecca-palmer <[email protected]>
Much of the test_html data looks like it was saved from real web pages, and not all of them look like the kind that are under a free software license. (I couldn't actually check because only one of the three identifies the source site, and that site raises a security warning.)
This removes the questionable data, and adds tests that try to do something equivalent on known-free data.
If I am mistaken and these pages are under a free license, please instead document that.
This data was added in:
1d573b4 computer_sales_page (HP), for MultiIndex header and empty string handling in headers (should give empty not NaN)
e22fe1b nyse_wsj, for thousands separator
92aa277 macau, for MultiIndex header and thousands separator