BUG: replace of numeric by string / dtype conversion (GH15743) #15812
Changes from 6 commits
676a4e5
e12bca7
9fc617b
8b463cb
080c71e
e62763c
97e1f18
0a98557
45e67e4
73805ce
bd31b2b
e6e4971
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,6 +1,6 @@ | ||
Release Notes | ||
============= | ||
|
||
The list of changes to pandas between each release can be found | ||
The list of changes to Pandas between each release can be found | ||
[here](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html). For full | ||
details, see the commit logs at http://github.com/pandas-dev/pandas. |
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -21,11 +21,16 @@ def mask_missing(arr, values_to_mask): | |
Return a masking array of same size/shape as arr | ||
with entries equaling any member of values_to_mask set to True | ||
""" | ||
if not isinstance(values_to_mask, (list, np.ndarray)): | ||
if isinstance(values_to_mask, np.ndarray): | ||
mask_type = values_to_mask.dtype.type | ||
elif isinstance(values_to_mask, list): | ||
you can change this entire test to:

I think this will work.

may need to include 'mixed' here as well, and test this too: mixed is

Is this change only to simplify, or is it a must-do? I ask because I implemented it and it broke all tests. I tried to investigate why, but I don't understand it yet.

What did this break? Yes, testing the first value is wrong (as it could also be 0-len); further, it might have mixed values anyhow. Show me a test that broke?

We could build on what I wrote and just add the mixed support. Anyway, following your approach, the beginning of the function is this:

    def mask_missing(arr, values_to_mask):
        """
        Return a masking array of same size/shape as arr
        with entries equaling any member of values_to_mask set to True
        """
        inferred = infer_dtype(values_to_mask)
        if inferred in ['string', 'unicode']:
            mask_type = np.object
        else:
            mask_type = np.asarray(values_to_mask).dtype
        if not isinstance(values_to_mask, (list, np.ndarray)):
            values_to_mask = [values_to_mask]
        try:
            values_to_mask = np.array(values_to_mask, dtype=mask_type)
        except Exception:
            values_to_mask = np.array(values_to_mask, dtype=object)
        ...

This breaks the following tests:

Here's the output:

I could invest time to find out why those 5 tests are now failing, and then tackle the mixed support. Or I could just build on my approach and only tackle the mixed support. Anyway, I'm here to learn; let me know what the best approach is and I'll follow it. Thanks.
||
mask_type = type(values_to_mask[0]) | ||
else: | ||
mask_type = type(values_to_mask) | ||
values_to_mask = [values_to_mask] | ||
|
||
try: | ||
values_to_mask = np.array(values_to_mask, dtype=arr.dtype) | ||
values_to_mask = np.array(values_to_mask, dtype=mask_type) | ||
except Exception: | ||
values_to_mask = np.array(values_to_mask, dtype=object) | ||
|
||
|
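The review thread on this hunk objects that inspecting only the first element is fragile: the list may be empty, and it may hold mixed types. The following is a minimal pure-Python sketch of that concern, not code from the PR. Both helper names are hypothetical, and the sketch deliberately leaves out the `np.ndarray` branch of the diff.

```python
def first_element_type(values_to_mask):
    """Hypothetical helper mirroring the list/scalar branches of the diff."""
    if isinstance(values_to_mask, list):
        # Raises IndexError for [] and ignores every element after the first.
        return type(values_to_mask[0])
    return type(values_to_mask)


def element_types(values_to_mask):
    """Hypothetical alternative: collect every distinct element type."""
    if not isinstance(values_to_mask, list):
        values_to_mask = [values_to_mask]
    return {type(v) for v in values_to_mask}


# Only the first element is inspected, so the str in the list goes unnoticed.
print(first_element_type([1, "a"]))

# Every element is inspected, so mixed input is visible to the caller.
print(element_types([1, "a"]))

# Empty input yields an empty set instead of raising.
print(element_types([]))
```

A caller that sees more than one type in the result could fall back to `object`, which is roughly what the `except Exception` branch in the diff does after the fact.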
@@ -409,7 +414,7 @@ def interpolate_2d(values, method='pad', axis=0, limit=None, fill_value=None, | |
if axis != 0: # pragma: no cover | ||
raise AssertionError("cannot interpolate on a ndim == 1 with " | ||
"axis != 0") | ||
values = values.reshape(tuple((1, ) + values.shape)) | ||
values = values.reshape(tuple((1,) + values.shape)) | ||
|
||
if fill_value is None: | ||
mask = None | ||
|
@@ -447,7 +452,6 @@ def wrapper(arr, mask, limit=None): | |
|
||
|
||
def pad_1d(values, limit=None, mask=None, dtype=None): | ||
|
||
normally don't like to edit things not associated with the PR (e.g. you may have some editor setting which changes this)... no big deal

Ok, sorry for that. I'm using IntelliJ IDEA, and it reformatted the whole file to the PEP8 standard.

no problem. we don't quite follow PEP8 (as flake8 doesn't, actually).
||
if dtype is None: | ||
dtype = values.dtype | ||
_method = None | ||
|
@@ -472,7 +476,6 @@ def pad_1d(values, limit=None, mask=None, dtype=None): | |
|
||
|
||
def backfill_1d(values, limit=None, mask=None, dtype=None): | ||
|
||
if dtype is None: | ||
dtype = values.dtype | ||
_method = None | ||
|
@@ -498,7 +501,6 @@ def backfill_1d(values, limit=None, mask=None, dtype=None): | |
|
||
|
||
def pad_2d(values, limit=None, mask=None, dtype=None): | ||
|
||
if dtype is None: | ||
dtype = values.dtype | ||
_method = None | ||
|
@@ -528,7 +530,6 @@ def pad_2d(values, limit=None, mask=None, dtype=None): | |
|
||
|
||
def backfill_2d(values, limit=None, mask=None, dtype=None): | ||
|
||
if dtype is None: | ||
dtype = values.dtype | ||
_method = None | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -10,7 +10,6 @@ | |
|
||
|
||
class TestSeriesReplace(TestData, tm.TestCase): | ||
|
||
def test_replace(self): | ||
N = 100 | ||
ser = pd.Series(np.random.randn(N)) | ||
|
@@ -227,3 +226,10 @@ def test_replace_with_empty_dictlike(self): | |
s = pd.Series(list('abcd')) | ||
tm.assert_series_equal(s, s.replace(dict())) | ||
tm.assert_series_equal(s, s.replace(pd.Series([]))) | ||
|
||
def test_replace_string_with_nan(self): | ||
can you test this with unicode as well

done!
||
# GH 15743 | ||
s = pd.Series([1, 2, 3]) | ||
result = s.replace('2', np.nan) | ||
expected = pd.Series([1, 2, 3]) | ||
tm.assert_series_equal(expected, result) |
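The test above expects the string `'2'` not to match the integer `2`. The coercion behind the bug can be reproduced with NumPy alone. This is a small sketch assuming only that NumPy is installed; the variable names `coerced` and `kept` are illustrative and do not appear in the PR.

```python
import numpy as np

arr = np.array([1, 2, 3])

# Line removed by the diff: cast the value to replace to the array's dtype.
# NumPy parses the string '2' as the integer 2, so it would match an element.
coerced = np.array(['2'], dtype=arr.dtype)

# Line added by the diff: cast using the type of the value itself,
# so the string stays a string and no integer element can equal it.
kept = np.array(['2'], dtype=type('2'))

print(coerced, coerced.dtype)
print(kept, kept.dtype)
```

With the first cast, the value to mask is indistinguishable from the integer already in the series, which is why `s.replace('2', np.nan)` changed the data before the fix.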
FYI for the future: if you put this somewhere in the Bug Fixes section, rather than at the end, you won't have merge conflicts (we have blank lines for this purpose).