BUG: replace of numeric by string / dtype conversion (GH15743) #15812

Closed
wants to merge 12 commits into from
2 changes: 1 addition & 1 deletion RELEASE.md
@@ -1,6 +1,6 @@
 Release Notes
 =============

-The list of changes to pandas between each release can be found
+The list of changes to Pandas between each release can be found
 [here](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html). For full
 details, see the commit logs at http://github.com/pandas-dev/pandas.
2 changes: 2 additions & 0 deletions doc/source/whatsnew/v0.20.0.txt
@@ -985,3 +985,5 @@ Bug Fixes
 - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
 - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
 - Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
+

Contributor

FYI for the future: if you put this entry somewhere in the middle of the Bug Fixes section rather than at the end, you won't get merge conflicts (we leave blank lines there for this purpose).

+- Bug in ``Series.replace`` which replaced a numeric by string (:issue:`15743`)
15 changes: 8 additions & 7 deletions pandas/core/missing.py
@@ -21,11 +21,16 @@ def mask_missing(arr, values_to_mask):
     Return a masking array of same size/shape as arr
     with entries equaling any member of values_to_mask set to True
     """
-    if not isinstance(values_to_mask, (list, np.ndarray)):
+    if isinstance(values_to_mask, np.ndarray):
+        mask_type = values_to_mask.dtype.type
+    elif isinstance(values_to_mask, list):
Contributor

you can change this entire test to:

# import at top if it's not
from pandas._libs.lib import infer_dtype
....
inferred = infer_dtype(values_to_mask)
if inferred in ['string', 'unicode']:
    mask_type = np.object
else:
    mask_type = np.asarray(values_to_mask).dtype

I think this will work.

Contributor

may need to include 'mixed' here as well, and test this too:

mixed is e.g. [1, '1']
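
To see which labels are in play here, the inference can be run directly. This is a minimal sketch that uses the public `pandas.api.types.infer_dtype` rather than the private `pandas._libs.lib` import from the suggestion (an assumption about which entry point is convenient). Note that, in the pandas versions I am aware of, a list such as `[1, '1']` is reported as `'mixed-integer'` rather than plain `'mixed'`, so a membership check may need to cover both labels.

```python
from pandas.api.types import infer_dtype

# Homogeneous lists get a single-type label.
print(infer_dtype(['a', 'b']))   # string
print(infer_dtype([1, 2, 3]))    # integer

# An integer mixed with a string is labelled 'mixed-integer'.
print(infer_dtype([1, '1']))     # mixed-integer

# A zero-length list has its own label, so no element needs to be inspected.
print(infer_dtype([]))           # empty
```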

Author

Is this change only a simplification, or is it a must-do? I ask because I implemented it and it broke tests. I tried to investigate why, but I don't understand it yet.

Contributor

What did this break?

Yes, testing only the first value is wrong (the list could be 0-length), and it might contain mixed values anyhow.

Can you show me a test that broke?
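
Both objections to the first-element check can be reproduced in isolation. The sketch below uses a hypothetical helper, `first_element_type`, that mirrors the `type(values_to_mask[0])` line from the diff; it is not part of pandas.

```python
def first_element_type(values):
    # Hypothetical stand-in for `mask_type = type(values_to_mask[0])`.
    return type(values[0])


# Mixed input: only the first element is inspected, so the str is missed.
mixed_result = first_element_type([1, '1'])
print(mixed_result)  # <class 'int'>

# Zero-length input: there is no first element to inspect.
try:
    first_element_type([])
    empty_failed = False
except IndexError:
    empty_failed = True
print(empty_failed)  # True
```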

Author

We could build on what I wrote and just add the mixed support. Anyway, following your approach, the beginning of the function is this:

def mask_missing(arr, values_to_mask):
    """
    Return a masking array of same size/shape as arr
    with entries equaling any member of values_to_mask set to True
    """
    inferred = infer_dtype(values_to_mask)
    if inferred in ['string', 'unicode']:
        mask_type = np.object
    else:
        mask_type = np.asarray(values_to_mask).dtype

    if not isinstance(values_to_mask, (list, np.ndarray)):
        values_to_mask = [values_to_mask]

    try:
        values_to_mask = np.array(values_to_mask, dtype=mask_type)
    except Exception:
        values_to_mask = np.array(values_to_mask, dtype=object)
...

This breaks the following tests:
Here's the output:

/Users/carlos/anaconda/envs/pandas_dev/bin/python3.6 "/Users/carlos/Library/Application Support/IntelliJIdea2017.1/python/helpers/pycharm/_jb_pytest_runner.py" --path /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
Testing started at 21:32 ...
 Launching py.test with arguments /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
============================= test session starts ==============================
platform darwin -- Python 3.6.0, pytest-3.0.7, py-1.4.32, pluggy-0.4.0
rootdir: /Users/carlos/Dropbox/opensource/pandas-ucals, inifile: setup.cfg
plugins: cov-2.3.1
collected 11 items
 
pandas/tests/series/test_replace.py       F 
pandas/tests/series/test_replace.py:12 (TestSeriesReplace.test_replace)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace>

    def test_replace(self):
        N = 100
        ser = pd.Series(np.random.randn(N))
        ser[0:4] = np.nan
        ser[6:10] = 0
    
        # replace list with a single value
        ser.replace([np.nan], -1, inplace=True)
    
        exp = ser.fillna(-1)
        tm.assert_series_equal(ser, exp)
    
        rs = ser.replace(0., np.nan)
        ser[ser == 0.] = np.nan
>       tm.assert_series_equal(rs, ser)

pandas/tests/series/test_replace.py:27: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Series', message = 'Series values are different (4.0 %)'
left = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
right = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Series are different
E       
E       Series values are different (4.0 %)
E       [left]:  [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]
E       [right]: [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]

pandas/util/testing.py:1053: AssertionError
F 
pandas/tests/series/test_replace.py:189 (TestSeriesReplace.test_replace2)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace2>

    def test_replace2(self):
        N = 100
        ser = pd.Series(np.fabs(np.random.randn(N)), tm.makeDateIndex(N),
                        dtype=object)
        ser[:5] = np.nan
        ser[6:10] = 'foo'
        ser[20:30] = 'bar'
    
        # replace list with a single value
        rs = ser.replace([np.nan, 'foo', 'bar'], -1)
    
>       self.assertTrue((rs[:5] == -1).all())
E       AssertionError: False is not true

pandas/tests/series/test_replace.py:201: AssertionError
F 
pandas/tests/series/test_replace.py:178 (TestSeriesReplace.test_replace_bool_with_bool)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_bool>

    def test_replace_bool_with_bool(self):
        s = pd.Series([True, False, True])
        result = s.replace(True, False)
        expected = pd.Series([False] * len(s))
>       tm.assert_series_equal(expected, result)

pandas/tests/series/test_replace.py:183: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Series', message = 'Series values are different (66.66667 %)'
left = '[False, False, False]', right = '[True, False, True]', diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Series are different
E       
E       Series values are different (66.66667 %)
E       [left]:  [False, False, False]
E       [right]: [True, False, True]

pandas/util/testing.py:1053: AssertionError
F 
pandas/tests/series/test_replace.py:171 (TestSeriesReplace.test_replace_bool_with_string)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_string>

    def test_replace_bool_with_string(self):
        # nonexistent elements
        s = pd.Series([True, False, True])
        result = s.replace(True, '2u')
        expected = pd.Series(['2u', False, '2u'])
>       tm.assert_series_equal(expected, result)

pandas/tests/series/test_replace.py:177: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1188: in assert_series_equal
    assert_attr_equal('dtype', left, right)
pandas/util/testing.py:918: in assert_attr_equal
    left_attr, right_attr)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Attributes', message = 'Attribute "dtype" are different'
left = dtype('O'), right = dtype('bool'), diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Attributes are different
E       
E       Attribute "dtype" are different
E       [left]:  object
E       [right]: bool

pandas/util/testing.py:1053: AssertionError
. . F 
pandas/tests/series/test_replace.py:123 (TestSeriesReplace.test_replace_mixed_types)
self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_mixed_types>

    def test_replace_mixed_types(self):
        s = pd.Series(np.arange(5), dtype='int64')
    
        def check_replace(to_rep, val, expected):
            sc = s.copy()
            r = s.replace(to_rep, val)
            sc.replace(to_rep, val, inplace=True)
            tm.assert_series_equal(expected, r)
            tm.assert_series_equal(expected, sc)
    
        # MUST upcast to float
        e = pd.Series([0., 1., 2., 3., 4.])
        tr, v = [3], [3.0]
        check_replace(tr, v, e)
    
        # MUST upcast to float
        e = pd.Series([0, 1, 2, 3.5, 4])
        tr, v = [3], [3.5]
        check_replace(tr, v, e)
    
        # casts to object
        e = pd.Series([0, 1, 2, 3.5, 'a'])
        tr, v = [3, 4], [3.5, 'a']
        check_replace(tr, v, e)
    
        # again casts to object
        e = pd.Series([0, 1, 2, 3.5, pd.Timestamp('20130101')])
        tr, v = [3, 4], [3.5, pd.Timestamp('20130101')]
        check_replace(tr, v, e)
    
        # casts to object
        e = pd.Series([0, 1, 2, 3.5, True], dtype='object')
        tr, v = [3, 4], [3.5, True]
        check_replace(tr, v, e)
    
        # test an object with dates + floats + integers + strings
        dr = pd.date_range('1/1/2001', '1/10/2001',
                           freq='D').to_series().reset_index(drop=True)
        result = dr.astype(object).replace(
            [dr[0], dr[1], dr[2]], [1.0, 2, 'a'])
        expected = pd.Series([1.0, 2, 'a'] + dr[3:].tolist(), dtype=object)
>       tm.assert_series_equal(result, expected)

pandas/tests/series/test_replace.py:165: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
pandas/util/testing.py:1215: in assert_series_equal
    obj='{0}'.format(obj))
ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
    ???
ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

obj = 'Series', message = 'Series values are different (30.0 %)'
left = '[2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
right = '[1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
diff = None

    def raise_assert_detail(obj, message, left, right, diff=None):
        if isinstance(left, np.ndarray):
            left = pprint_thing(left)
        if isinstance(right, np.ndarray):
            right = pprint_thing(right)
    
        msg = """{0} are different
    
    {1}
    [left]:  {2}
    [right]: {3}""".format(obj, message, left, right)
    
        if diff is not None:
            msg = msg + "\n[diff]: {diff}".format(diff=diff)
    
>       raise AssertionError(msg)
E       AssertionError: Series are different
E       
E       Series values are different (30.0 %)
E       [left]:  [2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]
E       [right]: [1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]

pandas/util/testing.py:1053: AssertionError
. . . .                

====================== 5 failed, 6 passed in 0.54 seconds ======================
     
Process finished with exit code 0

I could invest time in finding out why those 5 tests are now failing and then tackle the mixed support, or just build on my approach and add only the mixed support. Anyway, I'm here to learn; let me know which approach is best and I'll follow it. Thanks.
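
One thing worth checking while investigating (a possible contributor, not a confirmed diagnosis of the failures above) is what `np.asarray` infers for a list that mixes NaN or numbers with strings. NumPy picks a single unicode dtype for such a list, so NaN becomes the string `'nan'` and integers become digit strings. The variable names below are illustrative only.

```python
import numpy as np

# The replacement list from test_replace2: NaN mixed with strings.
as_array = np.asarray([np.nan, 'foo', 'bar'])
print(as_array.dtype.kind)    # U (a unicode string dtype)
print(as_array[0] == 'nan')   # True: the float NaN is now a string

# An integer mixed with a string is stringified as well.
mixed = np.asarray([1, '1'])
print(mixed.tolist())         # ['1', '1']

# Forcing dtype=object keeps every element as the original Python object.
as_object = np.asarray([np.nan, 'foo', 'bar'], dtype=object)
print(isinstance(as_object[0], float))  # True
```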

+        mask_type = type(values_to_mask[0])
+    else:
+        mask_type = type(values_to_mask)
         values_to_mask = [values_to_mask]

     try:
-        values_to_mask = np.array(values_to_mask, dtype=arr.dtype)
+        values_to_mask = np.array(values_to_mask, dtype=mask_type)
     except Exception:
         values_to_mask = np.array(values_to_mask, dtype=object)
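
The effect of the one-line change in the `try` block can be seen with plain NumPy. This is a sketch, assuming the array being searched is `int64` and the value to mask is the string `'2'` as in GH 15743; `arr`, `coerced` and `kept` are illustrative names.

```python
import numpy as np

arr = np.array([1, 2, 3], dtype=np.int64)

# Before: the value is cast to the array's dtype, so '2' silently becomes 2
# and would then match the second element.
coerced = np.array(['2'], dtype=arr.dtype)
print(coerced.dtype, coerced[0])  # int64 2

# After: the value keeps its own type (here str), so it stays a string.
kept = np.array(['2'], dtype=type('2'))
print(kept.dtype.kind, kept[0])   # U 2
```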

@@ -409,7 +414,7 @@ def interpolate_2d(values, method='pad', axis=0, limit=None, fill_value=None,
         if axis != 0:  # pragma: no cover
             raise AssertionError("cannot interpolate on a ndim == 1 with "
                                  "axis != 0")
-        values = values.reshape(tuple((1, ) + values.shape))
+        values = values.reshape(tuple((1,) + values.shape))

     if fill_value is None:
         mask = None
@@ -447,7 +452,6 @@ def wrapper(arr, mask, limit=None):


 def pad_1d(values, limit=None, mask=None, dtype=None):
-

Contributor

We normally don't like to edit things not associated with the PR (e.g. you may have an editor setting that changes this)... no big deal.

Author

OK, sorry about that. I'm using IntelliJ IDEA, and it reformatted the whole file to the PEP8 standard.

Contributor

No problem. We don't quite follow PEP8 (as flake8 doesn't, actually).

     if dtype is None:
         dtype = values.dtype
     _method = None
@@ -472,7 +476,6 @@ def pad_1d(values, limit=None, mask=None, dtype=None):


 def backfill_1d(values, limit=None, mask=None, dtype=None):
-
     if dtype is None:
         dtype = values.dtype
     _method = None
@@ -498,7 +501,6 @@ def backfill_1d(values, limit=None, mask=None, dtype=None):


 def pad_2d(values, limit=None, mask=None, dtype=None):
-
     if dtype is None:
         dtype = values.dtype
     _method = None
@@ -528,7 +530,6 @@ def pad_2d(values, limit=None, mask=None, dtype=None):


 def backfill_2d(values, limit=None, mask=None, dtype=None):
-
     if dtype is None:
         dtype = values.dtype
     _method = None
8 changes: 7 additions & 1 deletion pandas/tests/series/test_replace.py
@@ -10,7 +10,6 @@


 class TestSeriesReplace(TestData, tm.TestCase):
-
     def test_replace(self):
         N = 100
         ser = pd.Series(np.random.randn(N))
@@ -227,3 +226,10 @@ def test_replace_with_empty_dictlike(self):
         s = pd.Series(list('abcd'))
         tm.assert_series_equal(s, s.replace(dict()))
         tm.assert_series_equal(s, s.replace(pd.Series([])))
+
+    def test_replace_string_with_nan(self):

Contributor

Can you test this with unicode as well?

Author

Done!

+        # GH 15743
+        s = pd.Series([1, 2, 3])
+        result = s.replace('2', np.nan)
+        expected = pd.Series([1, 2, 3])
+        tm.assert_series_equal(expected, result)
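
The unicode variant requested in review is not shown on this page. A hypothetical version could look like the sketch below; the function name and the use of `pd.testing.assert_series_equal` in place of the test suite's `tm` helper are assumptions, and the expected result presumes a pandas build that includes this fix.

```python
import numpy as np
import pandas as pd


def check_replace_unicode_with_nan():
    # GH 15743: a string that merely looks like a number must not match it.
    s = pd.Series([1, 2, 3])
    result = s.replace(u'2', np.nan)
    expected = pd.Series([1, 2, 3])
    pd.testing.assert_series_equal(expected, result)
    return result


result = check_replace_unicode_with_nan()
```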