BUG: replace of numeric by string / dtype conversion (GH15743) #15812

jreback · 2017-03-27T12:08:45Z

FYI, for the future: if you put this somewhere in the Bug Fixes section, rather than at the end, you won't have merge conflicts (we have blank lines for this purpose).

jreback · 2017-03-27T12:15:27Z

you can change this entire test to:

    # import at top if its not
    from pandas._libs.lib import infer_dtype

    ....

    inferred = infer_dtype(values_to_mask)
    if inferred in ['string', 'unicode']:
        mask_type = np.object
    else:
        mask_type = np.asarray(values_to_mask).dtype

I think this will work.

may need to include 'mixed' here as well, and test this too:

mixed is [1, '1']
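As a rough illustration of the suggestion above, the sketch below wraps the dtype choice in a hypothetical helper, `choose_mask_dtype` (not part of pandas). It assumes the public import path `pandas.api.types.infer_dtype` rather than the private `pandas._libs.lib` one, and uses plain `object` in place of `np.object`, which newer NumPy releases no longer provide. Current pandas reports a list such as `[1, '1']` as `'mixed-integer'`, so that label is listed next to `'mixed'`.

```python
import numpy as np
from pandas.api.types import infer_dtype


def choose_mask_dtype(values_to_mask):
    """Hypothetical helper: pick the dtype used to hold the values to mask."""
    inferred = infer_dtype(values_to_mask)
    # Strings and string/number mixes stay as Python objects,
    # so a value like '1' is never coerced to the number 1.
    if inferred in ('string', 'unicode', 'mixed', 'mixed-integer'):
        return np.dtype(object)
    # Everything else keeps the dtype NumPy would infer on its own.
    return np.asarray(values_to_mask).dtype


for values in (['a', 'b'], [1, '1'], [1, 2, 3], [0.5, 1.5]):
    print(values, infer_dtype(values), choose_mask_dtype(values))
```

Whether `'mixed-integer-float'` (for example `[1, 2.5]`) should also map to `object` is left open here; the sketch lets NumPy pick a float dtype for it.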

Is this change only to simplify, or is it a must-do? I ask because I implemented it and it broke the tests. I tried to investigate why, but I don't understand it yet.

what did this break?

yes, testing the first value is wrong (the list could also be 0-len); further, it might have mixed values anyhow.
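The two objections can be reproduced without pandas. The sketch below uses two hypothetical helpers, `first_element_type` and `describe`, that copy only the `type(values_to_mask[0])` branch of the diff; they are illustrative names, not code from the PR.

```python
import numpy as np


def first_element_type(values_to_mask):
    """Hypothetical helper copying the `type(values_to_mask[0])` branch."""
    return type(values_to_mask[0])


def describe(values_to_mask):
    """Return the first-element type name, or the error name if there is no first element."""
    try:
        return first_element_type(values_to_mask).__name__
    except IndexError as err:
        return type(err).__name__


print(describe([1, 2]))    # int
print(describe([1, '1']))  # int, although the list also holds a str
print(describe([]))        # IndexError

# An object array keeps every element exactly as it was passed in.
kept = np.array([1, '1'], dtype=object)
print([type(v).__name__ for v in kept])  # ['int', 'str']
```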

show me a test that broke?

We could build on what I wrote and just add the mixed support. Anyway, following your approach, the beginning of the function is this:

    def mask_missing(arr, values_to_mask):
        """
        Return a masking array of same size/shape as arr
        with entries equaling any member of values_to_mask set to True
        """
        inferred = infer_dtype(values_to_mask)
        if inferred in ['string', 'unicode']:
            mask_type = np.object
        else:
            mask_type = np.asarray(values_to_mask).dtype

        if not isinstance(values_to_mask, (list, np.ndarray)):
            values_to_mask = [values_to_mask]

        try:
            values_to_mask = np.array(values_to_mask, dtype=mask_type)
        except Exception:
            values_to_mask = np.array(values_to_mask, dtype=object)

        ...

This breaks the following tests:

Here's the output:

    /Users/carlos/anaconda/envs/pandas_dev/bin/python3.6 "/Users/carlos/Library/Application Support/IntelliJIdea2017.1/python/helpers/pycharm/_jb_pytest_runner.py" --path /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
    Testing started at 21:32 ...
    Launching py.test with arguments /Users/carlos/Dropbox/opensource/pandas-ucals/pandas/tests/series/test_replace.py
    ============================= test session starts ==============================
    platform darwin -- Python 3.6.0, pytest-3.0.7, py-1.4.32, pluggy-0.4.0
    rootdir: /Users/carlos/Dropbox/opensource/pandas-ucals, inifile: setup.cfg
    plugins: cov-2.3.1
    collected 11 items

    pandas/tests/series/test_replace.py F
    pandas/tests/series/test_replace.py:12 (TestSeriesReplace.test_replace)
    self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace>

        def test_replace(self):
            N = 100
            ser = pd.Series(np.random.randn(N))
            ser[0:4] = np.nan
            ser[6:10] = 0

            # replace list with a single value
            ser.replace([np.nan], -1, inplace=True)

            exp = ser.fillna(-1)
            tm.assert_series_equal(ser, exp)

            rs = ser.replace(0., np.nan)
            ser[ser == 0.] = np.nan
    >       tm.assert_series_equal(rs, ser)

    pandas/tests/series/test_replace.py:27:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    pandas/util/testing.py:1215: in assert_series_equal
        obj='{0}'.format(obj))
    ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
        ???
    ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    obj = 'Series', message = 'Series values are different (4.0 %)'
    left = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
    right = '[-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.006215...722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]'
    diff = None

        def raise_assert_detail(obj, message, left, right, diff=None):
            if isinstance(left, np.ndarray):
                left = pprint_thing(left)
            if isinstance(right, np.ndarray):
                right = pprint_thing(right)

            msg = """{0} are different

        {1}
        [left]: {2}
        [right]: {3}""".format(obj, message, left, right)

            if diff is not None:
                msg = msg + "\n[diff]: {diff}".format(diff=diff)

    >       raise AssertionError(msg)
    E       AssertionError: Series are different
    E
    E       Series values are different (4.0 %)
    E       [left]: [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, 0.0, 0.0, 0.0, 0.0, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]
    E       [right]: [-1.0, -1.0, -1.0, -1.0, -1.74439069784, 0.800838457366, nan, nan, nan, nan, 0.967334209683, -1.12749699126, 1.00621520732, 0.467115769273, -0.665495302938, -1.9655758973, 0.314295658919, -1.5728548579, 1.60539543955, 1.20132044052, -0.267834389937, -1.3125275111, 0.827027080809, -0.750655389751, -0.646701964354, -0.564806568125, 1.04153633485, -0.175289544241, -0.771798272938, -0.353146592188, -0.895526823358, -0.229003615743, -1.24668695712, -0.396975143203, 1.28664372671, 1.43113842599, 0.954652683573, 1.21141700331, -1.15516473451, 2.14816148205, 1.0492538281, -0.36137923595, -0.750632548499, -0.24502818186, 0.651587577021, -1.33034613473, 0.446654064159, -0.216192740252, -0.988088651194, 0.341802605183, 0.7488135734, -0.596658039592, -0.759760465904, 0.650746773025, 1.47640000528, -0.963593630477, -0.264742407812, 0.91147138281, -0.116493770275, -0.840843917606, 0.713860639926, -0.999446407034, -0.261993101942, 0.660244548292, 0.283304496904, 0.417297181001, 1.13236254504, -1.04559448586, -0.302416962494, 1.06231513633, 0.0376809290172, -0.00528160487426, -0.753751886674, -1.76853768804, 1.05207654029, 0.646266446052, -0.817276175661, 0.347974618646, 2.49401568105, -1.59727151377, 0.637718637115, 0.445203010849, 1.6222785846, 0.397953946747, 0.810931905513, -0.244945263003, 1.09902523539, 1.5024980885, -0.189142680513, -1.0871214807, -0.216461016432, -0.395180231199, -0.466997134722, -0.383566928512, -0.625996793246, 0.647007259928, 1.96797576966, -1.99782584579, 0.733212757326, -0.444315911557]

    pandas/util/testing.py:1053: AssertionError
    F
    pandas/tests/series/test_replace.py:189 (TestSeriesReplace.test_replace2)
    self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace2>

        def test_replace2(self):
            N = 100
            ser = pd.Series(np.fabs(np.random.randn(N)), tm.makeDateIndex(N),
                            dtype=object)
            ser[:5] = np.nan
            ser[6:10] = 'foo'
            ser[20:30] = 'bar'

            # replace list with a single value
            rs = ser.replace([np.nan, 'foo', 'bar'], -1)

    >       self.assertTrue((rs[:5] == -1).all())
    E       AssertionError: False is not true

    pandas/tests/series/test_replace.py:201: AssertionError
    F
    pandas/tests/series/test_replace.py:178 (TestSeriesReplace.test_replace_bool_with_bool)
    self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_bool>

        def test_replace_bool_with_bool(self):
            s = pd.Series([True, False, True])
            result = s.replace(True, False)
            expected = pd.Series([False] * len(s))
    >       tm.assert_series_equal(expected, result)

    pandas/tests/series/test_replace.py:183:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    pandas/util/testing.py:1215: in assert_series_equal
        obj='{0}'.format(obj))
    ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
        ???
    ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    obj = 'Series', message = 'Series values are different (66.66667 %)'
    left = '[False, False, False]', right = '[True, False, True]', diff = None

        def raise_assert_detail(obj, message, left, right, diff=None):
            if isinstance(left, np.ndarray):
                left = pprint_thing(left)
            if isinstance(right, np.ndarray):
                right = pprint_thing(right)

            msg = """{0} are different

        {1}
        [left]: {2}
        [right]: {3}""".format(obj, message, left, right)

            if diff is not None:
                msg = msg + "\n[diff]: {diff}".format(diff=diff)

    >       raise AssertionError(msg)
    E       AssertionError: Series are different
    E
    E       Series values are different (66.66667 %)
    E       [left]: [False, False, False]
    E       [right]: [True, False, True]

    pandas/util/testing.py:1053: AssertionError
    F
    pandas/tests/series/test_replace.py:171 (TestSeriesReplace.test_replace_bool_with_string)
    self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_bool_with_string>

        def test_replace_bool_with_string(self):
            # nonexistent elements
            s = pd.Series([True, False, True])
            result = s.replace(True, '2u')
            expected = pd.Series(['2u', False, '2u'])
    >       tm.assert_series_equal(expected, result)

    pandas/tests/series/test_replace.py:177:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    pandas/util/testing.py:1188: in assert_series_equal
        assert_attr_equal('dtype', left, right)
    pandas/util/testing.py:918: in assert_attr_equal
        left_attr, right_attr)
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    obj = 'Attributes', message = 'Attribute "dtype" are different'
    left = dtype('O'), right = dtype('bool'), diff = None

        def raise_assert_detail(obj, message, left, right, diff=None):
            if isinstance(left, np.ndarray):
                left = pprint_thing(left)
            if isinstance(right, np.ndarray):
                right = pprint_thing(right)

            msg = """{0} are different

        {1}
        [left]: {2}
        [right]: {3}""".format(obj, message, left, right)

            if diff is not None:
                msg = msg + "\n[diff]: {diff}".format(diff=diff)

    >       raise AssertionError(msg)
    E       AssertionError: Attributes are different
    E
    E       Attribute "dtype" are different
    E       [left]: object
    E       [right]: bool

    pandas/util/testing.py:1053: AssertionError
    . . F
    pandas/tests/series/test_replace.py:123 (TestSeriesReplace.test_replace_mixed_types)
    self = <pandas.tests.series.test_replace.TestSeriesReplace testMethod=test_replace_mixed_types>

        def test_replace_mixed_types(self):
            s = pd.Series(np.arange(5), dtype='int64')

            def check_replace(to_rep, val, expected):
                sc = s.copy()
                r = s.replace(to_rep, val)
                sc.replace(to_rep, val, inplace=True)
                tm.assert_series_equal(expected, r)
                tm.assert_series_equal(expected, sc)

            # MUST upcast to float
            e = pd.Series([0., 1., 2., 3., 4.])
            tr, v = [3], [3.0]
            check_replace(tr, v, e)

            # MUST upcast to float
            e = pd.Series([0, 1, 2, 3.5, 4])
            tr, v = [3], [3.5]
            check_replace(tr, v, e)

            # casts to object
            e = pd.Series([0, 1, 2, 3.5, 'a'])
            tr, v = [3, 4], [3.5, 'a']
            check_replace(tr, v, e)

            # again casts to object
            e = pd.Series([0, 1, 2, 3.5, pd.Timestamp('20130101')])
            tr, v = [3, 4], [3.5, pd.Timestamp('20130101')]
            check_replace(tr, v, e)

            # casts to object
            e = pd.Series([0, 1, 2, 3.5, True], dtype='object')
            tr, v = [3, 4], [3.5, True]
            check_replace(tr, v, e)

            # test an object with dates + floats + integers + strings
            dr = pd.date_range('1/1/2001', '1/10/2001',
                               freq='D').to_series().reset_index(drop=True)
            result = dr.astype(object).replace(
                [dr[0], dr[1], dr[2]], [1.0, 2, 'a'])
            expected = pd.Series([1.0, 2, 'a'] + dr[3:].tolist(), dtype=object)
    >       tm.assert_series_equal(result, expected)

    pandas/tests/series/test_replace.py:165:
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    pandas/util/testing.py:1215: in assert_series_equal
        obj='{0}'.format(obj))
    ls/pandas/util/testing.pyx:59: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:4156)
        ???
    ls/pandas/util/testing.pyx:173: in pandas.util.libtesting.assert_almost_equal (pandas/util/testing.c:3274)
        ???
    _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
    obj = 'Series', message = 'Series values are different (30.0 %)'
    left = '[2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
    right = '[1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]'
    diff = None

        def raise_assert_detail(obj, message, left, right, diff=None):
            if isinstance(left, np.ndarray):
                left = pprint_thing(left)
            if isinstance(right, np.ndarray):
                right = pprint_thing(right)

            msg = """{0} are different

        {1}
        [left]: {2}
        [right]: {3}""".format(obj, message, left, right)

            if diff is not None:
                msg = msg + "\n[diff]: {diff}".format(diff=diff)

    >       raise AssertionError(msg)
    E       AssertionError: Series are different
    E
    E       Series values are different (30.0 %)
    E       [left]: [2001-01-01 00:00:00, 2001-01-02 00:00:00, 2001-01-03 00:00:00, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]
    E       [right]: [1.0, 2, a, 2001-01-04 00:00:00, 2001-01-05 00:00:00, 2001-01-06 00:00:00, 2001-01-07 00:00:00, 2001-01-08 00:00:00, 2001-01-09 00:00:00, 2001-01-10 00:00:00]

    pandas/util/testing.py:1053: AssertionError
    . . . .
    ====================== 5 failed, 6 passed in 0.54 seconds ======================
    Process finished with exit code 0

I could invest time in finding out why those 5 tests are now failing and then tackle the mixed support, or just build on my approach and only tackle the mixed support. Anyway, I'm here to learn; let me know which approach is best and I'll follow it. Thanks.
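For reference while comparing the two approaches, here is a minimal NumPy-only sketch of the masking idea. It is not the pandas implementation and does not explain the failures above. The name `mask_missing_sketch` is hypothetical, and the sketch assumes that wrapping scalars first and comparing element by element as Python objects is acceptable for illustration (it is far slower than a vectorised version).

```python
import numpy as np


def mask_missing_sketch(arr, values_to_mask):
    """Return a boolean mask, True where arr equals any value to mask."""
    arr = np.asarray(arr)
    # Wrap scalars first so every later step sees a sequence.
    if not isinstance(values_to_mask, (list, np.ndarray)):
        values_to_mask = [values_to_mask]
    # Object dtype keeps each value's own type: '2' stays a str, 2 stays an int.
    values = np.asarray(values_to_mask, dtype=object)
    elements = arr.astype(object).ravel()

    def matches(element, value):
        # NaN never compares equal to itself, so it needs its own check.
        if isinstance(value, float) and value != value:
            return isinstance(element, float) and element != element
        return bool(element == value)

    mask = np.zeros(arr.shape, dtype=bool)
    for value in values:
        hits = np.array([matches(e, value) for e in elements], dtype=bool)
        mask |= hits.reshape(arr.shape)
    return mask


print(mask_missing_sketch([1, 2, 3], '2'))                   # no element matches the string
print(mask_missing_sketch([1, 2, 3], [1, '1']))              # only the integer 1 matches
print(mask_missing_sketch([1.0, np.nan, 0.0], [np.nan, 0.0]))
```

One known gap: Python treats `True == 1`, which this sketch does not special-case.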

jreback · 2017-03-27T12:16:04Z

normally I don't like edits to things not associated with the PR (e.g. you may have some editor setting which changes this)... no big deal

Ok... Sorry about that... I'm using IntelliJ IDEA, and it reformatted the whole file to the PEP8 standard.

no problem. we don't quite follow PEP8 (flake8 doesn't either, actually)...

jreback · 2017-03-27T12:16:21Z

can you test this with unicode as well

@@ -1,6 +1,6 @@
     Release Notes
     =============
-    The list of changes to pandas between each release can be found
+    The list of changes to Pandas between each release can be found
     [here](http://pandas.pydata.org/pandas-docs/stable/whatsnew.html). For full
     details, see the commit logs at http://github.com/pandas-dev/pandas.

@@ -985,3 +985,5 @@ Bug Fixes
     - Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
     - Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)
     - Bug in ``pd.read_msgpack`` which did not allow to load dataframe with an index of type ``CategoricalIndex`` (:issue:`15487`)
+    - Bug in ``Series.replace`` which replaced a numeric by string (:issue:`15743`)

@@ -21,11 +21,16 @@ def mask_missing(arr, values_to_mask):
         Return a masking array of same size/shape as arr
         with entries equaling any member of values_to_mask set to True
         """
-        if not isinstance(values_to_mask, (list, np.ndarray)):
+        if isinstance(values_to_mask, np.ndarray):
+            mask_type = values_to_mask.dtype.type
+        elif isinstance(values_to_mask, list):
+            mask_type = type(values_to_mask[0])
+        else:
+            mask_type = type(values_to_mask)
             values_to_mask = [values_to_mask]
         try:
-            values_to_mask = np.array(values_to_mask, dtype=arr.dtype)
+            values_to_mask = np.array(values_to_mask, dtype=mask_type)
         except Exception:
             values_to_mask = np.array(values_to_mask, dtype=object)
@@ ... @@
             if axis != 0:  # pragma: no cover
                 raise AssertionError("cannot interpolate on a ndim == 1 with "
                                      "axis != 0")
-            values = values.reshape(tuple((1, ) + values.shape))
+            values = values.reshape(tuple((1,) + values.shape))
         if fill_value is None:
             mask = None
@@ -447,7 +452,6 @@ def wrapper(arr, mask, limit=None):
     def pad_1d(values, limit=None, mask=None, dtype=None):
         if dtype is None:
             dtype = values.dtype
         _method = None
@@ -472,7 +476,6 @@ def pad_1d(values, limit=None, mask=None, dtype=None):
     def backfill_1d(values, limit=None, mask=None, dtype=None):
         if dtype is None:
             dtype = values.dtype
         _method = None
@@ -498,7 +501,6 @@ def backfill_1d(values, limit=None, mask=None, dtype=None):
     def pad_2d(values, limit=None, mask=None, dtype=None):
         if dtype is None:
             dtype = values.dtype
         _method = None
@@ -528,7 +530,6 @@ def pad_2d(values, limit=None, mask=None, dtype=None):
     def backfill_2d(values, limit=None, mask=None, dtype=None):
         if dtype is None:
             dtype = values.dtype
         _method = None

@@ -10,7 +10,6 @@
     class TestSeriesReplace(TestData, tm.TestCase):
         def test_replace(self):
             N = 100
             ser = pd.Series(np.random.randn(N))
@@ -227,3 +226,10 @@ def test_replace_with_empty_dictlike(self):
             s = pd.Series(list('abcd'))
             tm.assert_series_equal(s, s.replace(dict()))
             tm.assert_series_equal(s, s.replace(pd.Series([])))
+        def test_replace_string_with_nan(self):
+            # GH 15743
+            s = pd.Series([1, 2, 3])
+            result = s.replace('2', np.nan)
+            expected = pd.Series([1, 2, 3])
+            tm.assert_series_equal(expected, result)
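The behaviour the new test pins down can also be checked outside the test suite. The snippet below is a sketch, assuming a pandas version that includes a fix for GH 15743 and the public `pd.testing.assert_series_equal` helper in place of `tm`. It adds a `u'2'` case, which on Python 3 is the same `str` type as `'2'`, so the unicode variant mainly matters on Python 2.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# A string that merely looks like a number must not match the integer 2.
for to_replace in ['2', u'2']:
    result = s.replace(to_replace, np.nan)
    pd.testing.assert_series_equal(result, s)

# Replacing the actual integer still works as usual.
replaced = s.replace(2, np.nan)
print(replaced.isna().tolist())
```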


