Enforce boolean types #14318

rforgione · 2016-09-29T06:10:37Z

[ x ] closes Pandas inplace argument is not explicit True or False #14189
[ x ] tests added / passed
passes git diff upstream/master | flake8 --diff
whatsnew entry

This PR only addresses the inplace argument, though there was a comment on #14189 that identifies the issue as being with the copy argument as well. I can build this out further to account for all of the frequently occurring boolean arguments.

I also wrote tests for the common function _enforce_bool_type, but didn't write individual tests for every method with an inplace argument. This is something I can add if it makes sense. I wanted to get something reviewed sooner rather than later to get feedback and ensure that I'm on the right track. Feedback is much appreciated -- thanks!

jreback · 2016-09-29T09:20:54Z

pandas/computation/eval.py

        If expression mutates, whether to modify object inplace or return
        copy with mutation.

-        WARNING: inplace=None currently falls back to to True, but
-        in a future version, will default to False.  Use inplace=True


don't change this

this will be an explicit decision to remove.

jreback · 2016-09-29T10:47:42Z

call this _validate_bool_type, should accept None as valid. (just ignore)

use pandas.types.common.is_bool (for consistency with other routines)

put this in pandas.types.validate (make a new module). add tests in pandas.tests.types.test_validate. The tests need to cover each routine that now validates (just a smoke test; IOW, loop through a couple of invalid values and make sure each one raises)

rforgione · 2016-10-01T02:12:09Z

thanks for the feedback @jreback, will work on these!

jreback · 2016-10-06T10:40:36Z

pandas/computation/eval.py

@@ -147,7 +148,7 @@ def _check_for_locals(expr, stack_level, parser):

 def eval(expr, parser='pandas', engine=None, truediv=True,
         local_dict=None, global_dict=None, resolvers=(), level=0,
-         target=None, inplace=None):
+         target=None, inplace=False):


don't change this

jreback · 2016-10-06T10:40:41Z

pandas/computation/eval.py

@@ -206,14 +207,10 @@ def eval(expr, parser='pandas', engine=None, truediv=True,
        scope. Most users will **not** need to change this parameter.
    target : a target object for assignment, optional, default is None
        essentially this is a passed in resolver
-    inplace : bool, default True
+    inplace : bool, default False


jorisvandenbossche · 2016-10-27T14:13:29Z

@rforgione Do you have time to update this?

rforgione · 2016-10-28T20:43:55Z

@jorisvandenbossche yeah, apologies. Things have been a little crazy with my day job of late. I'll dig into this over the weekend.

jorisvandenbossche · 2016-10-29T13:19:18Z

@rforgione No problem, just ping here when you have updated it

rforgione · 2016-12-02T08:10:32Z

hey @jorisvandenbossche / @jreback -- sorry for the delay on this. I think I knocked out more-or-less all of the feedback items listed by @jreback above. I struggled a little with some of the tests for internals.py, namely for the putmask method on the NonConsolidatableMixIn class, and the _replace_single method on the ObjectBlock class (I had trouble instantiating the appropriate classes to run tests on). I'd be happy to write tests for those if you wouldn't mind pointing me in the right direction. Any other thoughts/feedback are appreciated. Thanks!

jreback · 2016-12-04T18:03:47Z

pandas/computation/eval.py

@@ -11,6 +11,8 @@
 from pandas.computation.scope import _ensure_scope
 from pandas.compat import string_types
 from pandas.computation.engines import _engines
+from pandas.core import common as com


this will fail linting (as not used)

jreback · 2016-12-04T18:04:32Z

pandas/core/frame.py

@@ -2218,7 +2219,7 @@ def query(self, expr, inplace=False, **kwargs):
        else:
            return new_data

-    def eval(self, expr, inplace=None, **kwargs):
+    def eval(self, expr, inplace=True, **kwargs):


leave this alone

missed this one when reverting all of my original changes, thanks for pointing out!

jreback · 2016-12-04T18:05:28Z

pandas/types/validate.py

+from pandas.types import common as com
+
+def _validate_bool_type(value):
+    if not com.is_bool(value) and not value is None:


not (is_bool(value) or value is None)
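
The suggested form is the De Morgan rewrite of the condition in the diff, so behaviour should be unchanged. A small sketch that checks this over every combination of truth values (the function names are hypothetical, chosen for illustration):

```python
def original_check(is_bool_value, is_none):
    # Form used in the diff: reject when neither condition holds.
    return not is_bool_value and not is_none


def suggested_check(is_bool_value, is_none):
    # Form suggested in review: one negation around the combined condition.
    return not (is_bool_value or is_none)


# Compare the two forms over every combination of truth values.
cases = [(a, b) for a in (True, False) for b in (True, False)]
equivalent = all(original_check(a, b) == suggested_check(a, b)
                 for a, b in cases)
```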

jreback · 2016-12-04T18:05:49Z

pandas/types/validate.py

@@ -0,0 +1,7 @@
+from pandas.types import common as com


from pandas.types.common import is_bool

jreback · 2016-12-04T18:08:12Z

pandas/tests/types/test_validate.py

@@ -0,0 +1,169 @@
+from pandas.types.validate import _validate_bool_type
+from unittest import TestCase


these tests all need to be in

tests/series/test_validate
tests/frame/test_validate

in THIS file, just test _validate_bool_type itself

@jreback how should we handle tests unrelated to the series and the dataframe, for instance the Categorical class? Should we add an additional test_validate file in the tests/ directory to account for those that don't fit under series and frame? Similarly, I'd assume we could put the eval tests inside computation/tests/test_validate or something like that. Thanks!

you can put the categorical tests in test_categorical. iow, these tests go with the type that they are testing.

Got it, thanks @jreback

jreback · 2016-12-04T18:09:01Z

@rforgione looks pretty good. just a few more comments.

jreback · 2016-12-04T18:09:21Z

pls add a whatsnew for 0.20.0.

jorisvandenbossche · 2016-12-04T21:36:22Z

One concern I have about this PR is that if you now call a method with an argument of the wrong type (e.g. df.where(lambda x: x > 4, lambda x: x + 10, 'fail'), the original example in the issue), you simply get the error message "ValueError: Expected type bool, received type str". From this it is not at all clear that the error is about the inplace keyword.
So I think you should either make this validation function specific to inplace and add that to the error message, or pass the name of the keyword being checked to the _validate_bool_type function (to include it in the error message) in all places.

rforgione · 2016-12-04T21:48:05Z

@jorisvandenbossche I like the second suggestion. I can add a second argument to the _validate_bool_type function and pass the argument name as a string in each instance throughout the code base (so if I'm testing the inplace argument, I can call _validate_bool_type(inplace, 'inplace') where _validate_bool_type will use the string to create a more specific error message). Is that what you had in mind @jorisvandenbossche ?

rforgione · 2016-12-05T02:23:40Z

@jreback @jorisvandenbossche I pushed the discussed changes, feel free to take a look. Any additional feedback is appreciated!

jreback · 2016-12-05T11:01:24Z

doc/source/whatsnew/v0.20.0.txt

@@ -62,6 +62,8 @@ Backwards incompatible API changes

 - ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)

+- ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`)
+


thrown -> raised

jreback · 2016-12-05T11:04:10Z

pandas/sparse/tests/test_list.py

+    def test_validate_bool_args(self):
+        invalid_values = [1, "True", [1,2,3], 5.0]
+        lst = SparseList(self.na_data)
+        for value in invalid_values:


this is deprecated so need to assert_produces_warning as well

or just leave out this test, since it's deprecated it's not that important to test this

jreback · 2016-12-05T11:04:42Z

pandas/tests/frame/test_validate.py

+
+class TestDataFrameValidate(TestCase):
+
+    df = DataFrame({'a':[1,2], 'b':[3,4]})


1 line comment on what this is checking

jreback · 2016-12-05T11:04:52Z

pandas/tests/series/test_validate.py

+class TestSeriesValidate(TestCase):
+
+    s = Series([1,2,3,4,5])
+


jreback · 2016-12-05T11:04:59Z

pandas/tests/test_categorical.py

@@ -1671,6 +1671,41 @@ def test_map(self):
        result = c.map(lambda x: 1)
        tm.assert_numpy_array_equal(result, np.array([1] * 5, dtype=np.int64))

+    def test_validate_inplace(self):
+        cat = Categorical(['A','B','B','C','A'])
+        invalid_values = [1, "True", [1,2,3], 5.0]


nvm this one is descriptive enough

jreback · 2016-12-05T11:06:29Z

pandas/tests/test_internals.py

@@ -297,6 +297,28 @@ def test_split_block_at(self):
        # bs = list(bblock.split_block_at('f'))
        # self.assertEqual(len(bs), 0)

+    def test_validate_bool_args(self):
+        invalid_values = [1, "True", [1,2,3], 5.0]


internals shouldn't have tests for this at all (or even check)

you can assert if u want

jreback · 2016-12-05T11:07:19Z

pandas/tests/types/test_validate.py

+    invalid_values = [1, "True", [1,2,3], 5.0]
+
+    for name in arg_names:
+        for value in invalid_values:


add tests that are valid, iow True, False, None
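
A sketch of what covering both the invalid and the valid values might look like. The validator here is a stand-in (using `isinstance` instead of pandas' `is_bool`), and the `arg_names` list and class name are hypothetical, not taken from the PR.

```python
import unittest


def _validate_bool_type(value, arg_name):
    # Stand-in for the function under test (assumption: isinstance
    # replaces pandas.types.common.is_bool).
    if not (isinstance(value, bool) or value is None):
        raise ValueError(
            'For argument "%s" expected type bool, received type %s.'
            % (arg_name, type(value).__name__))
    return value


class TestValidateBoolType(unittest.TestCase):
    # Hypothetical argument names, used only to exercise the message path.
    arg_names = ['inplace', 'copy']

    def test_invalid_values_raise(self):
        for name in self.arg_names:
            for value in [1, "True", [1, 2, 3], 5.0]:
                with self.assertRaises(ValueError):
                    _validate_bool_type(value, name)

    def test_valid_values_pass_through(self):
        # True, False and None must be accepted and returned unchanged.
        for name in self.arg_names:
            for value in [True, False, None]:
                self.assertIs(_validate_bool_type(value, name), value)
```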

jorisvandenbossche · 2016-12-06T22:22:17Z

pandas/core/categorical.py

@@ -631,7 +633,7 @@ def as_ordered(self, inplace=False):
           Whether or not to set the ordered attribute inplace or return a copy
           of this categorical with ordered set to True
        """
-        return self.set_ordered(True, inplace=inplace)
+        return self.set_ordered(True, inplace=_validate_bool_type(inplace, 'inplace'))


This line is too long (PEP8, that is the reason travis is failing). You can put the validation of inplace on its own line

jorisvandenbossche · 2016-12-06T22:22:24Z

pandas/core/categorical.py

@@ -643,7 +645,7 @@ def as_unordered(self, inplace=False):
           Whether or not to set the ordered attribute inplace or return a copy
           of this categorical with ordered set to False
        """
-        return self.set_ordered(False, inplace=inplace)
+        return self.set_ordered(False, inplace=_validate_bool_type(inplace, 'inplace'))


jorisvandenbossche · 2016-12-06T22:25:37Z

pandas/sparse/tests/test_list.py

+    def test_validate_bool_args(self):
+        invalid_values = [1, "True", [1,2,3], 5.0]
+        lst = SparseList(self.na_data)
+        for value in invalid_values:


or just leave out this test, since it's deprecated it's not that important to test this

jorisvandenbossche · 2016-12-06T22:28:20Z

pandas/tests/series/test_validate.py

+
+class TestSeriesValidate(TestCase):
+
+    s = Series([1,2,3,4,5])


missing spaces after commas (PEP8). You can see everything that is failing on travis: https://travis-ci.org/pandas-dev/pandas/jobs/181229873 (scroll down, at the bottom there is a list of failures)

jorisvandenbossche · 2016-12-06T22:31:16Z

pandas/types/validate.py

+
+def _validate_bool_type(value, arg_name):
+    if not (is_bool(value) or value is None):
+        raise ValueError('For argument %s expected type bool, received type %s.' %\


Can you insert quotes around %s? (make it clearer that is the arg name), like "Argument '%s' exp ..."

jorisvandenbossche · 2016-12-06T22:34:22Z

pandas/types/validate.py

@@ -0,0 +1,7 @@
+from pandas.types.common import is_bool
+
+def _validate_bool_type(value, arg_name):


I would maybe call this _validate_bool_kwarg to make it clearer that it is specifically to validate a keyword

@jorisvandenbossche would you recommend making the validation function itself accept a keyworded variable-length argument? or leave as is and just rename?

codecov-io · 2016-12-07T04:11:07Z

Current coverage is 84.77% (diff: 100%)

Merging #14318 into master will increase coverage by 0.01%

@@             master     #14318   diff @@
==========================================
  Files           145        145          
  Lines         51151      51208    +57   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43355      43413    +58   
+ Misses         7796       7795     -1   
  Partials          0          0

Powered by Codecov. Last update db08da2...2478d8f

rforgione · 2016-12-07T22:29:57Z

just a couple more changes to pass the Travis tests, will work on these tonight

jreback · 2016-12-08T12:19:43Z

pandas/types/validate.py

@@ -0,0 +1,7 @@
+from pandas.types.common import is_bool
+
+def _validate_bool_kwarg(value, arg_name):


move this to: https://github.com/pandas-dev/pandas/blob/master/pandas/util/validators.py

had forgotten that we put input validations here

jreback · 2016-12-08T12:20:22Z

pandas/tests/types/test_validate.py

+    for name in arg_names:
+        for value in invalid_values:
+            with tm.assertRaisesRegexp(ValueError, "For argument \"%s\" expected type bool, received type %s" % (name, type(value).__name__)):
+                _validate_bool_kwarg(value, name)


move this to pandas/util/tests (where the other validation tests are)

jreback · 2016-12-11T22:49:28Z

pandas/core/series.py

@@ -2346,6 +2351,11 @@ def align(self, other, join='outer', axis=None, level=None, copy=True,

    @Appender(generic._shared_docs['rename'] % _shared_doc_kwargs)
    def rename(self, index=None, **kwargs):
+        if 'inplace' in kwargs:
+            kwargs['inplace'] = validate_bool_kwarg(kwargs['inplace'], 'inplace')


kwargs['inplace'] = validate_bool_kwarg(kwargs.get('inplace', False), 'inplace')
works?
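
A sketch of the one-line form asked about here. The `rename` below is a hypothetical stand-in for the real method that only returns the validated keyword arguments, and the validator uses `isinstance` in place of pandas' `is_bool`.

```python
def validate_bool_kwarg(value, arg_name):
    # Stand-in validator (assumption: isinstance replaces pandas' is_bool).
    if not (isinstance(value, bool) or value is None):
        raise ValueError(
            'For argument "%s" expected type bool, received type %s.'
            % (arg_name, type(value).__name__))
    return value


def rename(index=None, **kwargs):
    """Hypothetical stand-in for Series.rename; returns the validated kwargs."""
    # One line replaces the ``if 'inplace' in kwargs`` branch: a missing key
    # falls back to False, which passes validation.
    kwargs['inplace'] = validate_bool_kwarg(kwargs.get('inplace', False),
                                            'inplace')
    return kwargs
```

One difference from the `if` form: `inplace` is now always present in `kwargs`, so whatever receives `kwargs` afterwards sees an explicit `inplace=False` when the caller did not pass one.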

jreback · 2016-12-11T22:50:10Z

pandas/util/validators.py

@@ -215,3 +215,9 @@ def validate_args_and_kwargs(fname, args, kwargs,

    kwargs.update(args_dict)
    validate_kwargs(fname, kwargs, compat_args)
+
+def validate_bool_kwarg(value, arg_name):
+    if not (is_bool(value) or value is None):


can you add a doc-string here

jreback · 2016-12-11T22:51:07Z

just some very minor comments.

ping on green.

jorisvandenbossche · 2016-12-14T11:25:26Z

pandas/util/validators.py

+
+def validate_bool_kwarg(value, arg_name):
+    if not (is_bool(value) or value is None):
+        raise ValueError('For argument "%s" expected type bool, ' % arg_name +\


the \ is not needed here
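
The backslash is redundant because Python joins lines implicitly inside parentheses. A small sketch of the message built that way; `bool_kwarg_message` is a hypothetical helper, and the second half of the string is assumed from the test regex quoted earlier in this thread, since the diff line above is cut off.

```python
def bool_kwarg_message(arg_name, value):
    """Hypothetical helper: build the error text without a trailing backslash."""
    # Inside parentheses the expression may span lines, so no "\" is needed.
    return ('For argument "%s" expected type bool, ' % arg_name +
            'received type %s.' % type(value).__name__)
```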

jreback · 2016-12-30T21:24:25Z

can you rebase

rforgione · 2017-01-05T14:46:02Z

@jreback @jorisvandenbossche it's not immediately clear to me why the first set of tests is failing. I looked at the output in Travis but I'm not seeing anything (literally blank output).

Any idea what's going on? Was about to rebase but wanted to check in on this first.

rforgione · 2017-01-06T04:55:37Z

@jreback @jorisvandenbossche figured it out -- I needed to re-build after I rebased against the upstream changes. All checks are on green, I think we are good to go!

jorisvandenbossche · 2017-01-06T12:33:35Z

@rforgione Thanks!

jreback requested changes Sep 29, 2016

jreback added the Dtype Conversions and Error Reporting labels Sep 29, 2016

jreback reviewed Oct 6, 2016

jreback reviewed Dec 4, 2016

jreback reviewed Dec 4, 2016

rforgione force-pushed the enforce_boolean_types branch from c626626 to 93b476b Compare December 5, 2016 00:04

jreback requested changes Dec 5, 2016

jorisvandenbossche reviewed Dec 6, 2016

jreback reviewed Dec 8, 2016

jreback reviewed Dec 11, 2016

jreback approved these changes Dec 11, 2016

jreback added this to the 0.20.0 milestone Dec 11, 2016

jorisvandenbossche reviewed Dec 14, 2016

rforgione force-pushed the enforce_boolean_types branch from 3d63ead to e874410 Compare January 5, 2017 04:24

rforgione force-pushed the enforce_boolean_types branch from 1067a2c to 879e9ca Compare January 6, 2017 03:42

add type enforcement function for boolean args and apply to inplace args

2478d8f

rforgione force-pushed the enforce_boolean_types branch from 879e9ca to 2478d8f Compare January 6, 2017 04:18

clean-up merge left-over + test comment

ba8b6f6

jorisvandenbossche merged commit b895968 into pandas-dev:master Jan 6, 2017

MarcoGorelli mentioned this pull request Oct 2, 2022

fix pylint bad-super-call #48896

Merged

3 tasks
