Skip to content

Enforce boolean types #14318

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

rforgione
Copy link
Contributor

@rforgione rforgione commented Sep 29, 2016

This PR only addresses the inplace argument, though there was a comment on #14189 that identifies the issue as being with the copy argument as well. I can build this out further to account for all of the frequently occurring boolean arguments.

I also wrote tests for the common function _enforce_bool_type, but didn't write individual tests for every method with an inplace argument. This is something I can add if it makes sense. I wanted to get something reviewed sooner rather than later to get feedback and ensure that I'm on the right track. Feedback is much appreciated -- thanks!

If expression mutates, whether to modify object inplace or return
copy with mutation.

WARNING: inplace=None currently falls back to to True, but
in a future version, will default to False. Use inplace=True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will be an explict decision to remove.

@jreback jreback added Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas labels Sep 29, 2016
@jreback
Copy link
Contributor

jreback commented Sep 29, 2016

call this _validate_bool_type, should accept None as valid. (just ignore)

use pandas.types.common.is_bool (for consistency with other routines)

put this in pandas.types.validate (make a new module). add tests in pandas.tests.types.test_validate. Need to cover each routine that is covered (just a smoke test, IOW, just loop thru a couple of invalid values and make sure it raises)

@rforgione
Copy link
Contributor Author

thanks for the feedback @jreback, will work on these!

@@ -147,7 +148,7 @@ def _check_for_locals(expr, stack_level, parser):

def eval(expr, parser='pandas', engine=None, truediv=True,
local_dict=None, global_dict=None, resolvers=(), level=0,
target=None, inplace=None):
target=None, inplace=False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't change this

@@ -206,14 +207,10 @@ def eval(expr, parser='pandas', engine=None, truediv=True,
scope. Most users will **not** need to change this parameter.
target : a target object for assignment, optional, default is None
essentially this is a passed in resolver
inplace : bool, default True
inplace : bool, default False
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nor this

@jorisvandenbossche
Copy link
Member

@rforgione Do you have time to update this?

@rforgione
Copy link
Contributor Author

@jorisvandenbossche yeah, apologies. Things have been a little crazy with my day job of late. I'll dig into this over the weekend.

@jorisvandenbossche
Copy link
Member

@rforgione No problem, just ping here when you updated it

@rforgione
Copy link
Contributor Author

hey @jorisvandenbossche / @jreback -- sorry for the delay on this. I think I knocked out more-or-less all of the feedback items listed by @jreback above. I struggled a little with some of the tests for internals.py, namely for the putmask method on the NonConsolidatableMixIn class, and the _replace_single method on the ObjectBlock class (I had trouble instantiating the appropriate classes to run tests on). I'd be happy to write tests for those if you wouldn't mind pointing me in the right direction. Any other thoughts/feedback are appreciated. Thanks!

@@ -11,6 +11,8 @@
from pandas.computation.scope import _ensure_scope
from pandas.compat import string_types
from pandas.computation.engines import _engines
from pandas.core import common as com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will fail linting (as not used)

@@ -2218,7 +2219,7 @@ def query(self, expr, inplace=False, **kwargs):
else:
return new_data

def eval(self, expr, inplace=None, **kwargs):
def eval(self, expr, inplace=True, **kwargs):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

leave this alone

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missed this one when reverting all of my original changes, thanks for pointing out!

from pandas.types import common as com

def _validate_bool_type(value):
if not com.is_bool(value) and not value is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not (is_bool(value) or value is None))

@@ -0,0 +1,7 @@
from pandas.types import common as com
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from pandas.types.common import is_bool

@@ -0,0 +1,169 @@
from pandas.types.validate import _validate_bool_type
from unittest import TestCase
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these tests all need to be in

tests/series/test_validate
tests/frame/test_validate

in THIS file, just test _validate_bool_type itself

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jreback how should we handle tests unrelated to the series and the dataframe, for instance the Categorical class? Should we add an additional test_validate file in the tests/ directory to account for those that don't fit under series and frame? Similarly I'd assume we could place put the eval tests inside computation/tests/test_validate or something like that. Thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can put the categorical tests in tests_categorical. iow, these tests go with the type that they are testing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks @jreback

@jreback
Copy link
Contributor

jreback commented Dec 4, 2016

@rforgione looks pretty good. just a few more comments.

@jreback
Copy link
Contributor

jreback commented Dec 4, 2016

pls add a whatsnew for 0.20.0.

@jorisvandenbossche
Copy link
Member

One concern about this PR I have is that if you now call a method with argument with a wrong type (eg like df.where(lambda x: x > 4, lambda x: x + 10, 'fail') as the original example in the issue), you simply get the error message "ValueError: Expected type bool, received type str". From this it is not clear at all that it is about the inplace keyword.
So I think you should either make this validation function specific for inplace and add that to the error message, or either pass in the keyword name we are checking to the _validate_bool_type function (to include it in the error message) in all places.

@rforgione
Copy link
Contributor Author

@jorisvandenbossche I like the second suggestion. I can add a second argument to the _validate_bool_type function and pass the argument name as a string in each instance throughout the code base (so if I'm testing the inplace argument, I can call _validate_bool_type(inplace, 'inplace') where _validate_bool_type will use the string to create a more specific error message). Is that what you had in mind @jorisvandenbossche ?

@rforgione rforgione force-pushed the enforce_boolean_types branch from c626626 to 93b476b Compare December 5, 2016 00:04
@rforgione
Copy link
Contributor Author

@jreback @jorisvandenbossche I pushed the discussed changes, feel free to take a look. Any additional feedback is appreciated!

@@ -62,6 +62,8 @@ Backwards incompatible API changes

- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`)

- ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thrown -> raised

def test_validate_bool_args(self):
invalid_values = [1, "True", [1,2,3], 5.0]
lst = SparseList(self.na_data)
for value in invalid_values:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is deprecated so need to assert_produces_warning as well

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just leave out this test, since its deprecated it's not that important to test this


class TestDataFrameValidate(TestCase):

df = DataFrame({'a':[1,2], 'b':[3,4]})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1 line comment on what this is checking

class TestSeriesValidate(TestCase):

s = Series([1,2,3,4,5])

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

@@ -1671,6 +1671,41 @@ def test_map(self):
result = c.map(lambda x: 1)
tm.assert_numpy_array_equal(result, np.array([1] * 5, dtype=np.int64))

def test_validate_inplace(self):
cat = Categorical(['A','B','B','C','A'])
invalid_values = [1, "True", [1,2,3], 5.0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nvm this one is descriptive enough

@@ -297,6 +297,28 @@ def test_split_block_at(self):
# bs = list(bblock.split_block_at('f'))
# self.assertEqual(len(bs), 0)

def test_validate_bool_args(self):
invalid_values = [1, "True", [1,2,3], 5.0]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

internals shouldn't have tests for this at all (or even check)

you can assert if u want

invalid_values = [1, "True", [1,2,3], 5.0]

for name in arg_names:
for value in invalid_values:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add tests that are valid, iow True, False, None

@@ -631,7 +633,7 @@ def as_ordered(self, inplace=False):
Whether or not to set the ordered attribute inplace or return a copy
of this categorical with ordered set to True
"""
return self.set_ordered(True, inplace=inplace)
return self.set_ordered(True, inplace=_validate_bool_type(inplace, 'inplace'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line is too long (PEP8, that is the reason travis is failing). You can put the validation of inplace on its own line

@@ -643,7 +645,7 @@ def as_unordered(self, inplace=False):
Whether or not to set the ordered attribute inplace or return a copy
of this categorical with ordered set to False
"""
return self.set_ordered(False, inplace=inplace)
return self.set_ordered(False, inplace=_validate_bool_type(inplace, 'inplace'))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here

def test_validate_bool_args(self):
invalid_values = [1, "True", [1,2,3], 5.0]
lst = SparseList(self.na_data)
for value in invalid_values:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

or just leave out this test, since its deprecated it's not that important to test this


class TestSeriesValidate(TestCase):

s = Series([1,2,3,4,5])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing spaces after commas (PEP8). You can see on travis what is all failing: https://travis-ci.org/pandas-dev/pandas/jobs/181229873 (scroll down, at the bottom there is a list of failures)


def _validate_bool_type(value, arg_name):
if not (is_bool(value) or value is None):
raise ValueError('For argument %s expected type bool, received type %s.' %\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you insert quotes around %s? (make it clearer that is the arg name), like "Argument '%s' exp ..."

@@ -0,0 +1,7 @@
from pandas.types.common import is_bool

def _validate_bool_type(value, arg_name):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would maybe call this _validate_bool_kwarg to make it clearer that it is specifically to validate a keyword

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche would you recommend making the validation function itself accept a keyworded variable-length argument? or leave as is and just rename?

@codecov-io
Copy link

codecov-io commented Dec 7, 2016

Current coverage is 84.77% (diff: 100%)

Merging #14318 into master will increase coverage by 0.01%

@@             master     #14318   diff @@
==========================================
  Files           145        145          
  Lines         51151      51208    +57   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43355      43413    +58   
+ Misses         7796       7795     -1   
  Partials          0          0          

Powered by Codecov. Last update db08da2...2478d8f

@rforgione
Copy link
Contributor Author

just a couple of more changes to pass the Travis tests, will work on these tonight

@@ -0,0 +1,7 @@
from pandas.types.common import is_bool

def _validate_bool_kwarg(value, arg_name):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to: https://github.com/pandas-dev/pandas/blob/master/pandas/util/validators.py

had forgotten that we put input validations here

for name in arg_names:
for value in invalid_values:
with tm.assertRaisesRegexp(ValueError, "For argument \"%s\" expected type bool, received type %s" % (name, type(value).__name__)):
_validate_bool_kwarg(value, name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move this to pandas/util/tests where other validation tests are)

@@ -2346,6 +2351,11 @@ def align(self, other, join='outer', axis=None, level=None, copy=True,

@Appender(generic._shared_docs['rename'] % _shared_doc_kwargs)
def rename(self, index=None, **kwargs):
if 'inplace' in kwargs:
kwargs['inplace'] = validate_bool_kwarg(kwargs['inplace'], 'inplace')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

kwargs['inplace'] = validate_bool_kwarg(kwargs.get('inplace', False), 'inplace')
works?

@@ -215,3 +215,9 @@ def validate_args_and_kwargs(fname, args, kwargs,

kwargs.update(args_dict)
validate_kwargs(fname, kwargs, compat_args)

def validate_bool_kwarg(value, arg_name):
if not (is_bool(value) or value is None):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a doc-string here

@jreback
Copy link
Contributor

jreback commented Dec 11, 2016

just some very minor comments.

ping on green.

@jreback jreback added this to the 0.20.0 milestone Dec 11, 2016

def validate_bool_kwarg(value, arg_name):
if not (is_bool(value) or value is None):
raise ValueError('For argument "%s" expected type bool, ' % arg_name +\
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the \ is not needed here

@jreback
Copy link
Contributor

jreback commented Dec 30, 2016

can you rebase

@rforgione rforgione force-pushed the enforce_boolean_types branch from 3d63ead to e874410 Compare January 5, 2017 04:24
@rforgione
Copy link
Contributor Author

rforgione commented Jan 5, 2017

@jreback @jorisvandenbossche it's not immediately clear to me why the first set of tests is failing. I looked at the output in Travis but not seeing anything (literally blank output).

job 24181 3 - pandas-dev pandas - travis ci 2017-01-05 09-46-35

Any idea what's going on? Was about to rebase but wanted to check in on this first.

@rforgione rforgione force-pushed the enforce_boolean_types branch from 1067a2c to 879e9ca Compare January 6, 2017 03:42
@rforgione rforgione force-pushed the enforce_boolean_types branch from 879e9ca to 2478d8f Compare January 6, 2017 04:18
@rforgione
Copy link
Contributor Author

@jreback @jorisvandenbossche figured it out -- I needed to re-build after I rebased against the upstream changes. All checks are on green, I think we are good to go!

@jorisvandenbossche jorisvandenbossche merged commit b895968 into pandas-dev:master Jan 6, 2017
@jorisvandenbossche
Copy link
Member

@rforgione Thanks!

@MarcoGorelli MarcoGorelli mentioned this pull request Oct 2, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Dtype Conversions Unexpected or buggy dtype conversions Error Reporting Incorrect or improved errors from pandas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Pandas inplace argument is not explicit True or False
4 participants