-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Enforce boolean types #14318
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enforce boolean types #14318
Conversation
If expression mutates, whether to modify object inplace or return | ||
copy with mutation. | ||
|
||
WARNING: inplace=None currently falls back to to True, but | ||
in a future version, will default to False. Use inplace=True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will be an explict decision to remove.
call this use put this in |
thanks for the feedback @jreback, will work on these! |
@@ -147,7 +148,7 @@ def _check_for_locals(expr, stack_level, parser): | |||
|
|||
def eval(expr, parser='pandas', engine=None, truediv=True, | |||
local_dict=None, global_dict=None, resolvers=(), level=0, | |||
target=None, inplace=None): | |||
target=None, inplace=False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
don't change this
@@ -206,14 +207,10 @@ def eval(expr, parser='pandas', engine=None, truediv=True, | |||
scope. Most users will **not** need to change this parameter. | |||
target : a target object for assignment, optional, default is None | |||
essentially this is a passed in resolver | |||
inplace : bool, default True | |||
inplace : bool, default False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nor this
@rforgione Do you have time to update this? |
@jorisvandenbossche yeah, apologies. Things have been a little crazy with my day job of late. I'll dig into this over the weekend. |
@rforgione No problem, just ping here when you updated it |
hey @jorisvandenbossche / @jreback -- sorry for the delay on this. I think I knocked out more-or-less all of the feedback items listed by @jreback above. I struggled a little with some of the tests for internals.py, namely for the putmask method on the NonConsolidatableMixIn class, and the _replace_single method on the ObjectBlock class (I had trouble instantiating the appropriate classes to run tests on). I'd be happy to write tests for those if you wouldn't mind pointing me in the right direction. Any other thoughts/feedback are appreciated. Thanks! |
@@ -11,6 +11,8 @@ | |||
from pandas.computation.scope import _ensure_scope | |||
from pandas.compat import string_types | |||
from pandas.computation.engines import _engines | |||
from pandas.core import common as com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this will fail linting (as not used)
@@ -2218,7 +2219,7 @@ def query(self, expr, inplace=False, **kwargs): | |||
else: | |||
return new_data | |||
|
|||
def eval(self, expr, inplace=None, **kwargs): | |||
def eval(self, expr, inplace=True, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
leave this alone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missed this one when reverting all of my original changes, thanks for pointing out!
from pandas.types import common as com | ||
|
||
def _validate_bool_type(value): | ||
if not com.is_bool(value) and not value is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not (is_bool(value) or value is None))
@@ -0,0 +1,7 @@ | |||
from pandas.types import common as com |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from pandas.types.common import is_bool
@@ -0,0 +1,169 @@ | |||
from pandas.types.validate import _validate_bool_type | |||
from unittest import TestCase |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these tests all need to be in
tests/series/test_validate
tests/frame/test_validate
in THIS file, just test _validate_bool_type
itself
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jreback how should we handle tests unrelated to the series and the dataframe, for instance the Categorical
class? Should we add an additional test_validate file in the tests/ directory to account for those that don't fit under series and frame? Similarly I'd assume we could place put the eval
tests inside computation/tests/test_validate or something like that. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can put the categorical tests in tests_categorical. iow, these tests go with the type that they are testing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it, thanks @jreback
@rforgione looks pretty good. just a few more comments. |
pls add a whatsnew for 0.20.0. |
One concern about this PR I have is that if you now call a method with argument with a wrong type (eg like |
@jorisvandenbossche I like the second suggestion. I can add a second argument to the |
c626626
to
93b476b
Compare
@jreback @jorisvandenbossche I pushed the discussed changes, feel free to take a look. Any additional feedback is appreciated! |
@@ -62,6 +62,8 @@ Backwards incompatible API changes | |||
|
|||
- ``CParserError`` has been renamed to ``ParserError`` in ``pd.read_csv`` and will be removed in the future (:issue:`12665`) | |||
|
|||
- ``inplace`` arguments now require a boolean value, else a ``ValueError`` is thrown (:issue:`14189`) | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thrown -> raised
def test_validate_bool_args(self): | ||
invalid_values = [1, "True", [1,2,3], 5.0] | ||
lst = SparseList(self.na_data) | ||
for value in invalid_values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is deprecated so need to assert_produces_warning as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just leave out this test, since its deprecated it's not that important to test this
|
||
class TestDataFrameValidate(TestCase): | ||
|
||
df = DataFrame({'a':[1,2], 'b':[3,4]}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 line comment on what this is checking
class TestSeriesValidate(TestCase): | ||
|
||
s = Series([1,2,3,4,5]) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
@@ -1671,6 +1671,41 @@ def test_map(self): | |||
result = c.map(lambda x: 1) | |||
tm.assert_numpy_array_equal(result, np.array([1] * 5, dtype=np.int64)) | |||
|
|||
def test_validate_inplace(self): | |||
cat = Categorical(['A','B','B','C','A']) | |||
invalid_values = [1, "True", [1,2,3], 5.0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nvm this one is descriptive enough
@@ -297,6 +297,28 @@ def test_split_block_at(self): | |||
# bs = list(bblock.split_block_at('f')) | |||
# self.assertEqual(len(bs), 0) | |||
|
|||
def test_validate_bool_args(self): | |||
invalid_values = [1, "True", [1,2,3], 5.0] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
internals shouldn't have tests for this at all (or even check)
you can assert if u want
invalid_values = [1, "True", [1,2,3], 5.0] | ||
|
||
for name in arg_names: | ||
for value in invalid_values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add tests that are valid, iow True, False, None
@@ -631,7 +633,7 @@ def as_ordered(self, inplace=False): | |||
Whether or not to set the ordered attribute inplace or return a copy | |||
of this categorical with ordered set to True | |||
""" | |||
return self.set_ordered(True, inplace=inplace) | |||
return self.set_ordered(True, inplace=_validate_bool_type(inplace, 'inplace')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is too long (PEP8, that is the reason travis is failing). You can put the validation of inplace on its own line
@@ -643,7 +645,7 @@ def as_unordered(self, inplace=False): | |||
Whether or not to set the ordered attribute inplace or return a copy | |||
of this categorical with ordered set to False | |||
""" | |||
return self.set_ordered(False, inplace=inplace) | |||
return self.set_ordered(False, inplace=_validate_bool_type(inplace, 'inplace')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here
def test_validate_bool_args(self): | ||
invalid_values = [1, "True", [1,2,3], 5.0] | ||
lst = SparseList(self.na_data) | ||
for value in invalid_values: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or just leave out this test, since its deprecated it's not that important to test this
|
||
class TestSeriesValidate(TestCase): | ||
|
||
s = Series([1,2,3,4,5]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing spaces after commas (PEP8). You can see on travis what is all failing: https://travis-ci.org/pandas-dev/pandas/jobs/181229873 (scroll down, at the bottom there is a list of failures)
|
||
def _validate_bool_type(value, arg_name): | ||
if not (is_bool(value) or value is None): | ||
raise ValueError('For argument %s expected type bool, received type %s.' %\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you insert quotes around %s
? (make it clearer that is the arg name), like "Argument '%s' exp ..."
@@ -0,0 +1,7 @@ | |||
from pandas.types.common import is_bool | |||
|
|||
def _validate_bool_type(value, arg_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would maybe call this _validate_bool_kwarg
to make it clearer that it is specifically to validate a keyword
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jorisvandenbossche would you recommend making the validation function itself accept a keyworded variable-length argument? or leave as is and just rename?
Current coverage is 84.77% (diff: 100%)@@ master #14318 diff @@
==========================================
Files 145 145
Lines 51151 51208 +57
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43355 43413 +58
+ Misses 7796 7795 -1
Partials 0 0
|
just a couple of more changes to pass the Travis tests, will work on these tonight |
@@ -0,0 +1,7 @@ | |||
from pandas.types.common import is_bool | |||
|
|||
def _validate_bool_kwarg(value, arg_name): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to: https://github.com/pandas-dev/pandas/blob/master/pandas/util/validators.py
had forgotten that we put input validations here
for name in arg_names: | ||
for value in invalid_values: | ||
with tm.assertRaisesRegexp(ValueError, "For argument \"%s\" expected type bool, received type %s" % (name, type(value).__name__)): | ||
_validate_bool_kwarg(value, name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
move this to pandas/util/tests where other validation tests are)
@@ -2346,6 +2351,11 @@ def align(self, other, join='outer', axis=None, level=None, copy=True, | |||
|
|||
@Appender(generic._shared_docs['rename'] % _shared_doc_kwargs) | |||
def rename(self, index=None, **kwargs): | |||
if 'inplace' in kwargs: | |||
kwargs['inplace'] = validate_bool_kwarg(kwargs['inplace'], 'inplace') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
kwargs['inplace'] = validate_bool_kwarg(kwargs.get('inplace', False), 'inplace')
works?
@@ -215,3 +215,9 @@ def validate_args_and_kwargs(fname, args, kwargs, | |||
|
|||
kwargs.update(args_dict) | |||
validate_kwargs(fname, kwargs, compat_args) | |||
|
|||
def validate_bool_kwarg(value, arg_name): | |||
if not (is_bool(value) or value is None): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string here
just some very minor comments. ping on green. |
|
||
def validate_bool_kwarg(value, arg_name): | ||
if not (is_bool(value) or value is None): | ||
raise ValueError('For argument "%s" expected type bool, ' % arg_name +\ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the \
is not needed here
can you rebase |
3d63ead
to
e874410
Compare
@jreback @jorisvandenbossche it's not immediately clear to me why the first set of tests is failing. I looked at the output in Travis but not seeing anything (literally blank output). Any idea what's going on? Was about to rebase but wanted to check in on this first. |
1067a2c
to
879e9ca
Compare
879e9ca
to
2478d8f
Compare
@jreback @jorisvandenbossche figured it out -- I needed to re-build after I rebased against the upstream changes. All checks are on green, I think we are good to go! |
@rforgione Thanks! |
git diff upstream/master | flake8 --diff
This PR only addresses the
inplace
argument, though there was a comment on #14189 that identifies the issue as being with thecopy
argument as well. I can build this out further to account for all of the frequently occurring boolean arguments.I also wrote tests for the common function
_enforce_bool_type
, but didn't write individual tests for every method with aninplace
argument. This is something I can add if it makes sense. I wanted to get something reviewed sooner rather than later to get feedback and ensure that I'm on the right track. Feedback is much appreciated -- thanks!