Reductions #11 (question: pandas has parameters (bool_only, numeric_only) that restrict an operation to columns of certain types. Do we want them?)
If we consider more methods to be applied directly over a dataframe, for example:
>>> df[['first_name', 'last_name']].str.lower()
We may end up with a large number of string_only, bool_only, and numeric_only parameters, all meaning something similar but, IMO, adding a fair amount of complexity and making it hard to keep the behavior consistent.
My preference would be to always raise, but being a software engineer I'm biased, and I guess many users may want this "magic".
So, I guess implementing an option, for example: pandas.options.mode.invalid_dtype {raise or skip} could make more sense.
The main problem with this approach is probably that it's not as easy to define the behavior for each operation:
Personally, I don't see this as an issue. IMO, the behavior depends more on the user than on the operation. I'd say that for production code, being explicit and selecting the columns to operate on makes more sense, while in a notebook, avoiding exceptions with this sort of "magic" seems more useful.
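To make the two behaviors concrete, here is a minimal sketch in plain Python, without pandas. `column_means` and its `invalid_dtype` argument are hypothetical names used only for illustration. The switch is shown as a per-call argument, whereas the option proposed above would set the same choice globally.

```python
from numbers import Number
from statistics import mean


def column_means(table, invalid_dtype="raise"):
    """Mean of each column in a dict-of-lists table.

    `invalid_dtype` is a hypothetical argument, not a pandas API:
    "raise" fails on the first non-numeric column, "skip" leaves it out.
    """
    if invalid_dtype not in ("raise", "skip"):
        raise ValueError(f"invalid_dtype must be 'raise' or 'skip', got {invalid_dtype!r}")

    result = {}
    for name, values in table.items():
        # A column counts as numeric only if every value is a number.
        if all(isinstance(v, Number) for v in values):
            result[name] = mean(values)
        elif invalid_dtype == "raise":
            raise TypeError(f"column {name!r} is not numeric")
        # invalid_dtype == "skip": the column is silently dropped.
    return result


table = {"name": ["Ann", "Bob"], "age": [30.0, 40.0]}

# Notebook-style "magic": the string column disappears from the result.
skipped = column_means(table, invalid_dtype="skip")

# Production-style strictness: the same call fails loudly.
try:
    column_means(table, invalid_dtype="raise")
    strict_raised = False
except TypeError:
    strict_raised = True
```

With "skip" the result only contains `age`; with "raise" the user has to select the numeric columns before calling.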
I guess for Series/1-column DataFrame (see #6) it always makes sense to raise an exception.
Thoughts?
> So, I guess implementing an option, for example: pandas.options.mode.invalid_dtype {raise or skip} could make more sense.
I don't think anything in this API can rely on global options. One goal is to allow writing code that works against multiple backends, and global options, I think, defeat that.
It's worth noting that in pandas, this is most painful when it comes to object-dtype columns. That's the case where the reduction / method actually needs to be executed on the values to determine the output columns / dtypes. For the remaining dtypes we know ahead of time what the result metadata will be.
In sklearn we basically decided not to do something like that, and it makes it somewhat nicer for the devs but certainly somewhat annoying for users. For my use-cases I often want to distinguish categorical and continuous data, and how those are determined in a pandas context is often less than clear in practice. However, you could argue that it's up to the user to use the type system to ensure columns have the correct type.
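A small sketch of the object-dtype point, assuming pandas is installed. The two Series are made-up data. Both report the same dtype, so the metadata alone cannot say whether a numeric reduction is valid, and `pd.api.types.infer_dtype` has to look at the values.

```python
import pandas as pd

# Two object-dtype columns: one holds only integers, one hides a string.
clean = pd.Series([1, 2, 3], dtype=object)
mixed = pd.Series([1, 2, "a"], dtype=object)

# The declared dtype is identical for both columns.
same_dtype = clean.dtype == mixed.dtype

# infer_dtype scans the stored values to describe what is actually there.
clean_kind = pd.api.types.infer_dtype(clean)
mixed_kind = pd.api.types.infer_dtype(mixed)

print(same_dtype, clean_kind, mixed_kind)
```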
This is a follow up of the discussions in:
See this example:
Even if the name column is selected, it is ignored instead of raising an exception, since the mean of a string column does not make sense.
Many reductions implement a parameter to control this behavior:
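The original snippets are not preserved in this text. As a stand-in, here is a minimal sketch of the behavior described, with made-up data and assuming pandas is installed. The flag is passed explicitly because the default when it is omitted has changed between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ann", "Bob"], "age": [30, 40], "height": [1.6, 1.8]}
)

# numeric_only=True drops the selected string column instead of raising.
means = df[["name", "age"]].mean(numeric_only=True)

# Selecting the columns by dtype first makes the same choice explicit.
explicit = df.select_dtypes(include="number").mean()

print(list(means.index))
print(list(explicit.index))
```

In the first call `name` is requested but missing from the result. In the second the user has filtered the columns up front, so nothing is dropped implicitly.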