REF: Implement core._algos #32767
pandas/core/_algos/__init__.py
@@ -0,0 +1,9 @@
"""
core._algos is for algorithms that operate on ndarray and ExtensionArray.
Can we use algos instead of _algos here?
I'm ambivalent on this: I do like the non-private name, but we have a bunch of "import algorithms as algos" statements, and I want to avoid ambiguity.
Then let's just create core.algorithms as a subdirectory.
Hmm, that's a good point. The underscore is still only a small difference, though. So maybe we should think of yet another name (or actually make algorithms.py into a module and rethink it more broadly).
So yes, similar to what @jreback said.
But Brock, you wanted to better distinguish pure array algos that don't need to deal with Series/DataFrame etc. (which I think is a good goal). Doesn't making algorithms.py into a submodule (and using that for the function in this PR) defeat that purpose?
"But then making algorithms.py into a submodule (and use that for the function in this PR) defeats that purpose?"
That's my thought too.
"So maybe we should think of yet another name"
Suggestions?
pandas/core/_algos/__init__.py
- Assume that any Index, Series, or DataFrame objects have already been unwrapped.
- Assume that any list arguments have already been cast to ndarray/EA.
- Not depend on Index, Series, or DataFrame, nor import any of these.
- May dispatch to ExtensionArray methods, but should not import from core.arrays.
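To make the contract in these bullets concrete, here is a minimal sketch. It assumes a hypothetical array-only function `_running_max` and a hypothetical public wrapper `running_max` (neither exists in pandas). All unwrapping of Series/Index and casting of lists happens in the wrapper, so the array-only function only ever sees an ndarray and never touches pandas containers. For brevity the sketch only handles numpy-backed inputs via `to_numpy()`.

```python
import numpy as np
import pandas as pd


def _running_max(values: np.ndarray) -> np.ndarray:
    # Hypothetical array-only algorithm: it assumes `values` is already an
    # ndarray and never inspects Index/Series/DataFrame.
    return np.maximum.accumulate(values)


def running_max(obj):
    # Hypothetical public wrapper: unwrapping and re-wrapping live here,
    # at the boundary, not in the array-only function above.
    if isinstance(obj, pd.Series):
        result = _running_max(obj.to_numpy())
        # Re-wrap, preserving the caller's index and name.
        return pd.Series(result, index=obj.index, name=obj.name)
    if isinstance(obj, pd.Index):
        return pd.Index(_running_max(obj.to_numpy()), name=obj.name)
    # Cast lists (and other array-likes) to ndarray before dispatching.
    return _running_max(np.asarray(obj))
```

Keeping the wrapper as the only place that knows about pandas containers is what lets the array-only function satisfy the "not depend on Index, Series, or DataFrame" rule.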
I don't know what else you are planning to put here, but could it also be limited to pure-numpy algos? That would have the value of being very clear. E.g. the shift you moved here is strictly for numpy arrays.
Yah, I have no problem with some functions in here being explicitly ndarray-only.
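As an illustration of an explicitly ndarray-only function, here is a minimal sketch of a shift along one axis built on `np.roll`. The name `shift_ndarray` and its signature are assumptions for illustration, not necessarily what this PR moved. It also assumes the caller has already chosen a `fill_value` compatible with the array's dtype (e.g. `np.nan` requires a float array).

```python
import numpy as np


def shift_ndarray(values: np.ndarray, periods: int, axis: int, fill_value) -> np.ndarray:
    # Hypothetical ndarray-only shift; always returns a new array.
    if periods == 0 or values.size == 0:
        return values.copy()
    # np.roll returns a new array, so writing into it below leaves `values` untouched.
    result = np.roll(values, periods, axis=axis)
    # Select the positions that were wrapped around by the roll along `axis`.
    indexer = [slice(None)] * values.ndim
    if periods > 0:
        indexer[axis] = slice(None, periods)
    else:
        indexer[axis] = slice(periods, None)
    # Overwrite the wrapped-around positions with the fill value.
    result[tuple(indexer)] = fill_value
    return result
```

For example, shifting `[1.0, 2.0, 3.0]` by one period with `np.nan` gives `[nan, 1.0, 2.0]`. Nothing here needs pandas, which is the point of keeping such a function in a pure-numpy module.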
I wasn't speaking about some functions, but about the full file.
But then, what other functions do you envision putting here? (For a single function, I don't think it's worth creating a module.)
"I wasn't speaking about some functions, but the full file."
That may end up being a reasonable organization, but it isn't obvious to me. E.g. if we end up with a shift that dispatches to shift_ea and shift_ndarray, would we want those to all be in the same file or split across multiple files?
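One way such a split could look, sketched under assumptions: `shift`, `shift_ea` and `shift_ndarray` are the hypothetical names from the comment above, the ndarray branch is limited to 1-D for brevity, and the EA branch relies on the `ExtensionArray.shift(periods, fill_value)` method. Anything that is not an ndarray is treated as an ExtensionArray by duck typing, so the module never needs to import from core.arrays.

```python
import numpy as np


def shift_ndarray(values: np.ndarray, periods: int, fill_value) -> np.ndarray:
    # Hypothetical 1-D ndarray branch; np.roll returns a new array.
    result = np.roll(values, periods)
    # Overwrite the positions that wrapped around.
    if periods > 0:
        result[:periods] = fill_value
    elif periods < 0:
        result[periods:] = fill_value
    return result


def shift_ea(values, periods: int, fill_value):
    # Hypothetical EA branch: defer to the ExtensionArray's own shift method.
    return values.shift(periods=periods, fill_value=fill_value)


def shift(values, periods: int, fill_value):
    # Hypothetical dispatcher: an isinstance check on np.ndarray is enough,
    # so nothing from core.arrays has to be imported.
    if isinstance(values, np.ndarray):
        return shift_ndarray(values, periods, fill_value)
    return shift_ea(values, periods, fill_value)
```

Whether the three functions share a file is the open question in the thread; the sketch only shows that the dispatch layer itself stays small.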
pandas/core/arrays/datetimelike.py
@@ -39,6 +39,7 @@
from pandas.core.dtypes.missing import is_valid_nat_for_dtype, isna

from pandas.core import missing, nanops, ops
from pandas.core._algos.transforms import shift
Can you import from "pandas.core._algos" instead?
See comments.
Renamed "_algos" -> "array_algos"
Thanks @jbrockmendel, the above sounds like a good plan.
@jorisvandenbossche another candidate for array_algos would be most of what is currently in dtypes.concat, which you and I agreed was a weird place for those functions.
Yep, although another option for those concat functions would be to move each of them to its respective array implementation (but that might depend a bit on how we rework the EA interface regarding concatenation).
I like that idea quite a bit.
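A rough sketch of what an array-only concat could look like if each array implementation owns its concatenation. It assumes inputs are already unwrapped ndarrays or ExtensionArrays and uses the existing `ExtensionArray._concat_same_type` classmethod for same-typed EAs. The object-dtype fallback for mixed inputs is a deliberate simplification, not a description of pandas' actual behavior, and the function name `concat_arrays` is hypothetical.

```python
import numpy as np


def concat_arrays(to_concat: list):
    # Hypothetical array-only concat; inputs are assumed already unwrapped.
    if not to_concat:
        raise ValueError("need at least one array to concatenate")
    first = to_concat[0]
    if all(isinstance(arr, np.ndarray) for arr in to_concat):
        # Pure-ndarray case: NumPy's type promotion picks the result dtype.
        return np.concatenate(to_concat)
    if all(type(arr) is type(first) for arr in to_concat):
        # Same-typed ExtensionArrays: defer to the array implementation itself.
        return type(first)._concat_same_type(to_concat)
    # Mixed inputs: simplified fallback that materializes everything as object.
    return np.concatenate([np.asarray(arr, dtype=object) for arr in to_concat])
```

The middle branch is where "move each of them to its respective array implementation" would live: the generic function only decides who is responsible, and each array type decides how.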
ATM core.algorithms and core.nanops are a mish-mash in terms of what inputs they expect. This implements a core._algos directory intended for implementations that are guaranteed to receive only ndarray/EA inputs.
For the first function to move, I de-duplicated a shift method. Need suggestions for what to call this module.