API: Add pandas.api.typing #48577
makes sense for typing purposes, but does it make sense for users to use these directly? |
I would say no - they should only be invoked via |
I agree this is useful for typing purposes but otherwise not, so I'm -1 on putting them in the main namespace. I suggest they go into |
Thanks @topper-123. I don't believe |
Hi @rhshadrach. I'm in favor of adding the groupby classes to the public namespace; my only hesitation is whether to put them in the main namespace or not. My feeling is that the main namespace is already very crowded and so should be added to only sparingly. These two classes will not be instantiated directly from the main namespace, so it may be confusing to have them there. |
I think the idea of having the main namespace be "objects / functions users should create / use" is a meaningful one, so +1 on putting this somewhere else. Looking through the current entries, the only potential class I see that a user should likely not be constructing themselves is Currently, |
+1 on somewhere in pd.api, I don't have a very strong preference on There are probably a few other classes that could be exposed publicly, e.g. |
We should be careful here with respect to what we want to do in the pandas code and encouraging people to use The following code works and type checks just fine if you have

import pandas as pd
from pandas.core.groupby.generic import DataFrameGroupBy

def myfun(dfgb: DataFrameGroupBy):
    return dfgb.sum()

df = pd.DataFrame(
    {
        "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
        "Max Speed": [380.0, 370.0, 24.0, 26.0],
    }
)
dfg: DataFrameGroupBy = df.groupby("Animal")
print(myfun(dfg))

So isn't this just an issue of making sure that types like It's also worth mentioning that in the example above, if you were to do a I don't think this is a typing issue. I think this is more about that we have docs that return classes that are not fully documented, and if you wanted to use those classes in your own type declarations, then you have to know where to import them from. |
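To make the "know where to import them from" point concrete, here is a minimal sketch of what user code might look like if a public location existed. It assumes the module named in this issue's title, `pandas.api.typing`, is available and exports `DataFrameGroupBy`; if it is not, the sketch falls back to the internal path used in the example above. The helper name `group_totals` is made up for illustration.

```python
import pandas as pd

try:
    # Assumption: the public module proposed in this issue exists
    # in the installed pandas and exports DataFrameGroupBy.
    from pandas.api.typing import DataFrameGroupBy
except ImportError:
    # Fallback: the internal location used in the example above.
    from pandas.core.groupby.generic import DataFrameGroupBy


def group_totals(dfgb: DataFrameGroupBy) -> pd.DataFrame:
    """Sum the remaining columns within each group (hypothetical helper)."""
    return dfgb.sum()


df = pd.DataFrame(
    {
        "Animal": ["Falcon", "Falcon", "Parrot", "Parrot"],
        "Max Speed": [380.0, 370.0, 24.0, 26.0],
    }
)
totals = group_totals(df.groupby("Animal"))
print(totals)
```

Either import resolves to the same class, so the annotation means the same thing in both cases; only the import path the user has to know changes.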
While this works, I do not think the current state is a good solution.
|
Indeed, thanks, I do find that interesting. But I don't see the connection to this issue - are there additional problems incurred if the user was to import |
I agree with you here, except I'm not convinced that this is only needed for typing, so I'm not sure that calling it |
I don't think so, but if you put these classes under |
Thanks @Dr-Irv; I agree with your comments. I've updated the linked PR with |
Works for me. |
I wrote a script to take the classes in pandas that occur in the documentation but aren't in the public API. There are likely some missing classes as well as false positives here; I've crossed off the ones that I can identify and put the ones I'm not familiar enough with in bold. In addition to the top-level and pandas.api, I'm considering the following as already being public: pandas.arrays, pandas.tseries, pandas.tseries.offsets.
|
These are documented in
This is an object from the dataframe interchange protocol that I think should be an implementation detail. |
You can also consider For XlsxWriter, And |
Thanks @mroeschke / @jorisvandenbossche - I've updated the script to include
If we agree that subclasses of ExcelWriter should not be allowed to add any public methods, then it would be sufficient. That looks to currently be the case, and I think that is a reasonable restriction on the subclasses. I've crossed out XlsxWriter and opened #49602.
The original motivation of this issue is to start making it clearer to users what is / isn't public, so unless |
I've edited the list above, crossing off a few classes that are purely internal. From the November dev call, I recall there was support for My take: while I agree there is potential for some confusion, I don't believe it to be very high, and would prefer |
Unfortunately I missed the November dev call. I disagree with the comparison to
I’m concerned about setting a precedent if we use |
If the purpose is really only for typing, then it seems most natural to have it at either |
A thought: I never really liked that the type checking functions are located in
The functions in
I would differentiate the two main (at least, in my opinion) purposes of type-hinting: (a) increasing readability and (b) static type checking. You can accomplish (a) without e.g. pandas-stubs. In this sense, both numpy.typing and pandas.api.typing would contain classes for the user to add type hints to their code, regardless of whether that user is going to use static type-checking. |
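The split between (a) and (b) can be illustrated with a small sketch. It assumes nothing beyond the standard library at runtime: the pandas import sits under `TYPE_CHECKING`, so only a static type checker follows it, and the annotation is a plain string that documents intent. `FakeGroupBy` and `group_totals` are hypothetical names used only for this illustration.

```python
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Only static type checkers follow this import; it never runs.
    # The path is the internal one mentioned in this thread.
    from pandas.core.groupby.generic import DataFrameGroupBy


def group_totals(dfgb: "DataFrameGroupBy"):
    """Return per-group sums; the hint documents the expected argument."""
    return dfgb.sum()


class FakeGroupBy:
    """Hypothetical stand-in showing that hints are not enforced at runtime."""

    def sum(self):
        return "summed"


# Runs without error: the hint aids readability, but nothing checks it here.
result = group_totals(FakeGroupBy())
print(result)
print(group_totals.__annotations__)
```

A static checker would flag the `FakeGroupBy` call, which is purpose (b); a reader gets the intended type from the signature alone, which is purpose (a).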
I see your point. As long as we document it that way, I won't stand in the way of calling it |
@TomAugspurger commented in #27522 (comment) that if we're disciplined Now if #48578 is merged we have effectively removed some barriers to making |
Revision 2:
Currently many intermediate classes can only be found in `pandas.core`, for example `DataFrameGroupBy` and `SeriesGroupBy`. These classes should not be instantiated directly, but may be necessary to access for users who use type-hinting. I think we should add these classes to a new API submodule, `pandas.api.aux`, as they are auxiliary classes. Users shouldn't need to know e.g. they are defined in `pandas.core.groupby.generic`.

Somewhat related: #6944, #19302
cc @pandas-dev/pandas-core @pandas-dev/pandas-triage for feedback.
Original
Currently one cannot run `from pandas import DataFrameGroupBy, SeriesGroupBy`. We should add these classes to the top level as they are particularly important for type-hinting in user code. Users shouldn't need to know they are defined in `pandas.core.groupby.generic`.
Revision 1
Currently one cannot run `from pandas import DataFrameGroupBy, SeriesGroupBy`. We should add these classes to a new API submodule, `pandas.api.typing`, as they are particularly important for type-hinting in user code. Users shouldn't need to know they are defined in `pandas.core.groupby.generic`.