
TYP: Fix typing in ExtensionDtype registry #41203

Merged
merged 17 commits into from
Aug 3, 2021
Changes from 7 commits
47 changes: 38 additions & 9 deletions pandas/core/dtypes/base.py
@@ -8,13 +8,16 @@
TYPE_CHECKING,
Any,
TypeVar,
cast,
overload,
)

import numpy as np

from pandas._libs.hashtable import object_hash
from pandas._typing import (
DtypeObj,
NpDtype,
type_t,
)
from pandas.errors import AbstractMethodError
@@ -29,7 +32,7 @@
from pandas.core.arrays import ExtensionArray

# To parameterize on same ExtensionDtype
-E = TypeVar("E", bound="ExtensionDtype")
+ExtensionDtypeT = TypeVar("ExtensionDtypeT", bound="ExtensionDtype")


class ExtensionDtype:
Expand Down Expand Up @@ -206,7 +209,9 @@ def construct_array_type(cls) -> type_t[ExtensionArray]:
raise AbstractMethodError(cls)

@classmethod
-def construct_from_string(cls, string: str):
+def construct_from_string(
+cls: type_t[ExtensionDtypeT], string: str
+) -> ExtensionDtypeT:
r"""
Construct this type from a string.

Expand Down Expand Up @@ -368,7 +373,7 @@ def _can_hold_na(self) -> bool:
return True


-def register_extension_dtype(cls: type[E]) -> type[E]:
+def register_extension_dtype(cls: type[ExtensionDtypeT]) -> type[ExtensionDtypeT]:
"""
Register an ExtensionType with pandas as class decorator.

Expand Down Expand Up @@ -422,28 +427,52 @@ def register(self, dtype: type[ExtensionDtype]) -> None:

self.dtypes.append(dtype)

-def find(self, dtype: type[ExtensionDtype] | str) -> type[ExtensionDtype] | None:
+@overload
+def find(self, dtype: type[ExtensionDtypeT]) -> ExtensionDtypeT:
Member

>>> typ
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> pd.core.dtypes.base.Registry().find(typ)
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> 
Suggested change
-def find(self, dtype: type[ExtensionDtypeT]) -> ExtensionDtypeT:
+def find(self, dtype: type[ExtensionDtypeT]) -> type[ExtensionDtypeT]:
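
To make the suggestion concrete, here is a minimal sketch of overloads in which a class passed in comes back as a class and an instance comes back as an instance. The `ExtensionDtype`, `Int64Dtype`, and `Registry` classes below are simplified, hypothetical stand-ins, not the pandas implementations, and the sketch leaves out the NumPy dtype overload entirely, so it says nothing about whether mypy accepts the full set of overloads discussed in this thread.

```python
from typing import Optional, TypeVar, overload


class ExtensionDtype:
    """Hypothetical stand-in for the pandas base class."""

    name = "base"

    @classmethod
    def construct_from_string(cls, string: str) -> "ExtensionDtype":
        # TypeError signals "this dtype does not match the string".
        if string != cls.name:
            raise TypeError(f"Cannot construct a '{cls.__name__}' from '{string}'")
        return cls()


class Int64Dtype(ExtensionDtype):
    """Hypothetical stand-in dtype."""

    name = "Int64"


T = TypeVar("T", bound=ExtensionDtype)


class Registry:
    def __init__(self) -> None:
        self.dtypes: list = []

    def register(self, dtype: type) -> None:
        self.dtypes.append(dtype)

    @overload
    def find(self, dtype: type[T]) -> type[T]: ...
    @overload
    def find(self, dtype: T) -> T: ...
    @overload
    def find(self, dtype: str) -> Optional[ExtensionDtype]: ...

    def find(self, dtype):
        if not isinstance(dtype, str):
            dtype_type = dtype if isinstance(dtype, type) else type(dtype)
            if issubclass(dtype_type, ExtensionDtype):
                # Class in, class out; instance in, instance out.
                return dtype
            return None
        for candidate in self.dtypes:
            try:
                return candidate.construct_from_string(dtype)
            except TypeError:
                pass
        return None


registry = Registry()
registry.register(Int64Dtype)
```

With this sketch, `registry.find(Int64Dtype)` returns the class itself, which matches the interactive session quoted above.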

Contributor Author

Turns out that type[ExtensionDtypeT] is not a possible return type, so I removed it.

Member

see code sample in comment

Contributor Author

Right, so I can put it back, but then type[ExtensionDtypeT] conflicts with npt.DTypeLike, because that can be any type, so I have to go back to listing the specific types.

Member

type[object] can create issues, but unfortunately object is a valid dtype. Use ignores where needed.

Member

The goal here is to avoid false positives for users passing a valid dtype and also avoid using our NpDtype alias. Ideally we want to remove use of that alias completely.

Examining this function and ignoring the docstring, any type can be passed and the return would be None if not an EA type. i.e. this function never raises a TypeError.

Maybe widen the accepted types to anything: use object, and if mypy complains, use Any or an ignore?
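
A small sketch of the behaviour described here, covering only the non-string branch: any object can be passed, and anything that is not an extension dtype class or instance yields None rather than a TypeError. The function name `find_non_string` and both classes are hypothetical stand-ins, and the return annotation is deliberately loose, since a static checker would need a cast or an ignore to narrow it further.

```python
from typing import Optional


class ExtensionDtype:
    """Hypothetical stand-in for the pandas base class."""


class MyDtype(ExtensionDtype):
    """Hypothetical extension dtype."""


def find_non_string(dtype: object) -> Optional[object]:
    """Return dtype if it is an extension dtype class or instance, else None."""
    # Normalise to a class so that issubclass() is always given a type.
    dtype_type = dtype if isinstance(dtype, type) else type(dtype)
    if issubclass(dtype_type, ExtensionDtype):
        return dtype
    return None
```

Because every input is normalised to a class before the `issubclass` check, plain Python types such as `int` or `object`, as well as arbitrary instances, fall through to the final `return None`.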

Member

If working with the public API, you may want to consider reviving #40202.

If not, create a test file and post the results (and the test file used) in a similar manner to #40200.

Contributor Author

Registry.find() is not in the public API, so do we need to worry about "false positives for users"? This code is just so pandas developers don't mess things up.

Member

Ah yes, the registry attribute was privatized in #40538 so that pyright --verifytypes passes. So the changes in this PR are not meant to fix pyright issues?

Contributor Author

TBH, I don't remember! I started this adventure back in January. I did this PR as part of getting ExtensionArray working. I haven't been using pyright to verify this.

...

@overload
def find(self, dtype: ExtensionDtypeT) -> ExtensionDtypeT:
...

@overload
def find(
self, dtype: NpDtype | type_t[str | float | int | complex | bool | object] | str
Member

Both type_t and type are used. Choose one for consistency.

Contributor Author

changed to type_t in next commit

) -> ExtensionDtype | None:
...

def find(
self,
dtype: type[ExtensionDtype]
| ExtensionDtype
| NpDtype
| type_t[str | float | int | complex | bool | object],
Member

Aren't these subsumed by NpDtype?

Contributor Author

No, NpDtype includes str

Member

Does #41945 help simplify this?

Contributor Author

yes, in next commit

Contributor Author

Had to undo this

) -> type[ExtensionDtype] | ExtensionDtype | None:
"""
Parameters
----------
-dtype : Type[ExtensionDtype] or str
+dtype : ExtensionDtype class or instance or str or numpy dtype or python type

Returns
-------
return the first matching dtype, otherwise return None
"""
if not isinstance(dtype, str):
-dtype_type = dtype
+dtype_type: type[ExtensionDtype] | type
Member

what about NpDtype added to the signature above?

Contributor Author

In this PR, NpDtype is Union[str, np.dtype]. If dtype is a str, this code isn't entered. If dtype is an np.dtype, then an np.dtype is a type, so adding the np.dtype type to the type of dtype_type is redundant.
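
The branching under discussion can be sketched as follows, assuming the logic shown in the diff: strings never reach the `dtype_type` assignment, and any non-class input is replaced by its class, so `dtype_type` always ends up holding some `type`. `FakeNumpyDtype` is a hypothetical stand-in for `np.dtype` so that the sketch has no NumPy dependency, and `resolve_dtype_type` is an illustrative name, not a pandas function.

```python
from typing import Optional


class FakeNumpyDtype:
    """Hypothetical stand-in for np.dtype (avoids a NumPy dependency)."""


def resolve_dtype_type(dtype: object) -> Optional[type]:
    """Return the class that the issubclass() check would be run against."""
    if isinstance(dtype, str):
        # Strings take the construct_from_string path instead.
        return None
    if not isinstance(dtype, type):
        # An instance is replaced by its class, which is always a `type`.
        return type(dtype)
    return dtype
```

Under these assumptions, a plain `type` annotation already covers every value `dtype_type` can take, whichever kind of dtype object was passed in.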

Member

the union of type[ExtensionDtype] and type is just type?

Contributor Author

fixed in next commit

+if not isinstance(dtype, type):
+dtype_type = type(dtype)
+else:
+dtype_type = dtype
if issubclass(dtype_type, ExtensionDtype):
-return dtype
+# cast needed here as mypy doesn't know we have figured
+# out it is an ExtensionDtype
+return cast(ExtensionDtype, dtype)
Member

dtype could also be type[ExtensionDtype]?

Contributor Author

Not at this point in the logic

Member

using code sample from #41203 (comment)

>>> typ
<class 'pandas.core.arrays.integer.Int64Dtype'>
>>> pd.core.dtypes.base.Registry().find(typ)
> /home/simon/pandas/pandas/core/dtypes/base.py(460)find()
-> return cast(ExtensionDtype, dtype)
(Pdb) dtype
<class 'pandas.core.arrays.integer.Int64Dtype'>
(Pdb) type(dtype)
<class 'type'>
(Pdb) 

Contributor Author

OK, I see. Fixed in a new commit.
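
The debugger session above comes down to the fact that `typing.cast` does nothing at runtime: it returns its argument unchanged and only tells the type checker what to assume. A minimal sketch, using hypothetical stand-in classes rather than the pandas ones:

```python
from typing import cast


class ExtensionDtype:
    """Hypothetical stand-in for the pandas base class."""


class Int64Dtype(ExtensionDtype):
    """Hypothetical stand-in dtype."""


# The checker is told this is an ExtensionDtype instance...
value = cast(ExtensionDtype, Int64Dtype)

# ...but at runtime it is still the class object that was passed in.
still_a_class = isinstance(value, type)
is_an_instance = isinstance(value, ExtensionDtype)
```

Here `still_a_class` is True and `is_an_instance` is False, so a cast to `ExtensionDtype` can hide the case where a class, not an instance, reaches the return statement.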


return None

-for dtype_type in self.dtypes:
+for dtype_loop in self.dtypes:
Member

According to the annotations, self.dtypes is list[type[ExtensionDtype]] and dtype_type is type[ExtensionDtype] | type, so why is this changed?

Contributor Author

Reverted, due to the change in the declaration of dtype_type above.

try:
-return dtype_type.construct_from_string(dtype)
+return dtype_loop.construct_from_string(dtype)
except TypeError:
pass

4 changes: 1 addition & 3 deletions pandas/core/dtypes/common.py
@@ -1766,9 +1766,7 @@ def pandas_dtype(dtype) -> DtypeObj:
# registered extension types
result = registry.find(dtype)
if result is not None:
-# error: Incompatible return value type (got "Type[ExtensionDtype]",
-# expected "Union[dtype, ExtensionDtype]")
-return result # type: ignore[return-value]
+return result

# try a numpy dtype
# raise a consistent TypeError if failed