Skip to content

Improve type annotations in many places #6220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

dnaaun
Copy link

@dnaaun dnaaun commented Aug 3, 2020

  1. This is purely a type annotation PR( pinging @sgugger ), except for number 3 below.
  2. Admittedly, this PR orients type annotations as a code-correctness tool rather than a documentation tool, which I haven't seen in the repo before.
  3. The only non-type-annotation change is making DataProcessor inherit from abc.ABC. Along with that, the methods that previously had raise NotImplementedError() are now decorated by @abc.abstractmethod. This change was motivated by the fact that the static analyzers like mypy can correctly identify unimplemented methods in subclasses.

This PR is a result of a question on post on the forum.

Closes #6118 too.

@dnaaun
Copy link
Author

dnaaun commented Aug 3, 2020

From what I can see, test failures are because of the usage of typing.Literal, which is available starting on Python3.8.

The typing_extensions (from mypy devs) package backports such features into older versions of Python. If it's ok with the devs to add it to setup.py, I can do so. Otherwise, we can make the annotations less specific as to avoid Literal types.

@sgugger
Copy link
Collaborator

sgugger commented Aug 3, 2020

I'd avoid adding a new dependency or pinning us on python 3.8. I'd also avoid using overload type annotations to keep the code readable. I think we can have less specific type annotations, and detail what the kwargs are if needed to be able to type-annotate them (which will be useful for tab-completion in an IDE anyways).

For instance, the first:

@overload
    @classmethod
    def from_pretrained(
        cls,
        pretrained_model_name_or_path: str,
        *,
        return_unused_kwargs: Literal[True],
        cache_dir: Optional[str] = None,
        force_download: bool = False,
        resume_download: bool = False,
        proxies: Optional[Dict[str, str]] = None,
        **kwargs: Any
    ) -> Tuple[PretrainedConfig, Dict[str, str]]:
        ...

    @overload
    @classmethod
    def from_pretrained(
        cls,
        pretrained_model_name_or_path: str,
        *,
        return_unused_kwargs: Literal[False] = False,
        cache_dir: Optional[str] = None,
        force_download: bool = False,
        resume_download: bool = False,
        proxies: Optional[Dict[str, str]] = None,
        **kwargs: Any
    ) -> PretrainedConfig:
        ...

could be written like this in the main definition:

def from_pretrained(
        cls,
        pretrained_model_name_or_path: str,
        *,
        return_unused_kwargs: bool = False,
        cache_dir: Optional[str] = None,
        force_download: bool = False,
        resume_download: bool = False,
        proxies: Optional[Dict[str, str]] = None,
        **kwargs: Any
    ) -> Union[PretrainedConfig, Tuple[PretrainedConfig, Dict[str, str]]]:

and then the body of the function can be updated.

@dnaaun
Copy link
Author

dnaaun commented Aug 3, 2020

Sounds good. I'll get around to removing usages of Literal and overload tomorrow.

@sgugger
Copy link
Collaborator

sgugger commented Aug 4, 2020

Thinking a bit more about this, if you really need some types in typing_extensions, we can add this as a dep.

@dnaaun
Copy link
Author

dnaaun commented Aug 4, 2020

The types in this PR that need typing_extensions in py<3.8 are Literal and Protocol.

  1. There's only one usage of Literal in the PR that is not in conjunction with @overload. The other usages of Literal are to specify overloading signatures. If we avoid @overload, the use of Literal is much less justified.
  2. Protocol was used to replace
DataClass = NewType("DataClass", Any)
DataClassType = NewType("DataClassType", Any)

with

class DataClassProtocol(Protocol):
     def __init__(self, *args: Any, **kwargs: Any) -> None: pass
# And DataClassType would be replaced with Type[DataClassProtocol] in the rest of the code

The benefit of this is that mypy complains that dataclasses are in fact not compatible with DataClass when using the first definition.

I really like static type checking, so I'm for adding typing_extensions, but do the devs think the above usages are worth it, is the question ...

@julien-c
Copy link
Member

julien-c commented Aug 4, 2020

Personally I'd love to use Literals and I'd be in favor of adding the dependency just for this, but I'll let others chime in.

For the DataClassProtocol, I'm not convinced, given that this type won't lead to a lot of inference type checking anyways... I feel like the current DataClass and DataClassType are pretty much just documentation.

@julien-c
Copy link
Member

julien-c commented Aug 4, 2020

Slightly off topic, but @davidatbu do you have experience on Pyright vs. mypy? (in particular in the context of vscode)

I've been wanting to dig in for a long time.

@dnaaun
Copy link
Author

dnaaun commented Aug 4, 2020

@julien-c ,

For the DataClassProtocol, I'm not convinced, given that this type won't lead to a lot of inference type checking anyways... I feel like the current DataClass and DataClassType are pretty much just documentation.

Yes, this won't lead to inference of the actual dataclass passed in the current mypy implementation(it might if mypy ever supports variadic generics, and we make HfArgumentParser generic). However, using DataClass as previously defined leads mypy to raise errors about passing perfectly legal dataclasses to HfArgumentParser. Using DataClassProtocol doesn't. I use mypy a lot, so I might be the only one that this bothers.

Do you have experience on Pyright vs. mypy? (in particular in the context of vscode)

I've used only mypy so far. (And I find it so helpful that I'm willing to create PRs for type annotation :) )Also, I use Vim with plugins, so wouldn't be able to comment on the vscode aspect either.

@stale
Copy link

stale bot commented Oct 4, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Oct 4, 2020
@stale stale bot closed this Oct 11, 2020
@mrahtz
Copy link

mrahtz commented Feb 20, 2021

@davidatbu I noticed you referenced python/typing#193 on variadic generics in this thread. Heads up that we've been working on a draft of a PEP for this in PEP 646. If this is something you still care about, take a read and let us know any feedback in this thread in typing-sig. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Is guid allowed to be None in InputExample?
4 participants