Skip to content

Add a new, lower priority for imports inside "if MYPY" #2167

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 27, 2016
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 22 additions & 8 deletions mypy/build.py
Original file line number Diff line number Diff line change
Expand Up @@ -24,7 +24,7 @@
from typing import (AbstractSet, Dict, Iterable, Iterator, List,
NamedTuple, Optional, Set, Tuple, Union)

from mypy.nodes import (MypyFile, Import, ImportFrom, ImportAll)
from mypy.nodes import (MypyFile, Node, ImportBase, Import, ImportFrom, ImportAll)
from mypy.semanal import FirstPass, SemanticAnalyzer, ThirdPass
from mypy.checker import TypeChecker
from mypy.indirection import TypeIndirectionVisitor
Expand Down Expand Up @@ -306,10 +306,23 @@ def default_lib_path(data_dir: str, pyversion: Tuple[int, int]) -> List[str]:
PRI_HIGH = 5 # top-level "from X import blah"
PRI_MED = 10 # top-level "import X"
PRI_LOW = 20 # either form inside a function
PRI_MYPY = 25 # inside "if MYPY" or "if TYPE_CHECKING"
PRI_INDIRECT = 30 # an indirect dependency
PRI_ALL = 99 # include all priorities

Copy link
Contributor

@elazarg elazarg Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why only a single additional priority? Looks like the concerns are pretty independent. I would think a more detailed set of priorities, e.g. lexicographical ordering of the form

(is_mypy, is_toplevel, is_from)

might help the algorithm breaking cycles less arbitrarily, Alternatively,

(is_mypy, nesting_level, is_from)

perhaps even more so (the actual tuple is of course unimportant; I only talk about the implied ordering).

I don't know if all this will have any effect on any real codebase, but the current system feels somewhat arbitrary.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would still be arbitrary. There are ideas for something better in the tracker, but I'd like to see how things work out with this small improvement first.


def import_priority(imp: ImportBase, toplevel_priority: int) -> int:
"""Compute import priority from an import node."""
if not imp.is_top_level:
# Inside a function
return PRI_LOW
if imp.is_mypy_only:
# Inside "if MYPY" or "if typing.TYPE_CHECKING"
return max(PRI_MYPY, toplevel_priority)
# A regular import; priority determined by argument.
return toplevel_priority


# TODO: Get rid of all_types. It's not used except for one log message.
# Maybe we could instead publish a map from module ID to its type_map.
class BuildManager:
Expand Down Expand Up @@ -395,20 +408,21 @@ def correct_rel_imp(imp: Union[ImportFrom, ImportAll]) -> str:
for imp in file.imports:
if not imp.is_unreachable:
if isinstance(imp, Import):
pri = PRI_MED if imp.is_top_level else PRI_LOW
pri = import_priority(imp, PRI_MED)
ancestor_pri = import_priority(imp, PRI_LOW)
Copy link
Contributor

@elazarg elazarg Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the story of ancestor_pri. I'm sure I can figure it out but it feels like it deserves a comment - how ancestor_pri is pri with a different default? It's not obvious to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's only used once, for the ancestors. The reason is that if you have "import x.y" you depend on both x.y and on x, but the priority for x should be less than for x.y. There are many other ways we could potentially improve the heuristics here but they're still just heuristics, so I don't really care. The change here is really just that if we decide an import is inside if MYPY it should get an even lesser priority.

for id, _ in imp.ids:
ancestor_parts = id.split(".")[:-1]
ancestors = []
for part in ancestor_parts:
ancestors.append(part)
res.append((PRI_LOW, ".".join(ancestors), imp.line))
res.append((ancestor_pri, ".".join(ancestors), imp.line))
res.append((pri, id, imp.line))
elif isinstance(imp, ImportFrom):
cur_id = correct_rel_imp(imp)
pos = len(res)
all_are_submodules = True
# Also add any imported names that are submodules.
pri = PRI_MED if imp.is_top_level else PRI_LOW
pri = import_priority(imp, PRI_MED)
for name, __ in imp.names:
sub_id = cur_id + '.' + name
if self.is_module(sub_id):
Expand All @@ -421,10 +435,10 @@ def correct_rel_imp(imp: Union[ImportFrom, ImportAll]) -> str:
# cur_id is also a dependency, and we should
# insert it *before* any submodules.
if not all_are_submodules:
pri = PRI_HIGH if imp.is_top_level else PRI_LOW
pri = import_priority(imp, PRI_HIGH)
res.insert(pos, ((pri, cur_id, imp.line)))
elif isinstance(imp, ImportAll):
pri = PRI_HIGH if imp.is_top_level else PRI_LOW
pri = import_priority(imp, PRI_HIGH)
res.append((pri, correct_rel_imp(imp), imp.line))

return res
Expand Down Expand Up @@ -1703,8 +1717,8 @@ def order_ascc(graph: Graph, ascc: AbstractSet[str], pri_max: int = PRI_ALL) ->
each SCC thus found. The recursion is bounded because at each
recursion the spread in priorities is (at least) one less.

In practice there are only a few priority levels (currently
N=3) and in the worst case we just carry out the same algorithm
In practice there are only a few priority levels (less than a
dozen) and in the worst case we just carry out the same algorithm
for finding SCCs N times. Thus the complexity is no worse than
the complexity of the original SCC-finding algorithm -- see
strongly_connected_components() below for a reference.
Expand Down
7 changes: 5 additions & 2 deletions mypy/nodes.py
Original file line number Diff line number Diff line change
Expand Up @@ -285,8 +285,11 @@ def deserialize(cls, data: JsonDict) -> 'MypyFile':

class ImportBase(Statement):
"""Base class for all import statements."""
is_unreachable = False
is_top_level = False # Set by semanal.FirstPass

is_unreachable = False # Set by semanal.FirstPass if inside `if False` etc.
is_top_level = False # Ditto if outside any class or def
is_mypy_only = False # Ditto if inside `if TYPE_CHECKING` or `if MYPY`

# If an import replaces existing definitions, we construct dummy assignment
# statements that assign the imported names to the names in the current scope,
# for type checking purposes. Example:
Expand Down
59 changes: 44 additions & 15 deletions mypy/semanal.py
Original file line number Diff line number Diff line change
Expand Up @@ -84,10 +84,20 @@
T = TypeVar('T')


# Inferred value of an expression.
ALWAYS_TRUE = 0
ALWAYS_FALSE = 1
TRUTH_VALUE_UNKNOWN = 2
# Inferred truth value of an expression.
ALWAYS_TRUE = 1
MYPY_TRUE = 2 # True in mypy, False at runtime
ALWAYS_FALSE = 3
Copy link
Contributor

@elazarg elazarg Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments: uniform prefix will make it slightly easier to search and possibly replace, as will happen eventually when it will become an Enum. Not that TRUTH_VALUE_UNKNOWN is a hard name to search :)

How about giving this type a dedicated alias until then? making it under the namespace class Truth: ... already, before its values gets spread any further.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, but that would just cause more code churn and the benefit would be minuscule.

MYPY_FALSE = 4 # False in mypy, True at runtime
TRUTH_VALUE_UNKNOWN = 5

inverted_truth_mapping = {
ALWAYS_TRUE: ALWAYS_FALSE,
ALWAYS_FALSE: ALWAYS_TRUE,
TRUTH_VALUE_UNKNOWN: TRUTH_VALUE_UNKNOWN,
MYPY_TRUE: MYPY_FALSE,
MYPY_FALSE: MYPY_TRUE,
}
Copy link
Contributor

@elazarg elazarg Sep 22, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you consider this numbering more readable than the range [-2, 2], with the obvious negation? An explicit mapping can still be used just to be sure. TRUTH_UNKNOWN should be zero this way or the other,

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The numbers are entirely uninteresting. I intentionally changed them around a bit to show that this is so.


# Map from obsolete name to the current spelling.
obsolete_name_mapping = {
Expand Down Expand Up @@ -3082,12 +3092,16 @@ def infer_reachability_of_if_statement(s: IfStmt,
platform: str) -> None:
for i in range(len(s.expr)):
result = infer_if_condition_value(s.expr[i], pyversion, platform)
if result == ALWAYS_FALSE:
# The condition is always false, so we skip the if/elif body.
if result in (ALWAYS_FALSE, MYPY_FALSE):
# The condition is considered always false, so we skip the if/elif body.
mark_block_unreachable(s.body[i])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to make sure, the intention is that if not TYPE_CHECKING: means "don't type check"? That's fine but not explicitly documented in PEP-484.

In this case

if TYPE_CHECKING:
    import expensive_mod
else:
    # I don't care about runtime annotations
    expensive_mod = CleverPseudoModule('SomeClass')

def a_func(arg: expensive_mod.SomeClass) -> None:
    ...

The else branch will not be checked at all. Again, it's fine, but I think it should be documented.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Even though PEP 484 only gives a positive example, it clearly states that TYPE_CHECKING

is considered True during type checking (or other static analysis) but False at runtime.

I don't see any ambiguity there.

elif result == ALWAYS_TRUE:
# This condition is always true, so all of the remaining
# elif/else bodies will never be executed.
elif result in (ALWAYS_TRUE, MYPY_TRUE):
# This condition is considered always true, so all of the remaining
# elif/else bodies should not be checked.
if result == MYPY_TRUE:
# This condition is false at runtime; this will affect
# import priorities.
mark_block_mypy_only(s.body[i])
for body in s.body[i + 1:]:
mark_block_unreachable(body)
if s.else_body:
Expand All @@ -3099,7 +3113,8 @@ def infer_if_condition_value(expr: Expression, pyversion: Tuple[int, int], platf
"""Infer whether if condition is always true/false.

Return ALWAYS_TRUE if always true, ALWAYS_FALSE if always false,
and TRUTH_VALUE_UNKNOWN otherwise.
MYPY_TRUE if true under mypy and false at runtime, MYPY_FALSE if
false under mypy and true at runtime, else TRUTH_VALUE_UNKNOWN.
"""
name = ''
negated = False
Expand All @@ -3123,12 +3138,9 @@ def infer_if_condition_value(expr: Expression, pyversion: Tuple[int, int], platf
elif name == 'PY3':
result = ALWAYS_TRUE if pyversion[0] == 3 else ALWAYS_FALSE
elif name == 'MYPY' or name == 'TYPE_CHECKING':
result = ALWAYS_TRUE
result = MYPY_TRUE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer TYPECHECK_TRUE or TYPE_CHECK_TRUE to be intuitively linked to the TYPE_CHECK flag, especially if you intend to phase MYPY flag out. (Similarly MarkImportsTypechekOnlyVisitor, mark_block_typecheck_only, Import.is_typecheck_only, and related documentation.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, we don't really care about other type checkers in this codebase, do we? :-)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No. My concern was greppability, but it is a minor one (it's only the connection between MYPY_TRUE and TYPE_CHECKING that is not immediate; each is perfectly greppable on its own).

if negated:
if result == ALWAYS_TRUE:
result = ALWAYS_FALSE
elif result == ALWAYS_FALSE:
result = ALWAYS_TRUE
result = inverted_truth_mapping[result]
return result


Expand Down Expand Up @@ -3303,6 +3315,23 @@ def visit_import_all(self, node: ImportAll) -> None:
node.is_unreachable = True


def mark_block_mypy_only(block: Block) -> None:
block.accept(MarkImportsMypyOnlyVisitor())


class MarkImportsMypyOnlyVisitor(TraverserVisitor):
"""Visitor that sets is_mypy_only (which affects priority)."""

def visit_import(self, node: Import) -> None:
node.is_mypy_only = True

def visit_import_from(self, node: ImportFrom) -> None:
node.is_mypy_only = True

def visit_import_all(self, node: ImportAll) -> None:
node.is_mypy_only = True


def is_identity_signature(sig: Type) -> bool:
"""Is type a callable of form T -> T (where T is a type variable)?"""
if isinstance(sig, CallableType) and sig.arg_kinds == [ARG_POS]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Completely off topic but such comparisons assumes that arg_kinds is a list, in a way that mypy does not warn about. Feels like a subtle bug waiting to happen).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, we have a bug open about that already.

Expand Down
24 changes: 24 additions & 0 deletions test-data/unit/check-modules.test
Original file line number Diff line number Diff line change
Expand Up @@ -1313,3 +1313,27 @@ pass
[file b]
pass
[out]

[case testTypeCheckPrio]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this test case fail without the changes in this PR?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes

# cmd: mypy -m part1 part2 part3 part4

[file part1.py]
from part3 import Thing
class FirstThing: pass

[file part2.py]
from part4 import part4_thing as Thing

[file part3.py]
from part2 import Thing
reveal_type(Thing)

[file part4.py]
from typing import TYPE_CHECKING
if TYPE_CHECKING:
from part1 import FirstThing
def part4_thing(a: int) -> str: pass

[builtins fixtures/bool.pyi]
[out]
tmp/part3.py:2: error: Revealed type is 'def (a: builtins.int) -> builtins.str'