GADTs might need a different implementation scheme #5254
One consequence of the new scheme is that it is not applicable to opaque types, since it relies on local copies of local (i.e. term-owned) symbols. So #4028 would be invalidated, and we would have to go back to square one and develop a different scheme, and potentially a different spec, for opaque types. |
The new scheme is similar to the old in that GADTness does not propagate: the types referenced in the scrutinee are the only ones that can be restricted further. We could widen that in two directions.
For the moment, we are most conservative, which is appropriate because the problem already looks hard enough as it is. With convincing use cases we might be forced to generalize. |
The root of the problem is: what do GADT constraints mean? Assume the following simple GADT program:

```scala
trait C[X](elem: X)
class CInt(elem: Int) extends C(elem)
class CBoolean(elem: Boolean) extends C(elem)

def f[T](x: C[T]): T = x match {
  case CInt(y) => y
  case CBoolean(y) => y
}
```

This program needs GADT reasoning for typechecking. For instance, in the first case of `f`, the result `y` has type `Int`, which conforms to the expected type `T` only under the constraint obtained by matching against `CInt`. Now, the question: knowing that constraint, can we write

```scala
case CInt(y) => x.elem * y
```

The justification why this should work is as follows:
Note that this reasoning uses the subsumption rule of subtyping. Subsumption is part of DOT, but it's actually not part of the Scala spec, nor does the compiler implement it. So we conclude that the justification via subsumption gives some intuition but no hard guidance on what to do.

If we want to implement some approximation of subsumption, we have to do it explicitly. That would mean lots of special-purpose rules in the spec and lots of special cases in the compiler. In light of the problems outlined in this issue, it seems best to leave things where they are now: GADT constraints generate new subtyping relationships, but nothing else. This solves points (2) and (3) outlined in the issue: scope creep and cache invalidation. It does mean that some code that intuitively "should work" doesn't. For instance,

```scala
case CInt(y) => x.elem * y
```

would give an error.
Open question: Can we design hints that suggest such upcasts? |
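To make the "upcast via ascription" remedy concrete, here is a hedged sketch. The example is adapted from the one above so that `elem` is an ordinary selectable member; that adaptation, and the names `C2`, `CInt2`, `useElem`, and `GadtAscription`, are ours, not from the thread:

```scala
// Adaptation of the thread's example: elem is made a regular member so it
// can be selected on the right-hand side of a case.
trait C2[X] { def elem: X }
case class CInt2(elem: Int) extends C2[Int]
case class CBoolean2(elem: Boolean) extends C2[Boolean]

object GadtAscription {
  def useElem[T](x: C2[T]): Int = x match {
    case CInt2(y) =>
      // Matching CInt2 constrains T to Int. Treating that constraint as a
      // plain subtyping fact, an explicit ascription upcasts x.elem from T
      // to Int, after which * is available:
      (x.elem: Int) * y
    case CBoolean2(_) => 0
  }
}
```

The ascription makes the use of the GADT constraint explicit at exactly one point, which is the kind of place an error-message hint could point at.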
Are there any known counterexamples to subsumption in Scala 2/3? |
For instance, consider a call that could be typed in two different ways. With subsumption, it could return either 1 or 2; without, it's a type error. The example shows that adding full subsumption is not even possible in full Scala, since it would make type checking non-deterministic. |
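The concrete code for this example did not survive in the thread; the following is our reconstruction of the kind of ambiguity meant, using two overloads where neither alternative is more specific:

```scala
object Ambiguity {
  def g(x: Int, y: Any): Int = 1
  def g(x: Any, y: Int): Int = 2

  // g(1, 1) is rejected as ambiguous: both overloads are applicable and
  // neither is more specific. With subsumption, the checker could widen
  // either argument to Any first and thereby justify either overload, so
  // the result would depend on an arbitrary choice of typing derivation.

  // Explicit ascriptions simulate the two derivations:
  def asFirst: Int  = g(1, (1: Any)) // only g(Int, Any) applies => 1
  def asSecond: Int = g((1: Any), 1) // only g(Any, Int) applies => 2
}
```

The unascribed call `g(1, 1)` is the "type error" case; the two ascribed calls exhibit the two results that subsumption could justify.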
OK, even in the absence of subsumption, couldn't we have an "adapt to the widened type" step? |
It seems to me this example is more about an issue with algorithmic subtyping and overloading resolution. I think current Dotty (with algorithmic subtyping) is correct to produce an error here for the ambiguity in overloading resolution. This is mentioned in the slides of FoS. Theoretically, it is hopeless, as even algorithmic subtyping for F<: is undecidable. |
We could try, but I find it unappealing. Views don't compose, so I fear this will be fragile, and it could also be very expensive. Adding hints to error messages seems like an easier alternative. |
Is it really so bad? If type checking fails, we retry with an ascription inserted ... this is only going to be an issue in opaque contexts and the RHSs of GADT pattern matches. |
To give a concrete example: If we do the widening as a (compiler-generated) implicit conversion, then we could support members of the widened type, but we would fail to infer selections added by decorators or extension methods. Or we compose both and risk a combinatorial explosion. Either way, it's unappealing. I believe it's actually better to say "these are subtypes, i.e. you can coerce one to the other by a simple type ascription, but that's it". |
I'm not sure I see that. If we're typing the body of the companion of an opaque type, or the RHS of a GADT pattern match, then we know both the abstract type and the widened type. If type checking a selection fails for a term of the abstract type in one of those scenarios, then we can insert the type ascription you're suggesting that users write by hand, and then retypecheck. Having inserted the ascription, we'll pick up both direct members and members accessible via a view, using the normal mechanisms. I don't see where the risk of a combinatorial explosion comes in, other than in artificial cases which I don't think are sufficiently likely programming patterns to be worrisome. (The unlikely cases I have in mind being: opaque types nested in the companion of an opaque type; GADT pattern matches nested in the RHS of GADT pattern matches; and combinations of the two. In the unlikely event of someone doing that and it causing a compile-time performance issue, manually inserting the ascription would fix it.) |
@milessabin The two biggest violations of subsumption involve type members, among other things. A full fix for type members requires checking all type members of everything in scope, which is clearly too expensive; (expert) users know an assertion is required instead, and this could be specified as required. This has precedents in dependent type theories: if you have a proof of equality in scope, you usually need to apply it explicitly (e.g. in Coq/Agda/Idris). However, if we encoded opaque types as type members (as in the original blog post, though we don't), opaque types would reduce to type aliases. |
I don't see why that's the case in the scenarios in question (opaque type companion bodies and GADT match RHSs) ... can you give me an example? |
Not really, I was talking about a fix for type members. |
@milessabin You might be right: it could be cheap enough to insert a view, at least as far as opaque types are concerned. The idea is that if we type a value of an opaque type in its companion object with a selection prototype, we can always widen to the right-hand side, since the original type is useless for selections anyway. That way, we avoid the problem of trying first-this-then-that, which leads to a combinatorial explosion of possibilities. (Each choice just doubles the possibilities, so on its own it does not look so bad, but the combination of several such choices is very bad. That's why I am reluctant to introduce new ones; we have been bitten too often.) [EDIT: No, scratch that, it does not work if the selected name is added as an extension method to the original type. We really need to try both.] |
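For context, a small sketch of the opaque-type situation being discussed. This follows the familiar `Logarithm` pattern from the Scala 3 documentation; `magnitude` and `demo` are our illustrative additions:

```scala
object Logarithms {
  opaque type Logarithm = Double

  object Logarithm {
    def apply(d: Double): Logarithm = math.log(d)
  }

  extension (x: Logarithm) {
    def toDouble: Double = math.exp(x)
    // Inside the defining scope, Logarithm is a known alias of Double, so a
    // selection such as x.abs typechecks by widening x to Double, i.e. the
    // "widen to the right-hand side" step described in the comment:
    def magnitude: Double = x.abs
  }

  def demo: Double = Logarithm(1.0).magnitude // log(1.0) == 0.0, abs == 0.0
}
```

The [EDIT] caveat would correspond to `abs` itself being defined as an extension method on `Logarithm`: widening to `Double` first would then miss it, which is why both alternatives have to be tried.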
Working with opaque types, which are based on our existing GADT machinery, I see more and more problems with our existing approach:
One of these problems is already known, but the following have not been voiced before:

- Knowledge about GADT bounds has spread far into the core algorithms of the compiler. It's not just subtyping; by now we consult the GADT bounds in several other places as well. And it's not clear at all whether this is enough, or whether we need to consult GADT maps in still more places.
- GADT maps pose enormous problems for caching. Every update to a GADT map invalidates all caches that contain values whose computation relied on the previous value (or relied on the fact that there was no GADT entry yet). But there is no way to trace these caches. The alternative of not caching any info that could be affected by GADT bounds is unappealing as well, because it looks very hard to figure out what might be affected later on.

So far, we have not seen much fail empirically, because GADTs were not really fully implemented, so very little code relied on them. But if we want that to change, we have to take these problems seriously.
An alternative approach
Here's a different implementation scheme which avoids the cited problems (whether it will create new ones remains to be seen).
1. When type checking a case in a match, determine all type parameters with term owners that are referenced from the widened scrutinee type. Call these the GADT parameters. (This step is the same as in the current scheme.)
2. Determine the support set of the GADT parameters. This is the smallest set S of term-owned symbols that contains the GADT parameters and is closed under references from the infos of its members.
3. Define a map M over S as follows. Every GADT parameter in S is mapped to a type variable whose `TypeParamRef` origin has the bounds of the original symbol, mapped recursively by M. Every other symbol s in S is mapped to a fresh symbol of the same kind as s, whose info is the original info of s, mapped recursively by M.
4. Contexts get a new field `gadtMap` that contains the current map. A match case gets typechecked in a new context, where a new instance of M as computed in the previous step is stored in the `gadtMap` field, and all TypeVars in M's image are added to the constraint.
5. When typechecking an identifier in `typedIdent`, map the computed type (always a `TermRef` or `TypeRef`) using the context's GADT map. [Aside: the mapping would be avoided if we could replace old scopes with new scopes in the context. But this means potentially duplicating a stack of context objects, because GADT parameters might be bound in outer contexts. This looks tricky.]
6. Typecheck the pattern against the scrutinee as usual. This might further constrain images of GADT parameters in the constraint.
7. After that is done, check that the type of the case's RHS conforms to the expected type of the match after mapping the latter with M. If the type of the RHS does not conform to the expected type without applying M, insert a cast.
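The support-set computation in step 2 is essentially a least-fixpoint closure. Here is a toy model of just that step, not dotc code; the symbol representation and the exact closure condition are our assumptions, since the proposal does not pin them down:

```scala
// Toy model: a "symbol" is a name plus the set of term-owned symbols
// its info references.
final case class Sym(name: String, referenced: Set[String])

object SupportSet {
  // Smallest set containing the GADT parameters and closed under
  // "references from the infos of its members" (assumed closure condition).
  def compute(gadtParams: Set[String], symtab: Map[String, Sym]): Set[String] = {
    @annotation.tailrec
    def loop(s: Set[String]): Set[String] = {
      val next = s ++ s.flatMap(n => symtab.get(n).toSet.flatMap(_.referenced))
      if (next == s) s else loop(next)
    }
    loop(gadtParams)
  }
}
```

For example, if GADT parameter `T` has an info referencing a term symbol `x`, the support set of `{T}` is `{T, x}`; the map M from step 3 would then be defined over both.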