Reducing open types is potentially unsafe #2771

Closed
sstucki opened this issue Jun 15, 2017 · 6 comments

@sstucki
Contributor

sstucki commented Jun 15, 2017

The following currently causes a stack overflow in dotty (it's rejected by scalac):

trait A { type L[X] }
trait B { type L }
trait C { type M <: A }
trait D { type M >: B }

object Test {
  def test(x: C with D): Unit = {
    def f(y: x.M)(z: y.L[y.L]) = z
    f(new B { type L[F[_]] = F[F] })(1)
  }
}

The overflow is caused by the compiler attempting to evaluate the non-terminating type expression y.L[y.L], which is ill-kinded: given the definition L[F[_]] = F[F], the application y.L[y.L] reduces to itself, so evaluation never terminates.

The type definition type L[F[_]] = F[F] is clearly ill-kinded already (F is applied to itself), and scalac actually reports this.

But the deeper issue is in the line above that definition: the type y.L has two conflicting kinds, it is both a type operator and a proper type. This is because L is declared as a type operator in trait A and as a proper type in trait B, which are the upper and lower bounds of x.M, respectively. So we can construct an instance y of B where y.L is a proper type and then apply it to some other type by up-casting y to x.M. Here's a variant of the example which illustrates that point more clearly:

//...
object Test {
  def test(x: C with D): Unit = {
    def f(y: x.M)(z: y.L[Any]) = z  // <-- y.L used as a type operator
    f(new B { type L = Any })(1)    // <-- y.L defined as a proper type
  }
}

It is rejected with the rather absurd error message

-- [E007] Type Mismatch Error:
 9 |		f(new B { type L = Any })(1)
   |		                          ^
   |		                          found:    Int(1)
   |		                          required: Any[Any]

Changing the type of z to y.L results instead in

-- [E055] Syntax Error:
 8 |    def f(y: x.M)(z: y.L) = z
   |                     ^^^
   |                     missing type parameter for y.L

This is another instance of a known problem about type members with absurd bounds (see also issue #50): evaluation is unsafe in a context containing variables/values (here x) of a type (here C & D) containing type members with absurd bounds (here x.M). In this particular case, the unsafe evaluation happens in an open type (namely y.L[Any]) rather than an open term, and at compile time (rather than at runtime).

Clarification: by "open type" I mean a type that contains free type or term variables (as opposed to a closed type which contains no free variables).
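To make the terminology concrete, here is a minimal sketch (the object and member names are invented for illustration):

```scala
object OpenVsClosed {
  trait C { type M }

  // A closed type: it contains no free type or term variables.
  type Closed = List[Int]

  def inside(x: C): Unit = {
    // An open type: x.M mentions the term variable x, which is only in
    // scope within this method; outside of it, x.M is meaningless.
    type Open = x.M
  }
}
```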

@Blaisorblade
Contributor

Here it seems that C&D can statically be shown to be empty, and in particular y.L has a clearly undefined kind, so it could be detected as ill-kinded (and we needn't reject C&D). I keep reading that this sort of strategy doesn't work, but it seems it should be enough to reject ill-formed kinds. Nevertheless, some points are still confusing (see below).

I see how you can try to cheat around this. But it seems that, at compile time, whenever narrowing produces a statically undefined kind k, you can semi-decide statically that k is undefined. (If this check can diverge, limit its recursion depth or something similar.) I mean, this seems implicit in "statically undefined"; the question is whether that notion is itself well-defined.
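As a purely hypothetical sketch of such a depth-limited check, here is a toy normalizer over type-level expressions that gives up after a fixed amount of fuel (Ty, reduce and friends are invented names, not compiler internals; substitution is capture-naive since this is only an illustration):

```scala
sealed trait Ty
case class TVar(n: String) extends Ty            // type variable
case class TLam(p: String, body: Ty) extends Ty  // type operator [p] => body
case class TApp(f: Ty, a: Ty) extends Ty         // type application f[a]

// Capture-naive substitution of `a` for variable `p` in `t`.
def subst(t: Ty, p: String, a: Ty): Ty = t match {
  case TVar(`p`)  => a
  case TVar(n)    => TVar(n)
  case TLam(q, b) => if (q == p) TLam(q, b) else TLam(q, subst(b, p, a))
  case TApp(f, x) => TApp(subst(f, p, a), subst(x, p, a))
}

// Reduce with a fuel limit: None means the limit was hit, i.e. the
// expression is flagged as potentially undefined instead of diverging.
def reduce(t: Ty, fuel: Int): Option[Ty] =
  if (fuel <= 0) None
  else t match {
    case TApp(TLam(p, b), a) => reduce(subst(b, p, a), fuel - 1)
    case TApp(f, a) =>
      reduce(f, fuel - 1).flatMap { f2 =>
        if (f2 != f) reduce(TApp(f2, a), fuel - 1) else Some(TApp(f2, a))
      }
    case other => Some(other)
  }
```

With this, the self-application from the original example (an operator like L[F[_]] = F[F] applied to itself) runs out of fuel and is flagged, while ordinary applications normalize.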

That is, even if you cheat like this, the compiler never sees ill-kinded types, and at runtime call can never be invoked anyway (with the usual caveats), so the following could even be accepted:

  def test(x: C): Unit = {
    //here x.M <: A. If y: x.M, then y.L has a *defined* kind * -> *:
    def f(y: x.M)(z: y.L[Any]): y.L[Any] = z
  }
  def call(w: C with D) = test(w)

To show the next problem, take the following example and try inferring the return type for call1, or typechecking the signature for call2. Still, in any context, y.L is either still consistent or narrowed into inconsistency; there is no alternative.

  def test(x: C)(y: x.M)(z: y.L[Any]): y.L[Any] = z
  def call1(w: C with D) = test(w)
  def call2(w: C with D)(y: w.M)(z: y.L[Any]) = test(w)(y)(z)

What do we know about typechecking non-higher-kinded DOT? Kinds don't show up, but inconsistent bounds and open terms do.

Yet, if we declare "inconsistent kinds" a static error, substitution in types can produce invalid types and we must detect them. But DOT seems to go to great lengths to keep types valid. Rompf and Amin (OOPSLA 2016) give a narrowing lemma without restrictions (Lemma 3), which would not hold for the example with call1 and call2.

@sstucki
Contributor Author

sstucki commented Jun 16, 2017

Here it seems that C&D can statically be shown to be empty—

Yes, indeed. Though it's questionable whether the compiler could (and should) determine this statically.

and in particular y.L has a clearly undefined kind so it could be detected as ill-kinded (and we needn't reject C&D). I keep reading this sort of strategy doesn't work, but it seems it should be enough to reject ill-formed kinds.

No, I don't think so. The kind of y.L isn't undefined, nor is y.L ever ill-kinded, nor does it have an ill-formed kind. It is true that y.L has multiple kinds (depending on the type of y), and yes, they are incompatible, but that is due to the path y having multiple, incompatible types (namely A and B), and therein lies the problem. It's not problematic per se for y.L to have multiple, possibly incompatible kinds depending on the type of y; after all, y.L is a path-dependent type.

I think what makes this example confusing is that A and B both happen to have a type member called L and those respective type members happen to have different signatures/kinds. And this is of course crucial for the point I was trying to illustrate, namely that, in the presence of higher-kinded types, bad bounds can lead to unsafe behaviour at compile time. But at the core this is still about bad bounds on type members, not about "undefined" kinds. So the question really boils down to how the compiler should handle bad bounds, not kinds, IMO.
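To illustrate the benign case: against each bound in isolation, y.L has a perfectly well-defined (but different) kind, and both uses kind-check fine; only the combination of the two bounds through x.M conflicts. (The object name below is invented for this sketch.)

```scala
object PathDependentKinds {
  trait A { type L[X] }
  trait B { type L }

  // Here y.L is a type operator of kind * -> *:
  def g(y: A): Unit = { type T = y.L[Int] }

  // Here y.L is a proper type of kind *:
  def h(y: B): Unit = { type T = y.L }
}
```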

@Blaisorblade
Contributor

So the question really boils down to how the compiler should handle bad bounds, not kinds, IMO.

As you say, reduction of open types is not (kind-)safe and can get stuck/fail. Yet, compiling some examples needs reduction of open types (see below). So it seems some of the failures arising from bad types must be detected before leading to such bad behavior, doesn't it?

Without reduction of open types, it's not clear one can deal with such an example (in particular, kind-checking y.L[Any]), but it seems hard to justify rejecting it (I might be wrong). I've excluded all immediate sources of conflicts (like D), but of course they can be added later.

  trait A { type L[X] }
  trait B { type L }
  trait C { type M <: A }
  def test(x: C): Unit = {
    def f(y: x.M)(z: y.L[Any]): y.L[Any] = z
  }

Regarding "undefined kinds" — I expect some code needs to know the (most specific) kind of y.L in your example, but there is no such thing. Hence "undefined kind". Of course, I have to be informal here, so I'll need some slack.

But at the core this is still about bad bounds on type members, not about "undefined" kinds.

Without kinds, up to now, we haven't seen bad bounds trigger errors during compile-time type reduction, for one reason or another. Even if you prove String <: Int, will Dotty actually try treating a String as an Int? It seems not yet, but you can try.

At least as of 6f8bc43, dotc -optimise doesn't break yet on the following code, but if it's not in the test suite it should probably be added as a testcase. Maybe you can ping whoever maintains the simplifier and they can do better?

class TypeError {
  type Absurd = Any { type X >: String <: Int }
  def typesafeCast(x: Absurd, y: String): Int = y: x.X // this cast is needed to exploit the absurdity.
  def foo(x: Absurd) = typesafeCast(x, "") + 1
}

@sstucki
Contributor Author

sstucki commented Jun 19, 2017

Good points @Blaisorblade.

As you say, reduction of open types is not (kind-)safe and can get stuck/fail. Yet, compiling some examples needs reduction of open types (see below). So it seems some of the failures arising from bad types must be detected before leading to such bad behavior, doesn't it?

Yes, this seems like a problem and I don't know how best to address it.

Regarding "undefined kinds" — I expect some code needs to know the (most specific) kind of y.L in your example, but there is no such thing. Hence "undefined kind". Of course, I have to be informal here, so I'll need some slack.

OK, I'll use that slack to interpret your "most specific kind" as "synthesized kind" and then it seems the only viable option here is * -> *. It should of course depend on the synthesized type of y which is itself a type selection, namely x.M. So to allow a type selection on y the type checker has to do some widening and I'm really not familiar enough with the compiler internals to tell you how this is done. My guess, though, is that x.M will be widened to A (rather than Any) in which case y.L is considered a type operator. But I might be wrong. Maybe @smarter or @odersky could shed some light on this?

Even if you prove String <: Int, will Dotty try treating a String as an Int? It seems not yet, but you can try.

At least as of 6f8bc43, dotc -optimise doesn't break yet on the following code, but if it's not in it should probably be added as a testcase—maybe you can ping whoever maintains the simplifier and they can do better?

That is a very good point. Evaluation of open terms is of course not safe either, which should be a problem for optimizations such as constant folding, as you suggest. I haven't tried this but it could come down to a final modifier on typesafeCast or something of that sort. Maybe @DarkDimius has some insights here?

@Blaisorblade
Contributor

Blaisorblade commented Jun 19, 2017

OK, I'll use that slack to interpret your "most specific kind" as "synthesized kind" and then it seems the only viable option here is * -> *. It should of course depend on the synthesized type of y which is itself a type selection, namely x.M. So to allow a type selection on y the type checker has to do some widening and I'm really not familiar enough with the compiler internals to tell you how this is done. My guess, though, is that x.M will be widened to A (rather than Any) in which case y.L is considered a type operator. But I might be wrong. Maybe @smarter or @odersky could shed some light on this?

I know even less about Dotty, but now I wonder if widening x.M to A and ignoring the lower bound is a good idea. It seems more precise to use B <: ? <: A.
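One purely illustrative way to picture tracking B <: ? <: A instead of widening to A alone: record the kind induced by each bound and flag the member when they disagree (Kind, KindInterval and the check are invented for this sketch, not Dotty internals):

```scala
sealed trait Kind
case object Star extends Kind                        // kind * of proper types
case class Arrow(from: Kind, to: Kind) extends Kind  // kind of type operators

// For a member like x.M with bounds B <: x.M <: A, each bound induces a
// kind for L: B gives Star, A gives Arrow(Star, Star).
case class KindInterval(fromLower: Kind, fromUpper: Kind) {
  // Naive agreement check; a real system would use subkinding instead.
  def consistent: Boolean = fromLower == fromUpper
}
```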

BTW, I suspect research on validated metaprograms on DOT (or subsets) might be worthwhile to investigate such issues (though I'm not sure how popular it would be)—there's more to a compiler than a sound type system, though that helps. I've been thinking about this for a while, though of course it's hard.

I use "validated" to allow for a range of options (including forms of systematic testing), but a verified constant folder would not sound stupid—bonus points with some normalization algorithms.
EDIT: I know right now most of these are 'pie-in-the-sky' ideas—I know too little yet on what's feasible.

@sstucki
Contributor Author

sstucki commented Jul 18, 2017

This is likely related, though the error is different: #2887.
