Skip to content

Dotty's name encoding scheme does not match Scala 2 in some situations #5936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
smarter opened this issue Feb 16, 2019 · 10 comments · Fixed by #7601
Closed

Dotty's name encoding scheme does not match Scala 2 in some situations #5936

smarter opened this issue Feb 16, 2019 · 10 comments · Fixed by #7601

Comments

@smarter
Copy link
Member

smarter commented Feb 16, 2019

Given:

class A {
  def a_+(): Unit = {}
  def `+_a`(): Unit = {}
}

Dotty generates methods:

   public void a_$plus()
   public void +_a()

Whereas Scala 2 generates:

  public void a_$plus()
  public void $plus_a()

I was about to fix this but the comment above NameTransformer#encode written by @odersky suggests that this is intentional: https://github.com/lampepfl/dotty/blob/12eef6d8e51dcd8a9a73387659b0f860d7b9ca80/compiler/src/dotty/tools/dotc/util/NameTransformer.scala#L87-L88

@odersky Is being different from Scala 2 actually intended here ? This makes it impossible to call some Scala 2 methods from Dotty (we may want to change the name encoding scheme, but only if we get Scala 2 to change it in lockstep with us).

Marked as a blocker for the release because upgrading to ASM 7 lead to a name-encoding-related issue: #5917 (comment), I know how to fix it but the fix interacts with the current semantics of encode and I'd like to get some clarity here.

@smarter smarter changed the title Dotty's name encoding scheme does not match Scala 2 in some situation Dotty's name encoding scheme does not match Scala 2 in some situations Feb 16, 2019
@smarter
Copy link
Member Author

smarter commented Feb 16, 2019

The commit where the current encoding scheme was implemented: debdafa#diff-e19ba84e185a59cba334b987a1cadc6a

@smarter
Copy link
Member Author

smarter commented Feb 16, 2019

Subsidiary question: why does the new scheme have both avoidIllegalChars and encode, instead of doing all these things in encode ? avoidIllegalChars does its mangling early, which forced @nicolasstucki to introduce a decodeIllegalChars recently to avoid having the name printer output mangled names. Furthermore, at least / is both a character that avoidIllegalChars replaces by u$002F and an operator name that encode replaces by $div, resulting in a different encoding depending on whether an identifier is backquoted or not 😱

class A {
  def a_/(): Unit = {} // encoded as: public void a_$div()
  def `b_/`(): Unit = {} // encoded as: public void b_$u002F()
}

@odersky
Copy link
Contributor

odersky commented Feb 18, 2019

@smarter The reason to do it this way is that we cannot simply translate all operator occurrences, since they might be unintentional, i.e. some internal name might end up containing a nested $eq which does not come from an =. Say I write in Scala-2

def foo$equals

With blind encode/decode this would be mapped to foo=uals, which is definitely not what we want.
avoidIllegalChars could probably be rolled into encode. It existed before the change to semantic names.

@odersky odersky assigned smarter and unassigned odersky Feb 18, 2019
@smarter
Copy link
Member Author

smarter commented Feb 18, 2019

Could we simplify the problem by simply disallowing $ in user-written Scala identifiers ? (We could keep accessible behind a compiler flag for people who really need it for some reason).

@sjrd
Copy link
Member

sjrd commented Feb 18, 2019

Please don't. $ is regularly used in JavaScript identifiers in libraries. Disallowing $ in Scala will worsen the experience of interacting with those libraries.

(also $ is even valid in user-written Java identifiers, IIRC)

@smarter
Copy link
Member Author

smarter commented Feb 18, 2019

Maybe we should use some other character than $ in our encodings then :).

smarter added a commit to dotty-staging/dotty that referenced this issue Feb 18, 2019
This reverts commit 0abd076.

Until scala#5936 is fixed, this is the most conservative option to get:
> dotty-semanticdb/test:compile
to work again.
@nafg
Copy link

nafg commented Feb 18, 2019 via email

@smarter
Copy link
Member Author

smarter commented Feb 18, 2019

Name encoding in Dotty is already much more principled than in Scala 2. We could replace $ by a Unicode character but that would make Java interop impossible in some cases, e.g. an especially common pattern is writing a Java static class that forwards to a Scala object using the MODULE$ field.

@sjrd
Copy link
Member

sjrd commented Feb 19, 2019

We could use instead of $. It's a valid character in Java identifiers.

🤫

@smarter
Copy link
Member Author

smarter commented Feb 20, 2019

I like that actually, since it's not present on non-european keyboards it's harder to misuse, let's abandon the dollar peg and join the Eurozone!

smarter added a commit to dotty-staging/dotty that referenced this issue Jul 21, 2019
Name encoding does not kick in automatically for operator symbols which
appear in a prefix of a name, so the name `<<<_of_A_given` was kept as
is, which is problematic since `<` is not valid in method names on the
JVM (in `i6902.scala`, `<<<<` runs fine even before this change, because
monomorphic givens are implemented as vals, not defs).

Having to remember to call `avoidIllegalChars` isn't great, but fixing
that should be part of a bigger refactoring of name mangling we still
need to do: scala#5936.
smarter added a commit to dotty-staging/dotty that referenced this issue Jul 21, 2019
Name encoding does not kick in automatically for operator symbols which
appear in a prefix of a name, so the name `<<<_of_A_given` was kept as
is, which is problematic since `<` is not valid in method names on the
JVM (in `i6902.scala`, `<<<<` runs fine even before this change, because
monomorphic givens are implemented as objects, not defs).

Having to remember to call `avoidIllegalChars` isn't great, but fixing
that should be part of a bigger refactoring of name mangling we still
need to do: scala#5936.
smarter added a commit to dotty-staging/dotty that referenced this issue Oct 23, 2019
This reverts commit 5a74718 which
itself reverted 0abd076.

Since scala#5936 is not fixed yet, we need to turn off one test in semanticdb
which uses parens in a class name.
smarter added a commit to dotty-staging/dotty that referenced this issue Oct 23, 2019
This reverts commit 5a74718 which
itself reverted 0abd076.

Since scala#5936 is not fixed yet, so we need to turn off one test in
semanticdb which uses parens in a class name. This is problematic, but
upgrading to ASM 7.0 still seems worth it since it allows running scalac
and dotty in the same classpath which should make it easier to test the
scalac tasty reader.
smarter added a commit to dotty-staging/dotty that referenced this issue Oct 23, 2019
This reverts commit 5a74718 which
itself reverted 0abd076.

Since scala#5936 is not fixed yet, so we need to turn off one test in
semanticdb which uses parens in a class name. This is problematic, but
upgrading to ASM 7.0 still seems worth it since it allows running scalac
and dotty in the same classpath which should make it easier to test the
scalac tasty reader.
smarter added a commit to dotty-staging/dotty that referenced this issue Nov 21, 2019
There were several issues with the scheme we used so far:
- It's different from Scala 2, meaning that some Scala 2 methods could
not be called from Dotty and vice-versa (see the added
sbt-dotty/sbt-test/scala2-compat/akka/i3100.scala test for an example)
- It can lead to invalid filenames on Windows (scala#7492)
- The handling of backticks is broken: adding or removing backticks
around a name changes how it's encoded.

To maintain Scala 2 compat we don't have a lot of choices here, we must
use the same scheme, so this commit aligns NameTransformer.scala with
https://github.com/scala/scala/blob/2.13.x/src/library/scala/reflect/NameTransformer.scala

Fixes scala#3100. Fixes scala#5936. Fixes scala#7492.
smarter added a commit to dotty-staging/dotty that referenced this issue Nov 21, 2019
There were several issues with the scheme we used so far:
- It's different from Scala 2, meaning that some Scala 2 methods could
not be called from Dotty and vice-versa (see the added
sbt-dotty/sbt-test/scala2-compat/akka/i3100.scala test for an example)
- It can lead to invalid filenames on Windows (scala#7492)
- The handling of backticks is broken: adding or removing backticks
around a name changes how it's encoded.

To maintain Scala 2 compat we don't have a lot of choices here, we must
use the same scheme, so this commit aligns NameTransformer.scala with
https://github.com/scala/scala/blob/2.13.x/src/library/scala/reflect/NameTransformer.scala

Fixes scala#3100. Fixes scala#5936. Fixes scala#7492.
smarter added a commit to dotty-staging/dotty that referenced this issue Nov 21, 2019
There were several issues with the scheme we used so far:
- It's different from Scala 2, meaning that some Scala 2 methods could
not be called from Dotty and vice-versa (see the added
sbt-dotty/sbt-test/scala2-compat/akka/i3100.scala test for an example)
- It can lead to invalid filenames on Windows (scala#7492)
- The handling of backticks is broken: adding or removing backticks
around a name changes how it's encoded.

To maintain Scala 2 compat we don't have a lot of choices here, we must
use the same scheme, so this commit aligns NameTransformer.scala with
https://github.com/scala/scala/blob/2.13.x/src/library/scala/reflect/NameTransformer.scala

Fixes scala#3100. Fixes scala#5936. Fixes scala#7492.
smarter added a commit to dotty-staging/dotty that referenced this issue Nov 25, 2019
There were several issues with the scheme we used so far:
- It's different from Scala 2, meaning that some Scala 2 methods could
not be called from Dotty and vice-versa (see the added
sbt-dotty/sbt-test/scala2-compat/akka/i3100.scala test for an example)
- It can lead to invalid filenames on Windows (scala#7492)
- The handling of backticks is broken: adding or removing backticks
around a name changes how it's encoded.

To maintain Scala 2 compat we don't have a lot of choices here, we must
use the same scheme, so this commit aligns NameTransformer.scala with
https://github.com/scala/scala/blob/2.13.x/src/library/scala/reflect/NameTransformer.scala

Some examples:

Method name | Old encoding | New encoding
-----------------------------------------
a_+         |      a_$plus |      a_$plus
`a_+`       |          a_+ |      a_$plus
`+_a`       |          +_a |      $plus_a
a_/         |       a_$div |       a_$div
`a_/`       |     a_$u002F |       a_$div

If a Dotty method is called `def a_$plus` we won't misinterpret it as
`a_+` because the method name comes from the tasty tree which stores
unencoded names. On the other hand, names coming from Java / Scala 2 as
well as top-level classnames might be misinterpreted as encoded names if
they contain a user-written $, this is left unspecified.

Fixes scala#3100. Fixes scala#5936. Fixes scala#7492.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants