Dotty's name encoding scheme does not match Scala 2 in some situations #5936

smarter · 2019-02-16T22:13:36Z

Given:

class A {
  def a_+(): Unit = {}
  def `+_a`(): Unit = {}
}

Dotty generates methods:

   public void a_$plus()
   public void +_a()

Whereas Scala 2 generates:

  public void a_$plus()
  public void $plus_a()
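
For reference, the Scala 2 behaviour can be reproduced without compiling anything by calling `scala.reflect.NameTransformer.encode` from the Scala 2 standard library. This is a minimal sketch; the object name `Scala2EncodingDemo` is only illustrative, and it exercises the library function, not Dotty's own `NameTransformer`.

```scala
import scala.reflect.NameTransformer

object Scala2EncodingDemo {
  // Names as written in class A above, without backquotes.
  val names: List[String] = List("a_+", "+_a")

  // Encode each name with the Scala 2 standard library's transformer.
  val encoded: List[String] = names.map(NameTransformer.encode)

  def main(args: Array[String]): Unit =
    names.zip(encoded).foreach { case (n, e) => println(s"$n -> $e") }
}
```

Both names come out with `$plus` in place of `+`, regardless of where the operator character sits in the name.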

I was about to fix this but the comment above NameTransformer#encode written by @odersky suggests that this is intentional: https://github.com/lampepfl/dotty/blob/12eef6d8e51dcd8a9a73387659b0f860d7b9ca80/compiler/src/dotty/tools/dotc/util/NameTransformer.scala#L87-L88

@odersky Is being different from Scala 2 actually intended here? This makes it impossible to call some Scala 2 methods from Dotty (we may want to change the name encoding scheme, but only if we get Scala 2 to change it in lockstep with us).

Marked as a blocker for the release because upgrading to ASM 7 led to a name-encoding-related issue: #5917 (comment). I know how to fix it, but the fix interacts with the current semantics of encode and I'd like to get some clarity here.

smarter · 2019-02-16T22:15:04Z

The commit where the current encoding scheme was implemented: debdafa#diff-e19ba84e185a59cba334b987a1cadc6a

smarter · 2019-02-16T22:32:38Z

Subsidiary question: why does the new scheme have both avoidIllegalChars and encode, instead of doing all these things in encode? avoidIllegalChars does its mangling early, which forced @nicolasstucki to introduce a decodeIllegalChars recently to avoid having the name printer output mangled names. Furthermore, at least / is both a character that avoidIllegalChars replaces by $u002F and an operator name that encode replaces by $div, resulting in a different encoding depending on whether an identifier is backquoted or not 😱

class A {
  def a_/(): Unit = {} // encoded as: public void a_$div()
  def `b_/`(): Unit = {} // encoded as: public void b_$u002F()
}
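
One way to check what a given compiler actually emits for these two methods is to list the JVM method names through reflection. A minimal sketch, with `EmittedNames` as a hypothetical helper name; the output depends on the compiler used to build it, so none is shown here.

```scala
class A {
  def a_/(): Unit = {}
  def `b_/`(): Unit = {}
}

object EmittedNames {
  // Names of the methods declared directly on `cls`, as seen by the JVM.
  def of(cls: Class[_]): List[String] =
    cls.getDeclaredMethods.toList.map(_.getName).sorted

  def main(args: Array[String]): Unit =
    of(classOf[A]).foreach(println)
}
```

With a compiler that follows the Scala 2 scheme, both names should end in `$div`; a compiler with the behaviour described in this comment would print a different name for the backquoted method.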

odersky · 2019-02-18T10:14:22Z

@smarter The reason to do it this way is that we cannot simply translate all operator occurrences, since they might be unintentional, i.e. some internal name might end up containing a nested $eq which does not come from an =. Say I write in Scala 2:

def foo$equals

With blind encode/decode this would be mapped to foo=uals, which is definitely not what we want.
avoidIllegalChars could probably be rolled into encode. It existed before the change to semantic names.
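
The `foo$equals` case can be reproduced with the Scala 2 standard library's `scala.reflect.NameTransformer`. This is a small sketch of the blind encode/decode round trip; `DecodeAmbiguity` is an illustrative name, and the code says nothing about how Dotty's semantic names avoid the problem.

```scala
import scala.reflect.NameTransformer

object DecodeAmbiguity {
  // A name the user wrote literally, containing '$' but no operator.
  val written = "foo$equals"

  // '$' is a legal Java identifier character, so encoding leaves it alone.
  val encoded: String = NameTransformer.encode(written)

  // Decoding sees "$eq" and turns it into '=', corrupting the name.
  val decoded: String = NameTransformer.decode(encoded)

  def main(args: Array[String]): Unit =
    println(s"$written -> $encoded -> $decoded")
}
```

The round trip is not the identity here: the name goes in as `foo$equals` and comes back as `foo=uals`.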

smarter · 2019-02-18T10:20:47Z

Could we simplify the problem by disallowing $ in user-written Scala identifiers? (We could keep it accessible behind a compiler flag for people who really need it for some reason.)

sjrd · 2019-02-18T10:24:56Z

Please don't. $ is regularly used in JavaScript identifiers in libraries. Disallowing $ in Scala will worsen the experience of interacting with those libraries.

(also $ is even valid in user-written Java identifiers, IIRC)

smarter · 2019-02-18T10:27:08Z

Maybe we should use some other character than $ in our encodings then :).

nafg · 2019-02-18T19:49:33Z

paulp had a lot to say on this subject... I believe he said that a lot of bugs in scalac could have been prevented by having a more principled set of encoding rules. I think there are a lot of Unicode characters available that can be used, and wouldn't be used by users (can be outlawed). And using more characters means the encodings can be more robust since you don't need to overload meaning.

smarter · 2019-02-18T20:02:11Z

Name encoding in Dotty is already much more principled than in Scala 2. We could replace $ by a Unicode character but that would make Java interop impossible in some cases, e.g. an especially common pattern is writing a Java static class that forwards to a Scala object using the MODULE$ field.
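
To make the `MODULE$` dependency concrete, the sketch below reads that field through reflection, which is roughly what a hand-written Java forwarder relies on when it writes `Greeter$.MODULE$`. `Greeter` and `ModuleFieldDemo` are hypothetical names, and the sketch assumes `Greeter` is compiled as a top-level object.

```scala
object Greeter {
  def hello: String = "hello"
}

object ModuleFieldDemo {
  // The JVM class backing a top-level object is named with a trailing '$'.
  val moduleClass: Class[_] = Greeter.getClass

  // The singleton instance is stored in the static field MODULE$.
  val instance: AnyRef = moduleClass.getField("MODULE$").get(null)

  def main(args: Array[String]): Unit = {
    println(moduleClass.getName)
    println(instance eq Greeter)
  }
}
```

Both the class name and the field name contain `$`, so replacing `$` in the encoding would change names that existing Java code refers to by hand.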

sjrd · 2019-02-19T08:07:39Z

We could use € instead of $. It's a valid character in Java identifiers.

🤫
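
The claims about `$` and `€` in Java identifiers can be checked with `java.lang.Character.isJavaIdentifierStart`. A minimal sketch; `IdentifierChars` is an illustrative name, and `€` is written as the escape `\u20AC` so the snippet does not depend on source file encoding.

```scala
object IdentifierChars {
  // Whether Java's identifier rules accept `c` as a first character.
  def canStart(c: Char): Boolean = Character.isJavaIdentifierStart(c)

  def main(args: Array[String]): Unit =
    // '$', the euro sign, and an operator character for contrast.
    List('$', '\u20AC', '+').foreach(c => println(s"$c: ${canStart(c)}"))
}
```

Currency symbols count as identifier start characters, so both `$` and `€` are accepted, while `+` is not.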

smarter · 2019-02-20T12:57:21Z

I like that actually: since it's not present on non-European keyboards, it's harder to misuse. Let's abandon the dollar peg and join the Eurozone!

Name encoding does not kick in automatically for operator symbols which appear in a prefix of a name, so the name `<<<_of_A_given` was kept as is, which is problematic since `<` is not valid in method names on the JVM (in `i6902.scala`, `<<<<` runs fine even before this change, because monomorphic givens are implemented as objects, not defs). Having to remember to call `avoidIllegalChars` isn't great, but fixing that should be part of a bigger refactoring of name mangling we still need to do: scala#5936.
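
As an illustration of why `<<<_of_A_given` is a problem, the sketch below checks a name against the characters the JVM specification forbids in method names, then encodes it. `JvmMethodNames.isLegal` is a hypothetical helper, and the encoding step uses the Scala 2 library's `NameTransformer.encode` as a stand-in, not Dotty's `avoidIllegalChars`.

```scala
import scala.reflect.NameTransformer

object JvmMethodNames {
  // Characters the JVM spec forbids in ordinary method names:
  // '.', ';', '[', '/' in any unqualified name, plus '<' and '>' for methods.
  private val forbidden: Set[Char] = Set('.', ';', '[', '/', '<', '>')

  // Hypothetical helper: ignores the special names <init> and <clinit>.
  def isLegal(name: String): Boolean =
    name.nonEmpty && !name.exists(forbidden)

  def main(args: Array[String]): Unit = {
    val raw = "<<<_of_A_given"
    val encoded = NameTransformer.encode(raw)
    println(s"$raw legal: ${isLegal(raw)}")
    println(s"$encoded legal: ${isLegal(encoded)}")
  }
}
```

The raw name fails the check because of `<`, while the encoded form replaces each `<` with `$less` and passes.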

This reverts commit 5a74718 which itself reverted 0abd076. Since scala#5936 is not fixed yet, we need to turn off one test in semanticdb which uses parens in a class name. This is problematic, but upgrading to ASM 7.0 still seems worth it since it allows running scalac and dotty in the same classpath, which should make it easier to test the scalac tasty reader.

There were several issues with the scheme we used so far:

- It's different from Scala 2, meaning that some Scala 2 methods could not be called from Dotty and vice-versa (see the added sbt-dotty/sbt-test/scala2-compat/akka/i3100.scala test for an example)
- It can lead to invalid filenames on Windows (scala#7492)
- The handling of backticks is broken: adding or removing backticks around a name changes how it's encoded.

To maintain Scala 2 compat we don't have a lot of choices here, we must use the same scheme, so this commit aligns NameTransformer.scala with https://github.com/scala/scala/blob/2.13.x/src/library/scala/reflect/NameTransformer.scala

Some examples:

Method name | Old encoding | New encoding
-----------------------------------------
a_+         | a_$plus      | a_$plus
`a_+`       | a_+          | a_$plus
`+_a`       | +_a          | $plus_a
a_/         | a_$div       | a_$div
`a_/`       | a_$u002F     | a_$div

If a Dotty method is called `def a_$plus` we won't misinterpret it as `a_+` because the method name comes from the tasty tree which stores unencoded names. On the other hand, names coming from Java / Scala 2 as well as top-level classnames might be misinterpreted as encoded names if they contain a user-written $, this is left unspecified.

Fixes scala#3100. Fixes scala#5936. Fixes scala#7492.
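
The `/` rows of the "New encoding" column, and the fallback for characters outside the operator table, can be checked against the Scala 2 library that this commit aligns with. A minimal sketch; `NewEncodingCheck` is an illustrative name, and the parenthesis case is an added example, not a row from the commit message.

```scala
import scala.reflect.NameTransformer

object NewEncodingCheck {
  // '/' is in the operator table, so it becomes $div with or without backquotes
  // (backquotes are source syntax and never part of the name itself).
  val slash: String = NameTransformer.encode("a_/")

  // '(' is not in the operator table and is not a Java identifier character,
  // so it falls back to a $uXXXX escape.
  val paren: String = NameTransformer.encode("a_(")

  def main(args: Array[String]): Unit = {
    println(slash)
    println(paren)
  }
}
```

Because the transformer only ever sees the name string, there is a single encoding per name and the backquote-dependent `$u002F` form no longer appears for `/`.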

smarter added itype:bug compat:scala2 prio:blocker labels Feb 16, 2019

smarter changed the title ~~Dotty's name encoding scheme does not match Scala 2 in some situation~~ Dotty's name encoding scheme does not match Scala 2 in some situations Feb 16, 2019

smarter assigned odersky Feb 16, 2019

odersky assigned smarter and unassigned odersky Feb 18, 2019

smarter added a commit to dotty-staging/dotty that referenced this issue Feb 18, 2019

Revert "Upgrade to ASM 7.0"

5a74718

This reverts commit 0abd076. Until scala#5936 is fixed, this is the most conservative option to get: > dotty-semanticdb/test:compile to work again.

SethTisue mentioned this issue Feb 19, 2019

Name mangling has outstripped the abilities of lonesome '$' scala/bug#2806

Open

smarter mentioned this issue May 17, 2019

Add infrastructure to run the JUnit tests of upstream Scala.js. #6365

Merged

smarter mentioned this issue Jul 21, 2019

Fix #6902: Avoid illegal characters for generated given names #6903

Merged

smarter mentioned this issue Oct 23, 2019

Revert "Revert "Upgrade to ASM 7.0"" #7448

Merged

smarter mentioned this issue Nov 4, 2019

Invalid file name for class files on Windows #7492

Closed

smarter mentioned this issue Nov 21, 2019

Revert to the Scala 2 name encoding scheme #7601

Merged

smarter closed this as completed in #7601 Nov 25, 2019
