Skip to content

Startup regression due to HotSpot JVM's population of vtables with default methods #243

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
retronym opened this issue Oct 12, 2016 · 14 comments

Comments

@retronym
Copy link
Member

retronym commented Oct 12, 2016

Running scalac -version in 2.12.0-SNAPSHOT takes 0.9s, a significant regression from 2.11.8, which took 0.2s.

% time scalac -debug -version

/Library/Java/JavaVirtualMachines/jbsdk8u112b403_osx_x64/Contents/Home/bin/java
-Xmx256M
-Xms32M
-Xbootclasspath/a:/Users/jason/scala/2.11.8/lib/akka-actor_2.11-2.3.10.jar:/Users/jason/scala/2.11.8/lib/config-1.2.1.jar:/Users/jason/scala/2.11.8/lib/jline-2.12.1.jar:/Users/jason/scala/2.11.8/lib/scala-actors-2.11.0.jar:/Users/jason/scala/2.11.8/lib/scala-actors-migration_2.11-1.1.0.jar:/Users/jason/scala/2.11.8/lib/scala-compiler.jar:/Users/jason/scala/2.11.8/lib/scala-continuations-library_2.11-1.0.2.jar:/Users/jason/scala/2.11.8/lib/scala-continuations-plugin_2.11.8-1.0.2.jar:/Users/jason/scala/2.11.8/lib/scala-library.jar:/Users/jason/scala/2.11.8/lib/scala-parser-combinators_2.11-1.0.4.jar:/Users/jason/scala/2.11.8/lib/scala-reflect.jar:/Users/jason/scala/2.11.8/lib/scala-swing_2.11-1.0.2.jar:/Users/jason/scala/2.11.8/lib/scala-xml_2.11-1.0.4.jar:/Users/jason/scala/2.11.8/lib/scalap-2.11.8.jar
-classpath
""
-Dscala.home=/Users/jason/scala/2.11.8
-Dscala.usejavacp=true
-Denv.emacs=
scala.tools.nsc.Main
-version

Scala compiler version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL

real    0m0.219s
user    0m0.213s
sys 0m0.047s

% time qscalac -debug -version

/Library/Java/JavaVirtualMachines/jbsdk8u112b403_osx_x64/Contents/Home/bin/java
-Xmx256M
-Xms32M
-Xbootclasspath/a:/Users/jason/code/scala/build/quick/classes/compiler:/Users/jason/code/scala/build/quick/classes/library:/Users/jason/code/scala/build/quick/classes/reflect:/Users/jason/.ivy2/cache/org.apache.ant/ant/jars/ant-1.9.4.jar:/Users/jason/.ivy2/cache/org.apache.ant/ant-launcher/jars/ant-launcher-1.9.4.jar:/Users/jason/.ivy2/cache/org.scala-lang.modules/scala-asm/bundles/scala-asm-5.1.0-scala-1.jar:/Users/jason/.ivy2/cache/org.scala-lang.modules/scala-xml_2.12.0-RC1/bundles/scala-xml_2.12.0-RC1-1.0.5.jar:/Users/jason/.ivy2/cache/jline/jline/jars/jline-2.14.1.jar
-classpath
""
-Dscala.home=/Users/jason/code/scala/build/quick
-Dscala.usejavacp=true
-Denv.emacs=
scala.tools.nsc.Main
-version

Scala compiler version 2.12.0-20161011-181858-1e81a09 -- Copyright 2002-2016, LAMP/EPFL and Lightbend, Inc.

real    0m0.897s
user    0m0.923s
sys 0m0.088s

Profiling this suggests that the dominant factor is resolution of default methods, which HotSpot JVM performs eagerly during classloading. I believe that collections present the biggest challenge, which suggests that this regression will be felt by all Scala programs, not just the compiler.

This might be inherent complexity required to implement method resolution, in particular:

https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-5.html#jvms-5.4.3.3

A maximally-specific superinterface method of a class or interface C for a particular method name and descriptor is any method for which all of the following are true:

The method is declared in a superinterface (direct or indirect) of C.

The method is declared with the specified name and descriptor.

The method has neither its ACC_PRIVATE flag nor its ACC_STATIC flag set.

Where the method is declared in interface I, there exists no other maximally-specific superinterface method of C with the specified name and descriptor that is declared in a subinterface of I.

For completeness, here's the corresponding spec in the Java language:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.12.2.5

The implementation in OpenJDK is in defaultmethods.cpp.

I have ported this to Scala to try to understand the algorithm and potential optimizations (either in our code generation or in OpenJDK itself).

I have also synthesized a pure Java test case that demonstrates exponential performance, both in javac and java.

@retronym
Copy link
Member Author

/cc @DarkDimius @smarter

@retronym
Copy link
Member Author

retronym commented Oct 12, 2016

A workaround is to add scala-library.jar to the Class Data Sharing dump.

  • Create a copy of the JDK
  • Edit ./jre/lib/classlist
    • Remove the checksum comment from the end of the file
    • Add the output of jar -tf $SCALA_HOME/lib/scala-library.jar | grep .class | sed 's/.class$//' to the file
  • (re)dump the CDS archive: JAVA_HOME=/custom/jdk $SCALA_HOME/bin/scala -J-Xshare:dump -J-XX:SharedReadWriteSize=33554432 -J-Xshare:dump
  • Subsequent invocations of JAVA_HOME=/custom/jdk $SCALA_HOME/bin/scala -J-Xshare:on will run faster.

This is pretty laborious.

There are JVM command line options that could be used to tweak the process:

  • -XX:ExtraSharedClassListFile= could be used rather than editing classlist directly.
  • -XX:SharedArchiveFile= would let us write the archive somewhere other than $JAVA_HOME/jre/server/classes.jsa, perhaps letting us do this without a copy of the JDK. But this flag is disabled in the release builds of the JDK.

@adriaanm
Copy link
Contributor

adriaanm commented Oct 13, 2016

I have ported this to Scala to try to understand the algorithm and potential optimizations (either in our code generation or in OpenJDK itself).

❤️

@lrytz
Copy link
Member

lrytz commented Oct 13, 2016

I reproduced this with your jmh compiler benchmark (with source=@/.../version containing -version). The mixin forwarders (added in scala/scala#5429) make a difference:

  • 2.11.8: 703.979 ± 41.798
  • 2.12.x without forwarders: 1845.874 ± 40.393
  • 2.12.x with forwarders: 1140.418 ± 42.881

I also spent some time yesterday reading through the JVM's classfile parser, but i'm not through yet. So the following might be incorrect.

IUC, a "miranda method" is a method that a class inherits from an interface without having an override in the class or any of its superclasses. This includes non-overridden default methods. Miranda methods get a slot in the class's vtable. generate_default_methods will add a method stub to every miranda slot, the stub is called "overpass method". The overpass method

  • throws an AbstractMethodError if no implementation can be found (there's no default method implementation)
  • invokes the correct default method if there's exactly one maximally specific
  • throws an IncompatibleClassChangeError if there are multiple competing default methods

IUC, according to your observations, the performance bottleneck is finding the maximally specific method. If Scalac emits all mixin forwarders this does not need to be done: there are no miranda methods added for default methods, because there's always a concrete overriding method in the class (the forwarder).

Let me know what I got wrong :)

@retronym
Copy link
Member Author

Interesting. I didn't see a difference with/without forwarders, but maybe I was measuring the wrong revisions. I was only using time ... from the command line to measure, which wasn't as rigorous as using JMH.

As I see it, the presence of our mixin forwarders (or generally, a matching concrete method in the superclass chain) won't prevent the traversal of all paths through the supertype lattice, but it will prevent a later iteration through all the candidate methods.

I can't see any reason not to short circuit the hierarchy walk when a matching class method. I'm trying this out in OpenJDK.

In summary, I think our decision to use forwarders will not only have a benefit for classfie parsing with current JDKs, but might get closer to zero-cost if we can patch the JDK.

@retronym
Copy link
Member Author

retronym commented Oct 13, 2016

UPDATE: OpenJDK accepted this bug as JDK-8167995 and fixed it for JDK 9


I'm reporting another bug with the diagnostic code in OpenJDK. Noting it here for posterity:

Enabling diagnostic logging of default method resolution in the triggers a VM crash during a buffer resize. This resize is triggered by using a method with a large descriptor.

A code comment in the implementation warns:

  // This resource mark is the bound for all memory allocation that takes
  // place during default method processing.  After this goes out of scope,
  // all (Resource) objects' memory will be reclaimed.  Be careful if adding an
  // embedded resource mark under here as that memory can't be used outside
  // whatever scope it's in.
  ResourceMark rm(THREAD);

But down below, print_slot and print_method define a nested ResourceMark.

package test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test;

public class Test {
  interface I {
    default void f(Test a, Test b, Test c, Test e, Test f, Test g, Test h, Test i) {}
  }
  static class C implements I {}
  public static void main(String[] args) {
    new C();
  }
}
% export JAVA_HOME=/code/openjdk9dev/build/macosx-x86_64-normal-server-fastdebug/jdk
% $JAVA_HOME/bin/javac -d /tmp /tmp/Test.java
% $JAVA_HOME/bin/java -cp /tmp -Xlog:defaultmethods=debug test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.Test
...
(/Users/jason/code/openjdk9dev/hotspot/src/share/vm/utilities/ostream.cpp:340), pid=54634, tid=7427
#  assert(rm == __null || Thread::current()->current_resource_mark() == rm) failed: StringStream is re-allocated with a different ResourceMark. Current: 0x00007000053a76a0 original: 0x00007000053a78e8
diff -r fec31089c2ef src/share/vm/classfile/defaultMethods.cpp
--- a/src/share/vm/classfile/defaultMethods.cpp Thu Oct 06 18:05:53 2016 -0700
+++ b/src/share/vm/classfile/defaultMethods.cpp Fri Oct 14 10:09:56 2016 +1100
@@ -77,12 +77,10 @@
 };

 static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
-  ResourceMark rm;
   str->print("%s%s", name->as_C_string(), signature->as_C_string());
 }

 static void print_method(outputStream* str, Method* mo, bool with_class=true) {
-  ResourceMark rm;
   if (with_class) {
     str->print("%s.", mo->klass_name()->as_C_string());
   }

@retronym
Copy link
Member Author

retronym commented Oct 14, 2016

I've reproduced your results showing that performance for 2.11.8 > 2.12 with forwarders > 2.12 without forwarders. Previously, I'd failed to bootstrap to see the effect addition of forwarders.

Adding forwarders reduces the number of "Looking for default methods for" log entries from 27581 to 15687 running scalac -version -J-Xlog:defaultmethods=debug.

Here are the remaining entries: https://gist.github.com/retronym/d223044d719a13350da7715eb5b5a14b

Looking at these in context, I see that interfaces are also subject to default method processing. For instance, here's the log for GenIterableLike

@retronym
Copy link
Member Author

Looking at XCode profiler again, this time using the proposed 2.12.0-RC2 ("with forwarders"), scalac -version spends 32% of the time in DefaultMethods::generate_default_methods and its callees.

I'm a newbie with the tool, and couldn't find a direct way to measure this, other than by "pruning" that method from the trace and seeing that the reported sample time went from 630ms down to 446ms.

I don't see any way forward to eat into that number (other than the extreme measure of Class Data Sharing outlined above). Scala collections are probably a worst case scenario (lots of methods and lots of hierarchy).

@lrytz
Copy link
Member

lrytz commented Oct 14, 2016

Very interesting observation that defaults are also resolved for interfaces, and that it's still such a significant chunk of startup time.

@retronym
Copy link
Member Author

There has been some progress on this in OpenJDK under https://bugs.openjdk.java.net/browse/JDK-8167431, although it appears that so far the only fix is to the performance of javac, not to default method resolution in the JVM itself.

@smarter
Copy link
Member

smarter commented Nov 14, 2016

it appears that so far the only fix is to the performance of javac, not to default method resolution in the JVM itself.

And the bug was closed, so the original issue you reported may have been forgotten.

@retronym
Copy link
Member Author

I've emailed contact from Oracle to clarify.

@SethTisue
Copy link
Member

fwiw, for time scalac -version I'm seeing 2.11 0.3s, 2.12 1.0s, 2.13 0.7s

@SethTisue
Copy link
Member

and time dotc -version 1.1s

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants