Startup regression due to HotSpot JVM's population of vtables with default methods #243
Comments
/cc @DarkDimius @smarter |
A workaround is to add scala-library.jar to the Class Data Sharing dump.
This is pretty laborious. There are JVM command line options that could be used to tweak the process:
|
I reproduced this with your jmh compiler benchmark (with
I also spent some time yesterday reading through the JVM's classfile parser, but I'm not through yet, so the following might be incorrect. IIUC, a "miranda method" is a method that a class inherits from an interface without having an override in the class or any of its superclasses. This includes non-overridden default methods. Miranda methods get a slot in the class's vtable.
IIUC, according to your observations, the performance bottleneck is finding the maximally specific method. If Scalac emits all mixin forwarders, this does not need to be done: no miranda methods are added for default methods, because there's always a concrete overriding method in the class (the forwarder). Let me know what I got wrong :) |
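To make the distinction above concrete, here is a minimal Java sketch. The names `MirandaDemo`, `Greeter`, `Plain`, and `WithForwarder` are hypothetical and not from the Scala or OpenJDK code bases. `WithForwarder` is a hand-written stand-in for what a mixin forwarder looks like from the JVM's point of view: a concrete method in the class that delegates to the interface default. The sketch only shows, via reflection, which class declares the method itself. It does not measure or prove anything about vtable layout.

```java
import java.lang.reflect.Method;

public class MirandaDemo {
    interface Greeter {
        default String greet() { return "hello"; }
    }

    // No override in the class or its superclasses: greet() is inherited
    // only from the interface (the "miranda" case described above).
    static class Plain implements Greeter {}

    // Hypothetical stand-in for a mixin forwarder: a concrete method in the
    // class that simply delegates to the interface default.
    static class WithForwarder implements Greeter {
        @Override public String greet() { return Greeter.super.greet(); }
    }

    // True if `cls` itself declares a method with this name (inherited methods are ignored).
    static boolean declares(Class<?> cls, String name) {
        for (Method m : cls.getDeclaredMethods()) {
            if (m.getName().equals(name)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("Plain declares greet: " + declares(Plain.class, "greet"));
        System.out.println("WithForwarder declares greet: " + declares(WithForwarder.class, "greet"));
    }
}
```

Both classes behave identically to callers. The difference is only in whether the class file carries a concrete method that the JVM can find without consulting the interface hierarchy.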
Interesting. I didn't see a difference with/without forwarders, but maybe I was measuring the wrong revisions. I was only using

As I see it, the presence of our mixin forwarders (or generally, a matching concrete method in the superclass chain) won't prevent the traversal of all paths through the supertype lattice, but it will prevent a later iteration through all the candidate methods. I can't see any reason not to short-circuit the hierarchy walk when a matching class method is found. I'm trying this out in OpenJDK.

In summary, I think our decision to use forwarders will not only benefit classfile parsing with current JDKs, but might get closer to zero-cost if we can patch the JDK. |
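As a rough illustration of why walking all paths through a supertype lattice is costly, the following toy model counts node visits in a stack of diamond-shaped hierarchies. It is a sketch under stated assumptions, not HotSpot's algorithm: `LatticeWalk`, `diamondStack`, `naiveVisits`, and `dedupedVisits` are hypothetical names, and the type graph is a plain map from a type name to its direct supertypes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LatticeWalk {
    // Builds a stack of `depth` diamonds: T(i) extends A(i) and B(i),
    // and both A(i) and B(i) extend T(i-1). T0 has no supertypes.
    static Map<String, List<String>> diamondStack(int depth) {
        Map<String, List<String>> supers = new HashMap<>();
        supers.put("T0", Collections.emptyList());
        for (int i = 1; i <= depth; i++) {
            String below = "T" + (i - 1);
            supers.put("A" + i, Collections.singletonList(below));
            supers.put("B" + i, Collections.singletonList(below));
            supers.put("T" + i, Arrays.asList("A" + i, "B" + i));
        }
        return supers;
    }

    // Walks every path: a shared supertype is visited once per path that reaches it.
    static long naiveVisits(Map<String, List<String>> supers, String type) {
        long visits = 1;
        for (String s : supers.get(type)) visits += naiveVisits(supers, s);
        return visits;
    }

    // Visits each distinct type once by remembering what was already seen.
    static long dedupedVisits(Map<String, List<String>> supers, String type, Set<String> seen) {
        if (!seen.add(type)) return 0;
        long visits = 1;
        for (String s : supers.get(type)) visits += dedupedVisits(supers, s, seen);
        return visits;
    }

    public static void main(String[] args) {
        for (int depth : new int[] {1, 5, 10, 20}) {
            Map<String, List<String>> supers = diamondStack(depth);
            String top = "T" + depth;
            System.out.println("depth " + depth
                + ": naive=" + naiveVisits(supers, top)
                + " deduped=" + dedupedVisits(supers, top, new HashSet<>()));
        }
    }
}
```

In this model the naive walk roughly doubles with each added diamond, while the number of distinct types grows linearly. Whether a real resolver can skip repeated visits depends on what it must record along each path, which this sketch does not model.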
UPDATE: OpenJDK accepted this bug as JDK-8167995 and fixed it for JDK 9.

I'm reporting another bug with the diagnostic code in OpenJDK. Noting it here for posterity:

Enabling diagnostic logging of default method resolution triggers a VM crash during a buffer resize. This resize is triggered by using a method with a large descriptor. A code comment in the implementation warns:

// This resource mark is the bound for all memory allocation that takes
// place during default method processing. After this goes out of scope,
// all (Resource) objects' memory will be reclaimed. Be careful if adding an
// embedded resource mark under here as that memory can't be used outside
// whatever scope it's in.
ResourceMark rm(THREAD);

But down below,

package test.test.test.test.test.test.test.test.test.test.test.test.test.test.test.test;
public class Test {
    interface I {
        default void f(Test a, Test b, Test c, Test e, Test f, Test g, Test h, Test i) {}
    }
    static class C implements I {}
    public static void main(String[] args) {
        new C();
    }
}

diff -r fec31089c2ef src/share/vm/classfile/defaultMethods.cpp
--- a/src/share/vm/classfile/defaultMethods.cpp Thu Oct 06 18:05:53 2016 -0700
+++ b/src/share/vm/classfile/defaultMethods.cpp Fri Oct 14 10:09:56 2016 +1100
@@ -77,12 +77,10 @@
};
static void print_slot(outputStream* str, Symbol* name, Symbol* signature) {
- ResourceMark rm;
str->print("%s%s", name->as_C_string(), signature->as_C_string());
}
static void print_method(outputStream* str, Method* mo, bool with_class=true) {
- ResourceMark rm;
if (with_class) {
str->print("%s.", mo->klass_name()->as_C_string());
} |
I've reproduced your results showing that performance for 2.11.8 > 2.12 with forwarders > 2.12 without forwarders. Previously, I'd failed to bootstrap, and so missed the effect of adding forwarders.

Adding forwarders reduces the number of "Looking for default methods for" log entries from 27581 to 15687 running

Here are the remaining entries: https://gist.github.com/retronym/d223044d719a13350da7715eb5b5a14b

Looking at these in context, I see that interfaces are also subject to default method processing. For instance, here's the log for |
Looking at XCode profiler again, this time using the proposed 2.12.0-RC2 ("with forwarders"), I'm a newbie with the tool, and couldn't find a direct way to measure this, other than by "pruning" that method from the trace and seeing that the reported sample time went from 630ms down to 446ms. I don't see any way forward to eat into that number (other than the extreme measure of Class Data Sharing outlined above). Scala collections are probably a worst case scenario (lots of methods and lots of hierarchy). |
Very interesting observation that defaults are also resolved for interfaces, and that it's still such a significant chunk of startup time. |
There has been some progress on this in OpenJDK under https://bugs.openjdk.java.net/browse/JDK-8167431, although it appears that so far the only fix is to the performance of javac, not to default method resolution in the JVM itself. |
And the bug was closed, so the original issue you reported may have been forgotten. |
I've emailed a contact at Oracle to clarify. |
fwiw, for |
and |
Running scalac -version in 2.12.0-SNAPSHOT takes 0.9s, a significant regression from 2.11.8, which took 0.2s.

Profiling this suggests that the dominant factor is resolution of default methods, which HotSpot JVM performs eagerly during classloading. I believe that collections present the biggest challenge, which suggests that this regression will be felt by all Scala programs, not just the compiler.
This might be inherent complexity required to implement method resolution, in particular:
https://docs.oracle.com/javase/specs/jvms/se8/html/jvms-5.html#jvms-5.4.3.3
For completeness, here's the corresponding spec in the Java language:
https://docs.oracle.com/javase/specs/jls/se8/html/jls-15.html#jls-15.12.2.5
The implementation in OpenJDK is in defaultMethods.cpp.

I have ported this to Scala to try to understand the algorithm and potential optimizations (either in our code generation or in OpenJDK itself).
I have also synthesized a pure Java test case that demonstrates exponential performance, both in javac and java.