
virtualize unsafe compare and swap calls #636


Merged: 1 commit merged Sep 26, 2018


fwbrasil
Contributor

@fwbrasil fwbrasil commented Aug 24, 2018

Problem

Compare and swap calls using Unsafe are currently always lowered to an atomic operation. That's unnecessary when the CAS mutates a field of a virtual instance, i.e. one that escape analysis has proven never escapes. The atomic operation adds some performance overhead and prevents other optimizations.

Let's take this JMH benchmark class as an example:

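```java
import org.openjdk.jmh.annotations.Benchmark;

// UnsafeUtil is a helper from the benchmark project (not shown in this PR) that
// exposes a sun.misc.Unsafe instance (UnsafeUtil.unsafe) and a field-offset lookup.
public class VirtualCASBench {

  public static final long valueOffset = UnsafeUtil.fieldOffset(TestClass.class, "value");

  private static class TestClass {
    public volatile int value;

    public TestClass(int value) {
      this.value = value;
    }
  }

  // CAS through Unsafe on a freshly allocated object that never escapes.
  @Benchmark
  public boolean testUnsafe() {
    TestClass t = new TestClass(0);
    return UnsafeUtil.unsafe.compareAndSwapInt(t, valueOffset, 0, 1);
  }

  // The same logic written as a synchronized compare-then-set.
  @Benchmark
  public boolean testIfElse() {
    TestClass t = new TestClass(0);
    synchronized (t) {
      if (t.value != 0)
        return false;
      else {
        t.value = 1;
        return true;
      }
    }
  }
}
```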

The if/else version has much better performance than the version using the unsafe compare and swap:

[image: benchmark results for testUnsafe vs. testIfElse]

Before lowering, the if/else version is optimized to a graph that simply returns a constant:

[image: graph of testIfElse before lowering]

That's expected: the instance doesn't escape, so constant folding can determine that the emulated CAS always returns true. The same doesn't happen with the Unsafe version, because the compare and swap node is opaque to constant folding:

[image: graph of testUnsafe before lowering]

Solution

When an object is virtualized, the compare and swap can be replaced by a guard, and the new value can be set through the VirtualizerTool. I've implemented this optimization and the benchmark results are promising:

[image: benchmark results with CAS virtualization]

With the CAS virtualization, the unsafe compare and swap benchmark is also optimized to a constant:

[image: graph of testUnsafe with CAS virtualization]
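A plain-Java model of the semantics may help here. It is a sketch, not the Graal implementation: the virtual object's fields are modeled by a hypothetical `int[] entries` array standing in for the virtual entries tracked during escape analysis, and `virtualCas` is a hypothetical helper. Because no other thread can reach a virtual object, the atomic CAS can be reduced to a plain compare followed by a conditional store.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model: the fields of a non-escaping (virtual) object are
// tracked as plain values, here an int[] indexed by field.
final class VirtualCasModel {

  // Non-atomic replacement for a CAS on a field of a virtual object.
  // Safe only because no other thread can observe the object.
  static boolean virtualCas(int[] entries, int index, int expected, int newValue) {
    boolean equal = entries[index] == expected;
    if (equal) {
      entries[index] = newValue;
    }
    return equal;
  }

  public static void main(String[] args) {
    // Reference behavior: a real atomic CAS on a freshly allocated object.
    AtomicInteger real = new AtomicInteger(0);
    boolean realResult = real.compareAndSet(0, 1);

    // Modeled behavior: the same operation on the virtual field values.
    int[] entries = {0};
    boolean modeledResult = virtualCas(entries, 0, 0, 1);

    System.out.println(realResult == modeledResult && real.get() == entries[0]);
  }
}
```

Running `main` prints `true`: the model and the real CAS agree on both the returned value and the final field value.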

Notes and questions

  1. I've used a guard that triggers deoptimization if the current value doesn't match the expected value. It assumes that users won't write code that makes a CAS operation fail, since the object doesn't escape and can't be used concurrently.

  2. I couldn't link the guard node to its predecessor using the VirtualizerTool, so I set it directly in UnsafeCompareAndSwapNode.virtualize. I'm not sure whether that could be problematic, since it seems that all effects should go through the tool during virtualization.

  3. The optimization is applied only when the field can be resolved. If the user uses a dynamic or invalid offset, it won't be applied.

  4. I've added an option to enable the optimization. Should it be enabled by default?
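Note 3 can be pictured as a lookup from the raw Unsafe offset to a field slot of the virtual object. The sketch below is hypothetical: `fieldOffsets` stands in for the offsets of the virtual object's fields, and an empty result means the offset can't be resolved, in which case the CAS is left untouched. A dynamic offset is presumably not a compile-time constant, so there is nothing to look up in that case at all.

```java
import java.util.OptionalInt;

// Hypothetical model of note 3: map an Unsafe field offset to the index of a
// field in the virtual object. Offsets matching no field yield an empty result.
final class OffsetResolution {

  static OptionalInt resolveFieldIndex(long[] fieldOffsets, long offset) {
    for (int i = 0; i < fieldOffsets.length; i++) {
      if (fieldOffsets[i] == offset) {
        return OptionalInt.of(i);
      }
    }
    return OptionalInt.empty();
  }

  public static void main(String[] args) {
    // Illustrative offsets only; real offsets depend on the JVM's object layout.
    long[] fieldOffsets = {12L, 16L};
    System.out.println(resolveFieldIndex(fieldOffsets, 16L)); // OptionalInt[1]
    System.out.println(resolveFieldIndex(fieldOffsets, 20L)); // OptionalInt.empty
  }
}
```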

@graalvmbot
Collaborator

  • Hello Flavio Brasil, thanks for contributing a PR to our project!

We use the Oracle Contributor Agreement to make the copyright of contributions clear. We don't have a record of you having signed this yet, based on your email address fbrasil -(at)- twitter -(dot)- com. You can sign it at that link.

If you think you've already signed it, please comment below and we'll check.

@fwbrasil
Contributor Author

Twitter has already signed the OCA, but I've also submitted the form by email.

@graalvmbot
Collaborator

  • Flavio Brasil has signed the Oracle Contributor Agreement (based on email address fbrasil -(at)- twitter -(dot)- com) so can contribute to this repository.

```java
LogicNode equalsNode = CompareNode.createCompareNode(EQ, expected, read, tool.getConstantReflectionProvider(), NodeView.DEFAULT);

FixedGuardNode guardNode = new FixedGuardNode(equalsNode, DeoptimizationReason.Aliasing, DeoptimizationAction.InvalidateRecompile);
guardNode.replaceAtPredecessor(predecessor());
```
Member

This node has just been created and has no predecessor, so this will not do anything. If it did, it would be a problem anyway, since in virtualize every graph modification has to be delayed using the tool (as you have done below).


```java
LogicNode equalsNode = CompareNode.createCompareNode(EQ, expected, read, tool.getConstantReflectionProvider(), NodeView.DEFAULT);

FixedGuardNode guardNode = new FixedGuardNode(equalsNode, DeoptimizationReason.Aliasing, DeoptimizationAction.InvalidateRecompile);
```
Member

A deopt without any alternative is a bit dangerous performance-wise: it might lead to deopt loops. In particular, here the "action" is InvalidateRecompile, which is quite costly considering that the recompilation will lead to exactly the same code.

What about just using ConditionalNodes?

```java
ConditionalNode fieldValue = ConditionalNode.create(equalsNode, newValue, read);
ConditionalNode result = ConditionalNode.create(equalsNode);
//...
tool.setVirtualEntry(obj, index, fieldValue);
//...
tool.replaceWith(result);
```

In the "simple" cases it will immediately fold away and in more complex cases where equality can't be statically proven, performance should still be OK.
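In plain Java terms, the suggestion replaces the guard with selects that are evaluated unconditionally, so a failed comparison costs a select rather than a deoptimization. The hypothetical helpers below mirror `fieldValue` and `result` from the snippet above; they model the semantics only and do not use Graal's node API.

```java
// Hypothetical model of the ConditionalNode-based rewrite.
final class ConditionalCas {

  // Mirrors ConditionalNode.create(equalsNode, newValue, read): the new virtual entry.
  static int fieldValue(int current, int expected, int newValue) {
    boolean equals = current == expected;
    return equals ? newValue : current;
  }

  // Mirrors the boolean result of the CAS, which is just the comparison.
  static boolean result(int current, int expected) {
    return current == expected;
  }

  public static void main(String[] args) {
    // Matching case: the field is updated and the CAS reports success.
    System.out.println(fieldValue(0, 0, 1) + " " + result(0, 0)); // 1 true
    // Mismatch: the field keeps its value and the CAS reports failure.
    System.out.println(fieldValue(5, 0, 1) + " " + result(5, 0)); // 5 false
  }
}
```

When `current` and `expected` are both constants, as in the benchmark, both selects fold to constants, which matches the "simple" case described above.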


```java
// @formatter:off
@Option(help = "Virtualize unsafe CAS calls", type = OptionType.Expert)
public static final OptionKey<Boolean> VirtualCAS = new OptionKey<>(false);
```
Member

This option is fine while you're experimenting, but ultimately I don't think we need an option for this.

@gilles-duboscq
Member

Thank you for the PR, that's a good idea.

I've answered some of your questions in comments.

Regarding the predecessor issue, the tool's addNode method weaves fixed nodes as needed: when the addNode is applied it adds the fixed node before the node that is being virtualized using StructuredGraph#addBeforeFixed.
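The rule that graph changes made during `virtualize` must be deferred through the tool can be illustrated with a toy model. Everything here is hypothetical: the schedule is a `List<String>` of fixed-node names, queued effects are `Runnable`s, and `addBeforeFixed` is only a stand-in for `StructuredGraph#addBeforeFixed`, inserting a node ahead of the node being virtualized once the effects are applied.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of deferred effects during virtualization.
final class DeferredEffects {

  private final List<String> schedule = new ArrayList<>();  // fixed nodes in order
  private final List<Runnable> effects = new ArrayList<>(); // queued graph edits

  DeferredEffects(List<String> initial) {
    schedule.addAll(initial);
  }

  // Analogous to tool.addNode(...) for a fixed node: nothing changes yet.
  void addNode(String node, String virtualizedNode) {
    effects.add(() -> addBeforeFixed(virtualizedNode, node));
  }

  // Stand-in for StructuredGraph#addBeforeFixed.
  private void addBeforeFixed(String anchor, String node) {
    schedule.add(schedule.indexOf(anchor), node);
  }

  // Effects are applied only after the analysis has finished.
  List<String> applyEffects() {
    effects.forEach(Runnable::run);
    effects.clear();
    return schedule;
  }

  public static void main(String[] args) {
    DeferredEffects graph =
        new DeferredEffects(List.of("Start", "UnsafeCompareAndSwap", "Return"));
    graph.addNode("FixedGuard", "UnsafeCompareAndSwap");
    System.out.println(graph.applyEffects()); // [Start, FixedGuard, UnsafeCompareAndSwap, Return]
  }
}
```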

@fwbrasil
Contributor Author

@gilles-duboscq thank you for the review! I've changed the impl to use ConditionalNodes as suggested. I had to add a condition for when the logic node becomes a constant since the graph becomes invalid if I create the other nodes based on the logic constant.

@gilles-duboscq
Member

> I had to add a condition for when the logic node becomes a constant since the graph becomes invalid if I create the other nodes based on the logic constant.

Right. I guess this is due to tool.addNode choking on nodes that are already in the graph. That's a bit inconvenient but we can probably work around that:

  • equalsNode is always a new node (a constant or a comparison)
  • fieldValue might be a new node, newValue, or currentValue
  • result is always a new node

Based on this it might be enough to just gate the tool.addNode(fieldValue) on if (!fieldValue.isAlive()). If that works it would simplify this code a bit.
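The gating idea can be modeled with a set of live nodes in which adding an existing node is an error, the failure mode suspected above. `Graph`, `addNode` and `isAlive` here are hypothetical stand-ins; the point is only that `fieldValue` is added when it is a freshly created node and skipped when it folded to `newValue` or `currentValue`, which are already in the graph.

```java
import java.util.HashSet;
import java.util.Set;

final class AddIfNotAlive {

  // Hypothetical graph that rejects adding a node twice.
  static final class Graph {
    private final Set<Object> live = new HashSet<>();

    boolean isAlive(Object node) {
      return live.contains(node);
    }

    void addNode(Object node) {
      if (!live.add(node)) {
        throw new IllegalStateException("node already in graph: " + node);
      }
    }
  }

  // Adds fieldValue only if it is a new node.
  static void addFieldValue(Graph graph, Object fieldValue) {
    if (!graph.isAlive(fieldValue)) {
      graph.addNode(fieldValue);
    }
  }

  public static void main(String[] args) {
    Graph graph = new Graph();
    Object newValue = "newValue";
    graph.addNode(newValue);                                    // already part of the graph
    addFieldValue(graph, newValue);                             // folded to existing node: skipped
    addFieldValue(graph, "Conditional(eq, newValue, current)"); // new node: added
    System.out.println(graph.live.size()); // 2
  }
}
```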

In any case, I think we can remove the option and we should be good to go.

@fwbrasil
Contributor Author

@gilles-duboscq it seems more complicated than that. For some reason, the partial escape analysis tries to remove the constant node representing true twice.

I'm testing this new version with a service and it seems that there's a bug. Constant folding somehow infers the comparison to be false in some cases where it should be true. I'm investigating whether it could be related to the stamps of the values.

@gilles-duboscq
Copy link
Member

I tried to write a few unit tests on top of your changes, but I couldn't reproduce either error.
Do you have some example code that reproduces them?

@fwbrasil
Contributor Author

@gilles-duboscq thank you for looking into it. I haven't yet been able to reproduce the issue with the logic constant being false in isolation. For the invalid graph issue, this always fails if I remove the if condition for logic constants:

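```java
import java.util.concurrent.atomic.AtomicInteger;

public class Bugs {

  public static void main(String[] args) {
    // Loop forever so that test() gets hot and is compiled by Graal.
    while (true) {
      test();
    }
  }

  // CAS on a freshly allocated AtomicInteger that never escapes.
  private static boolean test() {
    AtomicInteger a = new AtomicInteger(0);
    return a.compareAndSet(0, 1);
  }
}
```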
```
[thread:5] scope: JVMCI CompilerThread0
    [thread:5] scope: JVMCI CompilerThread0.Compiling.GraalCompiler
    Context: StructuredGraph:139{HotSpotMethod<Bugs.main(String[])>}
        [thread:5] scope: JVMCI CompilerThread0.Compiling.GraalCompiler.FrontEnd.HighTier.PartialEscapePhase.iteration 0.EffectsPhaseWithSchedule.DeadCodeEliminationPhase
        Exception raised in scope JVMCI CompilerThread0.Compiling.GraalCompiler.FrontEnd.HighTier.PartialEscapePhase.iteration 0.EffectsPhaseWithSchedule.DeadCodeEliminationPhase: java.lang.ArrayIndexOutOfBoundsException: -15625002
            at org.graalvm.compiler.graph.NodeBitMap.isMarked(NodeBitMap.java:85)
            at org.graalvm.compiler.graph.NodeBitMap.isMarked(NodeBitMap.java:71)
            at org.graalvm.compiler.graph.NodeFlood.isMarked(NodeFlood.java:65)
            at org.graalvm.compiler.phases.common.DeadCodeEliminationPhase.deleteNodes(DeadCodeEliminationPhase.java:140)
            at org.graalvm.compiler.phases.common.DeadCodeEliminationPhase.run(DeadCodeEliminationPhase.java:103)
            at org.graalvm.compiler.phases.Phase.run(Phase.java:49)
            at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:197)
            at org.graalvm.compiler.phases.Phase.apply(Phase.java:42)
            at org.graalvm.compiler.phases.Phase.apply(Phase.java:38)
            at org.graalvm.compiler.virtual.phases.ea.EffectsPhase.runAnalysis(EffectsPhase.java:108)
            at org.graalvm.compiler.virtual.phases.ea.PartialEscapePhase.run(PartialEscapePhase.java:82)
            at org.graalvm.compiler.virtual.phases.ea.EffectsPhase.run(EffectsPhase.java:1)
            at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:197)
            at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:139)
            at org.graalvm.compiler.phases.PhaseSuite.run(PhaseSuite.java:212)
            at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:197)
            at org.graalvm.compiler.phases.BasePhase.apply(BasePhase.java:139)
            at org.graalvm.compiler.core.GraalCompiler.emitFrontEnd(GraalCompiler.java:256)
            at org.graalvm.compiler.core.GraalCompiler.compile(GraalCompiler.java:180)
            at org.graalvm.compiler.core.GraalCompiler.compileGraph(GraalCompiler.java:165)
            at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compileHelper(HotSpotGraalCompiler.java:191)
            at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compile(HotSpotGraalCompiler.java:204)
            at org.graalvm.compiler.hotspot.CompilationTask$HotSpotCompilationWrapper.performCompilation(CompilationTask.java:181)
            at org.graalvm.compiler.hotspot.CompilationTask$HotSpotCompilationWrapper.performCompilation(CompilationTask.java:1)
            at org.graalvm.compiler.core.CompilationWrapper.run(CompilationWrapper.java:171)
            at org.graalvm.compiler.hotspot.CompilationTask.runCompilation(CompilationTask.java:330)
            at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compileMethod(HotSpotGraalCompiler.java:144)
            at org.graalvm.compiler.hotspot.HotSpotGraalCompiler.compileMethod(HotSpotGraalCompiler.java:111)
            at jdk.vm.ci.hotspot.HotSpotJVMCIRuntime.compileMethod(HotSpotJVMCIRuntime.java:525)
```

@fwbrasil
Contributor Author

It's a concurrency bug, not related to the stamps. I'm analyzing it.

@fwbrasil
Contributor Author

fwbrasil commented Aug 28, 2018

@gilles-duboscq should we add a memory barrier somewhere? It seems that other threads see class instances set through a CAS operation with some of their fields null. I haven't been able to isolate the issue, but you can reproduce it using:

```shell
curl -L "https://github.com/fwbrasil/scala-graal/blob/master/scala-graal-assembly-1.0.0-SNAPSHOT.jar?raw=true" > bench.jar

mx vm -XX:+UseJVMCICompiler -cp bench.jar -Dgraal.Dump=:5 -Dgraal.MethodFilter=Promise.transform -Dgraal.VirtualCAS=true org.openjdk.jmh.Main -f 0 -wi 15 -i 15 -r 1 -w 999999 FinagleBench
```

If you use -Dgraal.VirtualCAS=false, the NPEs don't happen.

@gilles-duboscq
Member

I had a more thorough look at this.
Regarding the NPEs: the issue is that when creating the Compare and Conditional nodes, care must be taken not to mix virtual and non-virtual nodes: you cannot have one side virtual and the other one not.

You should use tool.getAlias to get the current node for expected & newValue. Then you have to check the status of the inputs before creating binary nodes. In general, you can only create them if the inputs are not virtual (i.e., not instances of VirtualObjectNode). If both are virtual, you can sometimes come to a conclusion: e.g., if two VirtualObjectNodes are != in the IR, they cannot represent == objects at runtime if they have identity (VirtualObjectNode#hasIdentity).
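These rules can be summarized as a small decision function. This is a simplified, hypothetical model, not the logic of `ObjectEqualsNode#virtualize`: each input is assumed to be already alias-resolved, virtual inputs carry an id standing in for the `VirtualObjectNode` plus a `hasIdentity` flag, and the result says whether a comparison node may be emitted, whether equality is known statically, or whether the optimization should bail out. Folding the mixed case to "not equal" follows the later remarks in this thread for objects with identity.

```java
final class VirtualEquality {

  enum Decision { EMIT_COMPARE, ALWAYS_EQUAL, NEVER_EQUAL, BAIL_OUT }

  // An input after alias resolution (hypothetical stand-in for tool.getAlias).
  static final class Input {
    final boolean isVirtual;
    final int virtualId;       // identifies the VirtualObjectNode; unused if not virtual
    final boolean hasIdentity; // mirrors VirtualObjectNode#hasIdentity

    private Input(boolean isVirtual, int virtualId, boolean hasIdentity) {
      this.isVirtual = isVirtual;
      this.virtualId = virtualId;
      this.hasIdentity = hasIdentity;
    }

    static Input nonVirtual() {
      return new Input(false, -1, false);
    }

    static Input virtual(int id, boolean hasIdentity) {
      return new Input(true, id, hasIdentity);
    }
  }

  static Decision compare(Input a, Input b) {
    if (!a.isVirtual && !b.isVirtual) {
      // Both sides are real values: a regular comparison node is safe.
      return Decision.EMIT_COMPARE;
    }
    if (a.isVirtual && b.isVirtual) {
      if (a.virtualId == b.virtualId) {
        return Decision.ALWAYS_EQUAL; // same virtual object
      }
      // Distinct virtual objects with identity can never be the same object.
      return a.hasIdentity && b.hasIdentity ? Decision.NEVER_EQUAL : Decision.BAIL_OUT;
    }
    // Exactly one side is virtual: never build a binary node across that boundary.
    Input virtualSide = a.isVirtual ? a : b;
    return virtualSide.hasIdentity ? Decision.NEVER_EQUAL : Decision.BAIL_OUT;
  }

  public static void main(String[] args) {
    Input pi = Input.nonVirtual();        // e.g. a PiNode for the expected value
    Input fresh = Input.virtual(1, true); // a non-escaping allocation
    System.out.println(compare(pi, pi));                        // EMIT_COMPARE
    System.out.println(compare(fresh, fresh));                  // ALWAYS_EQUAL
    System.out.println(compare(fresh, Input.virtual(2, true))); // NEVER_EQUAL
    System.out.println(compare(pi, fresh));                     // NEVER_EQUAL
  }
}
```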

Regarding the other issue, I cannot reproduce it when using

```java
if (!fieldValue.isAlive()) {
    tool.addNode(fieldValue);
}
```

Also note that this kind of problem is usually detected earlier if you use -esa (enable system assertions).

@fwbrasil
Contributor Author

fwbrasil commented Aug 29, 2018

@gilles-duboscq thank you for the pointers! I've implemented the logic you suggested, but I couldn't find any CAS operations in our service where both values are virtual, or both non-virtual. They're usually virtual for the new value but a PiNode for the expected value.

I've done some more testing, and the virtualize method is called twice during the virtualization process. During the second pass, if I create the equalsNode, it becomes a logic constant because of constant folding and I can apply the CAS statically, so I added the condition back. If I leave that condition out, no CAS operations are virtualized in our codebase.

EDIT: I've tested another codebase and found one instance where both are non-virtual

@fwbrasil fwbrasil force-pushed the cas-elimination branch 2 times, most recently from ff2df3e to 37d889f, September 4, 2018 22:23
@fwbrasil
Contributor Author

fwbrasil commented Sep 4, 2018

@gilles-duboscq I've added a test, removed the option, and removed the virtual object comparison, since it's covered by the constant folding applied when the equals node is created. It should be good to merge.

@fwbrasil fwbrasil changed the title from "[wip] virtualize unsafe compare and swap calls" to "virtualize unsafe compare and swap calls", Sep 4, 2018
@fwbrasil fwbrasil force-pushed the cas-elimination branch 3 times, most recently from 8c3f21a to 3a68fe1, September 5, 2018 17:44
@fwbrasil
Contributor Author

fwbrasil commented Sep 6, 2018

The build failure doesn't seem to be related to the change: https://travis-ci.org/oracle/graal/jobs/424908424#L1693

```java
    tool.replaceWith(ConstantNode.forBoolean(equals));
} else if (!(expectedAlias instanceof VirtualObjectNode) && !(newValueAlias instanceof VirtualObjectNode)) {
    ValueNode fieldValue = ConditionalNode.create(equalsNode, newValueAlias, currentValue, NodeView.DEFAULT);
```
Member

Did you try the following?

```java
Object o = new Object();
AtomicReference<Object> a = new AtomicReference<>(o);
return a.compareAndSet(obj1, obj2);
```

What happens if currentValue is virtual but the expected or new values are not? The created conditional node should lead to issues (virtual on only one side).
You should either disable the optimization in this case or look at how ObjectEqualsNode#virtualize resolves == in various cases (note that you might also be able to resolve the case where both currentValue and expectedAlias are virtual, if you want).

Contributor Author

Just added a test for it: https://github.com/oracle/graal/pull/636/files#diff-4039dbafeadfd69113c08dcedbdd913cR114.

It takes the branch where both values are non-virtual and virtualizes the CAS. I'm also running this version in a large codebase without problems.

Looking at the ObjectEquals virtualization, it'll infer that the objects are not equal if one of them is virtual and the other is not, so it should be OK.

Member

If you return a.get() rather than the result of the CAS, you'll see in the graph that it returns a ConditionalNode with a virtual node on one side and a LoadField on the other. Although this test does not fail, we cannot generate valid code for that; it would just crash later in the compilation process. See, for example, this test:

```java
public static Object onlyInitialValueVirtualMatch2() {
    AtomicReference<Object> a = new AtomicReference<>(new Object());
    a.compareAndSet(obj1, obj2);
    return a.get();
}

@Test
public void onlyInitialValueVirtualMatchTest2() {
    testEscapeAnalysis("onlyInitialValueVirtualMatch2", null, true);
    assertTrue(graph.getNodes(LogicCompareAndSwapNode.TYPE).isEmpty());
    test("onlyInitialValueVirtualMatch2");
}
```

The last line, with test, will actually try to compile and run the resulting code, which fails because we don't generate code for virtual nodes.
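Independently of the compiler, the behavior this example must preserve is easy to pin down in plain Java. In this sketch `obj1` and `obj2` are assumed to be unrelated objects held in static fields, since the test above references them without showing their definitions. The freshly allocated initial value can never be `obj1`, so the CAS fails and `a.get()` still returns the initial object.

```java
import java.util.concurrent.atomic.AtomicReference;

final class OnlyInitialValueVirtual {

  // Assumed fixtures: the test above uses obj1/obj2 without showing them.
  static final Object obj1 = new Object();
  static final Object obj2 = new Object();

  // Returns {CAS result, a.get(), initial value} for inspection.
  static Object[] run() {
    Object initial = new Object(); // the allocation escape analysis would virtualize
    AtomicReference<Object> a = new AtomicReference<>(initial);
    boolean swapped = a.compareAndSet(obj1, obj2); // initial is never obj1
    return new Object[] {swapped, a.get(), initial};
  }

  public static void main(String[] args) {
    Object[] r = run();
    System.out.println(r[0]);         // false
    System.out.println(r[1] == r[2]); // true: a.get() is still the initial object
  }
}
```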

Contributor Author

Thanks for the example. I've added a condition to check if the current value is not virtual as well.
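Putting the review feedback together, the virtualization ends up roughly as the decision procedure below. It is a hypothetical, simplified model in plain Java, not the merged Graal code: inputs are only tagged as virtual or not, `foldedEquals` stands for the comparison after constant folding (which may already be a constant, e.g. on the second virtualization pass), and the method merely reports which rewrite would apply.

```java
final class CasVirtualizationPlan {

  enum Rewrite { FOLD_TRUE, FOLD_FALSE, CONDITIONAL, NOT_VIRTUALIZED }

  // foldedEquals: the comparison after constant folding, or null if not constant.
  static Rewrite plan(Boolean foldedEquals, boolean currentVirtual,
                      boolean expectedVirtual, boolean newValueVirtual) {
    if (foldedEquals != null) {
      // Known statically: store newValue if equal, replace the CAS with a constant.
      return foldedEquals ? Rewrite.FOLD_TRUE : Rewrite.FOLD_FALSE;
    }
    if (!currentVirtual && !expectedVirtual && !newValueVirtual) {
      // fieldValue = equals ? newValue : current; result = equals
      return Rewrite.CONDITIONAL;
    }
    // A ConditionalNode would mix virtual and non-virtual inputs:
    // leave the CAS as an ordinary atomic operation.
    return Rewrite.NOT_VIRTUALIZED;
  }

  public static void main(String[] args) {
    System.out.println(plan(Boolean.TRUE, false, false, true)); // FOLD_TRUE
    System.out.println(plan(null, false, false, false));        // CONDITIONAL
    System.out.println(plan(null, true, false, false));         // NOT_VIRTUALIZED
  }
}
```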


```java
@Test
public void onlyInitialValueVirtualMatchTest() {
    testEscapeAnalysis("onlyInitialValueVirtualMatch", null, true);
```
Member

Unfortunately, this test now fails (testEscapeAnalysis makes sure there is no allocation left in the snippet after escape analysis).
You can:

  • just remove it
  • handle more virtual vs. non-virtual cases by resolving the == like ObjectEqualsNode#virtualize does (this case for example is virtual vs. non-virtual where both types have identity -> false)

Contributor Author

I've just removed the test

@gilles-duboscq
Member

Thank you @fwbrasil for bearing with all my comments and making the adjustments!
I'll integrate this.

@dougxc dougxc merged commit e5cab6b into oracle:master Sep 26, 2018
@dougxc
Member

dougxc commented Oct 18, 2018

Hi @fwbrasil , have you had a chance to measure the performance impact of this change?


4 participants