Skip to content

Hang when using amphp and tracing JIT #12249

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
danog opened this issue Sep 20, 2023 · 15 comments
Closed

Hang when using amphp and tracing JIT #12249

danog opened this issue Sep 20, 2023 · 15 comments

Comments

@danog
Copy link
Contributor

danog commented Sep 20, 2023

Description

Reproducer: https://paste.daniil.it/parallel_jit.tar.xz (run.sh script)

PHP Version

PHP 8.2.10

Operating System

Arch linux

@dstogov dstogov self-assigned this Oct 2, 2023
@danog danog changed the title Hang when using amphp/parallel and tracing JIT Hang when using amphp and tracing JIT Oct 6, 2023
@danog
Copy link
Contributor Author

danog commented Oct 6, 2023

By the way @dstogov, this issue can be easily reproduced without even using network I/O by running the composite cancellation benchmark in https://github.com/amphp/amp:

php -d opcache.enable_cli=1 -d opcache.jit=tracing -d opcache.jit_buffer_size=1G -d opcache.jit_max_root_traces=1000000 -d opcache.jit_max_side_traces=1000000 -d opcache.jit_max_exit_counters=1000000 -d opcache.jit_hot_loop=1 -d opcache.jit_hot_func=1 -d opcache.jit_hot_return=1 -d opcache.jit_hot_side_exit=1 -d memory_limit=-1 vendor/bin/phpunit --debug --filter 'testBenchmark' test/Cancellation/CompositeCancellationTest.php

@danog
Copy link
Contributor Author

danog commented Oct 7, 2023

@dstogov Finally managed to narrow it down to this specific closure execution: https://github.com/revoltphp/event-loop/blob/main/src/EventLoop/Internal/AbstractDriver.php#L593

Replacing that line with a call_user_func fixes the issue.

Replacing https://github.com/revoltphp/event-loop/blob/main/src/EventLoop/Internal/AbstractDriver.php#L86 with an indirect closure function (Closure $i): void { $this->setInterrupt($i); } also seems to fix the issue.

@danog
Copy link
Contributor Author

danog commented Oct 7, 2023

It seems like the actual source of the problem is https://github.com/revoltphp/event-loop/blob/main/src/EventLoop/Internal/AbstractDriver.php#L565, the if is entering when it shouldn't, in fact adding the following assertion fails:

                    if (!isset($this->callbacks[$callback->id]) || !$callback->invokable) {
                        if (isset($this->callbacks[$callback->id]) && $callback->invokable) {
                            throw new AssertionError("Impossible");
                        }
                        unset($callback);

                        continue;
                    }

@danog
Copy link
Contributor Author

danog commented Oct 7, 2023

In fact it's just the isset, splitting the if just this code can trigger the assertion:

                    if (!isset($this->callbacks[$callback->id])) {
                        if (isset($this->callbacks[$callback->id])) {
                            throw new AssertionError("Impossible");
                        }
                        unset($callback);

                        continue;
                    }

@danog
Copy link
Contributor Author

danog commented Oct 7, 2023

Very strangely, the first few runs after modifying the file don't trigger the assertion (something to do with mtime maybe?)

@danog
Copy link
Contributor Author

danog commented Oct 7, 2023

The affected trace seems to be trace 311 from the debug output, the zend bytecode seems correct, let's see if I can find the assembly...

https://paste.daniil.it/debug_run

0013 V4 = FETCH_THIS
0014 T6 = FETCH_OBJ_R CV0($callback) string("id") ; op1(object of class Revolt\EventLoop\Internal\TimerCallback) val(string)
0015 T5 = FETCH_OBJ_IS V4 string("callbacks") ; op1(object of class Revolt\EventLoop\Driver\UvDriver) val(array)
0016 T4 = ISSET_ISEMPTY_DIM_OBJ (isset) T5 T6 ; op1(array) op2(string) val(object)
0017 ;JMPNZ T4 0027

@danog
Copy link
Contributor Author

danog commented Oct 7, 2023

Simpler reproducer:

<?php

use Revolt\EventLoop;
use Revolt\EventLoop\Driver\UvDriver;

use function Amp\delay;

require 'vendor/autoload.php';

EventLoop::setDriver(new UvDriver);

for ($i = 0; $i < 30; $i++) {
    EventLoop::queue(static fn () => null);

    fwrite(STDOUT, "$i START\n");
    delay(0.0001); // Tick loop to allow resources to be freed.
    fwrite(STDOUT, "$i END\n");
}

@danog
Copy link
Contributor Author

danog commented Oct 7, 2023

OK I think I found the cause, the code in the AbstractDriver class is referencing the $callbacks property of the child class UvDriver, not the $callbacks property in AbstractDriver: renaming it in UvDriver fixes the issue.

@nielsdos
Copy link
Member

nielsdos commented Oct 8, 2023

I had to increase the loop count to 100 to be able to reliably reproduce it.
I think you're right it's the same root cause, as my PR #12381 also seems to fix this.

@danog
Copy link
Contributor Author

danog commented Oct 8, 2023

Yep, however the phpunit testsuite still seems to fail with your fix if ASAN is enabled (not a JIT bug tho)...

@nielsdos
Copy link
Member

nielsdos commented Oct 8, 2023

Yep, however the phpunit testsuite still seems to fail with your fix if ASAN is enabled (not a JIT bug tho)...

@danog That's not a bug. If you use ASAN and disable the Zend memory allocator (which you need to do for ASAN to work properly), then memory_get_usage() doesn't work. That function will always return 0 because you bypass Zend's memory manager.
As the CompositeCancellationTest only performs assertions when the memory usage returned by that function is higher than zero, the assertion can never execute. Hence, phpunit reports a risky test because no assertions happened.

@danog
Copy link
Contributor Author

danog commented Oct 8, 2023

Actually I was referring to this phpunit failure in amphp/parallel:

1) Amp\Parallel\Test\Worker\ProcessPoolTest::testSubmitMultipleThenShutdown
Amp\Parallel\Context\ContextException: The context stopped responding, potentially due to a fatal error or calling exit

Which is caused by a crash of the worker process only when ASAN is used, trying to look into it...

@danog
Copy link
Contributor Author

danog commented Oct 8, 2023

Turns out it's just a timeout caused by the slowdown in ASAN, which causes the connection attempts in src/Context/Internal/functions.php to start before the IPC hub starts listening on the UNIX socket, ping @trowski

@danog
Copy link
Contributor Author

danog commented Oct 8, 2023

Oh and also ping @trowski, mind reporting that bus error segfault you were getting on mac with JIT when running php examples/async.php in amphp/amp?

@nielsdos
Copy link
Member

nielsdos commented Oct 9, 2023

Fixed via #12381. Feel free to report the other bus error issue though, preferably in a separate issue though :-)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants