-
Notifications
You must be signed in to change notification settings - Fork 7.9k
JIT deepest function firstly #10897
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
JIT deepest function firstly #10897
Conversation
This needs a careful review and testing. |
Yes, if the inline function will be JITTed first and then the loop is JITTed if the loop is continued. log like this: |
This is not what I would like to do. This disables inlining and reduces specialization. |
I think, the better decision would be fallback to compilation of inlined function if its trace becomes too long. All the specializations inherited from the caller (e.g. types of arguments) have to be discarded. |
Yeah, it seems a little aggressive. But this patch will JIT twice if different types of arguments are passed in a function. This also meets the previous side exit design. And it is convenient to link to the JITTed function later. JIT log like this: ---- TRACE 5 start (side trace 1/0) add() /home/wxue/php-src/dup.php:3 <-- JIT add() with float argument |
It makes sense to avoid inining of big functions but not inlining at all. |
If a type of argument is known on the caller side, we don't need to check it on the inlined callee side. In your example <?php
function add($x){
return $x + 1;
}
function main() {
$x = 0;
for($i = 0; $i < 1000; $i++) {
if($i % 10 == 0) {
$x = add($x);
}
if($i % 8 == 0) {
$x = add($x);
}
}
echo("x=$x\n");
}
main();
?> Also you may compare the So in this example a single inlining saves at least 12 executed instructions: 2 indirect jumps between traces, 3 conditional jumps, 2 EG(jit_trace_num) assignments. It also avoids generation of never executed code for rare RETURN cases. Tracing JIT duplicates code by design and your proposed patch is definitely not good enough. Actually, you may achieve similar result changing
I see your intention, and I agree that we may limit inlining of big functions, but this shouls be done in some other way. |
Thanks for your comment. So let's discuss the trade-off and get an appropriate implementation. I am glad to try and test. 😊
|
I think, the problem might be solved by counting the number if inlined instructions, aborting compilation when this number exceeds some limit, and fall-back to compilation only the tail of the recorded trace for the inlined functions that caused abort. |
Okay, if abort, we compile the inlined deepest function or only the inlined function (may exist nested inlined function here) ? |
I think, the best solution would be compiling function that caused limit overflow. It may include other small inlined functions. During tracing we remember function entries (push to some stack). When we trace a function exit and the distance from the corresponding function entry is above some limit, we decide to discard the start of the trace (before that function entry) and compile a trace for that function only. Note that function may be already compiled. |
I tried an easy way to get a similar effect. When a trace is too long (idx > JIT_G(max_inline_func_length)), we JIT inlined function firstly. The parameter max_inline_func_length can be set by developers. Now our WordPress workload gets the best performance when max_inline_func_length is 8. The default value needs more testing later ~ I'm looking forward to your comment~ |
@@ -1062,6 +1062,13 @@ zend_jit_trace_stop ZEND_FASTCALL zend_jit_trace_execute(zend_execute_data *ex, | |||
|
|||
trace_flags = ZEND_OP_TRACE_INFO(opline, offset)->trace_flags; | |||
if (trace_flags) { | |||
/* if inlined functions are too long, stop current tracing and restart a new one */ | |||
if (trace_buffer[idx-1].op == ZEND_JIT_TRACE_ENTER && idx > JIT_G(max_inline_func_length)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't limit the inlined function length. This limits function inlining to the first trace instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't limit the inlined function length. This limits function inlining to the first trace instructions.
Actually, JAVA JIT support inlined function up to a particular size. Here we limit size by separating the long trace by max_inline_func_length.
If we limit the inlined function length, some functions may never be JITTed. So I chose a compromise method
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your code allows inlining only if it occurs in the first 16 instructions of the trace. It easelly prohibits inlining of the short functions.
---- TRACE 44 start (side trace 43/3) WP_Scripts::localize() /home/www/bench/wordpress-3.6/wp-includes/class.wp-scripts.php:148
0027 FE_FREE V8 ; op1(array)
0028 T10 = ROPE_INIT 3 string("var ")
0029 T10 = ROPE_ADD 1 T10 CV1($object_name) ; op2(string)
0030 T9 = ROPE_END 2 T10 string(" = ")
0031 INIT_FCALL 1 96 string("json_encode")
>init json_encode
0032 SEND_VAR CV2($l10n) 1 ; op1(array)
0033 V10 = DO_ICALL
>call json_encode
0034 T8 = FAST_CONCAT T9 V10 ; op1(string) op2(string)
0035 CV6($script) = FAST_CONCAT T8 string(";") ; op1(string)
0036 T8 = ISSET_ISEMPTY_CV (empty) CV3($after) ; op1(undef)
0037 ;JMPNZ T8 0042
0042 INIT_METHOD_CALL 2 THIS string("get_data")
>init WP_Dependencies::get_data
0043 SEND_VAR_EX CV0($handle) 1 ; op1(string)
0044 SEND_VAL_EX string("data") 2
0045 V8 = DO_FCALL
>enter WP_Dependencies::get_data
---- TRACE 44 abort (JIT inlined function and skip current trace)
---- TRACE 44 start (enter) WP_Dependencies::get_data() /home/www/bench/wordpress-3.6/wp-includes/class.wp-dependencies.php:163
0002 T3 = FETCH_OBJ_IS THIS string("registered") ; val(array)
0003 T2 = ISSET_ISEMPTY_DIM_OBJ (isset) T3 CV0($handle) ; op1(array) op2(string) val(object)
0004 ;JMPNZ T2 0006
0006 T3 = FETCH_OBJ_IS THIS string("registered") ; val(array)
0007 T2 = FETCH_DIM_IS T3 CV0($handle) ; op1(array) op2(string) val(object)
0008 T3 = FETCH_OBJ_IS T2 string("extra") ; op1(object of class _WP_Dependency) val(array)
0009 T2 = ISSET_ISEMPTY_DIM_OBJ (isset) T3 CV1($key) ; op1(array) op2(string) val(undef)
0010 ;JMPNZ T2 0012
0011 RETURN bool(false)
---- TRACE 44 stop (return)
JAVA doesn't use tracing JIT. The current PHP tracing rules were mainly developed on top of LuaJIT.
In my opinion, this doesn't make a lot of sense, at least in the current state. By design, in case some function is called from different places it should be JIT-ed first, because its counter is going to be triggered first. Otherwise it should be inlined to provide more opportunities for optimisation. As I told, it is possible to limit the size of inlined function, but the current propose patch does a bit different thing. I made some benchmarks using the following command and I see only very small improvement.
I also see some hot code is blacklisted now
|
Yes, Here is a delay for the function counter. Next time the first function will be JITTed.
The performance gain is 2-3% on our WordPress workload and also reduces the total JIT buffer size. |
In my benchmark, the time improvement is ~0.44% and this is less then the measurement mistake, but CPU instruction fetched measured with valgrind show near similar improvement ~0,4%. The code size was actually insignificantly increased. The patch doesn't limit the inlining function size, but just split the traces in a different (more aggressive) way.
At first, May be in case of the event, the trace should be shorten to the actual enter. |
Yes Yes, forgive my awkward name. This is similar like 'ZEND_JIT_TRACE_MAX_LENGTH'. How about 'opcache.jit_max_trace_length' ?
Little confused about 'the trace should be shorten to the actual entey'. |
I don't see problem replacing the hardcoded
Take a look how the |
good function. Let me take a look. So our discussion shows two ways.
So my cute maintainer, What do you choose? I'll continue to modify this code. |
The (2) option may work with a simple implementation, if we delay the JIT-ing of the second part (the called function) until the next entry. Anyway, this may cause breaks in the original design. |
A little question if we set 'backtrack_' variable to compile the first part. Then we need to relink to the first part after the second part (inlined func) is compiled? I see the original code didn't link the JITTed function.
|
Duplicated JITTed code brings overhead for the instruction cache. This patch reduces duplication by JITting inlined function separately if trace is too long because the same function is JITTed in different root trace or side trace sometimes. It increases 2%~3% performance of our workload in tracing mode. Signed-off-by: Wang, Xue <[email protected]> Signed-off-by: Yang, Lin A <[email protected]> Signed-off-by: Su, Tao <[email protected]>
Dear maintainer, I sent you a mail for this PR. Hope to get your reply. |
The following code:
Duplicated JITTed code in this case:
But I expected the add function to be JITTed only once.
Duplicated JITTed code brings overhead for the instruction cache. This patch reduces duplication by JITting deepest inline function first because the same function is JITTed in different root trace or side trace sometimes.
It increases 3% the performance of our workload in tracing mode.