Skip to content

php-fpm: graceful restart without blocking/losing requests #3758

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

m6w6
Copy link
Contributor

@m6w6 m6w6 commented Jan 22, 2019

Issue:

When php-fpm gets "graceful" reloaded it will stop accepting requests
while finishing reqests it already received. If those requests to finish
are of a long running kind like big uploads over a slow connection, this
will block new requests to be handled for an unacceptable long time.

Current behavior

|
v  FPM (OLD) running
|
v  $ sudo kill -USR2 $(cat /run/php-fpm/php-fpm.pid)
|
v  OLD sends SIGQUIT to children
v  OLD stops handling requests
|
v  OLD keeps running until all children complete or time out
|
v  OLD exec()s
v  FPM (NEW) starts & re-uses listening socket
|
v  NEW handles incoming requests from now on

Proposed Solution:

In the case of a reloading request, we immediately fork and exec a new
fpm instance to seamlessly handle new requests, while the old instance
keeps finishing those requests already accepted.

New behavior

|
v  FPM (OLD) running
|
v  $ sudo kill -USR2 $(cat /run/php-fpm/php-fpm.pid)
|
v  OLD sends SIGQUIT to children
v  OLD fork()s & exec()s
|\
| \
|  v  FPM (NEW) starts & re-uses listening socket
|  v  NEW handles incoming requests from now on
|  |
v  |  OLD keeps running until all children complete or time out
v  |  OLD gone
-  |
  /
 /
v  NEW still going
|

We've been running this patch since PHP 5.6 (early 2017).

@petk petk added the Feature label Jan 22, 2019
@nikic nikic requested a review from bukka January 23, 2019 09:45
@m6w6
Copy link
Contributor Author

m6w6 commented Jan 23, 2019

I also noticed we're shipping an unused php-fpm.service template.

Maybe we should add && wait $MAINPID to the reload command?

/cc @remicollet

@bukka bukka self-assigned this Jan 23, 2019
Copy link
Member

@bukka bukka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense for the mentioned case when there is a long running process that blocks the restart. I guess it might make the actual init scripts a bit more complex. The first thing that came to my mind is the collecting of logs. It might require some extra logic to collect it from both main processes. I'm not really sure how it's usually setup in distros so it would be really good to get some feedback from FPM distro maintainers.

@krakjoe
Copy link
Member

krakjoe commented Jan 31, 2019

This is really nice ...

More noise @remicollet, input please ?

@frederikbosch
Copy link
Contributor

Isn't this basically a fix for #60691?

@aboks
Copy link
Contributor

aboks commented Oct 14, 2019

@bukka @krakjoe Is there anything I could do to help moving this fix forward?

@bukka bukka added the SAPI: fpm label Jun 9, 2020
@bukka
Copy link
Member

bukka commented Jun 9, 2020

I think this would be a nice addition for PHP 8 and we should try to get it in. I'm not sure we will hear from anyone in terms of init scripts so we shouldn't block it forever IMO.

@m6w6 if it's not too much hassle for you and would you be able to rebase it, that would be great! Also question if you have been using this with systemd?

@cmb69
Copy link
Member

cmb69 commented Jul 15, 2020

The test failures appear to be legit.

@m6w6
Copy link
Contributor Author

m6w6 commented Jul 16, 2020

Doh! I'll have a look, thanks.

 ## Issue:

When php-fpm gets "graceful" reloaded it will stop accepting requests
while finishing reqests it already received. If those requests to finish
are of a long running kind like big uploads over a slow connection, this
will block new requests to be handled for an unacceptable long time.

 ### Current behavior

```
|
v  FPM (OLD) running
|
v  $ sudo kill -USR2 $(cat /run/php-fpm/php-fpm.pid)
|
v  OLD sends SIGQUIT to children
v  OLD stops handling requests
|
v  OLD keeps running until all children complete or time out
|
v  OLD exec()s
v  FPM (NEW) starts & re-uses listening socket
|
v  NEW handles incoming requests from now on
```

 ## Proposed Solution:

In the case of a reloading request, we immediately fork and exec a new
fpm instance to seamlessly handle new requests, while the old instance
keeps finishing those requests already accepted.

 ### New behavior

```
|
v  FPM (OLD) running
|
v  $ sudo kill -USR2 $(cat /run/php-fpm/php-fpm.pid)
|
v  OLD sends SIGQUIT to children
v  OLD fork()s & exec()s
|\
| \
|  v  FPM (NEW) starts & re-uses listening socket
|  v  NEW handles incoming requests from now on
|  |
v  |  OLD keeps running until all children complete or time out
v  |  OLD gone
-  |
  /
 /
v  NEW still going
|
```

We've been running this patch since PHP 5.6 (early 2017).
@m6w6
Copy link
Contributor Author

m6w6 commented Jul 16, 2020

I tried to fix the tests, but the output of FPM reloading is non-deterministic, because it depends on the brightness of the moon whether the old parent process first says bye-bye or the new one logs that it reuses the old socket.

On another note, FPM is run by proc_open which is of no use when testing reloading because of fork&exec. Observing output through the inherited file descriptors works, though, but process control won't.

@cmb69
Copy link
Member

cmb69 commented Jul 16, 2020

If solid tests are not possible, I think it's better to remove them (or to skip them).

@bukka
Copy link
Member

bukka commented Jul 19, 2020

We could modify the tester to assert between messages from a set. It means we would pass 2 sets of messages and check input line matches one of them and then move a pointer in the set to the next expected message. Or maybe just check between all messages in the set so the order doesn't matter. That should be doable. I'd prefer not to delete the reloading tests as there has been quite an effort to make them right.

This issue actually reminded me the problem with logging when catch_workers_output = yes and error_log = /dev/stdout are set. Currently we have got the master process making sure that the messages are not interleaved. With this change, we won't be able to assure that because new and old master can be running in parallel and both write to stdout which means that the message can be interleaved. I think that fixing it would require some kind of extra locking which might add up to complexity quite a bit. Also I guess that the above setup is mainly for FPM running in docker where the reload is not usually done so might not be that big issue in reality.

However considering this and that we are also not sure about what different init systems will do and if it actually works fine and that there might be possibly some other edge cases I'd prefer to make this optional and non default. We could introduce a global config option that would set a new global variable which would be then use to decide what kind of reloading should be done. I think this can be possibly less work than trying to fix the logging issue and making all the reloading tests work (those could be done later possibly). It would be still good to have at least one test for the new reloading but that could be just something that it actually reloads.

@m6w6
Copy link
Contributor Author

m6w6 commented Aug 6, 2020

Okay. I'll close this PR, let me explain why.

I think it is a hack. A hack based on another bad hack, which makes it just a worse hack.

The first hack is selling a graceful shutdown+reexec as a graceful reload - as it is now - potentially causing downtime instead of cut-off requests. But just cramming a fork in before the reexec causes confusion and init to loose track or even kill the new instance at last when the previous FPM exits.

After updating this PR the last time, I tried to make the graceful reloading FPM play nice with f.e. systemd, but after days trying to produce something simpler than the 300+ LOC shell shim of our ops team sitting between init and the graceful php-fpm, I ended up with something maybe even worse than that.

A few tests suggested that it works as supposed, but this is unacceptable in a general manner.
To make this PR work in a reliable way, FPM would have to change its current architecture where the master process is supervised by another process, a FPMM (haha). :lolsob:

@m6w6 m6w6 closed this Aug 6, 2020
@langemeijer
Copy link

I think I follow the reasoning why this didn't make it. So the current state is that graceful reload is not going to happen.

I think https://bugs.php.net/bug.php?id=60961 should be closed.
https://www.php.net/manual/en/install.fpm.php now states

These features include:

  • advanced process management with graceful stop/start;

But that is clearly not the case, I think this should be changed.

But then I have the following idea for a solution, maybe someone here can say if this is a feasible solution:

I would like to see php-fpm changed that on receiving a SIGQUIT on the main process we do the normal shutdown sequence, except that we close the listening sockets first. This way, systemd (or any other init system) can start a new php-fpm instance, not related to the php-fpm instance before.

The current situation is that php-fpm still has the listening sockets connected, but is never answering to new requests anyway. Holding the listening ports hostage is not functional, as soon as the the sockets can be closed the better.

With this implemented then at least graceful stop is working. In my opinion SIGQUIT should do graceful and SIGKILL should be used when all threads need to be closed, but obviously the graceful stop could also be implemented on any other signal.

Obviously, this still is not ideal. On sending a signal and starting a new php-fpm process, the listening ports could still be blocked, because the signal is not processed fully yet. starting php-fpm would fail and systemd should retry. Also this would leave a small 'gap' where for some time no socket is listening.
Still I think this would be better that what we have now.

@bukka bukka mentioned this pull request Jul 17, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants