Skip to content

GH-16889: stream_select() timeout useless for pipes on Windows #16917

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Nov 24, 2024

Pipes are blocking on Windows, but php_select() always returns them as ready for read/write. This renders the stream_select() timeout useless, what can cause a following read to block for a very long time.

While there is no general fix (and least not within reach for a stable version), we can at least cater to the important case of read pipes by peeking the pipe to check whether data is available. If there is none, we do not add the handle to the read set.

Pipes are blocking on Windows, but `php_select()` always returns them
as ready for read/write.  This renders the `stream_select()` timeout
useless, what can cause a following read to block for a very long time.

While there is no general fix (and least not within reach for a stable
version), we can at least cater to the important case of read pipes by
peeking the pipe to check whether data is available.  If there is none,
we do not add the handle to the read set.
@cmb69 cmb69 linked an issue Nov 24, 2024 that may be closed by this pull request
The test assumes that at least the stdin and stdout pipes are always
selected as readable, and that the select call will not change their
order.  We're being more defensive now.
@cmb69
Copy link
Member Author

cmb69 commented Nov 24, 2024

stream_select($read, $write, $except, null, 0);

This now waits indefinitely on Windows, what it is supposed to do according to man 2 select, since we're passing timeout=NULL in this case. Don't know why it apparently doesn't do that on other platforms.

PS: ah, because reading files won't block there.

PPS: it's not supposed to block indefinitely; we must not call PeekNamedPipe() on file handles, though.

* bug60602.phpt has the same issue as bug64770.phpt (copy&paste)
* bug49936_win32.phpt should suppress the warnings[1], or be dropped
  altogether[2]

[1] <php@c884d37>
[2] <php@2c6b85f>
`FILE_TYPE_PIPE` will also be returned for sockets, but these have been
filtered out already.
@cmb69
Copy link
Member Author

cmb69 commented Nov 24, 2024

Given that I had to patch a couple of test cases, it might make sense to only apply this to master. However, in https://bugs.php.net/bug.php?id=51800#1561307300, @nicolas-grekas wrote:

A working stream_select() on a pipe would allow writing libs that don't hit the issue. The current ways are hideous (piping to a temporary file...)

@cmb69 cmb69 marked this pull request as ready for review November 24, 2024 19:35
@cmb69 cmb69 requested a review from bukka as a code owner November 24, 2024 19:35
@cmb69 cmb69 closed this in b614b4a Dec 7, 2024
@cmb69 cmb69 deleted the cmb/gh16889 branch December 7, 2024 15:44
@bukka
Copy link
Member

bukka commented Dec 7, 2024

@cmb69 I thought about this a bit and I'm not sure if this is an ideal solution. The problem is that for cases when the blocking is just very short (way below the timeout), the read readiness will just disappear which might break some cases that would be otherwise valid. If I understand it correctly, currently it would return immediately withand read would block for very short time. But after this change it will return also immediately but without the handle in the set so users might think it's not ready and just not wait any longer. Am I right?

Could it be maybe done is some way of looped checking of the handle. Basically some sort of polling and checking the readiness until the timeout expires?

@cmb69
Copy link
Member Author

cmb69 commented Dec 8, 2024

But after this change it will return also immediately but without the handle in the set so users might think it's not ready and just not wait any longer.

Well, that would be a bug in the userland code, but I get your point. However, waiting till the timeout expires appears to be a bad idea, since the timeout is supposed to cater to some worst case scenarios. For instance, it is 60sec by default for run-tests.php. So maybe the best solution to keep compatibility with old code, but still support read pipes handled quickly, would be to change the behavior only if there are nothing but read handles in the set (plus possibly sockets, which should work well on Windows). I.e.:

 win32/select.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/win32/select.c b/win32/select.c
index 76d5ef1ee5..376aaf6a61 100644
--- a/win32/select.c
+++ b/win32/select.c
@@ -34,6 +34,7 @@ PHPAPI int php_select(php_socket_t max_fd, fd_set *rfds, fd_set *wfds, fd_set *e
 	HANDLE handles[MAXIMUM_WAIT_OBJECTS];
 	int handle_slot_to_fd[MAXIMUM_WAIT_OBJECTS];
 	int n_handles = 0, i;
+	int num_read_pipes = 0;
 	fd_set sock_read, sock_write, sock_except;
 	fd_set aread, awrite, aexcept;
 	int sock_max_fd = -1;
@@ -78,6 +79,9 @@ PHPAPI int php_select(php_socket_t max_fd, fd_set *rfds, fd_set *wfds, fd_set *e
 					sock_max_fd = i;
 				}
 			} else {
+				if (SAFE_FD_ISSET(i, rfds) && GetFileType(handles[n_handles]) == FILE_TYPE_PIPE) {
+					num_read_pipes++;
+				}
 				handle_slot_to_fd[n_handles] = i;
 				n_handles++;
 			}
@@ -136,7 +140,7 @@ PHPAPI int php_select(php_socket_t max_fd, fd_set *rfds, fd_set *wfds, fd_set *e
 					if (WAIT_OBJECT_0 == WaitForSingleObject(handles[i], 0)) {
 						if (SAFE_FD_ISSET(handle_slot_to_fd[i], rfds)) {
 							DWORD avail_read = 0;
-							if (GetFileType(handles[i]) != FILE_TYPE_PIPE
+							if (num_read_pipes < n_handles
 								|| !PeekNamedPipe(handles[i], NULL, 0, NULL, &avail_read, NULL)
 								|| avail_read > 0
 							) {
@@ -156,6 +160,9 @@ PHPAPI int php_select(php_socket_t max_fd, fd_set *rfds, fd_set *wfds, fd_set *e
 				}
 			}
 		}
+		if (retcode == 0 && num_read_pipes == n_handles && sock_max_fd < 0) {
+			usleep(100);
+		}
 	} while (retcode == 0 && (ms_total == INFINITE || GetTickCount64() < limit));
 
 	if (rfds) {

@bukka
Copy link
Member

bukka commented Dec 15, 2024

@cmb69 I was looking through the code and I'm not exactly sure what's the actual point to even call WaitForSingleObject on pipe if it returns as ready. Or can it actually return anything else?

What I was thinking was to do those short interval waits and peek the pipe until there is at least one ready. This should happen only if pipe handlers is being check. Pretty much what is suggested in your diff except I don't understand why you removed GetFileType(handles[i]) != FILE_TYPE_PIPE second check. I thought that handles are for files and pipes so if that's the case you don't want to probably call PeekNamedPipe on file. Or do you just relay on the fact that it errors (returns non zero in such case). If so, is that actually safe and couldn't this mix us some regular pipe errors (in such case it's not probably ready for read) with the file handle check? I might be missing some things though as I'm not really an expert on Win API so this is more a check if the behavior is correct...

@cmb69
Copy link
Member Author

cmb69 commented Dec 15, 2024

The calls to WaitForMultipleObjects() and WaitForSingleObject() are useless for files (in the broader sense) which are not opened for asynchronous IO (aka. "overlapped IO"). php-src only ever opens for synchronous IO; there might be extensions, though, which use asynchronous IO, so I didn't bother messing with these calls (they should return immediately for php-src files anyway).

That additional patch first checks whether all handles are pipes (FILE_TYPE_PIPE may also refer to a socket, but these have already been filtered out). Only if all handles are pipes (num_read_pipes == n_handles), the handles are peeked, but we already know that that all are pipes, so no need to call GetFileType() again.

Note that with or without the additional patch this is merely a work-around. The clean solution would be using asynchronous IO, but that is not even supported for anonymous pipes on Windows …

@bukka
Copy link
Member

bukka commented Dec 15, 2024

Shouldn't the pipes always peek so the rset contains the ones that are ready? Why would it now peek only if all handles are pipe? It makes sense to wait only if all handles are pipes but peek should be done nevertheless IMHO. Unless I'm misunderstanding something.

@cmb69
Copy link
Member Author

cmb69 commented Dec 15, 2024

Before this PR had been merged, passing anything but sockets (and waitable objects) to stream_select() (regardless of which set) was completely futile: the function would return immediately reporting everything as ready. After this PR had been merged, passing only pipes to rset almost worked as on POSIX (it didn't wait, so CPU would run until at least one pipe was ready). You commented:

But after this change it will return also immediately but without the handle in the set so users might think it's not ready and just not wait any longer.

This is a valid concern. A pipe in the rset might not be ready, but pipes in the wset (or eset) would reported as being ready.

The additional patch caters to that: it is fully backwards compatible in case pipes (or files etc.) are passed to rset and wset (or eset): stream_select() returns immediately. But if only pipes are passed to rset, the function now behaves as specified by POSIX (well, roughly): it returns only if a pipe is ready, and if not it waits for some time (to unburden the CPU).

Now, we could sleep for some time (how long?) and check whether a pipe in rset is ready; but we cannot know whether a pipe in wset (or eset) is ready, so would need to report these as ready anyway. What would that improve? Nothing. Users might have to wait for a pipe in wset being reported as ready, but writing to this pipe could still block. As I said, there is no general solution regarding portability; but the patch at least fixes an apparently not uncommon case, where only pipes are passed to rset, e.g. used in run-tests.php on stdout and stderr, and I figure that there may be code which spawns several children, and now can properly select on their output.

@bukka
Copy link
Member

bukka commented Dec 15, 2024

So my main worry is about this scenario:

$rset = [$rpipe];
$wset = null;
$eset = null;
if (stream_select($rset, $wset, $eset, 2) > 0) {
  echo $fread($rpipe, 100);
}

And consider that $rpipe is not ready for 100ms.

As I understand it, currently it returns immediately and read from $rpipe (fread blocks for 100ms). After your change, it also returns immediately but no fread because it won't get anything from peek. Is that right? If so, the problem is that users expect waiting for max 2s and they might consider no point to call fread because it suggests it already waited 2s so suddenly they will consider this like a breaking change because it will start timing out (even though there is no wait and the time out is immediate but it won't read anything).

@cmb69
Copy link
Member Author

cmb69 commented Dec 16, 2024

After your change, it also returns immediately but no fread because it won't get anything from peek. Is that right?

No. Since peek wouldn't tell that data is available, no fd would be selected, so php_select() would loop (with this PR busily; with the additional patch usleep(100)ing between loop iterations). Only when at least one fd is available, or the timeout is hit, php_select() will return.

To wit:

<?php

$desc = [
    ["null"],
    ["pipe", "w"],
    ["null"]
];
$proc = proc_open("php -r \"sleep(1); echo 'hello';\"", $desc, $pipes);
$rpipe = $pipes[1];
$rset = [$rpipe];
$wset = null;
$eset = null;
if (stream_select($rset, $wset, $eset, 2) > 0) {
    echo fread($rpipe, 100);
}

outputs after ~1sec:

hello
multiplexing read pipes example
<?php

$desc = [
    ["null"],
    ["pipe", "w"],
    ["null"]
];
$read_pipes = [];
for ($i = 0; $i < 10; $i++) {
    $procs[] = proc_open(["php", "-r", "sleep(1 * (10 - $i)); echo 'hello$i';"], $desc, $pipes);
    $read_pipes[] = $pipes[1];
}
$rset = [...$read_pipes];
$wset = null;
$eset = null;
while (!empty($read_pipes) && stream_select($rset, $wset, $eset, 2) > 0) {
    foreach ($rset as $pipe) {
        echo fread($pipe, 6), "\n";
        unset($read_pipes[array_search($pipe, $read_pipes)]);
    }
    $rset = [...$read_pipes];
}

This has now the same behavior as on Linux: test takes about 10 secs, and prints hello9, hello8, …, hello0 (one line every second). Prior to this patch, it would "do nothing" for 10 secs, and then print hello0, hello1, …, hello9 in one go.

@bukka
Copy link
Member

bukka commented Dec 16, 2024

Ah I didn't realise that there is a busy loop . There should definitely be some delay as suggested in the follow so if you could create a PR for that, it would be great!

cmb69 added a commit to cmb69/php-src that referenced this pull request Dec 16, 2024
The original patch[1] cared only about pipe handles in the rset, but
would be problematic if there are other handles (e.g. files in the
rset, or pipes/files in the other sets), because `php_select()` would
return immediately, reporting all non read-pipe handles as ready, but
possibly never reporting read-pipe handles.

We fix this by applying different logic for the case where only pipe
handles are supplied in the rset, but no handles in the wset or eset.
In this case `php_select()` only returns when actually one of the
handles is ready, or when the timeout expires.  To avoid busy looping
in this case, we sleep for a short amount of time.  This matches POSIX
behavior.

In all other cases, `php_select()` behaves as before (i.e. prior to the
original fix), that is it returns immediately, reporting all handles as
ready.

We also add a test case that demonstrates multiplexing the output of a
couple of child processes.

See also the discussion on <php#16917>.

[1] <php@b614b4a>
@cmb69 cmb69 mentioned this pull request Dec 16, 2024
@cmb69
Copy link
Member Author

cmb69 commented Dec 16, 2024

See #17174.

cmb69 added a commit that referenced this pull request Dec 16, 2024
The original patch[1] cared only about pipe handles in the rset, but
would be problematic if there are other handles (e.g. files in the
rset, or pipes/files in the other sets), because `php_select()` would
return immediately, reporting all non read-pipe handles as ready, but
possibly never reporting read-pipe handles.

We fix this by applying different logic for the case where only pipe
handles are supplied in the rset, but no handles in the wset or eset.
In this case `php_select()` only returns when actually one of the
handles is ready, or when the timeout expires.  To avoid busy looping
in this case, we sleep for a short amount of time.  This matches POSIX
behavior.

In all other cases, `php_select()` behaves as before (i.e. prior to the
original fix), that is it returns immediately, reporting all handles as
ready.

We also add a test case that demonstrates multiplexing the output of a
couple of child processes.

See also the discussion on <#16917>.

[1] <b614b4a>

Closes GH-17174.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

stream_select() timeout useless for pipes on Windows
2 participants