-
-
Notifications
You must be signed in to change notification settings - Fork 732
Use forkserver on Unix and Python 3 #687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
btw, I wonder if a similar change should be made in Dask too. |
This makes the test suite slower on Python 3 (442 s. vs. 178 s.). I'm not terribly surprised, since launching a process is now a bit more expensive (though still less than with the "spawn" method). In stable regime, this probably doesn't matter, but the test suite launches tons of child processes. |
Thoughts on how to resolve the HDFS issue on Python 2? Is forkserver still the right decision in Python 3 if we don't care about HDFS? |
Probably. There are other possible issues with fork(). Most third-party libraries (Python or C) are not fork-safe, so we may run into similar issues (in my previous job we had broken SSL connections until we stopped sharing resources). The one functional downside is that there are a couple things to be aware of when not using the "fork" method. These are the same guidelines as on Windows: https://docs.python.org/3/library/multiprocessing.html#the-spawn-and-forkserver-start-methods |
Not sure. One possibility would be to force all HDFS operations in a dedicated process, but since this would typically deal with large-ish data, the marshalling overhead may be a problem. |
@@ -29,6 +30,12 @@ | |||
logger = logging.getLogger(__name__) | |||
|
|||
|
|||
if PY3 and not sys.platform.startswith('win'): | |||
mp_context = multiprocessing.get_context('forkserver') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be nice to get this from the config file so that we can set this to fork for faster tests during local development.
from .config import config
context = config.get('multiprocessing_context', 'forkserver')
mp_context = multiprocessing.get_context(context)
There is probably a better name for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is probably a better name for this.
Hmm, do you want to suggest one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was more concerned about adding a config option when there was a significant delay to using forkserver
. Now that this is as fast as using fork I'll retract my comment.
Have you guys looked at billiard? It's effectively a backported |
That's interesting. I had never heard about it before. Is there any documentation? I couldn't find any. |
Ok, I looked at billiard, and the "forkserver" method isn't available on Python 2:
|
It turns out one can speed up the spawning by defining some modules to preload in the forkserver process. This makes the test suite as fast on 3.x as on 2.x. |
For some unknown reason, the hangs in test_hdfs have reappeared on Travis. I'm a bit baffled, since I can't reproduce them anymore at home. See https://travis-ci.org/dask/distributed/jobs/177681409 |
Ok, I manage to reproduce the hangs locally with hdfs3 master, but not with hdfs3 0.1.2. After bisecting, I find out the first buggy revision seems to be dask/hdfs3@bd76002. |
Feel free to revert the locket stuff in hdfs3. That appeared to resolve On Mon, Nov 21, 2016 at 11:24 AM, Antoine Pitrou [email protected]
|
Which version of |
3.5.0.2, installed using pip. |
Yeah, I see the same thing. Opened issue ( celery/billiard#200 ) to get more info. |
After a cursory look, it appears one would need to backport the |
@jakirkham, I'm not sure billiard is very well-tested (see e.g. celery/billiard#201) |
Ok, the latest changes to hdfs3 suppressed the hangs. |
I'm curious about the state of Python 2. |
Python 2 can still hang occasionally: https://travis-ci.org/dask/distributed/jobs/177681408 |
Any recommendations on how to resolve this? I see a few options:
On Mon, Nov 21, 2016 at 1:40 PM, Antoine Pitrou [email protected]
|
Judging by the problems on Travis, the file-based locks didn't solve the issue, right?
As long as the parent doesn't try to use hdfs3, forking isn't an issue. That's why forkserver works. So if nanny spawns a first process and then forks children from it, it should work.
And cheap technically :-) A possible solution would be to get forkserver to work on Python 2, using billiard, but billiard doesn't seem tremendously well-tested and I wonder if it's meant for outside use or just Celery's internal use. |
Correct that the current solution didn't resolve the issue. An older solution that locked just around reading
If that's the case then it may be that it's only our tests that fail and that the current solution would work in practice. |
This was the previous solution: distributed/distributed/hdfs.py Lines 34 to 40 in fd64029
For whatever reason (perhaps even unrelated to that code) we didn't run into concurrency issues. |
Well, concurrency issues tend to crop up randomly, so perhaps we were lucky that the tests didn't stress concurrency enough? I'm skeptical that any solution based on locks would actually solve the issue, since the problem lies elsewhere.
Except if people use the |
Using Client APIs directly is pretty common. I know of only a few groups that creates workers manually and even they create the worker as the first thing they do. |
Hmm, so how about skipping the parts of test_hdfs that fork on Python 2? |
I would be OK with this. |
I can confirm that skipping the tests that use |
This looks good to me. Merging soon if no comments. |
The "forkserver" method is a multiprocessing feature on Python 3. In this mode of operation, a middleman process is spawned that will fork children on behalf of the parent. This avoids inheriting the parent's system resources (file descriptors, mutexes, etc.).
This fixes the test_hdfs hangs here (on Python 3, that is).
I'm not terribly sold on
mp_context
as a name. Suggestions welcome.