Parallelism of some nodes across processes. #821
Or, add parameters to the name, so
I wonder if this would be better in nipype, where a running node locks its working directory with, e.g. fasteners. By the time we get to code we control, nipype has already decided that it hasn't already been run. While we may be able to figure something out here, the time to lock seems to be at the point of that decision.
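As a point of reference, here is a minimal sketch of that kind of working-directory lock using fasteners (the lock-file path and the `run_node` helper are purely illustrative, not anything nipype actually defines):

```python
import fasteners

# Hypothetical: guard a node's working directory with an inter-process
# file lock before deciding whether the node needs to be (re)run.
wd = '/scratch/work/fmriprep_wf/single_subject_wf/some_node'

with fasteners.InterProcessLock(wd + '.lock'):
    run_node(wd)  # placeholder for the actual hashing/execution step
```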
Yes, that would be awesome. WDYT @satra?
Here's a quick implementation of a DirectoryBasedLock. It will only work on filesystems that explicitly emulate local filesystem atomic semantics. Since nipype can't provide guarantees with any lock, because the contexts can vary too widely, we could make it optional, and have users provide a lock that implements the following protocol:

```python
class LockDir(object):
    """Protocol for a lock protecting a node's output directory."""

    def __init__(self, outdir):
        self.outdir = outdir

    def acquire(self, *args, **kwargs):
        raise NotImplementedError

    def release(self):
        raise NotImplementedError

    # Context-manager support, so the lock can be used in a ``with`` block
    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()
```

Then a node that needs to be protected could be instantiated:

```python
node = pe.Node(Interface(), name='node', lock=DirectoryBasedLock)
```

The default
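The DirectoryBasedLock implementation referenced above isn't reproduced in this thread; as a hedged illustration (an assumption, not the code actually proposed), an atomic-`mkdir` lock satisfying that protocol might look like:

```python
import errno
import os
import time


class DirectoryBasedLock(object):
    """Hypothetical lock relying on the atomicity of ``mkdir``.

    On local POSIX filesystems, creating a directory either succeeds or
    fails atomically, so holding ``<outdir>.lock`` acts as the lock.
    """

    def __init__(self, outdir, poll_interval=1.0):
        self.outdir = outdir
        self.lockdir = outdir.rstrip(os.sep) + '.lock'
        self.poll_interval = poll_interval

    def acquire(self, timeout=None):
        start = time.time()
        while True:
            try:
                os.mkdir(self.lockdir)  # atomic: only one process wins
                return True
            except OSError as err:
                if err.errno != errno.EEXIST:
                    raise
                if timeout is not None and time.time() - start > timeout:
                    return False
                time.sleep(self.poll_interval)

    def release(self):
        os.rmdir(self.lockdir)

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.release()
```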
Come to think of it, with a protocol like that, we could set up a little TCP service that does nothing but sit and listen for requests with directory names, and let callers know if they get the lock. So depending on filesystem properties would become entirely optional.
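A bare-bones sketch of such a service (purely illustrative; the port, message format, and handler names are all made up here) could be a single-threaded server that grants each directory name to at most one client at a time:

```python
import socketserver

_held = set()  # directory names currently locked


class LockRequestHandler(socketserver.StreamRequestHandler):
    """Grant or deny a lock on the directory named in the request line."""

    def handle(self):
        line = self.rfile.readline().decode().strip()
        verb, _, path = line.partition(' ')
        if verb == 'ACQUIRE':
            if path in _held:
                self.wfile.write(b'DENIED\n')
            else:
                _held.add(path)
                self.wfile.write(b'GRANTED\n')
        elif verb == 'RELEASE':
            _held.discard(path)
            self.wfile.write(b'OK\n')


if __name__ == '__main__':
    # Single-threaded on purpose: requests are serialized, so there are no
    # races on the _held set; clients poll until they receive GRANTED.
    with socketserver.TCPServer(('0.0.0.0', 9999), LockRequestHandler) as srv:
        srv.serve_forever()
```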
Another node suffering from this:
Why would this node be run on two different nodes? That looks like a directory got deleted out from under a running process.
Yep, you are right. I came to that conclusion but missed updating this issue. This
This problem is back:
I'll try to debug this, but it may be necessary to escalate to nipype.
So, are you assuming that one process is erasing the node out from under the other? Do you have two nodes running the same BOLD files at the same time?
Yes - I'm sure I only have one instance of fmriprep for a given subject. So these nodes should not be deleted while executing.
So this doesn't seem to be a parallelism of nodes across processes issue, does it? I agree it looks like a bug. Just not related to the
Created, I think this is not an issue anymore. Let's keep an eye on Chris' PR to nipype.
AFAIK this is still an issue. Even if nipy/nipype#2278 goes in, we'll still need to create a lock that works reasonably reliably and test it.
@effigies - for multiproc, one can easily create a local lock, and portalocker, which we already have, would be fine. i'm mostly worried about locks across nodes. also, different clusters with different filesystems enable different kinds of locking mechanisms.
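For the local multiproc case, a sketch of what a portalocker-based guard might look like (the lock-file path and the `run_node` placeholder are illustrative assumptions):

```python
import portalocker

# Hypothetical: serialize access to a node's working directory among
# processes on the same machine via an advisory file lock.
with portalocker.Lock('/scratch/work/some_node.lock', timeout=600):
    run_node()  # placeholder for the node execution
```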
Yes, this issue is specifically across nodes.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
I'm getting what I believe is this issue, in 1.5.0:
What should I do?
@dmd I think this is a different but related issue. Could you open a new issue, please?
When running subjects in parallel, certain nodes such as `fsdir` will be run with the same directory as `base_dir` and introduce races. It'd be nice to have a locking mechanism for these nodes.