fMRIprep / nipype error while checking node hash #3009
Comments
You seem to be reusing the work directory.
Thanks for the quick response! I'm now using /s1/sub-001 as the work directory, but am still getting the same error. Different traceback though:
Okay, this is not the same error. Are you trying to parallelize several subjects by running several fmriprep processes separately?
Yes, this happens when running 10 separate fmriprep processes in parallel on separate HPC nodes (10 subjects). I defined a different scratch folder for each process.
Can you post your submission script?
Here's my SLURM submission script:
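As a rough, hypothetical sketch of the setup described here - one fmriprep process per subject, each with its own scratch/work directory - the submission could look like the following. All paths, subject labels, and SLURM options are assumptions, not the actual script:

```python
# Hypothetical sketch only: one SLURM job per subject, each fmriprep process
# pointed at its own work directory so no two processes share scratch space.
# All paths, labels, and sbatch options are assumptions, not the real script.
import subprocess

BIDS_DIR = "/data/bids"         # assumed BIDS input directory
OUT_DIR = "/data/derivatives"   # assumed output directory
SCRATCH = "/s1"                 # per-subject work directories live here

subjects = ["sub-%03d" % i for i in range(1, 11)]  # the 10 subjects mentioned above

for sub in subjects:
    fmriprep_cmd = (
        "fmriprep {bids} {out} participant "
        "--participant-label {label} -w {scratch}/{label}"
    ).format(bids=BIDS_DIR, out=OUT_DIR, label=sub, scratch=SCRATCH)
    # submit one job per subject; --wrap runs the command string on the allocated node
    subprocess.run(
        ["sbatch", "--job-name=fmriprep_" + sub, "--time=24:00:00",
         "--cpus-per-task=8", "--wrap", fmriprep_cmd],
        check=True,
    )
```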
I am also getting this error whenever running a workflow in 1.2.1 that has MapNodes and is using multiprocessing. It happens about 40% of the time I run a workflow and occurs randomly. If I re-run the workflow multiple times it will eventually succeed. So it's not specific to fmriprep; I think it's nipype 1.2.1. Edit: I'm getting this error both when using Singularity 3.3.0 and Docker.
@mattcieslak could you write a failing example for Nipype that we can use to debug this?
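A minimal sketch of the kind of failing example being requested - a Workflow containing a MapNode run under the MultiProc plugin, the combination reported above - might look like this. The functions, node names, and base_dir are placeholders, not the actual workflow that fails:

```python
# Hypothetical minimal reproduction sketch: MapNode + MultiProc, rerun several
# times since the warning is reported to appear only intermittently.
from nipype.interfaces.utility import Function
from nipype.pipeline.engine import MapNode, Node, Workflow


def make_list(n):
    return list(range(n))


def add_one(x):
    return x + 1


source = Node(Function(input_names=["n"], output_names=["out"],
                       function=make_list), name="source")
source.inputs.n = 20

mapper = MapNode(Function(input_names=["x"], output_names=["out"],
                          function=add_one),
                 iterfield=["x"], name="mapper")

wf = Workflow(name="mapnode_repro", base_dir="/tmp/mapnode_work")  # assumed scratch path
wf.connect(source, "out", mapper, "x")

if __name__ == "__main__":
    wf.run(plugin="MultiProc", plugin_args={"n_procs": 4})
```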
Okay, I can see this is currently happening in fMRIPrep's master - I'll take this one.
@jeroenvanbaar can you provide full logs from fMRIPrep? @mattcieslak I've opened the issue referenced above for the case of MapNodes.
It seems you can ignore this warning, as it does not appear to stop the workflow:
Nonetheless, we'll try to find out why this warning is being issued so often.
Another one. This time it happened after the interface was run:
EDIT: Added the link, and checked that just rerunning the build (i.e., without clearing the cache) did work out. This might be some sort of synchronization issue.
Prevents nipy#3009 and nipy#3014 from happening - although this might not solve those issues, this patch will help find their origin by making ``load_resultfile`` more strict (and letting it raise exceptions). The try .. except structure is moved to the only place it was being used within the Node code.
@oesteban - something seems to have fundamentally changed for these errors to start popping up, which makes me worry a bit. Can we run it with the config options (stop_on_first_crash = True, and also an increased timeout - although normally that should not affect things locally)? Also, the call to results.outputs happens after this check:
This also suggests a write-timing/results-availability issue, i.e. the plugin has returned control without finishing writing a results file. I didn't think that could happen for MultiProc, but now that we are using concurrent.futures with a future, this can easily happen, especially on a system like Circle, where execution speeds can be limited. How about we try out
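For reference, a sketch of how those two settings can be applied through nipype's execution config (the option names are nipype's; the particular values below are assumptions picked for debugging):

```python
# Debugging settings discussed above; the values chosen here are assumptions.
from nipype import config

config.set("execution", "stop_on_first_crash", "true")   # crash immediately instead of continuing
config.set("execution", "job_finished_timeout", "60.0")  # wait longer for result files to appear
```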
Minimize the access to the ``result`` property when writing pre/post-execution reports. This modification should particularly preempt nipy#3009 (comment)
We're on the same page :(
Yes, I think setting stop_on_first_crash = true is a good idea for fMRIPrep anyway.
This happened with
Sounds good, of course.
Interestingly, this error (warning) on regular nodes seems to be really short-lived:
When the node is checked by the execution plugin, we see the warning is issued when trying to read the outputs of a prior node feeding into the bbregister node. A few seconds later, the result files for the inputs are checked again and now it works - the node is found cached.
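That behavior is consistent with the write-timing explanation above: the plugin looks for the upstream node's result file a moment before the producing process has finished writing it. A rough illustration of the idea (not nipype's actual code; the file-name pattern and timeout are assumptions) is a short retry loop around the existence check:

```python
# Illustrative sketch only: retry reading a node's result file for a while
# before declaring the node un-cached. Path pattern and timeout are assumed.
import os
import time


def wait_for_resultfile(node_dir, node_name, timeout=30.0, poll=0.5):
    """Return the result-file path once it exists, or None after `timeout` seconds."""
    results_file = os.path.join(node_dir, "result_%s.pklz" % node_name)
    deadline = time.time() + timeout
    while time.time() < deadline:
        if os.path.exists(results_file):
            return results_file
        time.sleep(poll)  # the producing process may still be flushing the file
    return None
```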
Summary
When running fmriprep 1.5.0rc2 in a Singularity container, I get the following nipype.workflow warning, which asked me to open an issue here. I don't get this error when I run fmriprep 1.4.1rc1.
The warning/error:
Script/Workflow details
My Singularity command:
Happy to provide more information.