fMRIprep / nipype error while checking node hash #3009

Closed
jeroenvanbaar opened this issue Aug 28, 2019 · 16 comments


jeroenvanbaar commented Aug 28, 2019

Summary

When running fmriprep 1.5.0rc2 in a Singularity container, I get the following nipype.workflow warning, which asks me to open an issue here. I don't get this error when I run fmriprep 1.4.1rc1.

The warning/error:

190828-18:05:54,262 nipype.workflow WARNING:
	 Error while checking node hash, forcing re-run. Although this error may not prevent the workflow from running, it could indicate a major problem. Please report a new issue at https://github.com/nipy/nipype/issues adding the following information:

	Node: fmriprep_wf.single_subject_001_wf.func_preproc_ses_1_task_wordReading_run_4_wf.bold_reference_wf.enhance_and_skullstrip_bold_wf.map_brainmask
	Interface: niworkflows.interfaces.fixes.FixHeaderApplyTransforms
	Traceback:
Traceback (most recent call last):

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/base.py", line 337, in _local_hash_check
    cached, updated = self.procs[jobid].is_cached()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 297, in is_cached
    hashed_inputs, hashvalue = self._get_hashval()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 494, in _get_hashval
    self._get_inputs()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 521, in _get_inputs
    outputs = _load_resultfile(results_file)[0].outputs

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 300, in load_resultfile
    result = loadpkl(results_file)

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/utils/filemanip.py", line 686, in loadpkl
    pkl_file = pklopen(infile.name, 'rb')

  File "/usr/local/miniconda/lib/python3.7/gzip.py", line 53, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)

  File "/usr/local/miniconda/lib/python3.7/gzip.py", line 163, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')

FileNotFoundError: [Errno 2] No such file or directory: 'result_norm.pklz'

Script/Workflow details

My Singularity command:

singularity run --cleanenv -B /gpfs_home/jvanbaar/data/jvanbaar/my_project_folder:/p1,/gpfs/scratch/jvanbaar:/s1 fmriprep-1.5.0rc2.simg /p1/sourcedata /p1/derivatives participant -w /s1 --participant-label sub-001 --fs-license-file /p1/freesurfer.txt --fs-no-reconall --output-spaces MNI152NLin2009cAsym --resource-monitor --write-graph --use-syn-sdc --ignore fieldmaps --n_cpus 4 --mem_mb 32000

Happy to provide more information.

@oesteban

You seem to be reusing /gpfs/scratch/jvanbaar as your work directory. Can you test whether the following command works for you?

singularity run --cleanenv -B /gpfs_home/jvanbaar/data/jvanbaar/my_project_folder:/p1,/gpfs/scratch/jvanbaar:/s1 fmriprep-1.5.0rc2.simg /p1/sourcedata /p1/derivatives participant -w /s1/work-001 --participant-label sub-001 --fs-license-file /p1/freesurfer.txt --fs-no-reconall --output-spaces MNI152NLin2009cAsym --resource-monitor --write-graph --use-syn-sdc --ignore fieldmaps --n_cpus 4 --mem_mb 32000


jeroenvanbaar commented Aug 28, 2019

Thanks for the quick response! I'm now using /s1/sub-001 as the work directory, but I'm still getting the same error. The traceback is different, though:

190828-19:03:41,432 nipype.workflow WARNING:
	 Error while checking node hash, forcing re-run. Although this error may not prevent the workflow from running, it could indicate a major problem. Please report a new issue at https://github.com/nipy/nipype/issues adding the following information:

	Node: fmriprep_wf.single_subject_001_wf.func_preproc_ses_1_task_wordReading_run_1_wf.bold_reference_wf.gen_ref
	Interface: niworkflows.interfaces.registration.EstimateReferenceImage
	Traceback:
Traceback (most recent call last):

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/base.py", line 337, in _local_hash_check
    cached, updated = self.procs[jobid].is_cached()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 297, in is_cached
    hashed_inputs, hashvalue = self._get_hashval()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 494, in _get_hashval
    self._get_inputs()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 521, in _get_inputs
    outputs = _load_resultfile(results_file)[0].outputs

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 311, in load_resultfile
    if resolve and result.outputs:

AttributeError: 'NoneType' object has no attribute 'outputs'

@oesteban

Okay, this is not the same error. Are you trying to parallelize over subjects by running several fmriprep processes separately?


jeroenvanbaar commented Aug 29, 2019

Yes, this is when running 10 separate fmriprep processes in parallel on separate HPC nodes (10 subjects). I defined a different scratch folder for each process.

@oesteban

Can you post your submission script?

@jeroenvanbaar

Here's my Slurm submission script:

#!/bin/bash

#SBATCH -J fMRIprep

#SBATCH -c 4
#SBATCH --mem=32G
#SBATCH --account=my_account_name
#SBATCH -t 24:00:00

#SBATCH --array=1,2,4,5,6,7,8,10,11,13
#SBATCH -e fmriprep_logs/fmriprep_sub-%a.err
#SBATCH -o fmriprep_logs/fmriprep_sub-%a.out

#SBATCH --mail-type=ALL
#SBATCH --mail-user=my_email_address
 
echo "Starting fMRIprep for subject "$SLURM_ARRAY_TASK_ID

subject_dir=$(printf "sub-%03d" $SLURM_ARRAY_TASK_ID)
scratch_dir=$(printf "/s1/sub-%03d" $SLURM_ARRAY_TASK_ID)

echo "Scratch directory: "$scratch_dir

singularity run --cleanenv -B /gpfs_home/jvanbaar/data/jvanbaar/my_project_folder:/p1,/gpfs/scratch/jvanbaar:/s1 fmriprep-1.5.0rc2.simg /p1/sourcedata /p1/derivatives participant -w $scratch_dir --participant-label $subject_dir --fs-license-file /p1/freesurfer.txt --fs-no-reconall --output-spaces MNI152NLin2009cAsym --resource-monitor --write-graph --use-syn-sdc --ignore fieldmaps --n_cpus 4 --mem_mb 32000


mattcieslak commented Sep 3, 2019

I am also getting this error when running a workflow in nipype 1.2.1 that has MapNodes and uses multiprocessing. It happens about 40% of the time and occurs randomly; if I re-run the workflow multiple times, it eventually succeeds. So it's not specific to fmriprep; I think it's nipype 1.2.1.

Edit: I'm getting this error both with Singularity 3.3.0 and with Docker.
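
For illustration, the failing setups look roughly like the sketch below: a MapNode over a list of inputs, executed with the MultiProc plugin. The interface and values are placeholders, not the actual workflow that crashes.

# Illustrative sketch only: a MapNode run under the MultiProc plugin,
# the general shape of the workflows that intermittently hit this warning.
from nipype.pipeline.engine import Workflow, Node, MapNode
from nipype.interfaces.utility import Function, IdentityInterface

def double(x):
    return 2 * x

inputnode = Node(IdentityInterface(fields=['values']), name='inputnode')
inputnode.inputs.values = list(range(20))

doubler = MapNode(
    Function(input_names=['x'], output_names=['y'], function=double),
    iterfield=['x'],
    name='doubler',
)

wf = Workflow(name='mapnode_repro', base_dir='/tmp/mapnode_repro')
wf.connect(inputnode, 'values', doubler, 'x')

if __name__ == '__main__':
    wf.run(plugin='MultiProc', plugin_args={'n_procs': 4})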


oesteban commented Sep 3, 2019

@mattcieslak could you write a failing example for Nipype that we can use to debug this?


oesteban commented Sep 4, 2019

Okay, I can see this is currently happening in fMRIPrep's master - I'll take this one.


oesteban commented Sep 4, 2019

@jeroenvanbaar can you provide full logs from fMRIPrep?

@mattcieslak I've opened the issue referenced above for the case of MapNodes.


oesteban commented Sep 4, 2019

It seems you can ignore this warning; it does not appear to stop the workflow:

190828-19:03:41,432 nipype.workflow WARNING:
	 Error while checking node hash, forcing re-run. Although this error may not prevent the workflow from running, it could indicate a major problem. Please report a new issue at https://github.com/nipy/nipype/issues adding the following information:

	Node: fmriprep_wf.single_subject_001_wf.func_preproc_ses_1_task_wordReading_run_1_wf.bold_reference_wf.gen_ref
	Interface: niworkflows.interfaces.registration.EstimateReferenceImage
	Traceback:
Traceback (most recent call last):

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/base.py", line 337, in _local_hash_check
    cached, updated = self.procs[jobid].is_cached()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 297, in is_cached
    hashed_inputs, hashvalue = self._get_hashval()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 494, in _get_hashval
    self._get_inputs()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 521, in _get_inputs
    outputs = _load_resultfile(results_file)[0].outputs

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 311, in load_resultfile
    if resolve and result.outputs:

AttributeError: 'NoneType' object has no attribute 'outputs'

190828-19:03:41,434 nipype.workflow INFO:
	 [Node] Setting-up "fmriprep_wf.single_subject_001_wf.func_preproc_ses_1_task_wordReading_run_1_wf.bold_reference_wf.gen_ref" in "/s1/sub-001/fmriprep_wf/single_subject_001_wf/func_preproc_ses_1_task_wordReading_run_1_wf/bold_reference_wf/gen_ref".
190828-19:03:41,442 nipype.workflow INFO:
	 [Node] Cached "fmriprep_wf.single_subject_001_wf.func_preproc_ses_1_task_wordReading_run_1_wf.bold_reference_wf.gen_ref" - collecting precomputed outputs
190828-19:03:41,442 nipype.workflow INFO:
	 [Node] "fmriprep_wf.single_subject_001_wf.func_preproc_ses_1_task_wordReading_run_1_wf.bold_reference_wf.gen_ref" found cached.

Nonetheless, we'll try to find out why this warning is being overused.

@oesteban
Copy link
Contributor

oesteban commented Sep 10, 2019

Another one. This time it happened after the interface was run:

190910-02:31:11,300 nipype.utils ERROR:
	 No metadata was found in the pkl file. Make sure you are currently using the same Nipype version from the generated pkl.
190910-02:31:11,346 nipype.workflow ERROR:
	 Node add_dvars_header failed to run on host 1064e7e45ec5.
190910-02:31:11,394 nipype.workflow ERROR:
	 Saving crash info to /out/fmriprep/sub-01/log/20190910-021645_86b7c962-680e-4c02-a110-e8aef9529849/crash-20190910-023111-UID1001-add_dvars_header-8b4860c1-2a0c-4a70-a879-961f5dbdc4d8.txt
Traceback (most recent call last):
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/legacymultiproc.py", line 381, in _send_procs_to_workers
    self.procs[jobid].run(updatehash=updatehash)
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 488, in run
    self, report_type='postexec', is_mapnode=isinstance(self, MapNode))
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 152, in write_report
    result = node.result  # Locally cache result
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 198, in result
    op.join(self.output_dir(), 'result_%s.pklz' % self.name))[0]
  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 311, in load_resultfile
    if resolve and result.outputs:
AttributeError: 'NoneType' object has no attribute 'outputs'

190910-02:31:11,473 nipype.workflow INFO:
	 [Job 238] Completed (fmriprep_wf.single_subject_01_wf.func_preproc_task_mixedgamblestask_run_02_wf.bold_confounds_wf.add_dvars_header).

EDIT: Added the link, and checked that simply rerunning the build (i.e., without clearing the cache) worked. This might be some sort of synchronization issue.

oesteban added a commit to oesteban/nipype that referenced this issue Sep 10, 2019
Prevents nipy#3009 and nipy#3014 from happening - although this might not solve those issues,
this patch will help find their origin by making ``load_resultfile`` more strict (and
letting it raise exceptions).

The try .. except structure is moved to the only place it was being used within the Node code.

satra commented Sep 10, 2019

@oesteban - something seems to have fundamentally changed for these errors to start popping up, which makes me worry a bit.

Can we run it with the config options stop_on_first_crash = True and an increased timeout (although normally the timeout should not affect things locally)?

Also, the call to result.outputs happens after this check:

    if not results_file.exists():
        return result, aggregate, attribute_error

This also suggests a write-timing / results-availability issue, i.e., the plugin has returned control without finishing writing a results file. I didn't think that could happen for MultiProc, but now that we are using concurrent.futures, it can easily happen, especially on a system like CircleCI, where execution speeds can be limited.

How about we try out SoftFileLock from filelock (which we are using in Pydra) when saving and loading result files? If this works, it may help in a few areas.
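
For concreteness, a minimal sketch of that idea (not nipype's current implementation): take a SoftFileLock on a sidecar .lock file around every read and write of a result_*.pklz, so a reader never sees a half-written file. The wrapper names, the lock path, and the timeout are placeholders.

# Sketch of the SoftFileLock idea, not nipype's actual code: serialize
# access to result files so readers never see a partially written pickle.
from filelock import SoftFileLock
from nipype.utils.filemanip import loadpkl, savepkl

def save_result_locked(results_file, result, timeout=60):
    # The ".lock" sidecar path is an arbitrary choice for this sketch.
    with SoftFileLock(str(results_file) + '.lock', timeout=timeout):
        savepkl(str(results_file), result)

def load_result_locked(results_file, timeout=60):
    with SoftFileLock(str(results_file) + '.lock', timeout=timeout):
        return loadpkl(str(results_file))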

oesteban added a commit to oesteban/nipype that referenced this issue Sep 10, 2019
Minimize the access to the ``result`` property when writing
pre/post-execution reports.

This modification should particularly preempt nipy#3009 (comment)
@oesteban

something seems to have fundamentally changed for these errors to start popping up, which makes me worry a bit.

We're on the same page :(

Can we run it with the config options stop_on_first_crash = True and an increased timeout (although normally the timeout should not affect things locally)?

Yes, I think setting stop_on_first_crash = True is a good idea for fMRIPrep anyway.
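
For reference, something along these lines (set before the workflow is built) would do it; both keys are standard nipype execution options, and the timeout value below is just a placeholder:

# Stop the whole workflow at the first crashed node, and give result files
# more time to appear on disk before the plugin gives up on them.
from nipype import config

config.update_config({
    'execution': {
        'stop_on_first_crash': True,
        'job_finished_timeout': 30,  # seconds; placeholder value
    }
})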

This also suggests a write-timing / results-availability issue, i.e., the plugin has returned control without finishing writing a results file. I didn't think that could happen for MultiProc, but now that we are using concurrent.futures, it can easily happen, especially on a system like CircleCI, where execution speeds can be limited.

This happened with LegacyMultiProc, so there must be something else going on. But yes, I've been looking into this and I can only think of two possibilities: write-timing / results availability, or the results object getting mutated elsewhere (which seems impossible, by the way).

How about we try out SoftFileLock from filelock (which we are using in Pydra) when saving and loading result files? If this works, it may help in a few areas.

Sounds good, of course.

@effigies added this to the 1.2.3 milestone Sep 11, 2019
@oesteban

This has been addressed with #3024. I'll leave #3014 open as the changes we've been including might not have solved that one.

@oesteban

Interestingly, this error (warning) on regular nodes seems to be really short-lived:

191010-06:42:30,318 nipype.utils ERROR:
         No metadata was found in the pkl file. Make sure you are currently using the same Nipype version from the generated pkl.
191010-06:42:30,324 nipype.workflow WARNING:
         Error while checking node hash, forcing re-run. Although this error may not prevent the workflow from running, it could indicate a major problem. Please report a new issue
at https://github.com/nipy/nipype/issues adding the following information:

        Node: fmriprep_wf.single_subject_NDARINV00CY2MDM_wf.func_preproc_ses_baselineYear1Arm1_task_rest_run_04_wf.bold_reg_wf.bbreg_wf.bbregister
        Interface: niworkflows.interfaces.freesurfer.PatchedBBRegisterRPT
        Traceback:
Traceback (most recent call last):

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/plugins/base.py", line 337, in _local_hash_check
    cached, updated = self.procs[jobid].is_cached()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 297, in is_cached
    hashed_inputs, hashvalue = self._get_hashval()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 494, in _get_hashval
    self._get_inputs()

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/nodes.py", line 521, in _get_inputs
    outputs = _load_resultfile(results_file)[0].outputs

  File "/usr/local/miniconda/lib/python3.7/site-packages/nipype/pipeline/engine/utils.py", line 311, in load_resultfile
    if resolve and result.outputs:

AttributeError: 'NoneType' object has no attribute 'outputs'

191010-06:42:30,325 nipype.workflow INFO:
         [Node] Setting-up "fmriprep_wf.single_subject_NDARINV00CY2MDM_wf.func_preproc_ses_baselineYear1Arm1_task_rest_run_04_wf.bold_reg_wf.bbreg_wf.bbregister" in "/scratch/03763/oesteban/fmriprep-work/sub-NDARINV00CY2MDM/fmriprep_wf/single_subject_NDARINV00CY2MDM_wf/func_preproc_ses_baselineYear1Arm1_task_rest_run_04_wf/bold_reg_wf/bbreg_wf/bbregister".
191010-06:42:30,353 nipype.workflow INFO:
         [Job 615] Cached (fmriprep_wf.single_subject_NDARINV00CY2MDM_wf.func_preproc_ses_baselineYear1Arm1_task_rest_run_03_wf.bold_reg_wf.bbreg_wf.mri_coreg).
191010-06:42:30,378 nipype.workflow INFO:
         [Node] Cached "fmriprep_wf.single_subject_NDARINV00CY2MDM_wf.func_preproc_ses_baselineYear1Arm1_task_rest_run_04_wf.bold_reg_wf.bbreg_wf.bbregister" - collecting precomputed outputs
191010-06:42:30,378 nipype.workflow INFO:
         [Node] "fmriprep_wf.single_subject_NDARINV00CY2MDM_wf.func_preproc_ses_baselineYear1Arm1_task_rest_run_04_wf.bold_reg_wf.bbreg_wf.bbregister" found cached.

When the node is checked by the execution plugin, the warning is issued while trying to read the outputs of a prior node feeding into the bbregister node. A few seconds later, the result files for those inputs are checked again and the read succeeds: the node is found cached.
