
When node changed and workflow rerun, child nodes of changed node failed to rerun #2951


Open
mick-d opened this issue Jun 28, 2019 · 7 comments

Comments

@mick-d
Contributor

mick-d commented Jun 28, 2019

Summary

Usually when I am debugging a workflow, I change one of the nodes, and when rerunning the workflow all child nodes of the changed node are rerun, while parent and independent nodes just reuse their previously cached results. However, this does not happen in one of my workflows, and I was wondering what the criteria are for this behavior to kick in.

Actual behavior

1st run: Node 1 (results cached) --> Node 2 (results cached) --> Node 3 (results cached)

2nd run with Node 2 modified: Node 1 (use previous cached results) --> Node 2 (creating new results to be cached) --> Node 3 (keep previous cached results although input changed)

Expected behavior

1st run: Node 1 (results cached) --> Node 2 (results cached) --> Node 3 (results cached)

2nd run with Node 2 modified: Node 1 (use previous cached results) --> Node 2 (creating new results to be cached) --> Node 3 (creating new results to be cached)

How to replicate the behavior

I can put more details here, but first it'd be great to know whether this expectation is correct and what the requirements are for it to work. I believe the issue may come from the node that fails to rerun, fsl.CopyGeom, which creates a local copy of the file onto which it copies the header information. It'd be great to have more information on how the "Node rerun" decision is made.

Script/Workflow details

    rerunissue.connect([
        (Node1, CopyGeom, [("out_file", "in_file")]),
        (Node2, CopyGeom, [("out_file", "dest_file")]),
        (CopyGeom, Node4, [("out_file", "in_file")]),
    ])

When Node2 is modified and the workflow is rerun, CopyGeom is not rerun (and subsequent nodes such as Node4 are not rerun either).
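A minimal sketch (assuming the workflow object is the rerunissue one above) of enabling debug logging so nipype prints more detail about why it reuses or reruns each node:

# Sketch only: turn on debug-level logging before running the workflow so the
# execution log shows more detail behind each "rerun or reuse" decision.
from nipype import config, logging

config.enable_debug_mode()      # switch nipype to verbose debug settings
logging.update_logging(config)  # apply the updated settings to the loggers

rerunissue.run()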

Platform details:

{'commit_hash': '%h',
 'commit_source': 'archive substitution',
 'networkx_version': '2.3',
 'nibabel_version': '2.4.1',
 'nipype_version': '1.2.0',
 'numpy_version': '1.16.4',
 'pkg_path': '/home/<my_username>/pyutils/miniconda3/envs/mri36/lib/python3.6/site-packages/nipype',
 'scipy_version': '1.2.1',
 'sys_executable': '/home/<my_username>/pyutils/miniconda3/envs/mri36/bin/python',
 'sys_platform': 'linux',
 'sys_version': '3.6.8 |Anaconda, Inc.| (default, Dec 30 2018, 01:22:34) \n'
                '[GCC 7.3.0]',
 'traits_version': '5.1.1'}

Execution environment


  • My python environment outside container
@satra
Member

satra commented Jun 28, 2019

@mick-d - are you positive that Node 2 is producing new outputs after the change? It may still be producing the same output (content). If you are using a content hash instead of timestamps, that can help.

but more details would help.
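A minimal sketch of that setting (workflow name and base_dir are hypothetical; hashing defaults to timestamps):

# Sketch only: hash node inputs by file content rather than timestamps, so a
# node is considered up to date only when the actual data are unchanged.
from nipype import Workflow

wf = Workflow(name='rerunissue', base_dir='/tmp/rerunissue')  # hypothetical names
wf.config['execution'] = {'hash_method': 'content'}  # default is 'timestamp'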

@mick-d
Contributor Author

mick-d commented Jun 28, 2019

Hi Satra, yes, I am 100% positive: both the output content and the timestamp of Node 2 changed in the workflow base_dir, but Node 3 (fsl.CopyGeom) still would not rerun (the timestamp of Node 3's output was older than the output of Node 2).

@satra
Member

satra commented Jun 30, 2019

@mick-d - in that case would it be possible to create a small example that replicates the issue?

@mick-d
Contributor Author

mick-d commented Jun 30, 2019

@satra Yes, the previous nodes are actually part of a workflow, so I'll create a simple and clear example from scratch to better illustrate it.

@oesteban
Contributor

oesteban commented Aug 7, 2019

Hey @mick-d, I've just merged #2971 which potentially affects this particular problem.

Can you check whether this has been fixed? (Please remember to run both instances of the workflow with use_relative_paths switched on.)
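For reference, a minimal sketch (workflow name and base_dir are hypothetical) of switching that option on; as noted above, both runs of the workflow should use it:

# Sketch only: store paths in cached results relative to the node's working
# directory, so the working directory can move without breaking the cache.
from nipype import Workflow

wf = Workflow(name='rerunissue', base_dir='/tmp/rerunissue')  # hypothetical names
wf.config['execution'] = {'use_relative_paths': True}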

@axiezai
Contributor

axiezai commented May 28, 2020

Hi, just to add onto this:

I created a workflow with the following MapNodes for a BIDS dataset with 1 subject and 2 sessions:

# other FSL nodes...

# FSL ApplyWarp interface
ACPC_warp = MapNode(fsl.preprocess.ApplyWarp(), name='apply_warp', iterfield=["in_file", "premat"])
ACPC_warp.inputs.out_file = 'acpc_t1.nii.gz'
ACPC_warp.inputs.relwarp = True
ACPC_warp.inputs.interp = "spline"
ACPC_warp.inputs.ref_file = MNI_template

# gunzip:
gz2nii = MapNode(gunzip_nii(), name='gunzip', iterfield="in_file")
gz2nii.inputs.out_file = 'acpc_t1.nii'

On the first run, I did not have the gunzip MapNode and I set ACPC_warp.inputs.out_file = 'acpc_t1.nii', which produced a file with the .nii extension. I then changed the input to .nii.gz and re-ran my workflow after deleting the previous output directory.

For my workflow I provided the following:

workflow = Workflow(name='mni_reconall', base_dir = os.path.join(Path(BIDS_DIR).parent, 'derivatives'))
workflow.connect(
        [
            (subject_source, select_files, [("subject_id", "subject_id")]),
            (select_files, reduceFOV, [("anat", "in_file")]),
            (reduceFOV, xfminverse, [("out_transform", "in_file")]),
            (reduceFOV, flirt, [("out_roi", "in_file")]),
            (xfminverse, concatxfm, [("out_file", "in_file")]),
            (flirt, concatxfm, [("out_matrix_file", "in_file2")]),
            (concatxfm, alignxfm, [("out_file", "in_file")]),
            (select_files, ACPC_warp, [("anat", "in_file")]),
            (alignxfm, ACPC_warp, [("out_file", "premat")]),
            (ACPC_warp, gz2nii, [("out_file", "in_file")]),
            (gz2nii, reconall, [("out_file", "T1_files")]),
            (select_files, get_fs_id, [("anat", "anat_files")]),
            (get_fs_id, reconall, [("fs_id_list", "subject_id")])
        ]
    )
workflow.config['execution'] = {'use_relative_paths': 'True', 'hash_method': 'content'}
workflow.run('MultiProc', plugin_args = {'n_procs': 2})

On the second run, where I expect a .nii.gz output, the workflow still uses the old cached results:

200528-10:45:03,70 nipype.workflow INFO:
	 [Node] Cached "_apply_warp0" - collecting precomputed outputs
200528-10:45:03,70 nipype.workflow INFO:
	 [Node] "_apply_warp0" found cached.

And sure enough, the gunzip MapNode doesn't find a .nii.gz file, because the .nii file from the previous run is used:

Standard error:
gzip: acpc_t1.nii.gz: No such file or directory
Return code: 1

Singularity> ls /dwi_preproc/derivatives/mni_reconall/_subject_id_01/apply_warp/mapflow/_apply_warp1/
_0x0573287f3994c86d318ed310ffb09564.json  **acpc_t1.nii**  command.txt  _inputs.pklz  _node.pklz  _report  result__apply_warp1.pklz

Platform details:

200528-08:35:53,637 nipype.utils INFO:
         Running nipype version 1.5.0-rc1 (latest: 1.4.2)
{'commit_hash': '%h',
 'commit_source': 'archive substitution',
 'networkx_version': '2.4',
 'nibabel_version': '3.1.0',
 'nipype_version': '1.5.0-rc1',
 'numpy_version': '1.18.4',
 'pkg_path': '/opt/miniconda-latest/envs/tracts/lib/python3.7/site-packages/nipype',
 'scipy_version': '1.4.1',
 'sys_executable': '/opt/miniconda-latest/envs/tracts/bin/python',
 'sys_platform': 'linux',
 'sys_version': '3.7.3 | packaged by conda-forge | (default, Dec  6 2019, '
                '08:54:18) \n'
                '[GCC 7.3.0]',
 'traits_version': '6.0.0'}

Let me know how else I can help. Thank you :)

@axiezai
Contributor

axiezai commented May 28, 2020

After a more detailed look, specifying the output type on the FSL interface (ACPC_warp.inputs.output_type = 'NIFTI' or 'NIFTI_GZ') solves the problem.

However, in my previous runs where I did not specify the output type, the inputs to the node did change from .nii to .nii.gz and the workflow still used the cached results instead of recomputing, despite use_relative_paths = True. Hopefully this helps...
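A minimal sketch of that fix (same node and filenames as in my earlier snippet): make the FSL output type explicit so it agrees with the extension of out_file:

# Sketch of the workaround: pin the FSL output type so the file ApplyWarp
# produces really carries the .nii.gz extension that out_file (and the
# downstream gunzip node) expects.
from nipype import MapNode
from nipype.interfaces import fsl

ACPC_warp = MapNode(fsl.preprocess.ApplyWarp(), name='apply_warp',
                    iterfield=['in_file', 'premat'])
ACPC_warp.inputs.out_file = 'acpc_t1.nii.gz'
ACPC_warp.inputs.output_type = 'NIFTI_GZ'  # or 'NIFTI' for an uncompressed .nii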
