
[ENH] CompCor enhancement #2878


Merged · 42 commits · Apr 29, 2019

Conversation

@rciric (Contributor) commented Feb 9, 2019

Summary

Continuing #2859

As a step toward implementing nipreps/fmriprep#1458, this PR updates the CompCor algorithm to allow greater flexibility.

List of changes proposed in this PR

  • As before, num_components encodes the number of components to return if it takes an integer value greater than 1.
  • If num_components is all, then CompCor returns all component time series.
  • Alternatively, a new option, variance_threshold, allows automatic selection of components on the basis of their capacity to explain variance in the data. It is mutually exclusive with num_components and takes values between 0 and 1.
  • If variance_threshold is set, CompCor returns the minimum number of components necessary to explain the indicated variance in the data, computed as the cumulative sum of (singular_value^2)/sum(singular_values^2).
  • For backward compatibility, if neither num_components nor variance_threshold is defined, then CompCor executes as though num_components were set to 6 (the previous default behaviour).
  • CompCor optionally writes out a metadata table, which includes for each component the source mask, the singular value, the explained variance, and the cumulative explained variance.
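The selection rule described above can be sketched as follows (`components_to_retain` is a hypothetical helper for illustration, not the actual interface code):

```python
import numpy as np

def components_to_retain(singular_values, variance_threshold):
    """Return the minimum number of components whose cumulative
    explained variance meets variance_threshold (0 < threshold <= 1)."""
    s = np.asarray(singular_values, dtype=float)
    variance_explained = s ** 2 / np.sum(s ** 2)
    cumulative = np.cumsum(variance_explained)
    # Index of the first component at which the cumulative sum
    # reaches the threshold, converted to a count
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

# Example: components explaining 50%, 30%, 15%, 5% of the variance
s = np.sqrt([0.50, 0.30, 0.15, 0.05])
print(components_to_retain(s, 0.9))  # 3 components reach 90%
```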

Acknowledgment

  • (Mandatory) I acknowledge that this contribution will be available under the Apache 2 license.

@codecov-io commented Feb 9, 2019

Codecov Report

Merging #2878 into master will increase coverage by 0.22%.
The diff coverage is 70.78%.


@@            Coverage Diff            @@
##           master   #2878      +/-   ##
=========================================
+ Coverage   67.57%   67.8%   +0.22%     
=========================================
  Files         343     344       +1     
  Lines       43645   44494     +849     
  Branches     5428    5557     +129     
=========================================
+ Hits        29494   30168     +674     
- Misses      13447   13606     +159     
- Partials      704     720      +16
Flag Coverage Δ
#smoketests 50.38% <12.35%> (-0.1%) ⬇️
#unittests 65.3% <70.78%> (+0.29%) ⬆️
Impacted Files Coverage Δ
nipype/workflows/rsfmri/fsl/resting.py 85.24% <ø> (-0.24%) ⬇️
nipype/info.py 93.93% <ø> (ø) ⬆️
nipype/algorithms/confounds.py 67.52% <70.78%> (+1.17%) ⬆️
nipype/interfaces/dipy/registration.py 93.33% <0%> (-6.67%) ⬇️
nipype/interfaces/dipy/base.py 76.28% <0%> (-0.3%) ⬇️
nipype/interfaces/dipy/stats.py 100% <0%> (ø)
nipype/interfaces/dipy/reconstruction.py 33.49% <0%> (+0.48%) ⬆️
nipype/interfaces/afni/preprocess.py 85.42% <0%> (+3.02%) ⬆️
nipype/interfaces/dipy/tracks.py 37.05% <0%> (+4.66%) ⬆️
nipype/interfaces/dipy/preprocess.py 29.11% <0%> (+5.58%) ⬆️

Continue to review full report at Codecov.

Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 44ed184...b80a3d7.

@oesteban (Contributor) left a comment

Worried about that i variable containing names. Other than that, this is shaping up!

@rciric (Contributor, Author) commented Mar 1, 2019

Interestingly, I'm now getting

ValueError: Inputs for CompCor, timeseries and mask, do not have matching spatial dimensions ((64, 64, 46) and (64, 64, 46, 1), respectively)

for one of my test masks. Probably an oddity on my end -- I didn't add the squeeze back in, since it seemed to be causing trouble earlier.

@rciric (Contributor, Author) commented Mar 1, 2019

Looks like the empty mask is now breaking cosine_filter -- I'll try and fix that next.

@effigies (Member):
@rciric Check out nibabel.squeeze_image.
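For illustration, a numpy-only sketch of the shape fix being suggested here (`squeeze_trailing` is a hypothetical helper; the real nibabel.squeeze_image operates on image objects and preserves the header):

```python
import numpy as np

def squeeze_trailing(arr):
    """Drop trailing length-1 axes, mimicking what nibabel.squeeze_image
    does at the image level."""
    shape = arr.shape
    while shape and shape[-1] == 1:
        shape = shape[:-1]
    return arr.reshape(shape)

# The (64, 64, 46, 1) mask from the error above becomes (64, 64, 46),
# matching the time series' spatial dimensions
mask = np.zeros((64, 64, 46, 1))
print(squeeze_trailing(mask).shape)  # (64, 64, 46)
```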

@mgxd (Member) commented Mar 27, 2019

hi @rciric - do you have a moment to finish this up sometime this week? We hope to release next week and would love to get this in.

@rciric (Contributor, Author) commented Mar 28, 2019

Happy to do what it takes to get this over the finish line. It looks like I'm getting errors related to diffusion processing now -- specifically an import failure from dipy.workflows.tracking for DetTrackPAMFlow. The only workflows in dipy.workflows.tracking are PFTrackingPAMFlow and LocalFiberTrackingPAMFlow. I guess it's now LocalFiberTrackingPAMFlow with sh_strategy="deterministic"?

@mgxd (Member) commented Mar 28, 2019

@rciric yeah that is unrelated to this PR - I just merged #2903 which should fix it so if you pull the latest master it should be good.

@oesteban (Contributor) commented Apr 4, 2019

@rciric what is the status of this PR? /cc @mgxd @effigies

@rciric (Contributor, Author) commented Apr 5, 2019

I tested it with my fmriprep install before the latest push, so I think we just need verification from reviewers that it works. Additional feedback is also always welcome.

@oesteban (Contributor) commented Apr 9, 2019

@rciric are there any public tests (i.e. CircleCI) where we can check how this is playing along with fMRIPrep?

@oesteban (Contributor) commented Apr 17, 2019

Okay, so it seems this PR is working out as expected (https://circleci.com/workflow-run/3c1ae1ca-27db-43e6-897e-54cc5d323dcb, the red light on ds005 is unrelated to this PR).

If @rciric promises that the correlation between a_comp_cor_xx regressors does not pertain to this PR and should be addressed in the niworkflows/fmriprep context, then we can go ahead with this. Could you double-check whether we get the same outputs if you force the CompCor interfaces in fmriprep to behave like the former implementation (i.e., leave both num_components and variance_threshold undefined)?

Do you want to have a final review @effigies, @mgxd ?

@satra, it would be great if you could take a quick look, we don't want to regret having this PR merged in the future.

@satra (Member) commented Apr 18, 2019

@rciric and @oesteban - this looks good to me!

@effigies (Member) left a comment

No serious complaints, but a few minor suggestions. Feel free to ignore if you just want to get this in.

@mgxd @satra Just a note on the #2913 situation here. Do you agree with my proposed traits constraint, or did you have something else in mind?


components_data = [line.split('\t') for line in components_file]
components_data = [re.sub('\n', '', line).split('\t')

This is just stripping the newline? If so, re is overly heavy machinery.

Suggested change
components_data = [re.sub('\n', '', line).split('\t')
components_data = [line.rstrip().split('\t')
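A quick illustrative check that the lighter form parses a TSV line identically (the sample line is made up):

```python
import re

line = 'comp_00\t0.5421\t0.1873\n'
# Both forms strip the trailing newline before splitting on tabs
assert re.sub('\n', '', line).split('\t') == line.rstrip().split('\t')
print(line.rstrip().split('\t'))  # ['comp_00', '0.5421', '0.1873']
```

One caveat: rstrip() with no argument also removes trailing tabs and spaces, so a line ending in empty fields would parse differently; line.rstrip('\n') is the exact equivalent in that case.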

assert os.path.getsize(expected_metadata_file) > 0

with open(ccresult.outputs.metadata_file, 'r') as metadata_file:
components_metadata = [re.sub('\n', '', line).split('\t')

Again, if I understand the purpose:

Suggested change
components_metadata = [re.sub('\n', '', line).split('\t')
components_metadata = [line.rstrip().split('\t')

nipype/info.py Outdated
@@ -141,7 +141,8 @@ def get_nipype_gitversion():
'numpy>=%s ; python_version >= "3.7"' % NUMPY_MIN_VERSION_37,
'python-dateutil>=%s' % DATEUTIL_MIN_VERSION,
'scipy>=%s' % SCIPY_MIN_VERSION,
'traits>=%s' % TRAITS_MIN_VERSION,
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),

This is #2913. The fix should be just to skip 5.0, and we can do that for all versions.

Suggested change
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),
'traits>=%s,!=5.0' % TRAITS_MIN_VERSION,

Member:

I'm fine with this, though we should probably only restrict 5.0 if the user is running py27 - to my knowledge everything was working fine with py3.

Member:

I don't have a strong opinion here (this seems simpler than splitting out by Python version, but yes there may be someone who really needs Py3 + traits 5.0), but I think maybe let's make that change after this is merged?

nipype/info.py Outdated
@@ -141,7 +141,8 @@ def get_nipype_gitversion():
'numpy>=%s ; python_version >= "3.7"' % NUMPY_MIN_VERSION_37,
'python-dateutil>=%s' % DATEUTIL_MIN_VERSION,
'scipy>=%s' % SCIPY_MIN_VERSION,
'traits>=%s' % TRAITS_MIN_VERSION,
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),
'traits>=%s ; python_version >= "3.0"' % TRAITS_MIN_VERSION,

Suggested change
'traits>=%s ; python_version >= "3.0"' % TRAITS_MIN_VERSION,

@@ -1,6 +1,7 @@
# emacs: -*- mode: python; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
import os
import re

The suggestions below were prompted by a suspicion this might be heavier than needed. If you accept those changes, also revert:

Suggested change
import re

raise ValueError('No components found')
components = np.ones((M.shape[0], num_components), dtype=np.float32) * np.nan
return components, basis
components = np.full((M.shape[0], num_components), np.nan)

Are you intentionally promoting this to a np.float64?
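For context, a small demonstration of the dtype behaviour in question (shapes are arbitrary):

```python
import numpy as np

a = np.ones((3, 2), dtype=np.float32) * np.nan  # stays float32
b = np.full((3, 2), np.nan)                     # fill value is a Python float -> float64
c = np.full((3, 2), np.nan, dtype=np.float32)   # explicit dtype keeps float32

print(a.dtype, b.dtype, c.dtype)  # float32 float64 float32
```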

np.hstack((metadata['cumulative_variance_explained'],
cumulative_variance_explained)))
metadata['retained'] = (
metadata['retained'] + [i < num_components for i in range(len(s))])

If these can get large and you do more than 2 or 3 loops, this will be significantly less efficient than aggregating these things into lists and concatenating/stacking them at the end. e.g.,

# Outside loop
md_mask = []
md_sv = []
md_var = []
md_cumvar = []
md_retained = []
for ...
    # here
    md_mask.append([name] * len(s))
    md_sv.append(s)
    md_var.append(variance_explained)
    md_cumvar.append(cumulative_variance_explained)
    # The lack of square brackets is intentional
    md_retained.append(i < num_components for i in range(len(s)))
# below
metadata = {
    'mask': list(itertools.chain(*md_mask)),
    'singular_value': np.hstack(md_sv),
    'variance_explained': np.hstack(md_var),
    'cumulative_variance_explained': np.hstack(md_cumvar),
    'retained': list(itertools.chain(*md_retained))}
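The aggregate-then-concatenate pattern can be exercised with toy data (mask names and values below are made up). One pitfall, possibly what broke the generator variant mentioned later in the thread: a generator expression reads free variables such as num_components when it is consumed, not when it is created, so generators appended in the loop and chained afterwards would all see the final loop iteration's values. Lists evaluate eagerly and avoid this:

```python
import itertools
import numpy as np

# Toy singular values and per-mask component counts
masks = {'csf': np.array([4.0, 2.0]), 'wm': np.array([3.0, 1.0])}
num_components_per_mask = {'csf': 1, 'wm': 2}

md_mask, md_sv, md_retained = [], [], []
for name, s in masks.items():
    num_components = num_components_per_mask[name]
    md_mask.append([name] * len(s))
    md_sv.append(s)
    # Use a list, not a generator: a generator would read num_components
    # lazily and see only the final loop value when chained below
    md_retained.append([i < num_components for i in range(len(s))])

metadata = {
    'mask': list(itertools.chain(*md_mask)),
    'singular_value': np.hstack(md_sv),
    'retained': list(itertools.chain(*md_retained)),
}
print(metadata['retained'])  # [True, False, True, True]
```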

Numpy array containing the requested set of noise components
basis: numpy array
Numpy array containing the (non-constant) filter regressors
metadata: OrderedDict{str: numpy array}

Why is this an OrderedDict as opposed to a dict?

@rciric (Contributor, Author) commented Apr 19, 2019

Hello, and thanks for the comments! I think they've cleaned up the code considerably -- I've updated the code to reflect those comments with the following exceptions:

  • I wasn't able to get md_retained working as a generator on my local test machine, so I ended up using a list.
  • I am using an OrderedDict because the call that prints the metadata to TSV currently requires its entries to be in a particular order.
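For context: on Python 3.7+, a plain dict also preserves insertion order, so OrderedDict mainly matters for the Python 2 support nipype still carried at the time. Either way, a column-ordered TSV write might look like this sketch (field names and values are made up):

```python
from collections import OrderedDict

metadata = OrderedDict([
    ('mask', ['csf', 'wm']),
    ('singular_value', [4.0, 3.0]),
    ('retained', [True, True]),
])

# Column order in the TSV follows the dict's insertion order
header = '\t'.join(metadata)
rows = ['\t'.join(str(v) for v in row) for row in zip(*metadata.values())]
tsv = '\n'.join([header] + rows)
print(tsv)
```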

@effigies (Member):
Oh, looks like tests are failing. Sorry if I've led you astray.

@rciric (Contributor, Author) commented Apr 19, 2019

No, this is totally me not updating my own unit tests.

@effigies merged commit d353f0d into nipy:master on Apr 29, 2019
@oesteban (Contributor):
yay! @rciric

@effigies mentioned this pull request on May 8, 2019