[ENH] CompCor enhancement #2878
Conversation
Codecov Report
@@ Coverage Diff @@
## master #2878 +/- ##
=========================================
+ Coverage 67.57% 67.8% +0.22%
=========================================
Files 343 344 +1
Lines 43645 44494 +849
Branches 5428 5557 +129
=========================================
+ Hits 29494 30168 +674
- Misses 13447 13606 +159
- Partials 704 720 +16
Continue to review full report at Codecov.
Worried about that i variable containing names. Other than that this is shaping up!
Interestingly, I'm now getting ValueError: Inputs for CompCor, timeseries and mask, do not have matching spatial dimensions ((64, 64, 46) and (64, 64, 46, 1), respectively) for one of my test masks. Probably an oddity on my end -- I didn't add the
Looks like the empty mask is now breaking
@rciric Check out
hi @rciric - do you have a moment to finish this up sometime this week? We hope to release next week and would love to get this in.
Happy to do what it takes to get this over the finish line. It looks like I'm getting errors related to diffusion processing now -- specifically an import failure from
I tested it with my
@rciric are there any public tests (i.e. CircleCI) where we can check how this is playing along with fMRIPrep?
Okay, so it seems this PR is working out as expected (https://circleci.com/workflow-run/3c1ae1ca-27db-43e6-897e-54cc5d323dcb; the red light on ds005 is unrelated to this PR). If @rciric promises that the correlation between
Do you want to have a final review @effigies, @mgxd? @satra, it would be great if you could take a quick look; we don't want to regret having this PR merged in the future.
components_data = [line.split('\t') for line in components_file]
components_data = [re.sub('\n', '', line).split('\t')
This is just stripping the newline? If so, re is overly heavy machinery.
components_data = [re.sub('\n', '', line).split('\t')
components_data = [line.rstrip().split('\t') |
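A quick aside (not part of the review thread): on a hypothetical tab-separated row, the two forms agree whenever the line only ends in a newline. Note that a bare rstrip() also removes trailing tabs and spaces, so rstrip('\n') would be the strict equivalent if empty trailing fields ever matter.

import re

line = 'comp_00\t0.42\t0.42\n'  # hypothetical row from a components TSV

via_re = re.sub('\n', '', line).split('\t')  # current approach
via_rstrip = line.rstrip().split('\t')       # suggested approach

# Identical here; they would differ only if the row ended in an empty field,
# e.g. 'a\tb\t\n', where rstrip() also drops the trailing tab.
assert via_re == via_rstrip == ['comp_00', '0.42', '0.42']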
assert os.path.getsize(expected_metadata_file) > 0

with open(ccresult.outputs.metadata_file, 'r') as metadata_file:
    components_metadata = [re.sub('\n', '', line).split('\t')
Again, if I understand the purpose:
components_metadata = [re.sub('\n', '', line).split('\t')
components_metadata = [line.rstrip().split('\t') |
nipype/info.py
Outdated
@@ -141,7 +141,8 @@ def get_nipype_gitversion():
'numpy>=%s ; python_version >= "3.7"' % NUMPY_MIN_VERSION_37,
'python-dateutil>=%s' % DATEUTIL_MIN_VERSION,
'scipy>=%s' % SCIPY_MIN_VERSION,
'traits>=%s' % TRAITS_MIN_VERSION,
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),
This is #2913. The fix should be just to skip 5.0, and we can do that for all versions.
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),
'traits>=%s,!=5.0' % TRAITS_MIN_VERSION, |
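For what it's worth (an aside using the third-party packaging library, not anything in this PR), a PEP 440 exclusion without a trailing .* rules out only the 5.0 release itself (including its zero-padded form 5.0.0), not later 5.0.x patch releases:

from packaging.specifiers import SpecifierSet

spec = SpecifierSet('>=4.6,!=5.0')  # '4.6' is a hypothetical stand-in for TRAITS_MIN_VERSION

print('4.6.0' in spec)  # True  -- satisfies the minimum
print('5.0' in spec)    # False -- exactly the excluded release
print('5.0.1' in spec)  # True  -- a 5.0.x patch is not excluded by !=5.0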
I'm fine with this, though we should probably only restrict 5.0 if the user is running py27 - to my knowledge everything was working fine with py3.
I don't have a strong opinion here (this seems simpler than splitting out by Python version, but yes there may be someone who really needs Py3 + traits 5.0), but I think maybe let's make that change after this is merged?
nipype/info.py
Outdated
@@ -141,7 +141,8 @@ def get_nipype_gitversion():
'numpy>=%s ; python_version >= "3.7"' % NUMPY_MIN_VERSION_37,
'python-dateutil>=%s' % DATEUTIL_MIN_VERSION,
'scipy>=%s' % SCIPY_MIN_VERSION,
'traits>=%s' % TRAITS_MIN_VERSION,
'traits>=%s,<%s ; python_version == "2.7"' % (TRAITS_MIN_VERSION, '5.0.0'),
'traits>=%s ; python_version >= "3.0"' % TRAITS_MIN_VERSION, |
'traits>=%s ; python_version >= "3.0"' % TRAITS_MIN_VERSION, |
@@ -1,6 +1,7 @@
# emacs: -*- mode: python; py-indent-offset: 4; indent-tabs-mode: nil -*-
# vi: set ft=python sts=4 ts=4 sw=4 et:
import os
import re |
The suggestions below were prompted by a suspicion this might be heavier than needed. If you accept those changes, also revert:
import re |
nipype/algorithms/confounds.py
Outdated
raise ValueError('No components found')
components = np.ones((M.shape[0], num_components), dtype=np.float32) * np.nan
return components, basis
components = np.full((M.shape[0], num_components), np.nan) |
Are you intentionally promoting this to a np.float64?
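For context, a minimal sketch of the dtype difference between the old and new expressions (shapes are hypothetical): np.full infers float64 from a Python float fill value unless a dtype is passed.

import numpy as np

num_rows, num_components = 5, 3  # hypothetical shapes

old_style = np.ones((num_rows, num_components), dtype=np.float32) * np.nan
new_style = np.full((num_rows, num_components), np.nan)
explicit = np.full((num_rows, num_components), np.nan, dtype=np.float32)

assert old_style.dtype == np.float32
assert new_style.dtype == np.float64   # promoted: np.nan is a Python float
assert explicit.dtype == np.float32    # keeps the original precision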
nipype/algorithms/confounds.py
Outdated
np.hstack((metadata['cumulative_variance_explained'],
           cumulative_variance_explained)))
metadata['retained'] = (
    metadata['retained'] + [i < num_components for i in range(len(s))])
If these can get large and you do more than 2 or 3 loops, this will be significantly less efficient than aggregating these things into lists and concatenating/stacking them at the end. e.g.,
# Outside loop
md_mask = []
md_sv = []
md_var = []
md_cumvar = []
md_retained = []

for ...
    # here
    md_mask.append([name] * len(s))
    md_sv.append(s)
    md_var.append(variance_explained)
    md_cumvar.append(cumulative_variance_explained)
    # The lack of square brackets is intentional
    md_retained.append(i < num_components for i in range(len(s)))

# below
metadata = {
    'mask': list(itertools.chain(*md_mask)),
    'singular_value': np.hstack(md_sv),
    'variance_explained': np.hstack(md_var),
    'cumulative_variance_explained': np.hstack(md_cumvar),
    'retained': list(itertools.chain(*md_retained))}
    Numpy array containing the requested set of noise components
basis: numpy array
    Numpy array containing the (non-constant) filter regressors
metadata: OrderedDict{str: numpy array} |
Why is this an OrderedDict as opposed to a dict?
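For reference, a small sketch of the trade-off (illustrative, not from the PR): on Python 3.7+ a plain dict already preserves insertion order, but this PR still supports Python 2.7 (see the traits pins in info.py above), where only OrderedDict guarantees it; OrderedDict equality is also order-sensitive.

from collections import OrderedDict

plain = {'mask': [], 'singular_value': []}
ordered = OrderedDict([('mask', []), ('singular_value', [])])

# Both iterate in insertion order on Python 3.7+ (a language guarantee there,
# but not on Python 2.7, where plain dict order is arbitrary).
assert list(plain) == list(ordered) == ['mask', 'singular_value']

# OrderedDict comparisons are additionally order-sensitive; plain dicts are not.
assert OrderedDict([('a', 1), ('b', 2)]) != OrderedDict([('b', 2), ('a', 1)])
assert {'a': 1, 'b': 2} == {'b': 2, 'a': 1}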
Hello, and thanks for the comments! I think they've cleaned up the code considerably -- I've updated the code to reflect those comments with the following exceptions:
Oh, looks like tests are failing. Sorry if I've led you astray.
No, this is totally me not updating my own unit tests.
yay! @rciric
Summary
Continuing #2859
As a step toward implementing nipreps/fmriprep#1458, this PR updates the CompCor algorithm to allow greater flexibility.
List of changes proposed in this PR (pull-request):
- num_components encodes the number of components to return if it takes an integer value greater than 1.
- If num_components is all, then CompCor returns all component time series.
- variance_threshold allows for automatic selection of components on the basis of their capacity to explain variance in the data. variance_threshold is mutually exclusive with num_components and takes values between 0 and 1.
- If variance_threshold is set, CompCor returns the minimum number of components necessary to explain the indicated variance in the data, computed as the cumulative sum of (singular_value^2)/sum(singular_values^2) (see the sketch after this list).
- If neither num_components nor variance_threshold is defined, then CompCor executes as though num_components were set to 6 (the previous default behaviour).
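To make the selection rule concrete, here is a minimal sketch (the function and variable names are hypothetical, not taken from the implementation) of choosing the smallest number of components whose cumulative explained variance reaches the threshold:

import numpy as np

def components_for_variance(singular_values, variance_threshold):
    """Smallest number of components whose cumulative explained variance
    (singular_value^2 / sum(singular_values^2), summed) reaches the threshold."""
    variance_explained = singular_values ** 2 / np.sum(singular_values ** 2)
    cumulative = np.cumsum(variance_explained)
    # Index of the first component at which the threshold is reached, plus one.
    return int(np.searchsorted(cumulative, variance_threshold) + 1)

s = np.array([10.0, 5.0, 2.0, 1.0])       # hypothetical singular values
print(components_for_variance(s, 0.50))   # 1 -- the first component alone explains ~77%
print(components_for_variance(s, 0.95))   # 2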
Acknowledgment