
Optimize CSR and sorted conversions #11


Merged
merged 3 commits on May 4, 2017

Conversation

@mrocklin (Contributor) commented May 1, 2017

This started by adding the optimized version of tocsr from #9 (cc @jakevdp).

Then, to avoid calling sorting routines excessively, we started tracking sorted= metadata.

Then, to accelerate sorting and to consolidate the lexsort and reshape code, we based all sorting code on a single function called linear_loc, which gives the linear location of any index in a C-ordered array. This replaces np.lexsort with np.argsort and performs a quick is-sorted check beforehand. It can dramatically speed up some workflows, but it limits our arrays to 2**64 elements (zero or nonzero), so, for example, this library can no longer represent an array of shape (1e6, 1e6, 1e6, 1e6). This was already the case in reshape. I'm personally OK with this for the time being; I think we can probably remove this limitation in the future (at a cost) if necessary.
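A rough sketch of the idea (hypothetical code; the library's actual implementation may differ):

```python
import numpy as np

def linear_loc(coords, shape):
    # Linear (C-order) position of each column of coords in an array of
    # the given shape; essentially np.ravel_multi_index, accumulated in
    # uint64 so results fit as long as the array has < 2**64 elements.
    out = np.zeros(coords.shape[1], dtype=np.uint64)
    stride = np.uint64(1)
    for i in range(len(shape) - 1, -1, -1):
        out += coords[i].astype(np.uint64) * stride
        stride *= np.uint64(shape[i])
    return out

coords = np.array([[0, 1, 0],
                   [2, 0, 1]])        # 2-d coordinates of three nonzeros
linear = linear_loc(coords, (2, 3))   # array([2, 3, 1], dtype=uint64)

# A single argsort over the linear locations replaces np.lexsort, and a
# cheap is-sorted check beforehand skips the sort when it isn't needed.
if not (linear[:-1] <= linear[1:]).all():
    order = np.argsort(linear)
    coords = coords[:, order]
```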

There is still work to be done to optimize matvecs and matmuls, but this is a decent start.

@mrocklin mrocklin force-pushed the tocsr branch 2 times, most recently from e854fb6 to d4801f4 on May 1, 2017 23:56
mrocklin added 2 commits May 2, 2017 08:51
pull out linear_loc to separate function
This stores the last few objects returned from reshape and transpose
calls.  This allows efficiencies from in-place operations like
`sum_duplicates` and `sort_indices` to persist in iterative workflows.

Modern NumPy programmers are accustomed to operations like .transpose()
being cheap and aren't accustomed to having to pay sorting costs after
many computations.  These assumptions are no longer true in sparse by
default.  However, by caching recent transpose and reshape objects we
can reuse their in-place modifications.  This greatly accelerates common
machine learning workloads.
@mrocklin (Contributor, Author) commented May 2, 2017

I have extended this to cache the last few results of reshape and transpose. This makes things run quite fast for me in practice. Copying commit message here for convenience:

This stores the last few objects returned from reshape and transpose
calls. This allows efficiencies from in-place operations like
sum_duplicates and sort_indices to persist in iterative workflows.

Modern NumPy programmers are accustomed to operations like .transpose()
being cheap and aren't accustomed to having to pay sorting costs after
many computations. These assumptions are no longer true in sparse by
default. However, by caching recent transpose and reshape objects we
can reuse their in-place modifications. This greatly accelerates common
machine learning workloads.
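A minimal sketch of that scheme (hypothetical names, CacheMixin and _memoized), keyed on the operation and its arguments:

```python
from collections import deque

class CacheMixin:
    # Remember the last few (key, result) pairs so that repeated
    # .transpose()/.reshape() calls return the same object, letting
    # in-place work such as sum_duplicates() and sort_indices()
    # persist across calls.

    def __init__(self, cache_size=3):
        # deque(maxlen=N) silently evicts the oldest entry when full
        self._cache = deque(maxlen=cache_size)

    def _memoized(self, key, compute):
        for k, v in self._cache:
            if k == key:
                return v              # hand back the earlier object
        value = compute()
        self._cache.append((key, value))
        return value

# The key encodes the operation and its arguments.
ops = CacheMixin()
a = ops._memoized(('transpose', (1, 0)), lambda: [[1, 3], [2, 4]])
b = ops._memoized(('transpose', (1, 0)), lambda: [[0, 0], [0, 0]])
assert a is b                         # the second call reused the cached object
```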

@mrocklin (Contributor, Author) commented May 2, 2017

Merging shortly if no objections

@jakevdp (Contributor) commented May 2, 2017

Looks good. A couple of things:

  • the CSC conversion could be made more efficient by sorting the indices as CSC directly, rather than doing CSR first

  • for the caching, does this mean that sparse arrays are effectively read-only? If you modify them in place, cached results will be incorrect, yes?

  • on that subject, we should be careful about CSC/CSR conversions that might use views of data matrices, as you might inadvertently change the sparse matrix when changing the CSR generated from it

@mrocklin (Contributor, Author) commented May 2, 2017

All good points.

  1. I'm tempted to leave csc as-is until I (or someone else) actively needs a fast csc. I do expect this to happen given my current application set, but it hasn't happened yet.
  2. For caching this gets tricky. Currently I think that sparse arrays are practically read-only; we haven't added any top-level in-place modification API. However, it is reasonable to expect that this will be added in the future. We can easily invalidate whenever the parent is changed, but it would require more bookkeeping to invalidate whenever a child is changed.
  3. Regarding views, how does NumPy handle this? If I change a view in NumPy then I expect the underlying array to change as well (see the demonstration below). I guess the situation here is more complex because it's not immediately clear to the user what is and is not a view.

For the caching issue I'm tempted to implement the bookkeeping necessary for invalidation.
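For reference, this is the standard NumPy behavior point 3 refers to:

```python
import numpy as np

a = np.arange(6)
v = a.reshape(2, 3)   # a view: shares a's underlying buffer
v[0, 0] = 99
print(a[0])           # 99 -- writing through the view mutates the base array
```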

@jakevdp (Contributor) commented May 2, 2017

Makes sense.

Regarding the bookkeeping for caching: we could detect changes in the arrays by making them properties, but that still wouldn't protect against users modifying the contents of these arrays either directly or via views in derived CSR matrices (for example). One way we might address that is by setting the writeable flag of the arrays to False.
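This relies on standard NumPy behavior: locking an array's buffer makes any later write raise an error, and views derived from a locked array are read-only as well, which also covers the derived-CSR case:

```python
import numpy as np

coords = np.array([[0, 1], [2, 0]])
coords.flags.writeable = False    # lock the buffer

try:
    coords[0, 0] = 5
except ValueError as exc:
    print(exc)                    # "assignment destination is read-only"
```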

@mrocklin (Contributor, Author) commented May 3, 2017

Alternatively, we could have the user explicitly opt in to caching. This may be simpler all around.

@jakevdp (Contributor) commented May 3, 2017

> Alternatively, we could have the user explicitly opt in to caching. This may be simpler all around.

+1, good solution for right now

@mrocklin (Contributor, Author) commented May 3, 2017

Planning to merge this soon if there are no further comments

@jakevdp (Contributor) commented May 3, 2017

Looks good. One thing came to mind, though: I wonder if it would be possible to use functools.lru_cache instead of maintaining the cache manually?

@mrocklin (Contributor, Author) commented May 4, 2017

I would love to use a standard LRU data structure. However, functools.lru_cache is strictly a decorator; it doesn't let you easily inspect the cache's contents, which I've found inconvenient in the past. I'm inclined to use the simple approach with a deque here until it proves cumbersome. If that fails, I have my own LRU data structures that I can bring in if necessary.
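For comparison, functools.lru_cache exposes only aggregate statistics; the cached values themselves can't be inspected or individually evicted:

```python
from functools import lru_cache

@lru_cache(maxsize=3)
def double(x):
    return x * 2

double(1)
double(2)
print(double.cache_info())   # CacheInfo(hits=0, misses=2, maxsize=3, currsize=2)
# cache_clear() drops everything at once; there is no public API to
# iterate over or invalidate individual entries.
```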

@jakevdp (Contributor) commented May 4, 2017

Makes sense!
