Use Cython, Numba, or C/C++ for algorithmic code #126
My personal thoughts: The default today seems to be Cython. This would be a fine choice. It seems to be fine for speed and community adoptability. I do think that it imposes a serious barrier to attracting new developers (I'd say 80% of the candidate pool will avoid Cython), and it will force a more complex build-and-deploy process. For example, we can no longer just point people to try things on master to get quick feedback. We'll get questions from people on how to install from source on Windows, etc. C/C++ is, I think, a little cleaner than Cython in terms of attracting new developers; I think that C tends to be more often within people's comfort zone. Numba is great for new developers and maintenance, but has issues in community adoptability (folks aren't accustomed to depending on it, thoughts here). Numba also has issues if we're going to be doing a lot with dynamic data structures and want to use …

Suggestion: If I were doing this work I would stick to Numba until Numba became an issue (either due to a need for dynamic data structures or downstream libraries being unwilling to depend on it) and then I would switch to something else. The cost of starting with Numba to me seems to be zero. We don't have to update our release mechanism, and we don't have to have special documentation or field extra compilation help requests. The cost to switch to Cython if necessary is likely very small; we'll likely be writing exactly the same code we would write for Cython, just with less stuff around it. Numba is also likely to be faster without us thinking as much, which I think is useful when experimenting with new algorithms, like in #125.

However: I'm unlikely to do any of this work in the near future, and the preferences of those doing the work should take precedence.
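A minimal sketch of the "same code, less stuff around it" point above, assuming a typical Numba setup; the function and arrays are illustrative, not from this repository:

```python
import numpy as np
import numba


@numba.njit
def count_matches(a, b):
    # Walk two sorted 1-D integer arrays and count common values.
    # The body is exactly the plain-Python loop one would also hand
    # to Cython, minus the type declarations and the build step.
    i = j = n = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            n += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return n


count_matches(np.array([1, 3, 5]), np.array([3, 4, 5]))  # -> 2
```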
In favor of Cython: …

In favor of Numba: …

My concerns: I guess it comes down to exactly two questions for me: will we have to switch at some point (which I would like to avoid outright), or will Numba devs be willing to add support for accelerated …?

If, on the other hand, the answer to even one of these questions is "no", then my vote is for a mixture of Cython and Cython-wrapped C++. I guess my main worries with Numba come down to the following:
Some thoughts: …
However, mostly this is me trying to convince you not to adopt Cython prematurely. At this point you're doing most of the work and should probably decide. As a co-maintainer, though, I would appreciate it if, prior to actually writing algorithmic code in Cython, you first set up the build process, packaging support (we should consider conda-forge), and documentation. I don't expect this to be particularly difficult, and I don't think we'll need any external libraries to speak of, but it's best to understand the costs ahead of time and build in solutions in case you leave this project in the future.
FWIW, while I'm probably biased towards Numba due to professional association, I also have a lot of respect for Cython. As someone who has used both in different projects (Numba for personal things, Cython for geopandas), and as someone who maintains a number of projects, the added cost of maintaining a Cython project in terms of person-hours is high, and that cost tends to be focused on fewer maintainers. These days I tend to optimize to reduce and diffuse maintenance costs more than most other things. I personally would probably stop spending as much personal time fielding maintenance questions or build/release issues if we go for Cython. This is something that the community would need to pick up long term.
There is also …
GDB supports Python natively, and Cython has extensions for GDB. It's on the command line, though (I've tried and failed to find a good IDE solution), so it is a bit of a pain compared to your solution. Also, it freezes your interpreter in, so …
Interleaving pure Python and optimized code. Another problem would be calling unoptimized code from optimized code.
There are also costs in terms of alienating developers we may have attracted who are used to Numba.
Of course, I understand the costs, and if the consensus is on Cython, then I would build these solutions in, along with docs, right at the very first check-in. I believe properly > quickly. After your convincing and experience, I'm not too opposed to Numba, but I would certainly like to wait for the thoughts of others (particularly SciPy devs, and whether they would consider it a blocker) before committing to it.
My vote is still slightly in favor of Cython, but I'm open to Numba as an option. I don't believe I should make the decision alone; I would rather wait for community consensus.
I have started watching this project only recently, and although I was super happy to finally find a sparse multidimensional tensor lib that I could contribute to, I'm just an observer here.
At this point, if SciPy devs came out and said they would be okay to depend on this project if Numba was used, I'd go for Numba. I just had a look at the docs and they're significantly better than Cython's (which alone makes it worth considering). It'd be nice to have another contributor, @albop. :-)
@hameerabbasi: I was considering saying something about contributing to the project, but wouldn't like to disappoint if it takes me a bit too long to start. I'll try to create some issues first ;-) (with the use case I have in mind...)
@albop If you have any questions about where something is handled, or how to implement something, don't hesitate to raise an issue and cc me on it. :-)
Numba actually does currently accelerate …
C++ is fine as long as the wrapper tool is modern enough to make it easy to package as a wheel (e.g. Cython or pybind11; no Boost.Python, please). I am fine with Numba as a dependency as well. I can't speak for SciPy developers.
They do, it's even been labelled a high priority. xref numba/numba#2096
This makes me much more comfortable using Numba.
I can't speak for all SciPy developers, but if Numba continues to improve and be maintained (not sure of its status now with all the changes at Anaconda?), then at some point we'll likely adopt it. The concerns regarding build dependencies etc. are minor compared to all the C/C++/Fortran we have already. If you want me to start that discussion now on scipy-dev, I'd be happy to do so.

Re debugging: it depends on what you're debugging. If it's an issue with the logic of your code, then Numba is much better (just remove …).

We recently got a similar question from Pythran, and I came up with a list of factors for the decision at https://mail.python.org/pipermail/scipy-dev/2018-January/022334.html. The ones I'm not sure about for Numba are: …
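For illustration, a hedged sketch of the "just remove the decorator" workflow mentioned above: Numba reads the NUMBA_DISABLE_JIT environment variable when it is imported, so the same function can run as plain Python under pdb without editing the source. The function here is hypothetical:

```python
import os

# Must be set before numba is imported for it to take effect.
os.environ["NUMBA_DISABLE_JIT"] = "1"

import numba


@numba.njit
def clip(x, lo, hi):
    # With the JIT disabled this is ordinary Python: print() calls,
    # breakpoints, and pdb all behave exactly as usual.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


print(clip(5.0, 0.0, 1.0))  # -> 1.0, executed by the interpreter
```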
How will the sparse array object be represented in Numba? The JIT class capability is currently very limited.
@datnamer This is a discussion on how we will adopt Numba for our internal algorithms, to make our own library faster for users. It isn't much of a discussion about how to make our classes JIT-compatible.
That'd be really appreciated. 😃
cc @woodmd, since you're planning to contribute, you should get a say. Also, @nils-werner.
As long as things go well, Numba has an edge over Cython. But once they start to misbehave, I'm not sure a) how to identify that it is misbehaving in Numba or b) how to debug Numba and steer it in the right direction. We have good answers to these questions for Cython because we've been using it for a long time all over the scientific Python ecosystem, but if you can find similar answers for Numba it would be tempting to use that instead (the code will almost certainly be simpler to implement and debug).
One question at large regarding Cython, which will be very useful later: How hard is it to make it work for general N-dimensional arrays? From what I can tell, only arrays of a specific dimension can be an input to a function.
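For contrast, one common way to sidestep the fixed-dimensionality problem (in either Cython or Numba) is to write the kernel against a 1-D view and let callers ravel, so a single compiled function serves every ndim. A sketch with illustrative names, shown here with Numba:

```python
import numpy as np
import numba


@numba.njit
def nnz_count(flat):
    # Works on a 1-D view, so the same kernel handles arrays of any
    # dimensionality once the caller ravels them.
    n = 0
    for v in flat:
        if v != 0:
            n += 1
    return n


a = np.random.rand(3, 4, 5)
nnz_count(a.ravel())  # same kernel for 1-D, 2-D, ..., N-D inputs
```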
Would you be willing to contribute code (for the pure Python parts) and do code reviews (for both parts)? FWIW, I only intend to move the bottlenecks (currently only indexing, sorting, matching) to Cython/Numba/C++, keeping all else as-is. I don't intend to write the entire thing in another language; NumPy already does a lot of that for us. I also care a lot about maintainability and attracting devs, and I'm sorry if it came across otherwise. Maintainability and community > a few ms shaved off an operation. If building wheels is an issue, I have access to all three major OSs on my personal machine, and can easily build wheels for all of them (although I believe CircleCI/AppVeyor/Travis do this already). You're a big part of this project at this stage; losing that could be a fatal blow to it.
Re nD data structures: All internal data of …
There are a few use cases for N-D iteration in our code that I can think of at this point: …
It depends on LLVM via llvmlite, which is quite portable, I think.
Last release was 25 days ago, and code frequency seems to be good. Not sure about the future, though. @mrocklin might have insights on that.
I've raised an issue highlighting debugging issues in Numba here: numba/numba#2788. Engagement there would be welcome.
I do not have any major concerns about the longevity of Numba. Folks probably shouldn't rely on my opinion here due to the conflict of interest. I'd be happy to chat in more depth with folks if desired.
This was what I was talking about when I spoke of interleaving, so that concern can be put aside. Not too long ago, this wasn't supported, IIRC. It also means templates and custom …
@hameerabbasi serge-sans-paille/pythran#866 implements the missing features to support these two functions in Pythran, and adds them to the test cases. If you can provide me with more similar kernels to port, that's great :-)
Most SciPy-style code in the past has gone without nested dynamic data structures, mostly for performance reasons. I recommend that you convey more information about what you're trying to achieve and why you're trying to use these data structures, and perhaps others in the community will be able to suggest alternatives.
@serge-sans-paille Unfortunately, these already work in Numba. I haven't written the actual kernels needed. Also, …

But the main thing holding me back (feature-wise instead of speed-wise) is …

For advanced indexing, I need …

For CSD/ndarray in …
Right, I recommend verifying that this overhead is meaningful before trying to avoid it.
I recommend providing more detail about exactly what algorithms or code or outcomes you're trying to accomplish, and perhaps someone can provide alternatives with statically allocated data structures. You've done this with radix sort, which is a nice example (see the sketch below). If there are others then I recommend providing pseudocode, tests that you think properly scope the problem, or links to algorithms or papers that you're trying to implement, so that others can understand what you're trying to accomplish.

My guess is that it's very natural to use nested lists of lists for some of these operations, but that these are neither strictly necessary nor always optimal. There are often many ways to accomplish operations like these. CSR and CSC were designed to not need dynamic data structures. My guess is that it is possible to accomplish the operations that relevant libraries need without dynamic data structures.

It may be that you're trying to implement everything all at once, which may not be the optimal use of time. However, I've said this before, and you seem to disagree. That's fine; I suspect that it is because you know more about the problem. What I'm asking here is that you share more of your knowledge so that others start to understand why these data structures are necessary. Then maybe people (myself or others) can help work around the problem. As things stand currently it's hard for others (or at least myself) to know what you have in mind.

At the same time, I recommend that you push upstream and mention your concerns on the relevant Numba issue. I see that you've mentioned your desire on numba/numba#2560. I recommend that you ping that issue again, saying again why you're interested and maybe asking how you could contribute to resolve the problem.
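As a concrete instance of the statically allocated style being requested, here is a hedged sketch of an LSD radix argsort over linearized COO coordinates: every buffer is preallocated and no dynamic lists appear. This is an illustration, not the library's actual implementation:

```python
import numpy as np


def radix_argsort(keys, bits=8):
    """Stable argsort of non-negative int64 keys, LSD radix style."""
    order = np.arange(len(keys), dtype=np.int64)
    buckets = 1 << bits
    for shift in range(0, 64, bits):
        digits = (keys[order] >> shift) & (buckets - 1)
        counts = np.bincount(digits, minlength=buckets)
        # Exclusive prefix sum gives each bucket's starting offset.
        offsets = np.zeros(buckets, dtype=np.int64)
        np.cumsum(counts[:-1], out=offsets[1:])
        out = np.empty_like(order)
        for i in range(len(order)):  # stable scatter pass
            d = digits[i]
            out[offsets[d]] = order[i]
            offsets[d] += 1
        order = out
    return order


linear = np.array([42, 7, 19, 7, 0], dtype=np.int64)
linear[radix_argsort(linear)]  # -> array([ 0,  7,  7, 19, 42])
```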
FWIW, I personally don't care about DOK at all. None of the applications that I know of (xarray, scikit-learn, tensor factorization algorithms) are likely to care much either. If I were in your position and wanted to make useful software, I would query downstream users to hear what operations they need and focus on those first. The two main communities that I would query are XArray and Scikit-Learn. I think that this would help to prioritize work.
I confirm that I don't know any algorithm in scikit-learn that would be optimally written using the DOK data structure. We mostly use CSR/CSC at consuming time (to run linear algebra operations such as matrix-vector products row-wise or column-wise) or row-wise and column-wise reductions (e.g. to scale the data). We also use COO for producing sparse matrices from another kind of data (e.g. a text vectorizer outputting a sparse matrix for bag-of-words counts) prior to conversion to CSR or CSC.
@ogrisel, can you list the minimal set of operations we would need for CSR/CSC to make relevant sklearn operations happy? Is it just tensordot, or are elemwise, slicing, reductions, etc. also necessary?
Actually, many scikit-learn algorithms have Cython code that directly accesses the three CSR/CSC component arrays (data, indices, indptr). One could expect that some of those algorithms could be refactored to work on row-wise-chunked CSR, or column-wise or row-wise chunked CSC (depending on the algorithm and how they can be parallelized).

Example Cython code on CSC matrices: …

Some other algorithms (also in Cython) use a low-overhead dataset abstraction to wrap dense 2D numpy arrays and CSR sparse matrices: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/seq_dataset.pyx

It is often possible to imagine that such algorithms could be extended to have a pure-Python outer loop working on a list of row-wise CSR or CSC chunks, in which a variant of the existing Cython code would run either sequentially (for out-of-core computation, with a small model moving from one node to the next to update its parameters when accessing the chunks of data on that node) or sometimes in parallel across chunks.

Other algorithms written in Python use dot products with a numpy array (and maybe sometimes another sparse matrix) and column-wise or row-wise aggregations (mean, sum), possibly on slices.
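A minimal sketch of the pattern described above, written with Numba rather than Cython purely for this thread's comparison; the kernel walks the three CSR component arrays directly. The function name and shapes are illustrative:

```python
import numpy as np
import scipy.sparse
import numba


@numba.njit
def csr_matvec(data, indices, indptr, v):
    # Row-wise walk over the raw CSR buffers: indptr brackets each
    # row's slice of data/indices, much like the Cython kernels do.
    n_rows = len(indptr) - 1
    out = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            out[i] += data[k] * v[indices[k]]
    return out


X = scipy.sparse.random(5, 4, density=0.3, format="csr")
csr_matvec(X.data, X.indices, X.indptr, np.ones(X.shape[1]))
```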
Some notes on issues raised above: …
@seibert Seems like you're well ahead of schedule. 😃 Thanks a lot. I'll try to juggle things around so I do things that don't need …
I'm not entirely sure who would use sparse arrays in xarray, but my guess is that the all-purpose flexibility of COO would suffice for most analytics purposes. That said, for data analysis, I think the most useful capability would be support for fill values other than zero (namely, NaN). One intriguing use case for sparse arrays in xarray is to reproduce the capabilities of the "multi-dimensional" databases used in business intelligence (something like online analytical processing, OLAP).
Hrm, that's an interesting thought. Do we know people who care about OLAP cubes? Perhaps @Stiivi? It would be useful to get an introduction to someone who might need an in-memory sparse multi-dimensional data structure and would be willing to try things out and provide feedback from concrete applications.
Fill values are easy to build and I have them planned soon-ish. Most operations can be made to work with them; the only one I can think of that won't is …
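A hedged sketch of the fill-value idea for COO-style data: densification writes the stored fill value everywhere first, then scatters the explicitly stored entries. The function and argument layout here are hypothetical, not this library's API:

```python
import numpy as np


def coo_todense(shape, coords, data, fill_value=np.nan):
    # Start from the fill value instead of hard-coded zeros, then
    # scatter the stored values into place.
    out = np.full(shape, fill_value)
    out[tuple(coords)] = data
    return out


coords = np.array([[0, 1],   # row indices of the stored entries
                   [2, 0]])  # column indices of the stored entries
coo_todense((2, 3), coords, np.array([7.0, 8.0]))
```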
I'm still waiting on numba/numba#2560; it's a blocker for a lot of features. Failing that in 0.39, I'd like to discuss Cython and/or C/C++ wrapping alternatives.
I encourage you to ask on that issue again to see what their plans are, if any, and how expensive they think it is. That might help inform decisions.
@hameerabbasi and @mrocklin It looks like the implementation will be coming in 0.39. It is labeled as ready for review: here
Refcounted lists seem to have landed in Numba indeed. Does anyone know what's the story with …
See numba/numba#2096.
AFAIK, the Numba 1.0 release is scheduled for December 2018. Dict support is marked as high priority for 1.0.
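For later readers: Numba did eventually ship an explicitly typed dict usable in nopython mode (numba.typed.Dict). A minimal usage sketch, assuming a Numba version that includes it:

```python
import numpy as np
import numba
from numba import types
from numba.typed import Dict


@numba.njit
def count_values(arr):
    # Typed dicts require homogeneous, declared key/value types.
    d = Dict.empty(key_type=types.int64, value_type=types.int64)
    for x in arr:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    return d


count_values(np.array([1, 1, 2], dtype=np.int64))  # -> {1: 2, 2: 1}
```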
In light of the question "Why Julia? Will Python/Numba and Python/Cython lose to Julia?", I suggest going with Numba and actively contributing to it if needed.
Looks like the decision was to go with Numba, and given the amount of work invested there, this would be unlikely to change in the future? Should this issue be closed, then?

Performance would be a significant argument for or against adoption (#331). For users who don't need much extra functionality outside of scipy.sparse (but just want a sparse COO/CSD 2D ndarray class), if this library can be made faster than …

Wrapping low-level libraries is probably not doable now with Numba, but it still might be worth following the work done in https://github.com/vbarrielle/sprs in Rust. So far it doesn't have the N-D generalized data structures that pydata/sparse has, but some work has been done lately to benchmark against scipy.sparse in sparsemat/sprs#184 (comment) and to improve results.
While I'm somewhat of a fan of Rust myself, switching seems hard, due to the amount of work invested, as you say. Not to mention, some upcoming features absolutely need a JIT, due to an "explosion of types". If Rust or C++ JIT becomes a stable thing (better than Numba), I'd like to hear about it.
Numba has …
It seems likely that we'll want the ability to write fast numeric code in a low-level-ish language. There are a few options: …

We should make a decision here before investing serious time in one path or the other.

There are, I think, a few categories of concerns: …