Use Cython, Numba, or C/C++ for algorithmic code #126
My personal thoughts: The default today seems to be Cython. This would be a fine choice. It seems to be fine for speed and community adoptability. I do think that it imposes a serious barrier to attracting new developers (I'd say 80% of the candidate pool will avoid Cython), and it will force a more complex build-and-deploy process. For example, we can no longer just point people to try things on master to get quick feedback. We'll get questions from people on how to install from source on Windows, etc. C/C++ is, I think, a little cleaner than Cython in terms of attracting new developers; I think that C tends to be more often within people's comfort zone. Numba is great for new developers and maintenance, but has issues in community adoptability (folks aren't accustomed to depending on it, thoughts here). Numba also has issues if we're going to be doing a lot with dynamic data structures and want to use …

Suggestion: If I were doing this work I would stick to Numba until Numba became an issue (either due to a need for dynamic data structures or downstream libraries being unwilling to depend on it) and then I would switch to something else. The cost of starting with Numba to me seems to be zero. We don't have to update our release mechanism, and we don't have to have special documentation or field extra compilation help requests. The cost to switch to Cython if necessary is likely very small; we'll likely be writing exactly the same code we would write for Cython, just with less stuff around it. Numba is also likely to be faster without us thinking as much, which I think is useful when experimenting with new algorithms, like in #125.

However: I'm unlikely to do any of this work in the near future, and the preferences of those doing the work should take precedence.
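A minimal sketch of the "same code, less stuff around it" point above, assuming a typical Numba setup; the function and arrays are illustrative, not from this repository:

```python
import numpy as np
import numba


@numba.njit
def count_matches(a, b):
    # Walk two sorted 1-D integer arrays and count common values.
    # The body is exactly the plain-Python loop one would also hand
    # to Cython, minus the type declarations and the build step.
    i = j = n = 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            n += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return n


count_matches(np.array([1, 3, 5]), np.array([3, 4, 5]))  # -> 2
```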
In favor of Cython: …

In favor of Numba: …

My concerns: I guess it comes down to exactly two questions for me: will we have to switch at some point (which I would like to avoid outright), or will Numba devs be willing to add support for accelerated …?

If, on the other hand, the answer to even one of these questions is "no", then my vote is for a mixture of Cython and Cython-wrapped C++. I guess my main worries with Numba come down to the following:
Some thoughts: …
However, mostly this is me trying to convince you not to adopt Cython prematurely. At this point you're doing most of the work and should probably decide. As a co-maintainer, though, I would appreciate it if, prior to actually writing algorithmic code in Cython, you first set up the build process, packaging support (we should consider conda-forge), and documentation. I don't expect this to be particularly difficult, and I don't think we'll need any external libraries to speak of, but it's best to understand the costs ahead of time and build in solutions in case you leave this project in the future.
FWIW, while I'm probably biased towards Numba due to professional association, I also have a lot of respect for Cython. As someone who has used both in different projects (Numba for personal things, Cython for geopandas), and as someone who maintains a number of projects, the added cost of maintaining a Cython project in terms of person-hours is high, and that cost tends to be focused on fewer maintainers. These days I tend to optimize to reduce and diffuse maintenance costs more than most other things. I personally would probably stop spending as much personal time fielding maintenance questions or build/release issues if we go for Cython. This is something that the community would need to pick up long term.
There is also …
GDB supports Python natively, and Cython has extensions for GDB. It's on the command line, though (I've tried and failed to find a good IDE solution), so it is a bit of a pain compared to your solution. Also, it freezes your interpreter in, so …
Interleaving pure Python and optimized code. Another problem would be calling unoptimized code from optimized code.
There are also costs in terms of alienating developers we may have attracted who are used to Numba.
Of course, I understand the costs, and if the consensus is on Cython, then I would build these solutions in, along with docs, right at the very first check-in. I believe properly > quickly. After your convincing and experience, I'm not too opposed to Numba, but I would certainly like to wait for the thoughts of others (particularly SciPy devs, and whether they would consider it a blocker) before committing to it.
My vote is still slightly in favor of Cython, but I'm open to Numba as an option. I don't believe I should make the decision alone; I would rather wait for community consensus.
I have started watching this project only recently, and although I was super happy to finally find a sparse multidimensional tensor lib that I could contribute to, I'm just an observer here.
At this point, if SciPy devs came out and said they would be okay to depend on this project if Numba was used, I'd go for Numba. I just had a look at the docs and they're significantly better than Cython's (which alone makes it worth considering). It'd be nice to have another contributor, @albop. :-)
@hameerabbasi: I was considering saying something about contributing to the project, but wouldn't like to disappoint if it takes me a bit too long to start. I'll try to create some issues first ;-) (with the use case I have in mind...)
@albop If you have any questions about where something is handled, or how to implement something, don't hesitate to raise an issue and cc me on it. :-)
Numba actually does currently accelerate …
C++ is fine as long as the wrapper tool is modern enough to make it easy to package as a wheel (e.g. Cython or pybind11; no Boost.Python, please). I am fine with Numba as a dependency as well. I can't speak for SciPy developers.
They do, it's even been labelled a high priority. xref numba/numba#2096
This makes me much more comfortable using Numba.
I can't speak for all SciPy developers, but if Numba continues to improve and be maintained (not sure of its status now with all the changes at Anaconda?), then at some point we'll likely adopt it. The concerns regarding build dependencies etc. are minor compared to all the C/C++/Fortran we have already. If you want me to start that discussion now on scipy-dev, I'd be happy to do so.

Re debugging: it depends on what you're debugging. If it's an issue with the logic of your code, then Numba is much better (just remove …).

We recently got a similar question from Pythran, and I came up with a list of factors for the decision at https://mail.python.org/pipermail/scipy-dev/2018-January/022334.html. The ones I'm not sure about for Numba are: …
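For illustration, a hedged sketch of the "just remove the decorator" workflow mentioned above: Numba reads the NUMBA_DISABLE_JIT environment variable when it is imported, so the same function can run as plain Python under pdb without editing the source. The function here is hypothetical:

```python
import os

# Must be set before numba is imported for it to take effect.
os.environ["NUMBA_DISABLE_JIT"] = "1"

import numba


@numba.njit
def clip(x, lo, hi):
    # With the JIT disabled this is ordinary Python: print() calls,
    # breakpoints, and pdb all behave exactly as usual.
    if x < lo:
        return lo
    if x > hi:
        return hi
    return x


print(clip(5.0, 0.0, 1.0))  # -> 1.0, executed by the interpreter
```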
How will the sparse array object be represented in Numba? The JIT class capability is currently very limited.
@datnamer This is a discussion on how we will adopt Numba for our internal algorithms, to make our own library faster for users. It isn't much of a discussion about how to make our classes JIT-compatible.
That'd be really appreciated. 😃
cc @woodmd, since you're planning to contribute, you should get a say. Also, @nils-werner.
As long as things go well, Numba has an edge over Cython. But once they start to misbehave, I'm not sure a) how to identify that it is misbehaving in Numba or b) how to debug Numba and steer it in the right direction. We have good answers to these questions for Cython because we've been using it for a long time all over the scientific Python ecosystem, but if you can find similar answers for Numba it would be tempting to use that instead (the code will almost certainly be simpler to implement and debug).
One question at large regarding Cython, which will be very useful later: How hard is it to make it work for general N-dimensional arrays? From what I can tell, only arrays of a specific dimension can be an input to a function.
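For contrast, one common way to sidestep the fixed-dimensionality problem (in either Cython or Numba) is to write the kernel against a 1-D view and let callers ravel, so a single compiled function serves every ndim. A sketch with illustrative names, shown here with Numba:

```python
import numpy as np
import numba


@numba.njit
def nnz_count(flat):
    # Works on a 1-D view, so the same kernel handles arrays of any
    # dimensionality once the caller ravels them.
    n = 0
    for v in flat:
        if v != 0:
            n += 1
    return n


a = np.random.rand(3, 4, 5)
nnz_count(a.ravel())  # same kernel for 1-D, 2-D, ..., N-D inputs
```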
Would you be willing to contribute code (for the pure Python parts) and do code reviews (for both parts)? FWIW, I only intend to move the bottlenecks (currently only indexing, sorting, matching) to Cython/Numba/C++, keeping all else as-is. I don't intend to write the entire thing in another language; NumPy already does a lot of that for us. I also care a lot about maintainability and attracting devs, and I'm sorry if it came across otherwise. Maintainability and community > a few ms shaved off an operation. If building wheels is an issue, I have access to all three major OSs on my personal machine, and can easily build wheels for all of them (although I believe CircleCI/AppVeyor/Travis do this already). You're a big part of this project at this stage; losing that could be a fatal blow to it.
Re nD data structures: All internal data of …
There are a few use cases for N-D iteration in our code that I can think of at this point: …
It depends on LLVM via llvmlite, which is quite portable, I think.
Last release was 25 days ago, and code frequency seems to be good. Not sure about the future, though. @mrocklin might have insights on that.
I've raised an issue highlighting debugging issues in Numba here: numba/numba#2788. Engagement there would be welcome.
I do not have any major concerns about the longevity of Numba. Folks probably shouldn't rely on my opinion here due to the conflict of interest. I'd be happy to chat in more depth with folks if desired.
This was what I was talking about when I spoke of interleaving, so that concern can be put aside. Not too long ago, this wasn't supported, IIRC. It also means templates and custom …
@hameerabbasi serge-sans-paille/pythran#866 implements the missing features to support these two functions in Pythran, and adds them to the test cases. If you can provide me with more similar kernels to port, that's great :-)
Most SciPy-style code in the past has gone without nested dynamic data structures, mostly for performance reasons. I recommend that you convey more information about what you're trying to achieve and why you're trying to use these data structures, and perhaps others in the community will be able to suggest alternatives.
@serge-sans-paille Unfortunately, these already work in Numba. I haven't written the actual kernels needed. Also, …

But the main thing holding me back (feature-wise instead of speed-wise) is …

For advanced indexing, I need …

For CSD/ndarray in …
Right, I recommend verifying that this overhead is meaningful before trying to avoid it.
I recommend providing more detail about exactly what algorithms or code or outcomes you're trying to accomplish, and perhaps someone can provide alternatives with statically allocated data structures. You've done this with radix sort, which is a nice example (see the sketch below). If there are others then I recommend providing pseudocode, tests that you think properly scope the problem, or links to algorithms or papers that you're trying to implement, so that others can understand what you're trying to accomplish.

My guess is that it's very natural to use nested lists of lists for some of these operations, but that these are neither strictly necessary nor always optimal. There are often many ways to accomplish operations like these. CSR and CSC were designed to not need dynamic data structures. My guess is that it is possible to accomplish the operations that relevant libraries need without dynamic data structures.

It may be that you're trying to implement everything all at once, which may not be the optimal use of time. However, I've said this before, and you seem to disagree. That's fine; I suspect that it is because you know more about the problem. What I'm asking here is that you share more of your knowledge so that others start to understand why these data structures are necessary. Then maybe people (myself or others) can help work around the problem. As things stand currently it's hard for others (or at least myself) to know what you have in mind.

At the same time, I recommend that you push upstream and mention your concerns on the relevant Numba issue. I see that you've mentioned your desire on numba/numba#2560. I recommend that you ping that issue again, saying again why you're interested and maybe asking how you could contribute to resolve the problem.
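As a concrete instance of the statically allocated style being requested, here is a hedged sketch of an LSD radix argsort over linearized COO coordinates: every buffer is preallocated and no dynamic lists appear. This is an illustration, not the library's actual implementation:

```python
import numpy as np


def radix_argsort(keys, bits=8):
    """Stable argsort of non-negative int64 keys, LSD radix style."""
    order = np.arange(len(keys), dtype=np.int64)
    buckets = 1 << bits
    for shift in range(0, 64, bits):
        digits = (keys[order] >> shift) & (buckets - 1)
        counts = np.bincount(digits, minlength=buckets)
        # Exclusive prefix sum gives each bucket's starting offset.
        offsets = np.zeros(buckets, dtype=np.int64)
        np.cumsum(counts[:-1], out=offsets[1:])
        out = np.empty_like(order)
        for i in range(len(order)):  # stable scatter pass
            d = digits[i]
            out[offsets[d]] = order[i]
            offsets[d] += 1
        order = out
    return order


linear = np.array([42, 7, 19, 7, 0], dtype=np.int64)
linear[radix_argsort(linear)]  # -> array([ 0,  7,  7, 19, 42])
```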
FWIW, I personally don't care about DOK at all. None of the applications that I know of (xarray, scikit-learn, tensor factorization algorithms) are likely to care much either. If I were in your position and wanted to make useful software, I would query downstream users to hear what operations they need and focus on those first. The two main communities that I would query are XArray and Scikit-Learn. I think that this would help to prioritize work.
I confirm that I don't know any algorithm in scikit-learn that would be optimally written using the DOK data structure. We mostly use CSR/CSC at consuming time (to run linear algebra operations such as matrix-vector products row-wise or column-wise) or row-wise and column-wise reductions (e.g. to scale the data). We also use COO for producing sparse matrices from another kind of data (e.g. a text vectorizer outputting a sparse matrix for bag-of-words counts) prior to conversion to CSR or CSC.
@ogrisel, can you list the minimal set of operations we would need for CSR/CSC to make relevant sklearn operations happy? Is it just tensordot, or are elemwise, slicing, reductions, etc. also necessary?
Actually, many scikit-learn algorithms have Cython code that directly accesses the three CSR/CSC component arrays (data, indices, indptr). One could expect that some of those algorithms could be refactored to work on row-wise-chunked CSR, or column-wise or row-wise chunked CSC (depending on the algorithm and how they can be parallelized).

Example Cython code on CSC matrices: …

Some other algorithms (also in Cython) use a low-overhead dataset abstraction to wrap dense 2D numpy arrays and CSR sparse matrices: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/utils/seq_dataset.pyx

It is often possible to imagine that such algorithms could be extended to have a pure-Python outer loop working on a list of row-wise CSR or CSC chunks, in which a variant of the existing Cython code would run either sequentially (for out-of-core computation, with a small model moving from one node to the next to update its parameters when accessing the chunks of data on that node) or sometimes in parallel across chunks.

Other algorithms written in Python use dot products with a numpy array (and maybe sometimes another sparse matrix) and column-wise or row-wise aggregations (mean, sum), possibly on slices.
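A minimal sketch of the pattern described above, written with Numba rather than Cython purely for this thread's comparison; the kernel walks the three CSR component arrays directly. The function name and shapes are illustrative:

```python
import numpy as np
import scipy.sparse
import numba


@numba.njit
def csr_matvec(data, indices, indptr, v):
    # Row-wise walk over the raw CSR buffers: indptr brackets each
    # row's slice of data/indices, much like the Cython kernels do.
    n_rows = len(indptr) - 1
    out = np.zeros(n_rows)
    for i in range(n_rows):
        for k in range(indptr[i], indptr[i + 1]):
            out[i] += data[k] * v[indices[k]]
    return out


X = scipy.sparse.random(5, 4, density=0.3, format="csr")
csr_matvec(X.data, X.indices, X.indptr, np.ones(X.shape[1]))
```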
Some notes on issues raised above: …
@seibert Seems like you're well ahead of schedule. 😃 Thanks a lot. I'll try to juggle things around so I do things that don't need …
I'm not entirely sure who would use sparse arrays in xarray, but my guess is that the all-purpose flexibility of COO would suffice for most analytics purposes. That said, for data analysis, I think the most useful capability would be support for fill values other than zero (namely, NaN). One intriguing use case for sparse arrays in xarray is to reproduce the capabilities of the "multi-dimensional" databases used in business intelligence (something like online analytical processing, OLAP).
Hrm, that's an interesting thought. Do we know people who care about OLAP cubes? Perhaps @Stiivi? It would be useful to get an introduction to someone who might need an in-memory sparse multi-dimensional data structure and would be willing to try things out and provide feedback from concrete applications.
Fill values are easy to build and I have them planned soon-ish. Most operations can be made to work with them; the only one I can think of that won't is …
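A hedged sketch of the fill-value idea for COO-style data: densification writes the stored fill value everywhere first, then scatters the explicitly stored entries. The function and argument layout here are hypothetical, not this library's API:

```python
import numpy as np


def coo_todense(shape, coords, data, fill_value=np.nan):
    # Start from the fill value instead of hard-coded zeros, then
    # scatter the stored values into place.
    out = np.full(shape, fill_value)
    out[tuple(coords)] = data
    return out


coords = np.array([[0, 1],   # row indices of the stored entries
                   [2, 0]])  # column indices of the stored entries
coo_todense((2, 3), coords, np.array([7.0, 8.0]))
```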
I'm still waiting on numba/numba#2560; it's a blocker for a lot of features. Failing that in 0.39, I'd like to discuss Cython and/or C/C++ wrapping alternatives.
I encourage you to ask on that issue again to see what their plans are, if any, and how expensive they think it is. That might help inform decisions.
@hameerabbasi and @mrocklin It looks like the implementation will be coming in 0.39. It is labeled as ready for review: here
Refcounted lists seem to have landed in Numba indeed. Does anyone know what's the story with …
See numba/numba#2096.
AFAIK, the Numba 1.0 release is scheduled for December 2018. Dict support is marked as high priority for 1.0.
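For later readers: Numba did eventually ship an explicitly typed dict usable in nopython mode (numba.typed.Dict). A minimal usage sketch, assuming a Numba version that includes it:

```python
import numpy as np
import numba
from numba import types
from numba.typed import Dict


@numba.njit
def count_values(arr):
    # Typed dicts require homogeneous, declared key/value types.
    d = Dict.empty(key_type=types.int64, value_type=types.int64)
    for x in arr:
        if x in d:
            d[x] += 1
        else:
            d[x] = 1
    return d


count_values(np.array([1, 1, 2], dtype=np.int64))  # -> {1: 2, 2: 1}
```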
In light of the question "Why Julia? Will Python/Numba and Python/Cython lose to Julia?", I suggest going with Numba and actively contributing to it if needed.
Looks like the decision was to go with Numba, and given the amount of work invested there, this would be unlikely to change in the future? Should this issue be closed, then?

Performance would be a significant argument for or against adoption (#331). For users who don't need much extra functionality outside of scipy.sparse (but just want a sparse COO/CSD 2D ndarray class), if this library can be made faster than …

Wrapping low-level libraries is probably not doable now with Numba, but it still might be worth following the work done in https://github.com/vbarrielle/sprs in Rust. So far it doesn't have the N-D generalized data structures that pydata/sparse has, but some work has been done lately to benchmark against scipy.sparse in sparsemat/sprs#184 (comment) and to improve results.
While I'm somewhat of a fan of Rust myself, switching seems hard, due to the amount of work invested, as you say. Not to mention, some upcoming features absolutely need a JIT, due to an "explosion of types". If Rust or C++ JIT becomes a stable thing (better than Numba), I'd like to hear about it.
Numba has …
It seems likely that we'll want the ability to write fast numeric code in a low-level-ish language. There are a few options: …

We should make a decision here before investing serious time in one path or the other.

There are, I think, a few categories of concerns: …