[PoC] Allow JIT compilation with an internal API #6

datapythonista · 2025-03-02T10:58:42Z

The approach here is to add a jit parameter to any pandas function where JIT compilation could make sense (DataFrame.apply, Series.map, SeriesGroupBy.transform...), and to delegate 100% of the logic to the JIT compiler (Numba or Bodo).

The final user API would look like:

df.apply(lambda x: x.A + x.B, axis=1, jit=bodo.jit(parallel=True))

I think this is very simple and intuitive. At the same time, it makes users import numba and bodo themselves, which creates the right impression: they are using those libraries to JIT compile, and it is not something provided by pandas. At least that's my expectation; maybe others disagree.
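To make the delegation idea concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: ToyEngine and apply_rows are hypothetical stand-ins (not the real pandas, Numba or Bodo code), and the (rows, func) signature given to __pandas_udf__ is made up for the example, since the actual signature is defined in the PR diff and not shown here.

```python
# Toy stand-ins only: none of these names are the real pandas, Numba or Bodo API.


class ToyEngine:
    """Hypothetical third-party JIT engine (stands in for numba.jit / bodo.jit)."""

    def __init__(self, parallel=False):
        self.parallel = parallel
        self.calls = 0  # counts how many times the caller delegated to this engine

    def __pandas_udf__(self, rows, func):
        # A real engine would compile `func` here; the toy simply runs it.
        # The (rows, func) signature is an assumption made for this sketch.
        self.calls += 1
        return [func(row) for row in rows]


def apply_rows(rows, func, jit=None):
    """Simplified stand-in for DataFrame.apply(func, axis=1, jit=...)."""
    if jit is None:
        # Default path: the plain Python loop stays on the pandas side.
        return [func(row) for row in rows]
    # JIT path: all of the logic is handed over to the engine.
    return jit.__pandas_udf__(rows, func)


rows = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
engine = ToyEngine(parallel=True)
result = apply_rows(rows, lambda x: x["A"] + x["B"], jit=engine)
print(result)  # [3, 7]
```

The point of the sketch is that the pandas side reduces to a single branch, while everything engine-specific lives in the object the user constructed and passed in.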

I think this approach is very convenient for the pandas team, as maintaining the changes in pandas is trivial. It should also be very convenient for Bodo, which will maintain all the logic itself and so won't depend on reviews and decisions from pandas. Bodo can also probably release much faster than pandas, speeding up the delivery of new features and bug fixes.

The exact internal API (the __pandas_udf__ function in this PR) can probably be improved by Bodo (and Numba). But it is probably better to first discuss whether this is the approach we want to implement, and then discuss the details of the exact API.
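As one possible starting point for that API discussion, the hook could be written down as an explicit contract and checked before any work starts. The sketch below uses typing.Protocol from the standard library; PandasUDFEngine, validate_engine and GoodEngine are hypothetical names, and the (data, func, **kwargs) signature is an assumption for illustration, not the signature used in this PR.

```python
from typing import Any, Callable, Protocol, runtime_checkable


@runtime_checkable
class PandasUDFEngine(Protocol):
    """Hypothetical contract for engines; the real signature is still open."""

    def __pandas_udf__(
        self, data: Any, func: Callable[..., Any], **kwargs: Any
    ) -> Any: ...


def validate_engine(jit: object) -> PandasUDFEngine:
    """Reject objects that do not expose the hook before delegating to them."""
    # isinstance() on a runtime-checkable protocol only checks that the
    # method exists; it does not verify the signature.
    if not isinstance(jit, PandasUDFEngine):
        raise TypeError(f"{type(jit).__name__} does not implement __pandas_udf__")
    return jit


class GoodEngine:
    """Toy engine that satisfies the hypothetical contract."""

    def __pandas_udf__(self, data, func, **kwargs):
        return [func(item) for item in data]


engine = validate_engine(GoodEngine())
doubled = engine.__pandas_udf__([1, 2, 3], lambda v: v * 2)

try:
    validate_engine(object())
    rejected = False
except TypeError:
    rejected = True
```

A structural check like this keeps pandas free of any import of Numba or Bodo: an engine only needs to expose the method, it does not need to inherit from a pandas class.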

updates:
- [github.com/astral-sh/ruff-pre-commit: v0.9.4 → v0.9.9](astral-sh/ruff-pre-commit@v0.9.4...v0.9.9)
- [github.com/PyCQA/isort: 6.0.0 → 6.0.1](PyCQA/isort@6.0.0...6.0.1)

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* test_datetimes.py: fix literal string
* fix test
* fix repeated whitespace
* add whatsnew entry
* [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

* Modify an existing test to cover the issue with na_pos > 128.
* Change na_position type from int8_t and int64_t consistently to Py_ssize_t.
* Add What's New entry.
* Sort whatsnew entries alphabetically
* Improve the whatsnew entry.
* Move whatsnew entry from v2.3.0.rst to v3.0.0.rst.
* Update doc/source/whatsnew/v3.0.0.rst Co-authored-by: Matthew Roeschke <[email protected]>
* Undo remove '-'.
* Sort whatsnew entries alphabetically.

Co-authored-by: avm19 <[email protected]>
Co-authored-by: Matthew Roeschke <[email protected]>

…ev#60983)

* modified the files according to bug#60237
* Update doc/source/whatsnew/v3.0.0.rst Co-authored-by: Matthew Roeschke <[email protected]>
* moved test case to frame and serier folders
* fix pyarrow import error
* inconsistent issue fix
* added test cases and fixed old pr test cases
* added rst and small changes in tests file
* fixed column name issue for column wise concat
* fixed text case for concat
* fix test cases issue
* Trigger redeployment
* fixed reviewed changes and added extra test cases
* removed duplicate test case

Co-authored-by: Matthew Roeschke <[email protected]>

)

* BUG: Fix OverflowError in lib.maybe_indices_to_slice()

  This fixes this error when slicing massive dataframes:

      Traceback (most recent call last):
        File "<stdin>", line 1, in <module>
        File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in __getitem__
          return self._getitem_bool_array(key)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array
          return self._take_with_is_copy(indexer, axis=0)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy
          result = self.take(indices=indices, axis=axis)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take
          new_data = self._mgr.take(
                     ^^^^^^^^^^^^^^^
        File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take
          new_labels = self.axes[axis].take(indexer)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take
          maybe_slice = lib.maybe_indices_to_slice(indices, len(self))
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
        File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice
      OverflowError: value too large to convert to int

* Sort whatsnew entries
* Set type hint back to int

Co-authored-by: benjamindonnachie <[email protected]>

datapythonista and others added 30 commits March 2, 2025 17:43

[PoC] Allow JIT compilation with an internal API

7ec827e

Improving the documentation

bc2a178

CI

8b420cc

DOC: Correct typos in Working with text data (pandas-dev#61034)

8e9487a

ENH: Add HalfYear offsets (pandas-dev#60946)

826f0d3

* ENH: Add HalfYear offsets
* Add entry to whatsnew
* Resolve cython typing issue

CI: Bump GHA uses versions (pandas-dev#61039)

938763e

CI/TST: Fix xfail in test_columns_dtypes_not_invalid for pyarrow nigh…

7283953

…tly (pandas-dev#61042)

ENH: Add JupyterLite-powered shell for the website (reprise of pandas…

4f27380

…-dev#47428) (pandas-dev#60758) Co-authored-by: Jeremy Tuloup <[email protected]>

DOC Fix Styler.to_latex to be in Writer column (pandas-dev#61053)

9e2e65a

Fix Styler.to_latex to be in Writer column

Renamed "normalise" to "normalize" (pandas-dev#61051)

c1e57c9

changed "normalise" to "normalize"

DOC: Updated set_index doc with a warning (pandas-dev#60990)

56847c5

* Updated set_index doc with a warning
* Updated set_index parameter append along with an example
* Updated set_index example for append
* Updated set_index example

DOC: Correct a typo in ecosystem.md (pandas-dev#61059)

c8811fb

Adjust Docker File Key Value Format (pandas-dev#61050)

b8f6bac

BUG: Recognize chained fsspec URLs (pandas-dev#61041)

9528057

* BUG: Recognize chained fsspec URLs
* Add whatsnew note
* Rename regex variable appropriately and allow more complex chaining
* Fix pre-commit

DOC: Fix typo in Timestamp.isoformat (pandas-dev#61067)

f2c3144

repeated word

DOC: Fix syntax highlighting in overview (pandas-dev#61066)

e59a411

Remove bogus syntax highlighting on LICENSE in overview.rst

DOC: Add inference type information to Dataframe Apply (pandas-dev#61065

12d1dda

) * Add inference type info to apply * DOC: Add inference type information to Dataframe Apply

DOC: Add link description (pandas-dev#61063)

2030d9d

* DOC: Add link description Also remove errant space
* fix line too long
* Undo space removal

BUG: Fix MultiIndex from_tuples on tuples with NaNs (pandas-dev#60944)

f1b00b8

Better execution engine API

6a9ee5a

Fixing test

444de67

Added tests, fixed some bugs and added a release note

7e1e855

Removed temporary bodo decorator example

58fb30d

make mypy happy

c239fc9

BUG(string dtype): Empty sum produces incorrect result (pandas-dev#60936

dab1b88

)

datapythonista and others added 5 commits March 10, 2025 23:43

Typos

9567152

ENH: Add Rolling.nunique() (pandas-dev#61087)

781182c

* ENH: Add Rolling.nunique()
* Add docstring for Expanding.nunique()
* Add a test for float precision issues

DOC: Add doc for half year offsets (pandas-dev#61082)

513e787

* DOC: Add doc for half year offsets
* Fix freq strings
* Fix docstring error
* Fix more docstring errors

CI/TST: Address TestArrowArray::test_reduce_series_numeric supporting…

89bc204

… skew (pandas-dev#61098)

Merge remote-tracking branch 'upstream/main' into bodo_frame_apply

2ff333f
