forked from pandas-dev/pandas
[PoC] Allow JIT compilation with an internal API #6
Open: datapythonista wants to merge 35 commits into main from bodo_frame_apply
updates: - [github.com/astral-sh/ruff-pre-commit: v0.9.4 → v0.9.9](astral-sh/ruff-pre-commit@v0.9.4...v0.9.9) - [github.com/PyCQA/isort: 6.0.0 → 6.0.1](PyCQA/isort@6.0.0...6.0.1) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
* ENH: Add HalfYear offsets * Add entry to whatsnew * Resolve cython typing issue
* test_datetimes.py: fix literal string * fix test * fix repeated whitespace * add whatsnew entry * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
…-dev#47428) (pandas-dev#60758) Co-authored-by: Jeremy Tuloup <[email protected]>
Fix Styler.to_latex to be in Writer column
changed "normalise" to "normalize"
* Updated set_index doc with a warning * Updated set_index parameter append along with an example * Updated set_index example for append * Updated set_index example
* BUG: Recognize chained fsspec URLs * Add whatsnew note * Rename regex variable appropriately and allow more complex chaining * Fix pre-commit
Remove bogus syntax highlighting on LICENSE in overview.rst
* DOC: Add link description Also remove errant space * fix line too long * Undo space removal
* Modify an existing test to cover the issue with na_pos > 128. * Change na_position type from int8_t and int64_t consistently to Py_ssize_t. * Add What's New entry. * Sort whatsnew entries alphabetically * Improve the whatsnew entry. * Move whatsnew entry from v2.3.0.rst to v3.0.0.rst. * Update doc/source/whatsnew/v3.0.0.rst Co-authored-by: Matthew Roeschke <[email protected]> * Undo remove '-'. * Sort whatsnew entries alphabetically. --------- Co-authored-by: avm19 <[email protected]> Co-authored-by: Matthew Roeschke <[email protected]>
…ev#60983) * modified the files according to bug#60237 * Update doc/source/whatsnew/v3.0.0.rst Co-authored-by: Matthew Roeschke <[email protected]> * moved test case to frame and serier folders * fix pyarrow import error * inconsistent issue fix * added test cases and fixed old pr test cases * added rst and small changes in tests file * fixed column name issue for column wise concat * fixed text case for concat * fix test cases issue * Trigger redeployment * fixed reviewed changes and added extra test cases * removed duplicate test case --------- Co-authored-by: Matthew Roeschke <[email protected]>
) * BUG: Fix OverflowError in lib.maybe_indices_to_slice() This fixes this error when slicing massive dataframes: Traceback (most recent call last): File "<stdin>", line 1, in <module> File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4093, in __getitem__ return self._getitem_bool_array(key) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/frame.py", line 4155, in _getitem_bool_array return self._take_with_is_copy(indexer, axis=0) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4153, in _take_with_is_copy result = self.take(indices=indices, axis=axis) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/generic.py", line 4133, in take new_data = self._mgr.take( ^^^^^^^^^^^^^^^ File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/internals/managers.py", line 893, in take new_labels = self.axes[axis].take(indexer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/opt/anaconda3/lib/python3.12/site-packages/pandas/core/indexes/datetimelike.py", line 839, in take maybe_slice = lib.maybe_indices_to_slice(indices, len(self)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "lib.pyx", line 522, in pandas._libs.lib.maybe_indices_to_slice OverflowError: value too large to convert to int * Sort whatsnew entries * Set type hint back to int --------- Co-authored-by: benjamindonnachie <[email protected]>
* ENH: Add Rolling.nunique() * Add docstring for Expanding.nunique() * Add a test for float precision issues
* DOC: Add doc for half year offsets * Fix freq strings * Fix docstring error * Fix more docstring errors
The approach here is to use a `jit` parameter for any function that could make sense to JIT in pandas (`DataFrame.apply`, `Series.map`, `SeriesGroupBy.transform`...) that delegates 100% of the logic to the JIT compiler (Numba or Bodo).

The final user API would look like:
Which I think is very simple and intuitive, and at the same time makes users import `numba` and `bodo` themselves, creating the right impression that they are using those libraries to JIT compile, and that it's not something provided by pandas. At least that's my expectation, maybe others disagree.

I think this approach is very convenient for the pandas team, as maintaining the changes in pandas is trivial. And I think it should be very convenient for Bodo, which doesn't depend on reviews and decisions from pandas, as it will be Bodo maintaining all the logic. Also, Bodo can probably release much faster than pandas will, speeding up the release of new features and bug fixes.
The exact internal API (the `__pandas_udf__` function in this PR) can probably be improved by Bodo (and Numba). But it's probably better to first discuss whether this is the approach we want to implement, and then discuss the details of the exact API.
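
To make the delegation idea concrete, here is a toy sketch of how a `jit` parameter could hand all of the work to an object exposing `__pandas_udf__`. It does not use pandas, Numba or Bodo: `ToyFrame`, `ToyJit`, the `"apply_rows"` method name and the argument list of `__pandas_udf__` are all hypothetical stand-ins chosen for illustration, not the signature used in this PR.

```python
from typing import Any, Callable, List, Optional, Sequence, Tuple


class ToyJit:
    """Hypothetical stand-in for a JIT decorator object (e.g. from Numba or Bodo)."""

    def __init__(self) -> None:
        self.calls = 0  # counts how many times the host library delegated to us

    def __pandas_udf__(
        self,
        data: Sequence[Any],
        method: str,
        func: Callable[..., Any],
        args: Tuple[Any, ...],
        kwargs: dict,
    ) -> List[Any]:
        # Assumed signature. A real compiler would compile `func` together
        # with the loop here; this toy version just runs plain Python.
        self.calls += 1
        if method == "apply_rows":
            return [func(row, *args, **kwargs) for row in data]
        raise NotImplementedError(f"unsupported method: {method}")


class ToyFrame:
    """Hypothetical stand-in for a DataFrame that holds rows as tuples."""

    def __init__(self, rows: Sequence[Tuple[Any, ...]]) -> None:
        self.rows = list(rows)

    def apply(
        self,
        func: Callable[..., Any],
        jit: Optional[Any] = None,
        args: Tuple[Any, ...] = (),
        **kwargs: Any,
    ) -> List[Any]:
        if jit is None:
            # Default path: the host library runs the UDF itself.
            return [func(row, *args, **kwargs) for row in self.rows]
        if not hasattr(jit, "__pandas_udf__"):
            raise ValueError("the jit object must implement __pandas_udf__")
        # Delegation path: the host library does nothing but forward the call.
        return jit.__pandas_udf__(self.rows, "apply_rows", func, args, kwargs)


frame = ToyFrame([(1, 2), (3, 4)])
engine = ToyJit()
result_jit = frame.apply(sum, jit=engine)
result_plain = frame.apply(sum)
```

The point of the design is visible in `ToyFrame.apply`: the host library only checks for the hook and forwards the call, so everything about compilation and execution stays in the JIT library.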