WIP: add pd.read_ipc and DataFrame.to_ipc to provide efficient serialization to/from memory #15907
Given that this is a fairly 'advanced' feature (and presumably an unstable format?), it might make sense to expose these functions somewhere like …
@chris-b1 yeah, probably. This is just trying to get it working locally (today). :> (It's not actually exposed anywhere atm, only in the local module.)
Codecov Report
@@            Coverage Diff            @@
##           master   #15907      +/-   ##
==========================================
- Coverage      91%    90.9%   -0.11%
==========================================
  Files         145      146       +1
  Lines       49576    49631      +55
==========================================
  Hits        45118    45118
- Misses       4458     4513      +55
I agree that putting such things into an experimental namespace would be a good idea.
Genuine question: since we are typically trying to limit pandas' scope, why should we include this in the core package, instead of having it live in the package that implements it, or in a separate package ('pandas-ipc') providing this API?
True, we are trying to limit scope. This is essentially an IPC version of … Could consider this as …
I think it would be useful for pandas to have more robust support for various transient serialization formats, with a choice between pickle (most compatible), msgpack, Arrow, and others. I don't have a strong opinion on whether the implementation goes into core pandas or into a "leaf" library that gets imported.
Closing for now. This will be pretty transparent with …
So this is now available in a released version of Arrow: https://arrow.apache.org/docs/python/ipc.html (IIRC 0.5.0 has full support). Is there appetite for this in main pandas as …?
The memory format these functions produce may affect the name. If it's a vanilla Arrow stream (a schema followed by a sequence of record batches), then it might be better to call it …