WIP: add pd.read_ipc and DataFrame.to_ipc to provide efficient serialization to/from memory #15907

jreback · 2017-04-05T18:40:33Z

No description provided.

jreback · 2017-04-05T18:40:45Z

chris-b1 · 2017-04-05T18:56:04Z

Given that this is a pretty 'advanced' feature (and presumably an unstable format?), maybe would make sense to expose these functions somewhere like pandas.api.lib rather than in the main namespace?

jreback · 2017-04-05T18:58:35Z

@chris-b1 yeah probably. This is just trying to get it working (today) locally. :> (it's not actually exposed anywhere atm, only in the local module).

codecov · 2017-04-05T19:09:01Z

Codecov Report

Merging #15907 into master will decrease coverage by 0.1%.
The diff coverage is 0%.

@@            Coverage Diff             @@
##           master   #15907      +/-   ##
==========================================
- Coverage      91%    90.9%   -0.11%     
==========================================
  Files         145      146       +1     
  Lines       49576    49631      +55     
==========================================
  Hits        45118    45118              
- Misses       4458     4513      +55

| Flag | Coverage Δ | |
|---|---|---|
| #multiple | `88.67% <0%> (-0.1%)` | ⬇️ |
| #single | `40.53% <0%> (-0.05%)` | ⬇️ |

| Impacted Files | Coverage Δ |
|---|---|
| pandas/io/ipc.py | `0% <0%> (ø)` |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update f35209e...f727477.

wesm · 2017-04-05T19:11:17Z

I agree putting such things into an experimental namespace would be a good idea

jorisvandenbossche · 2017-04-05T19:37:05Z

Genuine question: since we are typically trying to limit pandas' scope, why should we include this in the core package? Instead of having this live in the package implementing this / a separate package ('pandas-ipc') providing this API?

jreback · 2017-04-05T21:42:27Z

> Genuine question: since we are typically trying to limit pandas' scope, why should we include this in the core package? Instead of having this live in the package implementing this / a separate package ('pandas-ipc') providing this API?

True, we are trying to limit scope. This is essentially an ipc version of arrow (which also backs feather). So this ATM expands scope, though eventually would limit it somewhat (as the back-end would have less tooling).

Could consider this as pandas-ipc though this is pretty 'simple' and not meant as a full-fledged package, mostly a passthru / convenient interface for others (e.g. dask).

wesm · 2017-04-05T21:49:20Z

I think it would be useful for pandas to have more robust support for various transient serialization formats, with a choice between pickle (most compatible) vs msgpack vs arrow vs other things. Whether the implementation of this goes into core pandas, or into a "leaf" library that gets imported, I don't have a strong opinion

jreback · 2017-06-10T19:02:52Z

Closing for now. This will be pretty transparent with pyarrow, so this interface would just be a simple wrapper. Let's see if it's even needed.

jreback · 2017-09-12T13:02:47Z

so this is now available in a released version in arrow: https://arrow.apache.org/docs/python/ipc.html (IIRC 0.5.0 has full support). appetite for this in main pandas as read_ipc/to_ipc ?

@wesm @cpcloud @jorisvandenbossche

wesm · 2017-09-12T13:15:09Z

Depending on what memory format these functions create it may affect the name. If it's vanilla arrow stream (schema + sequence of record batches), then it might be better to call it read_arrow_stream / to_arrow_stream (in the latter case, you could select a batch size to use for chunking the DataFrame once someone implements this)

jreback added the IO Data and Enhancement labels Apr 5, 2017

jreback force-pushed the ipc branch from c5f846f to c7a84c7 Compare April 7, 2017 20:24

jreback added 2 commits April 8, 2017 18:02

ENH: add pd.read_ipc and pd.to_ipc to provide efficient serialization…

d4ba982

… to/from memory

pass a dict of {'engine': engine_string, 'data': data dict}

f727477

jreback force-pushed the ipc branch from c7a84c7 to f727477 Compare April 8, 2017 22:07

jreback closed this Jun 10, 2017

jreback mentioned this pull request Oct 12, 2019

replace _msgpack with _pyarrow #28944

Closed
