Feature: implement Bokeh dataframe plotability #6962

michaelaye · 2014-04-25T05:00:09Z

Bokeh specializes in large data and uses pandas in some of its examples (e.g. here).
As pandas is built to work with large datasets as well, I believe they would make an excellent couple!

However, I tried the built-in DataFrame plotting in the glucose notebook without success: I get either a TypeError or a KeyError, depending on how I initiate the plot:

data.head()

                      isig  glucose inrange
datetime
2010-03-24 09:51:00  22.59      258   False
2010-03-24 09:56:00  22.52      260   False
2010-03-24 10:01:00  22.23      258   False
2010-03-24 10:06:00  21.56      254   False
2010-03-24 10:11:00  20.79      246   False

data.isig.plot()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-14244d9a86ef> in <module>()
----> 1 data.isig.plot()

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in plot_series(series, label, kind, use_index, rot, xticks, yticks, xlim, ylim, ax, style, grid, legend, logx, logy, secondary_y, **kwds)
   2114                      secondary_y=secondary_y, **kwds)
   2115 
-> 2116     plot_obj.generate()
   2117     plot_obj.draw()
   2118 

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in generate(self)
    916     def generate(self):
    917         self._args_adjust()
--> 918         self._compute_plot_data()
    919         self._setup_subplots()
    920         self._make_plot()

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in _compute_plot_data(self)
   1003         if is_empty:
   1004             raise TypeError('Empty {0!r}: no numeric data to '
-> 1005                             'plot'.format(numeric_data.__class__.__name__))
   1006 
   1007         self.data = numeric_data

TypeError: Empty 'Series': no numeric data to plot

data.plot(data.index.to_series(), data.isig)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-24-e46a01c88c5e> in <module>()
----> 1 data.plot(data.index.to_series(), data.isig)

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in plot_frame(frame, x, y, subplots, sharex, sharey, use_index, figsize, grid, legend, rot, ax, style, title, xlim, ylim, logx, logy, xticks, yticks, kind, sort_columns, fontsize, secondary_y, **kwds)
   1985             label = x if x is not None else frame.index.name
   1986             label = kwds.pop('label', label)
-> 1987             ser = frame[y]
   1988             ser.index.name = label
   1989 

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
   1646         if isinstance(key, (Series, np.ndarray, list)):
   1647             # either boolean or fancy integer index
-> 1648             return self._getitem_array(key)
   1649         elif isinstance(key, DataFrame):
   1650             return self._getitem_frame(key)

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/core/frame.pyc in _getitem_array(self, key)
   1690             return self.take(indexer, axis=0, convert=False)
   1691         else:
-> 1692             indexer = self.ix._convert_to_indexer(key, axis=1)
   1693             return self.take(indexer, axis=1, convert=True)
   1694 

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1065                     if isinstance(obj, tuple) and is_setter:
   1066                         return {'key': obj}
-> 1067                     raise KeyError('%s not in index' % objarr[mask])
   1068 
   1069                 return indexer

KeyError: "['22.59' '22.52' '22.23' ..., '29.06' '29.3' '30.8'] not in index"

I am guessing that we need to teach the DataFrame to check whether the Bokeh framework is active and then point the plotter to the slightly different syntax for Bokeh plotting, which we would need to implement, of course. Can somebody shed light on what is going wrong in general in the above examples?

I am very intrigued by the prospect of interactive, zoomable plots for time series that are usually far too long and dense to inspect statically, and I would love to see this implemented. If further discussion in this issue can guide me on how to recognize the active plotting framework (or if somebody would take that part on, which would be great), I volunteer to wrap the different Bokeh plotting syntaxes so that e.g. the standard df.colname.plot() works with Bokeh underneath as well.

Sounds good?
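
The "recognition" step proposed above could be sketched as a small registry that routes a plot call to whichever backend is marked active. Everything here is hypothetical: register_backend, set_backend, plot, and the two stub backends are illustrative names, not pandas or Bokeh API, and the stubs only return a description instead of drawing anything.

```python
import pandas as pd

# Hypothetical registry mapping a backend name to a plotting callable.
_BACKENDS = {}
_active_backend = "matplotlib"


def register_backend(name, func):
    """Register a callable with the signature func(series, **kwargs)."""
    _BACKENDS[name] = func


def set_backend(name):
    """Mark an already registered backend as the active one."""
    global _active_backend
    if name not in _BACKENDS:
        raise ValueError(f"unknown plotting backend: {name!r}")
    _active_backend = name


def plot(series, **kwargs):
    """Dispatch the plot call to whichever backend is currently active."""
    return _BACKENDS[_active_backend](series, **kwargs)


# Stub backends: a real wrapper would call into matplotlib or Bokeh here.
def _matplotlib_stub(series, **kwargs):
    return f"matplotlib line plot of {series.name!r} ({len(series)} points)"


def _bokeh_stub(series, **kwargs):
    return f"bokeh line plot of {series.name!r} ({len(series)} points)"


register_backend("matplotlib", _matplotlib_stub)
register_backend("bokeh", _bokeh_stub)

isig = pd.Series([22.59, 22.52, 22.23], name="isig")
print(plot(isig))      # handled by the matplotlib stub
set_backend("bokeh")
print(plot(isig))      # same call, now handled by the bokeh stub
```

The point of the design is that the calling code stays the same; only the registry entry behind it changes.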

jtratner · 2014-04-25T05:10:45Z

Sounds like it would be a big win to take advantage of Bokeh. For the
moment - can you print out the dtypes so we can narrow this down?

michaelaye · 2014-04-25T07:44:54Z

yeah, wow, sorry..

isig is of type object; I totally did not expect that, as the plotting works 'magically' with bokeh.
I have now converted it:

isig       float64
glucose      int64
dtype: object

and the usual df.isig.plot() works now, but it still opens a matplotlib plot.
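
For reference, a minimal sketch of one way such a conversion can be done. The small frame below is a made-up stand-in for the glucose data, with isig deliberately stored as strings; pd.to_numeric with errors="coerce" is one option among several (astype(float) would also work when every entry parses cleanly).

```python
import pandas as pd

# Made-up stand-in for the glucose data: "isig" holds strings, so the
# column has dtype object and pandas finds no numeric data to plot.
data = pd.DataFrame(
    {
        "isig": pd.Series(["22.59", "22.52", "22.23"], dtype=object),
        "glucose": [258, 260, 258],
    }
)
data.index = pd.to_datetime(
    ["2010-03-24 09:51:00", "2010-03-24 09:56:00", "2010-03-24 10:01:00"]
)
data.index.name = "datetime"
print(data.dtypes)

# Convert the strings to floats; errors="coerce" turns anything that
# cannot be parsed into NaN instead of raising.
data["isig"] = pd.to_numeric(data["isig"], errors="coerce")
print(data.dtypes)
```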

Interestingly, plotting one column against the other still fails with KeyErrors and index-out-of-bounds errors. I thought the syntax df.plot(col1, col2) was possible, unless I remember that wrong. If you think that should work, I can open another issue.

jorisvandenbossche · 2014-04-25T08:03:22Z

@michaelaye If I understand correctly, what you show above is just a pandas plotting call (with matplotlib underneath, as it is implemented now), so if something fails there, it is purely a pandas matter. I don't understand what this has to do with bokeh?

But of course, the feature request for bokeh integration is a legitimate one. Isn't that a separate discussion, though? Or am I missing something?

And df.plot(x='col1', y='col2') should normally work.

michaelaye · 2014-04-25T08:16:00Z

First, I hope the title of the issue makes it clear that this is about bokeh integration. My first step was to try it out naively, with the bokeh session management redirecting stuff (I have no clue how) to the notebook (i.e. output_notebook()). As I don't know how DataFrame.plot() is implemented, I simply assumed that the above errors could be caused by bokeh interfering with the graphics output when pandas.DataFrame tries to plot something. Anyhow, since an MPL plot popped up with the standard df.colname.plot(), that's been cleared up. I still can't make the data plot with (x=, y=), so I'll open another issue for that. This one should be left for bokeh things.

jorisvandenbossche · 2014-04-25T08:28:45Z

@michaelaye yep, understood, I just wanted to make clear that the errors had nothing to do with the issue you wanted to raise. (For df.plot(x=, y=), note that the arguments should be strings naming a column, not the series/columns themselves, as in the example you gave above.)
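
A small sketch of why passing the Series itself fails, using a made-up three-row frame in place of the glucose data. As the traceback above shows, plot_frame does frame[y]; with a Series as the key, pandas looks up the Series' values as column labels, which raises the KeyError. The plotting call itself is left as a comment so the snippet runs without a plotting library installed.

```python
import pandas as pd

# Made-up three-row frame standing in for the glucose data.
data = pd.DataFrame(
    {"isig": [22.59, 22.52, 22.23], "glucose": [258, 260, 258]},
    index=pd.to_datetime(
        ["2010-03-24 09:51:00", "2010-03-24 09:56:00", "2010-03-24 10:01:00"]
    ),
)
data.index.name = "datetime"

# Using a Series as the key: its values (22.59, ...) are treated as
# column labels, none of which exist, so the lookup raises KeyError.
raised = False
try:
    data[data["isig"]]
except KeyError as err:
    raised = True
    print("KeyError:", err)

# Using column names: move the index into a regular column, then refer
# to both axes by name.
frame = data.reset_index()
selected = frame[["datetime", "isig"]]
print(selected)

# With a plotting library installed, the matching call would be:
# frame.plot(x="datetime", y="isig")
```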

But on the bokeh integration, I think the interesting route for now is bokeh's matplotlib compatibility (they try to render a matplotlib figure with bokeh). I think they are still working on this, but maybe you can try it out? (I think it is the function bokeh.pyplot.show_bokeh() that you have to call after you have created the figure.)

I think for pandas itself this is a more interesting path than fully implementing an alternative .plot that uses bokeh in the background instead of matplotlib.

bryevdv · 2014-07-28T21:30:57Z

Hey guys, right now our MPL compat comes from @jakevdp's mplexporter module. At SciPy the MPL guys stated their intention to build a more fully featured and robust JSON import/export facility for MPL that libraries like Bokeh, mpld3, and plot.ly can use to provide more complete MPL support. We will certainly adopt this when it is ready, but I don't know what their expected timeframe is.

I do think that native Bokeh integration is worth considering; you could get more interesting features like linked panning and linked selections much more easily. But we are happy to help integrate Bokeh more closely any way we can, whatever route you pursue.

michaelaye · 2014-07-28T21:34:36Z

Good to hear. The most interesting features of Bokeh for me are higher performance with a lot of points and the ability to create a self-contained HTML file with plots that stay interactive. (I know about mpld3, but as Jake himself writes, that's not very performant when it comes to millions of points.)

bryevdv · 2014-07-28T21:38:16Z

Bokeh is not going to be naively performant with millions of points, either, unless you are including the Abstract Rendering and downsampling work that is being integrated. But even with, say, a 1000x1000 canvas you have a million pixels... you are never going to want to actually plot millions of points without some kind of downsampling or pre-rasterization aggregation, but those are things that Bokeh could help with as well.
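
As an illustration of the downsampling idea on the pandas side, here is a minimal sketch that reduces a synthetic one-million-point series to hourly min/mean/max before anything is handed to a plotting library. The data, the one-hour bin width, and the choice of aggregates are all assumptions made for the example; this is not the Abstract Rendering work mentioned above.

```python
import numpy as np
import pandas as pd

# Synthetic series: one million samples, one per second (a random walk).
rng = np.random.default_rng(0)
n = 1_000_000
index = pd.date_range("2010-03-24", periods=n, freq=pd.Timedelta(seconds=1))
series = pd.Series(rng.normal(size=n).cumsum(), index=index)

# Aggregate into one-hour bins. Keeping min and max alongside the mean
# preserves the envelope of the signal, so spikes are not averaged away.
binned = series.resample(pd.Timedelta(hours=1)).agg(["min", "mean", "max"])

print(len(series), "points reduced to", len(binned), "bins")
print(binned.head())
```

The bin width would normally be chosen from the visible time range and the canvas width, so that there is roughly one bin per pixel column.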

michaelaye · 2014-07-28T21:57:23Z

I didn't mean the physical display of millions of items, but from what I tried, Bokeh's zoom interactivity stays amazingly performant even though I'm sending it millions of items. I managed to keep 12 million data points zoomable with acceptable performance. Whatever Bokeh does with them automatically has worked very well for me so far.

TomAugspurger · 2017-10-04T15:51:59Z

Closing in favor of #14130

jreback added the Visualization label May 16, 2014

jreback added this to the Someday milestone May 16, 2014

michaelaye mentioned this issue Sep 11, 2014

TimeSeries chart too restrictive on required DataFrame structure? bokeh/bokeh#1190

Closed

damianavila mentioned this issue Oct 6, 2014

replicate pandas dataframe plotting API bokeh/bokeh#582

Closed

vitiral mentioned this issue Jan 30, 2015

Make automated plotting of pandas easier, I need some help! bokeh/bokeh#1790

Closed

TomAugspurger closed this as completed Oct 4, 2017
