Feature: implement Bokeh dataframe plotability #6962

Closed
michaelaye opened this issue Apr 25, 2014 · 10 comments

@michaelaye
Contributor

Bokeh specializes in large data and uses pandas in some of its examples (e.g. here).
As pandas is built to work with large data-sets as well, I believe they would make an excellent couple!

However, I tried the built-in DataFrame plotting in the glucose notebook without success: I get either a TypeError or a KeyError, depending on how I call the plot:

data.head()

                      isig  glucose  inrange
datetime
2010-03-24 09:51:00  22.59      258    False
2010-03-24 09:56:00  22.52      260    False
2010-03-24 10:01:00  22.23      258    False
2010-03-24 10:06:00  21.56      254    False
2010-03-24 10:11:00  20.79      246    False

data.isig.plot()

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-23-14244d9a86ef> in <module>()
----> 1 data.isig.plot()

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in plot_series(series, label, kind, use_index, rot, xticks, yticks, xlim, ylim, ax, style, grid, legend, logx, logy, secondary_y, **kwds)
   2114                      secondary_y=secondary_y, **kwds)
   2115 
-> 2116     plot_obj.generate()
   2117     plot_obj.draw()
   2118 

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in generate(self)
    916     def generate(self):
    917         self._args_adjust()
--> 918         self._compute_plot_data()
    919         self._setup_subplots()
    920         self._make_plot()

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in _compute_plot_data(self)
   1003         if is_empty:
   1004             raise TypeError('Empty {0!r}: no numeric data to '
-> 1005                             'plot'.format(numeric_data.__class__.__name__))
   1006 
   1007         self.data = numeric_data

TypeError: Empty 'Series': no numeric data to plot

data.plot(data.index.to_series(), data.isig)

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-24-e46a01c88c5e> in <module>()
----> 1 data.plot(data.index.to_series(), data.isig)

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/tools/plotting.pyc in plot_frame(frame, x, y, subplots, sharex, sharey, use_index, figsize, grid, legend, rot, ax, style, title, xlim, ylim, logx, logy, xticks, yticks, kind, sort_columns, fontsize, secondary_y, **kwds)
   1985             label = x if x is not None else frame.index.name
   1986             label = kwds.pop('label', label)
-> 1987             ser = frame[y]
   1988             ser.index.name = label
   1989 

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/core/frame.pyc in __getitem__(self, key)
   1646         if isinstance(key, (Series, np.ndarray, list)):
   1647             # either boolean or fancy integer index
-> 1648             return self._getitem_array(key)
   1649         elif isinstance(key, DataFrame):
   1650             return self._getitem_frame(key)

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/core/frame.pyc in _getitem_array(self, key)
   1690             return self.take(indexer, axis=0, convert=False)
   1691         else:
-> 1692             indexer = self.ix._convert_to_indexer(key, axis=1)
   1693             return self.take(indexer, axis=1, convert=True)
   1694 

/Users/maye/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pandas-0.13.1_648_gad1f47d-py2.7-macosx-10.6-x86_64.egg/pandas/core/indexing.pyc in _convert_to_indexer(self, obj, axis, is_setter)
   1065                     if isinstance(obj, tuple) and is_setter:
   1066                         return {'key': obj}
-> 1067                     raise KeyError('%s not in index' % objarr[mask])
   1068 
   1069                 return indexer

KeyError: "['22.59' '22.52' '22.23' ..., '29.06' '29.3' '30.8'] not in index"

I am guessing that we need to teach the DataFrame to check whether the Bokeh framework is active and then point the plotter to the slightly different syntax for Bokeh plotting, which we would of course need to implement. Can somebody shed some light on what is going wrong in general in the examples above?

Given the possibility of interactive, zoomable plots over time series that are usually far too long and dense to inspect otherwise, I am very intrigued by this feature and would love to see it implemented. If further discussion in this issue can guide me on how to detect the active plotting framework (or if somebody would take that part on, that would be great), I volunteer to wrap the different Bokeh plotting syntaxes, so that e.g. the standard df.colname.plot() also works with Bokeh underneath.

Sounds good?
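As a rough illustration of the "detect the active plotting framework" idea, here is a minimal pure-Python sketch of a backend registry. Every name in it (`register_backend`, `set_backend`, `plot`) is hypothetical and is not pandas or Bokeh API; the two "backends" are stubs that only return a description of what they would draw.

```python
from typing import Callable, Dict, List

# Hypothetical registry: none of these names exist in pandas or Bokeh.
_BACKENDS: Dict[str, Callable[[List[float]], str]] = {}
_active_backend = "matplotlib"


def register_backend(name: str, plot_func: Callable[[List[float]], str]) -> None:
    """Make a plotting function available under a backend name."""
    _BACKENDS[name] = plot_func


def set_backend(name: str) -> None:
    """Select which registered backend later plot() calls dispatch to."""
    global _active_backend
    if name not in _BACKENDS:
        raise ValueError(f"unknown plotting backend: {name!r}")
    _active_backend = name


def plot(values: List[float]) -> str:
    """Dispatch to whichever backend is currently active."""
    return _BACKENDS[_active_backend](values)


# Stub backends standing in for real matplotlib / Bokeh drawing code.
register_backend("matplotlib", lambda values: f"matplotlib line plot of {len(values)} points")
register_backend("bokeh", lambda values: f"bokeh line plot of {len(values)} points")

default_result = plot([22.59, 22.52, 22.23])  # handled by the matplotlib stub
set_backend("bokeh")
bokeh_result = plot([22.59, 22.52, 22.23])    # same call, now handled by the bokeh stub
```

The point of the design is that the calling code (here `plot`, in the issue `df.colname.plot()`) stays unchanged while the active backend decides how to render. Later pandas releases expose a `plotting.backend` option built on a similar idea.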

@jtratner
Contributor

Sounds like it would be a big win to take advantage of Bokeh. For the
moment - can you print out the dtypes so we can narrow this down?

@michaelaye
Contributor Author

yeah, wow, sorry..

isig is of type object, totally did not expect that, as the plotting works 'magically' with bokeh.
I now converted that:

isig       float64
glucose      int64
dtype: object

and the usual df.isig.plot() now works, but it still opens a matplotlib plot.
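The dtype check and conversion described here can be sketched as follows. This is a minimal example with made-up rows mirroring the `data.head()` output above; the variable names are illustrative, and the original notebook's actual loading code is not shown in this thread.

```python
import pandas as pd

# Toy frame in which isig was read in as strings (object dtype),
# the situation that leads to "no numeric data to plot".
raw = pd.DataFrame(
    {
        "isig": pd.Series(["22.59", "22.52", "22.23"], dtype=object),
        "glucose": [258, 260, 258],
    }
)
dtypes_before = raw.dtypes  # isig shows up as object here

# Convert the column to a numeric dtype so the plotting code accepts it.
converted = raw.copy()
converted["isig"] = converted["isig"].astype("float64")
dtypes_after = converted.dtypes  # isig is now float64
```

If the column may contain unparseable entries, `pd.to_numeric(converted["isig"], errors="coerce")` is an alternative that turns them into NaN instead of raising.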

Interestingly, a plot of one column against the other still fails with KeyErrors and index-out-of-bounds errors. I thought the syntax df.plot(col1, col2) was possible, unless I remember that wrong. If you think that should work, I can open another issue.

@jorisvandenbossche
Member

@michaelaye If I understand correctly, what you show above is just a pandas plotting call (with matplotlib underneath, as it is implemented now), so if something fails there, it only has to do with pandas. I don't understand what this has to do with bokeh?

But of course, the feature request to have bokeh integration is a legitimate one. But that is another discussion? Or am I missing something?

And df.plot(x='col1', y='col2') should normally work.

@michaelaye
Contributor Author

First, I hope the title of the issue makes it clear that this is about bokeh integration. For me the first step was to try it out naively, with the bokeh session management redirecting stuff (I have no clue) to the notebook (i.e. output_notebook()). As I don't know how DataFrame.plot() is implemented, I was just assuming that the errors above could be caused by bokeh interfering with the graphics output when pandas.DataFrame tries to plot something. Anyhow, as I had an MPL plot popping up with the standard df.colname.plot(), that's been cleared up. I still can't make the data plot with (x=, y=), so I'll make another issue out of it. This one should be left for bokeh things.

@jorisvandenbossche
Member

@michaelaye yep, understood, I just wanted to make clear that the errors had nothing to do with the issue you wanted to raise. (For df.plot(x=, y=), note that the arguments should be strings indicating column names, not the series/columns themselves, as in the example you gave above.)
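To make the difference concrete, here is a small sketch with made-up rows modelled on the `data.head()` output above. It shows why passing a Series fails (pandas looks its values up as column labels, as in the `frame[y]` line of the traceback) and how column-name strings avoid that. The helper `plot_columns` is a hypothetical wrapper, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame(
    {"isig": [22.59, 22.52, 22.23], "glucose": [258, 260, 258]},
    index=pd.to_datetime(
        ["2010-03-24 09:51:00", "2010-03-24 09:56:00", "2010-03-24 10:01:00"]
    ),
)
df.index.name = "datetime"

# Using a Series as a key makes pandas treat its *values* as column labels,
# so 22.59, 22.52, ... are searched for among the columns and not found.
lookup_error = None
try:
    df[df["isig"]]
except KeyError as exc:
    lookup_error = exc

# A column-name string selects the column itself.
glucose = df["glucose"]


def plot_columns(frame, x, y):
    """Hypothetical helper: plot column `y` against column `x` by name.

    Requires matplotlib to be installed, since pandas draws with it by default.
    """
    return frame.plot(x=x, y=y)
```

With matplotlib available, `plot_columns(df, "isig", "glucose")` is then equivalent to `df.plot(x="isig", y="glucose")`.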

But on the bokeh integration, I think the interesting route for now is bokeh's matplotlib compatibility (they try to render a matplotlib figure with bokeh). I think they are still working on this, but maybe you can try that out? (I think it is the function bokeh.pyplot.show_bokeh() that you have to call after you have created the figure.)

I think for pandas itself this is a more interesting path than fully implementing an alternative .plot that uses bokeh in the background instead of matplotlib.

@jreback added this to the Someday milestone May 16, 2014
@bryevdv

bryevdv commented Jul 28, 2014

Hey guys, right now our MPL compat comes from @jakevdp's mplexporter module. At SciPy the MPL guys stated their intention to make a more fully featured and robust JSON import/export facility for MPL that libraries like Bokeh, mpld3, and plot.ly can use to provide more complete MPL support. We will certainly adopt this when it is ready, but I don't know what their expected timeframe is.

I do think that native Bokeh integration is worth considering; you could get more interesting features like linked panning and linked selections much more easily. But we are happy to help integrate Bokeh more closely any way we can, whatever route you pursue.

@michaelaye
Contributor Author

Good to hear. The most interesting features of Bokeh for me are higher performance with a lot of points and the ability to create a self-contained HTML file with plots that stay interactive. (I know about mpld3, but as Jake himself writes, that's not very performant when it comes to millions of points.)

@bryevdv

bryevdv commented Jul 28, 2014

Bokeh is not going to be naively performant with millions of points either, unless you include the Abstract Rendering and downsampling work that is being integrated. But even a 1000x1000 canvas has only a million pixels... you are never going to want to actually plot millions of points without some kind of downsampling or pre-rasterization aggregation, but those are things that Bokeh could help with as well.
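One simple form of the downsampling mentioned here can be done on the pandas side before any plotting library sees the data. The sketch below uses synthetic data and plain `resample`; it is not Bokeh's Abstract Rendering, just an assumed illustration of aggregating a long time series into per-day min/mean/max values so that spikes are not lost.

```python
import pandas as pd

# Synthetic series: 100,000 readings at 5-minute spacing (values are made up).
index = pd.date_range("2010-03-24 09:51", periods=100_000, freq="5min")
series = pd.Series(range(100_000), index=index, dtype="float64")

# Aggregate to one row per day; keeping min and max alongside the mean
# preserves the envelope of the signal that a plain mean would smooth away.
daily = series.resample("1D").agg(["min", "mean", "max"])

points_before = len(series)
points_after = len(daily)
```

The resulting frame has a few hundred rows instead of 100,000, which any plotting backend can draw directly; zooming in would then call for re-aggregating the visible range at a finer frequency.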

@michaelaye
Contributor Author

I didn't mean physically displaying millions of items, but from what I tried, Bokeh's zoom interactivity stays amazingly performant even though I'm sending it millions of items. I managed to keep 12 million data points zoomable with acceptable performance. Whatever Bokeh does with the data automatically has worked very well for me so far.

@TomAugspurger
Contributor

Closing in favor of #14130
