ENH: Groupby.plot enhancement #8018


Closed
286 changes: 140 additions & 146 deletions doc/source/visualization.rst
@@ -265,50 +265,7 @@ You can pass other keywords supported by matplotlib ``hist``. For example, horiz
See the :meth:`hist <matplotlib.axes.Axes.hist>` method and the
`matplotlib hist documentation <http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.hist>`__ for more.


The existing ``DataFrame.hist`` interface can still be used to plot histograms.

.. ipython:: python

plt.figure();

@savefig hist_plot_ex.png
df['A'].diff().hist()

.. ipython:: python
:suppress:

plt.close('all')

:meth:`DataFrame.hist` plots the histograms of the columns on multiple
subplots:

.. ipython:: python

plt.figure()

@savefig frame_hist_ex.png
df.diff().hist(color='k', alpha=0.5, bins=50)


.. versionadded:: 0.10.0

The ``by`` keyword can be specified to plot grouped histograms:

.. ipython:: python
:suppress:

plt.close('all')
plt.figure()
np.random.seed(123456)

.. ipython:: python

data = pd.Series(np.random.randn(1000))

@savefig grouped_hist.png
data.hist(by=np.random.randint(0, 4, 1000), figsize=(6, 4))

.. note:: The existing ``DataFrame.hist`` interface can still be used to plot histograms.

.. _visualization.box:

@@ -377,69 +334,7 @@ For example, horizontal and custom-positioned boxplot can be drawn by
See the :meth:`boxplot <matplotlib.axes.Axes.boxplot>` method and the
`matplotlib boxplot documentation <http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.boxplot>`__ for more.


The existing ``DataFrame.boxplot`` interface can still be used to plot boxplots.

.. ipython:: python
:suppress:

plt.close('all')
np.random.seed(123456)

.. ipython:: python
:okwarning:

df = pd.DataFrame(np.random.rand(10,5))
plt.figure();

@savefig box_plot_ex.png
bp = df.boxplot()

You can create a stratified boxplot using the ``by`` keyword argument to create
groupings. For instance,

.. ipython:: python
:suppress:

plt.close('all')
np.random.seed(123456)

.. ipython:: python
:okwarning:

df = pd.DataFrame(np.random.rand(10,2), columns=['Col1', 'Col2'] )
df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])

plt.figure();

@savefig box_plot_ex2.png
bp = df.boxplot(by='X')

You can also pass a subset of columns to plot, as well as group by multiple
columns:

.. ipython:: python
:suppress:

plt.close('all')
np.random.seed(123456)

.. ipython:: python
:okwarning:

df = pd.DataFrame(np.random.rand(10,3), columns=['Col1', 'Col2', 'Col3'])
df['X'] = pd.Series(['A','A','A','A','A','B','B','B','B','B'])
df['Y'] = pd.Series(['A','B','A','B','A','B','A','B','A','B'])

plt.figure();

@savefig box_plot_ex3.png
bp = df.boxplot(column=['Col1','Col2'], by=['X','Y'])

.. ipython:: python
:suppress:

plt.close('all')
.. note:: The existing ``DataFrame.boxplot`` interface can still be used to plot boxplots.

.. _visualization.box.return:

@@ -455,45 +350,8 @@ When ``subplots=False`` / ``by`` is ``None``:
* if ``return_type`` is ``'both'`` a namedtuple containing the :class:`matplotlib Axes <matplotlib.axes.Axes>`
and :class:`matplotlib Lines <matplotlib.lines.Line2D>` is returned

When ``subplots=True`` / ``by`` is some column of the DataFrame:

* A dict of ``return_type`` is returned, where the keys are the columns
of the DataFrame. The plot has a facet for each column of
the DataFrame, with a separate box for each value of ``by``.

Finally, when calling boxplot on a :class:`Groupby` object, a dict of ``return_type``
is returned, where the keys are the same as the Groupby object. The plot has a
facet for each key, with each facet containing a box for each column of the
DataFrame.

.. ipython:: python
:okwarning:

np.random.seed(1234)
df_box = pd.DataFrame(np.random.randn(50, 2))
df_box['g'] = np.random.choice(['A', 'B'], size=50)
df_box.loc[df_box['g'] == 'B', 1] += 3

@savefig boxplot_groupby.png
bp = df_box.boxplot(by='g')

.. ipython:: python
:suppress:

plt.close('all')

Compare to:

.. ipython:: python
:okwarning:

@savefig groupby_boxplot_vis.png
bp = df_box.groupby('g').boxplot()

.. ipython:: python
:suppress:

plt.close('all')
When ``subplots=True``, a dict of ``return_type`` is returned, where the keys
are the columns of the DataFrame.

.. _visualization.area_plot:

@@ -806,6 +664,142 @@ explicit about how missing values are handled, consider using
:meth:`~pandas.DataFrame.fillna` or :meth:`~pandas.DataFrame.dropna`
before plotting.

.. _visualization.groupby:

Plotting with Grouped Data
--------------------------

.. versionadded:: 0.17
Review comment (Contributor): change to 0.18


You can plot grouped data with the ``GroupBy.plot`` method. It draws
each column as a line, categorized by group.

.. ipython:: python

dfg = pd.DataFrame(np.random.rand(45, 4), columns=['A', 'B', 'C', 'D'])
dfg['by'] = ['Group 0', 'Group 1', 'Group 2'] * 15
grouped = dfg.groupby(by='by')

@savefig dfgropuby_line.png
grouped.plot();

.. ipython:: python
:suppress:

plt.close('all')

``SeriesGroupBy`` also supports plotting. By default, it draws all groups on a
single axes. It supports ``line``, ``bar``, ``barh``, ``hist``, ``kde``,
``area``, ``box`` and ``pie`` charts.

.. ipython:: python

@savefig sgropuby_bar.png
grouped['A'].plot(kind='bar');

.. ipython:: python
:suppress:

plt.close('all')

.. ipython:: python

@savefig sgropuby_kde.png
grouped['A'].plot(kind='kde');

.. ipython:: python
:suppress:

plt.close('all')
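
On pandas releases that do not include this change, a similar single-axes view of a grouped Series can be approximated by looping over the groups. The following is only a sketch under that assumption: the ``Agg`` backend, the seed, and the names ``fig`` and ``ax`` are illustrative choices and not part of this PR.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)
dfg = pd.DataFrame(np.random.rand(45, 4), columns=['A', 'B', 'C', 'D'])
dfg['by'] = ['Group 0', 'Group 1', 'Group 2'] * 15
grouped = dfg.groupby(by='by')

fig, ax = plt.subplots()
# Draw column 'A' of every group as its own line on the shared axes.
for name, series in grouped['A']:
    series.plot(ax=ax, label=name)
ax.legend()
```

Passing the same ``ax`` to every call is what keeps the groups on one axes; omitting it would open a new figure for each group.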

Specify ``subplots=True`` to draw each group in a separate axes.

.. ipython:: python

@savefig sgropuby_box_subplots.png
grouped['A'].plot(kind='box', subplots=True);

.. ipython:: python
:suppress:

plt.close('all')

The ``layout`` keyword specifies the subplot layout.

.. ipython:: python

@savefig sgropuby_pie_subplots.png
grouped['A'].plot(kind='pie', subplots=True, legend=False, layout=(2, 2));

.. ipython:: python
:suppress:

plt.close('all')
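
A comparable grid can be built by hand with ``matplotlib.pyplot.subplots``. This is a rough sketch, assuming a ``(2, 2)`` grid and box plots rather than pies; the names ``layout`` and ``flat_axes`` are introduced here for illustration only.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)
dfg = pd.DataFrame(np.random.rand(45, 4), columns=['A', 'B', 'C', 'D'])
dfg['by'] = ['Group 0', 'Group 1', 'Group 2'] * 15
grouped = dfg.groupby(by='by')

layout = (2, 2)  # rows, columns
fig, axes = plt.subplots(*layout, figsize=(6, 6))
flat_axes = axes.ravel()

# One axes per group, filled in row-major order.
for ax, (name, series) in zip(flat_axes, grouped['A']):
    series.plot(kind='box', ax=ax)
    ax.set_title(name)

# Hide the grid cells that have no group to show.
for ax in flat_axes[grouped.ngroups:]:
    ax.set_visible(False)
```

With three groups in a 2 x 2 grid, the last cell stays empty, so it is hidden rather than left as a blank frame.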

``DataFrameGroupBy.plot`` supports ``line``, ``bar``, ``barh``, ``hist``,
``kde``, ``area``, ``box``, ``scatter`` and ``hexbin`` plots.
Except for ``scatter``, plots are drawn as subplots.

The following example shows a stacked bar chart for each group.
Note that you can pass any keyword supported by regular plots.

.. ipython:: python

@savefig dfgropuby_bar.png
grouped.plot(kind='bar', stacked=True);

.. ipython:: python
:suppress:

plt.close('all')
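
For comparison, a per-group stacked bar chart can be assembled with released pandas by giving each group its own axes. This is a sketch under assumptions, not the implementation in this PR: the one-column grid and the explicit column selection are choices made here.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)
dfg = pd.DataFrame(np.random.rand(45, 4), columns=['A', 'B', 'C', 'D'])
dfg['by'] = ['Group 0', 'Group 1', 'Group 2'] * 15
grouped = dfg.groupby(by='by')

fig, axes = plt.subplots(grouped.ngroups, 1, figsize=(6, 9))
for ax, (name, group) in zip(axes, grouped):
    # Select the numeric columns explicitly; 'by' only holds the group label.
    group[['A', 'B', 'C', 'D']].plot(kind='bar', stacked=True, ax=ax, legend=False)
    ax.set_title(name)
```

Keywords such as ``stacked=True`` are forwarded to the ordinary ``DataFrame.plot`` call, which is the same pass-through behaviour the text describes.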

To create one subplot per column instead, specify the ``axis=1`` keyword.

.. ipython:: python

@savefig dfgropuby_bar_axis1.png
grouped.plot(kind='bar', axis=1);

.. ipython:: python
:suppress:

plt.close('all')

A scatter plot can be drawn in a single axes by specifying ``subplots=False``.
Each group is drawn in a separate color.

.. note:: Hexbin cannot be plotted in a single axes.

.. ipython:: python

@savefig dfgropuby_scatter.png
grouped.plot(kind='scatter', x='A', y='B', subplots=False);

.. ipython:: python
:suppress:

plt.close('all')
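
The single-axes scatter can also be approximated without this change by plotting each group onto a shared axes with its own color. A minimal sketch, assuming matplotlib's default color cycle names (``'C0'``, ``'C1'``, ``'C2'``); the ``colors`` list is an illustrative choice.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

np.random.seed(123456)
dfg = pd.DataFrame(np.random.rand(45, 4), columns=['A', 'B', 'C', 'D'])
dfg['by'] = ['Group 0', 'Group 1', 'Group 2'] * 15
grouped = dfg.groupby(by='by')

fig, ax = plt.subplots()
colors = ['C0', 'C1', 'C2']  # one color per group
for color, (name, group) in zip(colors, grouped):
    # Each call adds one scatter collection to the shared axes.
    group.plot(kind='scatter', x='A', y='B', ax=ax, color=color, label=name)
```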

Otherwise, it is drawn as subplots.

.. ipython:: python

@savefig dfgropuby_scatter_subplots.png
grouped.plot(kind='scatter', x='A', y='B', layout=(2, 2));

.. ipython:: python
:suppress:

plt.close('all')

.. note:: Prior to 0.17, ``GroupBy.plot`` plotted each group on a separate
   figure. To reproduce that behavior, you can do:

.. code-block:: python

   for name, group in grouped:
       group.plot()

.. _visualization.tools:

Plotting Tools
19 changes: 19 additions & 0 deletions doc/source/whatsnew/v0.17.0.txt
@@ -40,6 +40,7 @@ Highlights include:
- Development support for benchmarking with the `Air Speed Velocity library <https://github.com/spacetelescope/asv/>`_ (:issue:`8316`)
- Support for reading SAS xport files, see :ref:`here <whatsnew_0170.enhancements.sas_xport>`
- Removal of the automatic TimeSeries broadcasting, deprecated since 0.8.0, see :ref:`here <whatsnew_0170.prior_deprecations>`
- GroupBy plot enhancement, see :ref:`here <whatsnew_0170.groupbyplot>` (:issue:`8018`)

Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsnew_0170.deprecations>` before updating.

@@ -205,6 +206,24 @@ The support math functions are `sin`, `cos`, `exp`, `log`, `expm1`, `log1p`,
These functions map to the intrinsics for the NumExpr engine. For Python
engine, they are mapped to NumPy calls.

.. _whatsnew_0170.groupbyplot:

Review comment (Contributor): move to 0.18

Plotting with Grouped Data
^^^^^^^^^^^^^^^^^^^^^^^^^^

``GroupBy.plot`` can now draw grouped plots in a single figure,
supporting the same plot kinds as ``DataFrame`` and ``Series``.

.. ipython:: python

dfg = pd.DataFrame(np.random.rand(45, 4), columns=['A', 'B', 'C', 'D'])
dfg['by'] = ['Group 0', 'Group 1', 'Group 2'] * 15
grouped = dfg.groupby(by='by')

grouped.plot();

For the output and further details, see :ref:`here <visualization.groupby>`.

.. _whatsnew_0170.enhancements.other:

Other enhancements
6 changes: 4 additions & 2 deletions pandas/core/groupby.py
@@ -3475,8 +3475,10 @@ def count(self):
return self._wrap_agged_blocks(data.items, list(blk))


from pandas.tools.plotting import boxplot_frame_groupby
DataFrameGroupBy.boxplot = boxplot_frame_groupby
import pandas.tools.plotting as plotting
DataFrameGroupBy.boxplot = plotting.boxplot_frame_groupby
SeriesGroupBy.plot = plotting.plot_grouped_series
DataFrameGroupBy.plot = plotting.plot_grouped_frame


class PanelGroupBy(NDFrameGroupBy):
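
The hunk above attaches module-level plotting functions to the groupby classes after those classes are defined. The pattern can be illustrated in isolation with a small, dependency-free sketch; ``FakeSeriesGroupBy`` and ``describe_grouped_plot`` are hypothetical stand-ins invented for this example and are not pandas names.

```python
class FakeSeriesGroupBy:
    """Hypothetical stand-in for a groupby class; not the pandas class."""

    def __init__(self, groups):
        self.groups = groups


def describe_grouped_plot(grouped, kind='line'):
    """Stand-in for a plotting function: returns a description instead of drawing."""
    return "{0} plot of {1} groups".format(kind, len(grouped.groups))


# Assigning a plain function to a class attribute makes it an instance method:
# the instance is passed as the first argument when it is called.
FakeSeriesGroupBy.plot = describe_grouped_plot

g = FakeSeriesGroupBy({'a': [1, 2], 'b': [3]})
result = g.plot(kind='bar')
```

Defining the function in a separate module and binding it afterwards keeps the plotting code out of the groupby module, at the cost of the method not being visible in the class body.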