Commit ce04b5b

ENH: this is a pipe
1 parent efc4a08 commit ce04b5b

7 files changed: +193 additions, -41 deletions

doc/source/basics.rst

Lines changed: 74 additions & 0 deletions
@@ -624,6 +624,77 @@ We can also pass infinite values to define the bins:
 Function application
 --------------------

+To apply your own or another library's functions to pandas objects,
+you should be aware of the three methods below. The appropriate
+method to use depends on whether your function expects to operate
+on an entire DataFrame or Series, row- or column-wise, or elementwise.
+
+1. `Tablewise Function Application`_: :meth:`~DataFrame.pipe`
+2. `Row or Column-wise Function Application`_: :meth:`~DataFrame.apply`
+3. Elementwise_ function application: :meth:`~DataFrame.applymap`
+
+.. _basics.pipe:
+
+Tablewise Function Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. versionadded:: 0.16.2
+
+DataFrames and Series can, of course, be passed directly into functions.
+However, if the function needs to be called in a chain, consider using the :meth:`~DataFrame.pipe` method.
+Compare the following:
+
+.. code-block:: python
+
+   # f, g, and h are functions taking and returning DataFrames
+   >>> f(g(h(df), arg1=1), arg2=2, arg3=3)
+
+with the equivalent:
+
+.. code-block:: python
+
+   >>> (df.pipe(h)
+   ...    .pipe(g, arg1=1)
+   ...    .pipe(f, arg2=2, arg3=3)
+   ... )
+
+Pandas encourages the second style, which is known as method chaining.
+``pipe`` makes it easy to use your own or another library's functions
+in method chains, alongside pandas' methods.
+
+In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument.
+What if the function you wish to apply takes its data as, say, the second argument?
+In this case, provide ``pipe`` with a tuple of ``(callable, data_keyword)``.
+``.pipe`` will route the DataFrame to the argument specified in the tuple.
+
+For example, we can fit a regression using statsmodels. Its formula API expects a formula first and a DataFrame as the second argument, ``data``. We pass the function/keyword pair ``(sm.poisson, 'data')`` to ``pipe``:
+
+.. ipython:: python
+
+   import statsmodels.formula.api as sm
+
+   bb = pd.read_csv('data/baseball.csv', index_col='id')
+
+   (bb.query('h > 0')
+      .assign(ln_h=lambda df: np.log(df.h))
+      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
+      .fit()
+      .summary()
+   )
+
+The pipe method is inspired by Unix pipes and, more recently, dplyr_ and magrittr_, which
+have introduced the popular ``(%>%)`` (read "pipe") operator for R_.
+The implementation of ``pipe`` here is quite clean and feels right at home in Python.
+We encourage you to view the source code (``pd.DataFrame.pipe??`` in IPython).
+
+.. _dplyr: https://github.com/hadley/dplyr
+.. _magrittr: https://github.com/smbache/magrittr
+.. _R: http://www.r-project.org
+
+
+Row or Column-wise Function Application
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
 Arbitrary functions can be applied along the axes of a DataFrame or Panel
 using the :meth:`~DataFrame.apply` method, which, like the descriptive
 statistics methods, take an optional ``axis`` argument:
@@ -678,6 +749,7 @@ Series operation on each column or row:
    tsdf
    tsdf.apply(pd.Series.interpolate)

+
 Finally, :meth:`~DataFrame.apply` takes an argument ``raw`` which is False by default, which
 converts each row or column into a Series before applying the function. When
 set to True, the passed function will instead receive an ndarray object, which
@@ -690,6 +762,8 @@ functionality.
 functionality for grouping by some criterion, applying, and combining the
 results into a Series, DataFrame, etc.

+.. _Elementwise:
+
 Applying elementwise Python functions
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
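To check that the chained form documented in basics.rst really matches the nested form, here is a minimal sketch. The functions ``h``, ``g``, and ``f`` below are hypothetical stand-ins (any DataFrame-in, DataFrame-out functions would do), and the example assumes a pandas version that has ``DataFrame.pipe`` and the public ``pandas.testing`` module.

```python
import pandas as pd

# Hypothetical DataFrame -> DataFrame functions standing in for h, g, and f.
def h(df):
    return df + 1

def g(df, arg1):
    return df * arg1

def f(df, arg2, arg3):
    return df - arg2 + arg3

df = pd.DataFrame({'A': [1, 2, 3]})

# Nested style: reads inside out, keywords far from their functions.
nested = f(g(h(df), arg1=1), arg2=2, arg3=3)

# Chained style: reads top to bottom, each keyword next to its function.
chained = (df.pipe(h)
             .pipe(g, arg1=1)
             .pipe(f, arg2=2, arg3=3))

# Both spellings produce the same DataFrame.
pd.testing.assert_frame_equal(nested, chained)
```

Each ``.pipe(func, *args, **kwargs)`` step simply calls ``func(previous_result, *args, **kwargs)``, so the two spellings differ only in readability.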

doc/source/faq.rst

Lines changed: 0 additions & 40 deletions
@@ -89,46 +89,6 @@ representation; i.e., 1KB = 1024 bytes).

 See also :ref:`Categorical Memory Usage <categorical.memory>`.

-.. _ref-monkey-patching:
-
-Adding Features to your pandas Installation
--------------------------------------------
-
-pandas is a powerful tool and already has a plethora of data manipulation
-operations implemented, most of them are very fast as well.
-It's very possible however that certain functionality that would make your
-life easier is missing. In that case you have several options:
-
-1) Open an issue on `Github <https://github.com/pydata/pandas/issues/>`__ , explain your need and the sort of functionality you would like to see implemented.
-2) Fork the repo, Implement the functionality yourself and open a PR
-   on Github.
-3) Write a method that performs the operation you are interested in and
-   Monkey-patch the pandas class as part of your IPython profile startup
-   or PYTHONSTARTUP file.
-
-For example, here is an example of adding an ``just_foo_cols()``
-method to the dataframe class:
-
-::
-
-    import pandas as pd
-    def just_foo_cols(self):
-        """Get a list of column names containing the string 'foo'
-
-        """
-        return [x for x in self.columns if 'foo' in x]
-
-    pd.DataFrame.just_foo_cols = just_foo_cols # monkey-patch the DataFrame class
-    df = pd.DataFrame([list(range(4))], columns=["A","foo","foozball","bar"])
-    df.just_foo_cols()
-    del pd.DataFrame.just_foo_cols # you can also remove the new method
-
-
-Monkey-patching is usually frowned upon because it makes your code
-less portable and can cause subtle bugs in some circumstances.
-Monkey-patching existing methods is usually a bad idea in that respect.
-When used with proper care, however, it's a very useful tool to have.
-

 .. _ref-scikits-migration:
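The FAQ section removed here showed monkey-patching a ``just_foo_cols`` method onto ``pd.DataFrame``. As a hedged illustration that is not part of this commit, the same helper can stay a plain function and still be used fluently through ``pipe``, leaving the ``DataFrame`` class untouched:

```python
import pandas as pd

def just_foo_cols(df):
    """Get a list of column names containing the string 'foo'."""
    return [x for x in df.columns if 'foo' in x]

df = pd.DataFrame([list(range(4))], columns=["A", "foo", "foozball", "bar"])

# No monkey-patching: the function is handed to pipe instead of being
# attached to pd.DataFrame as a method, so nothing needs to be deleted later.
foo_cols = df.pipe(just_foo_cols)
print(foo_cols)  # ['foo', 'foozball']
```

Because the helper is an ordinary function, it can live in any module and be imported where needed, rather than depending on an IPython profile or PYTHONSTARTUP file.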

doc/source/whatsnew/v0.16.2.txt

Lines changed: 50 additions & 0 deletions
@@ -22,6 +22,56 @@ Check the :ref:`API Changes <whatsnew_0162.api>` before updating.
 New features
 ~~~~~~~~~~~~

+We've introduced a new method :meth:`DataFrame.pipe`. As suggested by the name, ``pipe``
+should be used to pipe data through a chain of function calls.
+The goal is to avoid confusing nested function calls like
+
+.. code-block:: python
+
+   # df is a DataFrame, f, g, and h are functions taking and returning DataFrames
+   f(g(h(df), arg1=1), arg2=2, arg3=3)
+
+The logic flows from inside out, and function names are separated from their keyword arguments.
+This can be rewritten as
+
+.. code-block:: python
+
+   (df.pipe(h)
+      .pipe(g, arg1=1)
+      .pipe(f, arg2=2, arg3=3)
+   )
+
+Now both the code and the logic flow from top to bottom. Keyword arguments are next to
+their functions. Overall the code is much more readable.
+
+In the example above, the functions ``f``, ``g``, and ``h`` each expected the DataFrame as the first positional argument.
+When the function you wish to apply takes its data anywhere other than the first argument, pass a tuple
+of ``(function, keyword)`` indicating where the DataFrame should flow. For example:
+
+.. ipython:: python
+
+   import statsmodels.formula.api as sm
+
+   bb = pd.read_csv('data/baseball.csv', index_col='id')
+
+   # sm.poisson takes (formula, data)
+   (bb.query('h > 0')
+      .assign(ln_h=lambda df: np.log(df.h))
+      .pipe((sm.poisson, 'data'), 'hr ~ ln_h + year + g + C(lg)')
+      .fit()
+      .summary()
+   )
+
+The pipe method is inspired by Unix pipes, which stream text through
+processes. More recently, dplyr_ and magrittr_ have introduced the
+popular ``(%>%)`` pipe operator for R_.
+
+See the :ref:`documentation <basics.pipe>` for more. (:issue:`10129`)
+
+.. _dplyr: https://github.com/hadley/dplyr
+.. _magrittr: https://github.com/smbache/magrittr
+.. _R: http://www.r-project.org
+
 .. _whatsnew_0162.enhancements.other:

 Other enhancements
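The tuple form can be tried without installing statsmodels. In this sketch ``summarize`` is a hypothetical function that, like the ``(formula, data)`` order noted for ``sm.poisson``, takes its data as the second parameter, named ``data``; the toy ``h``/``hr`` columns are made up for illustration.

```python
import pandas as pd

# Hypothetical function that, like a formula API, takes its data second.
def summarize(column, data, scale=1):
    return data[column].sum() * scale

df = pd.DataFrame({'h': [0, 1, 2, 3], 'hr': [0, 1, 1, 2]})

# (callable, 'data') tells pipe to pass the DataFrame as data=...;
# the remaining positional and keyword arguments are forwarded unchanged,
# so this calls summarize('hr', scale=10, data=<filtered frame>).
total = (df.query('h > 0')
           .pipe((summarize, 'data'), 'hr', scale=10))
print(total)  # 40
```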

doc/source/whatsnew/v0.17.0.txt

Lines changed: 1 addition & 0 deletions
@@ -21,6 +21,7 @@ Check the :ref:`API Changes <whatsnew_0170.api>` and :ref:`deprecations <whatsne
 New features
 ~~~~~~~~~~~~

+
 .. _whatsnew_0170.enhancements.other:

 Other enhancements

pandas/__init__.py

Lines changed: 0 additions & 1 deletion
@@ -57,4 +57,3 @@
 from pandas.util.print_versions import show_versions
 import pandas.util.testing

-
pandas/core/generic.py

Lines changed: 49 additions & 0 deletions
@@ -2044,6 +2044,55 @@ def sample(self, n=None, frac=None, replace=False, weights=None, random_state=No
         locs = rs.choice(axis_length, size=n, replace=replace, p=weights)
         return self.take(locs, axis=axis)

+    _shared_docs['pipe'] = ("""
+        Apply func(self, *args, **kwargs)
+
+        .. versionadded:: 0.16.2
+
+        Parameters
+        ----------
+        func : function
+            function to apply to the %(klass)s.
+            ``args`` and ``kwargs`` are passed into ``func``.
+            Alternatively, a ``(callable, data_keyword)`` tuple where
+            ``data_keyword`` is a string indicating the keyword of
+            ``callable`` that expects the %(klass)s.
+        args : positional arguments passed into ``func``.
+        kwargs : a dictionary of keyword arguments passed into ``func``.
+
+        Returns
+        -------
+        object : whatever the return type of ``func`` is.
+
+        Notes
+        -----
+
+        Use ``.pipe`` when chaining together functions that expect
+        Series or DataFrames. Instead of writing
+
+        >>> f(g(h(df), arg1=a), arg2=b, arg3=c)
+
+        you can write
+
+        >>> (df.pipe(h)
+        ...    .pipe(g, arg1=a)
+        ...    .pipe(f, arg2=b, arg3=c)
+        ... )
+
+        See Also
+        --------
+        pandas.DataFrame.apply
+        pandas.DataFrame.applymap
+        pandas.Series.map
+        """)
+    @Appender(_shared_docs['pipe'] % _shared_doc_kwargs)
+    def pipe(self, func, *args, **kwargs):
+        if isinstance(func, tuple):
+            func, target = func
+            kwargs[target] = self
+            return func(*args, **kwargs)
+        else:
+            return func(self, *args, **kwargs)

     #----------------------------------------------------------------------
     # Attribute access
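The new ``pipe`` method reduces to a small dispatch on whether ``func`` is a tuple. The standalone sketch below mirrors that logic as a free function so it can be exercised without pandas. ``pipe_sketch``, ``power``, and ``second`` are hypothetical names, and the ``ValueError`` guard is an assumed addition that is not in this commit: the committed version silently overwrites a caller-supplied keyword that has the same name as ``data_keyword``.

```python
def pipe_sketch(obj, func, *args, **kwargs):
    """Apply func to obj the way the committed NDFrame.pipe dispatches.

    func is either a callable taking obj as its first argument, or a
    (callable, data_keyword) tuple naming the keyword that receives obj.
    """
    if isinstance(func, tuple):
        func, target = func
        # Assumed extra guard (not in the commit): refuse to overwrite
        # a keyword the caller passed explicitly.
        if target in kwargs:
            raise ValueError(f'{target!r} is both the pipe target and a keyword argument')
        kwargs[target] = obj
        return func(*args, **kwargs)
    return func(obj, *args, **kwargs)


# Mirrors the committed tests, with a plain list standing in for a DataFrame.
def power(x, y):
    return [v ** y for v in x]


def second(x, y):
    return y


print(pipe_sketch([1, 2, 3], power, 2))           # [1, 4, 9]
print(pipe_sketch([1, 2, 3], (second, 'y'), 0))   # [1, 2, 3]
```

Since the tuple branch builds the call purely from ``args`` and ``kwargs``, the target may be any keyword parameter of the callable, which is what lets ``pipe`` work with APIs that take their data in a non-leading position.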

pandas/tests/test_generic.py

Lines changed: 19 additions & 0 deletions
@@ -1649,6 +1649,25 @@ def test_describe_raises(self):
         with tm.assertRaises(NotImplementedError):
             tm.makePanel().describe()

+    def test_pipe(self):
+        df = DataFrame({'A': [1, 2, 3]})
+        f = lambda x, y: x ** y
+        result = df.pipe(f, 2)
+        expected = DataFrame({'A': [1, 4, 9]})
+        self.assert_frame_equal(result, expected)
+
+        result = df.A.pipe(f, 2)
+        self.assert_series_equal(result, expected.A)
+
+    def test_pipe_tuple(self):
+        df = DataFrame({'A': [1, 2, 3]})
+        f = lambda x, y: y
+        result = df.pipe((f, 'y'), 0)
+        self.assert_frame_equal(result, df)
+
+        result = df.A.pipe((f, 'y'), 0)
+        self.assert_series_equal(result, df.A)
+
 if __name__ == '__main__':
     nose.runmodule(argv=[__file__, '-vvs', '-x', '--pdb', '--pdb-failure'],
                    exit=False)
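The committed tests use the nose-era ``self.assert_frame_equal`` helpers on a test class. As a hedged modern equivalent, assuming pytest-style plain functions and the public ``pandas.testing`` module (neither of which is part of this commit), the same two checks could be written as:

```python
import pandas as pd
import pandas.testing as tm


def test_pipe():
    df = pd.DataFrame({'A': [1, 2, 3]})
    f = lambda x, y: x ** y
    expected = pd.DataFrame({'A': [1, 4, 9]})
    # Plain callable: df.pipe(f, 2) calls f(df, 2).
    tm.assert_frame_equal(df.pipe(f, 2), expected)
    tm.assert_series_equal(df.A.pipe(f, 2), expected.A)


def test_pipe_tuple():
    df = pd.DataFrame({'A': [1, 2, 3]})
    f = lambda x, y: y
    # Tuple form: df.pipe((f, 'y'), 0) calls f(0, y=df).
    tm.assert_frame_equal(df.pipe((f, 'y'), 0), df)
    tm.assert_series_equal(df.A.pipe((f, 'y'), 0), df.A)


if __name__ == '__main__':
    test_pipe()
    test_pipe_tuple()
```

Under pytest these functions are collected automatically; the ``__main__`` block only lets the file run standalone.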
