Commit e4e9e94

adamkleinwesm authored and committed
more working on v0.6
1 parent 6ef6cc8 commit e4e9e94

File tree

7 files changed: +118 −61 lines changed


doc/source/basics.rst

Lines changed: 26 additions & 9 deletions
@@ -380,6 +380,19 @@ maximum value for each column occurred:
          index=DateRange('1/1/2000', periods=1000))
    tsdf.apply(lambda x: x.index[x.dropna().argmax()])
 
+You may also pass additional arguments and keyword arguments to the ``apply``
+method. For instance, consider the following function you would like to apply:
+
+.. code-block:: python
+
+   def subtract_and_divide(x, sub, divide=1):
+       return (x - sub) / divide
+
+You may then apply this function as follows:
+
+.. code-block:: python
+
+   df.apply(subtract_and_divide, args=(5,), divide=3)
 
 Another useful feature is the ability to pass Series methods to carry out some
 Series operation on each column or row:
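The ``args``/keyword forwarding this hunk documents can be sketched without pandas. The helper below is hypothetical (``apply_columns`` and the dict-of-lists "frame" are stand-ins, not pandas API); it only mimics the ``DataFrame.apply(func, args=..., **kwds)`` calling convention:

```python
def subtract_and_divide(x, sub, divide=1):
    return (x - sub) / divide

def apply_columns(data, func, args=(), **kwds):
    # Mimic DataFrame.apply(func, args=..., **kwds): call func on every
    # value of every column, forwarding the extra positional and keyword
    # arguments unchanged.
    return {col: [func(v, *args, **kwds) for v in values]
            for col, values in data.items()}

data = {'A': [8, 11], 'B': [5, 14]}
result = apply_columns(data, subtract_and_divide, args=(5,), divide=3)
print(result)  # each value x becomes (x - 5) / 3
```

The point of the sketch is only the argument plumbing: ``args`` supplies the extra positionals and any remaining keywords reach the function untouched.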
@@ -396,6 +409,12 @@ Series operation on each column or row:
    tsdf
    tsdf.apply(Series.interpolate)
 
+Finally, ``apply`` takes an argument ``raw`` which is False by default and
+converts each row or column into a Series before applying the function. When
+set to True, the passed function will instead receive an ndarray object, which
+has positive performance implications if you do not need the indexing
+functionality.
+
 .. seealso::
 
    The section on :ref:`GroupBy <groupby>` demonstrates related, flexible
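What the new ``raw`` flag changes can be illustrated with a small pandas-free sketch (``LabeledColumn`` and ``apply_one`` are hypothetical stand-ins, not pandas internals): with ``raw=False`` the function receives a labeled, Series-like wrapper; with ``raw=True`` it receives the bare values and the wrapping cost is skipped.

```python
class LabeledColumn:
    """Stand-in for a Series: values plus an index of row labels."""
    def __init__(self, values, index):
        self.values = values
        self.index = index

def apply_one(values, index, func, raw=False):
    if raw:
        # raw=True: hand the function the bare values and skip building
        # the labeled wrapper entirely.
        return func(values)
    # raw=False: wrap the values first so the function can use the index.
    return func(LabeledColumn(values, index))

values, index = [3, 1, 2], ['a', 'b', 'c']
print(apply_one(values, index, max, raw=True))   # plain sequence in, 3 out
# With the wrapper, the function can use the labels, e.g. label of the max:
print(apply_one(values, index,
                lambda s: s.index[s.values.index(max(s.values))]))  # 'a'
```

Use ``raw=True`` only when the function needs nothing but the values; the indexing functionality is simply not there to use.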
@@ -673,11 +692,10 @@ produces the "keys" of the objects, namely:
 
 Thus, for example:
 
-.. ipython::
+.. ipython:: python
 
-   In [0]: for col in df:
-      ...:     print col
-      ...:
+   for col in df:
+       print col
 
 iteritems
 ~~~~~~~~~
@@ -691,12 +709,11 @@ key-value pairs:
 
 For example:
 
-.. ipython::
+.. ipython:: python
 
-   In [0]: for item, frame in wp.iteritems():
-      ...:     print item
-      ...:     print frame
-      ...:
+   for item, frame in wp.iteritems():
+       print item
+       print frame
 
 .. _basics.sorting:
 

doc/source/groupby.rst

Lines changed: 4 additions & 3 deletions
@@ -178,7 +178,8 @@ number:
    s.groupby(level='second').sum()
 
 As of v0.6, the aggregation functions such as ``sum`` will take the level
-parameter directly:
+parameter directly. Additionally, the resulting index will be named according
+to the chosen level:
 
 .. ipython:: python
 
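The level-based aggregation this hunk documents can be sketched with plain Python (``sum_by_level`` and the tuple-keyed data are hypothetical stand-ins, not pandas API): group (key-tuple, value) pairs by one position of the key tuple and sum within each group, as ``s.groupby(level='second').sum()`` would.

```python
from collections import defaultdict

def sum_by_level(items, level):
    # Group by one component of each MultiIndex-like key tuple and sum
    # the values within each group.
    totals = defaultdict(float)
    for key, value in items:
        totals[key[level]] += value
    return dict(totals)

s = [(('bar', 'one'), 1.0), (('bar', 'two'), 2.0),
     (('foo', 'one'), 3.0), (('foo', 'two'), 4.0)]
print(sum_by_level(s, level=1))  # sum over the 'second' level
print(sum_by_level(s, level=0))  # sum over the 'first' level
```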
@@ -424,8 +425,8 @@ Flexible ``apply``
 
 Some operations on the grouped data might not fit into either the aggregate or
 transform categories. Or, you may simply want GroupBy to infer how to combine
-the results. For these, use the ``apply`` function, which can be substitute for
-both ``aggregate`` and ``transform`` in many standard use cases. However,
+the results. For these, use the ``apply`` function, which can be substituted
+for both ``aggregate`` and ``transform`` in many standard use cases. However,
 ``apply`` can handle some exceptional use cases, for example:
 
 .. ipython:: python

doc/source/indexing.rst

Lines changed: 5 additions & 2 deletions
@@ -756,8 +756,11 @@ integer index. This is the inverse operation to ``set_index``
    df.reset_index()
 
 The output is more similar to a SQL table or a record array. The names for the
-columns derived from the index are the ones stored in the ``names``
-attribute.
+columns derived from the index are the ones stored in the ``names`` attribute.
+
+.. note::
+
+   The ``reset_index`` method used to be called ``delevel``, which is now deprecated.
 
 Adding an ad hoc index
 ~~~~~~~~~~~~~~~~~~~~~~
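The ``reset_index`` behavior described above reduces to a simple transformation, sketched here without pandas (``reset_index`` below and its record layout are hypothetical, not the pandas implementation): move the named index back into each record as an ordinary column, leaving an implicit integer index.

```python
def reset_index(index_name, index_values, records):
    # Re-insert the named index into each record as a regular column;
    # the list position becomes the new default integer index.
    return [{index_name: idx, **rec}
            for idx, rec in zip(index_values, records)]

rows = reset_index('id', ['a', 'b'], [{'x': 1}, {'x': 2}])
print(rows)
```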

doc/source/merging.rst

Lines changed: 3 additions & 3 deletions
@@ -75,9 +75,9 @@ new DataFrame as above:
 
 .. ipython:: python
 
-   df = DataFrame(np.random.randn(8, 4))
+   df = DataFrame(np.random.randn(8, 4), columns=['A','B','C','D'])
    df
-   s = df.xs(5)
+   s = df.xs(3)
    df.append(s, ignore_index=True)
 
 
@@ -115,7 +115,7 @@ passed DataFrame's index. This is best illustrated by example:
 
 .. ipython:: python
 
-   df['key'] = ['foo', 'bar'] * 3
+   df['key'] = ['foo', 'bar'] * 4
    to_join = DataFrame(randn(2, 2), index=['bar', 'foo'],
                        columns=['j1', 'j2'])
    df

doc/source/reshaping.rst

Lines changed: 54 additions & 20 deletions
@@ -7,6 +7,7 @@
    import numpy as np
    np.random.seed(123456)
    from pandas import *
+   from pandas.core.reshape import *
    import pandas.util.testing as tm
    randn = np.random.randn
    np.set_printoptions(precision=4, suppress=True)
@@ -61,19 +62,20 @@ To select out everything for variable ``A`` we could do:
 
    df[df['variable'] == 'A']
 
-But if we wished to do time series operations between variables, this will
-hardly do at all. This is really just a representation of a DataFrame whose
-``columns`` are formed from the unique ``variable`` values and ``index`` from
-the ``date`` values. To reshape the data into this form, use the ``pivot``
-function:
+But suppose we wish to do time series operations with the variables. A better
+representation would be one where the ``columns`` are the unique variables and
+an ``index`` of dates identifies individual observations. To reshape the data
+into this form, use the ``pivot`` function:
 
 .. ipython:: python
 
    df.pivot(index='date', columns='variable', values='value')
 
-If the ``values`` argument is omitted, the resulting "pivoted" DataFrame will
-have :ref:`hierarchical columns <indexing.hierarchical>` with the top level
-being the set of value columns:
+If the ``values`` argument is omitted, and the input DataFrame has more than
+one column of values which are not used as column or index inputs to ``pivot``,
+then the resulting "pivoted" DataFrame will have :ref:`hierarchical columns
+<indexing.hierarchical>` whose topmost level indicates the respective value
+column:
 
 .. ipython:: python
 
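The long-to-wide transformation ``pivot`` performs can be sketched with plain dicts (the ``pivot`` function and records below are hypothetical stand-ins, not pandas code): each record's ``columns`` value becomes a column label and its ``index`` value keys the row.

```python
def pivot(records, index, columns, values):
    # Long format in, wide format out: one nested dict per index value,
    # with the unique `columns` values as its keys.
    table = {}
    for rec in records:
        table.setdefault(rec[index], {})[rec[columns]] = rec[values]
    return table

long_form = [
    {'date': '2000-01-03', 'variable': 'A', 'value': 0.5},
    {'date': '2000-01-04', 'variable': 'A', 'value': 0.6},
    {'date': '2000-01-03', 'variable': 'B', 'value': -1.2},
    {'date': '2000-01-04', 'variable': 'B', 'value': 0.1},
]
wide = pivot(long_form, index='date', columns='variable', values='value')
print(wide['2000-01-03'])  # one row per date, one key per variable
```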
@@ -90,22 +92,26 @@ You of course can then select subsets from the pivoted DataFrame:
 
 Note that this returns a view on the underlying data in the case where the data
 are homogeneously-typed.
 
+.. _reshaping.stacking:
+
 Reshaping by stacking and unstacking
 ------------------------------------
 
 Closely related to the ``pivot`` function are the related ``stack`` and
 ``unstack`` functions currently available on Series and DataFrame. These
-functions are designed to tie together with ``MultiIndex`` objects (see the
+functions are designed to work together with ``MultiIndex`` objects (see the
 section on :ref:`hierarchical indexing <indexing.hierarchical>`). Here is
 essentially what these functions do:
 
-- ``stack``: collapse level in ``axis=1`` to produce new object whose index
-  has the collapsed columns as its lowest level
-- ``unstack``: inverse operation from ``stack``; "pivot" index level to
-  produce reshaped DataFrame
+- ``stack``: "pivot" a level of the (possibly hierarchical) column labels,
+  returning a DataFrame with an index that has a new inner-most level of row
+  labels.
+- ``unstack``: inverse operation of ``stack``: "pivot" a level of the
+  (possibly hierarchical) row index to the column axis, producing a reshaped
+  DataFrame with a new inner-most level of column labels.
 
-Actually very hard to explain in words; the clearest way is by example. Let's
-take a prior example data set from the hierarchical indexing section:
+The clearest way to explain is by example. Let's take a prior example data set
+from the hierarchical indexing section:
 
 .. ipython:: python
 
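The two bullet definitions above reduce to a pair of inverse dict transformations; a pandas-free sketch (``stack``/``unstack`` below are hypothetical stand-ins for the real methods): the "wide" table maps row labels to {column label: value}, and stacking moves the column label into the row key.

```python
def stack(wide):
    # Move each column label into the row key as a new inner-most level.
    return {(row, col): val
            for row, cols in wide.items()
            for col, val in cols.items()}

def unstack(stacked):
    # Inverse of stack: pivot the inner-most row-key level back to columns.
    wide = {}
    for (row, col), val in stacked.items():
        wide.setdefault(row, {})[col] = val
    return wide

wide = {'one': {'a': 1, 'b': 2}, 'two': {'a': 3, 'b': 4}}
stacked = stack(wide)
print(stacked[('one', 'b')])
print(unstack(stacked) == wide)  # round-trip: unstack inverts stack
```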
@@ -151,10 +157,14 @@ the level numbers:
 
    stacked.unstack('second')
 
-These functions are very intelligent about handling missing data and do not
-expect each subgroup within the hierarchical index to have the same set of
-labels. They also can handle the index being unsorted (but you can make it
-sorted by calling ``sortlevel``, of course). Here is a more complex example:
+You may also stack or unstack more than one level at a time by passing a list
+of levels, in which case the end result is as if each level in the list were
+processed individually.
+
+These functions are intelligent about handling missing data and do not expect
+each subgroup within the hierarchical index to have the same set of labels.
+They also can handle the index being unsorted (but you can make it sorted by
+calling ``sortlevel``, of course). Here is a more complex example:
 
 .. ipython:: python
 
@@ -181,6 +191,29 @@ the right thing:
 
    df[:3].unstack(0)
    df2.unstack(1)
 
+.. _reshaping.melt:
+
+Reshaping by Melt
+-----------------
+
+The ``melt`` function found in ``pandas.core.reshape`` is useful to massage a
+DataFrame into a format where one or more columns are identifier variables,
+while all other columns, considered measured variables, are "pivoted" to the
+row axis, leaving just two non-identifier columns, "variable" and "value".
+
+For instance,
+
+.. ipython:: python
+
+   df = DataFrame({'first' : ['John', 'Mary'],
+                   'last' : ['Doe', 'Bo'],
+                   'height' : [5.5, 6.0],
+                   'weight' : [130, 150]})
+   df
+   melt(df, id_vars=['first', 'last'])
+
 Combining with stats and GroupBy
 --------------------------------
 
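The wide-to-long transformation ``melt`` performs is easy to sketch without pandas (the ``melt`` function and record layout below are hypothetical stand-ins, not the library implementation): keep the ``id_vars`` columns on every output row, and pivot each remaining column into a ("variable", "value") pair.

```python
def melt(records, id_vars):
    # Every non-identifier column of each record becomes its own row,
    # carrying the identifier columns along unchanged.
    out = []
    for rec in records:
        ids = {k: rec[k] for k in id_vars}
        for k, v in rec.items():
            if k not in id_vars:
                out.append({**ids, 'variable': k, 'value': v})
    return out

df = [{'first': 'John', 'last': 'Doe', 'height': 5.5, 'weight': 130},
      {'first': 'Mary', 'last': 'Bo', 'height': 6.0, 'weight': 150}]
rows = melt(df, id_vars=['first', 'last'])
print(len(rows))  # 2 people x 2 measured variables = 4 rows
```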

@@ -210,7 +243,7 @@ The function ``pandas.pivot_table`` can be used to create spreadsheet-style pivot
 tables. It takes a number of arguments
 
 - ``data``: A DataFrame object
-- ``values``: column to aggregate
+- ``values``: a column or a list of columns to aggregate
 - ``rows``: list of columns to group by on the table rows
 - ``cols``: list of columns to group by on the table columns
 - ``aggfunc``: function to use for aggregation, defaulting to ``numpy.mean``
@@ -232,6 +265,7 @@ We can produce pivot tables from this data very easily:
 
    pivot_table(df, values='D', rows=['A', 'B'], cols=['C'])
    pivot_table(df, values='D', rows=['B'], cols=['A', 'C'], aggfunc=np.sum)
+   pivot_table(df, values=['D','E'], rows=['B'], cols=['A', 'C'], aggfunc=np.sum)
 
 The result object is a DataFrame having potentially hierarchical indexes on the
 rows and columns. If the ``values`` column name is not given, the pivot table
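The multi-values behavior the new line exercises can be sketched in plain Python (``pivot_table`` below and its tuple-keyed result are hypothetical stand-ins, not pandas code): group records by their (rows, cols) key, then aggregate each requested values column within every group.

```python
from collections import defaultdict

def pivot_table(records, values, rows, cols, aggfunc=sum):
    # Group by the combined (rows, cols) key, then aggregate every
    # values column within each group; `values` may list several columns.
    groups = defaultdict(list)
    for rec in records:
        key = (tuple(rec[r] for r in rows), tuple(rec[c] for c in cols))
        groups[key].append(rec)
    return {key: {v: aggfunc(r[v] for r in recs) for v in values}
            for key, recs in groups.items()}

data = [{'A': 'foo', 'C': 'small', 'D': 1, 'E': 2},
        {'A': 'foo', 'C': 'small', 'D': 2, 'E': 4},
        {'A': 'bar', 'C': 'large', 'D': 3, 'E': 6}]
table = pivot_table(data, values=['D', 'E'], rows=['A'], cols=['C'])
print(table[(('foo',), ('small',))])  # both D and E aggregated per cell
```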

doc/source/visualization.rst

Lines changed: 11 additions & 1 deletion
@@ -28,6 +28,8 @@ We use the standard convention for referencing the matplotlib API:
 
    import matplotlib.pyplot as plt
 
+.. _visualization.basic:
+
 Basic plotting: ``plot``
 ------------------------
 

@@ -43,7 +45,7 @@ The ``plot`` method on Series and DataFrame is just a simple wrapper around
4345
ts.plot()
4446
4547
If the index consists of dates, it calls ``gca().autofmt_xdate()`` to try to
46-
format the x-axis nicely as per above. THe method takes a number of arguments
48+
format the x-axis nicely as per above. The method takes a number of arguments
4749
for controlling the look of the plot:
4850

4951
.. ipython:: python
@@ -62,6 +64,14 @@ On DataFrame, ``plot`` is a convenience to plot all of the columns with labels:
6264
@savefig frame_plot_basic.png width=4.5in
6365
plt.figure(); df.plot(); plt.legend(loc='best')
6466
67+
You may set the ``legend`` argument to ``False`` to hide the legend, which is
68+
shown by default.
69+
70+
.. ipython:: python
71+
72+
@savefig frame_plot_basic_noleg.png width=4.5in
73+
df.plot(legend=False)
74+
6575
Some other options are available, like plotting each Series on a different axis:
6676

6777
.. ipython:: python

doc/source/whatsnew/v0.6.0.txt

Lines changed: 15 additions & 23 deletions
@@ -5,13 +5,12 @@ v.0.6.0 (November 25, 2011)
 
 New Features
 ~~~~~~~~~~~~
-- Add ``melt`` function to ``pandas.core.reshape``
+- :ref:`Added <reshaping.melt>` ``melt`` function to ``pandas.core.reshape``
 - :ref:`Added <groupby.multiindex>` ``level`` parameter to group by level in Series and DataFrame descriptive statistics (PR313_)
 - :ref:`Added <basics.head_tail>` ``head`` and ``tail`` methods to Series, analogous to DataFrame (PR296_)
 - :ref:`Added <indexing.boolean>` ``Series.isin`` function which checks if each value is contained in a passed sequence (GH289_)
 - :ref:`Added <io.formatting>` ``float_format`` option to ``Series.to_string``
 - :ref:`Added <io.parse_dates>` ``skip_footer`` (GH291_) and ``converters`` (GH343_) options to ``read_csv`` and ``read_table``
-- Added proper, tested weighted least squares to standard and panel OLS (GH303_)
 - :ref:`Added <indexing.duplicate>` ``drop_duplicates`` and ``duplicated`` functions for removing duplicate DataFrame rows and checking for duplicate rows, respectively (GH319_)
 - :ref:`Implemented <dsintro.boolean>` operators '&', '|', '^', '-' on DataFrame (GH347_)
 - :ref:`Added <basics.stats>` ``Series.mad``, mean absolute deviation
@@ -33,33 +32,26 @@ New Features
 - :ref:`Added <io.html>` ``DataFrame.to_html`` for writing DataFrame to HTML (PR387_)
 - :ref:`Added <basics.dataframe>` support for MaskedArray data in DataFrame, masked values converted to NaN (PR396_)
 - :ref:`Added <visualization.box>` ``DataFrame.boxplot`` function (GH368_)
-- Can pass extra args, kwds to DataFrame.apply (GH376_)
-- Arithmetic methods like ``sum`` will attempt to sum dtype=object values by default instead of excluding them (GH382_)
-- Print level names in hierarchical index in Series repr (GH305_)
-- Return DataFrame when performing GroupBy on selected column and as_index=False (GH308_)
-- Can pass vector to ``on`` argument in ``DataFrame.join`` (GH312_)
-- Show legend by default in ``DataFrame.plot``, add ``legend`` boolean flag
-  (GH324_) np.unique called on a Series faster (GH327_) "empty" combinations
-  ``Series.map`` significantly when passed elementwise Python function,
-  motivated by PR355_ enhancements throughout the codebase (GH361_) with 3-5x
-  better performance than ``np.apply_along_axis`` (GH309_) the passed function
-  only requires an ndarray (GH309_)
-- Can pass multiple levels to ``stack`` and ``unstack`` (GH370_)
-- Can pass multiple values columns to ``pivot_table`` (GH381_)
-- Can call ``DataFrame.delevel`` with standard Index with name set (GH393_)
-- Use Series name in GroupBy for result index (GH363_)
-- MAYBE? Refactor Series/DataFrame stat methods to use common set of NaN-friendly function
+- :ref:`Can <basics.apply>` pass extra args, kwds to DataFrame.apply (GH376_)
+- :ref:`Implement <merging.multikey_join>` ``DataFrame.join`` with vector ``on`` argument (GH312_)
+- :ref:`Added <visualization.basic>` ``legend`` boolean flag to ``DataFrame.plot`` (GH324_)
+- :ref:`Can <reshaping.stacking>` pass multiple levels to ``stack`` and ``unstack`` (GH370_)
+- :ref:`Can <reshaping.pivot>` pass multiple values columns to ``pivot_table`` (GH381_)
+- :ref:`Use <groupby.multiindex>` Series name in GroupBy for result index (GH363_)
+- :ref:`Added <basics.apply>` ``raw`` option to ``DataFrame.apply`` for performance if only need ndarray (GH309_)
+- Added proper, tested weighted least squares to standard and panel OLS (GH303_)
 
 Performance Enhancements
 ~~~~~~~~~~~~~~~~~~~~~~~~
-- VBENCH Cythonized ``cache_readonly``, resulting in substantial micro-performance
-- VBENCH Improve performance of ``MultiIndex.from_tuples``
+- VBENCH Cythonized ``cache_readonly``, resulting in substantial micro-performance enhancements throughout the codebase (GH361_)
+- VBENCH Special Cython matrix iterator for applying arbitrary reduction operations with 3-5x better performance than ``np.apply_along_axis`` (GH309_)
+- VBENCH Improved performance of ``MultiIndex.from_tuples``
 - VBENCH Special Cython matrix iterator for applying arbitrary reduction operations
 - VBENCH + DOCUMENT Add ``raw`` option to ``DataFrame.apply`` for getting better performance when
 - VBENCH Faster cythonized count by level in Series and DataFrame (GH341_)
-- VBENCH? Significant GroupBy performance enhancement with multiple keys with many
-- VBENCH New Cython vectorized function ``map_infer`` speeds up ``Series.apply`` and
-- VBENCH Significantly improved performance of ``Series.order``, which also makes
+- VBENCH? Significant GroupBy performance enhancement with multiple keys with many "empty" combinations
+- VBENCH New Cython vectorized function ``map_infer`` speeds up ``Series.apply`` and ``Series.map`` significantly when passed elementwise Python function, motivated by PR355_
+- VBENCH Significantly improved performance of ``Series.order``, which also makes np.unique called on a Series faster (GH327_)
 - VBENCH Vastly improved performance of GroupBy on axes with a MultiIndex (GH299_)
 
 .. _GH65: https://github.com/wesm/pandas/issues/65
