The apply function of a DataFrame is called twice on the first row #6753

anton-d · 2014-04-01T09:34:17Z

When calling the apply function of a DataFrame, it is called twice for the first row. The following code

from pandas import DataFrame

df = DataFrame({"a": ["x", "y"], "b": [1, 2]})
def identity(row):
    print(tuple(row))  # show every call, including any repeated one
    return row
df2 = df.apply(identity, axis=1)

prints

('x', 1)
('x', 1)
('y', 2)

The result df2 is not affected by this behavior. However, when the function called by apply (identity in the example above) has side-effects, this can lead to very surprising and unexpected effects.
It would be good to have at least a note on this behavior in the documentation of the apply function.

This is related to #2656 and #2936, where this behaviour was reported for calling apply after groupby. Here it appears for a DataFrame.
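
To check whether a given installation shows the duplicate call, the calls can be counted instead of printed. This is a minimal sketch: the names `calls` and `extra_calls` are illustrative only, and the count depends on the pandas version (the report above is for 0.13.1).

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
calls = []  # index label of every row the function receives

def identity(row):
    calls.append(row.name)  # side effect: record that this row was seen
    return row

df2 = df.apply(identity, axis=1)

# Calls beyond one per row; 0 means no row was passed in more than once.
extra_calls = len(calls) - len(df)
print(extra_calls)
```

A nonzero `extra_calls` means any side effect inside the function ran more often than the number of rows.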

INSTALLED VERSIONS

commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8

pandas: 0.13.1
Cython: 0.15.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: 1.1.3
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 2.3.1
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.5.8
xlrd: 0.6.1
xlwt: 0.7.4
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None

jreback · 2014-04-01T09:57:49Z

This is expected. pandas needs to figure out whether you are mutating the passed data in order to take a fast or slow path to execute the code. This is done in Cython, and the results are currently not reused in certain cases.

anton-d · 2014-04-01T10:33:35Z

Wouldn't it be worthwhile to document this behavior?
I did not expect this and it took me a while to figure out. (I was updating a variable inside the function called by apply, which obviously leads to unexpected results, because the function is called twice.)
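
One way to sidestep this kind of surprise is to keep the applied function free of accumulating side effects. The sketch below shows two options under that assumption; `get_b`, `record`, and `seen` are hypothetical names, not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

# Option 1: keep the function pure and aggregate its return values afterwards.
def get_b(row):
    return row["b"]

total = df.apply(get_b, axis=1).sum()

# Option 2: if state must be recorded, key it by row label so that a
# repeated call overwrites the same entry instead of accumulating.
seen = {}

def record(row):
    seen[row.name] = row["b"]
    return row["b"]

df.apply(record, axis=1)
total_from_seen = sum(seen.values())
```

Both totals stay the same whether or not the first row is passed to the function a second time.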

jreback · 2014-04-01T10:36:49Z

Sure. Please do a pull request to put a warning section in groupby (where it's most common); maybe a docstring note in DataFrame.apply is in order as well.

Related issues are pandas-dev#2656, pandas-dev#2936 and pandas-dev#6753.

csjeney · 2019-08-15T06:44:43Z

Ok. I understand from the comments that this is still being worked on in some way. I would just like to draw your attention to the fact that the behavior is context-sensitive as well. See: https://stackoverflow.com/questions/57501868/pandas-bug-applying-lambda-on-multiple-columns-row-name-has-strange-behavior?noredirect=1#comment101473064_57501868
I believe this is really bad (if you need both code samples, please let me know).

What about just giving up this fast/slow evaluation for a while, until a resolution is reached? That would give us more time (5 years or so have already passed) to find a way out, CONSISTENT code, and fewer headaches out there... I am in a good position to say that, as I too frequently use massive datasets :) ... speed can be compensated for, inconsistent behavior NEVER.

rbhambriiit · 2022-05-04T14:00:09Z

I think this is indeed a big design miss, and it has been a known issue for quite some time.

There should have been a mechanism in older pandas versions to avoid the double apply by setting an argument, even if that is slower.

I am really stuck between pandas versions v1.0.5 and v1.1*, each with gaps, due to this and another bug.
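
For versions where the double call cannot be switched off, an explicit loop is one possible workaround. This sketch is not a pandas option; it assumes plain iteration speed is acceptable, and `describe` and `calls` are illustrative names.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
calls = []  # index label of every row the function receives

def describe(row):
    calls.append(row.Index)  # side effect that must run once per row
    return "{}{}".format(row.a, row.b)

# itertuples yields each row exactly once as a namedtuple, so the
# function runs once per row regardless of how apply behaves.
results = [describe(row) for row in df.itertuples()]
out = pd.Series(results, index=df.index)
```

The loop is generally slower than `apply` on large frames, but the number of calls is fixed by the iteration itself.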

jreback · 2022-05-05T00:26:26Z

@rbhambriiit this was fixed, I believe, in 0.25

jreback closed this as completed Apr 1, 2014

jreback added Docs labels Apr 1, 2014

jreback added this to the 0.14.0 milestone Apr 1, 2014

jreback reopened this Apr 1, 2014

anton-d mentioned this issue Apr 1, 2014

DOC: documented that .apply(func) executes func twice on the first time #6756

Merged

anton-d closed this as completed Apr 1, 2014

jeffreystarr pushed a commit to jeffreystarr/pandas that referenced this issue Apr 28, 2014

DOC: documented that .apply(func) executes func twice on the first time

ec2a8a1


cpcloud mentioned this issue Jul 12, 2014

groupby - apply applies to the first group twice #7739

Closed

jreback mentioned this issue May 28, 2015

current implementation of DataFrame.apply where passed function has side effects is a real "gotcha" #10222

Closed

RodrigoNeves95 mentioned this issue Nov 26, 2018

Duplicated tasks when processing dataframes dask/dask#4244

Closed

fjetter mentioned this issue Jan 26, 2019

ENH: Only apply first group once in fast GroupBy.apply #24748

Merged

3 tasks
