The apply function of a DataFrame is called twice on the first row #6753

Closed
anton-d opened this issue Apr 1, 2014 · 6 comments

@anton-d
Contributor

anton-d commented Apr 1, 2014

When calling the apply function of a DataFrame, the function is called twice for the first row. The following code

from pandas import DataFrame

df = DataFrame({"a": ["x", "y"], "b": [1, 2]})

def identity(row):
    print(tuple(row))  # show each row the function receives
    return row

df2 = df.apply(identity, axis=1)

prints

('x', 1)
('x', 1)
('y', 2)

The result df2 is not affected by this behavior. However, when the function called by apply (identity in the example above) has side-effects, this can lead to very surprising and unexpected effects.
It would be good to have at least a note on this behavior in the documentation of the apply function.

This is related to #2656 and #2936, where this behaviour was reported for calling apply after groupby. Here it appears for a DataFrame.

INSTALLED VERSIONS

commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8

pandas: 0.13.1
Cython: 0.15.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: 1.1.3
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 2.3.1
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.5.8
xlrd: 0.6.1
xlwt: 0.7.4
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None

@jreback
Contributor

jreback commented Apr 1, 2014

This is expected. pandas needs to figure out whether you are mutating the passed data in order to choose a fast or slow path to execute the code. This is done in Cython, and the results are not currently reused in certain cases.

@jreback jreback closed this as completed Apr 1, 2014
@anton-d
Contributor Author

anton-d commented Apr 1, 2014

Wouldn't it be worthwhile to document this behavior?
I did not expect this and it took me a while to figure out. (I was updating a variable inside the function called by apply, which obviously leads to unexpected results, because the function is called twice.)
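A minimal sketch of the kind of side effect described above, and of one way to avoid depending on the call count. The `seen` list and the `record` function are hypothetical names used only for illustration; they are not from the original report.

```python
import pandas as pd

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})

# Fragile: state lives outside the applied function, so any extra call
# pandas makes on the first row is recorded as well.
seen = []  # hypothetical accumulator, for illustration only

def record(row):
    seen.append(row["b"])  # side effect: runs once per call, not once per row
    return row["b"]

result = df.apply(record, axis=1)

# More robust: treat the applied function as pure and derive any
# aggregate from the returned values instead of from the side effect.
total = int(result.sum())

print("calls recorded:", len(seen))
print("total from result:", total)
```

The length of `seen` depends on how many times the installed pandas version calls the function, while `total` is computed from the returned Series and does not.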

@jreback
Contributor

jreback commented Apr 1, 2014

Sure. Please do a pull request to put a warning section in groupby (where it's most common); maybe a docstring note in DataFrame.apply is in order as well.

@csjeney

csjeney commented Aug 15, 2019

OK. I understand from the comments that this is still ongoing in some way. I would just like to draw your attention to the fact that it is context-sensitive as well. See: https://stackoverflow.com/questions/57501868/pandas-bug-applying-lambda-on-multiple-columns-row-name-has-strange-behavior?noredirect=1#comment101473064_57501868
I believe this is really bad (if you need both code samples, please let me know).

What about just giving up this fast/slow evaluation for a while until a resolution is reached? That would give us more time (5 years or so have already passed) to find a way out, with CONSISTENT CODE and fewer headaches out there... I am in the perfect position to say that, as I too frequently use massive datasets :) ... speed can be compensated for, inconsistent behavior NEVER.

@rbhambriiit

I think this is indeed a big design miss, and it has been a known issue for quite some time.

There should have been a mechanism in older pandas versions to avoid the double apply by setting an argument, even if that is slow.

I am really stuck between pandas versions, with gaps on v1.0.5 and v1.1*, due to this and another bug.

@jreback
Contributor

jreback commented May 5, 2022

@rbhambriiit this was fixed, I believe, in 0.25
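To check how a particular installed version behaves, one option is to count the calls directly. This is only a sketch: `count_apply_calls` is a hypothetical helper, not part of pandas, and it uses `nonlocal`, so it assumes Python 3.

```python
import pandas as pd

def count_apply_calls(df):
    """Return how many times DataFrame.apply invokes the function row-wise."""
    calls = 0

    def identity(row):
        nonlocal calls
        calls += 1  # incremented on every invocation, including any extra one
        return row

    df.apply(identity, axis=1)
    return calls

df = pd.DataFrame({"a": ["x", "y"], "b": [1, 2]})
n_calls = count_apply_calls(df)

print("pandas", pd.__version__)
print("rows:", len(df), "calls:", n_calls)
if n_calls > len(df):
    print("The function was called more often than there are rows.")
```

If the count equals the number of rows, the first row is not being evaluated twice on that installation.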
