The apply function of a DataFrame is called twice on the first row #6753
Comments
This is expected. pandas needs to figure out whether you are mutating the passed data so it can choose a fast or a slow path to execute the code. This check is done in Cython, and in certain cases its results are not currently reused.
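The fast/slow-path probing described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not pandas' actual Cython code: `apply_rows` is a hypothetical helper that runs the function once on a copy of the first row to detect mutation, then discards that probe result and evaluates every row again, so a side-effecting function sees row 0 twice.

```python
from typing import Callable, List, TypeVar

R = TypeVar("R")


def apply_rows(rows: List[list], func: Callable[[list], R]) -> List[R]:
    """Hypothetical toy model (not pandas' real implementation) of an apply
    that probes the first row to choose between a fast and a slow path."""
    if not rows:
        return []
    # Probe: run func on a copy of the first row and check whether it mutated it.
    probe_input = list(rows[0])
    func(probe_input)  # first evaluation of row 0; its result is thrown away
    mutates = probe_input != rows[0]
    if mutates:
        # Slow path: hand every call a fresh copy so mutations cannot leak.
        return [func(list(r)) for r in rows]
    # Fast path: pass rows directly. Because the probe result is not reused,
    # row 0 is evaluated a second time here.
    return [func(r) for r in rows]


seen = []


def identity(row: list) -> list:
    # Side effect: record the first value of each row we are called with.
    seen.append(row[0])
    return row


rows = [[1, 10], [2, 20], [3, 30]]
out = apply_rows(rows, identity)
print(seen)  # [1, 1, 2, 3]
print(out == rows)  # True
```

The returned values are correct either way; only the side effect (the `seen` list) reveals the extra call, which mirrors the behavior reported in this issue.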
Wouldn't it be worthwhile to document this behavior?
Sure. Please do a pull request to put a
OK. I understand from the comments that this is being worked on in some way. I would just like to draw your attention to the fact that it is context-sensitive as well. See: https://stackoverflow.com/questions/57501868/pandas-bug-applying-lambda-on-multiple-columns-row-name-has-strange-behavior?noredirect=1#comment101473064_57501868 What about simply giving up this fast/slow evaluation for a while until a resolution is reached? We have had plenty of time (5 years or so have already passed) to find a way out: CONSISTENT CODE and fewer headaches out there... I am in the perfect position to say that, as I frequently use massive datasets :) ... Speed can be compensated for; inconsistent behavior NEVER.
I think this is indeed a big design miss, and it has been a known issue for quite some time. Older pandas versions should have offered a mechanism to avoid the double apply by setting an argument, even if that path were slower. I am really stuck between pandas versions, with gaps in v1.0.5 and v1.1*, because of this and another bug.
@rbhambriiit this was fixed, I believe, in 0.25.
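Whichever release removed the double call, one version-independent way to get exactly-once side effects is to keep the function passed to `apply` free of side effects and run the side effects in an explicit loop. A minimal sketch, assuming a hypothetical DataFrame with numeric columns `a` and `b` and a hypothetical `log` list standing in for the side effect:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

log = []
# Side effects run in an explicit loop: iterrows() yields each row exactly once,
# so the first row cannot be processed twice.
for label, row in df.iterrows():
    log.append((label, row["a"] + row["b"]))

# The pure computation is vectorized and has no side effects,
# so how many times pandas evaluates it internally does not matter.
df2 = df.assign(total=df["a"] + df["b"])

print(log)
print(df2)
```

This trades some speed for predictability in the loop, but keeps the bulk computation fast.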
When the apply function of a DataFrame is called, the applied function is called twice for the first row. The following code
prints
The result df2 is not affected by this behavior. However, when the function called by apply (identity in the example above) has side effects, this can lead to very surprising results.
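The original reproduction snippet and its printed output were not preserved here. The following is a minimal sketch of the same kind of check, not the reporter's code: a hypothetical `identity` function returns each row unchanged but records `row.name` (the row's index label) as a side effect, so the `calls` list shows how often each row was evaluated. Depending on the pandas version, label `0` may appear twice.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
calls = []


def identity(row):
    # Side effect: record the index label of each row apply() passes in.
    calls.append(row.name)
    return row


df2 = df.apply(identity, axis=1)

print(calls)  # a repeated leading 0 indicates the first row was evaluated twice
print(df2.values.tolist() == df.values.tolist())  # the result itself is unaffected
```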
It would be good to have at least a note on this behavior in the documentation of the apply function.
This is related to #2656 and #2936, where this behaviour was reported for calling apply after groupby. Here it appears for a DataFrame.
INSTALLED VERSIONS
commit: None
python: 2.7.3.final.0
python-bits: 64
OS: Linux
OS-release: 3.2.0-4-amd64
machine: x86_64
processor:
byteorder: little
LC_ALL: None
LANG: en_US.utf8
pandas: 0.13.1
Cython: 0.15.1
numpy: 1.8.1
scipy: 0.13.3
statsmodels: None
IPython: 1.2.1
sphinx: 1.1.3
patsy: None
scikits.timeseries: None
dateutil: 2.2
pytz: 2014.2
bottleneck: None
tables: 2.3.1
numexpr: 2.0.1
matplotlib: 1.3.1
openpyxl: 1.5.8
xlrd: 0.6.1
xlwt: 0.7.4
xlsxwriter: None
sqlalchemy: None
lxml: None
bs4: None
html5lib: None
bq: None
apiclient: None