-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
API: honor copy=True when passing dict to DataFrame #38939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
API: honor copy=True when passing dict to DataFrame #38939
Conversation
…way-init-no-consolidate
…andas into myway-init-no-consolidate
…way-init-no-consolidate
…way-init-no-consolidate
…way-init-no-consolidate
…way-init-no-consolidate
@@ -1297,7 +1297,7 @@ def test_strings_to_numbers_comparisons_raises(self, compare_operators_no_eq_ne) | |||
f(df, 0) | |||
|
|||
def test_comparison_protected_from_errstate(self): | |||
missing_df = tm.makeDataFrame() | |||
missing_df = tm.makeDataFrame()._consolidate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are already doing this on the creation (I understand by find this fragile)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you are already doing this on the creation
i dont think so
(I understand by find this fragile)
I agree. Silver lining: finding the existing fragility.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actuallly you are doing this on creation, maybe you recently added. prefer NOT to do this in the tests proper (in pandas/testing is ok)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
updated to do the consolidation in tm.makeDataFrame
@@ -765,7 +765,7 @@ def test_operators_timedelta64(self): | |||
def test_std_timedelta64_skipna_false(self): | |||
# GH#37392 | |||
tdi = pd.timedelta_range("1 Day", periods=10) | |||
df = DataFrame({"A": tdi, "B": tdi}) | |||
df = DataFrame({"A": tdi, "B": tdi}, copy=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would copy matter here at all? isn't tdi supposed to be immutable? we need to always copy no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would copy matter here at all?
there are a couple of inter-twined issues here. the copy is necesary because we need df["A"] and df["B"] to not share data. We could do df = DataFrame({"A": tdi.values, "B": tdi.values.copy()})
isn't tdi supposed to be immutable?
yes, but thats kinda orthogonal (maybe 45°). e.g. in master
tdi = pd.timedelta_range("1 Day", periods=3)
df = pd.DataFrame(tdi)
df.iloc[0, 0] = pd.NaT
>>> tdi
TimedeltaIndex([NaT, '2 days', '3 days'], dtype='timedelta64[ns]', freq='D')
we need to always copy no?
i think we do something like this for dt64tz motivated by a similar immutability concern. ill see if that can be adapted
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this copy=True can be removed now?
pandas/core/internals/managers.py
Outdated
result[rl] = blk.iget((i, loc)) | ||
out = blk.iget((i, loc)) | ||
if is_dtype_equal(blk.dtype, dtype) and dtype == "m8[ns]": | ||
# FIXME: kludge for NaT -> tdnat |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i know you mention, but this is very kludgy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yah, im still looking for an alternative here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
de-kludged. i need to double check, but think this may solve a separate silent bug. if so ill make a separate PR for that
I am leary of this consolidation (or rather non-consolidation). it is going to be quite fragile. why is consolidation tied to the dict copy at all? |
…way-init-no-consolidate
…way-init-no-consolidate
…way-init-no-consolidate
…way-init-no-consolidate
@TomAugspurger can you weigh in on why this is worthwhile |
I think just having the |
@@ -1297,7 +1297,7 @@ def test_strings_to_numbers_comparisons_raises(self, compare_operators_no_eq_ne) | |||
f(df, 0) | |||
|
|||
def test_comparison_protected_from_errstate(self): | |||
missing_df = tm.makeDataFrame() | |||
missing_df = tm.makeDataFrame()._consolidate() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actuallly you are doing this on creation, maybe you recently added. prefer NOT to do this in the tests proper (in pandas/testing is ok)
df.iloc[0, 1] = 0 | ||
if not copy: | ||
# setitem on non-EA values preserves views | ||
assert sum(x is c for x in df._mgr.arrays) == 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, yes, that clarifies
Co-authored-by: Joris Van den Bossche <[email protected]>
@jbrockmendel can you do a final merge of master here to be sure it's all passing with latest master? @jreback all good for you? (your "approve" is from before the behaviour changed to preserve the copy=False behaviour for dicts) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good. can you merge master, ping on greenish
copy : bool or None, default None | ||
Copy data from inputs. | ||
For dict data, the default of None behaves like ``copy=True``. For DataFrame | ||
or 2d ndarray input, the default of None behaves like ``copy=False``. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a versionchanged 1.3
pandas/core/frame.py
Outdated
): | ||
|
||
if copy is None: | ||
# GH#38939 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this line (as you mention below)
hmm, this looks to be failing a bunch of things |
thanks @jbrockmendel this has been a long road, and @TomAugspurger for the original version. |
xref #34872 cc @TomAugspurger used the test_dict_nocopy you wrote but it ended up pretty mangled