Skip to content

ENH: Avoid copying whole block for single block case #51435

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 22 commits into from
Aug 11, 2023

Conversation

phofl
Copy link
Member

@phofl phofl commented Feb 16, 2023

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

let's see if this passes ci

@phofl phofl marked this pull request as draft February 16, 2023 17:01
@phofl phofl marked this pull request as ready for review February 20, 2023 21:33
@phofl phofl requested a review from jorisvandenbossche March 1, 2023 19:35
@phofl phofl added this to the 2.0 milestone Mar 15, 2023
@phofl phofl modified the milestones: 2.0, 2.1 Apr 2, 2023
# Split blocks to only copy the columns we want to modify, only 1 block
if self.ndim == 2 and isinstance(indexer, tuple):
blk_loc = self.blklocs[indexer[1]]
if is_list_like(blk_loc) and blk_loc.ndim == 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how would you get blk_loc.ndim == 2?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would have to check, got test failures without checking this

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check this again?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget to comment maybe_convert_indexer makes indexer[1] 2D in some cases which causes blk_loc.ndim=2

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What indexer[1] causes this? this strikes me as very weird

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_loc_setitem_reordering_with_all_true_indexer

That's one of the test that hits this if you want to take a look

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i added this check and a breakpoint on main and it didnt get hit in that test (or that test file). are there other changes in this PR that cause it to be reached?

elif not is_list_like(blk_loc):
# Keep dimension and copy data later
blk_loc = [blk_loc]
if len(blk_loc.shape) == 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

blk_loc.ndim?

# Keep dimension and copy data later
blk_loc = [blk_loc]
if len(blk_loc.shape) == 0:
return self
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a shallow copy?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC we can only get here in setitem ops, so doesn't matter. Will check more closely though

# Split blocks to only copy the columns we want to modify, only 1 block
if self.ndim == 2 and isinstance(indexer, tuple):
blk_loc = self.blklocs[indexer[1]]
if is_list_like(blk_loc) and blk_loc.ndim == 2:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you check this again?

0, blk_loc, values
)
# first block equals values
self.blocks[0].setitem((indexer[0], np.arange(len(blk_loc))), value)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.arange(len(blk_loc)) could also be a slice(None) ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this, thx

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this has side effects. Rolling back for now

phofl and others added 3 commits June 28, 2023 15:34
…block_case

# Conflicts:
#	pandas/tests/copy_view/test_indexing.py
@phofl phofl merged commit 0582e35 into pandas-dev:main Aug 11, 2023
@phofl phofl deleted the cow_setitem_single_block_case branch August 11, 2023 09:17
mroeschke pushed a commit to mroeschke/pandas that referenced this pull request Aug 18, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants