BUG: assign consensus name to index union in array case GH13475 #35338

Merged: 1 commit merged into pandas-dev:master on Aug 7, 2020

iamlemec
Contributor

@iamlemec iamlemec commented Jul 18, 2020

Member

@arw2019 arw2019 left a comment

Thanks for the fix!

We will need a test to show how this fixes the bug. Based on a quick look, pandas/tests/reshape/test_concat.py would be a good place for it.

@iamlemec
Contributor Author

Sure thing! Just added in a test ensuring a common index name is preserved with concat.

Member

@arw2019 arw2019 left a comment

One comment, otherwise lgtm


result = pd.concat([frame1, frame2], axis=1)

assert result.index.name == "idx"
Member

Here you want to hard-code the expected result and check equality with tm.assert_frame_equal.
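As a rough illustration of that suggestion, the sketch below builds the expected frame by hand and compares it to the concat result. The contents of `frame1` and `frame2` are made up for this example (the thread does not show them), and it uses the public `pd.testing.assert_frame_equal` rather than the internal `tm` alias. It assumes a pandas version that already includes this fix (1.2 or later).

```python
import numpy as np
import pandas as pd

# Illustrative inputs (not the PR's actual test data): two frames whose
# indexes overlap only partially but share the name "idx".
frame1 = pd.DataFrame({"x": [0, 1]}, index=pd.Index(["a", "b"], name="idx"))
frame2 = pd.DataFrame({"y": [0, 1]}, index=pd.Index(["b", "c"], name="idx"))

result = pd.concat([frame1, frame2], axis=1)

# Hard-coded expectation: rows missing from one input become NaN, and the
# shared index name is carried over to the result.
expected = pd.DataFrame(
    {"x": [0, 1, np.nan], "y": [np.nan, 0, 1]},
    index=pd.Index(["a", "b", "c"], name="idx"),
)

# Checks values, dtypes, and index/column names in one call.
pd.testing.assert_frame_equal(result, expected)
```

Comparing whole frames this way also catches regressions in values and dtypes, which a bare check on `result.index.name` would miss.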

Contributor Author

ah indeed, that's much better. just pushed a remedy.

Member

@arw2019 arw2019 left a comment

Ok great!

All that's left is a whatsnew entry. I think this may go to 1.2, so we might need to wait until 1.1 gets branched to finalize (#34730 & #35315).

I'll ping here when that's done if you like

@iamlemec
Contributor Author

Sounds great, thanks!

Member

@simonjayhawkins simonjayhawkins left a comment

Thanks @iamlemec for the PR. Generally lgtm pending release note.

In the meantime, maybe you could parametrize the test for other name combinations (different names and missing names as well as same names; you may need to concat three DataFrames to get better coverage of permutations).

@@ -220,7 +220,8 @@ def conv(i):
         index = indexes[0]
         for other in indexes[1:]:
             if not index.equals(other):
-                return _unique_indices(indexes)
+                index = _unique_indices(indexes)
+                break

Member

just a personal preference, but for me something like

        if not all(index.equals(other) for other in indexes[1:]):
            index = _unique_indices(indexes)

is easier to grok instead of introducing a break.

Contributor Author

ah, yup that's much more readable. hadn't realized that all() will short-circuit. will add that in plus some additional tests.
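A small, pandas-free sketch of the short-circuit behaviour mentioned here: `all()` over a generator stops at the first falsy item, so it does no more comparisons than the loop with `break`. The helper `matches_first` and the sample data are hypothetical and exist only to record which items were actually compared.

```python
# Record every value that actually gets compared.
compared = []


def matches_first(value, first):
    """Hypothetical helper: log the comparison, then report equality."""
    compared.append(value)
    return value == first


values = ["a", "a", "b", "a", "a"]
first = values[0]

# The generator is consumed lazily, so all() stops at the first mismatch ("b").
all_equal = all(matches_first(v, first) for v in values[1:])

print(all_equal)  # False
print(compared)   # ['a', 'b'] -- the last two items were never compared
```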

@iamlemec
Contributor Author

Just added the new tests. I figured since we're primarily testing the index name functionality, it's okay to define the base frame then use rename_axis to differentiate.
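The approach described here could look roughly like the sketch below: define base frames once, then use `rename_axis` to vary only the index names. The frame contents and the case list are illustrative rather than the PR's actual test data, and a plain loop stands in for a parametrized test. Cases with a missing name are left out on purpose, since the thread does not pin down the expected result for them. A pandas version that includes this fix (1.2 or later) is assumed.

```python
import pandas as pd

# Base frames with unnamed, partially overlapping string indexes.
left_base = pd.DataFrame({"x": [0, 1, 2]}, index=["a", "b", "c"])
right_base = pd.DataFrame({"y": [0, 1, 2]}, index=["b", "c", "d"])

# (name of left index, name of right index, expected name of the result)
cases = [
    ("idx", "idx", "idx"),   # same name on both inputs: name is kept
    ("idx1", "idx2", None),  # conflicting names: result index is unnamed
]

observed = {}
for name_left, name_right, expected_name in cases:
    # rename_axis changes only the index name, not the labels or the data.
    result = pd.concat(
        [left_base.rename_axis(name_left), right_base.rename_axis(name_right)],
        axis=1,
    )
    observed[(name_left, name_right)] = result.index.name
    assert result.index.name == expected_name
```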

Contributor

@jreback jreback left a comment

this also needs a whatsnew note, this would be for 1.2 (the whatsnew is not pushed yet so will have to do a bit later)

@jreback jreback added the Reshaping (Concat, Merge/Join, Stack/Unstack, Explode) label Jul 20, 2020
@@ -1279,6 +1279,33 @@ def test_concat_ignore_index(self, sort):

tm.assert_frame_equal(v1, expected)

concat_index_names = [
Member

I feel like we would want to put this inside pytest.mark.parametrize rather than defining a separate variable

Contributor Author

all integrated now

frames = [pd.DataFrame({c: vals}, index=i) for i, c in zip(indices, cols)]
result = pd.concat(frames, axis=1)

exp_ind = pd.Index(rows, name=output_name)
Member

nit pick: might want to define this inside the frames constructor rather than in a separate variable

Contributor Author

indeed, best to keep result / expected separate

@arw2019
Member

arw2019 commented Jul 29, 2020

> this also needs a whatsnew note, this would be for 1.2 (the whatsnew is not pushed yet so will have to do a bit later)

@iamlemec 1.2 whatsnew is on master now

@iamlemec
Contributor Author

Congrats on the release!

Actually, I was going over the testing code one last time, and I realized it doesn't actually test the new behavior (put another way, the current master will pass the test). That's because the bug only kicks in when the indices aren't equal (numerically), but they are equal in the test. I'm going to change it so the indices are only partially overlapping. Should I still also test the case where they are numerically equal?
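To make that concrete, here is a simplified, pandas-free model of the control flow before and after the patch. Everything in it (`ToyIndex`, `consensus_name`, `unique_labels`, `union_before`, `union_after`) is a hypothetical stand-in written for this sketch, not pandas internals, and the naming rule is deliberately reduced to "keep the name only if every input has the same one".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ToyIndex:
    """Hypothetical stand-in for an index: ordered labels plus a name."""
    labels: List[str]
    name: Optional[str] = None

    def equals(self, other: "ToyIndex") -> bool:
        # Compares labels only; the name plays no part.
        return self.labels == other.labels


def consensus_name(indexes: List[ToyIndex]) -> Optional[str]:
    """Simplified rule: keep the name only if all inputs share it."""
    names = {ix.name for ix in indexes}
    return names.pop() if len(names) == 1 else None


def unique_labels(indexes: List[ToyIndex]) -> ToyIndex:
    """Order-preserving union of labels; the result is unnamed."""
    seen: List[str] = []
    for ix in indexes:
        for label in ix.labels:
            if label not in seen:
                seen.append(label)
    return ToyIndex(seen)


def union_before(indexes: List[ToyIndex]) -> ToyIndex:
    index = indexes[0]
    for other in indexes[1:]:
        if not index.equals(other):
            # Early return: the naming step below is never reached.
            return unique_labels(indexes)
    return ToyIndex(index.labels, consensus_name(indexes))


def union_after(indexes: List[ToyIndex]) -> ToyIndex:
    index = indexes[0]
    if not all(index.equals(other) for other in indexes[1:]):
        index = unique_labels(indexes)
    # The naming step now runs for equal and unequal inputs alike.
    return ToyIndex(index.labels, consensus_name(indexes))


equal = [ToyIndex(["a", "b"], "idx"), ToyIndex(["a", "b"], "idx")]
overlapping = [ToyIndex(["a", "b"], "idx"), ToyIndex(["b", "c"], "idx")]

print(union_before(equal).name, union_after(equal).name)              # idx idx
print(union_before(overlapping).name, union_after(overlapping).name)  # None idx
```

With equal inputs both versions agree, which is why a test built on identical indexes cannot tell them apart. Only partially overlapping inputs reach the branch that changed.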

One more thing, which I think hasn't been brought up explicitly. This fix will also affect the behavior of the DataFrame constructor in the same way it affects concat with axis=1. Not sure if this influences the testing requirements.

@jreback jreback added this to the 1.2 milestone Aug 6, 2020
@jreback
Contributor

jreback commented Aug 6, 2020

> One more thing, which I think hasn't been brought up explicitly. This fix will also affect the behavior of the DataFrame constructor in the same way it affects concat with axis=1. Not sure if this influences the testing requirements.

can you show an example?

@iamlemec
Contributor Author

iamlemec commented Aug 7, 2020

Sure thing. The original issue only arises when it hits the "array" case of union_indexes, which means we need to send it a plain Index, so I'm using string index labels here:

s1 = pd.Series([1, 2], index=pd.Index(['a', 'b'], name='idx'))
s2 = pd.Series([2, 3], index=pd.Index(['b', 'c'], name='idx'))
pd.DataFrame({'a': s1, 'b': s2})

On master, I'm getting this yielding a DataFrame whose index has no name. With patch, it's named 'idx'.
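A constructor test along these lines might look like the following sketch. It reuses the two Series from the example above; the hard-coded `expected` frame is an assumption about what such a test would assert, not the test that was actually added to the PR, and it presumes a pandas version that includes this fix (1.2 or later).

```python
import numpy as np
import pandas as pd

s1 = pd.Series([1, 2], index=pd.Index(["a", "b"], name="idx"))
s2 = pd.Series([2, 3], index=pd.Index(["b", "c"], name="idx"))

# Building a frame from a dict of Series aligns them on the union of
# their indexes.
result = pd.DataFrame({"a": s1, "b": s2})

# Assumed expectation: labels missing from one Series become NaN and the
# shared index name "idx" is preserved.
expected = pd.DataFrame(
    {"a": [1, 2, np.nan], "b": [np.nan, 2, 3]},
    index=pd.Index(["a", "b", "c"], name="idx"),
)
pd.testing.assert_frame_equal(result, expected)
```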

@jreback jreback added the Bug, Constructors (Series/DataFrame/Index/pd.array Constructors), and Index (Related to the Index class or subclasses) labels Aug 7, 2020
@jreback
Contributor

jreback commented Aug 7, 2020

> Sure thing. The original issue only arises when it hits the "array" case of union_indexes, which means we need to send it a plain Index, so I'm using string index labels here:
>
> s1 = pd.Series([1, 2], index=pd.Index(['a', 'b'], name='idx'))
> s2 = pd.Series([2, 3], index=pd.Index(['b', 'c'], name='idx'))
> pd.DataFrame({'a': s1, 'b': s2})
>
> On master, I'm getting this yielding a DataFrame whose index has no name. With patch, it's named 'idx'.

great, can you add this as a test as well? Let's put it in pandas/tests/frame/test_constructors.py. Ping on green.

@iamlemec
Contributor Author

iamlemec commented Aug 7, 2020

Sounds good @jreback. Just pushed a DataFrame constructor test.

@jreback
Contributor

jreback commented Aug 7, 2020

great, ping on green

@iamlemec
Contributor Author

iamlemec commented Aug 7, 2020

@jreback ok, everything on CI looks good except for test_chunks_have_consistent_numerical_type, but it seems like that one's been flaky for people lately

@jreback jreback merged commit 92bf41a into pandas-dev:master Aug 7, 2020
@jreback
Contributor

jreback commented Aug 7, 2020

yep that's fine, thanks @iamlemec

Labels
Bug; Constructors (Series/DataFrame/Index/pd.array Constructors); Index (Related to the Index class or subclasses); Reshaping (Concat, Merge/Join, Stack/Unstack, Explode)
Development

Successfully merging this pull request may close these issues.

BUG: index.name not preserved in concat in case of unequal object index
4 participants