
DOC: Add examples to the method MultiIndex.is_lexsorted() #32312

Merged
merged 3 commits into from
Apr 2, 2020

raisadz
Contributor

@raisadz raisadz commented Feb 27, 2020

True
>>> pd.MultiIndex.from_arrays([[0, 1, 1], ['a', 'c', 'b']]).is_lexsorted()
False
>>> pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'],
Member

Hmm, I'm not sure about this one in particular, so we should collectively clarify it.

I don't know that the is_lexsorted implementation is really well defined for a MultiIndex. For instance, I think it only (as a bug) ever looks at the first two levels of a MultiIndex, failing conditions like this:

>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["z", "y", "x", "w"]]).is_lexsorted()
True
>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["w", "x", "y", "z"]]).is_lexsorted()
True

@toobaz any thoughts on this?

Member

For instance, I think it only (as a bug) ever looks at the first two levels of a MultiIndex, failing conditions like this:

@WillAyd isn't it correct there, though? The first level (Index(['a', 'b', 'c', 'd'], dtype='object')) is sorted, and so the whole MultiIndex is also lexically sorted.

From what I understand about lexical sorting, the third level only needs to be looked at if there's a tie in the first two, e.g.

>>> pd.MultiIndex.from_arrays([['a', 'a', 'c', 'd'],['e', 'e', 'g', 'h'],["x", "y", "y", "z"]]).is_lexsorted()
True

>>> pd.MultiIndex.from_arrays([['a', 'a', 'c', 'd'],['e', 'e', 'g', 'h'],["y", "x", "y", "z"]]).is_lexsorted()
False
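
The tie-breaking behaviour described above can be reproduced without pandas. The following is only a sketch of the idea, not the pandas implementation: `rows_are_lexsorted` is a hypothetical helper that zips the level arrays into row tuples and relies on Python's built-in tuple comparison, which is lexicographic. It assumes the values in each level are mutually comparable.

```python
from typing import Sequence


def rows_are_lexsorted(levels: Sequence[Sequence]) -> bool:
    """Hypothetical helper: True if the rows built from `levels` are in
    non-decreasing lexicographic order."""
    # Each row is one tuple, e.g. ('a', 'e', 'x').
    rows = list(zip(*levels))
    # Python compares tuples element by element, so a later level only
    # matters when all earlier levels are tied.
    return all(a <= b for a, b in zip(rows, rows[1:]))


# Ties in the first two levels, third level in order.
tie_sorted = rows_are_lexsorted(
    [["a", "a", "c", "d"], ["e", "e", "g", "h"], ["x", "y", "y", "z"]]
)
# Same ties, but the third level breaks the order for the first two rows.
tie_unsorted = rows_are_lexsorted(
    [["a", "a", "c", "d"], ["e", "e", "g", "h"], ["y", "x", "y", "z"]]
)
# No ties in the first level, so the reversed third level is never consulted.
no_ties = rows_are_lexsorted(
    [["a", "b", "c", "d"], ["e", "f", "g", "h"], ["z", "y", "x", "w"]]
)
print(tie_sorted, tie_unsorted, no_ties)  # True False True
```

The three results match the outputs quoted in this thread for the corresponding `is_lexsorted()` calls.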

Member

I don't think the first level alone should be the decider here - maybe @TomAugspurger or @jreback know more of the history / intent

Contributor Author

The Wikipedia page https://en.wikipedia.org/wiki/Lexicographical_order says: "Given two different sequences of the same length, a1, a2, ..., ak and b1, b2, ..., bk, the first one is smaller than the second one for the lexicographical order, if ai < bi (for the order of A), for the first i where ai and bi differ."

>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["z", "y", "x", "w"]]).is_lexsorted()
True

As I understand it, in the above example the first level is strictly sorted because a < b < c < d, so lexical sorting never needs to look at the next levels.
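
The quoted definition can be written out directly. This is a minimal sketch under the assumption that both sequences have the same length and hold comparable elements; `first_difference` and `lex_less` are illustrative names, not pandas functions.

```python
def first_difference(a, b):
    """Return the first position where a and b differ, or None if they are equal."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


def lex_less(a, b):
    """True if a precedes b in lexicographic order, per the quoted definition."""
    i = first_difference(a, b)
    # Equal sequences are not strictly smaller than each other.
    return i is not None and a[i] < b[i]


# First two rows of the three-level example: decided at position 0 ('a' < 'b'),
# so the third elements 'z' and 'y' are never compared.
row0, row1 = ("a", "e", "z"), ("b", "f", "y")
print(first_difference(row0, row1), lex_less(row0, row1))  # 0 True

# With ties in the first two positions, the third position decides.
tied0, tied1 = ("a", "e", "y"), ("a", "e", "x")
print(first_difference(tied0, tied1), lex_less(tied0, tied1))  # 2 False
```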

Contributor

Yep, this is correct.

But I think this discussion highlights that a sentence describing why this index is lex-sorted would be helpful.

Contributor Author

raisadz commented Mar 3, 2020

Sure, happy to add more clarification to this function. I added this example with some explanation, but made it shorter so that it is easier to follow.

>>> pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['d', 'f', 'e']]).is_lexsorted()
True

In the above examples, the first level of MultiIndex is sorted because a<b<c,
Contributor

Can you put the explanation that you have on 1636-137 before the above example?

Member

@WillAyd WillAyd left a comment

lgtm - @jreback

@WillAyd WillAyd added this to the 1.1 milestone Apr 2, 2020
@WillAyd WillAyd merged commit 3209b69 into pandas-dev:master Apr 2, 2020
Member

WillAyd commented Apr 2, 2020

Thanks @raisadz
