DOC: Add examples to the method MultiIndex.is_lexsorted() #32312
True
>>> pd.MultiIndex.from_arrays([[0, 1, 1], ['a', 'c', 'b']]).is_lexsorted()
False
>>> pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'],
Hmm, not sure about this one in particular, so we collectively should clarify.
I don't know that the is_lexsorted implementation is really well defined for a MultiIndex. For instance, I think it only (as a bug) ever looks at the first two levels of a MultiIndex, failing conditions like this:
>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["z", "y", "x", "w"]]).is_lexsorted()
True
>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["w", "x", "y", "z"]]).is_lexsorted()
True
@toobaz any thoughts on this?
For instance, I think it only (as a bug) ever looks at the first two levels of a MultiIndex, failing conditions like this:
@WillAyd isn't it correct there, though? The first level (Index(['a', 'b', 'c', 'd'], dtype='object')) is sorted, and so the whole MultiIndex is also lexically sorted.
From what I understand about lexical sorting, the third level only needs to be looked at if there's a tie in the first two, e.g.
>>> pd.MultiIndex.from_arrays([['a', 'a', 'c', 'd'],['e', 'e', 'g', 'h'],["x", "y", "y", "z"]]).is_lexsorted()
True
>>> pd.MultiIndex.from_arrays([['a', 'a', 'c', 'd'],['e', 'e', 'g', 'h'],["y", "x", "y", "z"]]).is_lexsorted()
False
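To make the tie-breaking rule concrete, here is a minimal sketch in plain Python. It is not pandas' implementation, and `rows_are_lexsorted` is a hypothetical helper name. It relies only on the fact that Python tuples compare lexicographically, so a later level matters only when all earlier levels tie.

```python
def rows_are_lexsorted(arrays):
    """Return True if the rows formed by zipping `arrays` are in
    non-decreasing lexicographic order (hypothetical helper, not pandas)."""
    rows = list(zip(*arrays))
    # Tuple comparison is lexicographic: the third element is consulted
    # only when the first two elements are equal.
    return all(prev <= curr for prev, curr in zip(rows, rows[1:]))


# Same data as the two examples above: rows 0 and 1 tie on the first two levels.
tie_sorted = rows_are_lexsorted(
    [['a', 'a', 'c', 'd'], ['e', 'e', 'g', 'h'], ['x', 'y', 'y', 'z']]
)
tie_unsorted = rows_are_lexsorted(
    [['a', 'a', 'c', 'd'], ['e', 'e', 'g', 'h'], ['y', 'x', 'y', 'z']]
)
print(tie_sorted, tie_unsorted)  # True False
```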
I don't think the first level alone should be the decider here - maybe @TomAugspurger or @jreback know more of the history / intent
The Wikipedia page https://en.wikipedia.org/wiki/Lexicographical_order says: "Given two different sequences of the same length, a1, a2,...,ak and b1,b2,...,bk, the first one is smaller than the second one for the lexicographical order, if ai < bi (for the order of A), for the first i where ai and bi differ."
>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["z", "y", "x", "w"]]).is_lexsorted()
True
In the above example, as I understand it, the first level is already strictly sorted (a<b<c<d), so lexical sorting never needs to look at the next level.
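The quoted definition can be sketched directly in plain Python. This is only an illustration of the definition, not how pandas checks it, and `lex_less` is a hypothetical name introduced here. The comparison is decided at the first position where the two sequences differ, and later positions are never inspected.

```python
def lex_less(a, b):
    """True if sequence `a` is lexicographically smaller than `b`
    (equal lengths assumed, as in the quoted definition)."""
    for x, y in zip(a, b):
        if x != y:
            # The first differing position decides the comparison.
            return x < y
    return False  # identical sequences: neither is smaller


# Rows of the three-level example above: (a, e, z), (b, f, y), (c, g, x), (d, h, w).
rows = list(zip(['a', 'b', 'c', 'd'], ['e', 'f', 'g', 'h'], ['z', 'y', 'x', 'w']))
strictly_increasing = all(lex_less(p, c) for p, c in zip(rows, rows[1:]))
print(strictly_increasing)  # True: the first level alone decides every comparison
```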
Yep, this is correct.
But I think this discussion highlights that a sentence describing why this index is lex-sorted would be helpful.
Sure, happy to add more clarification to this function. I added this example with some explanation, though I made it shorter so it would be easier to follow.
pandas/core/indexes/multi.py
Outdated
>>> pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['d', 'f', 'e']]).is_lexsorted()
True

In the above examples, the first level of MultiIndex is sorted because a<b<c,
Can you put the explanation that you have on 1636-137 before the above example?
lgtm - @jreback
Thanks @raisadz
black pandas
git diff upstream/master -u -- "*.py" | flake8 --diff