DOC: Add examples to the method MultiIndex.is_lexsorted() #32312

Merged
merged 3 commits into from
Apr 2, 2020

24 changes: 24 additions & 0 deletions pandas/core/indexes/multi.py
@@ -1625,6 +1625,30 @@ def is_lexsorted(self) -> bool:
Returns
-------
bool

Examples
--------
In the below examples, the first level of the MultiIndex is sorted because
a<b<c, so there is no need to look at the next level.

>>> pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['d', 'e', 'f']]).is_lexsorted()
True
>>> pd.MultiIndex.from_arrays([['a', 'b', 'c'], ['d', 'f', 'e']]).is_lexsorted()
True

In case there is a tie, the lexicographical sorting looks
at the next level of the MultiIndex.

>>> pd.MultiIndex.from_arrays([[0, 1, 1], ['a', 'b', 'c']]).is_lexsorted()
True
>>> pd.MultiIndex.from_arrays([[0, 1, 1], ['a', 'c', 'b']]).is_lexsorted()
False
>>> pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'],
Member

Hmm, I'm not sure about this one in particular, so we should collectively clarify it.

I don't know that the is_lexsorted implementation is really well defined for a MultiIndex. For instance, I think it only (as a bug) ever looks at the first two levels of a MultiIndex, failing conditions like this:

>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["z", "y", "x", "w"]]).is_lexsorted()
True
>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["w", "x", "y", "z"]]).is_lexsorted()
True

@toobaz any thoughts on this?

Member

> For instance, I think it only (as a bug) ever looks at the first two levels of a MultiIndex, failing conditions like this:

@WillAyd isn't it correct there, though? The first level (Index(['a', 'b', 'c', 'd'], dtype='object')) is sorted, and so the whole MultiIndex is also lexically sorted.

From what I understand about lexical sorting, the third level only needs to be looked at if there's a tie in the first two, e.g.

>>> pd.MultiIndex.from_arrays([['a', 'a', 'c', 'd'],['e', 'e', 'g', 'h'],["x", "y", "y", "z"]]).is_lexsorted()
True

>>> pd.MultiIndex.from_arrays([['a', 'a', 'c', 'd'],['e', 'e', 'g', 'h'],["y", "x", "y", "z"]]).is_lexsorted()
False
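
The tie-breaking rule described in this comment can be sketched in plain Python, without pandas. This is only an illustration, not the pandas implementation: the helper name `rows_are_lexsorted` is hypothetical, and it compares raw values, whereas pandas works on the integer codes of each level (the two should agree when the levels themselves are sorted, as they are for `from_arrays` in these examples).

```python
from typing import Sequence


def rows_are_lexsorted(*levels: Sequence) -> bool:
    """Return True if the rows formed by zipping `levels` are non-decreasing.

    Hypothetical helper for illustration only; not part of pandas.
    """
    rows = list(zip(*levels))
    # Python compares tuples lexicographically: a later position only
    # matters when all earlier positions are equal.
    return all(prev <= cur for prev, cur in zip(rows, rows[1:]))


# First level strictly increasing: the later levels are never the decider.
first_level_decides = rows_are_lexsorted(
    ["a", "b", "c", "d"], ["e", "f", "g", "h"], ["z", "y", "x", "w"]
)

# The first two rows tie on levels one and two, so the third level decides.
tie_in_order = rows_are_lexsorted(
    ["a", "a", "c", "d"], ["e", "e", "g", "h"], ["x", "y", "y", "z"]
)
tie_out_of_order = rows_are_lexsorted(
    ["a", "a", "c", "d"], ["e", "e", "g", "h"], ["y", "x", "y", "z"]
)
```

Under these assumptions the three results mirror the `True`, `True`, `False` outputs quoted in the thread.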

Member

I don't think the first level alone should be the decider here - maybe @TomAugspurger or @jreback know more of the history / intent

Contributor Author

The Wikipedia page https://en.wikipedia.org/wiki/Lexicographical_order says: "Given two different sequences of the same length, a1, a2, ..., ak and b1, b2, ..., bk, the first one is smaller than the second one for the lexicographical order, if ai < bi (for the order of A), for the first i where ai and bi differ."

>>> pd.MultiIndex.from_arrays([['a', 'b', 'c', 'd'],['e', 'f', 'g', 'h'],["z", "y", "x", "w"]]).is_lexsorted()
True

In the above example, as I understand it, the first level is sorted because a<b<c<d, so lexical sorting does not need to look at the next level.
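
The quoted definition can be written out directly as a pairwise comparison. The function name `lex_less` is hypothetical and the code is a sketch of the definition only, not of anything in pandas. It compares two equal-length rows at the first position where they differ.

```python
def lex_less(a, b):
    """Return True if row `a` precedes row `b` in lexicographic order.

    Hypothetical helper that follows the quoted definition: the result is
    decided at the first index i where a[i] and b[i] differ.
    """
    if len(a) != len(b):
        raise ValueError("rows must have the same length")
    for x, y in zip(a, b):
        if x != y:
            return x < y
    return False  # identical rows: neither precedes the other


# First two rows of the three-level example: 'a' < 'b' settles the
# comparison, so the 'z' vs 'y' in the last level is never reached.
row0 = ("a", "e", "z")
row1 = ("b", "f", "y")
decided_by_first_level = lex_less(row0, row1)

# Python's built-in tuple comparison applies the same rule.
matches_builtin = lex_less(row0, row1) == (row0 < row1)
```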

Contributor

Yep, this is correct.

But I think this discussion highlights that a sentence describing why this index is lex-sorted would be helpful.

... ['aa', 'bb', 'aa', 'bb']]).is_lexsorted()
True
>>> pd.MultiIndex.from_arrays([['a', 'a', 'b', 'b'],
... ['bb', 'aa', 'aa', 'bb']]).is_lexsorted()
False
"""
return self.lexsort_depth == self.nlevels
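
The return statement ties `is_lexsorted` to `lexsort_depth`: the index counts as lexsorted only when the sort depth covers every level. A minimal pure-Python sketch of that relationship follows. The names `lexsort_depth_sketch` and `fully_lexsorted` are hypothetical, and the depth is assumed here to mean the largest number of leading levels whose rows are in non-decreasing order; this illustrates the idea and is not the pandas source.

```python
def lexsort_depth_sketch(*levels):
    """Largest k such that rows cut to their first k levels are non-decreasing.

    Hypothetical stand-in for the idea behind `lexsort_depth`; it compares
    raw values rather than pandas' integer codes.
    """
    rows = list(zip(*levels))
    for k in range(len(levels), 0, -1):
        prefixes = [row[:k] for row in rows]
        if all(p <= q for p, q in zip(prefixes, prefixes[1:])):
            return k
    return 0


def fully_lexsorted(*levels):
    # Mirrors `self.lexsort_depth == self.nlevels` from the diff.
    return lexsort_depth_sketch(*levels) == len(levels)


# Sorted on both levels: depth equals the number of levels.
depth_sorted = lexsort_depth_sketch(["a", "a", "b", "b"], ["aa", "bb", "aa", "bb"])

# 'bb' comes before 'aa' within the tied 'a' rows: only the first level is sorted.
depth_partial = lexsort_depth_sketch(["a", "a", "b", "b"], ["bb", "aa", "aa", "bb"])
```

With the two docstring examples above, the sketch gives a depth of 2 for the first index and 1 for the second, which matches the `True` and `False` results shown in the diff.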
