ENH: Be able to add lines for all index levels, not just visible ones [fix #59877] #59916
…as-dev#59877] - implement a new suffix for the `clines` option, `-invisible`, doubling the number of available options, specifying that hidden indices should be included when deciding whether to add \clines - add tests for this behaviour
This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this. |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
@mroeschke I merged in the main branch two weeks ago at the request of the bot, but forgot to add a comment saying so. If I'm going to keep regularly merging from main and fixing tests, is there any way to know if or when this PR might be merged? |
Sorry for the lack of response. We don't have many core team members familiar with this part of the codebase, only really @attack68. To ensure that this PR is in a mergeable state, please ensure all the CI checks are passing (unless there's a check failing in other PRs too) |
I quite like this PR; Matt's comments look like valid amendments, though. I have three points that require a response.
Your "invisible" does not appear to make any distinction between the different cases here. What is hidden will have a
Ultimately I think this PR is an overall feature addition so I am happy to support it, but I think it is worth documenting its shortcomings, particularly if someone wants to follow it up later. |
Many thanks for the feedback both.
Yes; as I mentioned upthread, I chose this route to avoid introducing a change that would break existing code. I guess a compatible alternative would be to have a second boolean kwarg Were I designing it from clean I'd be inclined to split these options up, either to take a tuple of booleans like
I was hung up here on the fact that ideally I'd have wanted to use
This is where I demonstrate my ignorance of Pandas indexing; I was focusing on my own use case and wasn't aware of this subtlety. Would you please be able to send/link to an example code block to construct a DataFrame for each of the three cases so I can test? Thanks again |
OK, good points. For 1) and 2) This is potentially going into v3.0.0 which means a little more flexibility in breaking things.
For your PR I would then suggest Requires careful documentation. |
For 3) here are some examples for your consideration:
from pandas import DataFrame, MultiIndex
# A three-level index with 27 rows
m = MultiIndex.from_product([["l00", "l01", "l02"], ["l10", "l11", "l12"], ["l20", "l21", "l22"]])
d = DataFrame([[_] for _ in range(27)], index=m)
# Hiding specific rows
d.style.hide(d.index[3:22], axis=0)
# Hiding specific rows and hiding a level
d.style.hide(d.index[3:22], axis=0).hide(level=[1], axis=0)
I would be inclined to believe that if a user has hidden specific rows they would not expect a |
Thanks for the feedback
Makes sense. Would it be a good idea to define a namedtuple so that calling code can be made more readable than passing three anonymous bools? Something like
which would then allow calling as e.g.
OK, I'll hold off implementing this pending more feedback, and look at point 3. |
Thanks for the examples.
That's right. The block of hidden rows in the example you sent spans a change of all index levels, so I'd always expect a line to be drawn there, and indeed that's what I see in testing. Modifying to only hide row 1, the first example shows no difference between the options with and without I'm not sure that "draw clines where a specific row has been hidden regardless of other cline-drawing options" is relevant to this discussion, although possibly a feature some would find useful at some point. So would you agree that the action is to (once the final interface is agreed on) clarify in the documentation that the |
NamedTuples are nice for internal code structuring, but in my experience they are horrible for UX. A user would have to import the ClinesOption and then use it, rather than just reading the docs and substituting with (True, False, True). I don't think there are any examples of pandas args requiring NamedTuples but I could be wrong. |
I think you've captured the required cases. Because this is a visual issue, I think it will be worth rendering the test cases and going through them. I can try to do this in the thread for those already constructed, and see if it is worth adding more. |
Thanks—in principle a tuple can always be passed in place of a namedtuple (and the former could be coerced to the latter internally to the function, so you still get the benefits of named arguments), but if that's the intended way of calling then I agree there's little point in having it, and I agree that needing to import the specific class is a little clumsy. (It might be more idiomatic in Matplotlib, since that library makes heavier use of composition.) Potentially one could add a staticmethod to Styler that generates it, but that seems overkill for a relatively niche option. |
Agreed, I think overkill. According to #59125 we need a DeprecationCycle. I think the best thing to do is test for str input and raise the warning, otherwise just code it to accept the new Tuple format. |
I do not think changing this argument to a tuple is a good idea. I find
to be hard to read code, whereas something like
is much better.
I don't think this holds for pandas - namely I would be expecting virtually all users to be importing pandas globally, so it would just be |
If you turn it on a user has to currently specify;
This PR proposes adding a third option to that list which is:
So that's potentially 4 keyword arguments for styler.to_latex(
clines=True,
clines_include_last_level=True,
clines_span_all_data=False,
clines_include_hidden_levels=True
)
I agree that this is clearer in the keyword arguments as specified here, but once all this is added to the documentation (which has a lot of features anyway) I think it obfuscates the whole. Additionally That is the reason I prefer:
styler.to_latex(
clines=(True, False, True) # or None
)
with dedicated documentation. Ultimately, though, I would rather support a PR that gets these features in and is likely to be more generally approved. |
Can we name another common API that accepts tuples of Boolean arguments like this? flake8-boolean-trap was created to lint the anti-pattern of specifying Boolean argument by position. This API wouldn't give users the option to not specify 3 Boolean arguments by position. |
Not without looking for it, but the problem with the catch-all argument "no one else has done it" is that it assumes no new approach should ever be considered. I would prefer to evaluate the merits of a solution in its own context. (Styler.to_latex only accepts 1 positional argument; all others are keyword.) But what about the following as a replacement: the current solution uses string identifiers, e.g. "skip-last;data". Now that more options are in scope, what about amending that to a dictionary (with some defaults consistent with the current version)? This is also easily extended going forward, without needing changes to the top-level method arguments.
styler.to_latex(
clines={
"include_last_level": True,
"span_all_data": True,
"include_hidden_levels": False
}
) |
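To make the dictionary suggestion concrete, here is a minimal sketch of how such an argument could be normalised. It is not the pandas implementation: `normalize_clines` and `_CLINES_DEFAULTS` are hypothetical names, and the default values are an assumption chosen to mirror the existing "all;data" string.

```python
from typing import Dict, Mapping, Optional

# Hypothetical defaults (an assumption): equivalent to the legacy "all;data".
_CLINES_DEFAULTS: Dict[str, bool] = {
    "include_last_level": True,
    "span_all_data": True,
    "include_hidden_levels": False,
}


def normalize_clines(clines: Optional[Mapping[str, bool]]) -> Optional[Dict[str, bool]]:
    """Merge user-supplied options over the defaults; None disables clines."""
    if clines is None:
        return None
    unknown = set(clines) - set(_CLINES_DEFAULTS)
    if unknown:
        raise ValueError(f"unknown clines option(s): {sorted(unknown)}")
    for key, value in clines.items():
        if not isinstance(value, bool):
            raise TypeError(f"clines[{key!r}] must be a bool")
    # Later keys win, so user values override the defaults.
    return {**_CLINES_DEFAULTS, **clines}


# Only the option that differs from the defaults needs to be spelled out.
options = normalize_clines({"include_hidden_levels": True})
```

A new option would then be one more key in the defaults dictionary, with no change to the method signature.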
FWIW I was thinking along the same lines as the dict suggestion; I find it significantly more readable than the boolean tuple. |
Just for additional consideration, a tuple recognising specific strings(?). I see this pattern sometimes (e.g. scipy boundary condition specifications). Reads fairly well on one line.
|
@mroeschke @rhshadrach any preference on steering the development here: outlined choices for the expansion of the
My personal order of preference is |
I suppose 3 is fine with me too. If these latex API arguments were reworked, I would be +1 for them to be namedtuples/dataclasses such that the object name is the "argument theme" with the parameters being the various options that can be set within that "theme" since I find that more explicit |
- deprecate passing the single string parameter
- convert the single string parameter to the new tuple format in the interim
Thanks for the steer; I've pushed changes that implement option 3. |
You have designed this as a tuple input where default settings are applied with an unordered set of override inputs. E.g. to just turn on clines with default settings we have
clines : tuple, optional
If *None*, will not produce any *clines*.
If a tuple, will produce *clines* defined by the parameter options in the tuple:
- Which index levels affect the result: {"all-levels", "skip-last"},
- Where to extend the horizontal lines across the table: {"rule-data", "rule-index"},
- Whether to include or exclude hidden index levels: {"exclude-hidden", "include-hidden"}.
An example, with the usual default values, is ("all-levels", "rule-data", "exclude-hidden"). This provides 8 currently possible settings as well as the additional None. If options were added in the future, e.g. "rule-2-cols", it would be easier to add in because the option is placed in tuple.1 and can be easily parsed. This is also not a major departure from the current config: What do we think? |
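A rough sketch of the ordered-tuple proposal, to show why a fixed position per option is easy to parse. The option strings come from the docstring above; `CLINES_POSITIONS` and `validate_clines` are hypothetical names, and none of this is taken from the PR's code.

```python
from itertools import product
from typing import Optional, Tuple

# One entry per tuple position, listing the strings allowed there.
CLINES_POSITIONS = (
    ("all-levels", "skip-last"),
    ("rule-data", "rule-index"),
    ("exclude-hidden", "include-hidden"),
)


def validate_clines(clines: Optional[Tuple[str, ...]]) -> Optional[Tuple[str, ...]]:
    """Check that each position holds one of its allowed strings."""
    if clines is None:
        return None
    if not isinstance(clines, tuple) or len(clines) != len(CLINES_POSITIONS):
        raise ValueError(f"clines must be None or a {len(CLINES_POSITIONS)}-tuple")
    for position, (value, allowed) in enumerate(zip(clines, CLINES_POSITIONS)):
        if value not in allowed:
            raise ValueError(f"clines[{position}] must be one of {allowed}, got {value!r}")
    return clines


# Two choices in each of three positions give the 8 settings mentioned above.
ALL_SETTINGS = list(product(*CLINES_POSITIONS))
```

Adding a future value such as "rule-2-cols" would only mean extending the tuple of allowed strings at position 1.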
Also, as a side note, I'd be interested in understanding what happens if you have a 3 level index and hide the last index level: Does clines then interpret the last-level as level 1 in [0,1] because 2 is hidden or is last level always 2 even though it is hidden. This has relevance for the combinations "skip-last" and "include-hidden" or "exclude-hidden". I think it can be interpreted either way so not suggesting a re-implementation, just an understanding of the status quo. |
Ah, I see; apologies for misunderstanding your intention. My thoughts, in no particular order:
|
We have the existing framework of either None or a two-parameter string, e.g. "all;data". If we work to upset that the least then: Converting to an ordered tuple is most similar to a NamedTuple without the names, so it provides structure that might be useful later, and is also easy to parse in the code (possibly reducing the maintenance burden). I prefer having the explicitly named features rather than reverting to any system defaults because it is not particularly obvious what those defaults should be (e.g. why choose only across index or across data? It's purely a subjective choice for visualisation). Thus forcing a user to specify means they must naturally become aware of the options. You do raise a good point about the possibility of adding a 4th option. I had thought of this but was hoping to defer it. However, here's a middle ground.
clines: tuple, optional
If *None*, no clines are created.
If a tuple, will create clines. The tuple must provide at least the first two options, but can extend to three. Options are:
- Which index levels affect the result: {"all-levels", "skip-last"},
- Where to extend the horizontal lines across the table: {"rule-data", "rule-index"},
- Whether to include or exclude hidden index levels: {"exclude-hidden", "include-hidden"}.
As an example ("all-levels", "rule-data"). The third option will default to "exclude-hidden". This also gives the easiest/cleanest way of translating the existing string form to tuple form. |
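The translation from the string form to the tuple form, together with the deprecation discussed earlier, might look something like the sketch below. `coerce_clines` and the two lookup tables are hypothetical, and the use of `DeprecationWarning` is an assumption; the warning class pandas would actually use is not stated in this thread.

```python
import warnings
from typing import Optional, Tuple, Union

# Hypothetical mapping from legacy string tokens to the proposed tuple tokens.
_LEGACY_LEVELS = {"all": "all-levels", "skip-last": "skip-last"}
_LEGACY_EXTENT = {"data": "rule-data", "index": "rule-index"}
_DEFAULT_HIDDEN = "exclude-hidden"


def coerce_clines(clines: Union[None, str, Tuple[str, ...]]) -> Optional[Tuple[str, ...]]:
    """Return None or a full three-element tuple, whatever form was passed."""
    if clines is None:
        return None
    if isinstance(clines, str):
        parts = clines.split(";")
        if len(parts) != 2 or parts[0] not in _LEGACY_LEVELS or parts[1] not in _LEGACY_EXTENT:
            raise ValueError(f"invalid legacy clines string: {clines!r}")
        warnings.warn(
            "passing clines as a string is deprecated; pass a tuple instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return (_LEGACY_LEVELS[parts[0]], _LEGACY_EXTENT[parts[1]], _DEFAULT_HIDDEN)
    if isinstance(clines, tuple) and len(clines) in (2, 3):
        # A two-element tuple gets the default for the hidden-level option.
        return clines if len(clines) == 3 else clines + (_DEFAULT_HIDDEN,)
    raise ValueError("clines must be None, a legacy string, or a tuple of 2 or 3 options")
```

Under these assumptions, `coerce_clines("skip-last;index")` and `coerce_clines(("skip-last", "rule-index"))` both give `("skip-last", "rule-index", "exclude-hidden")`, with only the string form emitting a warning.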
We are starting to get into territory where it is difficult for anyone to legitimately keep track of the possible combinations. Developers or users. Since it is always possible to reindex a DataFrame (drop levels, etc.) and then create a Styler from that reindexed DataFrame one can always achieve precisely what one wants with the following suggestion of keeping these options as high-level as possible: I think "skip-last" should refer only to the explicit last level of the index (whether it is hidden or not), as in the current implementation, and not the last visible level. |
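To illustrate the reading of "skip-last" described above, here is a toy model in plain Python. It is a deliberate simplification and not how `Styler` computes clines: the hypothetical `cline_rows` only reports the row positions after which some considered level changes, and it ignores which columns the line would span.

```python
from itertools import product
from typing import Iterable, List, Sequence, Tuple


def cline_rows(
    index: Sequence[Tuple[str, ...]],
    levels: str,
    hidden: str,
    hidden_levels: Iterable[int] = (),
) -> List[int]:
    """Rows after which a line is drawn, in this simplified model."""
    if not index:
        return []
    n_levels = len(index[0])
    considered = set(range(n_levels))
    if levels == "skip-last":
        # The explicit last level is skipped, whether or not it is hidden.
        considered.discard(n_levels - 1)
    if hidden == "exclude-hidden":
        considered -= set(hidden_levels)
    return [
        row
        for row in range(len(index) - 1)
        if any(index[row][lvl] != index[row + 1][lvl] for lvl in considered)
    ]


rows = list(product(["a", "b"], ["x", "y"], ["p", "q"]))  # 3 levels, 8 rows
print(cline_rows(rows, "skip-last", "exclude-hidden", hidden_levels=[2]))  # [1, 3, 5]
print(cline_rows(rows, "skip-last", "exclude-hidden", hidden_levels=[1]))  # [3]
```

With the last level hidden, the first call still draws lines where level 1 changes. Under the alternative reading, where "skip-last" means the last visible level, it would instead return `[3]`.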
+1 |
Thanks for the pull request, but it appears to have gone stale. If interested in continuing, please merge in the main branch, address any review comments and/or failing tests, and we can reopen. |
Per discussion in #59877, add an `-invisible` suffix that can be added to the index selection in the `clines` option to `Styler.to_latex()`, allowing hidden indices to be used when deciding where to place lines between rows.
This approach does not break compatibility with existing code, at the expense of starting a potential combinatoric explosion of options being combined into this one string. There may be a more elegant and/or expressive way of achieving the same result, at the cost of breaking compatibility (or needing to maintain two interfaces until the previous one is deprecated).