Latex from CSS conversion of the Styler.to_latex() method #1

Closed
wants to merge 18 commits into from
Binary file added doc/source/_static/style/latex_3.png
215 changes: 214 additions & 1 deletion doc/source/user_guide/style.ipynb
@@ -1755,7 +1755,7 @@
" Styler.loader, # the default\n",
" ])\n",
" )\n",
" template_html = env.get_template(\"myhtml.tpl\")"
" template = env.get_template(\"myhtml.tpl\")"
]
},
{
@@ -1837,6 +1837,212 @@
"See the template in the [GitHub repo](https://github.com/pandas-dev/pandas) for more details."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## LaTeX\n",
"\n",
"Since version XXX, Styler has an alternative jinja2 template and additional parsing functions that allow it to produce conditionally styled LaTeX tables.\n",
"\n",
"The `(<attribute>, <value>)` pairs of the HTML-CSS representation above are replaced by LaTeX `(<command>, <options>)` pairs, which are rendered directly around each cell's display value as nested commands."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df5 = pd.DataFrame([[1, 2.2, \"dogs\"], [3, 4.4, \"cats\"], [2, 6.6, \"cows\"]], \n",
" index=[\"ix1\", \"ix2\", \"ix3\"], \n",
" columns=[\"Integers\", \"Floats\", \"Strings\"])\n",
"df5"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For example, we can style the above DataFrame with the usual functions, changing only the ``props`` argument to use the LaTeX format. In this example the coloring commands take options, whereas the text-modifying commands need none."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s5 = df5.style.highlight_max(props='cellcolor:[HTML]{FFFF00}; color:{red}; itshape:; bfseries:;')\n",
"print(s5.to_latex())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![LaTeX Styler example 1](../_static/style/latex_1.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"When specifying the ``options``, one of 5 flags can be appended to control how braces are placed in the LaTeX output (the flag itself is stripped before rendering):\n",
"\n",
" - ``--nowrap``, the default, produces: ``\\<command><options> <display_value>``\n",
" - ``--wrap`` encloses the whole block in braces: ``{\\<command><options> <display_value>}``\n",
" - ``--lwrap`` encloses the left block in braces: ``{\\<command><options>} <display_value>``\n",
" - ``--rwrap`` encloses the right block in braces: ``\\<command><options>{<display_value>}``\n",
" - ``--dwrap`` encloses both blocks in braces: ``{\\<command><options>}{<display_value>}``\n",
" \n",
"For example, ``\\textbf`` and ``\\bfseries`` (and likewise ``\\textit`` and ``\\itshape``) are alternative commands that need different brace structures: ``\\textbf`` takes its argument in braces and so needs ``--rwrap``, whereas ``\\bfseries`` is a declaration that works with the default ``--nowrap``. We can replicate the above LaTeX render with these alternatives as follows:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s5 = df5.style.highlight_max(props='cellcolor:[HTML]{FFFF00}; color:{red}; textit:--rwrap; textbf:--rwrap;')\n",
"\n",
"print(s5.to_latex())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The ``to_latex`` method provides a number of options for the LaTeX output such as:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df5.columns = pd.MultiIndex.from_tuples([(\"Numeric\", \"Integers\"), (\"Numeric\", \"Floats\"), (\"Non-Numeric\", \"Strings\")])\n",
"df5.index = pd.MultiIndex.from_tuples([(\"L0\", \"ix1\"), (\"L0\", \"ix2\"), (\"L1\", \"ix3\")])\n",
"s5 = df5.style.highlight_max(props='cellcolor:[HTML]{FFFF00}; color:{red}; itshape:; bfseries:;')\n",
"\n",
"print(s5.to_latex(column_format=\"rrrrr\", position=\"h\", position_float=\"centering\", hrules=True, \n",
" label=\"table:5\", caption=\"Styled LaTeX Table\", multirow_align=\"t\", multicol_align=\"r\"))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![LaTeX Styler example 2](../_static/style/latex_2.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike ``DataFrame.to_latex()``, the Styler version of ``to_latex()`` has no options for formatting values. Instead, the usual ``.format()`` method should be used to create the display values before rendering. There are three reasons for this design:\n",
"\n",
" - First, using ``.format()`` in a Jupyter Notebook lets you preview the formatted values before rendering.\n",
" - Second, ``.format()`` is very flexible, and chaining it multiple times gives a broader range of formatting possibilities than could be replicated through arguments to ``to_latex()``, which would call ``.format()`` only once.\n",
" - Third, ``.format()`` has many options, and updating ``to_latex()`` whenever ``.format()`` is updated or improved would create a development dependency that is best avoided."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"s5.clear()\n",
"s5.table_styles = []\n",
"s5.caption = None\n",
"s5.format({\n",
" (\"Numeric\", \"Integers\"): '\\${}',\n",
" (\"Numeric\", \"Floats\"): '{:.3f}',\n",
" (\"Non-Numeric\", \"Strings\"): str.upper\n",
"})\n",
"print(s5.to_latex())"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### CSS Conversion to LaTeX and siunitx"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If a Styler is designed with the usual CSS attributes, the ``.to_latex()`` method can automatically convert the following attributes to LaTeX commands when ``convert_css=True``:\n",
"\n",
"| CSS Attribute | CSS Value | LaTeX Command | LaTeX Options |\n",
"|---|---|---|---|\n",
"| font-weight | bold | bfseries | |\n",
"| font-weight | bolder | bfseries | |\n",
"| font-style | italic | itshape | |\n",
"| font-style | oblique | slshape | |\n",
"| background-color | #fe01ea | cellcolor | \\[HTML\\]{FE01EA}--lwrap |\n",
"| background-color | #f0e | cellcolor | \\[HTML\\]{FF00EE}--lwrap |\n",
"| background-color | rgb(128, 255, 0) | cellcolor | \\[rgb\\]{0.502, 1.000, 0.000}--lwrap |\n",
"| background-color | rgba(128, 255, 0, 0.5) | cellcolor | \\[rgb\\]{0.502, 1.000, 0.000}--lwrap |\n",
"| background-color | rgb(25%, 255, 50%) | cellcolor | \\[rgb\\]{0.250, 1.000, 0.500}--lwrap |\n",
"| background-color | red | cellcolor | {red}--lwrap |\n",
"| color | #fe01ea | color | \\[HTML\\]{FE01EA} |\n",
"| color | #f0e | color | \\[HTML\\]{FF00EE} |\n",
"| color | rgb(128, 255, 0) | color | \\[rgb\\]{0.502, 1.000, 0.000} |\n",
"| color | rgba(128, 255, 0, 0.5) | color | \\[rgb\\]{0.502, 1.000, 0.000} |\n",
"| color | rgb(25%, 255, 50%) | color | \\[rgb\\]{0.250, 1.000, 0.500} |\n",
"| color | red | color | {red} |\n",
"\n",
"The converted commands are compatible with the default settings and also with the `{siunitx}` package when the ``siunitx`` argument is set to ``True``.\n",
"\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"df6 = pd.DataFrame([[1, 2], [10, 20], [100, 200], [1000, 2000]], columns=[\"Heading A\", \"Heading B\"], dtype=float)\n",
"s6 = df6.style.hide_index()\n",
"styles = pd.DataFrame([[None, None], \n",
" ['background-color: yellow; font-weight: bold;', 'font-weight: bolder; font-style: italic;'],\n",
" [None, None],\n",
" ['color: red;', 'background-color: yellow; color: green; font-weight: bold;']],\n",
" columns=df6.columns)\n",
"s6.format(precision=2).format(precision=1, subset=idx[1, :]).format(precision=3, subset=idx[2,:])\n",
"s6.apply(lambda df: styles, axis=None)\n",
"s6"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(s6.to_latex(siunitx=True, convert_css=True, hrules=True, position_float='centering'))"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Provided the required LaTeX packages, including `{siunitx}`, are loaded in the document, the above renders as:"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![LaTeX Styler example 3](../_static/style/latex_3.png)"
]
},
{
"cell_type": "code",
"execution_count": null,
@@ -1853,6 +2059,13 @@
" \n",
"# HTML('<style>{}</style>'.format(css))"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
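The five brace flags described in the notebook text above can be illustrated without pandas. The following is only a minimal sketch of the documented layouts, not the Styler implementation; the name `apply_wrap` is hypothetical and chosen for illustration.

```python
def apply_wrap(command: str, options: str, display_value: str) -> str:
    """Place one LaTeX command around a cell value according to its wrap flag.

    Hypothetical helper: it only mirrors the five layouts listed in the
    notebook text and is not part of pandas.
    """
    layouts = {
        "--wrap": "{{\\{c}{o} {v}}}",      # {\<command><options> <display_value>}
        "--lwrap": "{{\\{c}{o}}} {v}",     # {\<command><options>} <display_value>
        "--rwrap": "\\{c}{o}{{{v}}}",      # \<command><options>{<display_value>}
        "--dwrap": "{{\\{c}{o}}}{{{v}}}",  # {\<command><options>}{<display_value>}
        "--nowrap": "\\{c}{o} {v}",        # \<command><options> <display_value>
    }
    for flag, layout in layouts.items():
        if flag in options:
            # The flag only steers the layout; it is removed from the options.
            stripped = options.replace(flag, "").strip()
            return layout.format(c=command, o=stripped, v=display_value)
    # No flag given: fall back to the default --nowrap layout.
    return layouts["--nowrap"].format(c=command, o=options.strip(), v=display_value)


# Nest several commands around one value, applying the last style first.
cell = "6.6"
for command, options in reversed([("color", "{red}"), ("textbf", "--rwrap")]):
    cell = apply_wrap(command, options, cell)
print(cell)
```

Applying the styles in reverse order means the first style in the list ends up outermost, which matches the "in reverse for most recent style" comment in `_parse_latex_cell_styles`.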
19 changes: 17 additions & 2 deletions pandas/io/formats/style.py
@@ -396,6 +396,7 @@ def to_latex(
multicol_align: str = "r",
siunitx: bool = False,
encoding: str | None = None,
convert_css: bool = False,
):
r"""
Write Styler to a file, buffer or string in LaTeX format.
@@ -445,6 +446,12 @@ def to_latex(
Set to ``True`` to structure LaTeX compatible with the {siunitx} package.
encoding : str, default "utf-8"
Character encoding setting.
convert_css : bool, default False
Convert simple cell-styles from CSS to LaTeX format. Any CSS not found in the
conversion table is dropped. A style can be passed through unconverted by
adding the option `--latex`.

.. versionadded:: TODO

Returns
-------
@@ -604,6 +611,10 @@ def to_latex(
& ix2 & \\$3 & 4.400 & CATS \\\\
L1 & ix3 & \\$2 & 6.600 & COWS \\\\
\\end{tabular}

**CSS Conversion**

TODO
"""
table_selectors = (
[style["selector"] for style in self.table_styles]
@@ -674,11 +685,15 @@ def to_latex(
if sparsify is not None:
with pd.option_context("display.multi_sparse", sparsify):
latex = self._render_latex(
multirow_align=multirow_align, multicol_align=multicol_align
multirow_align=multirow_align,
multicol_align=multicol_align,
convert_css=convert_css,
)
else:
latex = self._render_latex(
multirow_align=multirow_align, multicol_align=multicol_align
multirow_align=multirow_align,
multicol_align=multicol_align,
convert_css=convert_css,
)

return save_to_buffer(latex, buf=buf, encoding=encoding)
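The `convert_css` docstring above states two rules: CSS with no entry in the conversion table is dropped, and a style tagged `--latex` is kept as given. A minimal stdlib-only sketch of that dispatch follows. The names `convert_styles` and `CONVERSIONS` are hypothetical, only the two font attributes are covered, and treating `--latex` as exclusive of conversion is an assumption of this sketch.

```python
from typing import Callable, Dict, List, Optional, Tuple

Pair = Tuple[str, str]


def _font_weight(value: str) -> Optional[Pair]:
    # "bold" and "bolder" both map to \bfseries; other weights have no mapping.
    return ("bfseries", "") if value in ("bold", "bolder") else None


def _font_style(value: str) -> Optional[Pair]:
    return {"italic": ("itshape", ""), "oblique": ("slshape", "")}.get(value)


# Hypothetical conversion table covering only two CSS attributes.
CONVERSIONS: Dict[str, Callable[[str], Optional[Pair]]] = {
    "font-weight": _font_weight,
    "font-style": _font_style,
}


def convert_styles(styles: List[Pair]) -> List[Pair]:
    """Convert CSS (attribute, value) pairs to LaTeX (command, options) pairs."""
    converted: List[Pair] = []
    for attribute, value in styles:
        value = str(value).strip()
        if "--latex" in value:
            # Already LaTeX: keep the pair, dropping only the tag.
            converted.append((attribute, value.replace("--latex", "").strip()))
        elif attribute in CONVERSIONS:
            pair = CONVERSIONS[attribute](value)
            if pair is not None:
                converted.append(pair)
        # Anything else has no conversion and is dropped.
    return converted


result = convert_styles(
    [
        ("font-weight", "bold"),
        ("font-style", "oblique"),
        ("text-align", "center"),   # not in the table: dropped
        ("itshape", "--latex"),     # tagged: passed through
        ("font-weight", "normal"),  # known attribute, unmapped value: dropped
    ]
)
print(result)
```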
86 changes: 85 additions & 1 deletion pandas/io/formats/style_render.py
@@ -2,6 +2,7 @@

from collections import defaultdict
from functools import partial
import re
from typing import (
Any,
Callable,
@@ -1021,7 +1022,9 @@ def _parse_latex_table_styles(table_styles: CSSStyles, selector: str) -> str | N
return None


def _parse_latex_cell_styles(latex_styles: CSSList, display_value: str) -> str:
def _parse_latex_cell_styles(
latex_styles: CSSList, display_value: str, convert_css: bool = False
) -> str:
r"""
Mutate the ``display_value`` string including LaTeX commands from ``latex_styles``.

@@ -1047,6 +1050,8 @@ def _parse_latex_cell_styles(latex_styles: CSSList, display_value: str) -> str:
For example for styles:
`[('c1', 'o1--wrap'), ('c2', 'o2')]` this returns: `{\c1o1 \c2o2{display_value}}`
"""
if convert_css:
latex_styles = _parse_latex_css_conversion(latex_styles)
for (command, options) in latex_styles[::-1]: # in reverse for most recent style
formatter = {
"--wrap": f"{{\\{command}--to_parse {display_value}}}",
@@ -1117,3 +1122,82 @@ def _parse_latex_options_strip(value: str | int | float, arg: str) -> str:
For example: 'red /* --wrap */ ' --> 'red'
"""
return str(value).replace(arg, "").replace("/*", "").replace("*/", "").strip()


def _parse_latex_css_conversion(styles: CSSList) -> CSSList:
    """
    Convert a list of CSS (attribute, value) pairs to equivalent LaTeX
    (command, options) pairs.

    A pair tagged with the `--latex` option is passed through without
    conversion, with the tag removed.

    A pair with no known conversion is dropped.
    """

    def font_weight(value, arg):
        if value == "bold" or value == "bolder":
            return "bfseries", f"{arg}"
        return None

    def font_style(value, arg):
        if value == "italic":
            return "itshape", f"{arg}"
        elif value == "oblique":
            return "slshape", f"{arg}"
        return None

    def color(value, user_arg, command, comm_arg):
        """
        CSS colors have 5 formats to process:

        - 6 digit hex code: "#ff23ee" --> [HTML]{FF23EE}
        - 3 digit hex code: "#f0e" --> [HTML]{FF00EE}
        - rgba: rgba(128, 255, 0, 0.5) --> [rgb]{0.502, 1.000, 0.000}
        - rgb: rgb(128, 255, 0) --> [rgb]{0.502, 1.000, 0.000}
        - string: red --> {red}

        Additionally rgb or rgba can be expressed in % which is also parsed.
        """
        # a wrap flag given by the user overrides the command's default flag
        arg = user_arg if user_arg != "" else comm_arg

        if value[0] == "#" and len(value) == 7:  # color is hex code
            return command, f"[HTML]{{{value[1:].upper()}}}{arg}"
        elif value[0] == "#" and len(value) == 4:  # color is short hex code
            val = f"{value[1].upper()*2}{value[2].upper()*2}{value[3].upper()*2}"
            return command, f"[HTML]{{{val}}}{arg}"
        elif value[:3] == "rgb":  # color is rgb or rgba
            r = re.search("(?<=\\()[0-9\\s%]+(?=,)", value)[0].strip()
            r = float(r[:-1]) / 100 if "%" in r else int(r) / 255
            g = re.search("(?<=,)[0-9\\s%]+(?=,)", value)[0].strip()
            g = float(g[:-1]) / 100 if "%" in g else int(g) / 255
            if value[3] == "a":  # color is rgba
                b = re.findall("(?<=,)[0-9\\s%]+(?=,)", value)[1].strip()
            else:  # color is rgb
                b = re.search("(?<=,)[0-9\\s%]+(?=\\))", value)[0].strip()
            b = float(b[:-1]) / 100 if "%" in b else int(b) / 255
            return command, f"[rgb]{{{r:.3f}, {g:.3f}, {b:.3f}}}{arg}"
        else:
            return command, f"{{{value}}}{arg}"  # color is likely string-named

    CONVERTED_ATTRIBUTES = {
        "font-weight": font_weight,
        "background-color": partial(color, command="cellcolor", comm_arg="--lwrap"),
        "color": partial(color, command="color", comm_arg=""),
        "font-style": font_style,
    }

    latex_styles = []
    for (attribute, value) in styles:
        if "--latex" in str(value):
            # return the style without conversion but drop '--latex'
            latex_styles.append((attribute, str(value).replace("--latex", "")))
        elif attribute in CONVERTED_ATTRIBUTES:
            # 'elif' so that a '--latex' style is never converted a second time
            arg = ""
            for x in ["--wrap", "--nowrap", "--lwrap", "--dwrap", "--rwrap"]:
                if x in str(value):
                    arg, value = x, _parse_latex_options_strip(value, x)
                    break
            latex_style = CONVERTED_ATTRIBUTES[attribute](value, arg)
            if latex_style is not None:
                latex_styles.append(latex_style)
    return latex_styles
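The color handling above can be checked in isolation against the conversion table in the notebook. The following is a standalone sketch, not the PR's code: it swaps the look-behind regexes for a single split on commas, and the names `css_color_to_latex` and `_channel_to_unit` are hypothetical.

```python
import re


def _channel_to_unit(channel: str) -> float:
    """Scale one CSS channel ("128" or "25%") to the 0-1 range of the rgb model."""
    if channel.endswith("%"):
        return float(channel[:-1]) / 100
    return int(channel) / 255


def css_color_to_latex(value: str) -> str:
    """Return LaTeX color options for a CSS color value (hypothetical helper)."""
    value = value.strip()
    if re.fullmatch(r"#[0-9a-fA-F]{6}", value):  # 6 digit hex code
        return f"[HTML]{{{value[1:].upper()}}}"
    if re.fullmatch(r"#[0-9a-fA-F]{3}", value):  # 3 digit hex code: double each digit
        doubled = "".join(ch * 2 for ch in value[1:].upper())
        return f"[HTML]{{{doubled}}}"
    match = re.fullmatch(r"rgba?\((.*)\)", value)
    if match:
        # Keep the first three channels; an rgba alpha channel is ignored.
        channels = [part.strip() for part in match.group(1).split(",")][:3]
        r, g, b = (_channel_to_unit(c) for c in channels)
        return f"[rgb]{{{r:.3f}, {g:.3f}, {b:.3f}}}"
    return f"{{{value}}}"  # assume a named color such as "red"


for css in ["#fe01ea", "#f0e", "rgb(128, 255, 0)", "rgba(128, 255, 0, 0.5)",
            "rgb(25%, 255, 50%)", "red"]:
    print(css, "-->", css_color_to_latex(css))
```

Splitting on commas keeps the three channels symmetric, so there is no separate regex for the blue channel of `rgb` versus `rgba`. As in the PR code, decimal channel values without `%` are not handled.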