Huge html with df.style.render due to css duplications #20695

wiso · 2018-04-14T12:57:30Z

When creating HTML with df.style, e.g.

df.style.apply(color_f, axis=1).render()

pandas assigns a unique CSS class to each cell of the table. This produces huge HTML output, whereas cells with the same style could share a single class.

commit: None
python: 2.7.14.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.14-300.fc27.x86_64
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: None
LANG: it_IT.UTF-8
LOCALE: None.None

pandas: 0.22.0
pytest: 3.1.3
pip: 9.0.1
setuptools: 36.0.1
Cython: 0.27.3
numpy: 1.14.2
scipy: 1.0.0
pyarrow: None
xarray: None
IPython: 5.4.1
sphinx: None
patsy: 0.4.1
dateutil: 2.7.2
pytz: 2018.4
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.2
openpyxl: None
xlrd: 1.0.0
xlwt: 1.1.2
xlsxwriter: None
lxml: None
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.11
pymysql: None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None


TomAugspurger · 2018-04-14T18:20:30Z

Can you make a small copy-pastable example, and note which CSS classes you would want to exclude?

We could offer a couple of optimizations:

Disable the ID per cell
Disable row / col classes for cells that aren't referenced by a style

This will be a bit of work though.
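A rough sketch of the first optimization, just to make the idea concrete. The helper `render_cell` and the `styled` set of (row, col) positions are made up for illustration; this is not how style.py is written, only one way the "id only where needed" rule could look:

```python
from html import escape


def render_cell(table_uuid, row, col, value, styled):
    """Render one <td>; emit an id only if a CSS rule targets this cell.

    `styled` is a hypothetical set of (row, col) tuples that have a style.
    """
    classes = "data row{} col{}".format(row, col)
    text = escape(str(value))
    if (row, col) in styled:
        # Styled cell: the id is needed so the <style> block can select it.
        cell_id = "T_{}row{}_col{}".format(table_uuid, row, col)
        return '<td id="{}" class="{}">{}</td>'.format(cell_id, classes, text)
    # Unstyled cell: nothing refers to the id, so leave it out.
    return '<td class="{}">{}</td>'.format(classes, text)
```

With an empty `styled` set, no cell carries an id, which is where the saving comes from for tables that are mostly unstyled.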

wiso · 2018-04-15T13:52:03Z

Sure, here is a very minimal example:

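import pandas as pd
import numpy as np

# 2x2 frame of random values; no style function is applied
df = pd.DataFrame(np.random.random(size=(2, 2)), columns=list('AB'))
print(df.style.render())  # parenthesized print works on Python 2 and 3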

As you can see, I am not applying any special style. Even in this trivial case I get one CSS class and id for each cell.

<style  type="text/css" >
</style>  
<table id="T_af65670a_40b3_11e8_9422_ac220bbe67c8" > 
<thead>    <tr> 
        <th class="blank level0" ></th> 
        <th class="col_heading level0 col0" >A</th> 
        <th class="col_heading level0 col1" >B</th> 
    </tr></thead> 
<tbody>    <tr> 
        <th id="T_af65670a_40b3_11e8_9422_ac220bbe67c8level0_row0" class="row_heading level0 row0" >0</th> 
        <td id="T_af65670a_40b3_11e8_9422_ac220bbe67c8row0_col0" class="data row0 col0" >0.628483</td> 
        <td id="T_af65670a_40b3_11e8_9422_ac220bbe67c8row0_col1" class="data row0 col1" >0.961722</td> 
    </tr>    <tr> 
        <th id="T_af65670a_40b3_11e8_9422_ac220bbe67c8level0_row1" class="row_heading level0 row1" >1</th> 
        <td id="T_af65670a_40b3_11e8_9422_ac220bbe67c8row1_col0" class="data row1 col0" >0.0814626</td> 
        <td id="T_af65670a_40b3_11e8_9422_ac220bbe67c8row1_col1" class="data row1 col1" >0.723978</td> 
    </tr></tbody> 
</table>

In my real case I have thousands of rows and hundreds of columns, so I end up with HTML of ~10 MB.
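To get a feel for how much of that size comes from the ids alone, here is a back-of-the-envelope sketch. The id format is copied from the output above; the 1000 x 100 shape is only an assumed stand-in for the real table, and the function names are made up for illustration. It counts only the data cells, not the row headings.

```python
def id_attr(table_uuid, row, col):
    """Build the id attribute the way it appears in the <td> tags above."""
    return 'id="T_{}row{}_col{}" '.format(table_uuid, row, col)


def id_overhead_bytes(table_uuid, n_rows, n_cols):
    """Total number of characters spent on per-cell id attributes."""
    return sum(
        len(id_attr(table_uuid, r, c))
        for r in range(n_rows)
        for c in range(n_cols)
    )


uuid = "af65670a_40b3_11e8_9422_ac220bbe67c8"  # table uuid from the example output
n_rows, n_cols = 1000, 100                      # assumed shape, not the real data
overhead = id_overhead_bytes(uuid, n_rows, n_cols)
print("ids alone: about {:.1f} MB".format(overhead / 1e6))
```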

What you propose seems OK (I don't know how the id is used), but as a final solution I guess you would have to group the cells so that cells with the same style share the same CSS class.
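Something along these lines is what I have in mind. It is only a sketch in plain Python, not pandas code: `group_styles`, `render_css`, and the `s0`, `s1`, ... class names are all invented here, and the input is assumed to be a dict from (row, col) to a CSS declaration string (empty for unstyled cells).

```python
from collections import OrderedDict


def group_styles(cell_styles):
    """Assign one shared class name per distinct CSS declaration.

    Returns (rules, cell_classes):
      rules        maps declaration -> class name, in first-seen order
      cell_classes maps (row, col) -> class name, for styled cells only
    """
    rules = OrderedDict()
    cell_classes = {}
    for cell in sorted(cell_styles):
        decl = cell_styles[cell]
        if not decl:
            continue  # unstyled cells need no class at all
        if decl not in rules:
            rules[decl] = "s{}".format(len(rules))
        cell_classes[cell] = rules[decl]
    return rules, cell_classes


def render_css(rules):
    """Emit one CSS rule per distinct declaration instead of one per cell."""
    return "\n".join(
        ".{} {{ {} }}".format(name, decl) for decl, name in rules.items()
    )
```

For example, two red cells and one blue cell would give two CSS rules in total, so the stylesheet grows with the number of distinct styles rather than with the number of cells.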

TomAugspurger · 2018-04-15T18:39:30Z

You're welcome to take a look at

pandas/pandas/io/formats/style.py

Line 407 in d5d5a71

def render(self, **kwargs):

and

pandas/pandas/io/formats/style.py

Line 177 in d5d5a71

def _translate(self):

to explore how things can be (optionally) optimized.

TomAugspurger added the labels IO HTML (read_html, to_html, Styler.apply, Styler.applymap), Enhancement, Performance (Memory or execution speed performance), Difficulty Intermediate on Apr 14, 2018

TomAugspurger added this to the Next Major Release milestone Apr 14, 2018

Moisan mentioned this issue Oct 6, 2018 in PERF: only output an html id if a style is applied #23019 (merged)

TomAugspurger closed this as completed in #23019 Oct 14, 2018
