Jupyter Notebook displays Series VERY slowly #18137

Closed
JoshuaC3 opened this issue Nov 6, 2017 · 12 comments
Labels
Needs Info (clarification about behavior needed to assess issue)

Comments

@JoshuaC3

JoshuaC3 commented Nov 6, 2017

Code Sample

import numpy as np
import pandas as pd
from IPython.display import display

# pd.np was removed in pandas 2.0, so NumPy is imported directly
df = pd.DataFrame(
    np.random.random((10000000, 1)),
    columns=['a'],
    index=pd.date_range(start='2001-01-01', freq='min', periods=10000000)
)
display(df.a)

import pandas as pd
from IPython.display import display

# read some data of size 200,000 x 4 into `data`

display(data.loc[:, ['reading']])  #25ms
display(data.reading.to_frame())  #25ms
display(data.reading) #3.53s
s = data.reading
display(s) #3.32s possibly cached
print(data.reading) #9ms
print(data.loc[:, ['reading']]) # 15ms

Problem description

In the Jupyter notebook, displaying a pd.Series is VERY slow. It displays more quickly when kept as a pd.DataFrame, but that is of course more verbose.
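
To make the comparison repeatable outside a notebook, here is a minimal timing sketch. It is an assumption-laden stand-in, not the reporter's measurement: `time_call` is a hypothetical helper, the data is random rather than the real readings, and `repr()` only covers the plain-text part of what `display()` does (in a notebook, `display()` also runs IPython's rich formatters, so `%time display(data.reading)` is the closer measurement).

```python
import time

import numpy as np
import pandas as pd


def time_call(func, repeats=3):
    """Return the best wall-clock time in seconds over `repeats` calls."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        best = min(best, time.perf_counter() - start)
    return best


# Same row count as the report; the values are random, not the real data.
n = 200_000
data = pd.DataFrame({"reading": np.random.rand(n)})

timings = {
    "repr(Series)": time_call(lambda: repr(data["reading"])),
    "repr(DataFrame)": time_call(lambda: repr(data[["reading"]])),
}
for label, seconds in timings.items():
    print(f"{label}: {seconds * 1000:.1f} ms")
```

If the two numbers are close here but `display()` is still slow in the notebook, that would suggest the time is going somewhere other than building the text output.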

Expected Output

A pd.Series should display/print at a reasonable speed.

Output of pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.5.4.final.0
python-bits: 64
OS: Windows
OS-release: 7
machine: AMD64
processor: Intel64 Family 6 Model 60 Stepping 3, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.21.0
pytest: 3.2.1
pip: 9.0.1
setuptools: 36.2.4
Cython: 0.26
numpy: 1.13.3
scipy: 0.19.1
pyarrow: None
xarray: None
IPython: 5.3.0
sphinx: 1.6.3
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.3
blosc: None
bottleneck: 1.2.1
tables: 3.4.2
numexpr: 2.6.2
feather: None
matplotlib: 2.0.0
openpyxl: 2.5.0a2
xlrd: 1.0.0
xlwt: 1.2.0
xlsxwriter: 0.9.8
lxml: 3.8.0
bs4: 4.6.0
html5lib: 0.9999999
sqlalchemy: 1.1.13
pymysql: 0.7.9.None
psycopg2: None
jinja2: 2.9.6
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: 0.4.0

@TomAugspurger
Contributor

Could you make a reproducible example by generating the data randomly?

@TomAugspurger added the Needs Info (clarification about behavior needed to assess issue) label Nov 6, 2017
@JoshuaC3
Author

JoshuaC3 commented Nov 6, 2017

Apparently not! When I try to replicate this with randomised data (np.random.rand), it prints and displays quickly. This leads me to believe that the slowdown is caused by something specific to my data. I will investigate further...

@JoshuaC3
Author

JoshuaC3 commented Nov 6, 2017

I ran this in the same Jupyter Notebook

import numpy as np
import pandas as pd
from IPython.display import display

data = pd.DataFrame({'reading': np.random.rand(200000), 
                   'strings': ['string']*200000,
                   'ints': range(200000),
                   'dates': pd.date_range(start='2017', periods=200000, freq='MIN')})

display(data.loc[:, ['reading']])  #25ms
display(data.reading.to_frame())  #25ms
display(data.reading) #3.53s
s = data.reading
display(s) #3.32s possibly cached
print(data.reading) #9ms
print(data.loc[:, ['reading']]) # 15ms

All prints/displays performed well.

Are there any attributes/functions/info that I can run on the two DataFrames to compare them and debug?
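
One way to approach that question is to collect a few cheap structural facts about each DataFrame and print only the ones that differ. This is a rough sketch, not a known diagnosis: `describe_frame` and `diff_descriptions` are hypothetical helpers, the fields chosen are guesses at what might matter, and the two frames below are synthetic stand-ins for the "slow" and "fast" data.

```python
import numpy as np
import pandas as pd


def describe_frame(df):
    """Collect structural facts about a DataFrame that are cheap to compare."""
    return {
        "shape": df.shape,
        "dtypes": df.dtypes.astype(str).to_dict(),
        "index_type": type(df.index).__name__,
        "index_dtype": str(df.index.dtype),
        "index_is_unique": df.index.is_unique,
        "index_is_monotonic": df.index.is_monotonic_increasing,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
    }


def diff_descriptions(a, b):
    """Return only the keys whose values differ, as (a_value, b_value) pairs."""
    return {key: (a[key], b[key]) for key in a if a[key] != b[key]}


# Synthetic stand-ins: same values, default index versus a datetime index.
random_data = pd.DataFrame({"reading": np.random.rand(1000)})
dated_data = random_data.set_index(
    pd.date_range(start="2017", periods=1000, freq="min")
)

diff = diff_descriptions(describe_frame(random_data), describe_frame(dated_data))
for key, (left, right) in diff.items():
    print(f"{key}: {left!r} != {right!r}")
```

Running `describe_frame` on the real data and on the randomly generated data would show whether they differ in dtypes or index type before any profiling is done.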

FYI: I have experienced this issue before in other notebooks, with other, but similar types of data.

@JoshuaC3 JoshuaC3 closed this as completed Nov 6, 2017
@JoshuaC3
Author

JoshuaC3 commented Nov 6, 2017

Accidentally closed. My apologies.

@JoshuaC3 JoshuaC3 reopened this Nov 6, 2017
@JoshuaC3
Author

JoshuaC3 commented Nov 8, 2017

Is there anything else I should do to help debug this?

@TomAugspurger
Contributor

TomAugspurger commented Nov 8, 2017 via email

@jreback
Contributor

jreback commented Nov 8, 2017

Do a %prun on both the problematic and the OK ones.
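
In a notebook that would be `%prun display(df.a)` next to `%prun display(df)`. Outside IPython, the same kind of profile can be collected with the standard-library `cProfile`. The sketch below is an approximation under stated assumptions: `profile_call` is a hypothetical helper, the frame is smaller than the one in the report, and `repr()` stands in for `display()`, which additionally runs IPython's rich formatters.

```python
import cProfile
import io
import pstats

import numpy as np
import pandas as pd


def profile_call(func, top=10):
    """Run func() under cProfile and return the top entries as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(top)
    return buffer.getvalue()


# Smaller than the 10,000,000-row example so the sketch runs quickly.
n = 100_000
df = pd.DataFrame(
    np.random.random((n, 1)),
    columns=["a"],
    index=pd.date_range(start="2001-01-01", freq="min", periods=n),
)

series_report = profile_call(lambda: repr(df["a"]))
frame_report = profile_call(lambda: repr(df))
print(series_report)
print(frame_report)
```

Comparing the functions at the top of the two reports (sorted by cumulative time) should show which calls only appear, or only dominate, in the Series case.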

@JoshuaC3
Author

JoshuaC3 commented Nov 8, 2017

A notebook restart has solved this issue. I have had this problem before in other notebooks, however. I will run the above when I see the problem arise again. Is it OK to leave this open until then, or should I close it and re-open it when I re-discover it?

@Strateus

I have the same issue. How to reproduce:

import numpy as np
import pandas as pd
from IPython.display import display

# pd.np was removed in pandas 2.0, so NumPy is imported directly
df = pd.DataFrame(
    np.random.random((10000000, 1)),
    columns=['a'],
    index=pd.date_range(start='2001-01-01', freq='min', periods=10000000)
)
display(df.a)

If you just run display(df), it works well.

@TomAugspurger
Contributor

Thanks for the reproducible example. I've added it to the first post.

Anyone have time to look into where things are slow?

@TomAugspurger
Contributor

Fixed by #20834 I think. LMK if not.

@JoshuaC3
Author

I no longer get this issue. I have updated a lot of things many times, so I couldn't pin it on any one thing. Thanks :)
