Python dependencies: Upgrade pandas and matplotlib, check xlrd #211
Comments
@annette-lutz Could you take this one over and check whether our programs still work well with newer versions?
In this context, we should also have a look at xlrd. I ran into the issue:
with xlrd 2.0.1. However, the older version 1.2.0 worked well.
With the more recent pandas version 1.3.5, I get a warning and pandas uses openpyxl to read the file.
Reading the file with another engine is not possible, since xlrd supports only .xls files (see https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.read_excel.html).
So we can only use openpyxl, but it issues a warning and only works for pandas>=1.3.0. I think the warning is about the incorrectly defined dimensions of the Excel sheet, which cause older pandas versions to take the number of lines in the file to be 2 and to read an empty dataframe (see the explanation in the openpyxl documentation: https://openpyxl.readthedocs.io/en/default/optimized.html?highlight=dimension#worksheet-dimensions). If I use openpyxl with an older pandas version I get
A similar issue was reported to pandas (see pandas-dev/pandas#38956). For pandas version 1.2.0 I tried all engines but could not read the file. For pandas>=1.3.0 the file can only be read with openpyxl, and a warning is issued.
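To make the engine choice discussed above explicit instead of relying on pandas' default, something like the following could work. This is only a sketch: `pick_excel_engine` and `read_spreadsheet` are hypothetical helper names, not existing memilio-epidata functions, and the suffix-to-engine mapping is an assumption based on xlrd reading only .xls files.

```python
from pathlib import Path

# Hypothetical mapping, not part of memilio-epidata.
ENGINE_BY_SUFFIX = {
    ".xls": "xlrd",       # xlrd >= 2.0 reads only the legacy .xls format
    ".xlsx": "openpyxl",
    ".xlsm": "openpyxl",
}


def pick_excel_engine(path):
    """Return the pandas engine name matching the file extension."""
    suffix = Path(path).suffix.lower()
    try:
        return ENGINE_BY_SUFFIX[suffix]
    except KeyError:
        raise ValueError(f"unsupported spreadsheet extension: {suffix!r}")


def read_spreadsheet(path, **kwargs):
    """Read a spreadsheet with an explicitly chosen engine."""
    # Imported lazily so the engine choice can be used without pandas installed.
    import pandas as pd
    return pd.read_excel(path, engine=pick_excel_engine(path), **kwargs)
```

Passing the engine explicitly does not fix the dimension problem itself; it only makes the behaviour the same regardless of which optional readers are installed.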
This pandas issue seems to be caused by empty rows and by incorrectly set or missing dimensions in the Excel sheet. Is it possible to set the dimensions in any way? From the openpyxl docs:
Furthermore, since header=N can lead to different results for files that look the same, we should check that the correct row is parsed as the header. So for every file, we should verify the header line (first row) after loading the xls(x).
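The header check could be as small as the sketch below. `verify_header` is a hypothetical helper, and the expected column names would have to come from us for each file; nothing here is existing memilio-epidata code.

```python
def verify_header(columns, expected):
    """Raise ValueError if the parsed header differs from the expected one.

    Both arguments are iterables of column labels; surrounding whitespace
    is ignored when comparing.
    """
    actual = [str(c).strip() for c in columns]
    wanted = [str(c).strip() for c in expected]
    if actual != wanted:
        raise ValueError(f"unexpected header: got {actual}, expected {wanted}")
    return actual
```

After loading, this would be called as `verify_header(df.columns, [...])` with the column names we expect for that file, so a wrongly parsed header row fails loudly instead of silently producing bad data.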
The bug is fixed in pandas version 1.2.2, so requiring pandas>=1.2.2 solves the problem (see https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.2.2.html).
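Besides raising the pin in setup.py, a runtime guard could catch environments that still have an older pandas installed. A minimal sketch, assuming plain `X.Y.Z` version strings; `parse_version` and `pandas_is_recent_enough` are hypothetical names, and any pre-release suffix is simply ignored.

```python
import re

# Minimum version taken from the comment above.
MIN_PANDAS = (1, 2, 2)


def parse_version(text):
    """Turn a version string such as '1.3.5' into a tuple of three integers."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"cannot parse version: {text!r}")
    # A missing patch component counts as 0.
    return tuple(int(g) if g is not None else 0 for g in match.groups())


def pandas_is_recent_enough(version_text, minimum=MIN_PANDAS):
    """Return True if version_text is at least the required minimum."""
    return parse_version(version_text) >= minimum
```

In practice this would be called with `pd.__version__`. The corresponding requirement string in setup.py would be `'pandas>=1.2.2'`.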
memilio-epidata currently uses quite old pandas and matplotlib versions, dating from mid-2020:
https://github.com/DLR-SC/memilio/blob/main/pycode/memilio-epidata/setup.py#L71-L72
These should be upgraded to more recent versions.