CLN: Make ExcelWriter more pluggable #4750
@jreback et al - does the API seem okay? @jmcnamara - since you actually want to add a new writer, does this seem clear to you? It's similar to what was there before, with a somewhat cleaner separation of concerns.
@jtratner maybe put the methods you are supposed to override in the base class as NotImplemented?
I could also just make an abstract base class... would that be clearer? Otherwise do you mean
either works fine... it's the person in the future who adds a new writer/reader and forgets to define something...
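To make the trade-off between the two suggestions concrete, here is a minimal sketch. All class names and the `failure_point` helper are hypothetical and are not taken from the PR; the point is only that a plain base class raising `NotImplementedError` fails when the forgotten method is first called, while an abstract base class fails as soon as the incomplete subclass is instantiated.

```python
import abc


class PlainBaseWriter:
    """Hypothetical base class: missing overrides fail only when called."""

    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
        raise NotImplementedError

    def save(self):
        raise NotImplementedError


class AbstractBaseWriter(abc.ABC):
    """Hypothetical base class: incomplete subclasses cannot be instantiated."""

    @abc.abstractmethod
    def write_cells(self, cells, sheet_name=None, startrow=0, startcol=0):
        ...

    @abc.abstractmethod
    def save(self):
        ...


class ForgetfulPlain(PlainBaseWriter):
    # write_cells is deliberately left out.
    def save(self):
        return "saved"


class ForgetfulAbstract(AbstractBaseWriter):
    # write_cells is deliberately left out.
    def save(self):
        return "saved"


def failure_point(factory):
    """Report whether a writer fails at construction or at first write."""
    try:
        writer = factory()
    except TypeError:
        return "instantiation"
    try:
        writer.write_cells([])
    except NotImplementedError:
        return "call"
    return "never"
```

With the abstract base class, someone who forgets to define a method finds out the first time the writer is created rather than partway through writing a file.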
register_engine('xlwt', _XlwtWriter)
Minor point: can everything below this line be moved to core/config_init.py? otherwise 👌
yes - makes much more sense to do that :) - btw - does pandas support reading a config file? (if so, I'd need to just validate with str instead, since you might set an engine that doesn't exist at the moment...)
take that back...I'll have to just validate with string, because don't want to have to be sure that config is instantiated after io/excel
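A rough sketch of the timing issue being described, under stated assumptions: `_engines`, `_options`, `set_writer_option`, `get_default_writer` and `FakeXlsxWriter` are all hypothetical stand-ins, not pandas APIs. The option is checked only for being a string when it is set, and the engine name is resolved against the registry when a writer is actually needed, so the order in which config and io/excel are loaded stops mattering.

```python
_engines = {}  # hypothetical registry: engine name -> writer class
_options = {}  # hypothetical option store: option name -> value


def set_writer_option(key, engine_name):
    """Accept any string now; the engine may be registered later."""
    if not isinstance(engine_name, str):
        raise ValueError("engine name must be a string")
    _options[key] = engine_name


def get_default_writer(key):
    """Resolve the stored engine name to a class only at the point of use."""
    name = _options[key]
    try:
        return _engines[name]
    except KeyError:
        raise ValueError(f"No engine registered under {name!r}") from None


class FakeXlsxWriter:
    """Stand-in for a plugin writer that is loaded after the option is set."""


# Setting the option succeeds even though nothing is registered yet.
set_writer_option("io.excel.xlsx.writer", "fake")

# The plugin registers itself later; lookup now succeeds.
_engines["fake"] = FakeXlsxWriter
default_writer = get_default_writer("io.excel.xlsx.writer")
```

The cost of this design is that a misspelled engine name is only reported when a file is written, not when the option is set.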
no config files AFAIK. could be a good first pr maybe?
also can u add a small test to test_config to make sure ur validator is doing what u think it is? thanks
@cpcloud I got rid of the validator (too dependent on the timing of setting the option), but added a test case for ExcelWriter('badfilename')
It looks fine. I'll subclass it and let you know how I get on. Should I subclass it in another file or in excel.py? About this:
    # declare external properties you can count on
    book = None
    curr_sheet = None
    path = None
That is good. I was going to ask for some interface to the workbook and worksheet since that could be useful to the end user for adding additional formatting or invoking other methods on them. How about calling them
@jmcnamara huh? You would just put whatever setup you need in your
Sure. What I meant was that it would be useful (from my point of view) to have interfaces in the base class that would return workbook and worksheet objects that could be used to access APIs that aren't exposed by ExcelWriter. For example, if I wanted to set a column width in an xlwt worksheet. Something like this:
    writer = ExcelWriter('file.xls')
    df.to_excel(writer, 'Sheet1')
    # Access the underlying worksheet.
    worksheet = writer.worksheet()
    worksheet.col(0).width = 256 * 12
    writer.save()
You can already do this, right?
    writer = ExcelWriter('file.xls')
    df.to_excel(writer, 'Sheet1')
    # Access the underlying worksheet.
    worksheet = writer.sheets["Sheet1"]
    worksheet.col(0).width = 256 * 12
    writer.save()
(on current master)
Ok. That's good. Ignore above.
okay, I added some additional documentation for this and testing for registering writers, setting engines, etc. If no objections, going to merge at some point soon...
@jtratner For what it is worth this looks good to me. If you can get it merged I'll redo my PR for xlsxwriter support.
Yeah, I think it's close. Just want to look it over one more time. It
It looks good from my end. I already tried it out on a clone from your fork. When I add the xlsxwriter subclass do you want it in
I'd put it in excel.py.
I just rebased this. I think it's in a pretty good state now. I could change ExcelReader slightly to do the same thing (for completeness' sake) - is that worth doing right now even though we don't have any other readers in the pipeline at the moment? The only other item is whether the formatting should be done by
I guess I will probably merge this tonight. So if you are interested/have time, please take a look.
``xlwt`` for ``.xls`` files. It's very simple to add new Excel writer engines
to ``pandas``. If you have multiple engines installed, you can choose the
engine to use by default via the options ``io.excel.xlsx.writer`` and
``io.excel.xls.writer``. (side note: if you want to add an Excel writer,
I would take out this side note comment, that's for dev, not users; also I think we usually put the versionadded at the top of the new section?
okay, I'm probably going to merge this today - nothing much changed from past few days.
Taken from a new addition to the six library (again, license is in the LICENSE directory).
Make ExcelWriter an ABC and add ExcelWriter config to core/config_init
ENH: Allow Panel.to_excel to pass keyword arguments
CLN: Make ExcelWriter more pluggable
@jmcnamara okay, this has been merged in...go ahead :)
Fixes #4745 and makes it easier to swap in other ExcelWriters.
Basically, you can subclass ExcelWriter (or not), register the engine with register_engine, and then either pass it as an engine keyword to read_excel or set it as the default writer using the config options (i.e., io.excel.xlsx.writer or io.excel.xls.writer). Engine names are validated by checking that they are already defined and that they are able to read files of that type (that said, if you can set up external config files to be parsed by pandas, this validation should be removed, because it won't work if pandas is loaded before an external dependency [e.g., a plugin that registers a writer class]). When you call ExcelWriter, the metaclass swaps in the appropriate engine class automatically [thereby being backwards compatible].
The great thing is that it should be really simple to add additional writers and relatively trivial to add additional readers by following the same format (if, for example, we wanted to resurrect openpyxl for brave souls or something).
Here's the 'interface':
- write_cells(self, cells, sheet_name=None, startrow=0, startcol=0) --> called to write additional DataFrames to disk
- supported_extensions (tuple of supported extensions), used to check that engine supports the given extension.
- engine - string that gives the engine name. Necessary to instantiate class directly and bypass ExcelWriterMeta engine lookup.
- save(self) --> called to save file to disk
- __init__(self, path, **kwargs) --> always called with path as first argument.
- check_extension(cls, ext) --> called to check that the engine supports a given extension (and used to validate user selection of engine as option). This supersedes supported_extensions.
You also need to register the class with register_engine(). If you don't subclass ExcelWriter, you will have to implement check_extension(ext) yourself.
There's a tiny bit of metaclass magic to make this backwards-compatible, but it's nothing particularly complicated.