Skip to content

CLN: Make ExcelWriter more pluggable #4750

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Sep 13, 2013

Conversation

jtratner
Copy link
Contributor

@jtratner jtratner commented Sep 5, 2013

FIxes #4745 and makes it easier to swap in other ExcelWriters.

Basically, you can subclass ExcelWriter (or not), register the engine with register_engine, and then either pass it as an engine keyword to read_excel or set it as the default writer using the config options (i.e., io.excel.xlsx.writer or io.excel.xls.writer). Engine names are validated by checking that they are already defined and that they are able to read files of that type (that said, if you can set up external config files to be parsed by pandas, this validation should be removed, because it won't work if pandas is loaded before an external dependency [e.g., a plugin that registers a writer class]). When you call ExcelWriter, the metaclass swaps in the appropriate engine class automatically [thereby being backwards compatible].

The great thing is that it should be really simple to add additional writers and relatively trivial to add additional readers by following the same format. (if, for example, we wanted to resurrect openpyxl for brave souls or something).

Here's the 'interface':

  • Mandatory (but not checked at run time):
    • write_cells(self, cells, sheet_name=None, startrow=0, startcol=0)
      --> called to write additional DataFrames to disk
    • supported_extensions (tuple of supported extensions), used to check
      that engine supports the given extension.
    • engine - string that gives the engine name. Necessary to
      instantiate class directly and bypass ExcelWriterMeta engine lookup.
    • save(self) --> called to save file to disk
  • Optional:
    • __init__(self, path, **kwargs) --> always called with path as first
      argument.
    • check_extension(cls, ext) --> called to check that the engine supports a
      given extension (and used to validate user selection of engine as option).
      This supercedes supported_exceptions.

You also need to register the class with register_engine(). If you don't subclass ExcelWriter,
you will have to implement check_extensions(ext) yourself.

There's a tiny bit of metaclass magic to make this backwards-compatible, but it's nothing particularly complicated.

@jtratner
Copy link
Contributor Author

jtratner commented Sep 5, 2013

@jreback @cpcloud @hayd this look okay to you? It's just a prep for adding xlsxwriter as a writer to pandas.

@jtratner
Copy link
Contributor Author

jtratner commented Sep 6, 2013

@jreback et al - does the API seem okay?

@jmcnamara - since you actually want to add a new writer, does this seem clear to you? It's similar to what was there before with a bit cleaner separation of concerns.

@jreback
Copy link
Contributor

jreback commented Sep 6, 2013

@jtratner maybe put the methods you are supposed to override in the base class as NotImplemented?

@jtratner
Copy link
Contributor Author

jtratner commented Sep 6, 2013

I could also just make an abstract base class... would that be clearer?

Otherwise do you mean func = NotImplemented? Or raise NotImplemented?

@jreback
Copy link
Contributor

jreback commented Sep 6, 2013

either work fine....its the person in the future who adds a new writer/reader and forgets to define something...


register_engine('xlwt', _XlwtWriter)


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor point: can everything below this line be moved to core/config_init.py? otherwise 👌

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes - makes much more sense to do that :) - btw - does pandas support reading a config file? (if so, I'd need to just validate with str instead, since you might set an engine that doesn't exist at the moment...)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

take that back...I'll have to just validate with string, because don't want to have to be sure that config is instantiated after io/excel

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no config files AFAIK. could be a good first pr maybe?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also can u add a small test to test_config to make sure ur validator is doing what u think it is? thanks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cpcloud I got rid of the validator(too dependent on timing of setting option), but added a test case for ExcelWriter('badfilename')

@jmcnamara
Copy link
Contributor

@jtratner

@jmcnamara - since you actually want to add a new writer, does this seem clear to you? It's similar to what was there before with a bit cleaner separation of concerns.

It looks fine. I'll subclass it and let you know how I get on.

Should I subclass it in another file or in excel.py?

About this:

# declare external properties you can count on
book = None
curr_sheet = None
path = None

That is good. I was going to ask for some interface to the workbook and worksheet since that could be useful to the end user for adding additional formatting or invoking other methods on them.

How about calling them workbook and worksheet (or current_worksheet) which would be more consistent with the existing Excel writers.

@jtratner
Copy link
Contributor Author

jtratner commented Sep 6, 2013

@jmcnamara huh? You would just put whatever setup you need in your __init__ method. Basically, as long as the engine takes a path, responds to write_cells appropriately and saves to disk on save(), you can do whatever you want in terms of saving workbook/worksheet, etc.

@jmcnamara
Copy link
Contributor

You would just put whatever setup you need in your init method.

Sure. What I meant was it that it would be useful (from my point of view) to have interfaces in the base class that would return workbook and worksheet objects that could be used to access APIs that aren't exposed by ExcelWriter.

For example, if I wanted to set a column width in an xlwt worksheet. Something like this:

writer = ExcelWriter('file.xls')
df.to_excel(writer, 'Sheet1')

# Access the underlying worksheet.
worksheet = writer.worksheet()
worksheet.col(0).width = 256 * 12

writer.save()

@jtratner
Copy link
Contributor Author

jtratner commented Sep 6, 2013

You can already do this, right?

writer = ExcelWriter('file.xls')
df.to_excel(writer, 'Sheet1')

# Access the underlying worksheet.
worksheet = writer.sheets["Sheet1"]
worksheet.col(0).width = 256 * 12

writer.save()

(on current master)

@jmcnamara
Copy link
Contributor

worksheet = writer.sheets["Sheet1"]

Ok. That's good. Ignore above.

@jtratner
Copy link
Contributor Author

jtratner commented Sep 9, 2013

okay, I added some additional documentation for this and testing for registering writers, setting engines, etc. If no objections, going to merge at some point soon...

@jmcnamara
Copy link
Contributor

@jtratner For what it is worth this looks good to me. If you can get it merged I redo my PR for xlsxwriter support.

@jtratner
Copy link
Contributor Author

Yeah, I think it's close. Just want to look it over one more time. It
should be relatively straightforward for you, right?

@jmcnamara
Copy link
Contributor

It should be relatively straightforward for you, right?

It looks good from my end. I already tried it out on a clone from your fork.

When I add the xlsxwriter subclass do you want it in excel.py or in a separate file?

@jtratner
Copy link
Contributor Author

I'd put it in excel.py.

@jtratner
Copy link
Contributor Author

I just rebased this. I think it's in a pretty good state now. I could change ExcelReader slightly to do the same thing (for completeness sake) - is that worth doing right now even though we don't have any other readers in the pipeline at the moment?

The only other item is whether the formatting should be done by to_excel or by the individual Writer. At least for now, probably fine to leave it, but I could see getting speed improvements by leaving the default formatting as a classmethod on ExcelWriter that subclasses can overwrite...

@jtratner
Copy link
Contributor Author

I guess I will probably merge this tonight. So if you are interested/have time, please take a look.

``xlwt`` for ``.xls`` files. It's very simple to add new Excel writer engines
to ``pandas``. If you have multiple engines installed, you can choose the
engine to use by default via the options ``io.excel.xlsx.writer`` and
``io.excel.xls.writer``. (side note: if you want to add an Excel writer,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would take out this side note comment, that's for dev, not users; also Ithink we usually put the versionadded at the top of the new section?

@jtratner
Copy link
Contributor Author

okay, I'm probably going to merge this today - nothing much changed from past few days.

Taken from a new addition to the six library (again, license is in the
LICENSE directory).
Make ExcelWriter an ABC and add ExcelWriter config to core/config_init

ENH: Allow Panel.to_excel to pass keyword arguments
jtratner added a commit that referenced this pull request Sep 13, 2013
CLN: Make ExcelWriter more pluggable
@jtratner jtratner merged commit d20d9bf into pandas-dev:master Sep 13, 2013
@jtratner
Copy link
Contributor Author

@jmcnamara okay, this has been merged in...go ahead :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Make ExcelWriter more easily pluggable
4 participants