Design pvlib.io #261
I often put metadata in a pandas Series rather than dict because this
As @bmu once mentioned, naming our new module io is probably asking for trouble because it conflicts with the built-in Python io module. I'm open to dataio, datareader, inputoutput, or other suggestions. So far, we've only discussed creating a new module. @mikofski already complained in #235 that our modules are too long, and adding a bunch of new code to a unified io module might make it worse. I wonder if we might be better off in the long run if we instead create a subpackage consisting of one module per data type and pull the functions into the package-level namespace. This is similar to the way the pandas io package is structured. So, we'd have something like
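Concretely, the subpackage could look something like this (module and function names are purely illustrative, mirroring the pandas io layout):

```
pvlib/
    iotools/
        __init__.py   # pulls the reader functions into the package namespace
        tmy.py        # read_tmy2, read_tmy3
        epw.py        # read_epw
        surfrad.py    # read_surfrad
```

so that users could simply call pvlib.iotools.read_tmy3(...) without knowing which module the function lives in.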
This subpackage structure might make for an easy-to-use API and a set of easy-to-read/maintain modules. Probably a good idea to stick with the (data, metadata) pattern for new reader functions. I cannot think of a downside to returning metadata as a Series, but my instinct is to leave it as a dict (possibly an OrderedDict) and let users convert it to a Series if they want to. I seem to remember that the pandas developers recommend against using a Series as a dictionary, though I can't find the reference and could be making that up. All that being said, I'm not opposed to the change.
It’s a plan, although I prefer iotools to dataio. The function pattern should be the same as pandas, i.e., read_(format) and to_(format) as needed.
I defer to people with more experience on the code organization, but I do suspect there will be a few potentially shared elements. What I could imagine, and would really like to have, is a kind of enhanced read_csv that can read extra information in the header and/or footer in addition to the regular columnar data. Many file formats could build on that just by setting parameters and translating column or key names. An option could be to return the header and/or footer as a text block to be parsed later.
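One way to sketch such an enhanced read_csv (the header format, '#'-prefixed `key: value` lines, and all names here are assumptions for illustration):

```python
import io

import pandas as pd

# A made-up example file: commented metadata header, then regular columns.
RAW = """# station: Example Site
# latitude: 48.1
# longitude: 11.6
time,ghi,dni
2016-01-01 11:00,350,620
2016-01-01 12:00,410,700
"""


def read_csv_plus(buf, comment_char='#'):
    """Collect leading comment lines as metadata, then hand the
    remaining columnar block to pandas.read_csv."""
    meta = {}
    while True:
        pos = buf.tell()
        line = buf.readline()
        if not line or not line.startswith(comment_char):
            break
        key, _, value = line.lstrip(comment_char).partition(':')
        meta[key.strip()] = value.strip()
    buf.seek(pos)  # rewind to the first data line (the column header)
    data = pd.read_csv(buf, index_col=0, parse_dates=True)
    return data, meta


data, meta = read_csv_plus(io.StringIO(RAW))
```

Per-format readers built on top of this would then mostly differ in the comment character, separator, and column renaming they pass in.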
@adriesse do you have an example file & code? How would you implement this for the file irradiation-0e2a19f2-abe7-11e5-a880-5254002dbd9b.csv? In most cases, the input will require an adaptation of the date parser to the specific data format. This usually gets you going.
In the latter case, what do we do with additional variables, e.g. in spectral datasets or in BSRN files?
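The date-parser adaptation mentioned above could look like this (the 'DD.MM.YYYY HH:MM' stamps and ';' separator are assumptions for illustration):

```python
import io

import pandas as pd

# A made-up delimited file with European-style timestamps.
RAW = """time;ghi
01.06.2016 11:00;350
01.06.2016 12:00;410
"""

data = pd.read_csv(io.StringIO(RAW), sep=';', index_col='time')
# adapt the date parsing to the provider's specific format string
data.index = pd.to_datetime(data.index, format='%d.%m.%Y %H:%M')
```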
Hi all, sorry I haven't chimed in. I already have tmy2 and tmy3 readers. I'll send them when I get in. As for the dataio module and namespace question.
Mark, I am confused about your tmy readers. Do they do the same thing as pvlib's existing tmy readers?

Let's take things one step at a time and have Cliff just make us a

If necessary, the subpackage could have a

I want to keep the bar for contributing to pvlib as low as possible, so it's fine with me if different modules do different amounts of processing to their respective data formats. Some might be 1000 line monsters that do QA and return data that is in "pvlib conventions," while others might not do much more than call
RE: TMY

@wholmgren, sorry I misunderstood the intention of the proposed

RE: reorganization

I agree with you that the bar for contributions should be low. Your ideas for
@cwhanse / @mikofski many items from the last comments are already in #29. Are they complementary? Shall the other be closed? I wonder if there are any observations regarding the comments in #261 (comment). Otherwise, who will provide a prototype, or will each of you open a PR with the data set reader you have implemented?
@cwhanse please have a look at the PR. The data is here:
@cwhanse & @adriesse my refactored PRs are up:

- maccrad #279
- pvsyst #280

@wholmgren seems to lean towards keeping the iotools simple, as a customised wrapper for pandas.read_csv.

Shall iotools/util also provide functions to:

- read metadata, usually in the lines before the column header: coordinates, time reference?
- localise the UTC-based timeseries as suggested in #270?
So what shall be the outputs?

1. a dataframe with the raw data?
2. a dataframe with renamed columns to match pvlib convention?
3. a metadata dictionary?
4. a location with timezone, which usually involves retrieving this info either from geonames or using an additional package?
5. a dataframe with renamed columns to match pvlib convention & localised index?

I personally prefer to have the library do as much as possible:

- read data
- reformat data
- prepare a Location
- localise data

I am looking forward to your feedback & will then modify my PRs according to what seems to be consensus.
I understand why you would like to handle the metadata with this library. I wonder if the problem is that not all file types include all metadata? For example, the PVsyst output file doesn't include any timezone or lat/long information, because it just references a .SIT file that includes that information. Not sure how to handle this other than to have the user define a Location separately to use.

Jessica
Yes. The timezone is derived either by a web query to geonames or by local libraries.
Correct. Actually, in the case of PVsyst, I would assume that a location exists, because PVsyst hourly output usually does not include GHI. I would assume that a pvlib user uses this data to compare the results of both modelling environments. So my current proposal for addressing this in iotools:

1. The standard output of a pvlib.iotools.FORMAT reader would be a tuple with:
   - a dataframe with the raw data (for comparison & debugging)
   - a dataframe with renamed columns to match pvlib convention
   - a metadata dictionary as suggested by @wholmgren in #279
2. pvlib.iotools.util to include some tools:
   - an optional tool to retrieve the timezone
   - an optional tool to localise the dataframe
   - further functions, e.g. a checker whether the radiation starts before sunrise, to flag a timezone issue
So depending on the dataset, the user could employ the functions in util for more convenience. The capabilities of each format / reader could be documented in the docstring.
Yes, having tools available for the user to apply sounds like the right approach.
try tzwhere to determine timezone from coordinates:

```python
from tzwhere import tzwhere
from datetime import datetime
import math
import pytz

# timezone lookup, force nearest tz for coords outside of polygons
WHERETZ = tzwhere.tzwhere(shapely=True, forceTZ=True)
# daylight savings time (DST) in the northern hemisphere starts in March and
# ends in November; the opposite holds in the southern hemisphere
JAN1 = datetime(2016, 1, 1)  # date with standard time in northern hemisphere
JUN1 = datetime(2016, 6, 1)  # date with standard time in southern hemisphere
# notes and links on Python datetime tzinfo dst:
# http://stackoverflow.com/questions/17173298/is-a-specific-timezone-using-dst-right-now
# http://stackoverflow.com/questions/19774709/use-python-to-find-out-if-a-timezone-currently-in-daylight-savings-time
# methods for determining DST vs. STD:
# tz.localize(JAN1).timetuple().tm_isdst -> boolean, True if DST, False if STD
# tz.localize(JAN1).dst() -> timedelta, 0 if STD
# tz.dst(JAN1) -> timedelta, 0 if STD


def tz_latlon(lat, lon):
    """
    Timezone from latitude and longitude.

    :param lat: latitude [deg]
    :type lat: float
    :param lon: longitude [deg]
    :type lon: float
    :return: UTC offset in hours and tzinfo (None over international waters)
    :rtype: tuple
    """
    # get name of time zone using tzwhere, force to nearest tz
    tz_name = WHERETZ.tzNameAt(lat, lon, forceTZ=True)
    # check if coordinates are over international waters
    if not tz_name or tz_name in ('uninhabited', 'unknown'):
        # coordinates over international waters only depend on longitude
        return lon // 15.0, None
    tz_info = pytz.timezone(tz_name)  # get tzinfo
    # pick a date in standard (non-DST) time for this hemisphere
    tz_date = JAN1  # standard time in northern hemisphere
    if tz_info.dst(tz_date):
        # if the DST timedelta is nonzero, it must be the southern hemisphere
        tz_date = JUN1  # standard time in southern hemisphere
    tz_str = tz_info.localize(tz_date).strftime('%z')  # ISO offset, e.g. '-0800'
    # convert the ISO offset string to a float, handling half-hour zones
    hours = float(tz_str[:3])
    return hours + math.copysign(float(tz_str[3:]) / 60.0, hours), tz_info


if __name__ == "__main__":
    # test tz_latlon at San Francisco (GMT-8.0)
    gmt, tz_info = tz_latlon(37.7, -122.4)
    assert gmt == -8.0
    assert tz_info.zone == 'America/Los_Angeles'
    assert tz_info.utcoffset(JAN1).total_seconds() / 3600.0 == -8.0
    assert tz_info.utcoffset(JUN1).total_seconds() / 3600.0 == -7.0
    # New Delhi, India (GMT+5.5)
    gmt, tz_info = tz_latlon(28.6, 77.1)
    assert gmt == 5.5
    assert tz_info.zone == 'Asia/Kolkata'
    assert tz_info.utcoffset(JAN1).total_seconds() / 3600.0 == 5.5
    assert tz_info.utcoffset(JUN1).total_seconds() / 3600.0 == 5.5
    # also works with integers (e.g. Santa Cruz, CA)
    gmt, tz_info = tz_latlon(37, -122)
    assert gmt == -8.0
    assert tz_info.zone == 'America/Los_Angeles'
```

You can call this as shown in the `__main__` block of the script.
Also see:
Note:
As to the reverse lookup, geopy with ESRI ArcGIS works great. Registration with an ESRI developer or ArcGIS public account is free. Mapquest also has free developer accounts and can be used with geopy. There are also several other geocoding services, like Google and others.
@jforbess so I conclude you agree to the structure described above in issuecomment-265110051? @mikofski Thanks for the code. I used a simpler library: see the maccrad reader PR #279. But @wholmgren did not like the addition of an external dependency (issuecomment-264486119 in #274). This is why I propose above to make it optional. Well, if there are no further ideas or suggestions, I will revise the PR according to the discussion.
@dacoex, yes, and I see that @wholmgren didn't like an iotools/util function to localize tz, instead recommending using the pandas function, which I understand, but part of me thinks that having a function in the API will help users be consistent in their usage. This may be a philosophical argument?
@dacoex I like timezonefinder; please use the PyPI reference, not GitHub. I disagree with @wholmgren in #274; IMO it's fine to have dependencies, as long as they are mature, well documented, and widely used, which timezonefinder is: it's on PyPI, v1.5.7 has over 2000 downloads, it's based on tzwhere (the previous best choice), and it has been recently updated with a steady record of releases. IMHO using open source is one of the reasons to use Python. PyPI is one of the reasons Python is so powerful. Adding dependencies instead of rolling your own makes your code stronger. Perhaps you can selectively let users use (or not) the import by putting it in a

Although, if possible, perhaps it's good to try to make the dependency arbitrary, so you could switch it later. For example
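The try/except optional-import idea could be sketched like this (the function name and error message are illustrative, not existing pvlib API):

```python
def lookup_timezone(latitude, longitude):
    """Return an IANA timezone name for the given coordinates, using the
    optional timezonefinder package if it is installed."""
    try:
        # deferred import: pvlib itself would not hard-require this package
        from timezonefinder import TimezoneFinder
    except ImportError:
        raise ImportError(
            'lookup_timezone requires the optional timezonefinder package')
    return TimezoneFinder().timezone_at(lat=latitude, lng=longitude)
```

Because the import happens inside the function, the backend could later be swapped (e.g. for tzwhere or a web service) without touching any other code.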
Thanks to all for the feedback. I will propose an improved version and also add the solargis format.
Here is another thought: there are actually two kinds of metadata in your maccrad file: per-file metadata and per-column metadata (in this case descriptions and units). I have seen this in some other file formats as well, and it could be something worth reading and returning separately.
I don't know what you mean by raw data in this context. The columns are unchanged?
Probably, though would be great if this was an option with a rename=True default. Not sure if we should also change the existing tmy readers to be more consistent.
readtmy2 and readtmy3 return a tuple of metadata, data. Seems reasonable to me. Has anyone been unhappy with this in the past or wished that those readers did more?
I strongly oppose this. I think it is essential that pvlib's class layer lives above pvlib's functional layer, though I will take all the blame if the distinction or the reasons for it are unclear. Returning a Location object would ruin the distinction. As discussed elsewhere, you could instead add a
I think that an IO reader should return a dataframe that is localized to the format of the timestamps of the file and/or the timezone specific metadata of the file. That is: if the timestamps say MST or -0700 then the dataframe should be localized accordingly, or, if the timestamps are ambiguous but the metadata says e.g. tz=MST then the dataframe should be localized. pvlib should not guess at the localization of the data. Fine with me if you want to add a
I can see an argument for
I'm confused about the arguments surrounding creating our own localize/convert functions. pandas has a
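For reference, a minimal sketch of the pandas localize/convert workflow being referred to here (timezone and values are illustrative):

```python
import pandas as pd

# Naive hourly index, as read from a file whose metadata says the
# timestamps are in fixed UTC-7 (MST).
index = pd.date_range('2016-06-01 06:00', periods=3, freq='h')
data = pd.DataFrame({'ghi': [100.0, 300.0, 500.0]}, index=index)

# tz_localize attaches the timezone stated in the file's metadata ...
data = data.tz_localize('Etc/GMT+7')
# ... and tz_convert moves it to UTC for cross-dataset comparison.
data_utc = data.tz_convert('UTC')
```

Both operations are index-only; the irradiance values themselves are untouched.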
Sounds complicated to do reliably and probably unnecessary.
OK.
Yes, just the result of
Where would that be placed? I still prefer to include a few shortcut functions in
Even the reader can be done by the user. But isn't this here also to make life simpler? I would prefer to let the software do whatever can be done automatically. The result is to be verified anyway.
Well, I took mine from the pandas docs, but maybe they've changed that API again?

@adriesse I suggest adding the column metadata to the docstring as in

Summarising, the next revision will be structured more closely like the existing tmy readers. As a lesson learned, we may add a specification and some instructions on how to add a new reader to the docs.
On Wed, Dec 7, 2016 at 8:31 AM, Will Holmgren wrote:

> further functions, e.g. checker if the radiation starts before sunrise to inform that there is a timezone issue

> Sounds complicated to do reliably and probably unnecessary.
This may be unnecessary for standard files, but I have been wanting it for all of the data that I get from SCADA systems. Just found a system that didn't apply Daylight Savings on the actual day, but a week early. But only in the spring. And only the first two years. Otherwise, it had the right alignment with Daylight Savings.

I spent a lot of time wrapping my head around the right way to handle timezones because of daylight savings and the fact that my data sometimes comes with timestamps from a timezone that it is not located in. (A client pulls data in US/Eastern for a plant that is in California. The SCADA thinks it is doing the right thing, maybe, because it is relative to where the data is being queried, but it is not the right thing at all.)

But I admit, this shouldn't be an issue for any standard file that gets a standard reader. But it is critical if anyone tries to generalize iotools for a somewhat standard csv.
Yes, @jforbess is right. This also applies to most datalogger files...
This issue seems a bit outdated, but anyway, I have just written a short function to read epw weather files from EnergyPlus. Note that the output is not harmonized with the output of the TMY functions.
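A heavily simplified sketch of what such an EPW reader could look like (real EPW files carry 8 header lines and 35 data fields per row; the miniature file and field choices below are assumptions for illustration):

```python
import io

import pandas as pd

# Miniature stand-in for an EPW file: the first header line is the
# LOCATION record with city, lat/lon, UTC offset and elevation.
MINI_EPW = """LOCATION,Denver,CO,USA,TMY3,724699,39.83,-104.65,-7.0,1650.0
DESIGN CONDITIONS,0
TYPICAL/EXTREME PERIODS,0
GROUND TEMPERATURES,0
HOLIDAYS/DAYLIGHT SAVINGS,No,0,0,0
COMMENTS 1,
COMMENTS 2,
DATA PERIODS,1,1,Data,Sunday,1/1,12/31
1991,1,1,1,0,9
1991,1,1,2,0,9
"""


def read_epw(buf):
    # the first header line carries the per-file metadata
    fields = buf.readline().rstrip('\n').split(',')
    meta = {'city': fields[1], 'latitude': float(fields[6]),
            'longitude': float(fields[7]), 'tz': float(fields[8]),
            'altitude': float(fields[9])}
    # skip the remaining 7 header lines, then read the data block
    data = pd.read_csv(buf, skiprows=7, header=None)
    return data, meta


data, meta = read_epw(io.StringIO(MINI_EPW))
```

Following the (data, metadata) pattern discussed above would keep such a reader consistent with the existing tmy readers.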
@squoilin - Sylvain, this EPW reader function is quite useful. I'm incorporating it into my development of bifacialvf and bifacial_radiance at github.com/nrel/bifacialvf and github.com/nrel/bifacial_radiance. If/when this gets pulled into the pvlib distribution, I'll switch over to the 'official' version.
I'm assembling code to create pvlib.io.
What functionality is desired? Specifically, from which formats do we want to read, and to which formats do we want to write?
I'm going to focus first on reading a few data formats. I plan to incorporate the current pvlib.tmy which reads tmy2 and tmy3 files, returning a tuple (data, metadata) where data is a pandas DataFrame and metadata is a dict. Any comments on this design? Personally I like it.
I have code to read surfrad files so I'll include that.