BUG: edge case when reading from postgresql with read_sql_query and datetime with tz and chunksize #11216
@jorisvandenbossche pls have a look. This is not a bug per se; it's more that we don't want to actually coerce these ATM (this is a new type and the result might be unexpected), as a user attested.
@jreback thanks for working on this! xref #7364, where we also discussed some of these issues. My main hesitation is that reading with or without chunksize can give different results (which can happen anyway, but still ...). So I was thinking, maybe we should coerce the datetimes to UTC either way, also if they are datetime objects. It maybe also makes sense to return them as tz-aware data (but with UTC timezone), since the column is specified as aware in the database. I haven't yet looked at your updated commit, but will come back to it this evening.
Note that what exactly is returned from postgres depends on the postgres server timezone settings (it stores it internally as UTC, and converts to the timezone of that setting on output).
@jorisvandenbossche but that's exactly the point. I want to always coerce to a naive tz (this is what this fixes). It's irrelevant whether you use chunksize, pass parse_dates, or use query. As I said, I think that we can remove this at some point to pass thru a 'better' tz.
@jreback I was reading through the gitter chat. So in this case, the starting point, if we have a 'timestamp with timezone' column (for postgresql in this case), is the following:
When this is fed into a pandas object, previously this gave an object dtype preserving the above objects. Now, in master after the introduction of the datetime tz, this gives:
I agree that the above is not very useful, so coercing it to UTC is probably a good idea (= what you do in this PR, the only question is do we want naive or aware UTC).
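To make the naive-versus-aware question concrete, here is a minimal stdlib-only sketch. The two `rows` values are hypothetical stand-ins for what a driver might return for a tz-aware column, using `datetime.timezone` fixed offsets rather than the driver's own tzinfo class; they are not taken from this PR.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical rows: the same column, but with two different fixed offsets
# (for example winter vs. summer time on the server).
rows = [
    datetime(2000, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=-8))),
    datetime(2000, 6, 1, 0, 0, tzinfo=timezone(timedelta(hours=-7))),
]

# Option 1: aware UTC. Same instants, all expressed in a single timezone.
aware_utc = [ts.astimezone(timezone.utc) for ts in rows]

# Option 2: naive UTC. Same wall-clock values as option 1, tzinfo dropped.
naive_utc = [ts.replace(tzinfo=None) for ts in aware_utc]

for aware, naive in zip(aware_utc, naive_utc):
    print(aware.isoformat(), "|", naive.isoformat())
```

Both options hold the same UTC wall-clock values; the only difference is whether the result still records that it is UTC.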
And what I meant with chunksize giving a different result: if you chunk the above (like in the tests) in two sets of one row, you get two series that each have a uniform timezone, so they are coerced to datetime64. And when combined together with concat they are cast to naive:
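One way to avoid a chunk-dependent result, sketched here as an illustration and not as what this PR implements, is to coerce every chunk to UTC before combining. The `rows` values are hypothetical, and the sketch assumes `pd.to_datetime(..., utc=True)` is acceptable as the coercion step.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

# Hypothetical rows with two different fixed offsets.
rows = [
    datetime(2000, 1, 1, 0, 0, tzinfo=timezone(timedelta(hours=-8))),
    datetime(2000, 6, 1, 0, 0, tzinfo=timezone(timedelta(hours=-7))),
]

# Unchunked: coerce the whole column to UTC in one go.
whole = pd.to_datetime(pd.Series(rows), utc=True)

# Chunked: one row per chunk, each coerced to UTC before concatenation.
chunks = [
    pd.to_datetime(pd.Series(rows[i:i + 1]), utc=True)
    for i in range(len(rows))
]
combined = pd.concat(chunks, ignore_index=True)

print(whole.dtype, combined.dtype)
```

Because each chunk is already in UTC when it reaches `concat`, the chunked and unchunked paths end up with the same timezone and the same values.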
We can't currently force postgres to actually create a
…tetime with timezone types and a chunksize, pandas-dev#11216
- When we don't specify a chunksize we get an object dtype, which is ok
- We create a proper datetime64[ns, tz] type, but it's a pytz.FixedOffset(....), which ATM is not really a useful/palatable type and is mostly confusing for now. In the future could attempt to coerce this to a nice tz, e.g. US/Eastern, not sure if this is possible
- Note that this is w/o parse_dates specified
BUG: edge case when reading from postgresql with read_sql_query and datetime with tz and chunksize
which ATM is not really a useful/palatable type and is mostly confusing for now.
In the future could attempt to coerce this to a nice tz, e.g. US/Eastern, not sure if
this is possible.