API: DatetimeIndex creation with mixed tz timestamps #11488

Closed
sinhrks opened this issue Oct 31, 2015 · 6 comments
Labels
API Design Timezones Timezone data dtype

Member

sinhrks commented Oct 31, 2015

Related to #11456. Currently, DatetimeIndex handles mixed-tz values as shown below. This behavior sometimes triggers implicit coercion between tz-aware and tz-naive values.

pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
# DatetimeIndex(['2010-12-31 19:00:00-05:00', '2011-01-02 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
# -> should be normal Index with object dtype?

pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
# DatetimeIndex(['2010-12-31 15:00:00', '2011-01-02 05:00:00'], dtype='datetime64[ns]', freq=None)
# -> should be normal Index with object dtype?

pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
# DatetimeIndex(['2011-01-01 09:00:00+09:00', '2011-01-02 14:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
# -> OK, localized to explicitly passed tz ('Asia/Tokyo')
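
The coerced values in the three outputs above can be reproduced with plain datetime arithmetic. The sketch below is only an illustration of that arithmetic, not pandas internals: it uses standard-library datetimes with fixed UTC offsets as stand-ins for `pd.Timestamp` (an assumption; both zones keep these offsets in January 2011), and it reads the tz-naive value as UTC, which is what the outputs are consistent with.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the named zones (assumption: valid for January 2011).
TOKYO = timezone(timedelta(hours=9), "Asia/Tokyo")
EASTERN = timezone(timedelta(hours=-5), "US/Eastern")

naive = datetime(2011, 1, 1)
tokyo_midnight = datetime(2011, 1, 1, tzinfo=TOKYO)
eastern_midnight = datetime(2011, 1, 2, tzinfo=EASTERN)

# Example 1: the naive value is read as UTC, then expressed in US/Eastern.
coerced = naive.replace(tzinfo=timezone.utc).astimezone(EASTERN)
print(coerced.isoformat())  # 2010-12-31T19:00:00-05:00

# Example 2: both aware values are converted to UTC and the timezone is dropped.
as_utc_naive = [
    d.astimezone(timezone.utc).replace(tzinfo=None)
    for d in (tokyo_midnight, eastern_midnight)
]
print([d.isoformat() for d in as_utc_naive])
# ['2010-12-31T15:00:00', '2011-01-02T05:00:00']

# Example 3: everything is expressed in the explicitly requested zone.
in_tokyo = [
    naive.replace(tzinfo=timezone.utc).astimezone(TOKYO),
    eastern_midnight.astimezone(TOKYO),
]
print([d.isoformat() for d in in_tokyo])
# ['2011-01-01T09:00:00+09:00', '2011-01-02T14:00:00+09:00']
```

In the first two cases the wall-clock times the user wrote are silently replaced, which is the coercion this issue is about.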
@sinhrks sinhrks added API Design Timezones Timezone data dtype labels Oct 31, 2015
Contributor

jreback commented Oct 31, 2015

Yeah, I think the first 2 should be Index.

@sinhrks sinhrks added this to the 0.18.0 milestone Oct 31, 2015
Member Author

sinhrks commented Oct 31, 2015

OK, I set the milestone to 0.18 because this causes breaking changes.

How about the following rules? Changes are marked as CHANGED.

Creation with Index

User wants Index, but doesn't specify its type. Thus, results may not be DatetimeIndex.

  • When all inputs have the same timezone or no timezone, the result will be a DatetimeIndex (no change)

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')])
    # DatetimeIndex(['2011-01-01', '2011-01-02'], dtype='datetime64[ns]', freq=None)
    
    pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='Asia/Tokyo')])
    # DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
    
  • When inputs have different timezones (so they cannot be represented by a DatetimeIndex without tz conversion), the result will be an Index (dtype=object) (CHANGED)

    Before:

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # DatetimeIndex(['2010-12-31 19:00:00-05:00', '2011-01-02 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
    
    pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # DatetimeIndex(['2010-12-31 15:00:00', '2011-01-02 05:00:00'], dtype='datetime64[ns]', freq=None)
    

    After:

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # Index([2011-01-01 00:00:00, 2011-01-02 00:00:00-05:00], dtype='object')
    
    pd.Index([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # Index([2011-01-01 00:00:00+09:00, 2011-01-02 00:00:00-05:00], dtype='object')
    
  • When the user passes the tz keyword to Index, the user wants to use that timezone. Convert/localize to the passed tz, and the result will be a DatetimeIndex (because it can be represented by a DTI) (no change)

    pd.Index([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
    # DatetimeIndex(['2011-01-01 09:00:00+09:00', '2011-01-02 14:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
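
The three Index rules above can be condensed into a small decision function. This is a minimal sketch under stated assumptions, not the pandas implementation: `proposed_index_kind` is a hypothetical helper, and standard-library datetimes with fixed offsets stand in for `pd.Timestamp`.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the zones used in the examples (assumption).
TOKYO = timezone(timedelta(hours=9), "Asia/Tokyo")
EASTERN = timezone(timedelta(hours=-5), "US/Eastern")


def proposed_index_kind(values, tz=None):
    """Hypothetical helper: name the index type the proposed rules would pick."""
    if tz is not None:
        # An explicit tz means everything is converted/localized to it.
        return "DatetimeIndex"
    # Distinct tzinfo values; None stands for tz-naive input.
    # Note: fixed-offset timezone objects compare by offset only.
    zones = {v.tzinfo for v in values}
    # One shared timezone (or all naive) fits a DatetimeIndex without conversion.
    return "DatetimeIndex" if len(zones) <= 1 else "Index[object]"


naive = datetime(2011, 1, 1)
print(proposed_index_kind([naive, datetime(2011, 1, 2)]))
# DatetimeIndex
print(proposed_index_kind([naive, datetime(2011, 1, 2, tzinfo=EASTERN)]))
# Index[object]
print(proposed_index_kind([naive, datetime(2011, 1, 2, tzinfo=EASTERN)], tz=TOKYO))
# DatetimeIndex
```

The point of the rule is that no value is converted unless the caller asked for a timezone explicitly.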
    

Creation with DatetimeIndex

User wants DatetimeIndex. Raise if there is a timezone mismatch.

  • When all inputs have the same timezone or no timezone, the result will be a DatetimeIndex (no change)

    pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02')])
    # DatetimeIndex(['2011-01-01', '2011-01-02'], dtype='datetime64[ns]', freq=None)
    
    pd.DatetimeIndex([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='Asia/Tokyo')])
    # DatetimeIndex(['2011-01-01 00:00:00+09:00', '2011-01-02 00:00:00+09:00'], dtype='datetime64[ns, Asia/Tokyo]', freq=None)
    
  • When inputs have different timezones, localize tz-naive values but do not convert tz-aware ones (no change)

    pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # DatetimeIndex(['2010-12-31 19:00:00-05:00', '2011-01-02 00:00:00-05:00'], dtype='datetime64[ns, US/Eastern]', freq=None)
    
    pd.DatetimeIndex([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')])
    # ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
    
  • When the user passes the tz keyword to DatetimeIndex, the user wants to use that timezone. Do not convert tz-aware values implicitly (no change)

    pd.DatetimeIndex([pd.Timestamp('2011-01-01'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
    # TypeError: Already tz-aware, use tz_convert to convert.
    
    pd.DatetimeIndex([pd.Timestamp('2011-01-01', tz='Asia/Tokyo'), pd.Timestamp('2011-01-02', tz='US/Eastern')], tz='Asia/Tokyo')
    # ValueError: Tz-aware datetime.datetime cannot be converted to datetime64 unless utc=True
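
Taken literally, "raise if there is a timezone mismatch" could be sketched as the check below. This is a hypothetical illustration, not pandas code: `require_single_timezone` is an invented name, standard-library datetimes stand in for `pd.Timestamp`, and the check is stricter than the bullets above, which keep the existing behavior of localizing tz-naive values.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for the zones used in the examples (assumption).
TOKYO = timezone(timedelta(hours=9), "Asia/Tokyo")
EASTERN = timezone(timedelta(hours=-5), "US/Eastern")


def require_single_timezone(values):
    """Hypothetical strict check: return the shared tzinfo (None if all naive)."""
    zones = {v.tzinfo for v in values}
    if len(zones) > 1:
        names = sorted(str(z) for z in zones)
        raise ValueError(f"mixed timezones: {names}")
    # An empty input has no timezone at all, so fall back to None.
    return next(iter(zones), None)


print(require_single_timezone([datetime(2011, 1, 1, tzinfo=TOKYO),
                               datetime(2011, 1, 2, tzinfo=TOKYO)]))
# Asia/Tokyo

try:
    require_single_timezone([datetime(2011, 1, 1, tzinfo=TOKYO),
                             datetime(2011, 1, 2, tzinfo=EASTERN)])
except ValueError as exc:
    print(exc)
# mixed timezones: ['Asia/Tokyo', 'US/Eastern']
```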
    

Contributor

jreback commented Oct 31, 2015

0.18.0 is fine, though this is much more of a bug fix than an API change. I think it is simply wrong that we are forcing conversions now (in your CHANGED section).

Member Author

sinhrks commented Oct 31, 2015

Yes, but I assume quite a few methods depend on the current behavior in the CHANGED cases to output a DTI.

I'm not sure yet how many ops rely on it... Let me try implementing it first, and reconsider the milestone if it only affects a narrow range.

Contributor

jreback commented Oct 31, 2015

sounds good!

Contributor

jreback commented Dec 10, 2015

closed by #11696

@jreback jreback closed this as completed Dec 10, 2015