PeriodIndex test keys that aren't strings #10801
@@ -2959,6 +2959,29 @@ def test_nat(self):
        self.assertIs(DatetimeIndex([np.nan])[0], pd.NaT)

    def test_contains(self):

        i = self.create_index()
Instead of copy-pasting :)
You can make a new mixin class like DatetimeLike and just include it in DatetimeIndex/PeriodIndex
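A minimal sketch of the mixin pattern suggested here, using only the standard library. The class names `ContainsMixin`, `TestDateList` and `TestDatetimeList` are hypothetical, and plain lists stand in for the pandas indexes; this is not the code from the PR.

```python
import datetime
import unittest


class ContainsMixin:
    """Shared tests; each subclass supplies create_index() (hypothetical stand-in)."""

    def create_index(self):
        raise NotImplementedError

    def test_contains(self):
        # The same assertion runs against whatever container the subclass builds.
        index = self.create_index()
        self.assertIn(index[0], index)


class TestDateList(ContainsMixin, unittest.TestCase):
    def create_index(self):
        return [datetime.date(2013, 1, d) for d in range(1, 6)]


class TestDatetimeList(ContainsMixin, unittest.TestCase):
    def create_index(self):
        return [datetime.datetime(2013, 1, d) for d in range(1, 6)]


def run_suite():
    # Load only the concrete test cases; the mixin itself is not a TestCase.
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (TestDateList, TestDatetimeList):
        suite.addTests(loader.loadTestsFromTestCase(case))
    return unittest.TextTestRunner(verbosity=0).run(suite)


result = run_suite()
```

Because the mixin does not inherit from `unittest.TestCase`, the shared test is written once and only runs through the concrete classes that mix it in.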
eb98943
to
6019a84
Compare
@jreback ready to go, cheers for the comments
class DatetimeAbsoluteLike(DatetimeLike):

    def test_contains(self):
add the issue number as a comment
separate the lines when you have a comment
6019a84
to
51b0a6d
Compare
So I changed the
>>> pd.Timestamp(period_index[1])
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-24-22f6327b2caf> in <module>()
----> 1 pd.Timestamp(period_index[1])
pandas/tslib.pyx in pandas.tslib.Timestamp.__new__ (pandas/tslib.c:7638)()
pandas/tslib.pyx in pandas.tslib.convert_to_tsobject (pandas/tslib.c:21357)()
ValueError: Cannot convert Period to Timestamp unambiguously. Use to_timestamp
        self.assertTrue(pd.Timestamp('2013-1-1') in i)

        # pandas Period
        self.assertTrue(pd.Period('2013-1-1') in i)
add a test with a different frequency
b13bd9f
to
2fe6a87
Compare
@jreback updated, cheers!
class DatetimeAbsoluteLike(DatetimeLike):

    def test_datetimeabsolute_contains(self):
add the issue number as a comment (the original one)
2fe6a87
to
0c2e9ed
Compare
@jreback updated, cheers!
@@ -1257,6 +1257,12 @@ def get_loc(self, key, method=None):
                                'when key is a time object')
            return self.indexer_at_time(key)

        # check if it's a Period and the frequencies are the same - otherwise a monthly period would match for
        # a daily timestamp at the beginning of the month. NB: 'B' and 'D' therefore won't match
        if isinstance(key, com.ABCPeriod) and key.freq == self.freq:
I think this is ok IF the DatetimeIndex has a freq, but if it's None (or a multiple) then what?
In [5]: '2013-03-31' in date_range('2013-01',periods=10,freq='M')[2:4:2]
Out[5]: True
In [10]: '2013-03-31' in date_range('2013-01',periods=10,freq='M')[2:4:2].union([Timestamp('2013-01-02')])
Out[10]: True
@jreback: the additional check is if it's a Period - the examples returning True above aren't a change.
Regarding the desired behavior there - I don't have a confident view here - but I think those probably should return True. That's what DatetimeIndex is - just a series of stamps. I wouldn't have thought the __contains__ method should have any knowledge about the frequency of the data - it's just asking if it's a member of the sequence. Anything else and people should be using PeriodIndex.
Where I could see this being an issue is where you have a DatetimeIndex that has dates at the beginning of each month, and then you ask if a Period is in that. The .to_timestamp() method will convert to the end of month and __contains__ will return False. I think that's an unlikely case given how difficult it seems to be to create such an index, but escalating to you regardless.
Am I missing something in the prior example?
Not what I meant. I think these should work IF it's a Period. My question is whether the freq is irrelevant if either the start/end date is in the DatetimeIndex? So give some examples with a freq of None, but where the Period can match.
@jreback Do you think any of these should be True? I had thought that they should all be False - that unless there is an explicit periodicity / freq to a DatetimeIndex, it couldn't match a Period.
I'm pushing that in the last case, but it seems that if you start guessing there you enter a murky gray area, and a better way of operating is for the user to specify the freq for explicitness
In [7]: pd.date_range('2013-01',periods=10,freq='M')[2:4:2]
Out[7]: DatetimeIndex(['2013-03-31'], dtype='datetime64[ns]', freq='2M', tz=None)
In [8]: pd.Period('2013-03-31','D') in _
Out[8]: False
In [9]: pd.date_range('2013-01', periods=10, freq='D')[::2]
Out[9]:
DatetimeIndex(['2013-01-01', '2013-01-03', '2013-01-05', '2013-01-07',
'2013-01-09'],
dtype='datetime64[ns]', freq='2D', tz=None)
In [10]: pd.Period('2013-01-01','D') in _
Out[10]: False
In [11]: pd.date_range('2013-01',periods=10,freq='M')
Out[11]:
DatetimeIndex(['2013-01-31', '2013-02-28', '2013-03-31', '2013-04-30',
'2013-05-31', '2013-06-30', '2013-07-31', '2013-08-31',
'2013-09-30', '2013-10-31'],
dtype='datetime64[ns]', freq='M', tz=None)
In [12]: pd.Period('2013-10-31','D') in _
Out[12]: False
In [17]: idx=pd.date_range('2013-01',periods=10)
In [18]: idx.freq=None
In [19]: idx
Out[19]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
'2013-01-09', '2013-01-10'],
dtype='datetime64[ns]', freq=None, tz=None)
In [20]: pd.Period('2013-01-01', 'B') in idx
Out[20]: False
In [21]: pd.Period('2013-01-01', 'D') in idx
Out[21]: False
0c2e9ed
to
d5499a3
Compare
d5499a3
to
0412053
Compare
@MaximilianR In fact I would go further. I would say ALL We cannot compare a Timespan with a Point index; If we do also something like that (e.g. I don't think this worked before so this is not an API change then, right? |
I agree with @jreback. It doesn't make sense for a period to be contained in any timestamp, because periods represent spans of time and timestamps represent instants in time.
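The span-versus-instant distinction can be sketched with the standard library alone. `Span` and its `covers` method are hypothetical stand-ins for a period, not pandas API; the point is only that membership by equality and "does this span cover an instant" are different questions.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Span:
    """Hypothetical stand-in for a period: a half-open interval [start, end)."""
    start: datetime
    end: datetime

    def covers(self, instant: datetime) -> bool:
        return self.start <= instant < self.end


# A list of instants, comparable to month-end stamps.
instants = [datetime(2013, 1, 31), datetime(2013, 2, 28)]
february = Span(datetime(2013, 2, 1), datetime(2013, 3, 1))

# Membership in a list relies on equality, and a span never equals an instant.
span_in_instants = february in instants

# Asking whether the span covers any of the instants is a separate, explicit check.
any_covered = any(february.covers(t) for t in instants)
```

Under these assumptions the membership test is false while the explicit coverage check is true, which is the ambiguity the reviewers want to keep out of `__contains__`.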
OK, @jreback, @shoyer, and you're confident these shouldn't be
In [5]: pd.date_range('2013-01-01', periods=10, freq='D')
Out[5]:
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
'2013-01-09', '2013-01-10'],
dtype='datetime64[ns]', freq='D', tz=None)
In [6]: pd.Period('2013-01-02','D') in _
Out[6]: True
In [9]: pd.date_range('2013-01', periods=10, freq='M')
Out[9]:
DatetimeIndex(['2013-01-31', '2013-02-28', '2013-03-31', '2013-04-30',
'2013-05-31', '2013-06-30', '2013-07-31', '2013-08-31',
'2013-09-30', '2013-10-31'],
dtype='datetime64[ns]', freq='M', tz=None)
In [10]: pd.Period('2013-02','M') in _
Out[10]: True
In which case, if we're considering the
In [2]: pd.period_range('2013-01',periods=10,freq='M')
Out[2]:
PeriodIndex(['2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06',
'2013-07', '2013-08', '2013-09', '2013-10'],
dtype='int64', freq='M')
In [3]: pd.Timestamp('2013-01-03','D') in _
Out[3]: False
@MaximilianR I think all of those 3 examples should be
|
OK, that makes sense @jreback. I think it follows that
In [16]: pd.period_range('2013-01',periods=10,freq='D')
Out[16]:
PeriodIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
'2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
'2013-01-09', '2013-01-10'],
dtype='int64', freq='D')
In [17]: pd.Timestamp('2013-01-03') in _
Out[17]: False
While this should be
In [19]: '2013-01-03' in _
Out[19]: True
How about for Agreed on the additional method,
However a string can be directly coerced to a IIRC the original issue was:
So we are back full-circle. If we don't allow And from your example above.
Say we allowed [4], and [5] to be
|
@jreback I agree overall. But there's one case here that remains clear in the specific; I'm not sure if it's too specific to be generalizable though. A So I would propose that the following is
In [11]: pd.period_range('2001-01-01', freq='D', periods=20)
Out[11]:
PeriodIndex(['2001-01-01', '2001-01-02', '2001-01-03', '2001-01-04',
'2001-01-05', '2001-01-06', '2001-01-07', '2001-01-08',
'2001-01-09', '2001-01-10', '2001-01-11', '2001-01-12',
'2001-01-13', '2001-01-14', '2001-01-15', '2001-01-16',
'2001-01-17', '2001-01-18', '2001-01-19', '2001-01-20'],
dtype='int64', freq='D')
In [12]: datetime.date(2001,1,3) in _
Out[12]: True
In our case, our dates come from our DB as If there were other comparable types, like a What are your thoughts?
@MaximilianR I would be inclined to agree if not for this fact:
In Python, datetime is a subclass of date, not the other way around. This seems highly strange to me...
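The subclass relationship can be checked directly with the standard library; a short sketch (variable names are illustrative only):

```python
import datetime

# datetime.datetime derives from datetime.date, not the reverse.
datetime_is_date = issubclass(datetime.datetime, datetime.date)
date_is_datetime = issubclass(datetime.date, datetime.datetime)

# Consequence: an isinstance check against date also accepts datetime objects.
stamp = datetime.datetime(2001, 1, 3, 12, 30)
accepted_as_date = isinstance(stamp, datetime.date)
```

So any special-casing keyed on `isinstance(key, datetime.date)` would also catch `datetime.datetime` keys unless it excludes them explicitly.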
@shoyer Yes that is odd.
@MaximilianR why don't you just
to the keys.
@jreback That's what we do - and this is one of those things we have that seems like it should be in the library for all... But your guys' call - if you think it doesn't make sense then we should close
@MaximilianR I don't think we can make this change on ONLY @MaximilianR thanks for the discussion.
OK, cheers @jreback
Closes #10798
Not sure if the testing is done in a good way - I just copy & pasted code to test PeriodIndex & DatetimeIndex (it wouldn't work for LikeDatetimeIndex because of TimeDeltaIndex) - very open to feedback on better ways to do this.