
PeriodIndex test keys that aren't strings #10801

Closed
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.17.0.txt
@@ -629,6 +629,7 @@ Bug Fixes
 - Bug that caused segfault when resampling an empty Series (:issue:`10228`)
 - Bug in ``DatetimeIndex`` and ``PeriodIndex.value_counts`` resets name from its result, but retains in result's ``Index``. (:issue:`10150`)
 - Bug in ``pd.eval`` using ``numexpr`` engine coerces 1 element numpy array to scalar (:issue:`10546`)
+- Bug in ``PeriodIndex.__contains__`` & ``DatetimeIndex.__contains__`` that always returned False for each other's objects (:issue:`10798`)
 - Bug in ``pd.concat`` with ``axis=0`` when column is of dtype ``category`` (:issue:`10177`)
 - Bug in ``read_msgpack`` where input type is not always checked (:issue:`10369`, :issue:`10630`)
 - Bug in ``pd.read_csv`` with kwargs ``index_col=False``, ``index_col=['a', 'b']`` or ``dtype``
28 changes: 26 additions & 2 deletions pandas/tests/test_index.py
@@ -2817,7 +2817,31 @@ def test_view(self):
         result = self._holder(i)
         tm.assert_index_equal(result, i)

-class TestDatetimeIndex(DatetimeLike, tm.TestCase):
+class DatetimeAbsoluteLike(DatetimeLike):
+
+    # GH10801
+    def test_datetimeabsolute_contains(self):
+
+        i = self.create_index()
+
+        self.assertTrue(i[2] in i)
+        self.assertFalse('2012' in i)
+
+        # python datetime objects
+        self.assertTrue(datetime(2013,1,1) in i)
+
+        # strings
+        self.assertTrue('2013-1-1' in i)
+
+        # Timestamp # GH10801
+        self.assertTrue(pd.Timestamp('2013-1-1') in i)
+
+        # pandas Period
+        self.assertTrue(pd.Period('2013-1-1', 'D') in i)
+        self.assertFalse(pd.Period('2013-1-1', 'M') in i)
+
+
+class TestDatetimeIndex(DatetimeAbsoluteLike, tm.TestCase):
     _holder = DatetimeIndex
     _multiprocess_can_split_ = True

@@ -2964,7 +2988,7 @@ def test_nat(self):
         self.assertIs(DatetimeIndex([np.nan])[0], pd.NaT)


-class TestPeriodIndex(DatetimeLike, tm.TestCase):
+class TestPeriodIndex(DatetimeAbsoluteLike, tm.TestCase):
     _holder = PeriodIndex
     _multiprocess_can_split_ = True
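
The test change above moves one shared `__contains__` check into a mixin that both concrete test classes inherit. A minimal stdlib-only sketch of that pattern follows. The names `ContainsChecks`, `TestListIndex`, `TestTupleIndex` and `run_shared_checks` are hypothetical, and plain lists and tuples of dates stand in for the pandas index types.

```python
import io
import unittest
from datetime import date, timedelta


class ContainsChecks(object):
    """Shared membership checks; not a TestCase itself, so it is never collected alone."""

    def create_index(self):
        raise NotImplementedError

    def test_contains(self):
        idx = self.create_index()
        # An element taken from the container must be found in it.
        self.assertTrue(idx[2] in idx)
        # A date outside the range must not be found.
        self.assertFalse(date(2012, 1, 1) in idx)


def _five_days():
    return [date(2013, 1, 1) + timedelta(days=n) for n in range(5)]


class TestListIndex(ContainsChecks, unittest.TestCase):
    def create_index(self):
        return _five_days()


class TestTupleIndex(ContainsChecks, unittest.TestCase):
    def create_index(self):
        return tuple(_five_days())


def run_shared_checks():
    """Run both concrete classes and report (tests run, all passed)."""
    loader = unittest.TestLoader()
    suite = unittest.TestSuite()
    for case in (TestListIndex, TestTupleIndex):
        suite.addTests(loader.loadTestsFromTestCase(case))
    result = unittest.TextTestRunner(stream=io.StringIO(), verbosity=0).run(suite)
    return result.testsRun, result.wasSuccessful()
```

Because the mixin does not derive from `unittest.TestCase`, the shared test runs once per concrete class rather than once on its own.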

6 changes: 6 additions & 0 deletions pandas/tseries/index.py
@@ -1289,6 +1289,12 @@ def get_loc(self, key, method=None):
                                           'when key is a time object')
             return self.indexer_at_time(key)

+        # check if it's a Period and the frequencies are the same - otherwise a monthly period would match for
+        # a daily timestamp at the beginning of the month. NB: 'B' and 'D' therefore won't match
+        if isinstance(key, com.ABCPeriod) and key.freq == self.freq:
Contributor

I think this is OK if the DatetimeIndex has a freq, but what happens if it's None (or a multiple)?

In [5]: '2013-03-31' in date_range('2013-01',periods=10,freq='M')[2:4:2]
Out[5]: True

In [10]: '2013-03-31' in date_range('2013-01',periods=10,freq='M')[2:4:2].union([Timestamp('2013-01-02')])
Out[10]: True

Contributor Author

@jreback: the additional check only applies when the key is a Period, so the examples above that return True aren't a change in behavior.

As for the desired behavior there, I don't have a confident view, but I think those probably should return True. That's what a DatetimeIndex is: just a series of stamps. I wouldn't have thought the __contains__ method should know anything about the frequency of the data; it's just asking whether the key is a member of the sequence. For anything else, people should be using PeriodIndex.

Where I could see this being an issue is when you have a DatetimeIndex with dates at the beginning of each month and then ask whether a Period is in it. The .to_timestamp() method will convert to the end of the month and __contains__ will return False. I think that's an unlikely case given how difficult it seems to be to create such an index, but I'm escalating it to you regardless.

Am I missing something in the prior example?
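
The month-start edge case described above can be made concrete with plain `datetime.date` values. This is only a sketch: `month_to_stamp` is a hypothetical helper, not a pandas function, and the example makes no claim about which end of the period pandas' own conversion picks.

```python
import calendar
from datetime import date


def month_to_stamp(year, month, how):
    """Hypothetical helper: collapse a calendar month to a single date."""
    if how == 'start':
        return date(year, month, 1)
    if how == 'end':
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, last_day)
    raise ValueError("how must be 'start' or 'end', got %r" % (how,))


# An index holding the first day of January, February and March 2013.
month_starts = [date(2013, m, 1) for m in range(1, 4)]

# Converting February to its first day finds a member of the index ...
hit = month_to_stamp(2013, 2, 'start') in month_starts
# ... while converting the same month to its last day does not.
miss = month_to_stamp(2013, 2, 'end') in month_starts
```

The same month is or is not "in" the index depending only on the conversion convention, which is the ambiguity the comment is pointing at.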

Contributor

That's not what I meant. I think these should work if the key is a Period. My question is: isn't the freq irrelevant if either the start or the end date of the Period is in the DatetimeIndex? Please give some examples where the freq is None but the Period can still match.
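
One possible reading of the question above is a rule that ignores the index's freq entirely and matches a Period whenever either of its endpoints is among the stamps. The sketch below illustrates that reading only. `ToySpan` and `matches_either_endpoint` are hypothetical names, and nothing here is taken from the PR's code.

```python
from collections import namedtuple
from datetime import date

# Hypothetical stand-in for a Period: just its first and last day.
ToySpan = namedtuple('ToySpan', ['start', 'end'])


def matches_either_endpoint(stamps, span):
    """Return True if the span's start or end date is one of the stamps."""
    stamps = set(stamps)
    return span.start in stamps or span.end in stamps


# Irregular stamps with no frequency, as in the union example earlier in the thread.
stamps = [date(2013, 3, 31), date(2013, 1, 2)]

march = ToySpan(date(2013, 3, 1), date(2013, 3, 31))     # monthly span
one_day = ToySpan(date(2013, 1, 2), date(2013, 1, 2))    # daily span
february = ToySpan(date(2013, 2, 1), date(2013, 2, 28))  # no endpoint present
```

Under this rule both `march` and `one_day` match the same freq-less index, which shows how spans of different lengths become indistinguishable once the freq is ignored.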

Contributor Author

@jreback Do you think any of these should be True? I had thought they should all be False: unless a DatetimeIndex has an explicit periodicity / freq, it can't match a Period.
I'm pushing that in the last case, but it seems that once you start guessing there you enter a murky gray area, and it's better for the user to specify the freq explicitly.

In [7]: pd.date_range('2013-01',periods=10,freq='M')[2:4:2]
Out[7]: DatetimeIndex(['2013-03-31'], dtype='datetime64[ns]', freq='2M', tz=None)

In [8]: pd.Period('2013-03-31','D') in _
Out[8]: False

In [9]: pd.date_range('2013-01', periods=10, freq='D')[::2]
Out[9]: 
DatetimeIndex(['2013-01-01', '2013-01-03', '2013-01-05', '2013-01-07',
               '2013-01-09'],
              dtype='datetime64[ns]', freq='2D', tz=None)

In [10]: pd.Period('2013-01-01','D') in _
Out[10]: False

In [11]: pd.date_range('2013-01',periods=10,freq='M')
Out[11]: 
DatetimeIndex(['2013-01-31', '2013-02-28', '2013-03-31', '2013-04-30',
               '2013-05-31', '2013-06-30', '2013-07-31', '2013-08-31',
               '2013-09-30', '2013-10-31'],
              dtype='datetime64[ns]', freq='M', tz=None)

In [12]: pd.Period('2013-10-31','D') in _
Out[12]: False

In [17]: idx=pd.date_range('2013-01',periods=10)

In [18]: idx.freq=None

In [19]: idx
Out[19]: 
DatetimeIndex(['2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04',
               '2013-01-05', '2013-01-06', '2013-01-07', '2013-01-08',
               '2013-01-09', '2013-01-10'],
              dtype='datetime64[ns]', freq=None, tz=None)

In [20]: pd.Period('2013-01-01', 'B') in idx
Out[20]: False

In [21]: pd.Period('2013-01-01', 'D') in idx
Out[21]: False

+            key = key.to_timestamp()
+            return Index.get_loc(self, key, method=method)
+
         try:
             return Index.get_loc(self, key, method=method)
         except (KeyError, ValueError, TypeError):
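
The added branch in `get_loc` converts a Period to a timestamp only when its freq equals the index's freq; otherwise the key falls through to the ordinary lookup. The toy model below mirrors that control flow with stdlib types. `ToyPeriod` and `ToyDatetimeIndex` are hypothetical stand-ins, and the real pandas classes do considerably more.

```python
from datetime import datetime


class ToyPeriod(object):
    """Hypothetical stand-in for a Period: a start stamp plus a freq code."""

    def __init__(self, start, freq):
        self.start = start
        self.freq = freq

    def to_timestamp(self):
        return self.start


class ToyDatetimeIndex(object):
    """Hypothetical stand-in for a DatetimeIndex: stamps plus optional freq."""

    def __init__(self, stamps, freq=None):
        self.stamps = list(stamps)
        self.freq = freq

    def get_loc(self, key):
        # Convert a period only when its freq agrees with the index's freq.
        if isinstance(key, ToyPeriod) and key.freq == self.freq:
            key = key.to_timestamp()
        try:
            # A ToyPeriod that was not converted never equals a datetime,
            # so it ends up as a KeyError here.
            return self.stamps.index(key)
        except ValueError:
            raise KeyError(key)

    def __contains__(self, key):
        try:
            self.get_loc(key)
            return True
        except KeyError:
            return False


daily = ToyDatetimeIndex([datetime(2013, 1, d) for d in range(1, 6)], freq='D')
no_freq = ToyDatetimeIndex(daily.stamps, freq=None)
```

With this rule, a daily `ToyPeriod` is found in `daily` but not in `no_freq`, which lines up with the `freq=None` examples in the review thread returning False.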
14 changes: 7 additions & 7 deletions pandas/tseries/period.py
@@ -296,14 +296,14 @@ def _na_value(self):
         return self._box_func(tslib.iNaT)

     def __contains__(self, key):
+        # if key isn't a Period of the same freq, rely on `get_loc` for the coercion.
         if not isinstance(key, Period) or key.freq != self.freq:
-            if isinstance(key, compat.string_types):
-                try:
-                    self.get_loc(key)
-                    return True
-                except Exception:
-                    return False
-            return False
+            try:
+                self.get_loc(key)
+                return True
+            except Exception:
+                return False
+        # If it is a Period of the same freq, go straight to the _engine
         return key.ordinal in self._engine

     @property
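
The rewritten `__contains__` has two paths: a fast membership test on the ordinal for a same-freq Period, and a fallback that lets `get_loc` coerce any other key and treats every failure as "not contained". A stdlib-only sketch of that shape follows. `ToyPeriod` and `ToyDailyPeriodIndex` are hypothetical, a `set` of day ordinals stands in for the engine, and ISO date strings are an assumed input format (`date.fromisoformat` needs Python 3.7+).

```python
from datetime import date, datetime


class ToyPeriod(object):
    """Hypothetical stand-in for a Period: an integer ordinal plus a freq code."""

    def __init__(self, ordinal, freq):
        self.ordinal = ordinal
        self.freq = freq


class ToyDailyPeriodIndex(object):
    """Hypothetical daily index; a set of day ordinals plays the role of the engine."""

    freq = 'D'

    def __init__(self, days):
        self._ordinals = [d.toordinal() for d in days]
        self._engine = set(self._ordinals)

    def get_loc(self, key):
        # Coerce strings and datetimes to a date; anything else is rejected.
        if isinstance(key, str):
            key = date.fromisoformat(key)  # raises ValueError on bad input
        if isinstance(key, datetime):
            key = key.date()
        if not isinstance(key, date):
            raise KeyError(key)
        try:
            return self._ordinals.index(key.toordinal())
        except ValueError:
            raise KeyError(key)

    def __contains__(self, key):
        # Not a same-freq period: rely on get_loc for the coercion.
        if not isinstance(key, ToyPeriod) or key.freq != self.freq:
            try:
                self.get_loc(key)
                return True
            except Exception:
                return False
        # Same-freq period: go straight to the engine.
        return key.ordinal in self._engine


idx = ToyDailyPeriodIndex([date(2013, 1, d) for d in range(1, 6)])
```

Dropping the string-only guard is what lets `datetime` keys reach `get_loc` in this sketch. The broad `except Exception` also means malformed keys report False instead of raising.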