
Commit 22b1f4e

API: warning to raise KeyError in the future if not all elements of a list are selected via .loc
closes #15747
1 parent 6993c1b commit 22b1f4e

14 files changed (+316, -56 lines)

doc/source/indexing.rst

+88
@@ -667,6 +667,94 @@ For getting *multiple* indexers, using ``.get_indexer``
    dfd.iloc[[0, 2], dfd.columns.get_indexer(['A', 'B'])]


+.. _indexing.deprecate_loc_reindex_listlike:
+
+Using loc with missing keys in a list is Deprecated
+---------------------------------------------------
+
+.. warning::
+
+   Starting in 0.21.0, using ``.loc`` with a list-like containing missing keys is deprecated in favor of ``.reindex``.
+
+In prior versions, using ``.loc[list-of-keys]`` would work as long as *at least 1* of the keys was found (otherwise it
+would raise a ``KeyError``). This behavior is deprecated and will show a warning message pointing to this section. The
+recommended alternative is to use ``.reindex()``.
+
+For example:
+
+.. ipython:: python
+
+   s = Series([1, 2, 3])
+   s
+
+Selection with all keys found is unchanged.
+
+.. ipython:: python
+
+   s.loc[[1, 2]]
+
+Previous Behavior
+
+.. code-block:: ipython
+
+
+   In [4]: s.loc[[1, 2, 3]]
+   Out[4]:
+   1    2.0
+   2    3.0
+   3    NaN
+   dtype: float64
+
+
+Current Behavior
+
+   In [4]: s.loc[[1, 2, 3]]
+   Passing list-likes to .loc with any non-matching elements will raise
+   KeyError in the future, you can use .reindex() as an alternative.
+
+   See the documentation here:
+   http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
+
+   Out[4]:
+   1    2.0
+   2    3.0
+   3    NaN
+   dtype: float64
+
+
+Reindexing
+~~~~~~~~~~
+
+The idiomatic way to select potentially not-found elements is via ``.reindex()``. See also the section on :ref:`reindexing <basics.reindexing>`.
+
+.. ipython:: python
+
+   s.reindex([1, 2, 3])
+
+Alternatively, if you want to select only *valid* keys, the following is idiomatic; it is also more efficient and is guaranteed to preserve the dtype of the selection.
+
+.. ipython:: python
+
+   keys = [1, 2, 3]
+   s.loc[s.index & keys]
+
+A duplicated index will cause ``.reindex()`` to raise:
+
+.. ipython:: python
+
+   s = pd.Series(np.arange(4), index=['a', 'a', 'b', 'c'])
+
+.. code-block:: ipython
+
+   In [17]: s.reindex(['c', 'd'])
+   ValueError: cannot reindex from a duplicate axis
+
+The idiomatic expression, however, allows this operation to proceed:
+
+.. ipython:: python
+
+   s.loc[s.index & ['c', 'd']]
+
 .. _indexing.basics.partial_setting:

 Selecting Random Samples
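The two idioms recommended in this section can be tried outside the docs build with the short sketch below. It assumes a pandas installation and uses `s.index.intersection(keys)` where the docs write `s.index & keys`: in the 0.21 era both performed the same set intersection, but later pandas releases stopped treating `&` on an `Index` as a set operation, so the method form is the safer spelling.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])  # default integer labels 0, 1, 2
keys = [1, 2, 3]          # label 3 is not in the index

# .reindex() returns one row per requested label and fills the missing
# label with NaN, which upcasts the integer values to float64
reindexed = s.reindex(keys)

# Restricting the keys to labels that exist keeps .loc on the
# all-keys-found path and preserves the integer dtype
valid = s.loc[s.index.intersection(keys)]
```

Here `reindexed` holds `2.0, 3.0, NaN` as float64, while `valid` holds only the labels 1 and 2 with their values `2, 3` and the original integer dtype.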

doc/source/whatsnew/v0.21.0.txt

+56
@@ -158,6 +158,62 @@ We have updated our minimum supported versions of dependencies (:issue:`15206`,
 | Bottleneck   | 1.0.0           |          |
 +--------------+-----------------+----------+

+.. _whatsnew_0210.api_breaking.loc:
+
+.loc with a list-like containing missing keys is Deprecated
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Previously, selecting with a list-like indexer would succeed as long as at least 1 key was found, returning ``NaN`` for the keys that were not found.
+This is exactly the function of ``.reindex()``. This will now show a ``FutureWarning`` message; in the future this will raise ``KeyError`` (:issue:`15747`).
+See the :ref:`deprecation docs <indexing.deprecate_loc_reindex_listlike>`.
+
+
+.. ipython:: python
+
+   s = Series([1, 2, 3])
+   s
+
+Previous Behavior
+
+.. code-block:: ipython
+
+
+   In [4]: s.loc[[1, 2, 3]]
+   Out[4]:
+   1    2.0
+   2    3.0
+   3    NaN
+   dtype: float64
+
+
+Current Behavior
+
+   In [4]: s.loc[[1, 2, 3]]
+   Passing list-likes to .loc with any non-matching elements will raise
+   KeyError in the future, you can use .reindex() as an alternative.
+
+   See the documentation here:
+   http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike
+
+   Out[4]:
+   1    2.0
+   2    3.0
+   3    NaN
+   dtype: float64
+
+The idiomatic way to select potentially not-found elements is via ``.reindex()``.
+
+.. ipython:: python
+
+   s.reindex([1, 2, 3])
+
+Selection with all keys found is unchanged.
+
+.. ipython:: python
+
+   s.loc[[1, 2]]
+
+
 .. _whatsnew_0210.api_breaking.pandas_eval:

 Improved error handling during item assignment in pd.eval

pandas/core/indexing.py

+28-5
@@ -1417,12 +1417,35 @@ def _has_valid_type(self, key, axis):
             if isinstance(key, tuple) and isinstance(ax, MultiIndex):
                 return True

-            # TODO: don't check the entire key unless necessary
-            if (not is_iterator(key) and len(key) and
-                    np.all(ax.get_indexer_for(key) < 0)):
+            if not is_iterator(key) and len(key):

-                raise KeyError("None of [%s] are in the [%s]" %
-                               (key, self.obj._get_axis_name(axis)))
+                # True indicates missing values
+                missing = ax.get_indexer_for(key) < 0
+
+                if np.any(missing):
+                    if len(key) == 1 or np.all(missing):
+                        raise KeyError("None of [%s] are in the [%s]" %
+                                       (key, self.obj._get_axis_name(axis)))
+
+                    else:
+
+                        # we skip the warning on Categorical/Interval
+                        # as this check is actually done (check for
+                        # non-missing values), but a bit later in the
+                        # code, so we want to avoid warning & then
+                        # just raising
+                        _missing_key_warning = textwrap.dedent("""
+                        Passing list-likes to .loc with any non-matching elements will raise
+                        KeyError in the future, you can use .reindex() as an alternative.
+
+                        See the documentation here:
+                        http://pandas.pydata.org/pandas-docs/stable/indexing.html#deprecate-loc-reindex-listlike""")  # noqa
+
+                        if not (ax.is_categorical() or ax.is_interval()):
+                            warnings.warn(_missing_key_warning,
+                                          FutureWarning, stacklevel=5)
+
+                return True

             return True

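For a non-empty list-like key, the new branch in `_has_valid_type` has three outcomes: no missing labels passes, a single-element or entirely missing key raises `KeyError`, and a partial match passes with a `FutureWarning`. The sketch below restates that logic with the standard library only. `check_listlike_labels` is a hypothetical helper, not a pandas function; it uses a plain `set` membership test in place of `ax.get_indexer_for`, and it leaves out the iterator check and the Categorical/Interval exemption.

```python
import warnings

FUTURE_MSG = (
    "Passing list-likes to .loc with any non-matching elements will raise "
    "KeyError in the future, you can use .reindex() as an alternative."
)


def check_listlike_labels(axis_labels, key, axis_name="index"):
    """Hypothetical restatement of the list-like branch this commit adds
    to _LocIndexer._has_valid_type; not part of pandas."""
    if not len(key):
        return True
    known = set(axis_labels)
    # True marks a requested label that is absent from the axis
    missing = [label not in known for label in key]

    if any(missing):
        if len(key) == 1 or all(missing):
            # nothing usable was requested: raise, as before this commit
            raise KeyError("None of [%s] are in the [%s]" % (key, axis_name))
        # partial match: still allowed, but announce the future KeyError
        warnings.warn(FUTURE_MSG, FutureWarning, stacklevel=2)
    return True
```

Called with axis labels `[0, 1, 2]`, the key `[1, 2]` passes silently, `[3]` and `[3, 4]` raise `KeyError`, and `[2, 3]` passes with a `FutureWarning`.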

pandas/io/formats/excel.py

+9-1
@@ -353,7 +353,15 @@ def __init__(self, df, na_rep='', float_format=None, cols=None,
             self.styler = None
         self.df = df
         if cols is not None:
-            self.df = df.loc[:, cols]
+
+            # all missing, raise
+            if not len(Index(cols) & df.columns):
+                raise KeyError
+
+            # 1 missing is ok
+            # TODO(jreback): this should raise
+            # on *any* missing columns
+            self.df = df.reindex(columns=cols)
         self.columns = self.df.columns
         self.float_format = float_format
         self.index = index
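The new `ExcelFormatter` code keeps the old error when every requested column is missing, while letting a partial selection through via `reindex(columns=...)` instead of the deprecated `.loc` path. A minimal sketch of that policy, assuming pandas is installed; `select_excel_columns` is a hypothetical name, and `df.columns.intersection(...)` stands in for the `Index(cols) & df.columns` expression used in the commit.

```python
import pandas as pd


def select_excel_columns(df, cols):
    """Hypothetical helper mirroring the column selection this commit adds
    to ExcelFormatter.__init__; not part of the pandas API."""
    # every requested column is missing: keep raising KeyError
    if len(df.columns.intersection(pd.Index(cols))) == 0:
        raise KeyError("None of %r are in the columns" % (list(cols),))
    # some columns may be missing: reindex adds them as all-NaN columns
    # without going through the deprecated .loc path
    return df.reindex(columns=cols)


df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
subset = select_excel_columns(df, ["A", "C"])
```

Requesting `["A", "C"]` yields column `A` unchanged plus an all-NaN column `C`, whereas requesting only absent columns such as `["X", "Y"]` raises `KeyError`.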

pandas/tests/indexing/test_categorical.py

+2-1
@@ -111,7 +111,8 @@ def test_loc_listlike(self):
         assert_frame_equal(result, expected, check_index_type=True)

         # not all labels in the categories
-        pytest.raises(KeyError, lambda: self.df2.loc[['a', 'd']])
+        with pytest.raises(KeyError):
+            self.df2.loc[['a', 'd']]

     def test_loc_listlike_dtypes(self):
         # GH 11586

pandas/tests/indexing/test_datetime.py

+2-2
@@ -223,7 +223,7 @@ def test_series_partial_set_datetime(self):
                 Timestamp('2011-01-03')]
         exp = Series([np.nan, 0.2, np.nan],
                      index=pd.DatetimeIndex(keys, name='idx'), name='s')
-        tm.assert_series_equal(ser.loc[keys], exp, check_index_type=True)
+        tm.assert_series_equal(ser.reindex(keys), exp, check_index_type=True)

     def test_series_partial_set_period(self):
         # GH 11497
@@ -248,5 +248,5 @@ def test_series_partial_set_period(self):
                 pd.Period('2011-01-03', freq='D')]
         exp = Series([np.nan, 0.2, np.nan],
                      index=pd.PeriodIndex(keys, name='idx'), name='s')
-        result = ser.loc[keys]
+        result = ser.reindex(keys)
         tm.assert_series_equal(result, exp)

pandas/tests/indexing/test_iloc.py

+2-1
@@ -617,7 +617,8 @@ def test_iloc_non_unique_indexing(self):
         expected = DataFrame(new_list)
         expected = pd.concat([expected, DataFrame(index=idx[idx > sidx.max()])
                               ])
-        result = df2.loc[idx]
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = df2.loc[idx]
         tm.assert_frame_equal(result, expected, check_index_type=False)

     def test_iloc_empty_list_indexer_is_ok(self):
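The updated tests wrap each partial-match lookup in `tm.assert_produces_warning(FutureWarning, check_stacklevel=False)`. For readers unfamiliar with that helper, the sketch below shows the underlying idea with the standard `warnings` module. `expect_warning` and `legacy_lookup` are hypothetical names, and this sketch only checks that at least one warning of the given category was emitted; it makes no other assertions.

```python
import contextlib
import warnings


@contextlib.contextmanager
def expect_warning(category):
    """Hypothetical stand-in for tm.assert_produces_warning: fail unless a
    warning of `category` is emitted inside the block."""
    with warnings.catch_warnings(record=True) as caught:
        # "always" stops the default filters from hiding repeated warnings
        warnings.simplefilter("always")
        yield caught
    if not any(issubclass(w.category, category) for w in caught):
        raise AssertionError("%s not raised" % category.__name__)


def legacy_lookup():
    """Hypothetical stand-in for a partial-match lookup such as df2.loc[idx]."""
    warnings.warn("non-matching elements", FutureWarning)
    return 42


with expect_warning(FutureWarning):
    result = legacy_lookup()
```

The block still returns the lookup's value, so the test can go on to compare `result` against the expected frame, as the pandas tests do.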

pandas/tests/indexing/test_indexing.py

+12-6
@@ -176,7 +176,8 @@ def test_dups_fancy_indexing(self):
                               'test1': [7., 6, np.nan],
                               'other': ['d', 'c', np.nan]}, index=rows)

-        result = df.loc[rows]
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = df.loc[rows]
         tm.assert_frame_equal(result, expected)

         # see GH5553, make sure we use the right indexer
@@ -186,7 +187,8 @@ def test_dups_fancy_indexing(self):
                               'other': [np.nan, np.nan, np.nan,
                                         'd', 'c', np.nan]},
                              index=rows)
-        result = df.loc[rows]
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = df.loc[rows]
         tm.assert_frame_equal(result, expected)

         # inconsistent returns for unique/duplicate indices when values are
@@ -203,20 +205,23 @@ def test_dups_fancy_indexing(self):

         # GH 4619; duplicate indexer with missing label
         df = DataFrame({"A": [0, 1, 2]})
-        result = df.loc[[0, 8, 0]]
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = df.loc[[0, 8, 0]]
         expected = DataFrame({"A": [0, np.nan, 0]}, index=[0, 8, 0])
         tm.assert_frame_equal(result, expected, check_index_type=False)

         df = DataFrame({"A": list('abc')})
-        result = df.loc[[0, 8, 0]]
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = df.loc[[0, 8, 0]]
         expected = DataFrame({"A": ['a', np.nan, 'a']}, index=[0, 8, 0])
         tm.assert_frame_equal(result, expected, check_index_type=False)

         # non unique with non unique selector
         df = DataFrame({'test': [5, 7, 9, 11]}, index=['A', 'A', 'B', 'C'])
         expected = DataFrame(
             {'test': [5, 7, 5, 7, np.nan]}, index=['A', 'A', 'A', 'A', 'E'])
-        result = df.loc[['A', 'A', 'E']]
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = df.loc[['A', 'A', 'E']]
         tm.assert_frame_equal(result, expected)

         # GH 5835
@@ -227,7 +232,8 @@ def test_dups_fancy_indexing(self):
         expected = pd.concat(
             [df.loc[:, ['A', 'B']], DataFrame(np.nan, columns=['C'],
                                               index=df.index)], axis=1)
-        result = df.loc[:, ['A', 'B', 'C']]
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            result = df.loc[:, ['A', 'B', 'C']]
         tm.assert_frame_equal(result, expected)

         # GH 6504, multi-axis indexing
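Several of the cases above combine a duplicated index with a missing label, which is exactly where `.reindex()` is not a drop-in replacement: it refuses to operate on an axis with duplicate labels. The sketch below, assuming pandas is installed, shows that failure and the workaround of restricting the keys to labels that exist before calling `.loc`. It uses `Index.intersection` rather than `&`, which newer pandas releases no longer treat as a set operation on an `Index`.

```python
import pandas as pd

# An index with a duplicated label, as in the non-unique cases above
s = pd.Series(range(4), index=["a", "a", "b", "c"])

# .reindex() refuses to work on an axis with duplicate labels
try:
    s.reindex(["c", "d"])
    reindex_raised = False
except ValueError:
    reindex_raised = True

# Keeping only the labels that exist lets .loc proceed; .loc returns
# every row that carries a matched label, duplicates included
only_c = s.loc[s.index.intersection(["c", "d"])]
all_a = s.loc[s.index.intersection(["a", "d"])]
```

The exact `ValueError` message differs between pandas versions, so the sketch only relies on the exception type.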

pandas/tests/indexing/test_loc.py

+25-5
@@ -152,12 +152,15 @@ def test_loc_getitem_label_list(self):
                           [Timestamp('20130102'), Timestamp('20130103')],
                           typs=['ts'], axes=0)

+    def test_loc_getitem_label_list_with_missing(self):
         self.check_result('list lbl', 'loc', [0, 1, 2], 'indexer', [0, 1, 2],
                           typs=['empty'], fails=KeyError)
-        self.check_result('list lbl', 'loc', [0, 2, 3], 'ix', [0, 2, 3],
-                          typs=['ints', 'uints'], axes=0, fails=KeyError)
-        self.check_result('list lbl', 'loc', [3, 6, 7], 'ix', [3, 6, 7],
-                          typs=['ints', 'uints'], axes=1, fails=KeyError)
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            self.check_result('list lbl', 'loc', [0, 2, 3], 'ix', [0, 2, 3],
+                              typs=['ints', 'uints'], axes=0, fails=KeyError)
+        with tm.assert_produces_warning(FutureWarning, check_stacklevel=False):
+            self.check_result('list lbl', 'loc', [3, 6, 7], 'ix', [3, 6, 7],
+                              typs=['ints', 'uints'], axes=1, fails=KeyError)
         self.check_result('list lbl', 'loc', [4, 8, 10], 'ix', [4, 8, 10],
                           typs=['ints', 'uints'], axes=2, fails=KeyError)

@@ -249,7 +252,7 @@ def test_loc_to_fail(self):
         pytest.raises(KeyError, lambda: s.loc[['4']])

         s.loc[-1] = 3
-        result = s.loc[[-1, -2]]
+        result = s.reindex([-1, -2])
         expected = Series([3, np.nan], index=[-1, -2])
         tm.assert_series_equal(result, expected)

@@ -277,6 +280,23 @@ def f():

         pytest.raises(KeyError, f)

+    def test_loc_getitem_list_with_fail(self):
+        # 15747
+        # should KeyError if *any* missing labels
+
+        s = Series([1, 2, 3])
+
+        s.loc[[2]]
+
+        with pytest.raises(KeyError):
+            s.loc[[3]]
+
+        # a non-match and a match
+        with tm.assert_produces_warning(FutureWarning):
+            expected = s.loc[[2, 3]]
+        result = s.reindex([2, 3])
+        tm.assert_series_equal(result, expected)
+
     def test_loc_getitem_label_slice(self):

         # label slices (with ints)
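`test_loc_getitem_list_with_fail` pins down the 0.21 behavior: a partial match warns, and `.reindex()` produces the same result. Later pandas releases completed the deprecation, so the same partial match raises `KeyError` there. The version-tolerant sketch below, assuming pandas is installed, records which of the two outcomes the installed version produces and checks the `.reindex()` replacement, which behaves the same in both.

```python
import warnings

import numpy as np
import pandas as pd

s = pd.Series([1, 2, 3])

# Label 2 exists and label 3 does not: a partial match
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    try:
        s.loc[[2, 3]]
        warned = any(issubclass(w.category, FutureWarning) for w in caught)
        outcome = "warned" if warned else "silent"
    except KeyError:
        # releases that completed the deprecation raise instead of warning
        outcome = "raised"

# The recommended replacement: label 2 keeps its value (upcast to float),
# label 3 becomes NaN
result = s.reindex([2, 3])
```

Either `"warned"` or `"raised"` is an expected outcome; `"silent"` would mean the partial match slipped through without any signal.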
