-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
Construction of Series from dict containing NaN as key #18496
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -181,7 +181,8 @@ def test_concat_empty_series_dtypes(self): | |
# categorical | ||
assert pd.concat([Series(dtype='category'), | ||
Series(dtype='category')]).dtype == 'category' | ||
assert pd.concat([Series(dtype='category'), | ||
# GH 18515 | ||
assert pd.concat([Series(np.array([]), dtype='category'), | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Why do you change this? (it's only the dtype of the category, it is still a categorical series, so for the concat that does not matter) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, the dtype of the category does matter for concat (and rightly so, since conceptually the fact that a categorical of ints is really a categorical is only an implementation detail when you're going to concat it to a non-categorical anyway). (... or just try that test before and after my change) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How does this PR change the behaviour of empty category series? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Is it clear to you that that test fails with this PR? If yes, please clarify/reformulate what you're asking for. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If this PR changes (fixes) the dtype of categories for There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. So There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. OK, asking is good. This PR makes (incidentally, but it is the correct behaviour) category series initialized with an empty So since you reopened #18515, now I will add "closes #18515" and push again. OK? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yep. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done, ping |
||
Series(dtype='float64')]).dtype == 'float64' | ||
assert pd.concat([Series(dtype='category'), | ||
Series(dtype='object')]).dtype == 'object' | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -4,6 +4,7 @@ | |
import pytest | ||
|
||
from datetime import datetime, timedelta | ||
from collections import OrderedDict | ||
|
||
from numpy import nan | ||
import numpy as np | ||
|
@@ -79,17 +80,42 @@ def test_constructor(self): | |
m = MultiIndex.from_arrays([[1, 2], [3, 4]]) | ||
pytest.raises(NotImplementedError, Series, m) | ||
|
||
def test_constructor_empty(self): | ||
@pytest.mark.parametrize('input_class', [list, dict, OrderedDict]) | ||
def test_constructor_empty(self, input_class): | ||
empty = Series() | ||
empty2 = Series([]) | ||
empty2 = Series(input_class()) | ||
|
||
# the are Index() and RangeIndex() which don't compare type equal | ||
# these are Index() and RangeIndex() which don't compare type equal | ||
# but are just .equals | ||
assert_series_equal(empty, empty2, check_index_type=False) | ||
|
||
empty = Series(index=lrange(10)) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. so I think these tests got eliminated? can you add another test (or add onto the construction via There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This test is wrong (dtype shouldn't be There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The xfailing test would basically be this one with explicit dtype, it's better to change it when fixing #17261 . Adding a fixed test for this instead. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. done, ping |
||
empty2 = Series(np.nan, index=lrange(10)) | ||
assert_series_equal(empty, empty2) | ||
# With explicit dtype: | ||
empty = Series(dtype='float64') | ||
empty2 = Series(input_class(), dtype='float64') | ||
assert_series_equal(empty, empty2, check_index_type=False) | ||
|
||
# GH 18515 : with dtype=category: | ||
empty = Series(dtype='category') | ||
empty2 = Series(input_class(), dtype='category') | ||
assert_series_equal(empty, empty2, check_index_type=False) | ||
|
||
if input_class is not list: | ||
# With index: | ||
empty = Series(index=lrange(10)) | ||
empty2 = Series(input_class(), index=lrange(10)) | ||
assert_series_equal(empty, empty2) | ||
|
||
# With index and dtype float64: | ||
empty = Series(np.nan, index=lrange(10)) | ||
empty2 = Series(input_class(), index=lrange(10), dtype='float64') | ||
assert_series_equal(empty, empty2) | ||
|
||
@pytest.mark.parametrize('input_arg', [np.nan, float('nan')]) | ||
def test_constructor_nan(self, input_arg): | ||
empty = Series(dtype='float64', index=lrange(10)) | ||
empty2 = Series(input_arg, index=lrange(10)) | ||
|
||
assert_series_equal(empty, empty2, check_index_type=False) | ||
|
||
def test_constructor_series(self): | ||
index1 = ['d', 'b', 'a', 'c'] | ||
|
@@ -625,6 +651,21 @@ def test_constructor_dict(self): | |
expected.iloc[1] = 1 | ||
assert_series_equal(result, expected) | ||
|
||
@pytest.mark.parametrize("value", [2, np.nan, None, float('nan')]) | ||
def test_constructor_dict_nan_key(self, value): | ||
# GH 18480 | ||
d = {1: 'a', value: 'b', float('nan'): 'c', 4: 'd'} | ||
result = Series(d).sort_values() | ||
expected = Series(['a', 'b', 'c', 'd'], index=[1, value, np.nan, 4]) | ||
assert_series_equal(result, expected) | ||
|
||
# MultiIndex: | ||
d = {(1, 1): 'a', (2, np.nan): 'b', (3, value): 'c'} | ||
result = Series(d).sort_values() | ||
expected = Series(['a', 'b', 'c'], | ||
index=Index([(1, 1), (2, np.nan), (3, value)])) | ||
assert_series_equal(result, expected) | ||
|
||
def test_constructor_dict_datetime64_index(self): | ||
# GH 9456 | ||
|
||
|
@@ -658,8 +699,6 @@ def test_constructor_tuple_of_tuples(self): | |
s = Series(data) | ||
assert tuple(s) == data | ||
|
||
@pytest.mark.xfail(reason='GH 18480 (Series initialization from dict with ' | ||
'NaN keys') | ||
def test_constructor_dict_of_tuples(self): | ||
data = {(1, 2): 3, | ||
(None, 5): 6} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was the only usage of
_get_values_from_dict
, so this could be cleaned up. There are also multiple implementation (for the different types of indices, not sure if those differences are important and are all catched in the new implementation)There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The different implementations seem unused and broken, see for instance
however, in principle they do something sensible (not necessarily expected), which is to look for
Timestamp
keys in the dict. The "new implementation", that is theSeries
construction, doesn't care about this (and shouldn't, I think).I'm OK with removing all of this if you want (in another PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It can be removed here (it is this PR that changes the implementation)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done, ping