-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: DatetimeIndex._data should return an ndarray #20912
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this is a band aid |
Codecov Report
@@ Coverage Diff @@
## master #20912 +/- ##
==========================================
+ Coverage 91.92% 91.92% +<.01%
==========================================
Files 160 160
Lines 49913 49915 +2
==========================================
+ Hits 45882 45884 +2
Misses 4031 4031
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
see comments
0b635d0
to
6b0b72b
Compare
tz, | ||
ambiguous=ambiguous) | ||
index = index.view(_NS_DTYPE) | ||
arr = conversion.tz_localize_to_utc(_ensure_int64(index), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
tz_localize_to_utc
returns an array and not a DatetimeIndex
so I then convert this array to a DatetimeIndex
called index
so I can pass index.values
to _simple_new
below
6b0b72b
to
2735818
Compare
doc/source/whatsnew/v0.23.1.txt
Outdated
@@ -111,3 +111,4 @@ Other | |||
|
|||
- Tab completion on :class:`Index` in IPython no longer outputs deprecation warnings (:issue:`21125`) | |||
- Bug preventing pandas from being importable with -OO optimization (:issue:`21071`) | |||
- ``DatetimeIndex._data`` now returns a numpy array in all cases (:issue:`20810`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need to add this in the whatsnew, since this is not a user facing change (as user should not be aware of or use the _data
attribute)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks.
2735818
to
7feeddb
Compare
46e18aa
to
71694ae
Compare
happy to take a patch with a non-band aid fix (or can close for now) |
Can you be more specific than this? Currently, the
So possible fixes I see:
From those options, the first seems reasonable to me. I think 3) is also fine (although that is less explicit). |
038ca34
to
58c8f5c
Compare
58c8f5c
to
ccc874d
Compare
@jorisvandenbossche thanks for that summary! @jreback do you agree with the above comment or is this still a band-aid? |
@@ -588,7 +588,9 @@ def _generate(cls, start, end, periods, name, freq, | |||
index = index[1:] | |||
if not right_closed and len(index) and index[-1] == end: | |||
index = index[:-1] | |||
index = cls._simple_new(index, name=name, freq=freq, tz=tz) | |||
|
|||
index = cls._simple_new(index.values, name=name, freq=freq, tz=tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don’t like having 2 different construction paths generally: eg we always need to be an ndarray or already converted to a DTI by the time _simple_new gets called
what i would do is run all of the index tests and see what the current state is
then probably settle on an ndarray input to _simple_new and put an assertion to validate this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which 2 different construction paths do you mean?
With the update above, index
is always an index, and it's always the values that are passed to _simple_new
. So it is ndarray input to _simple_new
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
my point is that we need an assertion to validate this. it may be that everything is fixed, but we should actually test this.
ccc874d
to
1ab2770
Compare
pandas/core/indexes/datetimes.py
Outdated
@@ -609,6 +611,8 @@ def _simple_new(cls, values, name=None, freq=None, tz=None, | |||
dtype=dtype, **kwargs) | |||
values = np.array(values, copy=False) | |||
|
|||
assert isinstance(values, np.ndarray) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was suggested above that we should have an assertion to check whether the input to _simple_new
is actually always an ndarray with the new changes. It turns out that it's still not guaranteed to be an ndarray. In particular, _shallow_copy
sometimes calls _simple_new
with a non-ndarray input. Some of these cases are handled by the code directly above this new assert statement but one case that is not handled is the DatetimeIndex
(i.e., it is not converted to an ndarray). This is why I have put code to convert a DTI in _shallow_copy
to an ndarray, although I realise this may not be the correct way to handle this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add .a message on the assert as well
can also assert is_integer_dtype(values)
pandas/core/indexes/base.py
Outdated
@@ -506,6 +506,9 @@ def _shallow_copy(self, values=None, **kwargs): | |||
attributes.update(kwargs) | |||
if not len(values) and 'dtype' not in kwargs: | |||
attributes['dtype'] = self.dtype | |||
from pandas import DatetimeIndex |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As discussed below, this code converts a DTI to ndarray before calling _simple_new
. All the other cases either seem to be an ndarray already or are converted to ndarray in the _simple_new
function. I expect that there is probably a better way of handling this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can just do
with a comment
values = getattr(values, 'values', values)
1ab2770
to
22fa07a
Compare
Hello @reidy-p! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on July 09, 2018 at 16:01 Hours UTC |
41fd5ee
to
deca8a4
Compare
pandas/core/indexes/datetimes.py
Outdated
@@ -607,6 +611,9 @@ def _simple_new(cls, values, name=None, freq=None, tz=None, | |||
dtype=dtype, **kwargs) | |||
values = np.array(values, copy=False) | |||
|
|||
# values should be a numpy array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use the format, not commment is needed
assert ...., "values are not an np.ndarray"
assert the integer dtype as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the intention of an assert is_integer_dtype(values)
? values
is often an ndarray of datetime64[ns] at this stage which means this assert fails very often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh right, sorry the assert should be assert is_datetime64_dtype, it always should be this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
its just better that we have certain guarantees in theses low level constructors
pandas/core/indexes/datetimes.py
Outdated
tz, | ||
ambiguous=ambiguous) | ||
|
||
arr = arr.view(_NS_DTYPE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think can remove this arr.view(...)
9dd0150
to
3d4bc3c
Compare
pandas/core/indexes/datetimes.py
Outdated
@@ -2087,6 +2094,8 @@ def _generate_regular_range(start, end, periods, freq): | |||
"if a 'period' is given.") | |||
|
|||
data = np.arange(b, e, stride, dtype=np.int64) | |||
|
|||
# _simple_new is getting an array of int64 here | |||
data = DatetimeIndex._simple_new(data, None, tz=tz) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is a new assert statement in _simple_new
to check whether the input is an array of datetime64[ns]. But in this case data
is an array of int64 so the assert statement fails. Is there a convenient way to rewrite this part to make data
an array of datetime64[ns] before calling _simple_new
so the assert works?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
oh yes, .view(_NS_DTYPE)
@reidy-p if you can rebase. datetimes have been changing a bit as getting ready for DatetimeArray cc @jbrockmendel |
Yeah sorry I just saw the new changes. I'll rebase. |
47d71ef
to
98b3e80
Compare
@@ -608,12 +610,14 @@ def _simple_new(cls, values, name=None, freq=None, tz=None, | |||
dtype=dtype, **kwargs) | |||
values = np.array(values, copy=False) | |||
|
|||
if is_object_dtype(values): | |||
return cls(values, name=name, freq=freq, tz=tz, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was this just never hit?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I think it's never hit
68e1d46
to
ea39791
Compare
ea39791
to
e18d996
Compare
@reidy-p lgtm. ping on green. |
@jreback thanks! This is green now |
thanks @reidy-p |
values = _ensure_int64(values).view(_NS_DTYPE) | ||
|
||
values = getattr(values, 'values', values) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this one still needed? I thought the idea was now to ensure ndarrays are passed to _simple_new
and not DatetimeIndexes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well-spotted! I inserted this when I was trying to investigate why some tests were failing and I meant to move it before pushing the commit but forgot. I think we can move this line to just before the call to _simple_new
in this file:
pandas/pandas/core/arrays/datetimelike.py
Lines 43 to 53 in 1dd05cc
def _shallow_copy(self, values=None, **kwargs): | |
if values is None: | |
# Note: slightly different from Index implementation which defaults | |
# to self.values | |
values = self._ndarray_values | |
attributes = self._get_attributes_dict() | |
attributes.update(kwargs) | |
if not len(values) and 'dtype' not in kwargs: | |
attributes['dtype'] = self.dtype | |
return self._simple_new(values, **attributes) |
Does this make sense? I did the same thing here:
pandas/pandas/core/indexes/base.py
Lines 501 to 513 in 1dd05cc
@Appender(_index_shared_docs['_shallow_copy']) | |
def _shallow_copy(self, values=None, **kwargs): | |
if values is None: | |
values = self.values | |
attributes = self._get_attributes_dict() | |
attributes.update(kwargs) | |
if not len(values) and 'dtype' not in kwargs: | |
attributes['dtype'] = self.dtype | |
# _simple_new expects an ndarray | |
values = getattr(values, 'values', values) | |
return self._simple_new(values, **attributes) |
git diff upstream/master -u -- "*.py" | flake8 --diff
The change I made seems to fix the case in the original issue without breaking any tests.
On my branch:
But is the solution too simple or is something more sophisticated required?
And do we need tests for this issue?