BUG: DatetimeIndex._data should return an ndarray #20912

reidy-p · 2018-05-01T22:23:35Z

closes BUG: inconsistent state of DatetimeIndex._data #20810
tests added / passed
passes git diff upstream/master -u -- "*.py" | flake8 --diff
whatsnew entry

The change I made seems to fix the case in the original issue without breaking any tests.

On my branch:

In [1]: idx1 = pd.DatetimeIndex(start="2012-01-01", periods=3, freq='D') # date_range kind of construction

In [2]: idx1._data
Out[2]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

In [3]: idx2 = pd.DatetimeIndex(idx1)

In [4]: idx2._data
Out[4]: 
array(['2012-01-01T00:00:00.000000000', '2012-01-02T00:00:00.000000000',
       '2012-01-03T00:00:00.000000000'], dtype='datetime64[ns]')

But is the solution too simple or is something more sophisticated required?

And do we need tests for this issue?

jreback · 2018-05-01T22:34:37Z

this is a band aid
it shouldn’t be set in the first place like this

codecov · 2018-05-02T02:11:24Z

Codecov Report

Merging #20912 into master will increase coverage by <.01%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #20912      +/-   ##
==========================================
+ Coverage   91.92%   91.92%   +<.01%     
==========================================
  Files         160      160              
  Lines       49913    49915       +2     
==========================================
+ Hits        45882    45884       +2     
  Misses       4031     4031

Flag	Coverage Δ
#multiple	`90.3% <100%> (ø)`	⬆️
#single	`42.11% <90.9%> (ø)`	⬆️

Impacted Files	Coverage Δ
pandas/io/pytables.py	`92.48% <100%> (ø)`	⬆️
pandas/core/indexes/base.py	`96.58% <100%> (-0.06%)`	⬇️
pandas/core/indexes/datetimes.py	`95.21% <100%> (+0.11%)`	⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update d05e8f2...e18d996.

jreback

see comments

reidy-p · 2018-05-08T21:29:22Z

pandas/core/indexes/datetimes.py

-                                                          tz,
-                                                          ambiguous=ambiguous)
-                    index = index.view(_NS_DTYPE)
+                    arr = conversion.tz_localize_to_utc(_ensure_int64(index),


tz_localize_to_utc returns an array, not a DatetimeIndex, so I convert that array to a DatetimeIndex called index and can then pass index.values to _simple_new below.

jorisvandenbossche · 2018-06-06T13:32:27Z

doc/source/whatsnew/v0.23.1.txt

@@ -111,3 +111,4 @@ Other

 - Tab completion on :class:`Index` in IPython no longer outputs deprecation warnings (:issue:`21125`)
 - Bug preventing pandas from being importable with -OO optimization (:issue:`21071`)
+- ``DatetimeIndex._data`` now returns a numpy array in all cases (:issue:`20810`)


I don't think we need to add this to the whatsnew, since this is not a user-facing change (users should not be aware of or use the _data attribute).

reidy-p · 2018-06-06

Good point, thanks.

jreback · 2018-06-19T01:53:24Z

happy to take a patch with a non-band aid fix (or can close for now)

jorisvandenbossche · 2018-06-19T07:42:14Z

> this is a band aid
> it shouldn’t be set in the first place like this

Can you be more specific than this?
I don't think this fix is necessarily a band aid.

Currently, the index object (which is passed to _simple_new at the end of the _generate method) is generated in several different ways:

- cls._cached_range -> returns DatetimeIndex
- _generate_regular_range -> returns DatetimeIndex
- passed through conversion.tz_localize_to_utc -> returns array
- tools.to_datetime -> returns DatetimeIndex

So the possible fixes I see:

1. make sure that index is a DatetimeIndex in the end in all cases and update the final _simple_new call (this is what @reidy-p did)
2. make sure that each case results in a datetime64 array (this seems like more work, since the conversion has to be done in each place)
3. just before the _simple_new call, check whether index is a DatetimeIndex, and convert it to an ndarray there if needed
4. change _simple_new to convert a DatetimeIndex to an ndarray if passed one

Of those options, the first seems reasonable to me. I think 3) is also fine (although it is less explicit).
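
As a rough illustration of option 3, the sketch below normalizes whatever the generation paths produce to a plain ndarray at a single point, just before a low-level constructor would be called. `IndexLike` and `to_ndarray` are hypothetical names invented for this example, not pandas classes or functions; only NumPy is assumed.

```python
import numpy as np


class IndexLike:
    """Hypothetical stand-in for an index object that wraps an ndarray."""

    def __init__(self, values):
        self.values = np.asarray(values, dtype="datetime64[ns]")


def to_ndarray(obj):
    """Return the underlying ndarray, whichever kind of object was produced."""
    if isinstance(obj, IndexLike):
        # Index-like result: unwrap it explicitly.
        return obj.values
    if isinstance(obj, np.ndarray):
        # Already an array (e.g. the tz-localization path): pass it through.
        return obj
    raise TypeError(f"expected IndexLike or ndarray, got {type(obj).__name__}")


days = np.array(["2012-01-01", "2012-01-02"], dtype="datetime64[ns]")
from_index_path = to_ndarray(IndexLike(days))
from_array_path = to_ndarray(days)
```

Raising on unexpected types, rather than passing them through silently, is a choice made for this sketch so that a new generation path returning something else would be noticed early.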

reidy-p · 2018-06-29T19:52:21Z

@jorisvandenbossche thanks for that summary!

@jreback do you agree with the above comment or is this still a band-aid?

jreback · 2018-07-01T15:32:49Z

pandas/core/indexes/datetimes.py

@@ -588,7 +588,9 @@ def _generate(cls, start, end, periods, name, freq,
            index = index[1:]
        if not right_closed and len(index) and index[-1] == end:
            index = index[:-1]
-        index = cls._simple_new(index, name=name, freq=freq, tz=tz)
+
+        index = cls._simple_new(index.values, name=name, freq=freq, tz=tz)


I don’t like having 2 different construction paths in general: e.g. the input should always be an ndarray, or already converted to a DTI, by the time _simple_new gets called.

What I would do is run all of the index tests and see what the current state is, then probably settle on an ndarray input to _simple_new and add an assertion to validate this.

jorisvandenbossche · 2018-07-02

Which 2 different construction paths do you mean?
With the update above, index is always an index, and it's always the values that are passed to _simple_new. So it is an ndarray input to _simple_new.

jreback · 2018-07-03

My point is that we need an assertion to validate this. It may be that everything is fixed, but we should actually test this.

reidy-p · 2018-07-06T21:22:09Z

pandas/core/indexes/datetimes.py

@@ -609,6 +611,8 @@ def _simple_new(cls, values, name=None, freq=None, tz=None,
                           dtype=dtype, **kwargs)
            values = np.array(values, copy=False)

+        assert isinstance(values, np.ndarray)


It was suggested above that we add an assertion to check whether the input to _simple_new is always an ndarray with the new changes. It turns out that this is still not guaranteed: in particular, _shallow_copy sometimes calls _simple_new with a non-ndarray input. Some of these cases are handled by the code directly above this new assert statement, but a DatetimeIndex is not (i.e., it is not converted to an ndarray). This is why I have added code in _shallow_copy to convert a DTI to an ndarray, although I realise this may not be the correct way to handle it.

jreback · 2018-07-06

can you add a message on the assert as well
can also assert is_integer_dtype(values)

reidy-p · 2018-07-06T21:24:06Z

pandas/core/indexes/base.py

@@ -506,6 +506,9 @@ def _shallow_copy(self, values=None, **kwargs):
        attributes.update(kwargs)
        if not len(values) and 'dtype' not in kwargs:
            attributes['dtype'] = self.dtype
+        from pandas import DatetimeIndex


As discussed in the comment on _simple_new, this code converts a DTI to an ndarray before calling _simple_new. All the other cases seem either to be an ndarray already or to be converted to an ndarray inside _simple_new. I expect there is a better way of handling this.

jreback · 2018-07-06

you can just do

    values = getattr(values, 'values', values)

with a comment
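
A minimal sketch of how this idiom behaves, using a hypothetical `Wrapper` class in place of a DatetimeIndex: an object exposing a `.values` attribute is unwrapped, while a plain ndarray, which has no such attribute, falls through to the default unchanged.

```python
import numpy as np


class Wrapper:
    """Hypothetical stand-in for an index that exposes `.values`."""

    def __init__(self, values):
        self.values = values


arr = np.arange(3, dtype=np.int64)
wrapped = Wrapper(arr)

# An object with a `.values` attribute is unwrapped to that attribute.
unwrapped = getattr(wrapped, "values", wrapped)

# An ndarray has no `.values`, so getattr returns the default: the array itself.
passed_through = getattr(arr, "values", arr)
```

One caveat of duck typing this way is that any object that happens to have a `values` attribute is affected. For a dict, for instance, `values` is a bound method, so the idiom is only safe where the possible input types are known.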

pep8speaks · 2018-07-07T15:50:11Z

Hello @reidy-p! Thanks for updating the PR.

Cheers ! There are no PEP8 issues in this Pull Request. 🍻

Comment last updated on July 09, 2018 at 16:01 Hours UTC

jreback · 2018-07-07T15:55:29Z

pandas/core/indexes/datetimes.py

@@ -607,6 +611,9 @@ def _simple_new(cls, values, name=None, freq=None, tz=None,
                           dtype=dtype, **kwargs)
            values = np.array(values, copy=False)

+        # values should be a numpy array


use this format, no comment is needed:
assert ...., "values are not an np.ndarray"
assert the integer dtype as well

reidy-p · 2018-07-07

What's the intention of an assert is_integer_dtype(values)? values is often an ndarray of datetime64[ns] at this stage, which means this assert fails very often.

jreback · 2018-07-07

oh right, sorry, the assert should be assert is_datetime64_dtype; it should always be this

it's just better that we have certain guarantees in these low level constructors
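
To make the requested guarantee concrete, here is a sketch of the kind of checks being discussed. `simple_new` is a hypothetical free function, not the pandas method, and the dtype check uses plain NumPy instead of pandas' `is_datetime64_dtype` so that the example depends on NumPy only. `NS_DTYPE` is assumed to match pandas' `_NS_DTYPE`.

```python
import numpy as np

NS_DTYPE = np.dtype("M8[ns]")  # assumed equivalent of pandas' _NS_DTYPE


def simple_new(values):
    """Hypothetical low-level constructor that validates its input."""
    assert isinstance(values, np.ndarray), (
        f"values is not an np.ndarray, got {type(values).__name__}"
    )
    assert values.dtype == NS_DTYPE, (
        f"values has dtype {values.dtype}, expected {NS_DTYPE}"
    )
    return values


ok = simple_new(np.array(["2012-01-01"], dtype="M8[ns]"))
```

Because these are plain `assert` statements, they are skipped when Python runs with `-O`; they act as an internal consistency check exercised by the test suite, not as user-facing validation.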

jreback · 2018-07-07T15:55:44Z

pandas/core/indexes/datetimes.py

+                                                        tz,
+                                                        ambiguous=ambiguous)
+
+                    arr = arr.view(_NS_DTYPE)


I think you can remove this arr.view(...)

reidy-p · 2018-07-07T21:41:28Z

pandas/core/indexes/datetimes.py

@@ -2087,6 +2094,8 @@ def _generate_regular_range(start, end, periods, freq):
                             "if a 'period' is given.")

        data = np.arange(b, e, stride, dtype=np.int64)
+
+        # _simple_new is getting an array of int64 here
        data = DatetimeIndex._simple_new(data, None, tz=tz)


There is a new assert statement in _simple_new to check whether the input is an array of datetime64[ns]. But in this case data is an array of int64 so the assert statement fails. Is there a convenient way to rewrite this part to make data an array of datetime64[ns] before calling _simple_new so the assert works?

jreback · 2018-07-07

oh yes, .view(_NS_DTYPE)
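
The suggested view can be sketched in plain NumPy as follows: it reinterprets int64 nanosecond counts as `datetime64[ns]` without copying. The start value and stride are made up for illustration, and `NS_DTYPE` is assumed to match pandas' `_NS_DTYPE`.

```python
import numpy as np

NS_DTYPE = np.dtype("M8[ns]")  # assumed equivalent of pandas' _NS_DTYPE
NANOS_PER_DAY = 24 * 60 * 60 * 10**9

# Three daily timestamps as int64 nanoseconds since the Unix epoch.
ints = np.arange(0, 3 * NANOS_PER_DAY, NANOS_PER_DAY, dtype=np.int64)

# Same bytes, reinterpreted as datetime64[ns]; no data is copied.
data = ints.view(NS_DTYPE)
```

Since the view shares memory with the original array, the conversion is cheap, which is presumably why it is convenient to do it right before the constructor call.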

jreback · 2018-07-08T21:04:37Z

@reidy-p can you rebase? The datetime code has been changing a bit in preparation for DatetimeArray. cc @jbrockmendel

reidy-p · 2018-07-08T21:05:27Z

Yeah sorry I just saw the new changes. I'll rebase.

jreback · 2018-07-08T22:00:23Z

pandas/core/indexes/datetimes.py

@@ -608,12 +610,14 @@ def _simple_new(cls, values, name=None, freq=None, tz=None,
                           dtype=dtype, **kwargs)
            values = np.array(values, copy=False)

-        if is_object_dtype(values):
-            return cls(values, name=name, freq=freq, tz=tz,


was this just never hit?

reidy-p · 2018-07-09

Yes, I think it's never hit.

jreback · 2018-07-09T21:51:51Z

@reidy-p lgtm. ping on green.

reidy-p · 2018-07-10T08:23:58Z

@jreback thanks! This is green now

jreback · 2018-07-10T10:08:51Z

thanks @reidy-p

jorisvandenbossche · 2018-07-10T15:46:38Z

pandas/core/indexes/datetimes.py

            values = _ensure_int64(values).view(_NS_DTYPE)

+        values = getattr(values, 'values', values)


Why is this one still needed? I thought the idea was now to ensure ndarrays are passed to _simple_new and not DatetimeIndexes?

reidy-p · 2018-07-10

Well spotted! I inserted this while investigating why some tests were failing and meant to move it before pushing the commit, but forgot. I think we can move this line to just before the call to _simple_new in this file:

pandas/pandas/core/arrays/datetimelike.py

Lines 43 to 53 in 1dd05cc

    def _shallow_copy(self, values=None, **kwargs):
        if values is None:
            # Note: slightly different from Index implementation which defaults
            # to self.values
            values = self._ndarray_values

        attributes = self._get_attributes_dict()
        attributes.update(kwargs)
        if not len(values) and 'dtype' not in kwargs:
            attributes['dtype'] = self.dtype
        return self._simple_new(values, **attributes)

Does this make sense? I did the same thing here:

pandas/pandas/core/indexes/base.py

Lines 501 to 513 in 1dd05cc

    @Appender(_index_shared_docs['_shallow_copy'])
    def _shallow_copy(self, values=None, **kwargs):
        if values is None:
            values = self.values
        attributes = self._get_attributes_dict()
        attributes.update(kwargs)
        if not len(values) and 'dtype' not in kwargs:
            attributes['dtype'] = self.dtype

        # _simple_new expects an ndarray
        values = getattr(values, 'values', values)

        return self._simple_new(values, **attributes)

jreback requested changes May 2, 2018

gfyoung added the Bug, Datetime, and Compat labels May 8, 2018

reidy-p force-pushed the datetimeindex_data branch from 0b635d0 to 6b0b72b Compare May 8, 2018 21:25

reidy-p commented May 8, 2018

reidy-p force-pushed the datetimeindex_data branch from 6b0b72b to 2735818 Compare June 2, 2018 12:36

jorisvandenbossche reviewed Jun 6, 2018

reidy-p force-pushed the datetimeindex_data branch from 2735818 to 7feeddb Compare June 6, 2018 19:11

reidy-p force-pushed the datetimeindex_data branch 2 times, most recently from 46e18aa to 71694ae Compare June 14, 2018 19:57

reidy-p force-pushed the datetimeindex_data branch 2 times, most recently from 038ca34 to 58c8f5c Compare June 22, 2018 15:10

reidy-p force-pushed the datetimeindex_data branch from 58c8f5c to ccc874d Compare June 29, 2018 19:50

jreback requested changes Jul 1, 2018

reidy-p force-pushed the datetimeindex_data branch from ccc874d to 1ab2770 Compare July 6, 2018 21:16

reidy-p commented Jul 6, 2018

reidy-p force-pushed the datetimeindex_data branch from 1ab2770 to 22fa07a Compare July 7, 2018 15:50

reidy-p force-pushed the datetimeindex_data branch 2 times, most recently from 41fd5ee to deca8a4 Compare July 7, 2018 15:52

jreback requested changes Jul 7, 2018

reidy-p force-pushed the datetimeindex_data branch from 9dd0150 to 3d4bc3c Compare July 7, 2018 21:38

reidy-p commented Jul 7, 2018

reidy-p force-pushed the datetimeindex_data branch from 47d71ef to 98b3e80 Compare July 8, 2018 21:15

jreback reviewed Jul 8, 2018

reidy-p added 8 commits July 9, 2018 15:08

BUG: DatetimeIndex._data should return an ndarray (92416a6)
tz_localize_to_utc generates an array not DTI (a6a038b)
Remove whatsnew (917917e)
Check whether _simple_new always receives an ndarray (cc6ab25)
cleaner way to get ndarray from DTI and fix failing pytables tests (f841afd)
assert that _simple_new always receives an array of datetime64[ns] (4233f6f)
use .view(_NS_DTYPE) (c20bb44)
fix and rebase (c207289)

reidy-p force-pushed the datetimeindex_data branch from 68e1d46 to ea39791 Compare July 9, 2018 15:55

lint (e18d996)

reidy-p force-pushed the datetimeindex_data branch from ea39791 to e18d996 Compare July 9, 2018 16:01

jreback added this to the 0.24.0 milestone Jul 9, 2018

jreback approved these changes Jul 9, 2018

jreback merged commit eeab164 into pandas-dev:master Jul 10, 2018

jorisvandenbossche reviewed Jul 10, 2018

mroeschke mentioned this pull request Jul 31, 2018

BUG: DatetimeIndex.unique shifts tz-aware dates #21737

Closed

Sup3rGeo pushed a commit to Sup3rGeo/pandas that referenced this pull request Oct 1, 2018

BUG: DatetimeIndex._data should return an ndarray (pandas-dev#20912)

dc2ca92
