Commit 2f24d6e

Merge pull request #4822 from jreback/timedelta_hdf
ENH/CLN: support enhanced timedelta64 operations/conversions
2 parents 4577064 + ef2cfb1 commit 2f24d6e

17 files changed

+588
-132
lines changed

doc/source/io.rst

+20
Original file line number | Diff line number | Diff line change
@@ -2009,6 +2009,26 @@ space. These are in terms of the total number of rows in a table.
20092009
Term('minor_axis', '=', ['A','B']) ],
20102010
start=0, stop=10)
20112011
2012+
**Using timedelta64[ns]**
2013+
2014+
.. versionadded:: 0.13
2015+
2016+
Beginning in 0.13.0, you can store and query using the ``timedelta64[ns]`` type. Terms can be
2017+
specified in the format: ``<float>(<unit>)``, where float may be signed (and fractional), and unit can be
2018+
``D,s,ms,us,ns`` for the timedelta. Here's an example:
2019+
2020+
.. warning::
2021+
2022+
This requires ``numpy >= 1.7``
2023+
2024+
.. ipython:: python
2025+
2026+
from datetime import timedelta
2027+
dftd = DataFrame(dict(A = Timestamp('20130101'), B = [ Timestamp('20130101') + timedelta(days=i,seconds=10) for i in range(10) ]))
2028+
dftd['C'] = dftd['A']-dftd['B']
2029+
dftd
2030+
store.append('dftd',dftd,data_columns=True)
2031+
store.select('dftd',Term("C","<","-3.5D"))
20122032
20132033
Indexing
20142034
~~~~~~~~
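
Note on the io.rst example: as a rough illustration of which rows the `Term("C","<","-3.5D")` query is meant to select, the same filter can be sketched in memory, without an HDFStore. This is only a sketch. It assumes a current pandas where `pd.Timedelta` exists (it is not part of this commit), and it does not exercise PyTables or the `Term` parser.

```python
from datetime import timedelta

import pandas as pd

# Same frame as in the io.rst example: A is a fixed timestamp,
# B moves forward by i days and 10 seconds per row.
dftd = pd.DataFrame({
    "A": pd.Timestamp("2013-01-01"),
    "B": [pd.Timestamp("2013-01-01") + timedelta(days=i, seconds=10)
          for i in range(10)],
})

# C is a (negative) timedelta column: -(i days + 10 seconds).
dftd["C"] = dftd["A"] - dftd["B"]

# In-memory equivalent of selecting rows where C < -3.5 days.
# pd.Timedelta is an assumption here; the commit itself uses the "-3.5D" string.
selected = dftd[dftd["C"] < pd.Timedelta(days=-3.5)]
print(selected)
```

Rows 0 through 3 have deltas no more negative than about -3 days, so only rows 4 through 9 pass the filter.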

doc/source/release.rst

+3
@@ -156,6 +156,7 @@ API Changes
156156
- a column multi-index will be recreated properly (:issue:`4710`); raise on trying to use a multi-index
157157
with data_columns on the same axis
158158
- ``select_as_coordinates`` will now return an ``Int64Index`` of the resultant selection set
159+
- support ``timedelta64[ns]`` as a serialization type (:issue:`3577`)
159160
- ``JSON``
160161

161162
- added ``date_unit`` parameter to specify resolution of timestamps. Options
@@ -190,6 +191,8 @@ API Changes
190191
- provide automatic dtype conversions on _reduce operations (:issue:`3371`)
191192
- exclude non-numerics if mixed types with datelike in _reduce operations (:issue:`3371`)
192193
- default for ``tupleize_cols`` is now ``False`` for both ``to_csv`` and ``read_csv``. Fair warning in 0.12 (:issue:`3604`)
194+
- moved timedeltas support to pandas.tseries.timedeltas.py; add timedeltas string parsing,
195+
add top-level ``to_timedelta`` function
193196

194197
Internal Refactoring
195198
~~~~~~~~~~~~~~~~~~~~

doc/source/timeseries.rst

+20
@@ -1211,6 +1211,26 @@ Time Deltas & Conversions
12111211

12121212
.. versionadded:: 0.13
12131213

1214+
**string/integer conversion**
1215+
1216+
Using the top-level ``to_timedelta``, you can convert a scalar or array from the standard
1217+
timedelta format (produced by ``to_csv``) into a timedelta type (``np.timedelta64`` in ``nanoseconds``).
1218+
It can also construct Series.
1219+
1220+
.. warning::
1221+
1222+
This requires ``numpy >= 1.7``
1223+
1224+
.. ipython:: python
1225+
1226+
to_timedelta('1 days 06:05:01.00003')
1227+
to_timedelta('15.5us')
1228+
to_timedelta(['1 days 06:05:01.00003','15.5us','nan'])
1229+
to_timedelta(np.arange(5),unit='s')
1230+
to_timedelta(np.arange(5),unit='d')
1231+
1232+
**frequency conversion**
1233+
12141234
Timedeltas can be converted to other 'frequencies' by dividing by another timedelta.
12151235
These operations yield ``float64`` dtyped Series.
12161236
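
Note on the timeseries.rst additions: a minimal sketch of both documented behaviours, string/integer conversion and frequency conversion by division. It is written against a current pandas, so the `pd.` prefix and the upper-case `"D"` unit are assumptions relative to the 0.13-era text.

```python
import numpy as np
import pandas as pd

# String conversion: "1 days 06:05:01.00003" is 1 day, 6 h, 5 min, 1 s and 30 us.
one = pd.to_timedelta("1 days 06:05:01.00003")

# Integer conversion: the unit says how to interpret the integers.
secs = pd.to_timedelta(np.arange(5), unit="s")
days = pd.Series(pd.to_timedelta(np.arange(5), unit="D"))

# Frequency conversion: dividing by another timedelta yields float64 values,
# here the number of hours in each element.
hours = days / np.timedelta64(1, "h")

print(one)
print(secs)
print(hours)
```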

doc/source/v0.13.0.txt

+16-1
@@ -80,7 +80,7 @@ API changes
8080
See :ref:`here<io.hdf5-selecting_coordinates>` for an example.
8181
- allow a passed locations array or mask as a ``where`` condition (:issue:`4467`).
8282
See :ref:`here<io.hdf5-where_mask>` for an example.
83-
83+
- support ``timedelta64[ns]`` as a serialization type (:issue:`3577`)
8484
- the ``format`` keyword now replaces the ``table`` keyword; allowed values are ``fixed(f)`` or ``table(t)``
8585
the same defaults as prior < 0.13.0 remain, e.g. ``put`` implies 'fixed' or 'f' (Fixed) format
8686
and ``append`` implies 'table' or 't' (Table) format
@@ -208,6 +208,21 @@ Enhancements
208208

209209
- ``timedelta64[ns]`` operations
210210

211+
- Using the new top-level ``to_timedelta``, you can convert a scalar or array from the standard
212+
timedelta format (produced by ``to_csv``) into a timedelta type (``np.timedelta64`` in ``nanoseconds``).
213+
214+
.. warning::
215+
216+
This requires ``numpy >= 1.7``
217+
218+
.. ipython:: python
219+
220+
to_timedelta('1 days 06:05:01.00003')
221+
to_timedelta('15.5us')
222+
to_timedelta(['1 days 06:05:01.00003','15.5us','nan'])
223+
to_timedelta(np.arange(5),unit='s')
224+
to_timedelta(np.arange(5),unit='d')
225+
211226
- A Series of dtype ``timedelta64[ns]`` can now be divided by another
212227
``timedelta64[ns]`` object to yield a ``float64`` dtyped Series. This
213228
is frequency conversion. See :ref:`here<timeseries.timedeltas_convert>` for the docs.

pandas/__init__.py

+13
@@ -18,6 +18,19 @@
1818
from datetime import datetime
1919
import numpy as np
2020

21+
# XXX: HACK for NumPy 1.5.1 to suppress warnings
22+
try:
23+
np.seterr(all='ignore')
24+
# np.set_printoptions(suppress=True)
25+
except Exception: # pragma: no cover
26+
pass
27+
28+
# numpy versioning
29+
from distutils.version import LooseVersion
30+
_np_version = np.version.short_version
31+
_np_version_under1p6 = LooseVersion(_np_version) < '1.6'
32+
_np_version_under1p7 = LooseVersion(_np_version) < '1.7'
33+
2134
from pandas.version import version as __version__
2235
from pandas.info import __doc__
2336

pandas/core/common.py

+4-100
@@ -11,35 +11,20 @@
1111
import pandas.algos as algos
1212
import pandas.lib as lib
1313
import pandas.tslib as tslib
14-
from distutils.version import LooseVersion
1514
from pandas import compat
1615
from pandas.compat import StringIO, BytesIO, range, long, u, zip, map
1716
from datetime import timedelta
1817

1918
from pandas.core.config import get_option
2019
from pandas.core import array as pa
2120

22-
23-
# XXX: HACK for NumPy 1.5.1 to suppress warnings
24-
try:
25-
np.seterr(all='ignore')
26-
# np.set_printoptions(suppress=True)
27-
except Exception: # pragma: no cover
28-
pass
29-
30-
3121
class PandasError(Exception):
3222
pass
3323

3424

3525
class AmbiguousIndexError(PandasError, KeyError):
3626
pass
3727

38-
# versioning
39-
_np_version = np.version.short_version
40-
_np_version_under1p6 = LooseVersion(_np_version) < '1.6'
41-
_np_version_under1p7 = LooseVersion(_np_version) < '1.7'
42-
4328
_POSSIBLY_CAST_DTYPES = set([np.dtype(t)
4429
for t in ['M8[ns]', 'm8[ns]', 'O', 'int8', 'uint8', 'int16', 'uint16', 'int32', 'uint32', 'int64', 'uint64']])
4530

@@ -704,34 +689,13 @@ def diff(arr, n, axis=0):
704689

705690
return out_arr
706691

707-
708-
def _coerce_scalar_to_timedelta_type(r):
709-
# kludgy here until we have a timedelta scalar
710-
# handle the numpy < 1.7 case
711-
712-
if is_integer(r):
713-
r = timedelta(microseconds=r/1000)
714-
715-
if _np_version_under1p7:
716-
if not isinstance(r, timedelta):
717-
raise AssertionError("Invalid type for timedelta scalar: %s" % type(r))
718-
if compat.PY3:
719-
# convert to microseconds in timedelta64
720-
r = np.timedelta64(int(r.total_seconds()*1e9 + r.microseconds*1000))
721-
else:
722-
return r
723-
724-
if isinstance(r, timedelta):
725-
r = np.timedelta64(r)
726-
elif not isinstance(r, np.timedelta64):
727-
raise AssertionError("Invalid type for timedelta scalar: %s" % type(r))
728-
return r.astype('timedelta64[ns]')
729-
730692
def _coerce_to_dtypes(result, dtypes):
731693
""" given a dtypes and a result set, coerce the result elements to the dtypes """
732694
if len(result) != len(dtypes):
733695
raise AssertionError("_coerce_to_dtypes requires equal len arrays")
734696

697+
from pandas.tseries.timedeltas import _coerce_scalar_to_timedelta_type
698+
735699
def conv(r,dtype):
736700
try:
737701
if isnull(r):
@@ -1324,68 +1288,6 @@ def _possibly_convert_platform(values):
13241288

13251289
return values
13261290

1327-
1328-
def _possibly_cast_to_timedelta(value, coerce=True):
1329-
""" try to cast to timedelta64, if already a timedeltalike, then make
1330-
sure that we are [ns] (as numpy 1.6.2 is very buggy in this regards,
1331-
don't force the conversion unless coerce is True
1332-
1333-
if coerce='compat' force a compatibilty coercerion (to timedeltas) if needeed
1334-
"""
1335-
1336-
# coercion compatability
1337-
if coerce == 'compat' and _np_version_under1p7:
1338-
1339-
def convert(td, dtype):
1340-
1341-
# we have an array with a non-object dtype
1342-
if hasattr(td,'item'):
1343-
td = td.astype(np.int64).item()
1344-
if td == tslib.iNaT:
1345-
return td
1346-
if dtype == 'm8[us]':
1347-
td *= 1000
1348-
return td
1349-
1350-
if td == tslib.compat_NaT:
1351-
return tslib.iNaT
1352-
1353-
# convert td value to a nanosecond value
1354-
d = td.days
1355-
s = td.seconds
1356-
us = td.microseconds
1357-
1358-
if dtype == 'object' or dtype == 'm8[ns]':
1359-
td = 1000*us + (s + d * 24 * 3600) * 10 ** 9
1360-
else:
1361-
raise ValueError("invalid conversion of dtype in np < 1.7 [%s]" % dtype)
1362-
1363-
return td
1364-
1365-
# < 1.7 coercion
1366-
if not is_list_like(value):
1367-
value = np.array([ value ])
1368-
1369-
dtype = value.dtype
1370-
return np.array([ convert(v,dtype) for v in value ], dtype='m8[ns]')
1371-
1372-
# deal with numpy not being able to handle certain timedelta operations
1373-
if isinstance(value, (ABCSeries, np.ndarray)) and value.dtype.kind == 'm':
1374-
if value.dtype != 'timedelta64[ns]':
1375-
value = value.astype('timedelta64[ns]')
1376-
return value
1377-
1378-
# we don't have a timedelta, but we want to try to convert to one (but
1379-
# don't force it)
1380-
if coerce:
1381-
new_value = tslib.array_to_timedelta64(
1382-
_values_from_object(value).astype(object), coerce=False)
1383-
if new_value.dtype == 'i8':
1384-
value = np.array(new_value, dtype='timedelta64[ns]')
1385-
1386-
return value
1387-
1388-
13891291
def _possibly_cast_to_datetime(value, dtype, coerce=False):
13901292
""" try to cast the array/value to a datetimelike dtype, converting float nan to iNaT """
13911293

@@ -1423,6 +1325,7 @@ def _possibly_cast_to_datetime(value, dtype, coerce=False):
14231325
from pandas.tseries.tools import to_datetime
14241326
value = to_datetime(value, coerce=coerce).values
14251327
elif is_timedelta64:
1328+
from pandas.tseries.timedeltas import _possibly_cast_to_timedelta
14261329
value = _possibly_cast_to_timedelta(value)
14271330
except:
14281331
pass
@@ -1448,6 +1351,7 @@ def _possibly_cast_to_datetime(value, dtype, coerce=False):
14481351
except:
14491352
pass
14501353
elif inferred_type in ['timedelta', 'timedelta64']:
1354+
from pandas.tseries.timedeltas import _possibly_cast_to_timedelta
14511355
value = _possibly_cast_to_timedelta(value)
14521356

14531357
return value
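
Note on the removed `convert` helper: its core is the arithmetic that turns a `datetime.timedelta` into an integer nanosecond count, `1000*us + (s + d * 24 * 3600) * 10 ** 9`. The sketch below isolates that formula and checks it against NumPy's own conversion. `timedelta_to_ns` is an illustrative name, not a function from this commit.

```python
from datetime import timedelta

import numpy as np


def timedelta_to_ns(td):
    """Integer nanoseconds for a datetime.timedelta (illustrative helper)."""
    # days/seconds/microseconds are the normalized fields of a timedelta;
    # days may be negative, seconds and microseconds never are.
    return 1000 * td.microseconds + (td.seconds + td.days * 24 * 3600) * 10 ** 9


td = timedelta(days=1, seconds=2, microseconds=3)
ns = timedelta_to_ns(td)

# Cross-check with NumPy: timedelta -> timedelta64[us] -> timedelta64[ns] -> int64.
via_numpy = int(np.timedelta64(td).astype("timedelta64[ns]").astype("int64"))
print(ns, via_numpy)
```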

pandas/core/generic.py

+2-2
@@ -13,7 +13,7 @@
1313
from pandas.tseries.index import DatetimeIndex
1414
from pandas.core.internals import BlockManager
1515
import pandas.core.common as com
16-
from pandas import compat
16+
from pandas import compat, _np_version_under1p7
1717
from pandas.compat import map, zip, lrange
1818
from pandas.core.common import (isnull, notnull, is_list_like,
1919
_values_from_object,
@@ -1908,7 +1908,7 @@ def abs(self):
19081908
obj = np.abs(self)
19091909

19101910
# suprimo numpy 1.6 hacking
1911-
if com._np_version_under1p7:
1911+
if _np_version_under1p7:
19121912
if self.ndim == 1:
19131913
if obj.dtype == 'm8[us]':
19141914
obj = obj.astype('m8[ns]')

pandas/core/series.py

+9-7
@@ -19,6 +19,7 @@
1919
_asarray_tuplesafe, is_integer_dtype,
2020
_NS_DTYPE, _TD_DTYPE,
2121
_infer_dtype_from_scalar, is_list_like, _values_from_object,
22+
_possibly_cast_to_datetime, _possibly_castable, _possibly_convert_platform,
2223
ABCSparseArray)
2324
from pandas.core.index import (Index, MultiIndex, InvalidIndexError,
2425
_ensure_index, _handle_legacy_indexes)
@@ -32,6 +33,7 @@
3233
from pandas.tseries.index import DatetimeIndex
3334
from pandas.tseries.period import PeriodIndex, Period
3435
from pandas.tseries.offsets import DateOffset
36+
from pandas.tseries.timedeltas import _possibly_cast_to_timedelta
3537
from pandas import compat
3638
from pandas.util.terminal import get_terminal_size
3739
from pandas.compat import zip, lzip, u, OrderedDict
@@ -142,7 +144,7 @@ def _convert_to_array(self, values, name=None):
142144
values = values.to_series()
143145
elif inferred_type in ('timedelta', 'timedelta64'):
144146
# have a timedelta, convert to to ns here
145-
values = com._possibly_cast_to_timedelta(values, coerce=coerce)
147+
values = _possibly_cast_to_timedelta(values, coerce=coerce)
146148
elif inferred_type == 'integer':
147149
# py3 compat where dtype is 'm' but is an integer
148150
if values.dtype.kind == 'm':
@@ -160,7 +162,7 @@ def _convert_to_array(self, values, name=None):
160162
raise TypeError("cannot use a non-absolute DateOffset in "
161163
"datetime/timedelta operations [{0}]".format(
162164
','.join([ com.pprint_thing(v) for v in values[mask] ])))
163-
values = com._possibly_cast_to_timedelta(os, coerce=coerce)
165+
values = _possibly_cast_to_timedelta(os, coerce=coerce)
164166
else:
165167
raise TypeError("incompatible type [{0}] for a datetime/timedelta operation".format(pa.array(values).dtype))
166168

@@ -3215,11 +3217,11 @@ def _try_cast(arr, take_fast_path):
32153217

32163218
# perf shortcut as this is the most common case
32173219
if take_fast_path:
3218-
if com._possibly_castable(arr) and not copy and dtype is None:
3220+
if _possibly_castable(arr) and not copy and dtype is None:
32193221
return arr
32203222

32213223
try:
3222-
arr = com._possibly_cast_to_datetime(arr, dtype)
3224+
arr = _possibly_cast_to_datetime(arr, dtype)
32233225
subarr = pa.array(arr, dtype=dtype, copy=copy)
32243226
except (ValueError, TypeError):
32253227
if dtype is not None and raise_cast_failure:
@@ -3266,9 +3268,9 @@ def _try_cast(arr, take_fast_path):
32663268
subarr = lib.maybe_convert_objects(subarr)
32673269

32683270
else:
3269-
subarr = com._possibly_convert_platform(data)
3271+
subarr = _possibly_convert_platform(data)
32703272

3271-
subarr = com._possibly_cast_to_datetime(subarr, dtype)
3273+
subarr = _possibly_cast_to_datetime(subarr, dtype)
32723274

32733275
else:
32743276
subarr = _try_cast(data, False)
@@ -3285,7 +3287,7 @@ def _try_cast(arr, take_fast_path):
32853287
dtype, value = _infer_dtype_from_scalar(value)
32863288
else:
32873289
# need to possibly convert the value here
3288-
value = com._possibly_cast_to_datetime(value, dtype)
3290+
value = _possibly_cast_to_datetime(value, dtype)
32893291

32903292
subarr = pa.empty(len(index), dtype=dtype)
32913293
subarr.fill(value)
