
Commit 9c5c378

Merge branch 'main' into share-datetime-parsing-format-paths
2 parents: 891ab4e + eff6566

37 files changed (+243, −112 lines)

.github/actions/setup-conda/action.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -18,7 +18,7 @@ runs:
   - name: Set Arrow version in ${{ inputs.environment-file }} to ${{ inputs.pyarrow-version }}
     run: |
       grep -q ' - pyarrow' ${{ inputs.environment-file }}
-      sed -i"" -e "s/ - pyarrow<10/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
+      sed -i"" -e "s/ - pyarrow/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
       cat ${{ inputs.environment-file }}
     shell: bash
     if: ${{ inputs.pyarrow-version }}
```
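The sed pattern had to be loosened because the `<10` pin was removed from the environment files (see the `ci/deps` changes below). A rough Python equivalent of the substitution, using a hypothetical `pin_pyarrow` helper (not part of the action itself):

```python
import re

def pin_pyarrow(line: str, version: str) -> str:
    # The old expression only matched the pinned form "- pyarrow<10";
    # once the pin was dropped from the env files, the substitution has
    # to match the bare "- pyarrow" entry (and anything after it).
    return re.sub(r"- pyarrow.*", f"- pyarrow={version}", line)

pin_pyarrow("  - pyarrow", "10")  # "  - pyarrow=10"
```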

.github/workflows/ubuntu.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -29,7 +29,7 @@ jobs:
       matrix:
         env_file: [actions-38.yaml, actions-39.yaml, actions-310.yaml]
         pattern: ["not single_cpu", "single_cpu"]
-        pyarrow_version: ["7", "8", "9"]
+        pyarrow_version: ["7", "8", "9", "10"]
       include:
         - name: "Downstream Compat"
           env_file: actions-38-downstream_compat.yaml
```

ci/deps/actions-310.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -42,7 +42,7 @@ dependencies:
   - psycopg2
   - pymysql
   - pytables
-  - pyarrow<10
+  - pyarrow
   - pyreadstat
   - python-snappy
   - pyxlsb
```

ci/deps/actions-38-downstream_compat.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -40,7 +40,7 @@ dependencies:
   - openpyxl
   - odfpy
   - psycopg2
-  - pyarrow<10
+  - pyarrow
   - pymysql
   - pyreadstat
   - pytables
```

ci/deps/actions-38.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -40,7 +40,7 @@ dependencies:
   - odfpy
   - pandas-gbq
   - psycopg2
-  - pyarrow<10
+  - pyarrow
   - pymysql
   - pyreadstat
   - pytables
```

ci/deps/actions-39.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -41,7 +41,7 @@ dependencies:
   - pandas-gbq
   - psycopg2
   - pymysql
-  - pyarrow<10
+  - pyarrow
   - pyreadstat
   - pytables
   - python-snappy
```

ci/deps/circle-38-arm64.yaml

Lines changed: 1 addition & 1 deletion
```diff
@@ -40,7 +40,7 @@ dependencies:
   - odfpy
   - pandas-gbq
   - psycopg2
-  - pyarrow<10
+  - pyarrow
   - pymysql
   # Not provided on ARM
   #- pyreadstat
```

doc/source/whatsnew/v2.0.0.rst

Lines changed: 4 additions & 1 deletion
```diff
@@ -466,7 +466,8 @@ Other API changes
 - :meth:`Index.astype` now allows casting from ``float64`` dtype to datetime-like dtypes, matching :class:`Series` behavior (:issue:`49660`)
 - Passing data with dtype of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; timedelta64 data with lower resolution will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
 - Passing ``dtype`` of "timedelta64[s]", "timedelta64[ms]", or "timedelta64[us]" to :class:`TimedeltaIndex`, :class:`Series`, or :class:`DataFrame` constructors will now retain that dtype instead of casting to "timedelta64[ns]"; passing a dtype with lower resolution for :class:`Series` or :class:`DataFrame` will be cast to the lowest supported resolution "timedelta64[s]" (:issue:`49014`)
-- Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
+- Passing a ``np.datetime64`` object with non-nanosecond resolution to :class:`Timestamp` will retain the input resolution if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49008`)
+- Passing a string in ISO-8601 format to :class:`Timestamp` will retain the resolution of the parsed input if it is "s", "ms", "us", or "ns"; otherwise it will be cast to the closest supported resolution (:issue:`49737`)
 - The ``other`` argument in :meth:`DataFrame.mask` and :meth:`Series.mask` now defaults to ``no_default`` instead of ``np.nan`` consistent with :meth:`DataFrame.where` and :meth:`Series.where`. Entries will be filled with the corresponding NULL value (``np.nan`` for numpy dtypes, ``pd.NA`` for extension dtypes). (:issue:`49111`)
 - Changed behavior of :meth:`Series.quantile` and :meth:`DataFrame.quantile` with :class:`SparseDtype` to retain sparse dtype (:issue:`49583`)
 - When creating a :class:`Series` with a object-dtype :class:`Index` of datetime objects, pandas no longer silently converts the index to a :class:`DatetimeIndex` (:issue:`39307`, :issue:`23598`)
@@ -807,6 +808,7 @@ Datetimelike
 - Bug in :func:`to_datetime` was throwing ``ValueError`` when parsing dates with ISO8601 format where some values were not zero-padded (:issue:`21422`)
 - Bug in :func:`to_datetime` was giving incorrect results when using ``format='%Y%m%d'`` and ``errors='ignore'`` (:issue:`26493`)
 - Bug in :func:`to_datetime` was failing to parse date strings ``'today'`` and ``'now'`` if ``format`` was not ISO8601 (:issue:`50359`)
+- Bug in :meth:`Timestamp.round` when the ``freq`` argument has zero-duration (e.g. "0ns") returning incorrect results instead of raising (:issue:`49737`)
 - Bug in :func:`to_datetime` was not raising ``ValueError`` when invalid format was passed and ``errors`` was ``'ignore'`` or ``'coerce'`` (:issue:`50266`)
 - Bug in :class:`DateOffset` was throwing ``TypeError`` when constructing with milliseconds and another super-daily argument (:issue:`49897`)
 -
@@ -839,6 +841,7 @@ Conversion
 - Bug in :meth:`Series.convert_dtypes` not converting dtype to nullable dtype when :class:`Series` contains ``NA`` and has dtype ``object`` (:issue:`48791`)
 - Bug where any :class:`ExtensionDtype` subclass with ``kind="M"`` would be interpreted as a timezone type (:issue:`34986`)
 - Bug in :class:`.arrays.ArrowExtensionArray` that would raise ``NotImplementedError`` when passed a sequence of strings or binary (:issue:`49172`)
+- Bug in :meth:`Series.astype` raising ``pyarrow.ArrowInvalid`` when converting from a non-pyarrow string dtype to a pyarrow numeric type (:issue:`50430`)
 - Bug in :func:`to_datetime` was not respecting ``exact`` argument when ``format`` was an ISO8601 format (:issue:`12649`)
 - Bug in :meth:`TimedeltaArray.astype` raising ``TypeError`` when converting to a pyarrow duration type (:issue:`49795`)
 -
```
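The resolution-retention rule described in the whatsnew entries above can be sketched as a standalone helper. `inferred_unit` is hypothetical, not pandas API; it only mirrors the stated rule that the fractional-second width of an ISO-8601 string picks the retained unit ("s", "ms", "us", or "ns"):

```python
def inferred_unit(iso: str) -> str:
    # No fractional seconds written -> second resolution is enough.
    if "." not in iso:
        return "s"
    # Otherwise the number of fractional digits picks the finest unit
    # needed to represent the string exactly.
    digits = len(iso.rstrip("Z").split(".")[1])
    if digits <= 3:
        return "ms"
    if digits <= 6:
        return "us"
    return "ns"
```

For example, `inferred_unit("2020-01-01 00:00:00")` gives `"s"` while `inferred_unit("2020-01-01 00:00:00.123")` gives `"ms"`.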

environment.yml

Lines changed: 1 addition & 1 deletion
```diff
@@ -43,7 +43,7 @@ dependencies:
   - odfpy
   - py
   - psycopg2
-  - pyarrow<10
+  - pyarrow
   - pymysql
   - pyreadstat
   - pytables
```

pandas/_libs/tslibs/conversion.pyx

Lines changed: 33 additions & 18 deletions
```diff
@@ -405,7 +405,8 @@ cdef _TSObject convert_datetime_to_tsobject(
 
 
 cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
-                                                int tzoffset, tzinfo tz=None):
+                                                int tzoffset, tzinfo tz=None,
+                                                NPY_DATETIMEUNIT reso=NPY_FR_ns):
     """
     Convert a datetimestruct `dts`, along with initial timezone offset
     `tzoffset` to a _TSObject (with timezone object `tz` - optional).
@@ -416,6 +417,7 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
     tzoffset: int
     tz : tzinfo or None
         timezone for the timezone-aware output.
+    reso : NPY_DATETIMEUNIT, default NPY_FR_ns
 
     Returns
     -------
@@ -427,16 +429,19 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
         datetime dt
         Py_ssize_t pos
 
-    value = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
+    value = npy_datetimestruct_to_datetime(reso, &dts)
     obj.dts = dts
     obj.tzinfo = timezone(timedelta(minutes=tzoffset))
-    obj.value = tz_localize_to_utc_single(value, obj.tzinfo)
+    obj.value = tz_localize_to_utc_single(
+        value, obj.tzinfo, ambiguous=None, nonexistent=None, creso=reso
+    )
+    obj.creso = reso
     if tz is None:
-        check_overflows(obj, NPY_FR_ns)
+        check_overflows(obj, reso)
         return obj
 
     cdef:
-        Localizer info = Localizer(tz, NPY_FR_ns)
+        Localizer info = Localizer(tz, reso)
 
     # Infer fold from offset-adjusted obj.value
     # see PEP 495 https://www.python.org/dev/peps/pep-0495/#the-fold-attribute
@@ -454,6 +459,7 @@ cdef _TSObject _create_tsobject_tz_using_offset(npy_datetimestruct dts,
                       obj.dts.us, obj.tzinfo, fold=obj.fold)
         obj = convert_datetime_to_tsobject(
             dt, tz, nanos=obj.dts.ps // 1000)
+        obj.ensure_reso(reso)  # TODO: more performant to get reso right up front?
         return obj
 
 
@@ -490,7 +496,7 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
         int out_local = 0, out_tzoffset = 0, string_to_dts_failed
         datetime dt
         int64_t ival
-        NPY_DATETIMEUNIT out_bestunit
+        NPY_DATETIMEUNIT out_bestunit, reso
 
     if len(ts) == 0 or ts in nat_strings:
         ts = NaT
@@ -513,19 +519,26 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
            &out_tzoffset, False
        )
        if not string_to_dts_failed:
+           reso = get_supported_reso(out_bestunit)
            try:
-               check_dts_bounds(&dts, NPY_FR_ns)
+               check_dts_bounds(&dts, reso)
                if out_local == 1:
-                   return _create_tsobject_tz_using_offset(dts,
-                                                           out_tzoffset, tz)
+                   return _create_tsobject_tz_using_offset(
+                       dts, out_tzoffset, tz, reso
+                   )
                else:
-                   ival = npy_datetimestruct_to_datetime(NPY_FR_ns, &dts)
+                   ival = npy_datetimestruct_to_datetime(reso, &dts)
                    if tz is not None:
                        # shift for _localize_tso
-                       ival = tz_localize_to_utc_single(ival, tz,
-                                                        ambiguous="raise")
-
-                   return convert_to_tsobject(ival, tz, None, False, False)
+                       ival = tz_localize_to_utc_single(
+                           ival, tz, ambiguous="raise", nonexistent=None, creso=reso
+                       )
+                   obj = _TSObject()
+                   obj.dts = dts
+                   obj.value = ival
+                   obj.creso = reso
+                   maybe_localize_tso(obj, tz, obj.creso)
+                   return obj
 
            except OutOfBoundsDatetime:
                # GH#19382 for just-barely-OutOfBounds falling back to dateutil
@@ -538,10 +551,12 @@ cdef _TSObject _convert_str_to_tsobject(object ts, tzinfo tz, str unit,
                pass
 
    try:
-       dt = parse_datetime_string(ts, dayfirst=dayfirst,
-                                  yearfirst=yearfirst)
-   except (ValueError, OverflowError):
-       raise ValueError("could not convert string to Timestamp")
+       # TODO: use the one that returns reso
+       dt = parse_datetime_string(
+           ts, dayfirst=dayfirst, yearfirst=yearfirst
+       )
+   except (ValueError, OverflowError) as err:
+       raise ValueError("could not convert string to Timestamp") from err
 
    return convert_datetime_to_tsobject(dt, tz)
```
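`get_supported_reso` itself is not part of this diff. Assuming it clamps a parsed unit to the closest resolution a `Timestamp` supports (which is how the whatsnew entries describe the behavior), a standalone sketch might look like:

```python
# Assumed coarse-to-fine ordering of numpy datetime units; only a
# hypothetical stand-in for the NPY_DATETIMEUNIT enum.
ORDER = ["Y", "M", "W", "D", "h", "m", "s", "ms", "us", "ns", "ps", "fs", "as"]

def get_supported_reso_sketch(unit: str) -> str:
    # Units coarser than "s" clamp up to "s"; units finer than "ns"
    # clamp down to "ns"; supported units pass through unchanged.
    idx = ORDER.index(unit)
    if idx < ORDER.index("s"):
        return "s"
    if idx > ORDER.index("ns"):
        return "ns"
    return unit
```

This is why a date-only string like `"2020-01-01"` (best unit "D") ends up as a second-resolution `Timestamp` rather than a day-resolution one.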

pandas/_libs/tslibs/offsets.pyx

Lines changed: 5 additions & 1 deletion
```diff
@@ -162,7 +162,11 @@ def apply_wraps(func):
 
         result = func(self, other)
 
-        result = (<_Timestamp>Timestamp(result))._as_creso(other._creso)
+        result2 = Timestamp(result).as_unit(other.unit)
+        if result == result2:
+            # i.e. the conversion is non-lossy, not the case for e.g.
+            # test_milliseconds_combination
+            result = result2
 
         if self._adjust_dst:
             result = result.tz_localize(tz)
```
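The guarded unit conversion in `apply_wraps` can be sketched with plain integers; `maybe_downcast` is a hypothetical helper, not pandas API:

```python
def maybe_downcast(value_ns: int, unit_factor: int):
    # Adopt the other operand's coarser unit only when the conversion
    # round-trips losslessly, mirroring the `result == result2` check.
    converted = value_ns // unit_factor
    if converted * unit_factor == value_ns:
        return converted, True   # non-lossy: safe to use the coarser unit
    return value_ns, False       # lossy remainder: keep the original resolution
```

For example, 2_000_000_000 ns downcasts cleanly to 2 s, but 1_500_000_000 ns would lose the half-second and is kept as-is.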

pandas/_libs/tslibs/timestamps.pyx

Lines changed: 12 additions & 2 deletions
```diff
@@ -448,6 +448,7 @@ cdef class _Timestamp(ABCTimestamp):
             # cython semantics, args have been switched and this is __radd__
             # TODO(cython3): remove this it moved to __radd__
             return other.__add__(self)
+
         return NotImplemented
 
     def __radd__(self, other):
@@ -1560,8 +1561,17 @@ class Timestamp(_Timestamp):
         cdef:
             int64_t nanos
 
-        to_offset(freq).nanos  # raises on non-fixed freq
-        nanos = delta_to_nanoseconds(to_offset(freq), self._creso)
+        freq = to_offset(freq)
+        freq.nanos  # raises on non-fixed freq
+        nanos = delta_to_nanoseconds(freq, self._creso)
+        if nanos == 0:
+            if freq.nanos == 0:
+                raise ValueError("Division by zero in rounding")
+
+            # e.g. self.unit == "s" and sub-second freq
+            return self
+
+        # TODO: problem if nanos==0
 
         if self.tz is not None:
             value = self.tz_localize(None).value
```
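The zero-duration guard added to `Timestamp.round` can be sketched with plain integers. `round_to_freq` is a hypothetical helper (pandas' real rounding also handles rounding modes and timezones); `freq_in_unit` is the frequency expressed in the Timestamp's own resolution, `freq_nanos` the same frequency in nanoseconds:

```python
def round_to_freq(value: int, freq_in_unit: int, freq_nanos: int) -> int:
    if freq_in_unit == 0:
        if freq_nanos == 0:
            # freq="0ns": previously produced garbage, now raises
            raise ValueError("Division by zero in rounding")
        # e.g. a second-resolution Timestamp with a sub-second freq:
        # nothing to round, return the value unchanged
        return value
    # plain round-half-up to the nearest multiple of the frequency
    return (value + freq_in_unit // 2) // freq_in_unit * freq_in_unit
```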

pandas/compat/pyarrow.py

Lines changed: 2 additions & 0 deletions
```diff
@@ -13,8 +13,10 @@
     pa_version_under7p0 = _palv < Version("7.0.0")
     pa_version_under8p0 = _palv < Version("8.0.0")
     pa_version_under9p0 = _palv < Version("9.0.0")
+    pa_version_under10p0 = _palv < Version("10.0.0")
 except ImportError:
     pa_version_under6p0 = True
     pa_version_under7p0 = True
     pa_version_under8p0 = True
     pa_version_under9p0 = True
+    pa_version_under10p0 = True
```
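The flag pattern is a set of simple version thresholds. A minimal sketch of why a parsed comparison (here a naive tuple parse standing in for `packaging.Version`, ignoring pre-release tags) is needed instead of string comparison:

```python
def version_tuple(v: str) -> tuple:
    # Naive stand-in for packaging.Version: good enough for "X.Y.Z"
    return tuple(int(p) for p in v.split("."))

# Same shape as pandas/compat/pyarrow.py: one boolean per threshold,
# all forced to True when pyarrow is not importable.
pa_version_under10p0 = version_tuple("9.0.0") < version_tuple("10.0.0")
```

A plain string comparison would get this wrong, since `"9.0.0" > "10.0.0"` lexicographically.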

pandas/core/arrays/arrow/array.py

Lines changed: 11 additions & 11 deletions
```diff
@@ -206,17 +206,17 @@ def _from_sequence(cls, scalars, *, dtype: Dtype | None = None, copy: bool = Fal
         Construct a new ExtensionArray from a sequence of scalars.
         """
         pa_dtype = to_pyarrow_type(dtype)
-        is_cls = isinstance(scalars, cls)
-        if is_cls or isinstance(scalars, (pa.Array, pa.ChunkedArray)):
-            if is_cls:
-                scalars = scalars._data
-            if pa_dtype:
-                scalars = scalars.cast(pa_dtype)
-            return cls(scalars)
-        else:
-            return cls(
-                pa.chunked_array(pa.array(scalars, type=pa_dtype, from_pandas=True))
-            )
+        if isinstance(scalars, cls):
+            scalars = scalars._data
+        elif not isinstance(scalars, (pa.Array, pa.ChunkedArray)):
+            try:
+                scalars = pa.array(scalars, type=pa_dtype, from_pandas=True)
+            except pa.ArrowInvalid:
+                # GH50430: let pyarrow infer type, then cast
+                scalars = pa.array(scalars, from_pandas=True)
+        if pa_dtype:
+            scalars = scalars.cast(pa_dtype)
+        return cls(scalars)
 
     @classmethod
     def _from_sequence_of_strings(
```
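The control flow of the GH50430 fix can be illustrated without pyarrow; `caster` and `infer` are hypothetical stand-ins for `pa.array(..., type=pa_dtype)` and inference followed by `.cast()`:

```python
def cast_with_fallback(values, caster, infer):
    # Try the requested type directly; if that fails, let the native
    # representation be inferred first, then cast afterwards.
    try:
        return [caster(v) for v in values]
    except (TypeError, ValueError):
        return [caster(infer(v)) for v in values]
```

Analogous to the fixed case: `int("1.0")` raises, but `int(float("1.0"))` succeeds once an intermediate representation is inferred.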

pandas/core/computation/pytables.py

Lines changed: 4 additions & 6 deletions
```diff
@@ -11,7 +11,6 @@
 import numpy as np
 
 from pandas._libs.tslibs import (
-    NaT,
     Timedelta,
     Timestamp,
 )
@@ -216,17 +215,16 @@ def stringify(value):
             if isinstance(v, (int, float)):
                 v = stringify(v)
             v = ensure_decoded(v)
-            v = Timestamp(v)
-            if v is not NaT:
-                v = v.as_unit("ns")  # pyright: ignore[reportGeneralTypeIssues]
+            v = Timestamp(v).as_unit("ns")
             if v.tz is not None:
                 v = v.tz_convert("UTC")
             return TermValue(v, v.value, kind)
         elif kind in ("timedelta64", "timedelta"):
             if isinstance(v, str):
-                v = Timedelta(v).value
+                v = Timedelta(v)
             else:
-                v = Timedelta(v, unit="s").value
+                v = Timedelta(v, unit="s")
+            v = v.as_unit("ns").value
             return TermValue(int(v), v, kind)
         elif meta == "category":
             metadata = extract_array(self.metadata, extract_numpy=True)
```

pandas/core/resample.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -2085,7 +2085,7 @@ def _adjust_dates_anchored(
     elif origin == "start":
         origin_nanos = first.value
     elif isinstance(origin, Timestamp):
-        origin_nanos = origin.value
+        origin_nanos = origin.as_unit("ns").value
     elif origin in ["end", "end_day"]:
         origin_last = last if origin == "end" else last.ceil("D")
         sub_freq_times = (origin_last.value - first.value) // freq.nanos
```
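The one-line fix matters because, with non-nanosecond support, a Timestamp's integer value is denominated in its own unit rather than always in nanoseconds. A sketch of the scaling that `as_unit("ns").value` effectively performs (the factor table is an illustration, not pandas API):

```python
# Nanoseconds per supported Timestamp unit.
NS_PER_UNIT = {"s": 1_000_000_000, "ms": 1_000_000, "us": 1_000, "ns": 1}

def value_as_nanos(value: int, unit: str) -> int:
    # Normalize a unit-denominated value to nanoseconds before it is
    # compared against nanosecond-based offsets like origin_nanos.
    return value * NS_PER_UNIT[unit]
```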

pandas/tests/arithmetic/test_datetime64.py

Lines changed: 2 additions & 2 deletions
```diff
@@ -1699,15 +1699,15 @@ def test_datetimeindex_sub_timestamp_overflow(self):
         dtimax = pd.to_datetime(["2021-12-28 17:19", Timestamp.max])
         dtimin = pd.to_datetime(["2021-12-28 17:19", Timestamp.min])
 
-        tsneg = Timestamp("1950-01-01")
+        tsneg = Timestamp("1950-01-01").as_unit("ns")
         ts_neg_variants = [
             tsneg,
             tsneg.to_pydatetime(),
             tsneg.to_datetime64().astype("datetime64[ns]"),
             tsneg.to_datetime64().astype("datetime64[D]"),
         ]
 
-        tspos = Timestamp("1980-01-01")
+        tspos = Timestamp("1980-01-01").as_unit("ns")
         ts_pos_variants = [
             tspos,
             tspos.to_pydatetime(),
```

pandas/tests/arrays/interval/test_interval.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -287,7 +287,7 @@ def test_arrow_array():
     with pytest.raises(TypeError, match="Not supported to convert IntervalArray"):
         pa.array(intervals, type="float64")
 
-    with pytest.raises(TypeError, match="different 'subtype'"):
+    with pytest.raises(TypeError, match="different 'subtype'|to convert IntervalArray"):
         pa.array(intervals, type=ArrowIntervalType(pa.float64(), "left"))
 
```
pandas/tests/arrays/period/test_arrow_compat.py

Lines changed: 3 additions & 0 deletions
```diff
@@ -1,5 +1,7 @@
 import pytest
 
+from pandas.compat.pyarrow import pa_version_under10p0
+
 from pandas.core.dtypes.dtypes import PeriodDtype
 
 import pandas as pd
@@ -26,6 +28,7 @@ def test_arrow_extension_type():
     assert hash(p1) != hash(p3)
 
 
+@pytest.mark.xfail(not pa_version_under10p0, reason="Wrong behavior with pyarrow 10")
 @pytest.mark.parametrize(
     "data, freq",
     [
```

pandas/tests/arrays/test_timedeltas.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -102,7 +102,7 @@ def test_add_pdnat(self, tda):
     # TODO: 2022-07-11 this is the only test that gets to DTA.tz_convert
     # or tz_localize with non-nano; implement tests specific to that.
     def test_add_datetimelike_scalar(self, tda, tz_naive_fixture):
-        ts = pd.Timestamp("2016-01-01", tz=tz_naive_fixture)
+        ts = pd.Timestamp("2016-01-01", tz=tz_naive_fixture).as_unit("ns")
 
         expected = tda.as_unit("ns") + ts
         res = tda + ts
```

pandas/tests/extension/test_arrow.py

Lines changed: 8 additions & 0 deletions
```diff
@@ -1471,6 +1471,14 @@ def test_astype_from_non_pyarrow(data):
     tm.assert_extension_array_equal(result, data)
 
 
+def test_astype_float_from_non_pyarrow_str():
+    # GH50430
+    ser = pd.Series(["1.0"])
+    result = ser.astype("float64[pyarrow]")
+    expected = pd.Series([1.0], dtype="float64[pyarrow]")
+    tm.assert_series_equal(result, expected)
+
+
 def test_to_numpy_with_defaults(data):
     # GH49973
     result = data.to_numpy()
```
