
Commit 891ab4e

Merge branch 'main' into share-datetime-parsing-format-paths
2 parents: 8b07c46 + 9918c84

92 files changed: +896 −727 lines


.github/workflows/macos-windows.yml

Lines changed: 1 addition & 0 deletions

@@ -16,6 +16,7 @@ env:
   PANDAS_CI: 1
   PYTEST_TARGET: pandas
   PATTERN: "not slow and not db and not network and not single_cpu"
+  TEST_ARGS: "-W error:::pandas"


 permissions:

.github/workflows/ubuntu.yml

Lines changed: 4 additions & 1 deletion

@@ -38,6 +38,7 @@ jobs:
           - name: "Minimum Versions"
             env_file: actions-38-minimum_versions.yaml
             pattern: "not slow and not network and not single_cpu"
+            test_args: ""
           - name: "Locale: it_IT"
             env_file: actions-38.yaml
             pattern: "not slow and not network and not single_cpu"
@@ -62,10 +63,12 @@ jobs:
             env_file: actions-310.yaml
             pattern: "not slow and not network and not single_cpu"
             pandas_copy_on_write: "1"
+            test_args: ""
           - name: "Data Manager"
             env_file: actions-38.yaml
             pattern: "not slow and not network and not single_cpu"
             pandas_data_manager: "array"
+            test_args: ""
           - name: "Pypy"
             env_file: actions-pypy-38.yaml
             pattern: "not slow and not network and not single_cpu"
@@ -93,7 +96,7 @@ jobs:
       LC_ALL: ${{ matrix.lc_all || '' }}
       PANDAS_DATA_MANAGER: ${{ matrix.pandas_data_manager || 'block' }}
       PANDAS_COPY_ON_WRITE: ${{ matrix.pandas_copy_on_write || '0' }}
-      TEST_ARGS: ${{ matrix.test_args || '' }}
+      TEST_ARGS: ${{ matrix.test_args || '-W error:::pandas' }}
       PYTEST_WORKERS: ${{ contains(matrix.pattern, 'not single_cpu') && 'auto' || '1' }}
       PYTEST_TARGET: ${{ matrix.pytest_target || 'pandas' }}
       IS_PYPY: ${{ contains(matrix.env_file, 'pypy') }}
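The new `TEST_ARGS` default hands pytest the standard Python warning-filter spec `-W error:::pandas` (fields are `action:message:category:module`), so warnings originating from pandas fail the build. A minimal sketch of the same escalation done programmatically (the helper name is mine; the module-regex field is omitted and a blanket filter used instead):

```python
import warnings

def run_with_warnings_as_errors(func):
    # Escalate all warnings to errors for the duration of the call,
    # mirroring what a "-W error" style filter does process-wide.
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            func()
            return "no warning"
        except Warning as exc:
            return f"escalated: {exc}"
```

In CI the flag covers the whole pytest run; this context-manager form is only useful for spot-checking a single call.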

.pre-commit-config.yaml

Lines changed: 10 additions & 0 deletions

@@ -333,3 +333,13 @@ repos:
         additional_dependencies:
         - autotyping==22.9.0
         - libcst==0.4.7
+    - id: check-test-naming
+      name: check that test names start with 'test'
+      entry: python -m scripts.check_test_naming
+      types: [python]
+      files: ^pandas/tests
+      language: python
+      exclude: |
+          (?x)
+          ^pandas/tests/generic/test_generic.py  # GH50380
+          |^pandas/tests/io/json/test_readlines.py  # GH50378

doc/source/reference/indexing.rst

Lines changed: 1 addition & 0 deletions

@@ -298,6 +298,7 @@ MultiIndex components
    MultiIndex.swaplevel
    MultiIndex.reorder_levels
    MultiIndex.remove_unused_levels
+   MultiIndex.drop

 MultiIndex selecting
 ~~~~~~~~~~~~~~~~~~~~

doc/source/user_guide/io.rst

Lines changed: 5 additions & 4 deletions

@@ -275,6 +275,9 @@ parse_dates : boolean or list of ints or names or list of lists or dict, default
 infer_datetime_format : boolean, default ``False``
     If ``True`` and parse_dates is enabled for a column, attempt to infer the
     datetime format to speed up the processing.
+
+    .. deprecated:: 2.0.0
+        A strict version of this argument is now the default, passing it has no effect.
 keep_date_col : boolean, default ``False``
     If ``True`` and parse_dates specifies combining multiple columns then keep the
     original columns.
@@ -916,12 +919,10 @@ an exception is raised, the next one is tried:

 Note that performance-wise, you should try these methods of parsing dates in order:

-1. Try to infer the format using ``infer_datetime_format=True`` (see section below).
-
-2. If you know the format, use ``pd.to_datetime()``:
+1. If you know the format, use ``pd.to_datetime()``:
    ``date_parser=lambda x: pd.to_datetime(x, format=...)``.

-3. If you have a really non-standard format, use a custom ``date_parser`` function.
+2. If you have a really non-standard format, use a custom ``date_parser`` function.
    For optimal performance, this should be vectorized, i.e., it should accept arrays
    as arguments.
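The reordered advice boils down to: supply the format yourself whenever you know it, rather than letting the parser infer it per element. A stdlib sketch of that principle (``datetime.strptime`` plays the role of ``pd.to_datetime(x, format=...)`` here, so no pandas install is assumed):

```python
from datetime import datetime

def parse_known_format(values, fmt="%d/%m/%Y"):
    # An explicit format string is strict and fast, mirroring
    # date_parser=lambda x: pd.to_datetime(x, format="%d/%m/%Y")
    return [datetime.strptime(v, fmt) for v in values]
```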

doc/source/whatsnew/v2.0.0.rst

Lines changed: 13 additions & 8 deletions

@@ -28,10 +28,10 @@ The available extras, found in the :ref:`installation guide<install.dependencies
 ``[all, performance, computation, timezone, fss, aws, gcp, excel, parquet, feather, hdf5, spss, postgresql, mysql,
 sql-other, html, xml, plot, output_formatting, clipboard, compression, test]`` (:issue:`39164`).

-.. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_nullable_backend:
+.. _whatsnew_200.enhancements.io_use_nullable_dtypes_and_dtype_backend:

-Configuration option, ``mode.nullable_backend``, to return pyarrow-backed dtypes
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+Configuration option, ``mode.dtype_backend``, to return pyarrow-backed dtypes
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

 The ``use_nullable_dtypes`` keyword argument has been expanded to the following functions to enable automatic conversion to nullable dtypes (:issue:`36712`)

@@ -41,7 +41,7 @@ The ``use_nullable_dtypes`` keyword argument has been expanded to the following
 * :func:`read_sql_query`
 * :func:`read_sql_table`

-Additionally a new global configuration, ``mode.nullable_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
+Additionally a new global configuration, ``mode.dtype_backend`` can now be used in conjunction with the parameter ``use_nullable_dtypes=True`` in the following functions
 to select the nullable dtypes implementation.

 * :func:`read_csv` (with ``engine="pyarrow"`` or ``engine="python"``)
@@ -50,12 +50,12 @@ to select the nullable dtypes implementation.
 * :func:`read_orc`


-And the following methods will also utilize the ``mode.nullable_backend`` option.
+And the following methods will also utilize the ``mode.dtype_backend`` option.

 * :meth:`DataFrame.convert_dtypes`
 * :meth:`Series.convert_dtypes`

-By default, ``mode.nullable_backend`` is set to ``"pandas"`` to return existing, numpy-backed nullable dtypes, but it can also
+By default, ``mode.dtype_backend`` is set to ``"pandas"`` to return existing, numpy-backed nullable dtypes, but it can also
 be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (:issue:`48957`, :issue:`49997`).

 .. ipython:: python
@@ -65,12 +65,12 @@ be set to ``"pyarrow"`` to return pyarrow-backed, nullable :class:`ArrowDtype` (
        1,2.5,True,a,,,,,
        3,4.5,False,b,6,7.5,True,a,
    """)
-    with pd.option_context("mode.nullable_backend", "pandas"):
+    with pd.option_context("mode.dtype_backend", "pandas"):
         df = pd.read_csv(data, use_nullable_dtypes=True)
     df.dtypes

     data.seek(0)
-    with pd.option_context("mode.nullable_backend", "pyarrow"):
+    with pd.option_context("mode.dtype_backend", "pyarrow"):
         df_pyarrow = pd.read_csv(data, use_nullable_dtypes=True, engine="pyarrow")
     df_pyarrow.dtypes

@@ -717,13 +717,17 @@ Removal of prior version deprecations/changes
 - Changed default of ``numeric_only`` to ``False`` in all DataFrame methods with that argument (:issue:`46096`, :issue:`46906`)
 - Changed default of ``numeric_only`` to ``False`` in :meth:`Series.rank` (:issue:`47561`)
 - Enforced deprecation of silently dropping nuisance columns in groupby and resample operations when ``numeric_only=False`` (:issue:`41475`)
+- Changed behavior in setting values with ``df.loc[:, foo] = bar`` or ``df.iloc[:, foo] = bar``, these now always attempt to set values inplace before falling back to casting (:issue:`45333`)
 - Changed default of ``numeric_only`` in various :class:`.DataFrameGroupBy` methods; all methods now default to ``numeric_only=False`` (:issue:`46072`)
 - Changed default of ``numeric_only`` to ``False`` in :class:`.Resampler` methods (:issue:`47177`)
 - Using the method :meth:`DataFrameGroupBy.transform` with a callable that returns DataFrames will align to the input's index (:issue:`47244`)
 - When providing a list of columns of length one to :meth:`DataFrame.groupby`, the keys that are returned by iterating over the resulting :class:`DataFrameGroupBy` object will now be tuples of length one (:issue:`47761`)
 - Removed deprecated methods :meth:`ExcelWriter.write_cells`, :meth:`ExcelWriter.save`, :meth:`ExcelWriter.cur_sheet`, :meth:`ExcelWriter.handles`, :meth:`ExcelWriter.path` (:issue:`45795`)
 - The :class:`ExcelWriter` attribute ``book`` can no longer be set; it is still available to be accessed and mutated (:issue:`48943`)
 - Removed unused ``*args`` and ``**kwargs`` in :class:`Rolling`, :class:`Expanding`, and :class:`ExponentialMovingWindow` ops (:issue:`47851`)
+- Removed the deprecated argument ``line_terminator`` from :meth:`DataFrame.to_csv` (:issue:`45302`)
+- Removed the deprecated argument ``label`` from :func:`lreshape` (:issue:`30219`)
+- Arguments after ``expr`` in :meth:`DataFrame.eval` and :meth:`DataFrame.query` are keyword-only (:issue:`47587`)
 -

 .. ---------------------------------------------------------------------------
@@ -804,6 +808,7 @@ Datetimelike
 - Bug in :func:`to_datetime` was giving incorrect results when using ``format='%Y%m%d'`` and ``errors='ignore'`` (:issue:`26493`)
 - Bug in :func:`to_datetime` was failing to parse date strings ``'today'`` and ``'now'`` if ``format`` was not ISO8601 (:issue:`50359`)
 - Bug in :func:`to_datetime` was not raising ``ValueError`` when invalid format was passed and ``errors`` was ``'ignore'`` or ``'coerce'`` (:issue:`50266`)
+- Bug in :class:`DateOffset` was throwing ``TypeError`` when constructing with milliseconds and another super-daily argument (:issue:`49897`)
 -

 Timedelta

pandas/_libs/src/ujson/lib/ultrajsonenc.c

Lines changed: 1 addition & 1 deletion

@@ -1080,11 +1080,11 @@ void encode(JSOBJ obj, JSONObjectEncoder *enc, const char *name,

     case JT_UTF8: {
       value = enc->getStringValue(obj, &tc, &szlen);
-      Buffer_Reserve(enc, RESERVE_STRING(szlen));
       if (enc->errorMsg) {
         enc->endTypeContext(obj, &tc);
         return;
       }
+      Buffer_Reserve(enc, RESERVE_STRING(szlen));
       Buffer_AppendCharUnchecked(enc, '\"');

       if (enc->forceASCII) {

pandas/_libs/tslibs/offsets.pyx

Lines changed: 47 additions & 37 deletions

@@ -298,43 +298,54 @@ _relativedelta_kwds = {"years", "months", "weeks", "days", "year", "month",


 cdef _determine_offset(kwds):
-    # timedelta is used for sub-daily plural offsets and all singular
-    # offsets, relativedelta is used for plural offsets of daily length or
-    # more, nanosecond(s) are handled by apply_wraps
-    kwds_no_nanos = dict(
-        (k, v) for k, v in kwds.items()
-        if k not in ("nanosecond", "nanoseconds")
-    )
-    # TODO: Are nanosecond and nanoseconds allowed somewhere?
-
-    _kwds_use_relativedelta = ("years", "months", "weeks", "days",
-                               "year", "month", "week", "day", "weekday",
-                               "hour", "minute", "second", "microsecond",
-                               "millisecond")
-
-    use_relativedelta = False
-    if len(kwds_no_nanos) > 0:
-        if any(k in _kwds_use_relativedelta for k in kwds_no_nanos):
-            if "millisecond" in kwds_no_nanos:
-                raise NotImplementedError(
-                    "Using DateOffset to replace `millisecond` component in "
-                    "datetime object is not supported. Use "
-                    "`microsecond=timestamp.microsecond % 1000 + ms * 1000` "
-                    "instead."
-                )
-            offset = relativedelta(**kwds_no_nanos)
-            use_relativedelta = True
-        else:
-            # sub-daily offset - use timedelta (tz-aware)
-            offset = timedelta(**kwds_no_nanos)
-    elif any(nano in kwds for nano in ("nanosecond", "nanoseconds")):
-        offset = timedelta(days=0)
-    else:
-        # GH 45643/45890: (historically) defaults to 1 day for non-nano
-        # since datetime.timedelta doesn't handle nanoseconds
-        offset = timedelta(days=1)
-    return offset, use_relativedelta
+    if not kwds:
+        # GH 45643/45890: (historically) defaults to 1 day
+        return timedelta(days=1), False
+
+    if "millisecond" in kwds:
+        raise NotImplementedError(
+            "Using DateOffset to replace `millisecond` component in "
+            "datetime object is not supported. Use "
+            "`microsecond=timestamp.microsecond % 1000 + ms * 1000` "
+            "instead."
+        )
+
+    nanos = {"nanosecond", "nanoseconds"}
+
+    # nanos are handled by apply_wraps
+    if all(k in nanos for k in kwds):
+        return timedelta(days=0), False

+    kwds_no_nanos = {k: v for k, v in kwds.items() if k not in nanos}
+
+    kwds_use_relativedelta = {
+        "year", "month", "day", "hour", "minute",
+        "second", "microsecond", "weekday", "years", "months", "weeks", "days",
+        "hours", "minutes", "seconds", "microseconds"
+    }
+
+    # "weeks" and "days" are left out despite being valid args for timedelta,
+    # because (historically) timedelta is used only for sub-daily.
+    kwds_use_timedelta = {
+        "seconds", "microseconds", "milliseconds", "minutes", "hours",
+    }
+
+    if all(k in kwds_use_timedelta for k in kwds_no_nanos):
+        # Sub-daily offset - use timedelta (tz-aware)
+        # This also handles "milliseconds" (plur): see GH 49897
+        return timedelta(**kwds_no_nanos), False
+
+    # convert milliseconds to microseconds, so relativedelta can parse it
+    if "milliseconds" in kwds_no_nanos:
+        micro = kwds_no_nanos.pop("milliseconds") * 1000
+        kwds_no_nanos["microseconds"] = kwds_no_nanos.get("microseconds", 0) + micro
+
+    if all(k in kwds_use_relativedelta for k in kwds_no_nanos):
+        return relativedelta(**kwds_no_nanos), True
+
+    raise ValueError(
+        f"Invalid argument/s or bad combination of arguments: {list(kwds.keys())}"
+    )

 # ---------------------------------------------------------------------
 # Mixins & Singletons
@@ -1163,7 +1174,6 @@ cdef class RelativeDeltaOffset(BaseOffset):

     def __init__(self, n=1, normalize=False, **kwds):
         BaseOffset.__init__(self, n, normalize)
-
         off, use_rd = _determine_offset(kwds)
         object.__setattr__(self, "_offset", off)
         object.__setattr__(self, "_use_relativedelta", use_rd)
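The rewritten `_determine_offset` is now a chain of early returns. A pure-Python sketch of the classification rules it applies (function and constant names here are illustrative, not the pandas internals; the return strings only label which machinery would be chosen):

```python
NANOS = {"nanosecond", "nanoseconds"}
SUB_DAILY = {"seconds", "microseconds", "milliseconds", "minutes", "hours"}

def classify_offset_kwds(kwds):
    """Pick the machinery a DateOffset-style kwargs dict would use."""
    if not kwds:
        return "timedelta(days=1)"      # historical default (GH 45643/45890)
    if "millisecond" in kwds:
        raise NotImplementedError("singular 'millisecond' is rejected")
    if all(k in NANOS for k in kwds):
        return "timedelta(days=0)"      # nanos handled by apply_wraps
    no_nanos = {k: v for k, v in kwds.items() if k not in NANOS}
    if all(k in SUB_DAILY for k in no_nanos):
        return "timedelta"              # incl. plural "milliseconds" (GH 49897)
    # "milliseconds" would be folded into "microseconds" before relativedelta
    return "relativedelta"
```

The GH 49897 fix falls out of the ordering: `{"milliseconds": 5}` alone hits the sub-daily branch, while `{"days": 1, "milliseconds": 5}` falls through to relativedelta after the milliseconds-to-microseconds conversion, instead of raising `TypeError`.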

pandas/_libs/tslibs/parsing.pyx

Lines changed: 3 additions & 1 deletion

@@ -996,9 +996,11 @@ def guess_datetime_format(dt_str: str, bint dayfirst=False) -> str | None:

 cdef str _fill_token(token: str, padding: int):
     cdef str token_filled
-    if "." not in token:
+    if re.search(r"\d+\.\d+", token) is None:
+        # For example: 98
         token_filled = token.zfill(padding)
     else:
+        # For example: 00.123
         seconds, nanoseconds = token.split(".")
         seconds = f"{int(seconds):02d}"
         # right-pad so we get nanoseconds, then only take
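A pure-Python rendering of the changed branch: the new regex only treats a token as ``seconds.nanoseconds`` when digits appear on *both* sides of the dot, where the old ``"." not in token`` test matched any dot at all. The hunk's tail is truncated in this diff, so the fractional handling below is illustrative:

```python
import re

def fill_token(token: str, padding: int) -> str:
    if re.search(r"\d+\.\d+", token) is None:
        # Plain numeric token, e.g. "8" -> "08"
        return token.zfill(padding)
    # Fractional-seconds token, e.g. "0.123" -> "00.123"
    seconds, fractional = token.split(".")
    return f"{int(seconds):02d}.{fractional}"
```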

pandas/_libs/tslibs/timedeltas.pyx

Lines changed: 15 additions & 6 deletions

@@ -1,4 +1,5 @@
 import collections
+import warnings

 cimport cython
 from cpython.object cimport (
@@ -1947,9 +1948,13 @@ class Timedelta(_Timedelta):

         if other.dtype.kind == "m":
             # also timedelta-like
-            # TODO: could suppress
-            # RuntimeWarning: invalid value encountered in floor_divide
-            result = self.asm8 // other
+            with warnings.catch_warnings():
+                warnings.filterwarnings(
+                    "ignore",
+                    "invalid value encountered in floor_divide",
+                    RuntimeWarning
+                )
+                result = self.asm8 // other
             mask = other.view("i8") == NPY_NAT
             if mask.any():
                 # We differ from numpy here
@@ -1987,9 +1992,13 @@ class Timedelta(_Timedelta):

         if other.dtype.kind == "m":
             # also timedelta-like
-            # TODO: could suppress
-            # RuntimeWarning: invalid value encountered in floor_divide
-            result = other // self.asm8
+            with warnings.catch_warnings():
+                warnings.filterwarnings(
+                    "ignore",
+                    "invalid value encountered in floor_divide",
+                    RuntimeWarning
+                )
+                result = other // self.asm8
             mask = other.view("i8") == NPY_NAT
             if mask.any():
                 # We differ from numpy here
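The diff resolves the old TODO by suppressing the spurious `RuntimeWarning` that numpy can emit when floor-dividing `timedelta64` data containing `NaT` (the result at those positions is patched via the mask right afterwards, so the warning carries no information). A minimal sketch of the same pattern (helper name is mine):

```python
import warnings
import numpy as np

def floordiv_ignoring_nat_warning(left, right):
    # Suppress only the specific numpy message; the caller is expected to
    # mask out NaT positions afterwards, as the pandas code does.
    with warnings.catch_warnings():
        warnings.filterwarnings(
            "ignore", "invalid value encountered in floor_divide", RuntimeWarning
        )
        return left // right
```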

pandas/core/arrays/arrow/array.py

Lines changed: 39 additions & 0 deletions

@@ -853,6 +853,45 @@ def _concat_same_type(
         arr = pa.chunked_array(chunks)
         return cls(arr)

+    def _accumulate(
+        self, name: str, *, skipna: bool = True, **kwargs
+    ) -> ArrowExtensionArray | ExtensionArray:
+        """
+        Return an ExtensionArray performing an accumulation operation.
+
+        The underlying data type might change.
+
+        Parameters
+        ----------
+        name : str
+            Name of the function, supported values are:
+            - cummin
+            - cummax
+            - cumsum
+            - cumprod
+        skipna : bool, default True
+            If True, skip NA values.
+        **kwargs
+            Additional keyword arguments passed to the accumulation function.
+            Currently, there is no supported kwarg.
+
+        Returns
+        -------
+        array
+
+        Raises
+        ------
+        NotImplementedError : subclass does not define accumulations
+        """
+        pyarrow_name = {
+            "cumsum": "cumulative_sum_checked",
+        }.get(name, name)
+        pyarrow_meth = getattr(pc, pyarrow_name, None)
+        if pyarrow_meth is None:
+            return super()._accumulate(name, skipna=skipna, **kwargs)
+        result = pyarrow_meth(self._data, skip_nulls=skipna, **kwargs)
+        return type(self)(result)
+
     def _reduce(self, name: str, *, skipna: bool = True, **kwargs):
         """
         Return a scalar result of performing the reduction operation.
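The new `_accumulate` dispatches by translating the pandas accumulation name to a pyarrow compute kernel name, looking it up with `getattr`, and falling back to the base class when the kernel is missing. A dependency-free sketch of that dispatch pattern, with the stdlib `math` module standing in for `pyarrow.compute` and an invented alias table:

```python
import math

# pandas-style name -> backend kernel name (illustrative, not pyarrow's)
ALIASES = {"cumsum": "fsum"}

def resolve_kernel(name):
    backend_name = ALIASES.get(name, name)
    kernel = getattr(math, backend_name, None)
    if kernel is None:
        # in the real method, this is where super()._accumulate takes over
        raise NotImplementedError(f"no kernel for {name}")
    return kernel
```

The `getattr(..., None)` lookup keeps the method forward-compatible: kernels pyarrow grows later are picked up automatically without a hard-coded allowlist.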

pandas/core/arrays/masked.py

Lines changed: 2 additions & 1 deletion

@@ -727,12 +727,13 @@ def _cmp_method(self, other, op) -> BooleanArray:
             mask = np.ones(self._data.shape, dtype="bool")
         else:
             with warnings.catch_warnings():
-                # numpy may show a FutureWarning:
+                # numpy may show a FutureWarning or DeprecationWarning:
                 # elementwise comparison failed; returning scalar instead,
                 # but in the future will perform elementwise comparison
                 # before returning NotImplemented. We fall back to the correct
                 # behavior today, so that should be fine to ignore.
                 warnings.filterwarnings("ignore", "elementwise", FutureWarning)
+                warnings.filterwarnings("ignore", "elementwise", DeprecationWarning)
                 with np.errstate(all="ignore"):
                     method = getattr(self._data, f"__{op.__name__}__")
                     result = method(other)
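The widened filter exists because different numpy versions report the failed elementwise comparison as either `FutureWarning` or `DeprecationWarning`, so both categories must be ignored around the comparison. A minimal sketch of the pattern (function name is mine):

```python
import warnings
import numpy as np

def safe_eq(data, other):
    with warnings.catch_warnings():
        # Cover both categories numpy has used for this message.
        warnings.filterwarnings("ignore", "elementwise", FutureWarning)
        warnings.filterwarnings("ignore", "elementwise", DeprecationWarning)
        with np.errstate(all="ignore"):
            return data == other
```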
