Commit 5a2d126

Merge branch 'master' into b6
2 parents af0585c + 3e9f09f commit 5a2d126

76 files changed

+2796
-1452
lines changed

.gitignore

Lines changed: 2 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -50,6 +50,8 @@ dist
5050
*.egg-info
5151
.eggs
5252
.pypirc
53+
# type checkers
54+
pandas/py.typed
5355

5456
# tox testing tool
5557
.tox

doc/source/development/contributing_codebase.rst

Lines changed: 36 additions & 6 deletions
Original file line numberDiff line numberDiff line change
@@ -303,7 +303,7 @@ pandas strongly encourages the use of :pep:`484` style type hints. New developme
303303
Style guidelines
304304
~~~~~~~~~~~~~~~~
305305

306-
Types imports should follow the ``from typing import ...`` convention. So rather than
306+
Type imports should follow the ``from typing import ...`` convention. Since :pep:`585`, some builtin constructs, such as ``list`` and ``tuple``, can be used directly for type annotations and do not need to be imported. So rather than
307307

308308
.. code-block:: python
309309
@@ -315,21 +315,31 @@ You should write
315315

316316
.. code-block:: python
317317
318-
from typing import List, Optional, Union
318+
primes: list[int] = []
319319
320-
primes: List[int] = []
320+
``Optional`` should be avoided in favor of the shorter ``| None``, so instead of
321321

322-
``Optional`` should be used where applicable, so instead of
322+
.. code-block:: python
323+
324+
from typing import Union
325+
326+
maybe_primes: list[Union[int, None]] = []
327+
328+
or
323329

324330
.. code-block:: python
325331
326-
maybe_primes: List[Union[int, None]] = []
332+
from typing import Optional
333+
334+
maybe_primes: list[Optional[int]] = []
327335
328336
You should write
329337

330338
.. code-block:: python
331339
332-
maybe_primes: List[Optional[int]] = []
340+
from __future__ import annotations # noqa: F404
341+
342+
maybe_primes: list[int | None] = []
333343
334344
In some cases in the code base classes may define class variables that shadow builtins. This causes an issue as described in `Mypy 1775 <https://github.com/python/mypy/issues/1775#issuecomment-310969854>`_. The defensive solution here is to create an unambiguous alias of the builtin and use that without your annotation. For example, if you come across a definition like
335345

@@ -410,6 +420,26 @@ A recent version of ``numpy`` (>=1.21.0) is required for type validation.
410420

411421
.. _contributing.ci:
412422

423+
Testing type hints in code using pandas
424+
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
425+
426+
.. warning::
427+
428+
* Pandas is not yet a py.typed library (:pep:`561`)!
429+
The primary purpose of locally declaring pandas as a py.typed library is to test and
430+
improve the pandas-builtin type annotations.
431+
432+
Until pandas becomes a py.typed library, it is possible to easily experiment with the type
433+
annotations shipped with pandas by creating an empty file named "py.typed" in the pandas
434+
installation folder:
435+
436+
.. code-block:: none
437+
438+
python -c "import pandas; import pathlib; (pathlib.Path(pandas.__path__[0]) / 'py.typed').touch()"
439+
440+
The existence of the py.typed file signals to type checkers that pandas is already a py.typed
441+
library. This makes type checkers aware of the type annotations shipped with pandas.
442+
413443
Testing with continuous integration
414444
-----------------------------------
415445

doc/source/development/developer.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -180,7 +180,7 @@ As an example of fully-formed metadata:
180180
'numpy_type': 'int64',
181181
'metadata': None}
182182
],
183-
'pandas_version': '0.20.0',
183+
'pandas_version': '1.4.0',
184184
'creator': {
185185
'library': 'pyarrow',
186186
'version': '0.13.0'

doc/source/reference/groupby.rst

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -122,6 +122,7 @@ application to columns of a specific data type.
122122
DataFrameGroupBy.skew
123123
DataFrameGroupBy.take
124124
DataFrameGroupBy.tshift
125+
DataFrameGroupBy.value_counts
125126

126127
The following methods are available only for ``SeriesGroupBy`` objects.
127128

doc/source/user_guide/io.rst

Lines changed: 11 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1903,6 +1903,7 @@ with optional parameters:
19031903
``index``; dict like {index -> {column -> value}}
19041904
``columns``; dict like {column -> {index -> value}}
19051905
``values``; just the values array
1906+
``table``; adhering to the JSON `Table Schema`_
19061907

19071908
* ``date_format`` : string, type of date conversion, 'epoch' for timestamp, 'iso' for ISO8601.
19081909
* ``double_precision`` : The number of decimal places to use when encoding floating point values, default 10.
@@ -2477,7 +2478,6 @@ A few notes on the generated table schema:
24772478
* For ``MultiIndex``, ``mi.names`` is used. If any level has no name,
24782479
then ``level_<i>`` is used.
24792480

2480-
24812481
``read_json`` also accepts ``orient='table'`` as an argument. This allows for
24822482
the preservation of metadata such as dtypes and index names in a
24832483
round-trippable manner.
@@ -2519,8 +2519,18 @@ indicate missing values and the subsequent read cannot distinguish the intent.
25192519
25202520
os.remove("test.json")
25212521
2522+
When using ``orient='table'`` along with user-defined ``ExtensionArray``,
2523+
the generated schema will contain an additional ``extDtype`` key in the respective
2524+
``fields`` element. This extra key is not standard but does enable JSON roundtrips
2525+
for extension types (e.g. ``read_json(df.to_json(orient="table"), orient="table")``).
2526+
2527+
The ``extDtype`` key carries the name of the extension, if you have properly registered
2528+
the ``ExtensionDtype``, pandas will use said name to perform a lookup into the registry
2529+
and re-convert the serialized data into your custom dtype.
2530+
25222531
.. _Table Schema: https://specs.frictionlessdata.io/table-schema/
25232532

2533+
25242534
HTML
25252535
----
25262536

doc/source/user_guide/timeseries.rst

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -2424,7 +2424,7 @@ you can use the ``tz_convert`` method.
24242424

24252425
For ``pytz`` time zones, it is incorrect to pass a time zone object directly into
24262426
the ``datetime.datetime`` constructor
2427-
(e.g., ``datetime.datetime(2011, 1, 1, tz=pytz.timezone('US/Eastern'))``.
2427+
(e.g., ``datetime.datetime(2011, 1, 1, tzinfo=pytz.timezone('US/Eastern'))``.
24282428
Instead, the datetime needs to be localized using the ``localize`` method
24292429
on the ``pytz`` time zone object.
24302430

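Note on the keyword fix above: ``datetime.datetime`` has no ``tz`` keyword, only ``tzinfo``, so the old example could not even be called. The sketch below checks this with the standard library alone. ``datetime.timezone.utc`` stands in for the ``pytz`` zone (an assumption made to avoid the third-party dependency), and the helper name ``accepts_keyword`` is hypothetical.

```python
import datetime


def accepts_keyword(keyword: str) -> bool:
    """Return True if datetime.datetime accepts `keyword` for a time zone."""
    try:
        datetime.datetime(2011, 1, 1, **{keyword: datetime.timezone.utc})
    except TypeError:
        # Raised for keywords the constructor does not know about.
        return False
    return True


tzinfo_ok = accepts_keyword("tzinfo")
tz_ok = accepts_keyword("tz")
```

Being accepted by the constructor does not make ``tzinfo=`` the right choice for ``pytz`` zones; as the surrounding text says, those still need ``localize``.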
doc/source/whatsnew/v1.4.0.rst

Lines changed: 12 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -217,9 +217,10 @@ Other enhancements
217217
- Added :meth:`.ExponentialMovingWindow.sum` (:issue:`13297`)
218218
- :meth:`Series.str.split` now supports a ``regex`` argument that explicitly specifies whether the pattern is a regular expression. Default is ``None`` (:issue:`43563`, :issue:`32835`, :issue:`25549`)
219219
- :meth:`DataFrame.dropna` now accepts a single label as ``subset`` along with array-like (:issue:`41021`)
220+
- Added :meth:`DataFrameGroupBy.value_counts` (:issue:`43564`)
220221
- :class:`ExcelWriter` argument ``if_sheet_exists="overlay"`` option added (:issue:`40231`)
221222
- :meth:`read_excel` now accepts a ``decimal`` argument that allow the user to specify the decimal point when parsing string columns to numeric (:issue:`14403`)
222-
- :meth:`.GroupBy.mean`, :meth:`.GroupBy.std`, and :meth:`.GroupBy.var` now supports `Numba <http://numba.pydata.org/>`_ execution with the ``engine`` keyword (:issue:`43731`, :issue:`44862`)
223+
- :meth:`.GroupBy.mean`, :meth:`.GroupBy.std`, :meth:`.GroupBy.var`, :meth:`.GroupBy.sum` now supports `Numba <http://numba.pydata.org/>`_ execution with the ``engine`` keyword (:issue:`43731`, :issue:`44862`, :issue:`44939`)
223224
- :meth:`Timestamp.isoformat`, now handles the ``timespec`` argument from the base :class:``datetime`` class (:issue:`26131`)
224225
- :meth:`NaT.to_numpy` ``dtype`` argument is now respected, so ``np.timedelta64`` can be returned (:issue:`44460`)
225226
- New option ``display.max_dir_items`` customizes the number of columns added to :meth:`Dataframe.__dir__` and suggested for tab completion (:issue:`37996`)
@@ -231,6 +232,7 @@ Other enhancements
231232
- :meth:`UInt64Index.map` now retains ``dtype`` where possible (:issue:`44609`)
232233
- :meth:`read_json` can now parse unsigned long long integers (:issue:`26068`)
233234
- :meth:`DataFrame.take` now raises a ``TypeError`` when passed a scalar for the indexer (:issue:`42875`)
235+
- :class:`ExtensionDtype` and :class:`ExtensionArray` are now (de)serialized when exporting a :class:`DataFrame` with :meth:`DataFrame.to_json` using ``orient='table'`` (:issue:`20612`, :issue:`44705`).
234236
-
235237

236238

@@ -454,6 +456,7 @@ Other API changes
454456
- :meth:`Index.get_indexer_for` no longer accepts keyword arguments (other than 'target'); in the past these would be silently ignored if the index was not unique (:issue:`42310`)
455457
- Change in the position of the ``min_rows`` argument in :meth:`DataFrame.to_string` due to change in the docstring (:issue:`44304`)
456458
- Reduction operations for :class:`DataFrame` or :class:`Series` now raising a ``ValueError`` when ``None`` is passed for ``skipna`` (:issue:`44178`)
459+
- :func:`read_csv` and :func:`read_html` no longer raising an error when one of the header rows consists only of ``Unnamed:`` columns (:issue:`13054`)
457460
- Changed the ``name`` attribute of several holidays in
458461
``USFederalHolidayCalendar`` to match `official federal holiday
459462
names <https://www.opm.gov/policy-data-oversight/pay-leave/federal-holidays/>`_
@@ -529,7 +532,7 @@ Other Deprecations
529532
- Deprecated silent dropping of columns that raised a ``TypeError`` in :class:`Series.transform` and :class:`DataFrame.transform` when used with a dictionary (:issue:`43740`)
530533
- Deprecated silent dropping of columns that raised a ``TypeError``, ``DataError``, and some cases of ``ValueError`` in :meth:`Series.aggregate`, :meth:`DataFrame.aggregate`, :meth:`Series.groupby.aggregate`, and :meth:`DataFrame.groupby.aggregate` when used with a list (:issue:`43740`)
531534
- Deprecated casting behavior when setting timezone-aware value(s) into a timezone-aware :class:`Series` or :class:`DataFrame` column when the timezones do not match. Previously this cast to object dtype. In a future version, the values being inserted will be converted to the series or column's existing timezone (:issue:`37605`)
532-
- Deprecated casting behavior when passing an item with mismatched-timezone to :meth:`DatetimeIndex.insert`, :meth:`DatetimeIndex.putmask`, :meth:`DatetimeIndex.where` :meth:`DatetimeIndex.fillna`, :meth:`Series.mask`, :meth:`Series.where`, :meth:`Series.fillna`, :meth:`Series.shift`, :meth:`Series.replace`, :meth:`Series.reindex` (and :class:`DataFrame` column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series's timezone (:issue:`37605`)
535+
- Deprecated casting behavior when passing an item with mismatched-timezone to :meth:`DatetimeIndex.insert`, :meth:`DatetimeIndex.putmask`, :meth:`DatetimeIndex.where` :meth:`DatetimeIndex.fillna`, :meth:`Series.mask`, :meth:`Series.where`, :meth:`Series.fillna`, :meth:`Series.shift`, :meth:`Series.replace`, :meth:`Series.reindex` (and :class:`DataFrame` column analogues). In the past this has cast to object dtype. In a future version, these will cast the passed item to the index or series's timezone (:issue:`37605`,:issue:`44940`)
533536
- Deprecated the 'errors' keyword argument in :meth:`Series.where`, :meth:`DataFrame.where`, :meth:`Series.mask`, and :meth:`DataFrame.mask`; in a future version the argument will be removed (:issue:`44294`)
534537
- Deprecated the ``prefix`` keyword argument in :func:`read_csv` and :func:`read_table`, in a future version the argument will be removed (:issue:`43396`)
535538
- Deprecated :meth:`PeriodIndex.astype` to ``datetime64[ns]`` or ``DatetimeTZDtype``, use ``obj.to_timestamp(how).tz_localize(dtype.tz)`` instead (:issue:`44398`)
@@ -540,6 +543,7 @@ Other Deprecations
540543
- Deprecated parameter ``names`` in :meth:`Index.copy` (:issue:`44916`)
541544
- A deprecation warning is now shown for :meth:`DataFrame.to_latex` indicating the arguments signature may change and emulate more the arguments to :meth:`.Styler.to_latex` in future versions (:issue:`44411`)
542545
- Deprecated :meth:`Categorical.replace`, use :meth:`Series.replace` instead (:issue:`44929`)
546+
- Deprecated :meth:`Index.__getitem__` with a bool key; use ``index.values[key]`` to get the old behavior (:issue:`44051`)
543547
-
544548

545549
.. ---------------------------------------------------------------------------
@@ -627,6 +631,7 @@ Datetimelike
627631
- Bug in adding a ``np.timedelta64`` object to a :class:`BusinessDay` or :class:`CustomBusinessDay` object incorrectly raising (:issue:`44532`)
628632
- Bug in :meth:`Index.insert` for inserting ``np.datetime64``, ``np.timedelta64`` or ``tuple`` into :class:`Index` with ``dtype='object'`` with negative loc adding ``None`` and replacing existing value (:issue:`44509`)
629633
- Bug in :meth:`Series.mode` with ``DatetimeTZDtype`` incorrectly returning timezone-naive and ``PeriodDtype`` incorrectly raising (:issue:`41927`)
634+
- Bug in :class:`DateOffset` addition with :class:`Timestamp` where ``offset.nanoseconds`` would not be included in the result. (:issue:`43968`)
630635
-
631636

632637
Timedelta
@@ -760,6 +765,7 @@ I/O
760765
- :meth:`DataFrame.to_csv` and :meth:`Series.to_csv` with ``compression`` set to ``'zip'`` no longer create a zip file containing a file ending with ".zip". Instead, they try to infer the inner file name more smartly. (:issue:`39465`)
761766
- Bug in :func:`read_csv` where reading a mixed column of booleans and missing values to a float type results in the missing values becoming 1.0 rather than NaN (:issue:`42808`, :issue:`34120`)
762767
- Bug in :func:`read_csv` when passing simultaneously a parser in ``date_parser`` and ``parse_dates=False``, the parsing was still called (:issue:`44366`)
768+
- Bug in :func:`read_csv` not setting name of :class:`MultiIndex` columns correctly when ``index_col`` is not the first column (:issue:`38549`)
763769
- Bug in :func:`read_csv` silently ignoring errors when failing to create a memory-mapped file (:issue:`44766`)
764770
- Bug in :func:`read_csv` when passing a ``tempfile.SpooledTemporaryFile`` opened in binary mode (:issue:`44748`)
765771
-
@@ -769,6 +775,7 @@ Period
769775
- Bug in adding a :class:`Period` object to a ``np.timedelta64`` object incorrectly raising ``TypeError`` (:issue:`44182`)
770776
- Bug in :meth:`PeriodIndex.to_timestamp` when the index has ``freq="B"`` inferring ``freq="D"`` for its result instead of ``freq="B"`` (:issue:`44105`)
771777
- Bug in :class:`Period` constructor incorrectly allowing ``np.timedelta64("NaT")`` (:issue:`44507`)
778+
- Bug in :meth:`PeriodIndex.to_timestamp` giving incorrect values for indexes with non-contiguous data (:issue:`44100`)
772779
-
773780

774781
Plotting
@@ -794,6 +801,7 @@ Groupby/resample/rolling
794801
- Bug in :meth:`GroupBy.mean` failing with ``complex`` dtype (:issue:`43701`)
795802
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not calculating window bounds correctly for the first row when ``center=True`` and index is decreasing (:issue:`43927`)
796803
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` for centered datetimelike windows with uneven nanosecond (:issue:`43997`)
804+
- Bug in :meth:`GroupBy.mean` raising ``KeyError`` when column was selected at least twice (:issue:`44924`)
797805
- Bug in :meth:`GroupBy.nth` failing on ``axis=1`` (:issue:`43926`)
798806
- Fixed bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` not respecting right bound on centered datetime-like windows, if the index contain duplicates (:issue:`3944`)
799807
- Bug in :meth:`Series.rolling` and :meth:`DataFrame.rolling` when using a :class:`pandas.api.indexers.BaseIndexer` subclass that returned unequal start and end arrays would segfault instead of raising a ``ValueError`` (:issue:`44470`)
@@ -824,6 +832,7 @@ Sparse
824832
- Bug in :meth:`SparseArray.max` and :meth:`SparseArray.min` raising ``ValueError`` for arrays with 0 non-null elements (:issue:`43527`)
825833
- Bug in :meth:`DataFrame.sparse.to_coo` silently converting non-zero fill values to zero (:issue:`24817`)
826834
- Bug in :class:`SparseArray` comparison methods with an array-like operand of mismatched length raising ``AssertionError`` or unclear ``ValueError`` depending on the input (:issue:`43863`)
835+
- Bug in :class:`SparseArray` arithmetic methods ``floordiv`` and ``mod`` behaviors when dividing by zero not matching the non-sparse :class:`Series` behavior (:issue:`38172`)
827836
-
828837

829838
ExtensionArray
@@ -837,7 +846,7 @@ ExtensionArray
837846
- Bug in :func:`array` incorrectly raising when passed a ``ndarray`` with ``float16`` dtype (:issue:`44715`)
838847
- Bug in calling ``np.sqrt`` on :class:`BooleanArray` returning a malformed :class:`FloatingArray` (:issue:`44715`)
839848
- Bug in :meth:`Series.where` with ``ExtensionDtype`` when ``other`` is a NA scalar incompatible with the series dtype (e.g. ``NaT`` with a numeric dtype) incorrectly casting to a compatible NA value (:issue:`44697`)
840-
-
849+
- Fixed bug in :meth:`Series.replace` with ``FloatDtype``, ``string[python]``, or ``string[pyarrow]`` dtype not being preserved when possible (:issue:`33484`)
841850

842851
Styler
843852
^^^^^^

pandas/__init__.py

Lines changed: 5 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -23,13 +23,15 @@
2323

2424
try:
2525
from pandas._libs import hashtable as _hashtable, lib as _lib, tslib as _tslib
26-
except ImportError as e: # pragma: no cover
27-
module = e.name
26+
except ImportError as err: # pragma: no cover
27+
module = err.name
2828
raise ImportError(
2929
f"C extension: {module} not built. If you want to import "
3030
"pandas from the source directory, you may need to run "
3131
"'python setup.py build_ext --force' to build the C extensions first."
32-
) from e
32+
) from err
33+
else:
34+
del _tslib, _lib, _hashtable
3335

3436
from pandas._config import (
3537
get_option,

pandas/_libs/sparse_op_helper.pxi.in

Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -42,6 +42,11 @@ cdef inline sparse_t __mod__(sparse_t a, sparse_t b):
4242
cdef inline sparse_t __floordiv__(sparse_t a, sparse_t b):
4343
if b == 0:
4444
if sparse_t is float64_t:
45+
# Match non-sparse Series behavior implemented in mask_zero_div_zero
46+
if a > 0:
47+
return INF
48+
elif a < 0:
49+
return -INF
4550
return NaN
4651
else:
4752
return 0

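Note on the ``__floordiv__`` change above: a pure-Python sketch of the float branch after this commit, mirroring the sign handling added for division by zero. The function name ``sparse_floordiv_float`` is hypothetical, and the non-zero branch (plain ``a // b``) is an assumption, since that part of the template is not shown in the hunk.

```python
import math


def sparse_floordiv_float(a: float, b: float) -> float:
    """Floor-divide a by b, following the zero-division rules in the hunk."""
    if b == 0:
        if a > 0:
            return math.inf
        elif a < 0:
            return -math.inf
        # 0 // 0 (and NaN // 0) stays undefined.
        return math.nan
    # Assumed behavior for non-zero divisors; not part of the diff.
    return a // b
```

Before the change, the float branch returned NaN for every zero divisor regardless of the sign of ``a``.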
pandas/_libs/tslibs/offsets.pyx

Lines changed: 11 additions & 5 deletions
Original file line numberDiff line numberDiff line change
@@ -186,8 +186,9 @@ def apply_wraps(func):
186186
if self.normalize:
187187
result = result.normalize()
188188

189-
# nanosecond may be deleted depending on offset process
190-
if not self.normalize and nano != 0:
189+
# If the offset object does not have a nanoseconds component,
190+
# the result's nanosecond component may be lost.
191+
if not self.normalize and nano != 0 and not hasattr(self, "nanoseconds"):
191192
if result.nanosecond != nano:
192193
if result.tz is not None:
193194
# convert to UTC
@@ -333,7 +334,7 @@ cdef _determine_offset(kwds):
333334
# sub-daily offset - use timedelta (tz-aware)
334335
offset = timedelta(**kwds_no_nanos)
335336
else:
336-
offset = timedelta(1)
337+
offset = timedelta(0)
337338
return offset, use_relativedelta
338339

339340

@@ -1068,12 +1069,17 @@ cdef class RelativeDeltaOffset(BaseOffset):
10681069
# perform calculation in UTC
10691070
other = other.replace(tzinfo=None)
10701071

1072+
if hasattr(self, "nanoseconds"):
1073+
td_nano = Timedelta(nanoseconds=self.nanoseconds)
1074+
else:
1075+
td_nano = Timedelta(0)
1076+
10711077
if self.n > 0:
10721078
for i in range(self.n):
1073-
other = other + self._offset
1079+
other = other + self._offset + td_nano
10741080
else:
10751081
for i in range(-self.n):
1076-
other = other - self._offset
1082+
other = other - self._offset - td_nano
10771083

10781084
if tzinfo is not None and self._use_relativedelta:
10791085
# bring tz back from UTC calculation

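Note on the ``RelativeDeltaOffset`` change above: ``datetime.timedelta`` has microsecond resolution, so a nanosecond component has to be carried separately and added on each step, which is what ``td_nano`` does in the hunk. The sketch below models that idea with integer nanoseconds. It is a simplified stand-in rather than the pandas implementation, and the name ``apply_offset`` and its parameters are hypothetical.

```python
from datetime import timedelta

NS_PER_US = 1_000


def apply_offset(ts_ns: int, offset: timedelta, nanoseconds: int = 0, n: int = 1) -> int:
    """Shift an integer-nanosecond timestamp by n steps of offset + nanoseconds."""
    # timedelta cannot store nanoseconds, so convert it to whole nanoseconds
    # and add the separately tracked nanosecond component.
    offset_ns = (offset // timedelta(microseconds=1)) * NS_PER_US
    step_ns = offset_ns + nanoseconds
    # A negative n subtracts the step, matching the two loops in the hunk.
    return ts_ns + n * step_ns
```

For example, ``apply_offset(0, timedelta(seconds=1), nanoseconds=5, n=2)`` keeps the 5 ns on both steps instead of dropping them.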
pandas/_libs/tslibs/period.pyx

Lines changed: 4 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1088,6 +1088,7 @@ def period_asfreq_arr(ndarray[int64_t] arr, int freq1, int freq2, bint end):
10881088
"""
10891089
cdef:
10901090
Py_ssize_t n = len(arr)
1091+
Py_ssize_t increment = arr.strides[0] // 8
10911092
ndarray[int64_t] result = np.empty(n, dtype=np.int64)
10921093

10931094
_period_asfreq(
@@ -1097,6 +1098,7 @@ def period_asfreq_arr(ndarray[int64_t] arr, int freq1, int freq2, bint end):
10971098
freq1,
10981099
freq2,
10991100
end,
1101+
increment,
11001102
)
11011103
return result
11021104

@@ -1110,6 +1112,7 @@ cdef void _period_asfreq(
11101112
int freq1,
11111113
int freq2,
11121114
bint end,
1115+
Py_ssize_t increment=1,
11131116
):
11141117
"""See period_asfreq.__doc__"""
11151118
cdef:
@@ -1127,7 +1130,7 @@ cdef void _period_asfreq(
11271130
get_asfreq_info(freq1, freq2, end, &af_info)
11281131

11291132
for i in range(length):
1130-
val = ordinals[i]
1133+
val = ordinals[i * increment]
11311134
if val != NPY_NAT:
11321135
val = func(val, &af_info)
11331136
out[i] = val

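Note on the ``increment`` parameter above: for a non-contiguous ``int64`` array, consecutive elements are ``strides[0]`` bytes apart, so indexing the raw buffer needs a step of ``strides[0] // 8`` elements. The sketch below reproduces that arithmetic with a standard-library ``memoryview`` instead of a NumPy array (an assumption made to stay dependency-free). It uses ``itemsize`` where the diff hardcodes 8, and the helper name ``read_strided`` is hypothetical.

```python
from array import array


def read_strided(base: memoryview, view: memoryview) -> list[int]:
    """Read the elements of a strided view by indexing the contiguous base."""
    # Element step between consecutive items of the view.
    increment = view.strides[0] // view.itemsize
    # Assumes the view starts at the first element of base.
    return [base[i * increment] for i in range(len(view))]


base = memoryview(array("q", range(10)))  # "q": signed 64-bit integers
view = base[::2]                          # every second element
values = read_strided(base, view)
```

Without the multiplication by ``increment`` the loop would read ``base[0], base[1], ...``, which matches the incorrect values reported for non-contiguous data in :issue:`44100`.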