Commit 42a1b35

author
Marc Garcia
committed
Merge remote-tracking branch 'upstream/master' into plot_api
2 parents a2330b2 + 8507170 commit 42a1b35

82 files changed, 1094 additions and 405 deletions

ci/deps/azure-37-locale.yaml

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -10,6 +10,7 @@ dependencies:
1010
- jinja2
1111
- lxml
1212
- matplotlib
13+
- moto
1314
- nomkl
1415
- numexpr
1516
- numpy
@@ -32,4 +33,3 @@ dependencies:
3233
- pip
3334
- pip:
3435
- hypothesis>=3.58.0
35-
- moto # latest moto in conda-forge fails with 3.7, move to conda dependencies when this is fixed

ci/deps/azure-windows-37.yaml

Lines changed: 1 addition & 1 deletion
@@ -10,6 +10,7 @@ dependencies:
1010
- jinja2
1111
- lxml
1212
- matplotlib=2.2.*
13+
- moto
1314
- numexpr
1415
- numpy=1.14.*
1516
- openpyxl
@@ -29,6 +30,5 @@ dependencies:
2930
- pytest-xdist
3031
- pytest-mock
3132
- pytest-azurepipelines
32-
- moto
3333
- hypothesis>=3.58.0
3434
- pyreadstat

ci/deps/travis-36-cov.yaml

Lines changed: 1 addition & 1 deletion
@@ -12,6 +12,7 @@ dependencies:
1212
- geopandas
1313
- html5lib
1414
- matplotlib
15+
- moto
1516
- nomkl
1617
- numexpr
1718
- numpy=1.15.*
@@ -46,6 +47,5 @@ dependencies:
4647
- pip:
4748
- brotlipy
4849
- coverage
49-
- moto
5050
- pandas-datareader
5151
- python-dateutil

ci/deps/travis-36-locale.yaml

Lines changed: 1 addition & 1 deletion
@@ -14,6 +14,7 @@ dependencies:
1414
- jinja2
1515
- lxml=3.8.0
1616
- matplotlib=3.0.*
17+
- moto
1718
- nomkl
1819
- numexpr
1920
- numpy
@@ -36,7 +37,6 @@ dependencies:
3637
- pytest>=4.0.2
3738
- pytest-xdist
3839
- pytest-mock
39-
- moto
4040
- pip
4141
- pip:
4242
- hypothesis>=3.58.0

doc/source/development/extending.rst

Lines changed: 19 additions & 0 deletions
@@ -208,6 +208,25 @@ will
208208
2. call ``result = op(values, ExtensionArray)``
209209
3. re-box the result in a ``Series``
210210

211+
.. _extending.extension.ufunc:
212+
213+
NumPy Universal Functions
214+
^^^^^^^^^^^^^^^^^^^^^^^^^
215+
216+
:class:`Series` implements ``__array_ufunc__``. As part of the implementation,
217+
pandas unboxes the ``ExtensionArray`` from the :class:`Series`, applies the ufunc,
218+
and re-boxes it if necessary.
219+
220+
If applicable, we highly recommend that you implement ``__array_ufunc__`` in your
221+
extension array to avoid coercion to an ndarray. See
222+
`the numpy documentation <https://docs.scipy.org/doc/numpy/reference/generated/numpy.lib.mixins.NDArrayOperatorsMixin.html>`__
223+
for an example.
224+
225+
As part of your implementation, we require that you defer to pandas when a pandas
226+
container (:class:`Series`, :class:`DataFrame`, :class:`Index`) is detected in ``inputs``.
227+
If any of those is present, you should return ``NotImplemented``. Pandas will take care of
228+
unboxing the array from the container and re-calling the ufunc with the unwrapped input.
229+
211230
.. _extending.extension.testing:
212231

213232
Testing extension arrays
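
The deferral rule added to ``extending.rst`` above can be sketched as follows. This is a minimal illustration, not a full ``ExtensionArray``: ``WrappedArray`` is a hypothetical class name, handling of the ``out`` keyword is omitted, and the direct call to ``__array_ufunc__`` at the end exists only to show the ``NotImplemented`` return value.

```python
import numpy as np
import pandas as pd
from numpy.lib.mixins import NDArrayOperatorsMixin


class WrappedArray(NDArrayOperatorsMixin):
    """Hypothetical array wrapper; not a full ExtensionArray implementation."""

    def __init__(self, values):
        self._data = np.asarray(values)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Defer to pandas when a pandas container is among the inputs.
        if any(isinstance(x, (pd.Series, pd.DataFrame, pd.Index)) for x in inputs):
            return NotImplemented

        # Unwrap our own type and apply the ufunc to the raw ndarrays.
        # Handling of the ``out`` keyword is omitted in this sketch.
        unwrapped = tuple(
            x._data if isinstance(x, WrappedArray) else x for x in inputs
        )
        result = getattr(ufunc, method)(*unwrapped, **kwargs)

        # Re-wrap the result(s); ufuncs such as np.modf return a tuple.
        if isinstance(result, tuple):
            return tuple(WrappedArray(r) for r in result)
        return WrappedArray(result)


arr = WrappedArray([1.0, 4.0, 9.0])
roots = np.sqrt(arr)  # dispatched through __array_ufunc__, stays a WrappedArray

# Calling the hook directly with a Series among the inputs shows the deferral.
deferred = arr.__array_ufunc__(np.add, "__call__", arr, pd.Series([1.0, 2.0, 3.0]))
```

Returning ``NotImplemented`` leaves the decision to pandas, which, per the text above, unboxes the array from the container and re-calls the ufunc with the unwrapped input.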

doc/source/getting_started/dsintro.rst

Lines changed: 42 additions & 8 deletions
@@ -731,28 +731,62 @@ DataFrame interoperability with NumPy functions
731731
.. _dsintro.numpy_interop:
732732

733733
Elementwise NumPy ufuncs (log, exp, sqrt, ...) and various other NumPy functions
734-
can be used with no issues on DataFrame, assuming the data within are numeric:
734+
can be used with no issues on Series and DataFrame, assuming the data within
735+
are numeric:
735736

736737
.. ipython:: python
737738
738739
np.exp(df)
739740
np.asarray(df)
740741
741-
The dot method on DataFrame implements matrix multiplication:
742+
DataFrame is not intended to be a drop-in replacement for ndarray as its
743+
indexing semantics and data model are quite different in places from an n-dimensional
744+
array.
745+
746+
:class:`Series` implements ``__array_ufunc__``, which allows it to work with NumPy's
747+
`universal functions <https://docs.scipy.org/doc/numpy/reference/ufuncs.html>`_.
748+
749+
The ufunc is applied to the underlying array in a Series.
742750

743751
.. ipython:: python
744752
745-
df.T.dot(df)
753+
ser = pd.Series([1, 2, 3, 4])
754+
np.exp(ser)
746755
747-
Similarly, the dot method on Series implements dot product:
756+
Like other parts of the library, pandas will automatically align labeled inputs
757+
as part of a ufunc with multiple inputs. For example, using :func:`numpy.remainder`
758+
on two :class:`Series` with differently ordered labels will align before the operation.
748759

749760
.. ipython:: python
750761
751-
s1 = pd.Series(np.arange(5, 10))
752-
s1.dot(s1)
762+
ser1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
763+
ser2 = pd.Series([1, 3, 5], index=['b', 'a', 'c'])
764+
ser1
765+
ser2
766+
np.remainder(ser1, ser2)
753767
754-
DataFrame is not intended to be a drop-in replacement for ndarray as its
755-
indexing semantics are quite different in places from a matrix.
768+
As usual, the union of the two indices is taken, and non-overlapping values are filled
769+
with missing values.
770+
771+
.. ipython:: python
772+
773+
ser3 = pd.Series([2, 4, 6], index=['b', 'c', 'd'])
774+
ser3
775+
np.remainder(ser1, ser3)
776+
777+
When a binary ufunc is applied to a :class:`Series` and :class:`Index`, the Series
778+
implementation takes precedence and a Series is returned.
779+
780+
.. ipython:: python
781+
782+
ser = pd.Series([1, 2, 3])
783+
idx = pd.Index([4, 5, 6])
784+
785+
np.maximum(ser, idx)
786+
787+
NumPy ufuncs are safe to apply to :class:`Series` backed by non-ndarray arrays,
788+
for example :class:`SparseArray` (see :ref:`sparse.calculation`). If possible,
789+
the ufunc is applied without converting the underlying data to an ndarray.
756790

757791
Console display
758792
~~~~~~~~~~~~~~~
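
The ``ipython`` blocks added to ``dsintro.rst`` above only render their output at doc-build time, so the sketch below spells out what the alignment rules imply for the same inputs. The variable names ``aligned``, ``partial`` and ``largest`` are introduced here for illustration; the values in the comments follow from ordinary remainder arithmetic on the aligned labels, assuming a pandas version that includes ``Series.__array_ufunc__``.

```python
import numpy as np
import pandas as pd

# Same labels in a different order: values are matched by label, not position.
ser1 = pd.Series([1, 2, 3], index=["a", "b", "c"])
ser2 = pd.Series([1, 3, 5], index=["b", "a", "c"])
aligned = np.remainder(ser1, ser2)  # a: 1 % 3, b: 2 % 1, c: 3 % 5

# Partially overlapping labels: the union is taken and gaps become NaN.
ser3 = pd.Series([2, 4, 6], index=["b", "c", "d"])
partial = np.remainder(ser1, ser3)  # "a" and "d" have no partner

# Series and Index together: the result is a Series.
ser = pd.Series([1, 2, 3])
idx = pd.Index([4, 5, 6])
largest = np.maximum(ser, idx)
```

Looking results up by label (for example ``aligned["a"]``) avoids relying on the order of the resulting index.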

doc/source/reference/arrays.rst

Lines changed: 1 addition & 0 deletions
@@ -335,6 +335,7 @@ A collection of intervals may be stored in an :class:`arrays.IntervalArray`.
335335
arrays.IntervalArray.from_arrays
336336
arrays.IntervalArray.from_tuples
337337
arrays.IntervalArray.from_breaks
338+
arrays.IntervalArray.contains
338339
arrays.IntervalArray.overlaps
339340
arrays.IntervalArray.set_closed
340341
arrays.IntervalArray.to_tuples

doc/source/reference/indexing.rst

Lines changed: 1 addition & 1 deletion
@@ -248,7 +248,6 @@ IntervalIndex components
248248
IntervalIndex.from_arrays
249249
IntervalIndex.from_tuples
250250
IntervalIndex.from_breaks
251-
IntervalIndex.contains
252251
IntervalIndex.left
253252
IntervalIndex.right
254253
IntervalIndex.mid
@@ -260,6 +259,7 @@ IntervalIndex components
260259
IntervalIndex.get_loc
261260
IntervalIndex.get_indexer
262261
IntervalIndex.set_closed
262+
IntervalIndex.contains
263263
IntervalIndex.overlaps
264264
IntervalIndex.to_tuples
265265

doc/source/user_guide/computation.rst

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@
55
Computational tools
66
===================
77

8+
89
Statistical functions
910
---------------------
1011

doc/source/user_guide/io.rst

Lines changed: 1 addition & 2 deletions
@@ -108,8 +108,7 @@ header : int or list of ints, default ``'infer'``
108108
line of data rather than the first line of the file.
109109
names : array-like, default ``None``
110110
List of column names to use. If file contains no header row, then you should
111-
explicitly pass ``header=None``. Duplicates in this list will cause
112-
a ``UserWarning`` to be issued.
111+
explicitly pass ``header=None``. Duplicates in this list are not allowed.
113112
index_col : int, str, sequence of int / str, or False, default ``None``
114113
Column(s) to use as the row labels of the ``DataFrame``, either given as
115114
string name or column index. If a sequence of int / str is given, a
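
A short sketch of the behaviour described by the updated ``names`` entry above, assuming a pandas version that includes this change. The ``data`` and ``names`` values are illustrative and not taken from the diff; ``outcome`` and ``unique_names`` are hypothetical names.

```python
from io import StringIO

import pandas as pd

data = "0,1,2\n3,4,5"      # illustrative input, two rows and no header
names = ["a", "b", "a"]    # "a" appears twice

try:
    pd.read_csv(StringIO(data), names=names)
except ValueError as err:
    outcome = f"rejected: {err}"
else:
    outcome = "parsed"

# Passing unique names is the straightforward way to avoid the error.
unique_names = ["a", "b", "a.1"]
df = pd.read_csv(StringIO(data), names=unique_names)
```

Code that previously relied on the ``UserWarning`` would need to de-duplicate the list before calling ``read_csv``.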

doc/source/whatsnew/v0.19.0.rst

Lines changed: 1 addition & 1 deletion
@@ -218,7 +218,7 @@ contained the values ``[0, 3]``.
218218
**New behavior**:
219219

220220
.. ipython:: python
221-
:okwarning:
221+
:okexcept:
222222
223223
pd.read_csv(StringIO(data), names=names)
224224

doc/source/whatsnew/v0.25.0.rst

Lines changed: 15 additions & 2 deletions
@@ -567,6 +567,7 @@ Other API changes
567567
- Using an unsupported version of Beautiful Soup 4 will now raise an ``ImportError`` instead of a ``ValueError`` (:issue:`27063`)
568568
- :meth:`Series.to_excel` and :meth:`DataFrame.to_excel` will now raise a ``ValueError`` when saving timezone aware data. (:issue:`27008`, :issue:`7056`)
569569
- :meth:`DataFrame.to_hdf` and :meth:`Series.to_hdf` will now raise a ``NotImplementedError`` when saving a :class:`MultiIndex` with extension data types for a ``fixed`` format. (:issue:`7775`)
570+
- Passing duplicate ``names`` in :meth:`read_csv` will now raise a ``ValueError`` (:issue:`17346`)
570571

571572
.. _whatsnew_0250.deprecations:
572573

@@ -615,12 +616,16 @@ Other deprecations
615616
Use the public attributes :attr:`~RangeIndex.start`, :attr:`~RangeIndex.stop` and :attr:`~RangeIndex.step` instead (:issue:`26581`).
616617
- The :meth:`Series.ftype`, :meth:`Series.ftypes` and :meth:`DataFrame.ftypes` methods are deprecated and will be removed in a future version.
617618
Instead, use :meth:`Series.dtype` and :meth:`DataFrame.dtypes` (:issue:`26705`).
619+
- The :meth:`Series.get_values`, :meth:`DataFrame.get_values`, :meth:`Index.get_values`,
620+
:meth:`SparseArray.get_values` and :meth:`Categorical.get_values` methods are deprecated.
621+
One of ``np.asarray(..)`` or :meth:`~Series.to_numpy` can be used instead (:issue:`19617`).
618622
- :meth:`Timedelta.resolution` is deprecated and replaced with :meth:`Timedelta.resolution_string`. In a future version, :meth:`Timedelta.resolution` will be changed to behave like the standard library :attr:`timedelta.resolution` (:issue:`21344`)
619623
- :func:`read_table` has been undeprecated. (:issue:`25220`)
620624
- :attr:`Index.dtype_str` is deprecated. (:issue:`18262`)
621625
- :attr:`Series.imag` and :attr:`Series.real` are deprecated. (:issue:`18262`)
622626
- :meth:`Series.put` is deprecated. (:issue:`18262`)
623627
- :meth:`Index.item` and :meth:`Series.item` are deprecated. (:issue:`18262`)
628+
- :meth:`Index.contains` is deprecated. Use ``key in index`` (``__contains__``) instead (:issue:`17753`).
624629

625630
.. _whatsnew_0250.prior_deprecations:
626631

@@ -641,6 +646,8 @@ Removal of prior version deprecations/changes
641646
- Removed the previously deprecated ``raise_on_error`` keyword argument in :meth:`DataFrame.where` and :meth:`DataFrame.mask` (:issue:`17744`)
642647
- Removed the previously deprecated ``ordered`` and ``categories`` keyword arguments in ``astype`` (:issue:`17742`)
643648
- Removed the previously deprecated ``cdate_range`` (:issue:`17691`)
649+
- Removed the previously deprecated ``True`` option for the ``dropna`` keyword argument in :func:`SeriesGroupBy.nth` (:issue:`17493`)
650+
- Removed the previously deprecated ``convert`` keyword argument in :meth:`Series.take` and :meth:`DataFrame.take` (:issue:`17352`)
644651

645652
.. _whatsnew_0250.performance:
646653

@@ -757,7 +764,7 @@ Interval
757764

758765
- Construction of :class:`Interval` is restricted to numeric, :class:`Timestamp` and :class:`Timedelta` endpoints (:issue:`23013`)
759766
- Fixed bug in :class:`Series`/:class:`DataFrame` not displaying ``NaN`` in :class:`IntervalIndex` with missing values (:issue:`25984`)
760-
-
767+
- Bug in :class:`Index` constructor where passing mixed closed :class:`Interval` objects would result in a ``ValueError`` instead of an ``object`` dtype ``Index`` (:issue:`27172`)
761768

762769
Indexing
763770
^^^^^^^^
@@ -770,7 +777,9 @@ Indexing
770777
- Bug in which :meth:`DataFrame.to_csv` caused a segfault for a reindexed data frame, when the indices were single-level :class:`MultiIndex` (:issue:`26303`).
771778
- Fixed bug where assigning a :class:`arrays.PandasArray` to a :class:`pandas.core.frame.DataFrame` would raise error (:issue:`26390`)
772779
- Allow keyword arguments for callable local reference used in the :meth:`DataFrame.query` string (:issue:`26426`)
773-
780+
- Bug which produced ``AttributeError`` on partial matching :class:`Timestamp` in a :class:`MultiIndex` (:issue:`26944`)
781+
- Bug in :meth:`DataFrame.loc` and :meth:`DataFrame.iloc` on a :class:`DataFrame` with a single timezone-aware datetime64[ns] column incorrectly returning a scalar instead of a :class:`Series` (:issue:`27110`)
782+
-
774783

775784
Missing
776785
^^^^^^^
@@ -851,6 +860,8 @@ Groupby/resample/rolling
851860
- Bug in :meth:`pandas.core.groupby.GroupBy.agg` where incorrect results are returned for uint64 columns. (:issue:`26310`)
852861
- Bug in :meth:`pandas.core.window.Rolling.median` and :meth:`pandas.core.window.Rolling.quantile` where MemoryError is raised with empty window (:issue:`26005`)
853862
- Bug in :meth:`pandas.core.window.Rolling.median` and :meth:`pandas.core.window.Rolling.quantile` where incorrect results are returned with ``closed='left'`` and ``closed='neither'`` (:issue:`26005`)
863+
- Improved :class:`pandas.core.window.Rolling`, :class:`pandas.core.window.Window` and :class:`pandas.core.window.EWM` functions to exclude nuisance columns from results instead of raising errors and raise a ``DataError`` only if all columns are nuisance (:issue:`12537`)
864+
- Bug in :meth:`pandas.core.window.Rolling.max` and :meth:`pandas.core.window.Rolling.min` where incorrect results are returned with an empty variable window (:issue:`26005`)
854865

855866
Reshaping
856867
^^^^^^^^^
@@ -882,6 +893,7 @@ Sparse
882893
- Introduce a better error message in :meth:`Series.sparse.from_coo` so it returns a ``TypeError`` for inputs that are not coo matrices (:issue:`26554`)
883894
- Bug in :func:`numpy.modf` on a :class:`SparseArray`. Now a tuple of :class:`SparseArray` is returned (:issue:`26946`).
884895

896+
885897
Build Changes
886898
^^^^^^^^^^^^^
887899

@@ -892,6 +904,7 @@ ExtensionArray
892904

893905
- Bug in :func:`factorize` when passing an ``ExtensionArray`` with a custom ``na_sentinel`` (:issue:`25696`).
894906
- :meth:`Series.count` miscounts NA values in ExtensionArrays (:issue:`26835`)
907+
- Added ``Series.__array_ufunc__`` to better handle NumPy ufuncs applied to Series backed by extension arrays (:issue:`23293`).
895908
- Keyword argument ``deep`` has been removed from :meth:`ExtensionArray.copy` (:issue:`27083`)
896909

897910
Other
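
The replacements named in the deprecation notes above for ``get_values`` and ``Index.contains`` can be sketched as below. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

ser = pd.Series([1, 2, 3])
idx = pd.Index(["a", "b", "c"])

# Instead of ser.get_values(): either spelling returns a NumPy array.
values_a = np.asarray(ser)
values_b = ser.to_numpy()

# Instead of idx.contains("b"): use the ``in`` operator (``__contains__``).
has_b = "b" in idx
has_z = "z" in idx
```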

pandas/_libs/groupby.pyx

Lines changed: 1 addition & 1 deletion
@@ -260,7 +260,7 @@ def group_shift_indexer(int64_t[:] out, const int64_t[:] labels,
260260
int ngroups, int periods):
261261
cdef:
262262
Py_ssize_t N, i, j, ii
263-
int offset, sign
263+
int offset = 0, sign
264264
int64_t lab, idxer, idxer_slot
265265
int64_t[:] label_seen = np.zeros(ngroups, dtype=np.int64)
266266
int64_t[:, :] label_indexer

pandas/_libs/hashtable.pxd

Lines changed: 1 addition & 1 deletion
@@ -41,7 +41,7 @@ cdef class StringHashTable(HashTable):
4141

4242
cdef struct Int64VectorData:
4343
int64_t *data
44-
size_t n, m
44+
Py_ssize_t n, m
4545

4646
cdef class Int64Vector:
4747
cdef Int64VectorData *data

pandas/_libs/hashtable.pyx

Lines changed: 1 addition & 1 deletion
@@ -44,7 +44,7 @@ cdef int64_t NPY_NAT = util.get_nat()
4444
_SIZE_HINT_LIMIT = (1 << 20) + 7
4545

4646

47-
cdef size_t _INIT_VEC_CAP = 128
47+
cdef Py_ssize_t _INIT_VEC_CAP = 128
4848

4949
include "hashtable_class_helper.pxi"
5050
include "hashtable_func_helper.pxi"

pandas/_libs/hashtable_class_helper.pxi.in

Lines changed: 1 addition & 1 deletion
@@ -659,7 +659,7 @@ cdef class StringHashTable(HashTable):
659659
int64_t[:] locs = np.empty(n, dtype=np.int64)
660660

661661
# these by-definition *must* be strings
662-
vecs = <char **>malloc(n * sizeof(char *))
662+
vecs = <const char **>malloc(n * sizeof(char *))
663663
for i in range(n):
664664
val = values[i]
665665

pandas/_libs/hashtable_func_helper.pxi.in

Lines changed: 1 addition & 1 deletion
@@ -241,7 +241,7 @@ def ismember_{{dtype}}({{scalar}}[:] arr, {{scalar}}[:] values):
241241

242242
# construct the table
243243
n = len(values)
244-
kh_resize_{{ttype}}(table, min(n, len(values)))
244+
kh_resize_{{ttype}}(table, n)
245245

246246
{{if dtype == 'object'}}
247247
for i in range(n):

pandas/_libs/index.pyx

Lines changed: 2 additions & 2 deletions
@@ -352,10 +352,10 @@ cdef class IndexEngine:
352352

353353
cdef Py_ssize_t _bin_search(ndarray values, object val) except -1:
354354
cdef:
355-
Py_ssize_t mid, lo = 0, hi = len(values) - 1
355+
Py_ssize_t mid = 0, lo = 0, hi = len(values) - 1
356356
object pval
357357

358-
if hi >= 0 and val > util.get_value_at(values, hi):
358+
if hi == 0 or (hi > 0 and val > util.get_value_at(values, hi)):
359359
return len(values)
360360

361361
while lo < hi:

pandas/_libs/lib.pyx

Lines changed: 8 additions & 5 deletions
@@ -76,7 +76,10 @@ def values_from_object(obj: object):
7676
""" return my values or the object if we are say an ndarray """
7777
func: object
7878

79-
func = getattr(obj, 'get_values', None)
79+
if getattr(obj, '_typ', '') == 'dataframe':
80+
return obj.values
81+
82+
func = getattr(obj, '_internal_get_values', None)
8083
if func is not None:
8184
obj = func()
8285

@@ -477,7 +480,7 @@ def maybe_indices_to_slice(ndarray[int64_t] indices, int max_len):
477480
def maybe_booleans_to_slice(ndarray[uint8_t] mask):
478481
cdef:
479482
Py_ssize_t i, n = len(mask)
480-
Py_ssize_t start, end
483+
Py_ssize_t start = 0, end = 0
481484
bint started = 0, finished = 0
482485

483486
for i in range(n):
@@ -1631,7 +1634,7 @@ def is_datetime_with_singletz_array(values: ndarray) -> bool:
16311634
Doesn't check values are datetime-like types.
16321635
"""
16331636
cdef:
1634-
Py_ssize_t i, j, n = len(values)
1637+
Py_ssize_t i = 0, j, n = len(values)
16351638
object base_val, base_tz, val, tz
16361639

16371640
if n == 0:
@@ -1913,8 +1916,8 @@ def maybe_convert_objects(ndarray[object] objects, bint try_float=0,
19131916
ndarray[int64_t] ints
19141917
ndarray[uint64_t] uints
19151918
ndarray[uint8_t] bools
1916-
ndarray[int64_t] idatetimes
1917-
ndarray[int64_t] itimedeltas
1919+
int64_t[:] idatetimes
1920+
int64_t[:] itimedeltas
19181921
Seen seen = Seen()
19191922
object val
19201923
float64_t fval, fnan
