Commit b216f26

Merge branch 'master' into fixturize_frame_analytics
2 parents ca259f9 + 1a12c41 commit b216f26

147 files changed, 4472 additions and 3453 deletions

.travis.yml

+2-2
@@ -64,7 +64,7 @@ matrix:
6464
# In allow_failures
6565
- dist: trusty
6666
env:
67-
- JOB="3.6, NumPy dev" ENV_FILE="ci/travis-36-numpydev.yaml" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate"
67+
- JOB="3.7, NumPy dev" ENV_FILE="ci/travis-37-numpydev.yaml" TEST_ARGS="--skip-slow --skip-network -W error" PANDAS_TESTING_MODE="deprecate"
6868
addons:
6969
apt:
7070
packages:
@@ -79,7 +79,7 @@ matrix:
7979
- JOB="3.6, slow" ENV_FILE="ci/travis-36-slow.yaml" SLOW=true
8080
- dist: trusty
8181
env:
82-
- JOB="3.6, NumPy dev" ENV_FILE="ci/travis-36-numpydev.yaml" TEST_ARGS="--skip-slow --skip-network" PANDAS_TESTING_MODE="deprecate"
82+
- JOB="3.7, NumPy dev" ENV_FILE="ci/travis-37-numpydev.yaml" TEST_ARGS="--skip-slow --skip-network -W error" PANDAS_TESTING_MODE="deprecate"
8383
addons:
8484
apt:
8585
packages:

ci/doctests.sh

+3-3
@@ -21,21 +21,21 @@ if [ "$DOCTEST" ]; then
2121

2222
# DataFrame / Series docstrings
2323
pytest --doctest-modules -v pandas/core/frame.py \
24-
-k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata -transform"
24+
-k"-assign -axes -combine -isin -itertuples -join -nlargest -nsmallest -nunique -pivot_table -quantile -query -reindex -reindex_axis -replace -round -set_index -stack -to_dict -to_stata"
2525

2626
if [ $? -ne "0" ]; then
2727
RET=1
2828
fi
2929

3030
pytest --doctest-modules -v pandas/core/series.py \
31-
-k"-nlargest -nonzero -nsmallest -reindex -searchsorted -to_dict"
31+
-k"-nonzero -reindex -searchsorted -to_dict"
3232

3333
if [ $? -ne "0" ]; then
3434
RET=1
3535
fi
3636

3737
pytest --doctest-modules -v pandas/core/generic.py \
38-
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -resample -sample -to_json -to_xarray -transform -transpose -values -xs"
38+
-k"-_set_axis_name -_xs -describe -droplevel -groupby -interpolate -pct_change -pipe -reindex -reindex_axis -resample -sample -to_json -to_xarray -transpose -values -xs"
3939

4040
if [ $? -ne "0" ]; then
4141
RET=1

ci/travis-36-numpydev.yaml renamed to ci/travis-37-numpydev.yaml

+1-1
@@ -2,7 +2,7 @@ name: pandas
22
channels:
33
- defaults
44
dependencies:
5-
- python=3.6*
5+
- python=3.7*
66
- pytz
77
- Cython>=0.28.2
88
# universal

doc/source/api.rst

+6
@@ -61,6 +61,12 @@ Excel
6161
read_excel
6262
ExcelFile.parse
6363

64+
.. autosummary::
65+
:toctree: generated/
66+
:template: autosummary/class_without_autosummary.rst
67+
68+
ExcelWriter
69+
6470
JSON
6571
~~~~
6672

doc/source/contributing.rst

+57
@@ -632,6 +632,14 @@ Otherwise, you need to do it manually:
632632
warnings.warn('Use new_func instead.', FutureWarning, stacklevel=2)
633633
new_func()
634634
635+
You'll also need to
636+
637+
1. write a new test that asserts a warning is issued when calling with the deprecated argument
638+
2. update all of pandas' existing tests and code to use the new argument
639+
640+
See :ref:`contributing.warnings` for more.
641+
642+
635643
.. _contributing.ci:
636644

637645
Testing With Continuous Integration
@@ -859,6 +867,55 @@ preferred if the inputs or logic are simple, with Hypothesis tests reserved
859867
for cases with complex logic or where there are too many combinations of
860868
options or subtle interactions to test (or think of!) all of them.
861869

870+
.. _contributing.warnings:
871+
872+
Testing Warnings
873+
~~~~~~~~~~~~~~~~
874+
875+
By default, one of pandas' CI workers will fail if any unhandled warnings are emitted.
876+
877+
If your change involves checking that a warning is actually emitted, use
878+
``tm.assert_produces_warning(ExpectedWarning)``.
879+
880+
881+
.. code-block:: python
882+
883+
with tm.assert_produces_warning(FutureWarning):
884+
df.some_operation()
885+
886+
We prefer this to the ``pytest.warns`` context manager because ours checks that the warning's
887+
stacklevel is set correctly. The stacklevel is what ensures the *user's* file name and line number
888+
are printed in the warning, rather than something internal to pandas. It represents the number of
889+
function calls from user code (e.g. ``df.some_operation()``) to the function that actually emits
890+
the warning. Our linter will fail the build if you use ``pytest.warns`` in a test.
891+
892+
If you have a test that would emit a warning, but you aren't actually testing the
893+
warning itself (say because it's going to be removed in the future, or because we're
894+
matching a 3rd-party library's behavior), then use ``pytest.mark.filterwarnings`` to
895+
ignore the warning.
896+
897+
.. code-block:: python
898+
899+
@pytest.mark.filterwarnings("ignore:msg:category")
900+
def test_thing(self):
901+
...
902+
903+
If the test generates a warning of class ``category`` whose message starts
904+
with ``msg``, the warning will be ignored and the test will pass.
905+
906+
If you need finer-grained control, you can use Python's usual
907+
`warnings module <https://docs.python.org/3/library/warnings.html>`__
908+
to control whether a warning is ignored / raised at different places within
909+
a single test.
910+
911+
.. code-block:: python
912+
913+
with warnings.catch_warnings():
914+
warnings.simplefilter("ignore", FutureWarning)
915+
# Or use warnings.filterwarnings(...)
916+
917+
Alternatively, consider breaking up the unit test.
918+
862919

863920
Running the test suite
864921
----------------------
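
The warning guidance added to ``doc/source/contributing.rst`` above can be sketched with only the standard-library ``warnings`` module. This is a minimal illustration, not pandas' own test code: ``old_func`` and ``new_func`` are hypothetical names borrowed from the deprecation snippet in the diff, and ``tm.assert_produces_warning`` is deliberately not used so the block runs without pandas' test helpers.

```python
import warnings


def new_func():
    # Hypothetical replacement function, for illustration only.
    return 42


def old_func():
    # stacklevel=2 attributes the warning to the caller of old_func,
    # so the user's file and line number are reported, not this module's.
    warnings.warn("Use new_func instead.", FutureWarning, stacklevel=2)
    return new_func()


# Record the warning so a test can assert that it was actually emitted.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_func()

# Ignore the warning in one part of a test without touching global filters.
with warnings.catch_warnings(record=True) as silenced:
    warnings.simplefilter("ignore", FutureWarning)
    old_func()
```

Here ``caught`` holds one ``FutureWarning`` while ``silenced`` stays empty, which is the distinction the ``-W error`` CI job relies on: a warning that is neither asserted nor filtered would be raised as an error.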

doc/source/timeseries.rst

+81-36
@@ -21,69 +21,114 @@
2121
Time Series / Date functionality
2222
********************************
2323

24-
pandas has proven very successful as a tool for working with time series data,
25-
especially in the financial data analysis space. Using the NumPy ``datetime64`` and ``timedelta64`` dtypes,
26-
we have consolidated a large number of features from other Python libraries like ``scikits.timeseries`` as well as created
24+
pandas contains extensive capabilities and features for working with time series data for all domains.
25+
Using the NumPy ``datetime64`` and ``timedelta64`` dtypes, pandas has consolidated a large number of
26+
features from other Python libraries like ``scikits.timeseries`` as well as created
2727
a tremendous amount of new functionality for manipulating time series data.
2828

29-
In working with time series data, we will frequently seek to:
29+
For example, pandas supports:
3030

31-
* generate sequences of fixed-frequency dates and time spans
32-
* conform or convert time series to a particular frequency
33-
* compute "relative" dates based on various non-standard time increments
34-
(e.g. 5 business days before the last business day of the year), or "roll"
35-
dates forward or backward
31+
Parsing time series information from various sources and formats
3632

37-
pandas provides a relatively compact and self-contained set of tools for
38-
performing the above tasks.
33+
.. ipython:: python
34+
35+
dti = pd.to_datetime(['1/1/2018', np.datetime64('2018-01-01'), datetime(2018, 1, 1)])
36+
dti
3937
40-
Create a range of dates:
38+
Generating sequences of fixed-frequency dates and time spans
4139

4240
.. ipython:: python
4341
44-
# 72 hours starting with midnight Jan 1st, 2011
45-
rng = pd.date_range('1/1/2011', periods=72, freq='H')
46-
rng[:5]
42+
dti = pd.date_range('2018-01-01', periods=3, freq='H')
43+
dti
4744
48-
Index pandas objects with dates:
45+
Manipulating and converting date times with timezone information
4946

5047
.. ipython:: python
5148
52-
ts = pd.Series(np.random.randn(len(rng)), index=rng)
53-
ts.head()
49+
dti = dti.tz_localize('UTC')
50+
dti
51+
dti.tz_convert('US/Pacific')
5452
55-
Change frequency and fill gaps:
53+
Resampling or converting a time series to a particular frequency
5654

5755
.. ipython:: python
5856
59-
# to 45 minute frequency and forward fill
60-
converted = ts.asfreq('45Min', method='pad')
61-
converted.head()
57+
idx = pd.date_range('2018-01-01', periods=5, freq='H')
58+
ts = pd.Series(range(len(idx)), index=idx)
59+
ts
60+
ts.resample('2H').mean()
6261
63-
Resample the series to a daily frequency:
62+
Performing date and time arithmetic with absolute or relative time increments
6463

6564
.. ipython:: python
6665
67-
# Daily means
68-
ts.resample('D').mean()
66+
friday = pd.Timestamp('2018-01-05')
67+
friday.day_name()
68+
# Add 1 day
69+
saturday = friday + pd.Timedelta('1 day')
70+
saturday.day_name()
71+
# Add 1 business day (Friday --> Monday)
72+
monday = friday + pd.tseries.offsets.BDay()
73+
monday.day_name()
74+
75+
pandas provides a relatively compact and self-contained set of tools for
76+
performing the above tasks and more.
6977

7078

7179
.. _timeseries.overview:
7280

7381
Overview
7482
--------
7583

76-
The following table shows the type of time-related classes pandas can handle and
77-
how to create them.
84+
pandas captures four general time-related concepts:
85+
86+
#. Date times: A specific date and time with timezone support. Similar to ``datetime.datetime`` from the standard library.
87+
#. Time deltas: An absolute time duration. Similar to ``datetime.timedelta`` from the standard library.
88+
#. Time spans: A span of time defined by a point in time and its associated frequency.
89+
#. Date offsets: A relative time duration that respects calendar arithmetic. Similar to ``dateutil.relativedelta.relativedelta`` from the ``dateutil`` package.
7890

79-
================= =============================== ===================================================================
80-
Class Remarks How to create
81-
================= =============================== ===================================================================
82-
``Timestamp`` Represents a single timestamp ``to_datetime``, ``Timestamp``
83-
``DatetimeIndex`` Index of ``Timestamp`` ``to_datetime``, ``date_range``, ``bdate_range``, ``DatetimeIndex``
84-
``Period`` Represents a single time span ``Period``
85-
``PeriodIndex`` Index of ``Period`` ``period_range``, ``PeriodIndex``
86-
================= =============================== ===================================================================
91+
===================== ================= =================== ============================================ ========================================
92+
Concept Scalar Class Array Class pandas Data Type Primary Creation Method
93+
===================== ================= =================== ============================================ ========================================
94+
Date times ``Timestamp`` ``DatetimeIndex`` ``datetime64[ns]`` or ``datetime64[ns, tz]`` ``to_datetime`` or ``date_range``
95+
Time deltas ``Timedelta`` ``TimedeltaIndex`` ``timedelta64[ns]`` ``to_timedelta`` or ``timedelta_range``
96+
Time spans ``Period`` ``PeriodIndex`` ``period[freq]`` ``Period`` or ``period_range``
97+
Date offsets ``DateOffset`` ``None`` ``None`` ``DateOffset``
98+
===================== ================= =================== ============================================ ========================================
99+
100+
For time series data, it's conventional to represent the time component in the index of a :class:`Series` or :class:`DataFrame`
101+
so manipulations can be performed with respect to the time element.
102+
103+
.. ipython:: python
104+
105+
pd.Series(range(3), index=pd.date_range('2000', freq='D', periods=3))
106+
107+
However, :class:`Series` and :class:`DataFrame` can also directly support the time component as data itself.
108+
109+
.. ipython:: python
110+
111+
pd.Series(pd.date_range('2000', freq='D', periods=3))
112+
113+
:class:`Series` and :class:`DataFrame` have extended data type support and functionality for ``datetime`` and ``timedelta``
114+
data when the time data is used as data itself. The ``Period`` and ``DateOffset`` data will be stored as ``object`` data.
115+
116+
.. ipython:: python
117+
118+
pd.Series(pd.period_range('1/1/2011', freq='M', periods=3))
119+
pd.Series(pd.date_range('1/1/2011', freq='M', periods=3))
120+
121+
Lastly, pandas represents null date times, time deltas, and time spans as ``NaT`` which
122+
is useful for representing missing or null date-like values and behaves similarly
123+
to ``np.nan`` for float data.
124+
125+
.. ipython:: python
126+
127+
pd.Timestamp(pd.NaT)
128+
pd.Timedelta(pd.NaT)
129+
pd.Period(pd.NaT)
130+
# Equality acts as np.nan would
131+
pd.NaT == pd.NaT
87132
88133
.. _timeseries.representation:
89134

@@ -1443,7 +1488,7 @@ time. The method for this is :meth:`~Series.shift`, which is available on all of
14431488
the pandas objects.
14441489

14451490
.. ipython:: python
1446-
1491+
ts = pd.Series(range(len(rng)), index=rng)
14471492
ts = ts[:5]
14481493
ts.shift(1)
14491494
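
Two points from the rewritten ``timeseries.rst`` overview, the difference between absolute and relative increments and the ``NaN``-like behaviour of ``NaT``, can be checked with a short sketch. The variable names are illustrative and the dates are chosen to match the Friday example in the diff.

```python
import pandas as pd

# Absolute vs. relative increments, starting from a Friday.
friday = pd.Timestamp("2018-01-05")
plus_one_day = friday + pd.Timedelta("1 day")              # absolute: 24 hours later
plus_one_bday = friday + pd.tseries.offsets.BDay()          # relative: next business day

# NaT marks missing date-like values and, like np.nan, is not equal to itself.
s = pd.Series(pd.to_datetime(["2018-01-01", None, "2018-01-03"]))
nat_equal = pd.NaT == pd.NaT
missing_mask = s.isna()
filled = s.fillna(pd.Timestamp("2018-01-02"))
```

Because equality with ``NaT`` is always false, ``isna()`` is the reliable way to locate missing entries before filling or dropping them.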

doc/source/whatsnew/v0.24.0.txt

+15-6
@@ -170,9 +170,9 @@ Other Enhancements
170170
- :meth:`Series.droplevel` and :meth:`DataFrame.droplevel` are now implemented (:issue:`20342`)
171171
- Added support for reading from Google Cloud Storage via the ``gcsfs`` library (:issue:`19454`)
172172
- :func:`to_gbq` and :func:`read_gbq` signature and documentation updated to
173-
reflect changes from the `Pandas-GBQ library version 0.5.0
174-
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-5-0>`__.
175-
(:issue:`21627`)
173+
reflect changes from the `Pandas-GBQ library version 0.6.0
174+
<https://pandas-gbq.readthedocs.io/en/latest/changelog.html#changelog-0-6-0>`__.
175+
(:issue:`21627`, :issue:`22557`)
176176
- New method :meth:`HDFStore.walk` will recursively walk the group hierarchy of an HDF5 file (:issue:`10932`)
177177
- :func:`read_html` copies cell data across ``colspan`` and ``rowspan``, and it treats all-``th`` table rows as headers if ``header`` kwarg is not given and there is no ``thead`` (:issue:`17054`)
178178
- :meth:`Series.nlargest`, :meth:`Series.nsmallest`, :meth:`DataFrame.nlargest`, and :meth:`DataFrame.nsmallest` now accept the value ``"all"`` for the ``keep`` argument. This keeps all ties for the nth largest/smallest value (:issue:`16818`)
@@ -577,6 +577,7 @@ Removal of prior version deprecations/changes
577577
- Removed the ``pandas.formats.style`` shim for :class:`pandas.io.formats.style.Styler` (:issue:`16059`)
578578
- :meth:`Categorical.searchsorted` and :meth:`Series.searchsorted` have renamed the ``v`` argument to ``value`` (:issue:`14645`)
579579
- :meth:`TimedeltaIndex.searchsorted`, :meth:`DatetimeIndex.searchsorted`, and :meth:`PeriodIndex.searchsorted` have renamed the ``key`` argument to ``value`` (:issue:`14645`)
580+
- Removal of the previously deprecated module ``pandas.json`` (:issue:`19944`)
580581

581582
.. _whatsnew_0240.performance:
582583

@@ -635,7 +636,7 @@ Datetimelike
635636
- Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`, :issue:`22163`)
636637
- Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise ``OverflowError`` (:issue:`22492`, :issue:`22508`)
637638
- Bug in :class:`DatetimeIndex` incorrectly allowing indexing with ``Timedelta`` object (:issue:`20464`)
638-
-
639+
- Bug in :class:`DatetimeIndex` where frequency was being set if original frequency was ``None`` (:issue:`22150`)
639640

640641
Timedelta
641642
^^^^^^^^^
@@ -645,6 +646,7 @@ Timedelta
645646
- Bug in :class:`Series` with numeric dtype when adding or subtracting an array or ``Series`` with ``timedelta64`` dtype (:issue:`22390`)
646647
- Bug in :class:`Index` with numeric dtype when multiplying or dividing an array with dtype ``timedelta64`` (:issue:`22390`)
647648
- Bug in :class:`TimedeltaIndex` incorrectly allowing indexing with ``Timestamp`` object (:issue:`20464`)
649+
- Fixed bug where subtracting :class:`Timedelta` from an object-dtyped array would raise ``TypeError`` (:issue:`21980`)
648650
-
649651
-
650652

@@ -720,13 +722,16 @@ Indexing
720722
- Bug where mixed indexes wouldn't allow integers for ``.at`` (:issue:`19860`)
721723
- ``Float64Index.get_loc`` now raises ``KeyError`` when boolean key passed. (:issue:`19087`)
722724
- Bug in :meth:`DataFrame.loc` when indexing with an :class:`IntervalIndex` (:issue:`19977`)
725+
- :class:`Index` no longer mangles ``None``, ``NaN`` and ``NaT``, i.e. they are treated as three different keys. However, for numeric Index all three are still coerced to a ``NaN`` (:issue:`22332`)
723726

724727
Missing
725728
^^^^^^^
726729

727730
- Bug in :func:`DataFrame.fillna` where a ``ValueError`` would raise when one column contained a ``datetime64[ns, tz]`` dtype (:issue:`15522`)
728731
- Bug in :func:`Series.hasnans` that could be incorrectly cached and return incorrect answers if null elements are introduced after an initial call (:issue:`19700`)
729-
- :func:`Series.isin` now treats all nans as equal also for ``np.object``-dtype. This behavior is consistent with the behavior for float64 (:issue:`22119`)
732+
- :func:`Series.isin` now treats all NaN-floats as equal also for ``np.object``-dtype. This behavior is consistent with the behavior for float64 (:issue:`22119`)
733+
- :func:`unique` no longer mangles NaN-floats and the ``NaT``-object for ``np.object``-dtype, i.e. ``NaT`` is no longer coerced to a NaN-value and is treated as a different entity. (:issue:`22295`)
734+
730735

731736
MultiIndex
732737
^^^^^^^^^^
@@ -742,6 +747,8 @@ I/O
742747
- :func:`read_excel()` will correctly show the deprecation warning for previously deprecated ``sheetname`` (:issue:`17994`)
743748
- :func:`read_csv()` will correctly parse timezone-aware datetimes (:issue:`22256`)
744749
- :func:`read_sas()` will parse numbers in sas7bdat-files that have width less than 8 bytes correctly. (:issue:`21616`)
750+
- :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
751+
- :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
745752

746753
Plotting
747754
^^^^^^^^
@@ -761,6 +768,7 @@ Groupby/Resample/Rolling
761768
- Bug in :meth:`Resampler.apply` when passing positional arguments to applied func (:issue:`14615`).
762769
- Bug in :meth:`Series.resample` when passing ``numpy.timedelta64`` to ``loffset`` kwarg (:issue:`7687`).
763770
- Bug in :meth:`Resampler.asfreq` when frequency of ``TimedeltaIndex`` is a subperiod of a new frequency (:issue:`13022`).
771+
- Bug in :meth:`SeriesGroupBy.mean` when values were integral but could not fit inside of int64, overflowing instead. (:issue:`22487`)
764772

765773
Sparse
766774
^^^^^^
@@ -796,7 +804,8 @@ Other
796804
- :meth:`~pandas.io.formats.style.Styler.background_gradient` now takes a ``text_color_threshold`` parameter to automatically lighten the text color based on the luminance of the background color. This improves readability with dark background colors without the need to limit the background colormap range. (:issue:`21258`)
797805
- Require at least 0.28.2 version of ``cython`` to support read-only memoryviews (:issue:`21688`)
798806
- :meth:`~pandas.io.formats.style.Styler.background_gradient` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` (:issue:`15204`)
799-
- :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax``. ``NaN`` values are also handled properly. (:issue:`21548`, :issue:`21526`)
807+
- :meth:`~pandas.io.formats.style.Styler.bar` now also supports tablewise application (in addition to rowwise and columnwise) with ``axis=None`` and setting clipping range with ``vmin`` and ``vmax`` (:issue:`21548` and :issue:`21526`). ``NaN`` values are also handled properly.
808+
- Logical operations ``&, |, ^`` between :class:`Series` and :class:`Index` will no longer raise ``ValueError`` (:issue:`22092`)
800809
-
801810
-
802811
-
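
One enhancement listed in the ``v0.24.0.txt`` hunks above, the ``keep="all"`` option for ``nlargest`` and ``nsmallest``, is easiest to see on a series with a tie. A small sketch, with made-up data and labels:

```python
import pandas as pd

# Two entries tie for the largest value.
s = pd.Series([3, 3, 2, 1], index=["a", "b", "c", "d"])

first_only = s.nlargest(1)              # default keep="first": one row
with_ties = s.nlargest(1, keep="all")   # every row tied for the nth place
```

With ``keep="all"`` the result can contain more than ``n`` rows, so downstream code should not assume a fixed length.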

pandas/__init__.py

-3
@@ -61,9 +61,6 @@
6161
# extension module deprecations
6262
from pandas.util._depr_module import _DeprecatedModule
6363

64-
json = _DeprecatedModule(deprmod='pandas.json',
65-
moved={'dumps': 'pandas.io.json.dumps',
66-
'loads': 'pandas.io.json.loads'})
6764
parser = _DeprecatedModule(deprmod='pandas.parser',
6865
removals=['na_values'],
6966
moved={'CParserError': 'pandas.errors.ParserError'})
