Skip to content

DOC: Clarify documentation of 'ambiguous' parameter #23408

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Nov 4, 2018
15 changes: 14 additions & 1 deletion pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -835,7 +835,20 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
vals : ndarray[int64_t]
tz : tzinfo or None
ambiguous : str, bool, or arraylike
If arraylike, must have the same length as vals
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from 03:00
DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC
and at 01:30:00 UTC. In such a situation, the `ambiguous` parameter
dictates how ambiguous times should be handled.

- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False signifies a
non-DST time (note that this flag is only applicable for ambiguous
times, but the array must have the same length as vals)
- bool if True, treat all vals as DST. If False, treat them as non-DST
- 'NaT' will return NaT where there are ambiguous times
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this the string 'NaT', or the singleton pd.NaT (or either?) Can we clarify that?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like the allowed types are a bit inconsistent among all these functions. Can you check whether that's the docs being wrong? or are the just different?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will return the pd.NaT singleton.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about whether the input should be pd.NaT or 'NaT'.

Copy link
Member

@mroeschke mroeschke Nov 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, ambiguous accepts the input 'NaT' and should raise on pd.NaT


nonexistent : str
If arraylike, must have the same length as vals

Expand Down
7 changes: 7 additions & 0 deletions pandas/_libs/tslibs/nattype.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -559,6 +559,13 @@ class NaTType(_NaT):
None will remove timezone holding local time.

ambiguous : bool, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.

- bool contains flags to determine if time is dst or not (note
that this flag is only applicable for ambiguous fall dst dates)
- 'NaT' will return NaT for an ambiguous time
Expand Down
7 changes: 7 additions & 0 deletions pandas/_libs/tslibs/timestamps.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -974,6 +974,13 @@ class Timestamp(_Timestamp):
None will remove timezone holding local time.

ambiguous : bool, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.

- bool contains flags to determine if time is dst or not (note
that this flag is only applicable for ambiguous fall dst dates)
- 'NaT' will return NaT for an ambiguous time
Expand Down
40 changes: 40 additions & 0 deletions pandas/core/arrays/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -606,6 +606,12 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise',
Time zone to convert timestamps to. Passing ``None`` will
remove the time zone information preserving local time.
ambiguous : 'infer', 'NaT', bool array, default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.

- 'infer' will attempt to infer fall dst-transition hours based on
order
Expand Down Expand Up @@ -677,6 +683,40 @@ def tz_localize(self, tz, ambiguous='raise', nonexistent='raise',
DatetimeIndex(['2018-03-01 09:00:00', '2018-03-02 09:00:00',
'2018-03-03 09:00:00'],
dtype='datetime64[ns]', freq='D')

Be careful with DST changes. When there is sequential data, pandas can
infer the DST time:
>>> s = pd.to_datetime(pd.Series([
... '2018-10-28 01:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 03:00:00',
... '2018-10-28 03:30:00']))
>>> s.dt.tz_localize('CET', ambiguous='infer')
2018-10-28 01:30:00+02:00 0
2018-10-28 02:00:00+02:00 1
2018-10-28 02:30:00+02:00 2
2018-10-28 02:00:00+01:00 3
2018-10-28 02:30:00+01:00 4
2018-10-28 03:00:00+01:00 5
2018-10-28 03:30:00+01:00 6
dtype: int64

In some cases, inferring the DST is impossible. In such cases, you can
pass an ndarray to the ambiguous parameter to set the DST explicitly

>>> s = pd.to_datetime(pd.Series([
... '2018-10-28 01:20:00',
... '2018-10-28 02:36:00',
... '2018-10-28 03:46:00']))
>>> s.dt.tz_localize('CET', ambiguous=np.array([True, True, False]))
0 2018-10-28 01:20:00+02:00
1 2018-10-28 02:36:00+02:00
2 2018-10-28 03:46:00+01:00
dtype: datetime64[ns, CET]

"""
if errors is not None:
warnings.warn("The errors argument is deprecated and will be "
Expand Down
53 changes: 53 additions & 0 deletions pandas/core/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -8623,6 +8623,13 @@ def tz_localize(self, tz, axis=0, level=None, copy=True,
copy : boolean, default True
Also make a copy of the underlying data
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from
03:00 DST to 02:00 non-DST, 02:30:00 local time occurs both at
00:30:00 UTC and at 01:30:00 UTC. In such a situation, the
`ambiguous` parameter dictates how ambiguous times should be
handled.

- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False designates
Expand Down Expand Up @@ -8650,6 +8657,52 @@ def tz_localize(self, tz, axis=0, level=None, copy=True,
------
TypeError
If the TimeSeries is tz-aware and tz is not None.

Examples
--------

Localize local times:

>>> s = pd.Series([1],
... index=pd.DatetimeIndex(['2018-09-15 01:30:00']))
>>> s.tz_localize('CET')
2018-09-15 01:30:00+02:00 1
dtype: int64

Be careful with DST changes. When there is sequential data, pandas
can infer the DST time:

>>> s = pd.Series(range(7), index=pd.DatetimeIndex([
... '2018-10-28 01:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 02:00:00',
... '2018-10-28 02:30:00',
... '2018-10-28 03:00:00',
... '2018-10-28 03:30:00']))
>>> s.tz_localize('CET', ambiguous='infer')
2018-10-28 01:30:00+02:00 0
2018-10-28 02:00:00+02:00 1
2018-10-28 02:30:00+02:00 2
2018-10-28 02:00:00+01:00 3
2018-10-28 02:30:00+01:00 4
2018-10-28 03:00:00+01:00 5
2018-10-28 03:30:00+01:00 6
dtype: int64

In some cases, inferring the DST is impossible. In such cases, you can
pass an ndarray to the ambiguous parameter to set the DST explicitly

>>> s = pd.Series(range(3), index=pd.DatetimeIndex([
... '2018-10-28 01:20:00',
... '2018-10-28 02:36:00',
... '2018-10-28 03:46:00']))
>>> s.tz_localize('CET', ambiguous=np.array([True, True, False]))
2018-10-28 01:20:00+02:00 0
2018-10-28 02:36:00+02:00 1
2018-10-28 03:46:00+01:00 2
dtype: int64

"""
if nonexistent not in ('raise', 'NaT', 'shift'):
raise ValueError("The nonexistent argument must be one of 'raise',"
Expand Down
7 changes: 7 additions & 0 deletions pandas/core/indexes/datetimes.py
Original file line number Diff line number Diff line change
Expand Up @@ -99,6 +99,12 @@ class DatetimeIndex(DatetimeArrayMixin, DatelikeOps, TimelikeOps,
the 'left', 'right', or both sides (None)
tz : pytz.timezone or dateutil.tz.tzfile
ambiguous : 'infer', bool-ndarray, 'NaT', default 'raise'
When clocks moved backward due to DST, ambiguous times may arise.
For example in Central European Time (UTC+01), when going from 03:00
DST to 02:00 non-DST, 02:30:00 local time occurs both at 00:30:00 UTC
and at 01:30:00 UTC. In such a situation, the `ambiguous` parameter
dictates how ambiguous times should be handled.

- 'infer' will attempt to infer fall dst-transition hours based on
order
- bool-ndarray where True signifies a DST time, False signifies a
Expand Down Expand Up @@ -173,6 +179,7 @@ class DatetimeIndex(DatetimeArrayMixin, DatelikeOps, TimelikeOps,
TimedeltaIndex : Index of timedelta64 data
PeriodIndex : Index of Period data
pandas.to_datetime : Convert argument to datetime

"""
_resolution = cache_readonly(DatetimeArrayMixin._resolution.fget)

Expand Down