Commit e32d5be

committed
Doc and cleanup
1 parent 4b588cd commit e32d5be

3 files changed (+16, -13 lines)

doc/source/io.rst

Lines changed: 12 additions & 8 deletions
@@ -469,10 +469,11 @@ Individual columns can be parsed as a ``Categorical`` using a dict specification
 
    pd.read_csv(StringIO(data), dtype={'col1': 'category'}).dtypes
 
-Specifying ``dtype='cateogry'`` will result in a ``Categorical`` that is
-unordered, and whose ``categories`` are the unique values observed in the data.
-For more control on the categories and order, create a
-:class:`~pandas.api.types.CategoricalDtype` ahead of time.
+Specifying ``dtype='category'`` will result in an unordered ``Categorical``
+whose ``categories`` are the unique values observed in the data. For more
+control over the categories and order, create a
+:class:`~pandas.api.types.CategoricalDtype` ahead of time, and pass it as
+that column's ``dtype``.
 
 .. ipython:: python
 
@@ -483,10 +484,13 @@ For more control on the categories and order, create a
 
 .. note::
 
-   The resulting categories will always be parsed as strings (object dtype).
-   If the categories are numeric they can be converted using the
-   :func:`to_numeric` function, or as appropriate, another converter
-   such as :func:`to_datetime`.
+   With ``dtype='category'``, the resulting categories will always be parsed
+   as strings (object dtype). If the categories are numeric, they can be
+   converted using the :func:`to_numeric` function or, as appropriate, another
+   converter such as :func:`to_datetime`.
+
+   When ``dtype`` is a ``CategoricalDtype`` with homogeneous ``categories``
+   (all numeric, all datetimes, etc.), the conversion is done automatically.
 
 .. ipython:: python
 

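A minimal sketch of the behavior the io.rst change documents, assuming a recent pandas release (2.x) and an assumed sample `data` string shaped like the io.rst examples. It contrasts dtype='category', where the inferred categories stay strings and are converted afterwards with to_numeric (via rename_categories, since recent pandas no longer allows assigning to .cat.categories directly), with passing a CategoricalDtype whose numeric categories are converted during parsing and whose category set and order are fixed up front.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed sample data, shaped like the io.rst examples.
data = "col1,col2,col3\na,b,1\na,b,2\nc,d,3"

# dtype='category': categories are inferred from the data and kept as strings.
df = pd.read_csv(StringIO(data), dtype="category")
print(df["col3"].cat.categories)

# Convert the string categories to numbers after parsing.
col3 = df["col3"].cat.rename_categories(pd.to_numeric(df["col3"].cat.categories))
print(col3.cat.categories)

# A CategoricalDtype with numeric categories converts during parsing,
# and also fixes the category set and their order ahead of time.
dtype = CategoricalDtype([3, 2, 1], ordered=True)
df2 = pd.read_csv(StringIO(data), dtype={"col3": dtype})
print(df2["col3"].cat.categories)
print(df2["col3"].cat.ordered)
```

The values of df2["col3"] are still 1, 2, 3; only the categories (and their order) come from the CategoricalDtype rather than from the data.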
doc/source/whatsnew/v0.21.0.txt

Lines changed: 2 additions & 0 deletions
@@ -163,6 +163,8 @@ Other Enhancements
 - :func:`Categorical.rename_categories` now accepts a dict-like argument as `new_categories` and only updates the categories found in that dict. (:issue:`17336`)
 - :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
 - :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names
+- Pass a :class:`~pandas.api.types.CategoricalDtype` to :meth:`read_csv` to parse categorical
+  data as numeric, datetime, or timedelta values instead of strings. See :ref:`here <io.categorical>`. (:issue:`17643`)
 
 
 .. _whatsnew_0210.api_breaking:

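The whatsnew entry also covers datetime and timedelta categories. A short sketch, assuming a recent pandas release (2.x) and an assumed two-column CSV (the column names `when` and `wait` are illustrative only): known datetime or timedelta categories make read_csv convert the parsed strings instead of keeping them as text.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Assumed sample data with a date column and a duration column.
data = "when,wait\n2017-01-01,1 days\n2017-01-02,2 days\n2017-01-01,1 days"

dtypes = {
    # Known datetime categories: parsed strings are converted to Timestamps.
    "when": CategoricalDtype(pd.to_datetime(["2017-01-01", "2017-01-02"])),
    # Known timedelta categories: parsed strings are converted to Timedeltas.
    "wait": CategoricalDtype(pd.to_timedelta(["1 days", "2 days"])),
}
df = pd.read_csv(StringIO(data), dtype=dtypes)
print(df.dtypes)
print(df["when"].cat.categories)
print(df["wait"].cat.categories)
```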
pandas/_libs/parsers.pyx

Lines changed: 2 additions & 5 deletions
@@ -1274,7 +1274,8 @@ cdef class TextReader:
                 na_hashset, self.c_encoding)
             cats = Index(cats)
 
-            # Here is where we'll do the casting...
+            # Determine if we should convert inferred string
+            # categories to a specialized type
             if (isinstance(dtype, CategoricalDtype) and
                     dtype.categories is not None):
                 if dtype.categories.is_numeric():
@@ -1283,13 +1284,9 @@ cdef class TextReader:
                 elif dtype.categories.is_all_dates:
                     # is ignore correct?
                     if is_datetime64_dtype(dtype.categories):
-                        print("before", cats)
                         cats = to_datetime(cats, errors='ignore')
-                        print("after", cats)
                     else:
-                        print("before", cats)
                         cats = to_timedelta(cats, errors='ignore')
-                        print("after", cats)
 
             if (isinstance(dtype, CategoricalDtype) and
                     dtype.categories is not None):

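The conversion branch in parsers.pyx can be approximated in pure Python to show the idea. This is a sketch under stated assumptions, not the pandas implementation: parse_categorical_column and _convert_or_keep are hypothetical names; it checks dtypes (is_numeric_dtype, dtype.kind) instead of the Index.is_numeric() and is_all_dates attributes used in the diff, since those are deprecated in recent pandas; and it emulates errors='ignore' with try/except, because that option is also deprecated in recent releases.

```python
import pandas as pd
from pandas.api.types import CategoricalDtype, is_numeric_dtype


def _convert_or_keep(convert, values):
    """Hypothetical helper emulating errors='ignore': keep values if conversion fails."""
    try:
        return convert(values)
    except (ValueError, TypeError):
        return values


def parse_categorical_column(raw_values, dtype):
    """Hypothetical helper: turn parsed strings into a Categorical for `dtype`."""
    # Like the parser, start from the unique strings observed in the data.
    strings = sorted(set(raw_values))
    cats = pd.Index(strings, dtype=object)

    if isinstance(dtype, CategoricalDtype) and dtype.categories is not None:
        kind = dtype.categories.dtype.kind
        # Convert inferred string categories to the type of the known ones.
        if is_numeric_dtype(dtype.categories):
            cats = _convert_or_keep(pd.to_numeric, cats)
        elif kind == "M":
            cats = _convert_or_keep(pd.to_datetime, cats)
        elif kind == "m":
            cats = _convert_or_keep(pd.to_timedelta, cats)
        # Map each raw string to its converted value, then apply the user's dtype.
        lookup = dict(zip(strings, cats))
        return pd.Categorical([lookup[s] for s in raw_values], dtype=dtype)

    # dtype='category' (no known categories): the categories stay strings.
    # Ordering flags are not handled on this path in the sketch.
    return pd.Categorical(raw_values, categories=cats)
```

For example, parse_categorical_column(["1", "2", "1"], CategoricalDtype([1, 2, 3])) yields integer categories, while passing "category" keeps "1" and "2" as string categories. In this sketch, if conversion fails the inferred categories remain strings, so entries that do not match the known categories become NaN.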