Doc and cleanup

TomAugspurger · TomAugspurger · commit e32d5be89781 · 2017-09-25T13:32:59.000-05:00
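
For reviewers, a minimal sketch of the behavior the doc changes below describe. The sample CSV and the variable names (`data`, `inferred`, `typed`) are made up for illustration and are not taken from the test suite; the sketch assumes a pandas build that includes this change.

```python
from io import StringIO

import pandas as pd
from pandas.api.types import CategoricalDtype

# Hypothetical sample data, used only for this illustration.
data = "col1,col2\n1,a\n2,b\n1,a\n"

# dtype='category': categories are inferred from the data and stay strings.
inferred = pd.read_csv(StringIO(data), dtype={"col1": "category"})
print(inferred["col1"].cat.categories)

# CategoricalDtype with numeric categories: the parsed string categories
# are converted to numbers before being matched against the dtype.
dtype = CategoricalDtype(categories=[1, 2, 3], ordered=True)
typed = pd.read_csv(StringIO(data), dtype={"col1": dtype})
print(typed["col1"].cat.categories)
```

The second call keeps the declared category `3` even though it never appears in the data, which is the extra control that `dtype='category'` alone does not give.
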
diff --git a/doc/source/io.rst b/doc/source/io.rst
@@ -469,10 +469,11 @@ Individual columns can be parsed as a ``Categorical`` using a dict specification
 
    pd.read_csv(StringIO(data), dtype={'col1': 'category'}).dtypes
 
-Specifying ``dtype='cateogry'`` will result in a ``Categorical`` that is
-unordered, and whose ``categories`` are the unique values observed in the data.
-For more control on the categories and order, create a
-:class:`~pandas.api.types.CategoricalDtype` ahead of time.
+Specifying ``dtype='category'`` will result in an unordered ``Categorical``
+whose ``categories`` are the unique values observed in the data. For more
+control on the categories and order, create a
+:class:`~pandas.api.types.CategoricalDtype` ahead of time, and pass it as
+that column's ``dtype``.
 
 .. ipython:: python
 
@@ -483,10 +484,13 @@ For more control on the categories and order, create a
 
 .. note::
 
-   The resulting categories will always be parsed as strings (object dtype).
-   If the categories are numeric they can be converted using the
-   :func:`to_numeric` function, or as appropriate, another converter
-   such as :func:`to_datetime`.
+   With ``dtype='category'``, the resulting categories will always be parsed
+   as strings (object dtype). If the categories are numeric they can be
+   converted using the :func:`to_numeric` function, or as appropriate, another
+   converter such as :func:`to_datetime`.
+
+   When ``dtype`` is a ``CategoricalDtype`` with homogeneous ``categories``
+   (all numeric, all datetimes, etc.), the conversion is done automatically.
 
    .. ipython:: python
 
diff --git a/doc/source/whatsnew/v0.21.0.txt b/doc/source/whatsnew/v0.21.0.txt
@@ -163,6 +163,8 @@ Other Enhancements
 - :func:`Categorical.rename_categories` now accepts a dict-like argument as `new_categories` and only updates the categories found in that dict. (:issue:`17336`)
 - :func:`read_excel` raises ``ImportError`` with a better message if ``xlrd`` is not installed. (:issue:`17613`)
 - :meth:`DataFrame.assign` will preserve the original order of ``**kwargs`` for Python 3.6+ users instead of sorting the column names
+- Pass a :class:`~pandas.api.types.CategoricalDtype` to :func:`read_csv` to parse categorical
+  data as numeric, datetimes, or timedeltas, instead of strings. See :ref:`here <io.categorical>`. (:issue:`17643`)
 
 
 .. _whatsnew_0210.api_breaking:
diff --git a/pandas/_libs/parsers.pyx b/pandas/_libs/parsers.pyx
@@ -1274,7 +1274,8 @@ cdef class TextReader:
                 na_hashset, self.c_encoding)
             cats = Index(cats)
 
-            # Here is where we'll do the casting...
+            # Determine if we should convert inferred string
+            # categories to a specialized type
             if (isinstance(dtype, CategoricalDtype) and
                     dtype.categories is not None):
                 if dtype.categories.is_numeric():
@@ -1283,13 +1284,9 @@ cdef class TextReader:
                 elif dtype.categories.is_all_dates:
                     # is ignore correct?
                     if is_datetime64_dtype(dtype.categories):
-                        print("before", cats)
                         cats = to_datetime(cats, errors='ignore')
-                        print("after", cats)
                     else:
-                        print("before", cats)
                         cats = to_timedelta(cats, errors='ignore')
-                        print("after", cats)
 
             if (isinstance(dtype, CategoricalDtype) and
                     dtype.categories is not None):