Commit 064445f

ENH: add regex functionality to DataFrame.replace
add default of None to to_replace
add ability to pass regex as to_replace regex
Remove cruft
more tests and add ability to pass regex and value
Make exceptions more clear; push examples to missing_data.rst
remove interpolation call
make inplace work across axes in interpolate method
ability to use nested dicts for regexs and others
mostly doc updates
formatting
infer_types correction
rls notes
1 parent 5f22794 commit 064445f

6 files changed: +946 -114 lines changed


doc/source/api.rst

Lines changed: 1 addition & 1 deletion
@@ -465,6 +465,7 @@ Missing data handling
465465

466466
DataFrame.dropna
467467
DataFrame.fillna
468+
DataFrame.replace
468469

469470
Reshaping, sorting, transposing
470471
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -492,7 +493,6 @@ Combining / joining / merging
492493
DataFrame.append
493494
DataFrame.join
494495
DataFrame.merge
495-
DataFrame.replace
496496
DataFrame.update
497497

498498
Time series-related

doc/source/missing_data.rst

Lines changed: 127 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -334,6 +334,133 @@ missing and interpolate over them:
334334
335335
ser.replace([1, 2, 3], method='pad')
336336
337+
String/Regular Expression Replacement
338+
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
339+
340+
.. note::
341+
342+
Python strings prefixed with the ``r`` character such as ``r'hello world'``
343+
are so-called "raw" strings. They have different semantics regarding
344+
backslashes than strings without this prefix. Backslashes in raw strings
345+
will be interpreted as an escaped backslash, e.g., ``r'\n' == '\\n'``. You
346+
should `read about them
347+
<http://docs.python.org/2/reference/lexical_analysis.html#string-literals>`_
348+
if this is unclear.
349+
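As a quick check of the raw-string behaviour described in the note, the following sketch compares raw and ordinary literals using only the standard library. The sample strings are chosen here for illustration and are not part of the commit.

```python
import re

# A raw literal keeps the backslash; the ordinary literal needs it doubled.
assert r'\s' == '\\s'
assert len(r'\n') == 2      # backslash followed by 'n'
assert len('\n') == 1       # a single newline character

# Without the r prefix, '\1' is the control character chr(1),
# not a reference to the first regex group.
assert '\1stuff' == chr(1) + 'stuff'
assert r'\1stuff' == '\\1stuff'

# Only the raw form acts as a group reference in a replacement string.
raw_result = re.sub(r'(a)', r'\1stuff', 'a')
plain_result = re.sub(r'(a)', '\1stuff', 'a')
print(repr(raw_result))    # 'astuff'
print(repr(plain_result))  # '\x01stuff'
```

This is why replacement strings that refer to groups are best written as raw strings.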
350+
Replace the '.' with ``nan`` (str -> str)
351+
352+
.. ipython:: python
353+
354+
from numpy.random import rand, randn
355+
from numpy import nan
356+
from pandas import DataFrame
357+
d = {'a': range(4), 'b': list('ab..'), 'c': ['a', 'b', nan, 'd']}
358+
df = DataFrame(d)
359+
df.replace('.', nan)
360+
361+
Now do it with a regular expression that removes surrounding whitespace
362+
(regex -> regex)
363+
364+
.. ipython:: python
365+
366+
df.replace(r'\s*\.\s*', nan, regex=True)
367+
368+
Replace a few different values (list -> list)
369+
370+
.. ipython:: python
371+
372+
df.replace(['a', '.'], ['b', nan])
373+
374+
Replace using a list of regular expressions (list of regex -> list of regex)
375+
376+
.. ipython:: python
377+
378+
df.replace([r'\.', r'(a)'], ['dot', r'\1stuff'], regex=True)
379+
380+
Only search in column ``'b'`` (dict -> dict)
381+
382+
.. ipython:: python
383+
384+
df.replace({'b': '.'}, {'b': nan})
385+
386+
Same as the previous example, but use a regular expression for
387+
searching instead (dict of regex -> dict)
388+
389+
.. ipython:: python
390+
391+
df.replace({'b': r'\s*\.\s*'}, {'b': nan}, regex=True)
392+
393+
You can pass nested dictionaries of regular expressions that use ``regex=True``
394+
395+
.. ipython:: python
396+
397+
df.replace({'b': {'b': r''}}, regex=True)
398+
399+
or you can pass the nested dictionary like so
400+
401+
.. ipython:: python
402+
403+
df.replace(regex={'b': {'b': r'\s*\.\s*'}})
404+
405+
You can also use the group of a regular expression match when replacing (dict
406+
of regex -> dict of regex); this works for lists as well
407+
408+
.. ipython:: python
409+
410+
df.replace({'b': r'\s*(\.)\s*'}, {'b': r'\1ty'}, regex=True)
411+
412+
You can pass a list of regular expressions, of which those that match
413+
will be replaced with a scalar (list of regex -> regex)
414+
415+
.. ipython:: python
416+
417+
df.replace([r'\s*\.\s*', r'a|b'], nan, regex=True)
418+
419+
All of the regular expression examples can also be passed with the
420+
``to_replace`` argument as the ``regex`` argument. In this case the ``value``
421+
argument must be passed explicitly by name or ``regex`` must be a nested
422+
dictionary. The previous example, in this case, would then be
423+
424+
.. ipython:: python
425+
426+
df.replace(regex=[r'\s*\.\s*', r'a|b'], value=nan)
427+
428+
This can be convenient if you do not want to pass ``regex=True`` every time you
429+
want to use a regular expression.
430+
431+
.. note::
432+
433+
Anywhere in the above ``replace`` examples that you see a regular expression,
434+
a compiled regular expression is valid as well.
435+
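For readers unfamiliar with compiled patterns, the sketch below builds one with ``re.compile`` and checks it against hypothetical cell values. It uses only the standard library and shows how the object is created. Per the note above, such an object could then stand in for the pattern string in the ``replace`` calls.

```python
import re

# Compile once, reuse many times; the pattern is the one used earlier.
dot = re.compile(r'\s*\.\s*')

values = ['a', 'b', '.', ' . ']   # hypothetical column values
matches = [dot.fullmatch(v) is not None for v in values]
print(matches)  # [False, False, True, True]
```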
436+
Numeric Replacement
437+
^^^^^^^^^^^^^^^^^^^
438+
439+
Similar to ``DataFrame.fillna``
440+
441+
.. ipython:: python
442+
443+
from numpy.random import rand, randn
444+
from numpy import nan
445+
from pandas import DataFrame
446+
from pandas.util.testing import assert_frame_equal
447+
df = DataFrame(randn(10, 2))
448+
df[rand(df.shape[0]) > 0.5] = 1.5
449+
df.replace(1.5, nan)
450+
451+
Replacing more than one value via lists works as well
452+
453+
.. ipython:: python
454+
455+
df00 = df.values[0, 0]
456+
df.replace([1.5, df00], [nan, 'a'])
457+
df[1].dtype
458+
459+
You can also operate on the DataFrame in place
460+
461+
.. ipython:: python
462+
463+
df.replace(1.5, nan, inplace=True)
337464
338465
Missing data casting rules and indexing
339466
---------------------------------------

doc/source/v0.11.1.txt

Lines changed: 4 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -55,6 +55,9 @@ Enhancements
5555
- ``fillna`` methods now raise a ``TypeError`` if the ``value`` parameter is
5656
a list or tuple.
5757
- Added module for reading and writing Stata files: pandas.io.stata (GH1512_)
58+
- ``DataFrame.replace()`` now allows regular expressions on contained
59+
``Series`` with object dtype. See the examples section in the regular docs
60+
and the generated documentation for the method for more details.
5861

5962
See the `full release notes
6063
<https://github.com/pydata/pandas/blob/master/RELEASE.rst>`__ or issue tracker
@@ -70,3 +73,4 @@ on GitHub for a complete list.
7073
.. _GH3590: https://github.com/pydata/pandas/issues/3590
7174
.. _GH3435: https://github.com/pydata/pandas/issues/3435
7275
.. _GH1512: https://github.com/pydata/pandas/issues/1512
76+
.. _GH2285: https://github.com/pydata/pandas/issues/2285
