From 307ba0cf8ea34ba3ee8fef4b58a045598f3c694f Mon Sep 17 00:00:00 2001
From: Adrian Mastronardi
Date: Fri, 21 Feb 2020 22:44:23 -0300
Subject: [PATCH] DOC: Move testing and pandas_development_faq pages from wiki to doc #30232 #20501

---
 doc/source/development/index.rst               |   2 +
 .../development/pandas_development_faq.rst     | 203 +++++++++++++++
 doc/source/development/testing.rst             | 246 ++++++++++++++++++
 3 files changed, 451 insertions(+)
 create mode 100644 doc/source/development/pandas_development_faq.rst
 create mode 100644 doc/source/development/testing.rst

diff --git a/doc/source/development/index.rst b/doc/source/development/index.rst
index f8a6bb6deb52d..5b4fcc5e74e83 100644
--- a/doc/source/development/index.rst
+++ b/doc/source/development/index.rst
@@ -14,10 +14,12 @@ Development
    contributing
    code_style
+   testing
    maintaining
    internals
    extending
    developer
    policies
    roadmap
+   pandas_development_faq
    meeting

diff --git a/doc/source/development/pandas_development_faq.rst b/doc/source/development/pandas_development_faq.rst
new file mode 100644
index 0000000000000..4cd7acd139c1a
--- /dev/null
+++ b/doc/source/development/pandas_development_faq.rst
@@ -0,0 +1,203 @@
.. _pandas_development_faq:

{{ header }}

======================
Pandas Development FAQ
======================

.. contents:: Table of contents:
   :local:

Purpose
=======

Based on https://github.com/pydata/pandas/pull/4404#issuecomment-22864665, this
page gathers oft-asked questions and comments from contributors to make the
contribution process a bit less painful.

The aim is to make it easier for

* Core developers to give advice & accept new code contributions.
* New contributors to find an easier way in for quick and efficient bug fixes
  or feature additions.

While some questions/comments/advice may be applicable to general programming,
these are things that directly relate to ``pandas`` development.

* **PR == pull request**
* **core developer:** A person who contributes very frequently and is familiar
  with the code base and development process of ``pandas``.
* **contributor:** The occasional contributor, maybe from a specific domain, who
  contributes bug fixes, features or documentation with low frequency, may not
  be an everyday programmer (e.g. a scientist or engineer using pandas for data
  processing) and looks at things from an end-user perspective.

Pandas Development & Release Process
====================================

Testing
-------

**Q:** What are some recommendations for writing unit tests?

**A:** Your test should be self-contained. That is, it should preferably test a
single thing, e.g., a method that you've added to the ``DataFrame`` class. Your
test function/method should start with ``test_`` and the rest of the name should
be related to whatever functionality you're testing, like ``test_replace_with_dict_regex``.

**Q:** Help! I can't get the tests to run!

**A:** You probably either have multiple Python versions installed and there's
an ABI (application binary interface) issue, or you forgot to build the extension
modules in place. The latter can be done with

.. code-block:: shell

    python setup.py build_ext --inplace

from the ``pandas`` directory.

Travis
------

**Q:** Where do I need to change the settings in my GitHub configuration and/or
Travis configuration for Travis to start builds from my fork?

**A:** To be filled out.

**Q:** Why do I need a Travis file in my repo if it's already in the head repository?

**A:** Because we're not using subversion. Okay, seriously, it's because as far
as ``git`` is concerned *your* repository is the *only* one that exists. There's
really no such thing as a "head" repository in the eyes of ``git``; those are concepts
that we impose on it to make collaboration more effective and easier. This is one
of the nice aspects of distributed version control.

Documentation
-------------

**Q:** Does Travis build documentation?

**A:** Currently, no. There are some issues surrounding Sphinx error reporting.
We are investigating ways to solve this problem.

Workflow
--------

* What is a typical workflow on my local fork?
* Should I work in a virtual environment?
* Should I work in a virtual environment and then copy my changes over into a
  clean local fork of my own repo?

**Q:** Who will be responsible for evaluating my PR?

**A:** Technically, anyone with push rights to the ``pydata/pandas`` repository
can evaluate it. In practice, there are a handful of people who are constantly
watching the ``pandas`` repo for new PRs, so most likely it'll be one of them who
evaluates it. I'm not going to list names, but it's not that hard to figure out...

Criteria for PR
---------------

**Q:** What are the criteria for acceptance of a PR?

**A:** First and foremost, your fix **must not break any existing functionality**;
one indicator of this is that your Travis build passes. Second, just give it some
time. Everyone is busy and @wesm has not (yet?) amassed a ``pandas`` development army.

**Q:** Do I need to open an issue first?

**A:** Not necessarily. If you want to submit a documentation change, e.g., a
typo fix, then opening an issue is not necessary.

Coding Style
------------

**Q:** What level of commenting is accepted?

**A:** The common sense level. Don't overdo it on the comments, and make sure
that if you *do* comment, your comments explain *what* your code is doing, not
*how* it is doing it (that's what the code is for).

Obligatory example:

BAD:

.. code-block:: python

    # increment i
    i += 1


GOOD:

.. code-block:: python

    # add a person to the person count
    i += 1


Debugging
---------

**Q:** How can I debug without adding loads of ``print`` statements/calls everywhere?

**A:** You can use the Python standard library's ``pdb`` and set a breakpoint.
Put ``import pdb; pdb.set_trace()`` at the line where you want to stop.
``ipdb`` is ``pdb`` with tab-completion and a few other bells and whistles,
making debugging less painful. There's also ``ipdbplugin``, which allows you to
drop into ``ipdb`` from ``nose`` when a test fails via

.. code-block:: shell

    nosetests --ipdb  # or --ipdb-failures

**Q:** Would a logging hook be a solution?

**A:** That's probably a bit overkill. See the suggestions above.

Pandas Library
==============

Source Comments
---------------

* It would be nice to add more source comments so that contributors can quickly
  understand the context when chiming in to fix an issue.

Testing
-------

**Q:** Why don't test functions have a docstring?

**A:** If your tests are self-contained and aren't sprawling ecosystems of
spaghetti, then having a docstring is redundant. Also, the test name is usually
(and should be!) very descriptive. Remember there's no character limit for
variable names. We're not using FORTRAN.

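For example, here is a minimal sketch of such a self-contained test, reusing the
``test_replace_with_dict_regex`` name mentioned above purely as a hypothetical
illustration; the name, not a docstring, documents what is being checked:

.. code-block:: python

    import pandas as pd
    import pandas.util.testing as tm


    def test_replace_with_dict_regex():
        # hypothetical example: replace values using a nested dict of regexes
        df = pd.DataFrame({'a': ['foo', 'bar', 'fun']})
        result = df.replace({'a': {r'^f.*': 'baz'}}, regex=True)
        expected = pd.DataFrame({'a': ['baz', 'bar', 'baz']})
        tm.assert_frame_equal(result, expected)
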
**Q:** ``DataFrame`` and other ``pandas`` objects often have many properties/methods.
What is the level of detail that I should consider when I'm writing my test(s)?

**A:** See the previous question/answer. Strive to test one and only one thing.
You could even separate out your tests by their formal parameters if you want
things to be *really* self-contained.

**Q:** Should I consider possible corner cases of my implementation?

**A:** The answer is a resounding **YES**! In some cases you may come across
something very pathological; in those cases you should ask a core developer.

Complexity
----------

* Some modules (e.g. ``io/parsers.py``) seem to have grown very complex.
  It is very time-consuming to find out what is done where just to fix a small bug.
* Splitting them into several modules would be good.
* More in-code comments explaining why something is done, under which conditions,
  and with what expected result would help.


Docstrings
----------

* Even internal functions should have a simple one-line docstring.

diff --git a/doc/source/development/testing.rst b/doc/source/development/testing.rst
new file mode 100644
index 0000000000000..269638a062371
--- /dev/null
+++ b/doc/source/development/testing.rst
@@ -0,0 +1,246 @@
.. _testing:

{{ header }}

=======
Testing
=======

.. contents:: Table of contents:
   :local:

First off - thank you for writing test cases - they're really important in
developing pandas!

Typical Imports
===============

.. code-block:: python

    import nose
    import unittest
    import pandas.util.testing as tm
    from pandas.util.testing import makeCustomDataframe as mkdf

Making your tests behave well
=============================

``pandas`` committers run test cases after every change (as does Travis), so it's
important that you make your tests well-behaved. Balancing that, it's important
that your test cases cover the functionality of your addition, so that when
others make changes, they can be confident that they aren't introducing errors
in your code. This includes:

1. marking network-using test cases with ``@network`` (see below).
2. marking slow tests with ``@slow``.
3. using smaller test cases where it makes sense (for example, if you're
   testing a ``numexpr`` evaluation, you can generally just set ``expr._MIN_ELEMENTS = 0``
   and go ahead, rather than needing to test on a frame of at least 10K
   elements).
4. making sure to skip tests (or even test files) if a required import is not
   available.

In addition, stylistically, the preference is to group multiple related tests
under one test function and *not* to use the generator functionality of nose,
in order to keep the actual number of tests small.

E.g.:

.. code-block:: python

    @slow
    def test_million_element_arithmetic():
        df = mkdf(1000, 1000)
        tm.assert_frame_equal(df.mod(df) * df * 0, df * 0)

Additional imports
------------------

When creating a subclass of ``unittest.TestCase``, there are useful instance
methods such as ``self.assertEqual(a, b)`` that allow you to test the equality
of two objects. These are not available *as functions* in the Python standard
library. However, these methods are available as functions in the ``nose.tools``
module. To use ``self.assertEqual(a, b)`` in a plain function, you would put
``from nose.tools import assert_equal`` somewhere in the file and then call it
wherever you need it.

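A minimal sketch of what that looks like in a test module (the test body here is
only a hypothetical illustration):

.. code-block:: python

    from nose.tools import assert_equal

    import pandas as pd


    def test_series_sum():
        # plain function, not a TestCase method, so use the nose.tools helper
        s = pd.Series([1, 2, 3])
        assert_equal(s.sum(), 6)
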
**Important**: make sure to document failure conditions (and use
``assertRaisesRegexp`` where necessary to make it clearer *which* exception
you want to get). Testing for ``Exception`` is strongly discouraged.

Testing using a File
====================

This context manager allows safe read/write access to a temporary file, with
a generated filename (or your filename if provided). The file will be
automatically deleted when the context block is exited.

.. code-block:: python

    with tm.ensure_clean('my_file_path') as path:
        # do something with the path

Testing for Exceptions
======================

Generally, it's not acceptable to just check that something raises ``Exception``,
because that tends to mask a lot of errors. For example, if a function's signature
changes between releases, you could be catching the wrong kind of error altogether.
Going forward, the goal is to have no test cases that pass if ``Exception`` or a
subclass is raised (we're not quite there yet).

Another helpful tool is ``assertRaisesRegexp`` from ``pandas.util.testing``.
It lets you be very explicit about what you expect (and prevents hiding errors
such as changed signatures, etc.):

.. code-block:: python

    with tm.assertRaises(ValueError):
        raise ValueError("an error")
    with tm.assertRaisesRegexp(TypeError, 'invalid literal'):
        int('abc')

Handling tests requiring network connectivity
=============================================

**Please run your tests without an internet connection before submitting a PR!**
It's really important that your tests do *not* fail when you have no internet
connection (i.e., they should be skipped without a network connection). In
general, network tests are finicky. All tests that involve networking *must*
be marked as "network", either by using the ``network`` decorator or the
``with_connectivity_check`` decorator from ``pandas.util.testing``. Unless you
*absolutely* need to test that a function/method correctly handles connectivity
errors, you should use the ``network`` decorator, which will catch all
``IOError`` exceptions (including ``URLError``). If you believe that your test
case will only fail if you simply aren't connected to the internet, you can use
``with_connectivity_check``:

.. code-block:: python

    >>> @with_connectivity_check
    ... def test_my_function():
    ...     urllib2.urlopen("funny://rabbithead")
    >>> test_my_function()
    Traceback (most recent call last):
    ...
    URLError...#some message

If you want to have the decorator always raise errors, just pass ``raise_on_error=True``
to the ``network`` decorator:

.. code-block:: python

    >>> @network(raise_on_error=True)
    ... def test2():
    ...     raise URLError("WRONG!")
    Traceback (most recent call last):
    ...
    URLError: WRONG!

The ``with_connectivity_check`` decorator defaults to checking ``http://www.google.com``
to determine whether it is connected. But if you had a test that depends on yahoo,
it might make sense to check yahoo instead:

.. code-block:: python

    @with_connectivity_check("http://www.yahoo.com")
    def some_test_with_yahoo():
        # do something etc.

It's a good idea to break up network tests into at least two parts:

1. Tests that check that the code works and gracefully handles errors.
2. Tests that really only matter if you have network connectivity (like making
   sure that the current Google Analytics feed is being processed properly).

For (1), you might want to use ``@network(raise_on_error=True)``, because those
tests should *not* fail without connectivity.

For (2), you should definitely suppress network errors, and, particularly if you
have a slow test, you may even want to check for connectivity *first* (so the
test never even runs if there isn't a network connection). You can do that easily
by passing ``check_before_test=True`` to ``with_connectivity_check``:

.. code-block:: python

    @with_connectivity_check("http://www.somespecificsite.com", check_before_test=True)
    def some_test():
        for i in range(1000):
            test_some_really_long_function(i)

Testing for Warnings
====================

To test for warnings, you can use the ``assert_produces_warning`` context manager,
which checks that your code produces a warning.

Probably the most common case is just a test case for a ``DeprecationWarning``:

.. code-block:: python

    >>> with assert_produces_warning(DeprecationWarning):
    ...     some_function_that_raises_deprecation_warning()

With no arguments, it checks that any warning is raised.

.. code-block:: python

    >>> import warnings
    >>> with assert_produces_warning():
    ...     warnings.warn(UserWarning())
    ...

When passed ``False``, it checks that *no* warnings are raised.

.. code-block:: python

    >>> with assert_produces_warning(False):
    ...     warnings.warn(RuntimeWarning())
    ...
    Traceback (most recent call last):
    ...
    AssertionError: Caused unexpected warning(s): ['RuntimeWarning'].

Finally, if you pass it a warning class, it will check that the *specific*
class of warning was raised and no other.

.. code-block:: python

    >>> with assert_produces_warning(UserWarning):
    ...     warnings.warn(RuntimeWarning())
    Traceback (most recent call last):
    ...
    AssertionError: Did not see expected warning of class 'UserWarning'.

Reading from either a URL or zip file
=====================================

Reading from a URL
------------------

.. code-block:: python

    from pandas.io.common import urlopen
    with urlopen('http://www.google.com') as url:
        raw_text = url.read()


Reading a file named ``file.txt`` that's inside of a zip file named ``file.zip``
----------------------------------------------------------------------------------

.. code-block:: python

    from pandas.io.common import ZipFile
    with ZipFile('file.zip') as zf:
        raw_text = zf.read('file.txt')

Hook up Travis CI
=================

We use Travis to test the entire library across various Python versions.
If you `hook up your fork <http://about.travis-ci.org/docs/user/getting-started/>`_
to run Travis, then it is displayed prominently whether your pull request passes
or fails the testing suite. This is incredibly helpful.

If it shows that it passes, great! We can consider merging. If there's a failure,
this lets you and us know there is something wrong, and that it needs some attention
before it can be considered for merging.

Sometimes Travis will say a change failed for reasons unrelated to your pull
request. For example, there could be a build error or a network error. To get Travis
to retest your pull request, do the following:

.. code-block:: shell

    git commit --amend -C HEAD
    git push origin -f