From 307ba0cf8ea34ba3ee8fef4b58a045598f3c694f Mon Sep 17 00:00:00 2001
From: Adrian Mastronardi
Date: Fri, 21 Feb 2020 22:44:23 -0300
Subject: [PATCH] DOC: Move testing and pandas_development_faq pages from wiki to doc #30232 #20501

---
 doc/source/development/index.rst               |   2 +
 .../development/pandas_development_faq.rst     | 203 +++++++++++++++
 doc/source/development/testing.rst             | 246 ++++++++++++++++++
 3 files changed, 451 insertions(+)
 create mode 100644 doc/source/development/pandas_development_faq.rst
 create mode 100644 doc/source/development/testing.rst

diff --git a/doc/source/development/index.rst b/doc/source/development/index.rst
index f8a6bb6deb52d..5b4fcc5e74e83 100644
--- a/doc/source/development/index.rst
+++ b/doc/source/development/index.rst
@@ -14,10 +14,12 @@ Development
    contributing
    code_style
+   testing
    maintaining
    internals
    extending
    developer
    policies
    roadmap
+   pandas_development_faq
    meeting

diff --git a/doc/source/development/pandas_development_faq.rst b/doc/source/development/pandas_development_faq.rst
new file mode 100644
index 0000000000000..4cd7acd139c1a
--- /dev/null
+++ b/doc/source/development/pandas_development_faq.rst
@@ -0,0 +1,203 @@
.. _pandas_development_faq:

{{ header }}

======================
Pandas Development FAQ
======================

.. contents:: Table of contents:
   :local:

Purpose
=======

Based on https://github.com/pydata/pandas/pull/4404#issuecomment-22864665, this
page gathers oft-asked questions and comments from contributors to make the
contribution process a bit less painful.

The aim is to make it easier for

* Core developers to give advice & accept new code contributions.
* New contributors to find an easier way in for quick and efficient bug fixes
  or feature additions.

While some questions/comments/advice may be applicable to general programming,
these are things that directly relate to ``pandas`` development.

* **PR == pull request**
* **core developer:** A person who contributes very frequently and is familiar
  with the code base and development process of ``pandas``.
* **contributor:** The occasional contributor, maybe from a specific domain, who
  contributes bug fixes, features or documentation with low frequency, may not
  be an everyday programmer (e.g. a scientist or engineer using pandas for data
  processing) and looks at things from an end-user perspective.

Pandas Development & Release Process
====================================

Testing
-------

**Q:** What are some recommendations for writing unit tests?

**A:** Your test should be self-contained. That is, it should preferably test a
single thing, e.g., a method that you've added to the ``DataFrame`` class. Your
test function/method should start with ``test_`` and the rest of the name should
be related to whatever functionality you're testing, like ``test_replace_with_dict_regex``.

**Q:** Help! I can't get the tests to run!

**A:** You probably either have multiple Python versions installed and there's
an ABI (application binary interface) issue, or you forgot to build the extension
modules in place. The latter can be done with

.. code-block:: shell

    python setup.py build_ext --inplace

from the ``pandas`` directory.

Travis
------

**Q:** Where do I need to change the settings in my GitHub configuration and/or
Travis configuration for Travis to start builds from my fork?

**A:** To be filled out.

**Q:** Why do I need a Travis file in my repo if it's already in the head repository?

**A:** Because we're not using subversion. Okay, seriously, it's because as far
as ``git`` is concerned *your* repository is the *only* one that exists. There's
really no such thing as a "head" repository in the eyes of ``git``; those are concepts
that we impose on it to make collaboration more effective and easier. This is one
of the nice aspects of distributed version control.

Documentation
-------------

**Q:** Does Travis build documentation?

**A:** Currently, no. There are some issues surrounding Sphinx error reporting.
We are investigating ways to solve this problem.

Workflow
--------

* What is a typical workflow on my local fork?
* Should I work in a virtual environment?
* Should I work in a virtual environment and then copy my changes over into a
  clean local fork of my own repo?

**Q:** Who will be responsible for evaluating my PR?

**A:** Technically, anyone with push rights to the ``pydata/pandas`` repository
can evaluate it. In practice, there are a handful of people who are constantly
watching the ``pandas`` repo for new PRs, so most likely it'll be one of them who
evaluates it. I'm not going to list names, but it's not that hard to figure out...

Criteria for PR
---------------

**Q:** What are the criteria for acceptance of a PR?

**A:** First and foremost, your fix **must not break any existing functionality**;
one indicator of this is that your Travis build passes. Second, just give it some
time. Everyone is busy and @wesm has not (yet?) amassed a ``pandas`` development army.

**Q:** Do I need to open an issue first?

**A:** Not necessarily. If you want to submit a documentation change, e.g., a
typo fix, then opening an issue is not necessary.

Coding Style
------------

**Q:** What level of commenting is accepted?

**A:** The common sense level. Don't overdo it on the comments, and make sure
that if you *do* comment, your comments explain *what* your code is doing, not
*how* it is doing it (that's what the code is for).

Obligatory example:

BAD:

.. code-block:: python

    # increment i
    i += 1


GOOD:

.. code-block:: python

    # add a person to the person count
    i += 1


Debugging
---------

**Q:** How can I debug without adding loads of ``print`` statements/calls everywhere?

**A:** You can use the Python standard library's ``pdb`` and set a breakpoint.
Put ``import pdb; pdb.set_trace()`` at the line where you want to stop.
``ipdb`` is ``pdb`` with tab-completion and a few other bells and whistles,
making debugging less painful. There's also ``ipdbplugin``, which allows you to
drop into ``ipdb`` from ``nose`` when a test fails via

.. code-block:: shell

    nosetests --ipdb  # or --ipdb-failures

**Q:** Would a logging hook be a solution?

**A:** That's probably a bit overkill. See the suggestions above.

Pandas Library
==============

Source Comments
---------------

* It would be nice to add more source comments so that contributors can quickly
  understand the context when chiming in to fix an issue.

Testing
-------

**Q:** Why don't test functions have a docstring?

**A:** If your tests are self-contained and aren't sprawling ecosystems of
spaghetti, then having a docstring is redundant. Also, the test name is usually
(and should be!) very descriptive. Remember there's no character limit for
variable names. We're not using FORTRAN.

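For example, here is a minimal sketch of such a self-contained test, reusing the
``test_replace_with_dict_regex`` name mentioned above purely as a hypothetical
illustration; the name, not a docstring, documents what is being checked:

.. code-block:: python

    import pandas as pd
    import pandas.util.testing as tm


    def test_replace_with_dict_regex():
        # hypothetical example: replace values using a nested dict of regexes
        df = pd.DataFrame({'a': ['foo', 'bar', 'fun']})
        result = df.replace({'a': {r'^f.*': 'baz'}}, regex=True)
        expected = pd.DataFrame({'a': ['baz', 'bar', 'baz']})
        tm.assert_frame_equal(result, expected)
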
**Q:** ``DataFrame`` and other ``pandas`` objects often have many properties/methods.
What is the level of detail that I should consider when I'm writing my test(s)?

**A:** See the previous question/answer. Strive to test one and only one thing.
You could even separate out your tests by their formal parameters if you want
things to be *really* self-contained.

**Q:** Should I consider possible corner cases of my implementation?

**A:** The answer is a resounding **YES**! In some cases you may come across
something very pathological; in those cases you should ask a core developer.

Complexity
----------

* Some modules (e.g. ``io/parsers.py``) seem to have grown very complex.
  It is very time-consuming to find out what is done where just to fix a small bug.
* Splitting them into several modules would be good.
* More in-code comments explaining why something is done, under which conditions,
  and with what expected result would help.


Docstrings
----------

* Even internal functions should have a simple one-line docstring.

diff --git a/doc/source/development/testing.rst b/doc/source/development/testing.rst
new file mode 100644
index 0000000000000..269638a062371
--- /dev/null
+++ b/doc/source/development/testing.rst
@@ -0,0 +1,246 @@
.. _testing:

{{ header }}

=======
Testing
=======

.. contents:: Table of contents:
   :local:

First off - thank you for writing test cases - they're really important in
developing pandas!

Typical Imports
===============

.. code-block:: python

    import nose
    import unittest
    import pandas.util.testing as tm
    from pandas.util.testing import makeCustomDataframe as mkdf

Making your tests behave well
=============================

``pandas`` committers run test cases after every change (as does Travis), so it's
important that you make your tests well-behaved. Balancing that, it's important
that your test cases cover the functionality of your addition, so that when
others make changes, they can be confident that they aren't introducing errors
in your code. This includes:

1. marking network-using test cases with ``@network`` (see below).
2. marking slow tests with ``@slow``.
3. using smaller test cases where it makes sense (for example, if you're
   testing a ``numexpr`` evaluation, you can generally just set ``expr._MIN_ELEMENTS = 0``
   and go ahead, rather than needing to test on a frame of at least 10K
   elements).
4. making sure to skip tests (or even test files) if a required import is not
   available.

In addition, stylistically, the preference is to group multiple related tests
under one test function and *not* to use the generator functionality of nose,
in order to keep the actual number of tests small.

E.g.:

.. code-block:: python

    @slow
    def test_million_element_arithmetic():
        df = mkdf(1000, 1000)
        tm.assert_frame_equal(df.mod(df) * df * 0, df * 0)

Additional imports
------------------

When creating a subclass of ``unittest.TestCase``, there are useful instance
methods such as ``self.assertEqual(a, b)`` that allow you to test the equality
of two objects. These are not available *as functions* in the Python standard
library. However, these methods are available as functions in the ``nose.tools``
module. To use ``self.assertEqual(a, b)`` in a plain function, you would put
``from nose.tools import assert_equal`` somewhere in the file and then call it
wherever you need it.

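A minimal sketch of what that looks like in a test module (the test body here is
only a hypothetical illustration):

.. code-block:: python

    from nose.tools import assert_equal

    import pandas as pd


    def test_series_sum():
        # plain function, not a TestCase method, so use the nose.tools helper
        s = pd.Series([1, 2, 3])
        assert_equal(s.sum(), 6)
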
**Important**: make sure to document failure conditions (and use
``assertRaisesRegexp`` where necessary to make it clearer *which* exception
you want to get). Testing for ``Exception`` is strongly discouraged.

Testing using a File
====================

This context manager allows safe read/write access to a temporary file, with
a generated filename (or your filename if provided). The file will be
automatically deleted when the context block is exited.

.. code-block:: python

    with tm.ensure_clean('my_file_path') as path:
        # do something with the path

Testing for Exceptions
======================

Generally, it's not acceptable to just check that something raises ``Exception``,
because that tends to mask a lot of errors. For example, if a function's signature
changes between releases, you could be catching the wrong kind of error altogether.
Going forward, the goal is to have no test cases that pass if ``Exception`` or a
subclass is raised (we're not quite there yet).

Another helpful tool is ``assertRaisesRegexp`` from ``pandas.util.testing``.
It lets you be very explicit about what you expect (and prevents hiding errors
such as changed signatures, etc.):

.. code-block:: python

    with tm.assertRaises(ValueError):
        raise ValueError("an error")
    with tm.assertRaisesRegexp(TypeError, 'invalid literal'):
        int('abc')

Handling tests requiring network connectivity
=============================================

**Please run your tests without an internet connection before submitting a PR!**
It's really important that your tests do *not* fail when you have no internet
connection (i.e., they should be skipped without a network connection). In
general, network tests are finicky. All tests that involve networking *must*
be marked as "network", either by using the ``network`` decorator or the
``with_connectivity_check`` decorator from ``pandas.util.testing``. Unless you
*absolutely* need to test that a function/method correctly handles connectivity
errors, you should use the ``network`` decorator, which will catch all
``IOError`` exceptions (including ``URLError``). If you believe that your test
case will only fail if you simply aren't connected to the internet, you can use
``with_connectivity_check``:

.. code-block:: python

    >>> @with_connectivity_check
    ... def test_my_function():
    ...     urllib2.urlopen("funny://rabbithead")
    >>> test_my_function()
    Traceback (most recent call last):
    ...
    URLError...#some message

If you want to have the decorator always raise errors, just pass ``raise_on_error=True``
to the ``network`` decorator:

.. code-block:: python

    >>> @network(raise_on_error=True)
    ... def test2():
    ...     raise URLError("WRONG!")
    Traceback (most recent call last):
    ...
    URLError: WRONG!

The ``with_connectivity_check`` decorator defaults to checking ``http://www.google.com``
to determine whether it is connected. But if you had a test that depends on yahoo,
it might make sense to check yahoo instead:

.. code-block:: python

    @with_connectivity_check("http://www.yahoo.com")
    def some_test_with_yahoo():
        # do something etc.

It's a good idea to break up network tests into at least two parts:

1. Tests that check that the code works and gracefully handles errors.
2. Tests that really only matter if you have network connectivity (like making
   sure that the current Google Analytics feed is being processed properly).

For (1), you might want to use ``@network(raise_on_error=True)``, because those
tests should *not* fail without connectivity.

For (2), you should definitely suppress network errors, and, particularly if you
have a slow test, you may even want to check for connectivity *first* (so the
test never even runs if there isn't a network connection). You can do that easily
by passing ``check_before_test=True`` to ``with_connectivity_check``:

.. code-block:: python

    @with_connectivity_check("http://www.somespecificsite.com", check_before_test=True)
    def some_test():
        for i in range(1000):
            test_some_really_long_function(i)

Testing for Warnings
====================

To test for warnings, you can use the ``assert_produces_warning`` context manager,
which checks that your code produces a warning.

Probably the most common case is just a test case for a ``DeprecationWarning``:

.. code-block:: python

    >>> with assert_produces_warning(DeprecationWarning):
    ...     some_function_that_raises_deprecation_warning()

With no arguments, it checks that any warning is raised.

.. code-block:: python

    >>> import warnings
    >>> with assert_produces_warning():
    ...     warnings.warn(UserWarning())
    ...

When passed ``False``, it checks that *no* warnings are raised.

.. code-block:: python

    >>> with assert_produces_warning(False):
    ...     warnings.warn(RuntimeWarning())
    ...
    Traceback (most recent call last):
    ...
    AssertionError: Caused unexpected warning(s): ['RuntimeWarning'].

Finally, if you pass it a warning class, it will check that the *specific*
class of warning was raised and no other.

.. code-block:: python

    >>> with assert_produces_warning(UserWarning):
    ...     warnings.warn(RuntimeWarning())
    Traceback (most recent call last):
    ...
    AssertionError: Did not see expected warning of class 'UserWarning'.

Reading from either a URL or zip file
=====================================

Reading from a URL
------------------

.. code-block:: python

    from pandas.io.common import urlopen
    with urlopen('http://www.google.com') as url:
        raw_text = url.read()


Reading a file named ``file.txt`` that's inside of a zip file named ``file.zip``
----------------------------------------------------------------------------------

.. code-block:: python

    from pandas.io.common import ZipFile
    with ZipFile('file.zip') as zf:
        raw_text = zf.read('file.txt')

Hook up Travis CI
=================

We use Travis to test the entire library across various Python versions.
If you `hook up your fork <http://about.travis-ci.org/docs/user/getting-started/>`_
to run Travis, then it is displayed prominently whether your pull request passes
or fails the testing suite. This is incredibly helpful.

If it shows that it passes, great! We can consider merging. If there's a failure,
this lets you and us know there is something wrong, and that it needs some attention
before it can be considered for merging.

Sometimes Travis will say a change failed for reasons unrelated to your pull
request. For example, there could be a build error or a network error. To get Travis
to retest your pull request, do the following:

.. code-block:: shell

    git commit --amend -C HEAD
    git push origin -f