TST: Make understandable assertion messages #10373

Closed
sinhrks opened this issue Jun 17, 2015 · 5 comments · Fixed by #10507
Labels
Testing pandas testing functions or related to the test suite
Milestone

Comments

@sinhrks
Member

sinhrks commented Jun 17, 2015

I think more understandable assertion messages would help users who are starting to contribute to pandas or who use pandas as a dependency. I'm considering something like power assert, but focused more on what the differences are.

If this is worth discussing, I'll prepare more examples to pin down the specification.

df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])
df2 = pd.DataFrame({'A': [1, 2], 'B': [3, 5]}, index=['a', 'c'])

assert_frame_equal(df1, df2)
# AssertionError: DataFrames are not equal
# [values]: [[1 3] [2 4]] != [[1 3] [2 5]]
# [index values]: ['a', 'b'] != ['a', 'c']
# [columns]: equal

idx1 = pd.Index([1, 2], name='x')
idx2 = pd.Index([1, 2], name='y')
assert_index_equal(idx1, idx2)
# AssertionError: Index are not equal
# [values]: equal
# [names]: 'x' != 'y' 
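
The report format proposed above could be produced by a small helper along the following lines. This is only a sketch under stated assumptions: `format_report` and its arguments are hypothetical names, not pandas API, and plain Python values stand in for the index values and names.

```python
def format_report(kind, parts):
    """Build a message listing each compared part as equal or different.

    ``parts`` maps a part name to a ``(left, right)`` pair.
    Hypothetical helper for illustration only; not part of pandas.
    """
    lines = ["{} are not equal".format(kind)]
    for name, (left, right) in parts.items():
        if left == right:
            lines.append("[{}]: equal".format(name))
        else:
            lines.append("[{}]: {!r} != {!r}".format(name, left, right))
    return "\n".join(lines)


# Plain lists and strings stand in for Index values and names.
message = format_report(
    "Index",
    {"values": ([1, 2], [1, 2]), "names": ("x", "y")},
)
print(message)
```

A real implementation would compare pandas objects rather than plain lists, so the equality test would need to be an array-aware comparison.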

The first targets should be:

  • assert_frame_equal
  • assert_series_equal
  • assert_index_equal

sinhrks added the Testing label Jun 17, 2015
sinhrks added this to the 0.17.0 milestone Jun 17, 2015

@bashtage
Contributor

Maybe just list differences?

# AssertionError: Index are not equal
# Differences found:
# Index:
# [names]: 'x' != 'y' 

so that if there are multiple issues, one might see

# AssertionError: Index are not equal
# Differences found:
# Index:
# [index names]: 'x' != 'y' 
# [index values] ['a', 'b'] != ['a', 'c']
# Values:
# [[1 3] [2 4]] != [[1 3] [2 5]]

This way, an assertion could do a full DataFrame comparison before raising, rather than raising on the first difference found (which can mean many fix-and-rerun iterations when there are multiple problems).
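
A minimal sketch of this "collect everything, then raise once" idea is shown below. The function name `assert_all_equal` and the `parts` mapping are assumptions made for illustration, and nested Python lists stand in for the DataFrame's index, columns, and values.

```python
def assert_all_equal(kind, parts):
    """Compare every part first, then raise once listing all differences.

    ``parts`` maps a part name to a ``(left, right)`` pair.
    Hypothetical helper for illustration only; not part of pandas.
    """
    diffs = [
        "[{}]: {!r} != {!r}".format(name, left, right)
        for name, (left, right) in parts.items()
        if left != right
    ]
    if diffs:
        raise AssertionError(
            "{} are not equal\nDifferences found:\n{}".format(kind, "\n".join(diffs))
        )


try:
    assert_all_equal(
        "DataFrame",
        {
            "index values": (["a", "b"], ["a", "c"]),
            "columns": (["A", "B"], ["A", "B"]),
            "values": ([[1, 3], [2, 4]], [[1, 3], [2, 5]]),
        },
    )
except AssertionError as exc:
    report = str(exc)
    print(report)
```

The trade-off is that every comparison runs even after the first one fails, so a failing test does more work in exchange for a complete report.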

@sinhrks
Member Author

sinhrks commented Jun 19, 2015

I see two approaches, and I now feel option 2 is preferable.

1. Use the current assertion function internally, and output details only when it fails.

This should not increase testing time much.

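# Rename the current ``assert_frame_equal`` to ``_assert_frame_equal``.
def assert_frame_equal(left, right, **kwargs):
    try:
        _assert_frame_equal(left, right, **kwargs)
    except AssertionError:
        # Placeholder: compute and report the full differences,
        # which only happens when the cheap check has already failed.
        output_detailed_differences(left, right)
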
2. Output more understandable errors at each assertion step (rather than outputting all the differences).

This is similar to what numpy does.
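
To make option 2 concrete, here is a rough sketch of step-by-step checks that stop at the first failing step with a message specific to that step. The names `check_index`, `_fail`, and `first_failure` are hypothetical, the message wording is illustrative rather than a settled specification, and plain sequences stand in for pandas Index objects.

```python
def _fail(summary, left, right):
    """Raise with a summary line followed by both sides of the comparison."""
    raise AssertionError(
        "{}\n[left]:  {!r}\n[right]: {!r}".format(summary, left, right)
    )


def check_index(left, right, left_name=None, right_name=None):
    """Check shape, then values, then metadata; fail at the first mismatch."""
    if len(left) != len(right):
        _fail("Index shapes are not equal", len(left), len(right))
    if list(left) != list(right):
        _fail("Index values are not equal", list(left), list(right))
    if left_name != right_name:
        _fail('Index metadata "name" are not equal', left_name, right_name)


def first_failure(*args, **kwargs):
    """Return the failure message, or None when everything matches."""
    try:
        check_index(*args, **kwargs)
    except AssertionError as exc:
        return str(exc)
    return None


shape_msg = first_failure([1, 2, 3], [1, 2, 3, 4])
value_msg = first_failure([1, 3, 2], [1, 2, 3])
name_msg = first_failure([1, 2, 3], [1, 2, 3], left_name="x")
print(shape_msg, value_msg, name_msg, sep="\n\n")
```

Because each step raises immediately, later steps can assume earlier ones passed; for example, the values check never sees sequences of different lengths.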

@shoyer
Member

shoyer commented Jun 23, 2015

Agreed that option 2 is better. Option 1 is a path to madness.

@jreback
Contributor

jreback commented Jun 23, 2015

agreed, a slightly more informative assert_*_equal would be good

@sinhrks
Member Author

sinhrks commented Jun 24, 2015

Let me summarize the idea based on option 2, comparing it to the numpy behavior.

import numpy as np
import pandas as pd
import pandas.util.testing as pdt
import numpy.testing as npt

Index

Currently, pandas always shows the same assertion message, as below:

pdt.assert_index_equal(pd.Index([1, 2, 3]), pd.Index([1, 2, 3, 4]))
# AssertionError: [index] left [int64 Int64Index([1, 2, 3], dtype='int64')], right [Int64Index([1, 2, 3, 4], dtype='int64') int64]

I propose changing the assertion message to indicate what the difference is, based on the following 3 points, as numpy does.

  • Shape (size)
  • Values
  • Metadata (name, freq, etc.; specific to pandas)
# numpy

# shape (size)
npt.assert_array_equal(np.array([1, 2, 3]), np.array([1, 2, 3, 4]))
# AssertionError: 
# Arrays are not equal
# 
# (shapes (3L,), (4L,) mismatch)
#  x: array([1, 2, 3])
#  y: array([1, 2, 3, 4])

# values
npt.assert_array_equal(np.array([1, 3, 2]), np.array([1, 2, 3]))
# AssertionError: 
# Arrays are not equal
# 
# (mismatch 66.6666666667%)
#  x: array([1, 3, 2])
#  y: array([1, 2, 3])
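
The mismatch percentage in the numpy message above is the share of element positions that differ. A minimal sketch of that calculation in plain Python follows; `mismatch_percent` is a hypothetical name, and lists stand in for arrays.

```python
def mismatch_percent(left, right):
    """Return the percentage of positions whose elements differ.

    Assumes both sequences have the same length, since the shape
    check is expected to run before the values check.
    """
    if len(left) != len(right):
        raise ValueError("sequences must have the same length")
    if not left:
        return 0.0
    differing = sum(1 for a, b in zip(left, right) if a != b)
    return 100.0 * differing / len(left)


pct = mismatch_percent([1, 3, 2], [1, 2, 3])
# Format with 12 significant digits, matching the message shown above.
print("(mismatch {:.12g}%)".format(pct))
```

Here two of the three positions differ, which gives the 66.6666666667% shown in the numpy output.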

Series

Messages differ by category, as below, but they are not clear enough.

  • Shape (size): AssertionError: Length of two iterators not the same: 3 != 4.
  • Index (same as above)
  • Values: AssertionError: expected 4.00000 but got 3.00000, with decimal 5.
  • Metadata (name): AssertionError: attr is not equal [name]: None != 'x'

DataFrame

Messages differ by category, as below, but they are not clear enough.

  • Shape (size): AssertionError: Length of two iterators not the same: 3 != 4.
  • Index (same as above)
  • Column (same as above)
  • Values: AssertionError: expected 4.00000 but got 3.00000, with decimal 5.

Preferable message format

  • A short description of which part is different
  • A [left] and [right] comparison for that subcategory

For example:

assert_index_equal(Index([1, 3, 2]), Index([1, 2, 3]))
# AssertionError: 
# Index values are not equal
# 
# (mismatch 66.6666666667%)
#  [left]: Index([1, 3, 2])
#  [right]: Index([1, 2, 3])

assert_index_equal(Index([1, 2, 3], name='x'), Index([1, 2, 3]))
# AssertionError: 
# Index metadata are not equal
# 
# metadata "name" are not equal
#  [left]: 'x'
#  [right]: None
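
As a rough illustration of this layout, the message could be assembled from four pieces: a summary, a detail line, and the two sides. `build_message` is a hypothetical helper written for this sketch, not an existing pandas function.

```python
def build_message(summary, detail, left, right):
    """Format a failure as: blank line, summary, blank line, detail, both sides."""
    lines = [
        "",
        summary,
        "",
        detail,
        " [left]: {!r}".format(left),
        " [right]: {!r}".format(right),
    ]
    return "\n".join(lines)


message = build_message(
    "Index metadata are not equal",
    'metadata "name" are not equal',
    "x",
    None,
)
print(message)
```

Keeping the formatting in one place would let `assert_index_equal`, `assert_series_equal`, and `assert_frame_equal` share the same layout while supplying their own summary and detail text.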


4 participants