Sum/product of empty object array is False/True #10639

Closed
shoyer opened this issue Feb 21, 2018 · 5 comments

Comments

@shoyer
Member

shoyer commented Feb 21, 2018

This strange behavior came up in pandas-dev/pandas#19813:

In [23]: np.array([], dtype=object).sum()
Out[23]: False

In [26]: np.array([], dtype=object).prod()
Out[26]: True

It's almost as if NumPy picks a dtype at random (e.g., bool in this case) to use for computing the result. A more obvious and likely more consistent choice would be to use the identities for the appropriate ufuncs (i.e., np.add.identity and np.multiply.identity), which are the integers 0 and 1, respectively.
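A minimal sketch for comparing the ufunc identities with the empty reductions side by side. Whether the printed values are booleans or integers depends on the installed NumPy version, so no output is shown here; the variable names are only illustrative.

```python
import numpy as np

empty = np.array([], dtype=object)

# Identities reported by the ufuncs themselves.
add_identity = np.add.identity
mul_identity = np.multiply.identity

# Results of reducing an empty object array.
empty_sum = empty.sum()
empty_prod = empty.prod()

# repr() makes the difference between False/True and 0/1 visible.
print(repr(add_identity), repr(empty_sum))
print(repr(mul_identity), repr(empty_prod))
```

Using repr() matters because False == 0 and True == 1 in Python, so a plain equality check would hide the discrepancy reported above.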

@eric-wieser
Member

eric-wieser commented Feb 21, 2018

Duplicate of #8860

This is fixed by #8955, and is because:

>>> np.add.identity
False
>>> np.multiply.identity
True

Prior to #8952, these would report 0 and 1, but the results of sum and prod would still be booleans.

This is fixed in #8952 (master), which now causes your example to return 0 and 1.

However, np.logical_or.reduce(np.array([], object)) now returns 0 instead of False, which is less good.
This would be fixed by #8955

@seberg
Member

seberg commented Feb 22, 2018

Should it be the long-term goal to get rid of identities for object arrays, assuming we have an identity/default kwarg?

@TomAugspurger

What's the status here? Do we think np.logical_or.reduce(np.array([], object)) will change, and if so, will that "break" the "fix" for the original issue?

@seberg
Member

seberg commented Jul 7, 2018

Well, we have an initial kwarg now (next release), so you will be able to override the default in the future. Note that there are some limitations to this: initial is also used for non-empty reductions, so it has to be a valid starting value there. It is not a default that applies only when the reduction is empty.

Given a good reason, we could add an additional default argument, though it would be useful pretty much only for object dtype (plausibly NaN can sometimes make sense in this regard, too).

I do not think we currently plan on a deprecation to force initial to be given for object dtypes, or on changing the default itself to 0 and 1; either would probably give ugly FutureWarnings.
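To illustrate the difference between initial and a default, here is a rough sketch. It assumes a NumPy version that already has the initial keyword on reductions. The helper sum_with_default is hypothetical and not part of NumPy; it only mimics what a separate default argument might do.

```python
import numpy as np


def sum_with_default(values, default):
    """Hypothetical helper (not a NumPy API): use `default` only if empty."""
    arr = np.asarray(values, dtype=object)
    if arr.size == 0:
        return default
    return arr.sum()


empty = np.array([], dtype=object)
filled = np.array([1, 2, 3], dtype=object)

# `initial` seeds the reduction, so it contributes for non-empty input too.
initial_empty = np.add.reduce(empty, initial=10)    # 10
initial_filled = np.add.reduce(filled, initial=10)  # 10 + 1 + 2 + 3 = 16

# A default is returned only for empty input and ignored otherwise.
default_empty = sum_with_default(empty, 10)    # 10
default_filled = sum_with_default(filled, 10)  # 1 + 2 + 3 = 6

print(initial_empty, initial_filled, default_empty, default_filled)
```

This is why initial must be a valid starting value for the operation: for a sum only 0 leaves non-empty results unchanged, whereas a default could be any sentinel.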

@WarrenWeckesser
Member

The examples shown above are all working now:

In [1]: import numpy as np

In [2]: np.__version__
Out[2]: '1.22.0.dev0+1504.gcfb981013'

In [3]: np.array([], dtype=object).sum()
Out[3]: 0

In [4]: np.array([], dtype=object).prod()
Out[4]: 1

In [5]: np.logical_or.reduce(np.array([], object))
Out[5]: False

I'm closing the issue, but if there are still problems lurking here, we can reopen or start a new issue.
