Numerical stability #4694
I think your complaint is just that when you have float32s in numpy, you get more rounding error than when using Python's builtin float64s? If so then the solution is just to tell numpy to use float64. If you can't store your data using float32 in general, then you can still ask specific operations to upcast internally by using the dtype argument. (Also, in newer versions numpy uses a somewhat more numerically stable implementation for summation.)

If this doesn't resolve your problem please re-open with more details?
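A minimal sketch of the upcasting advice above (the array contents here are illustrative, not from the issue):

```python
import numpy as np

# One million copies of 0.1, stored compactly as float32.
a = np.full(1_000_000, 0.1, dtype=np.float32)

# Default: the reduction accumulates in the array's own dtype (float32).
m32 = a.mean()

# The dtype argument asks mean() to accumulate in float64 internally,
# while the stored data stays float32.
m64 = a.mean(dtype=np.float64)

print(m32, m64)
```

The data stays compact in memory; only the intermediate accumulator is widened.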
Err, when I said "If you can't store your data using float32 in general...", of course I meant float64, not float32.
dtype=np.float64 does not work in my case. array.mean must be ...

(In reply to Nathaniel J. Smith's message of 2014-05-10 17:50 GMT+03:00.)
(In reply to Lysovenko's message of Sat, May 10, 2014 at 10:23 AM.)

This is probably improved in the upcoming 1.9 version. Could you give that a try?

Chuck
On further checking... your array is all small integers, so it turns out that float32 can represent the sum of squares exactly. So the only possible way to get a more accurate ...

(This is with numpy 1.8.0.)
This way of computing the variance is numerically unstable. There is a Wikipedia page about the different numerical properties of various mathematically equivalent ways to compute it: http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Na.C3.AFve_algorithm. The Python devs seem to agree that this is 'unreal bullshit', so in PEP 450 they are adding a standard library function to compute variance. On that page they have an example with a cautionary comment.

Edit: Sorry, you probably knew this already, and you were wondering why the numpy mean is worse than your handwritten mean. I guess this is mainly because of the data types of the intermediate calculations; this seems to have a long history in the numpy bug tracker: #1033, #1063, #2448.

Edit 2: I tried this and got 0.202987 float32 for ...
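To illustrate the instability of the naive formula (a sketch with made-up data, not the issue's array): computing E[x²] − E[x]² subtracts two huge, nearly equal numbers, while the two-pass formula E[(x − E[x])²] stays accurate:

```python
import numpy as np

rng = np.random.default_rng(0)
# Data whose mean is huge relative to its spread: the worst case for
# the naive one-pass formula. The true variance is close to 1.
x = 1e8 + rng.standard_normal(10_000)

# Naive formula: catastrophic cancellation between two values near
# 1e16, which wipes out the significant digits of the true variance.
naive_var = np.mean(x**2) - np.mean(x)**2

# Two-pass formula: center the data first, then average the squares.
two_pass_var = np.mean((x - x.mean())**2)

print(naive_var, two_pass_var)
```

With an offset this large, the naive result can even come out negative, exactly the symptom described in this issue.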
No, after digging around for way too long I see that this is not true for earlier versions of numpy. It's true that float32 happens to be able to store the sum of squares (27439206) exactly, even though it is slightly larger than 2^24 = 16777216. It's also true that newer versions of numpy (presumably post-#3685) compute the sum exactly in float32 arithmetic using a slightly clever summation algorithm. But if you use the straightforward summation, then after you've accumulated a sum greater than 2^24 you start adding 64008 instead of 64009 every time you try to increment by 253², and 63000 instead of 63001 when you try to increment by 251². In this data set this happens 118 times, so the computed sum is 27439088, and this error accounts for the 'unreal bullshit' negativeness of the computed variance.
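The rounding described above can be reproduced directly (a quick check using the same numbers: 253² = 64009 and 251² = 63001):

```python
import numpy as np

# Above 2**24 = 16777216, adjacent float32 values are 2 apart, so an
# odd increment cannot land exactly and rounds to a neighbouring value.
big = np.float32(2**24)

inc_a = big + np.float32(253**2)  # tries to add 64009
inc_b = big + np.float32(251**2)  # tries to add 63001

# Both targets fall exactly halfway between representable values, and
# round-half-to-even picks the lower one: the running total gains
# 64008 and 63000 instead.
print(int(inc_a) - 2**24, int(inc_b) - 2**24)
```

Losing 1 per addition, 118 times, gives exactly the 27439206 − 27439088 = 118 discrepancy quoted above.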
There is no numerical stability in NumPy (version 1.6.2, but perhaps also in future versions).