Skip to content
This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Fix NumericRange[Double] #489

Closed
wants to merge 4 commits into from
Closed

Conversation

eed3si9n
Copy link
Member

@eed3si9n eed3si9n commented Mar 3, 2018

Ref scala/bug#8518
Ref scala/bug#9875
Ref scala/bug#9874
Ref scala/bug#8620
Ref scala/bug#8670

NumericRange[Double] has been broken because various parts of the code assumes sane plus operation, which is not possible with Double or Float. Within a few iterations of adding 0.1 or 0.2 it can quickly accumulate floating errors that are surprising.

scala> import collection.immutable.NumericRange
import collection.immutable.NumericRange

scala> val r = NumericRange.inclusive(1.0, 2.0, 0.2)(scala.math.Numeric.DoubleAsIfIntegral)
r: scala.collection.immutable.NumericRange.Inclusive[Double] = NumericRange 1.0 to 2.0 by 0.2

scala> assert(r(3) == r.toList(3), s"${r(3)} != ${r.toList(3)}")
java.lang.AssertionError: assertion failed: 1.6 != 1.5999999999999999

scala> val x = NumericRange.inclusive(0.0, 1.0, 0.1)(scala.math.Numeric.DoubleAsIfIntegral)
x: scala.collection.immutable.NumericRange.Inclusive[Double] = NumericRange 0.0 to 1.0 by 0.1

scala> x.drop(3).take(3).toList
res3: List[Double] = List(0.30000000000000004, 0.4) // take(3) mean taking three thing usually

When encountering Double or Float, this internally uses BigDecimal, which can add numbers without losing precision. This will produce range iterator that looks like List(1.0, 1.2, 1.4, 1.6, 1.8, 2.0) when converted into List, as opposed to List(1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998).

Ref scala/bug#8518
Ref scala/bug#9875
Ref scala/bug#9874

NumericRange[Double] has been broken because various parts of the code assumes sane plus operation, which is not possible with Double or Float. Within a few iterations of adding 0.1 or 0.2 it can quickly accumulate floating errors that are surprising.

```scala
scala> import collection.immutable.NumericRange
import collection.immutable.NumericRange

scala> val r = NumericRange.inclusive(1.0, 2.0, 0.2)(scala.math.Numeric.DoubleAsIfIntegral)
r: scala.collection.immutable.NumericRange.Inclusive[Double] = NumericRange 1.0 to 2.0 by 0.2

scala> assert(r(3) == r.toList(3), s"${r(3)} != ${r.toList(3)}")
java.lang.AssertionError: assertion failed: 1.6 != 1.5999999999999999

scala> val x = NumericRange.inclusive(0.0, 1.0, 0.1)(scala.math.Numeric.DoubleAsIfIntegral)
x: scala.collection.immutable.NumericRange.Inclusive[Double] = NumericRange 0.0 to 1.0 by 0.1

scala> x.drop(3).take(3).toList
res3: List[Double] = List(0.30000000000000004, 0.4)
```

When encountering Double or Float, this internally uses BigDecimal, which can add numbers without losing precision. This will produce range iterator that looks like `List(1.0, 1.2, 1.4, 1.6, 1.8, 2.0)` when converted into List, as opposed to `List(1.0, 1.2, 1.4, 1.5999999999999999, 1.7999999999999998, 1.9999999999999998)`.
@pnf
Copy link
Contributor

pnf commented Mar 3, 2018

Did you consider Kahan’s algorithm? It might be accurate enough and also more efficient.

@pnf
Copy link
Contributor

pnf commented Mar 3, 2018

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 3, 2018

@pnf Thanks for the pointer. It didn't occur to me, but it makes sense.

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 4, 2018

I implemented Kahan summation in https://github.com/eed3si9n/collection-strawman/tree/wip/double-range2, but it wasn't accurate enough out of the box.

For test I am using val r = NumericRange.inclusive(0.0, 1000.0, 0.1)(DoubleAsIfIntegral).

collection-strawman> junit/testOnly strawman.collection.NumericRangeTest
[info] Compiling 1 Scala source to /Users/eed3si9n/work/collection-strawman/collections/.jvm/target/scala-2.13.0-M2/classes...
[info] Test run started
[info] Test strawman.collection.NumericRangeTest.emptyIterator started
[info] Test strawman.collection.NumericRangeTest.nonEmptyIterator started
[info] Test strawman.collection.NumericRangeTest.doubleRangeToList started
[info] Test strawman.collection.NumericRangeTest.doubleRangeUneven started
[info] Test strawman.collection.NumericRangeTest.doubleRangeContains started
List(0.0, 0.1, 0.2, 0.30000000000000004, ..snip.. 
37.800000000000004, 37.9, 38.0, 38.1, 38.2, 38.300000000000004, 38.400000000000006, 
38.5, 38.6, 38.7, 38.800000000000004, 38.900000000000006, 39.0, 39.1, 39.2, 
39.300000000000004, 39.400000000000006, 39.5, 39.6, 39.7, 39.800000000000004, 
39.900000000000006, 40.0, 40.1, 40.2, 40.300000000000004, 40.400000000000006, 40.5, 
40.6, 40.7, 40.800000000000004, 40.900000000000006, 41.0, 41.1, 41.2, 41.300000000000004, 
41.400000000000006, 41.5, 41.6, 41.7, 41.800000000000004, 41.900000000000006, 42.0, 
..snip.., 1000.0)
[error] Test strawman.collection.NumericRangeTest.doubleRangeContains failed: java.lang.AssertionError: expected:<-1> but was:<3>, took 0.05 sec

As you can see there the lower bits are pretty noisy, and it's noisy enough for contains(r(i)) to not pass. This required me to run an aggressive round off. I chose 12 decimal, but it probably needed to be calibrated based on the range and step.

But probably because of the rounding, it actually became slower than doing things in BigDecimal land. Here's looping on 10000 points and calling contains(r(i)):

Kahan with rounding
11156 ms
11189 ms
11464 ms

BigDecial without rounding
3416 ms
3011 ms
3263 ms

@sjrd
Copy link
Member

sjrd commented Mar 4, 2018

Reading from my phone right now, so I'm not completely sure, but I think this PR will have severe consequences on the code size in Scala.js, because it will pull in the big number implementation every time one uses NumericRange, whether with floating point numbers or other Numerics.

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 4, 2018

@sjrd Perhaps I should refactor and provide Double specific Range class?

@julienrf
Copy link
Contributor

julienrf commented Mar 5, 2018

@eed3si9n It looks like you are still using BigDecimals in the base NumericRange implementation, right?

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 5, 2018

@julienrf It's used in NumericRange companion, but I tried to not use them in sealed class NumericRange[T], assuming that's where it hurts Scala.JS if it needs to code gen for a bunch of types.

@Ichoran
Copy link
Contributor

Ichoran commented Mar 5, 2018

Seth suggests we discuss this here instead of over at scala/bug#10759, so (reposting):

Is it possible to drop numeric ranges that pretend to be for Float and Double? These are just bad news due to numeric imprecision. If you say something like 0.1 to 0.7 by (0.1 + 0.2) you would think you would have three steps, but you either won't, or you will but other code will be randomly inaccurate (because you rounded things enough to make the above work). For what it's worth, Octave's equivalent, [0.1:x:0.7] produces [0.1, 0.4, 0.7] if x is up to 3e-16 above 0.3, but [0.1, 0.4] if it is 4e-16 above 0.3. (And you also get weird cases where for some values of x you end on the expected endpoint when the endpoint is small but miss it if it's bigger.) This just emphasizes that nobody can really "do it right".

Normally I don't like breaking changes, but in the case where there is no way to do the right thing except by not actually using the type that is stated, I think the better thing to do is remove the feature.

Note that sanity is possible if you have typed in numeric literals but not if you just have x to y by z because you can't know whether the imprecision in x and y and z is intended or not. If you can't even tell if your inputs honestly represent the intended values, using BigDecimal is not enough to recover non-surprising behavior. And I'm not sure being "better but still flawed" is justification enough to basically use a different data type than the one you're saying you're using.

@SethTisue
Copy link
Member

I lean towards keeping it, and perhaps adding some cautionary language to the Scaladoc.

If we get rid of DoubleRange and the like, it won't stop people from inlining their own versions into their code and making all the same mistakes. I think we should provide the best version we can and document any limitations or sharp edges. Some people will still get hurt, but they'd have gotten hurt anyway, maybe worse.

Let's give people clean needles, rather than hope that if they only have dirty needles, they'll stop taking drugs. Because they won't stop.

@optician
Copy link
Contributor

optician commented Mar 6, 2018

I agree with @SethTisue. Path through pitfall is unavoidable and better to have well-known pitfall rather than private handmade . Every ambitious dreamer can dig own anyway.

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 6, 2018

As it stands, Double range is so broken that I can't imagine anyone using it in production.
So, there is a danger in giving users a real-enough looking vehicle that works for day-to-day usage, but the wheels come off when going over 70mph on the highway (112km/h).

Although I enjoyed addressing a handful of issues in this PR, I also don't want to be responsible for accident with Real consequences (pun). I am ok with strawman withdrawing the Float and Double range. It would be cool if the compilation failed with suggestion to migrate to BigDecimal via a (restligeist) macro.

@SethTisue
Copy link
Member

As it stands, Double range is so broken that I can't imagine anyone using it in production

even after your rewrite?

@Ichoran
Copy link
Contributor

Ichoran commented Mar 6, 2018

@SethTisue - The problem is that it is a minefield, and the best way to avoid the mines is to use a BigDecimal range instead. Except then we are saying "this is a Double", but everything acts as if it is a BigDecimal (e.g. performance); we just don't say so.

So we should just come clean and say, "Look, Float and Double just aren't suitable for ranges. Here's a BigDecimal range that you can use almost as easily."

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 6, 2018

even after your rewrite?

After my change, some might be tempted to use it. But as @Ichoran's examples show, I am only handling sample-code usages of with literals, but not really solving the fundamental issue that adding two Doubles, even unassuming values like 0.1, is not accurate.

Related: it might be cool to reconsider BigDecimal literal scala/bug#5988

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 6, 2018

I am closing this for now.

@eed3si9n eed3si9n closed this Mar 6, 2018
@NthPortal
Copy link
Contributor

Is there a plan/issue/PR to get rid of Double and Float ranges?

@eed3si9n
Copy link
Member Author

eed3si9n commented Mar 15, 2018

I was gonna send a PR for this at some point, but I can open an issue first in case if anyone else wants to pick it up.

@eed3si9n
Copy link
Member Author

scala/bug#10781

eed3si9n added a commit to eed3si9n/scala that referenced this pull request Mar 25, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpins the ranges.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request Mar 25, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request Mar 31, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request Apr 5, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request Apr 12, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request Apr 12, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request Apr 12, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
@SethTisue
Copy link
Member

uncle! I'm okay with the removal.

@eed3si9n eed3si9n deleted the wip/double-range branch April 25, 2018 17:34
eed3si9n added a commit to eed3si9n/scala that referenced this pull request Apr 26, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request May 4, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
eed3si9n added a commit to eed3si9n/scala that referenced this pull request May 4, 2018
Fixes scala/bug#10781

At the basis of range is an assumption that summation works reliably. Unfortunately it is not true for IEEE floating point number where numbers are stored as approximate fraction of binary numbers. Natually, Double and Float ranges are completely broken. See numbers of issues such as scala/bug#8518, scala/bug#9875, scala/bug#9874, scala/bug#8620, scala/bug#8670, and scala/bug#10759.

I've attempted to fix the Double range in scala/collection-strawman#489 by faking it further using BigDecimal, but ultimately I was not able to overcome the fact that IEEE floats are not suited for ranges.

This removes both Float and Double ranges, as well as the fake `Integral[T]` instances that underpin their existence.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants