Uniform distribution: bias and usize portability #809

dhardy · 2019-05-29T14:29:28Z

Bias

I noticed that the zone produced via signed extension is not a multiple of the range. This means that the implementations for 8- and 16-bit types (@pitdicker) were biased. I added a fix for this which does not appear to have any performance impact.

It's worth mentioning however that the bias is tiny: the biggest deviation from a multiple of the range I could find was 32579 (for range = 65355); since this is sampling from a u32 the probability of generating a biased sample is thus ~7.6e-6. With 100 million samples this bias is still lost in the noise.
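To make the arithmetic above concrete, here is a minimal sketch of how a rejection zone relates to bias when the raw sample is a u32. It is not the code from this PR: the function names are hypothetical, and it assumes a draw `v` is accepted when `v <= zone`. The sample is only unbiased if the number of accepted values is an exact multiple of the range.

```rust
/// Number of raw u32 values accepted under the rule `v <= zone`.
fn accepted_count(zone: u32) -> u64 {
    zone as u64 + 1
}

/// Largest zone for which the accepted count is a whole multiple of `range`.
/// (Hypothetical helper; assumes `range > 0`.)
fn unbiased_zone(range: u32) -> u32 {
    assert!(range > 0);
    // 2^32 mod range, computed without leaving u32:
    // (2^32 - range) mod range == 2^32 mod range.
    let ints_to_reject = (u32::MAX - range + 1) % range;
    u32::MAX - ints_to_reject
}

/// How far the accepted count is from a whole multiple of `range`.
/// Zero means every output value is equally likely.
fn excess(zone: u32, range: u32) -> u64 {
    accepted_count(zone) % (range as u64)
}

/// Probability that a single raw u32 draw lands in the excess part.
fn bias_probability(excess: u64) -> f64 {
    excess as f64 / 4_294_967_296.0 // 2^32 possible raw values
}
```

With the deviation quoted above, `bias_probability(32579)` evaluates to roughly 7.6e-6, and `excess(unbiased_zone(r), r)` is zero for any non-zero `r`.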

Note: the included uniformity test is unfinished because it turned out to be pretty useless. I included it as an example, but it should probably just be deleted.

Portability for isize/usize samples

As discussed in #805. Unfortunately this does have a performance hit:

# before:
test distr_uniform_isize                   ... bench:       1,628 ns/iter (+/- 198) = 4914 MB/s
test distr_uniform_usize16                 ... bench:       1,617 ns/iter (+/- 121) = 4947 MB/s
test distr_uniform_usize32                 ... bench:       1,612 ns/iter (+/- 146) = 4962 MB/s
test distr_uniform_usize64                 ... bench:       2,473 ns/iter (+/- 79) = 3234 MB/s
# after:
test distr_uniform_isize                   ... bench:       5,818 ns/iter (+/- 121) = 1375 MB/s
test distr_uniform_usize16                 ... bench:       1,721 ns/iter (+/- 21) = 4648 MB/s
test distr_uniform_usize32                 ... bench:       1,803 ns/iter (+/- 20) = 4437 MB/s
test distr_uniform_usize64                 ... bench:       2,426 ns/iter (+/- 15) = 3297 MB/s

It also results in quite a bit of redundant, ugly code. As such I'm not happy about adding it (though it would be nice to have).

Thoughts? @burdges @vks @pitdicker

burdges · 2019-05-29T20:37:59Z

It's just the usual hit from rejection sampling, yes?

dhardy · 2019-05-30T07:04:00Z

All of these use rejection sampling, which makes the cost variable (especially when the required range is within a factor of 2 of the type's range). I'm not sure why the isize benchmark here suffers so much; the only additional cost should be a bit more branching.
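As a rough model of why the cost varies, the sketch below computes the acceptance probability and the expected number of raw draws for a 64-bit sample. It assumes the largest possible zone is used (the biggest multiple of the range that fits in 2^64); the actual implementation may pick a smaller zone for speed, so treat the numbers as a lower bound on the cost. The function names are hypothetical.

```rust
/// Fraction of raw u64 draws that are accepted, assuming the zone is the
/// largest multiple of `range` not exceeding 2^64.
fn acceptance_probability(range: u64) -> f64 {
    assert!(range > 0);
    let total: u128 = 1u128 << 64;
    let accepted = total - total % (range as u128);
    accepted as f64 / total as f64
}

/// Expected number of raw draws per output (mean of a geometric distribution).
fn expected_draws(range: u64) -> f64 {
    1.0 / acceptance_probability(range)
}
```

Under this model a range of exactly 2^63 needs one draw on average, while a range of 2^63 + 1 needs about two, which is the "within a factor of 2 of the type's range" worst case.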

vks · 2019-05-31T11:02:20Z

"I'm not sure why the isize benchmark here suffers so much; the only additional cost should be a bit more branching."

Maybe it is branch misprediction? You could try running the benchmarks separately with perf stat.

I think we should have the bias corrections for sure. I'm not so sure about the portability improvements: there is a lot of code duplication (though that could be reduced with macros). I would prefer not to promise value-stability across platforms for platform-dependent types, and to state this in the documentation. The tests could be fixed accordingly.

I think for sampling usize everyone expects that this is platform-dependent, but for rand::seq it is less intuitive. We could still consider implementing your fix there instead, or just mention the caveat in the documentation.

dhardy · 2019-06-01T12:22:15Z

Good point @vks that this would be better done within seq code. I count eight uses of gen_range and one of Uniform<usize> within the alias-method weighted index implementation. As such, implementing the suggestion is not trivial but not too hard.

Compatibility: the changes to weighted::AliasMethod are breaking if (a) you use exhaustive match on weighted::WeightedError or (b) you use AliasMethod with more than u32::MAX elements (talk about using gigabytes of memory and non-scalable algorithms!).
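For readers wondering what the two breaking cases look like in practice, here is a small illustration. It does not use the real rand types: `DemoWeightedError`, its variants, `checked_len` and `describe` are all hypothetical stand-ins. The idea is that a length which does not fit in u32 becomes an error, and that a match with a wildcard arm keeps compiling when a new error variant appears.

```rust
use std::convert::TryFrom;

/// Hypothetical stand-in for an error enum that gains a new variant.
#[derive(Debug, PartialEq)]
enum DemoWeightedError {
    NoItem,
    InvalidWeight,
    TooMany, // illustrative new variant for "more than u32::MAX elements"
}

/// Convert a collection length to the u32 index type used internally,
/// reporting an error instead of silently truncating.
fn checked_len(len: usize) -> Result<u32, DemoWeightedError> {
    if len == 0 {
        return Err(DemoWeightedError::NoItem);
    }
    u32::try_from(len).map_err(|_| DemoWeightedError::TooMany)
}

/// A match with a wildcard arm is not broken by added variants;
/// an exhaustive match listing every variant would be.
fn describe(err: &DemoWeightedError) -> &'static str {
    match err {
        DemoWeightedError::NoItem => "no items",
        DemoWeightedError::InvalidWeight => "invalid weight",
        _ => "other error",
    }
}
```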

Performance: there are some minor wins and losses; nothing too significant I think:

# before:
running 19 tests
test misc_sample_indices_100_of_1G           ... bench:       2,562 ns/iter (+/- 122)
test misc_sample_indices_100_of_1M           ... bench:       2,496 ns/iter (+/- 182)
test misc_sample_indices_100_of_1k           ... bench:         549 ns/iter (+/- 23)
test misc_sample_indices_10_of_1k            ... bench:          89 ns/iter (+/- 4)
test misc_sample_indices_1_of_1k             ... bench:          26 ns/iter (+/- 1)
test misc_sample_indices_200_of_1G           ... bench:       5,048 ns/iter (+/- 271)
test misc_sample_indices_400_of_1G           ... bench:       9,010 ns/iter (+/- 228)
test misc_sample_indices_600_of_1G           ... bench:      13,227 ns/iter (+/- 966)
test seq_iter_choose_from_1000               ... bench:       3,341 ns/iter (+/- 46) = 2394 MB/s
test seq_iter_choose_multiple_10_of_100      ... bench:         931 ns/iter (+/- 142)
test seq_iter_choose_multiple_fill_10_of_100 ... bench:         873 ns/iter (+/- 42)
test seq_iter_unhinted_choose_from_1000      ... bench:       4,850 ns/iter (+/- 234)
test seq_iter_window_hinted_choose_from_1000 ... bench:       1,618 ns/iter (+/- 50)
test seq_shuffle_100                         ... bench:         837 ns/iter (+/- 31)
test seq_slice_choose_1_of_1000              ... bench:       3,336 ns/iter (+/- 211) = 2398 MB/s
test seq_slice_choose_multiple_10_of_100     ... bench:         156 ns/iter (+/- 16)
test seq_slice_choose_multiple_1_of_1000     ... bench:          33 ns/iter (+/- 2)
test seq_slice_choose_multiple_90_of_100     ... bench:         961 ns/iter (+/- 102)
test seq_slice_choose_multiple_950_of_1000   ... bench:       9,164 ns/iter (+/- 377)

running 8 tests
test distr_weighted_alias_method_f64       ... bench:      10,569 ns/iter (+/- 479) = 756 MB/s
test distr_weighted_alias_method_i8        ... bench:       9,906 ns/iter (+/- 603) = 807 MB/s
test distr_weighted_alias_method_large_set ... bench:      10,802 ns/iter (+/- 510) = 740 MB/s
test distr_weighted_alias_method_u32       ... bench:       9,898 ns/iter (+/- 915) = 808 MB/s
test distr_weighted_f64                    ... bench:       8,544 ns/iter (+/- 365) = 936 MB/s
test distr_weighted_i8                     ... bench:      11,145 ns/iter (+/- 931) = 717 MB/s
test distr_weighted_large_set              ... bench:      64,942 ns/iter (+/- 2,191) = 123 MB/s
test distr_weighted_u32                    ... bench:      10,906 ns/iter (+/- 414) = 733 MB/s

# after:
running 19 tests
test misc_sample_indices_100_of_1G           ... bench:       2,851 ns/iter (+/- 263)
test misc_sample_indices_100_of_1M           ... bench:       2,787 ns/iter (+/- 33)
test misc_sample_indices_100_of_1k           ... bench:         543 ns/iter (+/- 26)
test misc_sample_indices_10_of_1k            ... bench:          91 ns/iter (+/- 3)
test misc_sample_indices_1_of_1k             ... bench:          26 ns/iter (+/- 0)
test misc_sample_indices_200_of_1G           ... bench:       4,076 ns/iter (+/- 264)
test misc_sample_indices_400_of_1G           ... bench:       8,293 ns/iter (+/- 225)
test misc_sample_indices_600_of_1G           ... bench:      12,066 ns/iter (+/- 363)
test seq_iter_choose_from_1000               ... bench:       3,768 ns/iter (+/- 73) = 2123 MB/s
test seq_iter_choose_multiple_10_of_100      ... bench:         890 ns/iter (+/- 22)
test seq_iter_choose_multiple_fill_10_of_100 ... bench:         882 ns/iter (+/- 32)
test seq_iter_unhinted_choose_from_1000      ... bench:       4,842 ns/iter (+/- 183)
test seq_iter_window_hinted_choose_from_1000 ... bench:       1,816 ns/iter (+/- 41)
test seq_shuffle_100                         ... bench:         941 ns/iter (+/- 123)
test seq_slice_choose_1_of_1000              ... bench:       3,884 ns/iter (+/- 509) = 2059 MB/s
test seq_slice_choose_multiple_10_of_100     ... bench:         178 ns/iter (+/- 70)
test seq_slice_choose_multiple_1_of_1000     ... bench:          33 ns/iter (+/- 4)
test seq_slice_choose_multiple_90_of_100     ... bench:         973 ns/iter (+/- 69)
test seq_slice_choose_multiple_950_of_1000   ... bench:       9,152 ns/iter (+/- 377)

running 8 tests
test distr_weighted_alias_method_f64       ... bench:      10,665 ns/iter (+/- 175) = 750 MB/s
test distr_weighted_alias_method_i8        ... bench:       9,513 ns/iter (+/- 213) = 840 MB/s
test distr_weighted_alias_method_large_set ... bench:      10,378 ns/iter (+/- 537) = 770 MB/s
test distr_weighted_alias_method_u32       ... bench:       9,581 ns/iter (+/- 234) = 834 MB/s
test distr_weighted_f64                    ... bench:       8,508 ns/iter (+/- 215) = 940 MB/s
test distr_weighted_i8                     ... bench:      10,747 ns/iter (+/- 183) = 744 MB/s
test distr_weighted_large_set              ... bench:      65,314 ns/iter (+/- 3,106) = 122 MB/s
test distr_weighted_u32                    ... bench:      10,897 ns/iter (+/- 235) = 734 MB/s

vks · 2019-06-03T09:53:15Z

src/seq/mod.rs

@@ -451,6 +451,18 @@ impl<'a, S: Index<usize, Output = T> + ?Sized + 'a, T: 'a> ExactSizeIterator
 }


+// Sample a number uniformly between 0 and `ubound`. Uses 32-bit sampling where
+// possible, primarily in order to produce the same output on 32-bit and 64-bit
+// platforms.


Maybe add #[inline] to encourage LLVM?

Makes sense, but it has a negligible effect on the benchmarks (seq_iter_choose_from_1000 and seq_iter_window_hinted_choose_from_1000 are still about 12% slower than before this PR). We can live with this small hit.
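For context, the idea described in the comment on the diff above (use 32-bit sampling whenever the bound fits in a u32) can be sketched as follows. This is a self-contained approximation, not the code under review: `SplitMix64` is a stand-in generator so that no external crate is needed, `gen_index` is written here with an exclusive upper bound, and `sample_below_u32` / `sample_below_u64` are hypothetical helpers that use plain modulo after rejection.

```rust
/// Tiny SplitMix64 generator, used only to keep the sketch self-contained.
struct SplitMix64 {
    state: u64,
}

impl SplitMix64 {
    fn new(seed: u64) -> Self {
        SplitMix64 { state: seed }
    }

    fn next_u64(&mut self) -> u64 {
        self.state = self.state.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.state;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }

    fn next_u32(&mut self) -> u32 {
        (self.next_u64() >> 32) as u32
    }
}

/// Uniform sample in [0, ubound) from 32-bit draws, with rejection.
fn sample_below_u32(rng: &mut SplitMix64, ubound: u32) -> u32 {
    assert!(ubound > 0);
    // Accept only v <= zone, so the accepted count is a multiple of ubound.
    let zone = u32::MAX - (u32::MAX - ubound + 1) % ubound;
    loop {
        let v = rng.next_u32();
        if v <= zone {
            return v % ubound;
        }
    }
}

/// Uniform sample in [0, ubound) from 64-bit draws, with rejection.
fn sample_below_u64(rng: &mut SplitMix64, ubound: u64) -> u64 {
    assert!(ubound > 0);
    let zone = u64::MAX - (u64::MAX - ubound + 1) % ubound;
    loop {
        let v = rng.next_u64();
        if v <= zone {
            return v % ubound;
        }
    }
}

/// Pick an index below `ubound`, taking the 32-bit path whenever possible
/// so that 32-bit and 64-bit platforms consume the generator identically.
#[inline]
fn gen_index(rng: &mut SplitMix64, ubound: usize) -> usize {
    if ubound <= (u32::MAX as usize) {
        sample_below_u32(rng, ubound as u32) as usize
    } else {
        // Only reachable where usize is wider than 32 bits.
        sample_below_u64(rng, ubound as u64) as usize
    }
}
```

Because the branch depends only on the value of `ubound` and not on the width of usize, two platforms given the same seed and the same bound take the same path and return the same index in this sketch.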

vks · 2019-06-03T10:00:56Z

Uniform distributions for SIMD types are currently broken.
Do we want to document value stability among 32- and 64-bit platforms for rand::seq, or is it preferable to leave it unspecified?

Other than the broken tests, this looks good!

dhardy · 2019-06-03T11:03:53Z

I dropped the uniformity test which was responsible for most of the failures and appears useless.

vks · 2019-06-03T12:42:36Z

The remaining failures will be fixed by #813, so I think this can be merged.

dhardy · 2019-06-03T13:25:49Z

Rebased on master; hopefully it passes this time

dhardy force-pushed the uniform-usize branch from 8711bcc to 3f4fda8 Compare June 1, 2019 12:18

vks reviewed Jun 3, 2019

dhardy force-pushed the uniform-usize branch from 3f4fda8 to 31348b7 Compare June 3, 2019 11:03

vks approved these changes Jun 3, 2019

dhardy added 6 commits June 3, 2019 14:25

Add benchmarks for uniform usize samples

25896a7

The usize64 bench is noticeably slower than the others, perhaps due to use of rejection sampling.

Uniform distribution: correct bias (value breaking change)

fb7bb9f

Signed extension of zone was incorrect. This method has near identical performance in benchmarks.

seq module: make gen_range usage stable across platforms

b613749

Make seq::index::sample_rejection generic over uint index types

4a375f6

AliasMethod weighted index: use u32 internally

ca270f3

Primarily for value stability, also slight performance boost.

UniformInt: rename ints_to_reject/zone field

0ac3766

dhardy force-pushed the uniform-usize branch from 6e4abb9 to 0ac3766 Compare June 3, 2019 13:25

dhardy merged commit e108c47 into rust-random:master Jun 3, 2019

dhardy deleted the uniform-usize branch June 3, 2019 13:48

This was referenced Jun 3, 2019

Use ChaCha20 in StdRng and feature-gate SmallRng #792

Merged

Reproducibility of usize samples across architectures #805

Closed

dependabot bot mentioned this pull request Mar 15, 2021

Update rand requirement from 0.7 to 0.8 transparencies/yew#5

Open

robin-near mentioned this pull request Oct 25, 2022

[core] Fix proposals shuffling implementation near/nearcore#7921

Merged

dhardy mentioned this pull request Mar 22, 2024

Output of gen_range is platform dependent for usize #1399

Closed
