WIP: Add HashMap variant with incremental resize #166
This patch adds a variant of HashMap, `IncrHashMap`, that supports incremental resizing. It uses the same underlying hash table implementation as `HashMap` (i.e., `RawTable`), but where the standard implementation performs all-at-once resizing when the map must grow to accommodate new elements, this implementation instead spreads the resizing load across subsequent inserts.

To do this, it keeps both a new, resized map, and the old, pre-resize map around after a resize. The new map starts out mostly empty, and read operations search in both tables. New inserts go into the new map, but they also move a few items from the old map to the new one. When the old map is empty, it is dropped, and all operations hit only the new map.

This map implementation offers more stable insert performance than `HashMap`, but it does so at a cost:

- Reads and removals of old or missing keys are slower for a while after a resize.
- Inserts are slower than their `HashMap` counterparts for a while after a resize.
- Memory is not reclaimed immediately after a resize, only after all the keys have been moved.
- `IncrHashMap` is slightly larger than `HashMap`, to account for the bookkeeping needed to manage two maps during and after a resize.
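The scheme described above can be illustrated with a minimal sketch. This is not the patch's code: it assumes `std::collections::HashMap` as a stand-in for `RawTable`, and the names `IncrSketch`, `MOVE_PER_INSERT`, and `carry` are hypothetical. It only shows the shape of the idea: reads consult both tables, and each insert carries a few entries over from the old table.

```rust
use std::collections::HashMap;
use std::hash::Hash;
use std::mem;

/// Entries carried over from the old table per insert (an arbitrary choice for this sketch).
const MOVE_PER_INSERT: usize = 4;

/// Hypothetical illustration type, not the `IncrHashMap` from the patch.
pub struct IncrSketch<K, V> {
    /// The resized table; all new inserts land here.
    new: HashMap<K, V>,
    /// The pre-resize table, kept only while entries remain to be moved.
    old: Option<HashMap<K, V>>,
}

impl<K: Hash + Eq + Clone, V> IncrSketch<K, V> {
    pub fn new() -> Self {
        Self { new: HashMap::new(), old: None }
    }

    pub fn len(&self) -> usize {
        self.new.len() + self.old.as_ref().map_or(0, |o| o.len())
    }

    /// Reads search the new table first, then the old one if a resize is in progress.
    pub fn get(&self, key: &K) -> Option<&V> {
        self.new
            .get(key)
            .or_else(|| self.old.as_ref().and_then(|o| o.get(key)))
    }

    pub fn insert(&mut self, key: K, value: V) -> Option<V> {
        // Start a resize when the current table is full: allocate a bigger
        // table and keep the full one around as `old` instead of rehashing it now.
        if self.old.is_none() && self.new.len() >= self.new.capacity() {
            let bigger = HashMap::with_capacity((self.new.capacity() * 2).max(4));
            self.old = Some(mem::replace(&mut self.new, bigger));
        }
        // A key must live in at most one table, so evict any stale copy from `old`.
        let from_old = self.old.as_mut().and_then(|o| o.remove(&key));
        let from_new = self.new.insert(key, value);
        self.carry();
        from_new.or(from_old)
    }

    /// Move up to `MOVE_PER_INSERT` entries from the old table into the new one,
    /// dropping the old table once it is empty.
    fn carry(&mut self) {
        let Some(old) = self.old.as_mut() else { return };
        for _ in 0..MOVE_PER_INSERT {
            // Cloning a key is a shortcut needed by the std API; it is not part of the scheme.
            let Some(k) = old.keys().next().cloned() else { break };
            if let Some((k, v)) = old.remove_entry(&k) {
                self.new.insert(k, v);
            }
        }
        if old.is_empty() {
            self.old = None;
        }
    }
}
```

Because the new table is allocated with at least twice the old capacity and every insert moves several entries, the old table drains well before the new one fills up. Note that `old.keys().next()` starts a fresh iteration on every call, which is the kind of repeated scan a real implementation would want to avoid.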
I've opened this as a draft, because I don't yet know if this is something that would even be reasonable to add to I cooked up a simple benchmark (code below), and preliminary results seem pretty promising:
use std::collections::HashMap; // assumed from the "std" label printed below
use std::time::{Duration, Instant};
// `IncrHashMap` is the type added by this patch; its import path is not shown here.

const N: u32 = 1 << 21;

fn main() {
    // Baseline: time each insert into the regular HashMap.
    let mut hm = HashMap::new();
    let mut t = Instant::now();
    let mut mx = 0.0f64;
    let mut sum = Duration::new(0, 0);
    for i in 0..N {
        hm.insert(i, i);
        // Each interval runs from the previous sample point, so it also
        // includes the time spent in the preceding println!.
        let t2 = Instant::now();
        let took = t2.duration_since(t);
        t = t2;
        mx = mx.max(took.as_secs_f64());
        sum += took;
        println!("{} std {} ms", i, took.as_secs_f64() * 1000.0);
    }
    eprintln!(
        "HashMap max: {:?}, mean: {:?}",
        Duration::from_secs_f64(mx),
        sum / N
    );

    // Same measurement for the incrementally resizing map.
    let mut hm = IncrHashMap::new();
    let mut t = Instant::now();
    let mut mx = 0.0f64;
    let mut sum = Duration::new(0, 0);
    for i in 0..N {
        hm.insert(i, i);
        let t2 = Instant::now();
        let took = t2.duration_since(t);
        t = t2;
        mx = mx.max(took.as_secs_f64());
        sum += took;
        println!("{} incremental {} ms", i, took.as_secs_f64() * 1000.0);
    }
    eprintln!(
        "IncrHashMap max: {:?}, mean: {:?}",
        Duration::from_secs_f64(mx),
        sum / N
    );
}
Apart from the missing APIs, there are two key performance-sensitive bits that should probably be fixed before using this for anything serious. Both come down to avoiding a linear scan over the prefix of the old map that has already been moved. These are: Lines 800 to 806 in 9ee7cea
and Lines 1079 to 1080 in 9ee7cea
I believe the remaining small linear factor that creeps in at the end of the graph above comes from instantiating the iterator over the "old" map during a resize here: Lines 714 to 716 in 9ee7cea
It's not clear that we can be much smarter about that one, but it should also only matter for very large maps.
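One way to picture the "already-moved prefix" problem is with a toy model. The referenced lines are not visible here, so the following is not the patch's code: `OldTable`, `take_some`, and the flat `Vec<Option<T>>` bucket array are all hypothetical stand-ins. The sketch assumes the old table is drained front to back and remembers a cursor, so later calls do not re-walk buckets that are already empty.

```rust
/// Toy stand-in for the old table: a flat array of buckets, some of them empty.
/// Hypothetical illustration type, not taken from the patch.
pub struct OldTable<T> {
    slots: Vec<Option<T>>,
    /// Every slot before `cursor` has already been visited and emptied.
    cursor: usize,
    /// Number of items still waiting to be moved.
    remaining: usize,
}

impl<T> OldTable<T> {
    pub fn new(slots: Vec<Option<T>>) -> Self {
        let remaining = slots.iter().filter(|s| s.is_some()).count();
        Self { slots, cursor: 0, remaining }
    }

    /// Take up to `n` items, resuming where the previous call stopped
    /// instead of rescanning from slot 0.
    pub fn take_some(&mut self, n: usize) -> Vec<T> {
        let mut out = Vec::with_capacity(n);
        while out.len() < n && self.cursor < self.slots.len() {
            if let Some(item) = self.slots[self.cursor].take() {
                out.push(item);
                self.remaining -= 1;
            }
            self.cursor += 1;
        }
        out
    }

    /// True once every item has been moved out.
    pub fn is_drained(&self) -> bool {
        self.remaining == 0
    }
}
```

With the cursor, the total work across all calls is proportional to the number of slots, whereas restarting from the first slot on every call would revisit the emptied prefix each time.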
I'll add that I'll hold off on working more on this until I hear back whether it's even feasible to land something like this in the first place. And if so, whether this is the right approach (as opposed to, say, making this the default behavior, or having a marker type on
Interesting... It looks like CI also checks that the test suite runs on MSRV. Is that intentional? Do we want to limit test code to running on Rust 1.32 as well?
Great work! I think it would be best for this to be a separate crate.
Ah, that's fantastic, I missed the
@Amanieu I think the only thing I need from
@Amanieu I would also love to hear your take on the issues in #166 (comment). I think I need some way to "refresh" a
I made a crate out of this over in https://github.com/jonhoo/griddle/. The issues above have been filed as issues: jonhoo/griddle#1 ("refreshing" an iterator), jonhoo/griddle#2 (resuming one iterator from another), jonhoo/griddle#3 (cost of starting a new iterator), and jonhoo/griddle#5 (using