Memory management (fixes #22, closes #24) #23
Conversation
This adds a test-running step and a zig.zon manifest. It also adds the new cache/output location, `.zig-out`, to the `.gitignore`.
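For context, a manifest of the kind added here is just a Zig anonymous struct literal in a `build.zig.zon` file. A minimal sketch (the `.version` value below is an assumption, not the project's actual version):

```zig
// build.zig.zon -- minimal illustrative manifest; .version is assumed
.{
    .name = "diffz",
    .version = "0.0.1",
}
```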
halfMatch now correctly manages its own memory, like it's supposed to.
The tests currently create Diffs with constant/static text, which causes a panic on any attempt to free said text. This pass fixes a couple of leaks, and lays a foundation for consistently freeing any diff's .text field when it's no longer in use: that requires the diff text to be allocated to begin with, as it will be in real programs.
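The pattern looks roughly like this. A minimal sketch, assuming a Zig 0.11-era stdlib; the `Diff` struct here is a hypothetical stand-in for the library's type, not its actual definition:

```zig
const std = @import("std");

// Hypothetical stand-in for the library's Diff type: a .text field
// that the library now owns and must eventually free.
const Diff = struct {
    operation: enum { insert, delete, equal },
    text: []const u8,
};

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    // Dupe the static string so freeing .text later is valid;
    // freeing a string literal (.rodata memory) would panic.
    const text = try allocator.dupe(u8, "static test text");
    const diff = Diff{ .operation = .equal, .text = text };
    defer allocator.free(diff.text);

    std.debug.print("{s}\n", .{diff.text});
}
```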
I'll need to knock out all the tests in diffMergeSemantic so that I can flip the rest of the diffCleanupMerge tests on without trying to free .rodata memory. The real annoyance of the decision to punt on memory management is in the tests, which all have to be rewritten. Whoever you are, you could at least have written the tests responsibly, and spared the person cleaning up your sloppy code a bunch of drudgery.
Found a double free in the falsed-out one, so that's next up...
Also replacing the use of .replaceRange, which led me to think that the item at the pointer location was being, y'know, replaced, with a use of .insert, which is, in fact, correct.
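For illustration, the difference between the two ArrayList methods can be sketched like this (using the managed ArrayList API of the Zig 0.11 era, matching this PR's vintage):

```zig
const std = @import("std");

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();
    const allocator = gpa.allocator();

    var list = std.ArrayList(u8).init(allocator);
    defer list.deinit();
    try list.appendSlice("abcd");

    // replaceRange overwrites the items in [start, start+len) ...
    try list.replaceRange(1, 2, "XY"); // "abcd" -> "aXYd"

    // ... while insert shifts the tail right and adds one new item.
    try list.insert(1, 'Z'); // "aXYd" -> "aZXYd"

    std.debug.print("{s}\n", .{list.items});
}
```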
I had missed that some slices in cleanupSemantic were being taken from the same string/slice that was already at that index, due to the prior code renaming them. Ganbatte!
The library now manages its own memory, as it should.
I haven't looked closely at this PR yet, but it seems like a nice improvement at a glance. One thing I would like to see is some kind of benchmark, perhaps just of the tests, or of diffing some large files. I'd also like to hear back from someone who's run zls with these changes, in a Debug or ReleaseSafe build, for a few days or so.
A benchmark would be a valid addition to the library, but one I don't have time to write. To be fair to the PR, the benchmark should be run against the existing library both with and without an arena: the PR still allows for an arena, after all, it just doesn't require one. That would separate performance regressions caused by the memory management changes from any which are due to not using an arena (and which are therefore part of the cost of using it correctly). All things being equal, it's probably faster to run most code with an arena and free it all at the end, but that's a tradeoff for additional memory pressure, so it isn't a guarantee. I did eliminate some unnecessary copying in the process of switching over to valid use of any allocator, so it's possible that the code is faster both with an arena and without. I look forward to your conclusions.
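The caller-chooses-the-allocator pattern under discussion looks something like this (a minimal sketch; `duplicateDiffText` is a hypothetical stand-in for any allocating library function, not diffz's API):

```zig
const std = @import("std");

// The library takes any std.mem.Allocator and returns ownership of
// what it allocates, so the caller decides the freeing strategy.
fn duplicateDiffText(allocator: std.mem.Allocator, text: []const u8) ![]u8 {
    return allocator.dupe(u8, text);
}

pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    defer _ = gpa.deinit();

    // Caller choice A: an arena, freed wholesale at the end.
    var arena = std.heap.ArenaAllocator.init(gpa.allocator());
    defer arena.deinit();
    const a = try duplicateDiffText(arena.allocator(), "hello");
    _ = a; // released by arena.deinit(); no individual free needed

    // Caller choice B: a general-purpose allocator, freed per result.
    const b = try duplicateDiffText(gpa.allocator(), "world");
    defer gpa.allocator().free(b);
}
```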
How about just measuring the runtime of the tests before and after this PR? You could just modify build.zig to install the tests and run them. I agree with what you said about the existing code having an advantage; it seems like using an arena both before and after might be the closest to a fair comparison, since free/destroy for arenas is a no-op afaik. I think that having some kind of timings would go a long way toward inspiring confidence in these changes. Also, if you use zls: build it with this PR in ReleaseSafe, try running it for a few days or more, and report back if you notice anything unusual?
That's more straightforward, yes. It's a bit tricky to set up so that the tests can be run either with an arena or without, but taking a baseline is very simple. "Bit tricky" meaning readily achievable with comptime and a conditional defer on deinitializing the test allocator, plus moving it to a global. But just running an A/B test with the branches and system
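The comptime-plus-conditional-defer setup mentioned above might look like this (a hypothetical sketch; in practice the flag would come from a build option rather than a hardcoded const):

```zig
const std = @import("std");

// Comptime-known flag choosing the allocation strategy for the tests.
const use_arena = false;

var arena_state = std.heap.ArenaAllocator.init(std.heap.page_allocator);

fn testAllocator() std.mem.Allocator {
    return if (use_arena) arena_state.allocator() else std.testing.allocator;
}

test "conditional defer on freeing diff text" {
    const allocator = testAllocator();
    const text = try allocator.dupe(u8, "some diff text");
    // Under an arena the individual free is skipped; a single
    // arena_state.deinit() would release everything at once instead.
    defer if (!use_arena) allocator.free(text);
    try std.testing.expectEqualStrings("some diff text", text);
}
```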
Original branch:
New branch using testing allocator:
So in Debug mode, 1.15X slower. But! That's in safety-mode, when the testing allocator doesn't actually release memory, and does comprehensive checks to prevent leaks and report use-after-free and double-free. Something interesting happens when we switch to
Which barely changes, and here's the
So in release mode, the original arena code is 1.06X slower than the memory management branch; in other words, the new code is about 6% faster (I never know which framing of these numbers someone is accustomed to). This is readily interpreted: almost all allocations are u8, so the allocator has a lot of flexibility for reusing recently-freed memory. Also bear in mind that the tests are inherently doing more work, since all Diff strings except the comparison ones have to be duped so that freeing them doesn't segfault, so the practical savings are possibly greater than this quick check shows.

Worth emphasizing that the short duration makes these numbers pretty fuzzy, and more and larger tests with some built-in benchmarking would be a useful addition. I also think it's worthwhile to set the tests up so the allocator can be swapped; if I find some time I might do that part. But as a preliminary, it looks like the changes have a minimal effect on performance, and a positive one where it counts.

Perhaps worth noting: I ran each of the four tests three times and chose the middle value, as a cheap way to reduce bias, and so that I'd spot any large differences between runs. The differences were always under ten milliseconds.
Thanks! One thing to consider: the zls install notes do recommend using a ReleaseSafe build, so that might be the most important build mode to look at.
Oh, that's right. I forgot about this. Maybe benching the tests isn't a good measurement because of this added work; my apologies if that's the case. Perhaps @SuperAuguste or @Techatrix or someone else might propose a better way for us to benchmark these changes from zls. Is there any existing way to benchmark zls we should be using here? If not, perhaps a better benchmark might be to just time zls's tests? I can't remember if all of zls's tests use the same allocation strategy as in main. Also, please let us know if these benchmarks are even necessary; I don't want to make unnecessary work for anyone.
I'm afraid I don't see what role zls plays here. To be clear, that's not a disagreement, I literally don't know what part zls can play in testing this. But (easily done) middle-of-three for ReleaseSafe for the
and for
That's within the margin of error, where it's pure luck which one completed quicker. I think the important part here is that there are no observable performance regressions with the data I can easily generate. If anyone would like to add more test data, or set up some timing from within the test itself, I would be happy to merge that and run it as well.
I was interested to see if there were any differences in memory usage, so I did some benchmarks myself.

diff Sema.zig & Sema.zig (100 times per run)

[nix-shell:~/repos/diffz]$ poop -d 10000 ./bench1-old-arena-fast ./bench1-new-arena-fast ./bench1-new-gpa-fast
Benchmark 1 (190 runs): ./bench1-old-arena-fast
measurement mean ± σ min … max outliers delta
wall_time 52.0ms ± 324us 50.8ms … 53.1ms 12 ( 6%) 0%
peak_rss 3.37MB ± 15.8KB 3.32MB … 3.39MB 16 ( 8%) 0%
cpu_cycles 45.5M ± 317K 44.9M … 46.7M 2 ( 1%) 0%
instructions 14.1M ± 2.06K 14.1M … 14.1M 17 ( 9%) 0%
cache_references 1.22M ± 24.5K 1.15M … 1.25M 17 ( 9%) 0%
cache_misses 2.54K ± 1.99K 762 … 15.4K 21 (11%) 0%
branch_misses 695 ± 42.7 589 … 918 7 ( 4%) 0%
Benchmark 2 (191 runs): ./bench1-new-arena-fast
measurement mean ± σ min … max outliers delta
wall_time 51.9ms ± 334us 51.3ms … 53.2ms 9 ( 5%) - 0.3% ± 0.1%
peak_rss 3.50MB ± 17.3KB 3.46MB … 3.58MB 19 (10%) 💩+ 3.9% ± 0.1%
cpu_cycles 45.5M ± 360K 44.9M … 47.1M 5 ( 3%) - 0.1% ± 0.1%
instructions 14.1M ± 2.14K 14.1M … 14.1M 18 ( 9%) + 0.0% ± 0.0%
cache_references 1.22M ± 19.3K 1.14M … 1.26M 7 ( 4%) + 0.2% ± 0.4%
cache_misses 1.99K ± 1.78K 656 … 20.2K 15 ( 8%) ⚡- 21.6% ± 14.9%
branch_misses 629 ± 43.7 540 … 749 4 ( 2%) ⚡- 9.4% ± 1.2%
Benchmark 3 (186 runs): ./bench1-new-gpa-fast
measurement mean ± σ min … max outliers delta
wall_time 53.1ms ± 273us 52.6ms … 54.1ms 7 ( 4%) 💩+ 2.0% ± 0.1%
peak_rss 3.50MB ± 21.9KB 3.47MB … 3.61MB 1 ( 1%) 💩+ 3.9% ± 0.1%
cpu_cycles 45.7M ± 403K 45.0M … 47.3M 9 ( 5%) + 0.3% ± 0.2%
instructions 14.1M ± 1.81K 14.1M … 14.1M 14 ( 8%) + 0.2% ± 0.0%
cache_references 1.22M ± 12.8K 1.14M … 1.25M 5 ( 3%) + 0.1% ± 0.3%
cache_misses 3.13K ± 1.93K 1.34K … 20.0K 11 ( 6%) 💩+ 23.0% ± 15.6%
branch_misses 868 ± 78.7 611 … 1.10K 6 ( 3%) 💩+ 25.0% ± 1.8%

diff Sema.zig & AstGen.zig

[nix-shell:~/repos/diffz]$ poop -d 10000 ./bench2-old-arena-fast ./bench2-new-arena-fast ./bench2-new-gpa-fast
Benchmark 1 (10 runs): ./bench2-old-arena-fast
measurement mean ± σ min … max outliers delta
wall_time 1.01s ± 387us 1.01s … 1.01s 0 ( 0%) 0%
peak_rss 58.8MB ± 63.3KB 58.7MB … 58.9MB 0 ( 0%) 0%
cpu_cycles 3.83G ± 8.37M 3.82G … 3.84G 0 ( 0%) 0%
instructions 10.2G ± 48.3M 10.1G … 10.2G 0 ( 0%) 0%
cache_references 2.84M ± 64.2K 2.75M … 2.92M 0 ( 0%) 0%
cache_misses 68.8K ± 23.5K 54.6K … 123K 0 ( 0%) 0%
branch_misses 50.4M ± 135K 50.3M … 50.7M 0 ( 0%) 0%
Benchmark 2 (10 runs): ./bench2-new-arena-fast
measurement mean ± σ min … max outliers delta
wall_time 1.01s ± 422us 1.01s … 1.01s 0 ( 0%) - 0.0% ± 0.0%
peak_rss 58.8MB ± 67.7KB 58.7MB … 58.9MB 0 ( 0%) - 0.0% ± 0.1%
cpu_cycles 3.83G ± 10.5M 3.82G … 3.84G 0 ( 0%) - 0.0% ± 0.2%
instructions 10.2G ± 62.8M 10.1G … 10.3G 0 ( 0%) + 0.2% ± 0.5%
cache_references 2.84M ± 62.9K 2.73M … 2.95M 0 ( 0%) + 0.0% ± 2.1%
cache_misses 65.0K ± 26.4K 52.1K … 139K 1 (10%) - 5.5% ± 34.1%
branch_misses 50.5M ± 205K 50.1M … 50.8M 0 ( 0%) + 0.1% ± 0.3%
Benchmark 3 (10 runs): ./bench2-new-gpa-fast
measurement mean ± σ min … max outliers delta
wall_time 1.01s ± 370us 1.01s … 1.01s 0 ( 0%) - 0.1% ± 0.0%
peak_rss 41.7MB ± 41.9KB 41.6MB … 41.7MB 1 (10%) ⚡- 29.1% ± 0.1%
cpu_cycles 3.84G ± 1.96M 3.83G … 3.84G 0 ( 0%) + 0.1% ± 0.1%
instructions 9.97G ± 171M 9.50G … 10.1G 1 (10%) - 1.8% ± 1.2%
cache_references 2.61M ± 58.3K 2.48M … 2.69M 0 ( 0%) ⚡- 7.8% ± 2.0%
cache_misses 56.3K ± 8.48K 49.7K … 78.8K 1 (10%) - 18.1% ± 24.1%
branch_misses 50.4M ± 1.02M 47.5M … 51.0M 1 (10%) - 0.1% ± 1.4%

diff Sema.zig & Sema2.zig (100 times per run)
[nix-shell:~/repos/diffz]$ poop -d 10000 ./bench3-old-arena-fast ./bench3-new-arena-fast ./bench3-new-gpa-fast
Benchmark 1 (70 runs): ./bench3-old-arena-fast
measurement mean ± σ min … max outliers delta
wall_time 143ms ± 642us 142ms … 145ms 0 ( 0%) 0%
peak_rss 5.90MB ± 6.38KB 5.90MB … 5.92MB 0 ( 0%) 0%
cpu_cycles 313M ± 848K 312M … 315M 0 ( 0%) 0%
instructions 997M ± 2.00K 997M … 997M 3 ( 4%) 0%
cache_references 1.56M ± 22.5K 1.46M … 1.59M 2 ( 3%) 0%
cache_misses 43.3K ± 6.13K 32.2K … 57.1K 0 ( 0%) 0%
branch_misses 2.68K ± 155 2.10K … 2.97K 1 ( 1%) 0%
Benchmark 2 (70 runs): ./bench3-new-arena-fast
measurement mean ± σ min … max outliers delta
wall_time 143ms ± 653us 142ms … 145ms 2 ( 3%) - 0.1% ± 0.2%
peak_rss 5.90MB ± 7.43KB 5.90MB … 5.93MB 0 ( 0%) + 0.0% ± 0.0%
cpu_cycles 313M ± 737K 312M … 315M 2 ( 3%) - 0.1% ± 0.1%
instructions 997M ± 2.96K 997M … 997M 4 ( 6%) + 0.0% ± 0.0%
cache_references 1.50M ± 12.6K 1.45M … 1.53M 3 ( 4%) ⚡- 4.0% ± 0.4%
cache_misses 37.0K ± 6.94K 20.6K … 53.8K 0 ( 0%) ⚡- 14.4% ± 5.0%
branch_misses 2.17K ± 152 1.83K … 2.46K 0 ( 0%) ⚡- 19.0% ± 1.9%
Benchmark 3 (71 runs): ./bench3-new-gpa-fast
measurement mean ± σ min … max outliers delta
wall_time 142ms ± 616us 140ms … 144ms 2 ( 3%) - 1.0% ± 0.1%
peak_rss 5.91MB ± 7.77KB 5.90MB … 5.93MB 0 ( 0%) + 0.0% ± 0.0%
cpu_cycles 303M ± 1.03M 302M … 308M 3 ( 4%) ⚡- 3.2% ± 0.1%
instructions 997M ± 5.43K 997M … 997M 6 ( 8%) + 0.0% ± 0.0%
cache_references 1.62M ± 31.9K 1.54M … 1.67M 0 ( 0%) 💩+ 3.7% ± 0.6%
cache_misses 49.0K ± 22.9K 13.1K … 81.8K 0 ( 0%) + 13.3% ± 12.8%
branch_misses 3.20K ± 258 2.33K … 3.71K 4 ( 6%) 💩+ 19.7% ± 2.6%

There isn't much of an impact on wall time, but there is a nice reduction in peak RSS in the Sema.zig vs. AstGen.zig run.
Alright, that's sorted. Worth noting that this:
Is where the leaks were: in the code without coverage. I described it incorrectly, though; it was in the

Coverage now stands at 99.2%, and suggests that
Co-authored-by: Techatrix <[email protected]>
…into memory-management
Co-authored-by: Techatrix <[email protected]>
I've just pulled the latest changes from this branch and am test-driving them. I haven't noticed any issues since yesterday.
Added

Running this test takes an extraordinarily long time, because the nature of the algorithm is a bunch of tiny allocations, with no obvious way to batch them, and the exhaustive test function is quadratic in the number of allocations. It triggers line mode through the "front door", so to speak, and can no doubt be replaced with something shorter which calls

Remaining uncovered lines are little things like the Diff format string, and some errdefers which I don't think are triggerable. But I'd rather have a few red lines in kcov than be wrong about that.
This lets the test set it to a low value. That in turn allows the full allocation failure check to complete in a reasonable amount of time.
Added a field which configures the line-check threshold; using that and truncating the test data gets the allocation-failure test down to something comparable to the other tests. That's probably going to be my last tweak to this branch.
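The std.testing helper for this kind of exhaustive allocation-failure testing is `checkAllAllocationFailures`: it runs the function once to count allocations, then re-runs it failing each allocation in turn, which is why the runtime is quadratic in allocation count. A minimal sketch of the pattern (`doSomeWork` is a hypothetical function, not the library's API):

```zig
const std = @import("std");

// Keep the work per call small: the check re-runs this function once
// per allocation it performs, inducing failure at each one.
fn doSomeWork(allocator: std.mem.Allocator, input: []const u8) !void {
    const copy = try allocator.dupe(u8, input);
    defer allocator.free(copy);
    var list = std.ArrayList(u8).init(allocator);
    defer list.deinit();
    try list.appendSlice(copy);
}

test "no leaks under any allocation failure" {
    try std.testing.checkAllAllocationFailures(
        std.testing.allocator,
        doSomeWork,
        .{"some small input"},
    );
}
```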
It is in the source text, and should be in diffz.
This PR adds proper memory management to `diffz`. Almost all the tests are ported to use `std.testing.allocator`, and they all pass. There are a few in the `diff` test which I didn't feel like rewriting; I would encourage one of the maintainers to do so after the merge.