
Memory management (fixes #22, closes #24) #23


Merged
merged 64 commits
Aug 31, 2024

Conversation

mnemnion
Contributor

@mnemnion mnemnion commented Jul 3, 2024

This PR adds proper memory management to diffz.

Almost all the tests are ported to use std.testing.allocator, and they all pass. There are a few in the diff test which I didn't feel like rewriting; I would encourage one of the maintainers to do so after the merge.

mnemnion added 22 commits July 1, 2024 20:24
This adds a test running step and a zig.zon manifest.  Also adds to the
.gitignore the new location of caches, .zig-out.
halfMatch now correctly manages its own memory, like it's supposed to.
The tests currently create Diffs with constant/static text, which
causes a panic on any attempt to free said text.

This pass fixes a couple leaks, and lays a foundation for consistently
freeing any diff's .text field when no longer in use: this requires
that, as it will be in real programs, the diff text is allocated to
begin with.
I'll need to knock out all the tests in diffMergeSemantic so that
I can flip the rest of the diffCleanupMerge tests on, without trying
to free .rodata memory.

The real annoyance of the decision to punt on memory management is in
the tests, which all have to be rewritten.  Whoever you are, you could
at least have written the tests responsibly, and spared the person who
cleans up your sloppy code from a bunch of drudgery.
Found a double free in the falsed-out one, so that's next up...
Also changing the use of .replaceRange, which led me to think that the
item at the pointer location was being, y'know, replaced, with a use of
.insert, which is, in fact, correct.
I had missed that some slices in cleanupSemantic were being taken from
the same string/slice that was already at that index, due to the prior
code renaming them.  Ganbatte!
The library now manages its own memory, as it should.
@travisstaloch
Collaborator

I haven't looked closely at this PR yet but it seems like a nice improvement at a glance. One thing I would like to see is some kind of benchmark, perhaps just of the tests or diffing some large files.

Also would like to hear back from someone who's run zls with these changes and a debug or release safe build for a few days or so.

@mnemnion
Contributor Author

mnemnion commented Jul 6, 2024

One thing I would like to see is some kind of benchmark, perhaps just of the tests or diffing some large files.

A benchmark would be a valid addition to the library, but one I don't have time to write.

To be fair to the PR, the benchmark should be run against the existing library both with and without an arena. The PR still allows for an arena, after all; it just doesn't require one. That would separate performance regressions caused by the memory-management changes from any which are due simply to not using an arena (and which therefore just argue for using one). All things being equal, it's probably faster to run most code with an arena and free everything at the end, but that trades off against additional memory pressure, so it's not a guarantee.

I did eliminate some unnecessary copying in the process of switching over to valid use of any allocator, so it's possible that the code is faster both with an arena and without. I look forward to your conclusions.

@travisstaloch
Collaborator

A benchmark would be a valid addition to the library, but one I don't have time to write.

How about just measuring the runtime of the tests before and after this PR? You could just modify build.zig to install the tests and run $ time zig-out/bin/tests to get a ballpark. For more detail, copy the before and after test binaries somewhere and use something like hyperfine or poop to measure.
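A rough sketch of that workflow in shell; the binary names `tests-main` and `tests-mm` are assumptions for binaries built from the two branches, not files in this repository:

```shell
# Hypothetical A/B timing loop over test binaries built from the
# main branch and the memory-management branch, copied aside first.
for bin in tests-main tests-mm; do
  if [ -x "./$bin" ]; then
    time "./$bin"          # ballpark wall-clock comparison
  else
    echo "skipping $bin (not built)"
  fi
done
```

hyperfine or poop can then be pointed at the same two binaries when proper statistics are wanted.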

I agree with what you said about the existing code having an advantage. Seems like using an arena both before and after might be the closest to a fair comparison since free/destroy for arenas is a no-op afaik.

I think that having some kind of timings would go a long way toward inspiring confidence in these changes.

Also if you use zls: build it with this PR in release safe and try running it for a few days or more and report back if you notice anything unusual?

@mnemnion
Contributor Author

mnemnion commented Jul 6, 2024

How about just measuring the runtime of the tests before and after this PR?

That's more straightforward, yes. It's a bit tricky to set up so that the tests can be run with either an arena or not, but taking a baseline is very simple.

"bit tricky" meaning readily achievable with comptime and a conditional defer on deinitializing the test allocator, also moving it to a global. But just running an A/B test with the branches and system time is easy. Will let you know.

@mnemnion
Contributor Author

mnemnion commented Jul 6, 2024

Original branch:

Executed in  177.64 millis    fish           external
   usr time  133.54 millis    0.05 millis  133.49 millis
   sys time   34.21 millis    1.77 millis   32.43 millis

New branch using testing allocator:

Executed in  204.25 millis    fish           external
   usr time  151.01 millis    0.04 millis  150.97 millis
   sys time   45.58 millis    1.26 millis   44.31 millis

So in Debug mode, 1.15X slower.

But! That's in safety mode, where the testing allocator doesn't actually release memory, and does comprehensive checks to catch leaks and report use-after-free and double-free.

Something interesting happens when we switch to -Doptimize=ReleaseFast, here's the original:

________________________________________________________
Executed in  177.07 millis    fish           external
   usr time  132.75 millis    0.05 millis  132.69 millis
   sys time   36.29 millis    1.86 millis   34.44 millis

Which barely changes, and here's the memory-management branch:

________________________________________________________
Executed in  166.93 millis    fish           external
   usr time  127.30 millis    0.04 millis  127.26 millis
   sys time   32.20 millis    1.24 millis   30.96 millis

So in release mode, the original arena code is 1.06X slower than the memory-management branch; put another way, the new branch is about 6% faster. This is readily interpreted: almost all allocations are u8, so the allocator has a lot of flexibility for reusing recently-freed memory.

Also bear in mind that the tests are inherently doing more work, since all Diff strings except the comparison ones have to be duped so that freeing them doesn't segfault, so the practical savings are possibly greater than this quick check shows.

Worth emphasizing that the short duration makes these numbers pretty fuzzy, and more and larger tests with some built-in benchmarking would be a useful addition. I also think it's worthwhile to set the tests up so the allocator can be swapped, if I find some time I might do that part.

But as a preliminary, it looks like the changes have a minimal effect on performance, and a positive one where it counts.

Perhaps worth noting: I ran each of the four tests three times and chose the middle value, as a cheap way to reduce bias, and also so that I'd spot any large differences between runs. The differences were always under ten milliseconds.
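For anyone wanting to reproduce it, the middle-of-three selection can be sketched in shell; the timed command below is a stand-in (`true`), not the real test binary:

```shell
# Run a command three times, record wall-clock milliseconds for each
# run, and report the median (middle value) of the three.
runs=""
for i in 1 2 3; do
  start=$(date +%s%N)             # GNU date: nanoseconds since epoch
  true                            # stand-in for e.g. zig-out/bin/tests
  end=$(date +%s%N)
  runs="$runs $(( (end - start) / 1000000 ))"
done
median=$(printf '%s\n' $runs | sort -n | sed -n '2p')
echo "median: ${median}ms"
```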

@travisstaloch
Collaborator

travisstaloch commented Jul 7, 2024

Thanks! One thing to consider: the zls install notes do recommend using a ReleaseSafe build, so that might be the most important build mode to consider.

Also bear in mind that the tests are inherently doing more work, since all Diff strings except the comparison ones have to be duped so that freeing them doesn't segfault.

Oh, that's right. I forgot about this. Maybe benching the tests isn't a good measurement because of this added work. My apologies if that's the case.

Perhaps @SuperAuguste or @Techatrix or someone else might propose a better way for us to benchmark these changes from zls. Is there any existing way to benchmark zls that we should be using here? If not, perhaps a better benchmark might be to just time the zls tests? I can't remember if all of zls's tests use the same allocation strategy as in main.

Also, please let us know whether these benchmarks are even necessary. I don't want to be making unnecessary work for anyone.

@mnemnion
Contributor Author

mnemnion commented Jul 7, 2024

I'm afraid I don't see what role zls plays here. To be clear, that's not a disagreement, I literally don't know what part zls can play in testing this.

But (easily done) middle-of-three for ReleaseSafe for the main branch:

________________________________________________________
Executed in  171.26 millis    fish           external
   usr time  129.24 millis    0.06 millis  129.18 millis
   sys time   31.87 millis    1.83 millis   30.04 millis

and for memory-management:

________________________________________________________
Executed in  174.95 millis    fish           external
   usr time  130.14 millis    0.04 millis  130.10 millis
   sys time   34.81 millis    2.07 millis   32.74 millis

That's within the margin of error where it's pure luck which one completed quicker. I think the important part here is that there are no observable performance regressions with the data I can easily generate.

If anyone would like to add more test data, or set up some timing from within the test itself, I would be happy to merge that and run it as well.

@Techatrix
Member

I was interested to see if there were any differences in memory usage, so I did some benchmarks myself.
All runs are in ReleaseFast with check_lines == true.

diff Sema.zig & Sema.zig (100 times per run)
[nix-shell:~/repos/diffz]$ poop -d 10000 ./bench1-old-arena-fast ./bench1-new-arena-fast ./bench1-new-gpa-fast
Benchmark 1 (190 runs): ./bench1-old-arena-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          52.0ms ±  324us    50.8ms … 53.1ms         12 ( 6%)        0%
  peak_rss           3.37MB ± 15.8KB    3.32MB … 3.39MB         16 ( 8%)        0%
  cpu_cycles         45.5M  ±  317K     44.9M  … 46.7M           2 ( 1%)        0%
  instructions       14.1M  ± 2.06K     14.1M  … 14.1M          17 ( 9%)        0%
  cache_references   1.22M  ± 24.5K     1.15M  … 1.25M          17 ( 9%)        0%
  cache_misses       2.54K  ± 1.99K      762   … 15.4K          21 (11%)        0%
  branch_misses       695   ± 42.7       589   …  918            7 ( 4%)        0%
Benchmark 2 (191 runs): ./bench1-new-arena-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          51.9ms ±  334us    51.3ms … 53.2ms          9 ( 5%)          -  0.3% ±  0.1%
  peak_rss           3.50MB ± 17.3KB    3.46MB … 3.58MB         19 (10%)        💩+  3.9% ±  0.1%
  cpu_cycles         45.5M  ±  360K     44.9M  … 47.1M           5 ( 3%)          -  0.1% ±  0.1%
  instructions       14.1M  ± 2.14K     14.1M  … 14.1M          18 ( 9%)          +  0.0% ±  0.0%
  cache_references   1.22M  ± 19.3K     1.14M  … 1.26M           7 ( 4%)          +  0.2% ±  0.4%
  cache_misses       1.99K  ± 1.78K      656   … 20.2K          15 ( 8%)        ⚡- 21.6% ± 14.9%
  branch_misses       629   ± 43.7       540   …  749            4 ( 2%)        ⚡-  9.4% ±  1.2%
Benchmark 3 (186 runs): ./bench1-new-gpa-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          53.1ms ±  273us    52.6ms … 54.1ms          7 ( 4%)        💩+  2.0% ±  0.1%
  peak_rss           3.50MB ± 21.9KB    3.47MB … 3.61MB          1 ( 1%)        💩+  3.9% ±  0.1%
  cpu_cycles         45.7M  ±  403K     45.0M  … 47.3M           9 ( 5%)          +  0.3% ±  0.2%
  instructions       14.1M  ± 1.81K     14.1M  … 14.1M          14 ( 8%)          +  0.2% ±  0.0%
  cache_references   1.22M  ± 12.8K     1.14M  … 1.25M           5 ( 3%)          +  0.1% ±  0.3%
  cache_misses       3.13K  ± 1.93K     1.34K  … 20.0K          11 ( 6%)        💩+ 23.0% ± 15.6%
  branch_misses       868   ± 78.7       611   … 1.10K           6 ( 3%)        💩+ 25.0% ±  1.8%
diff Sema.zig & AstGen.zig
[nix-shell:~/repos/diffz]$ poop -d 10000 ./bench2-old-arena-fast ./bench2-new-arena-fast ./bench2-new-gpa-fast
Benchmark 1 (10 runs): ./bench2-old-arena-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.01s  ±  387us    1.01s  … 1.01s           0 ( 0%)        0%
  peak_rss           58.8MB ± 63.3KB    58.7MB … 58.9MB          0 ( 0%)        0%
  cpu_cycles         3.83G  ± 8.37M     3.82G  … 3.84G           0 ( 0%)        0%
  instructions       10.2G  ± 48.3M     10.1G  … 10.2G           0 ( 0%)        0%
  cache_references   2.84M  ± 64.2K     2.75M  … 2.92M           0 ( 0%)        0%
  cache_misses       68.8K  ± 23.5K     54.6K  …  123K           0 ( 0%)        0%
  branch_misses      50.4M  ±  135K     50.3M  … 50.7M           0 ( 0%)        0%
Benchmark 2 (10 runs): ./bench2-new-arena-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.01s  ±  422us    1.01s  … 1.01s           0 ( 0%)          -  0.0% ±  0.0%
  peak_rss           58.8MB ± 67.7KB    58.7MB … 58.9MB          0 ( 0%)          -  0.0% ±  0.1%
  cpu_cycles         3.83G  ± 10.5M     3.82G  … 3.84G           0 ( 0%)          -  0.0% ±  0.2%
  instructions       10.2G  ± 62.8M     10.1G  … 10.3G           0 ( 0%)          +  0.2% ±  0.5%
  cache_references   2.84M  ± 62.9K     2.73M  … 2.95M           0 ( 0%)          +  0.0% ±  2.1%
  cache_misses       65.0K  ± 26.4K     52.1K  …  139K           1 (10%)          -  5.5% ± 34.1%
  branch_misses      50.5M  ±  205K     50.1M  … 50.8M           0 ( 0%)          +  0.1% ±  0.3%
Benchmark 3 (10 runs): ./bench2-new-gpa-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time          1.01s  ±  370us    1.01s  … 1.01s           0 ( 0%)          -  0.1% ±  0.0%
  peak_rss           41.7MB ± 41.9KB    41.6MB … 41.7MB          1 (10%)        ⚡- 29.1% ±  0.1%
  cpu_cycles         3.84G  ± 1.96M     3.83G  … 3.84G           0 ( 0%)          +  0.1% ±  0.1%
  instructions       9.97G  ±  171M     9.50G  … 10.1G           1 (10%)          -  1.8% ±  1.2%
  cache_references   2.61M  ± 58.3K     2.48M  … 2.69M           0 ( 0%)        ⚡-  7.8% ±  2.0%
  cache_misses       56.3K  ± 8.48K     49.7K  … 78.8K           1 (10%)          - 18.1% ± 24.1%
  branch_misses      50.4M  ± 1.02M     47.5M  … 51.0M           1 (10%)          -  0.1% ±  1.4%
diff Sema.zig & Sema2.zig (100 times per run)

Sema2.zig is identical to Sema.zig except that I have added two new lines somewhere in the middle of the document.

[nix-shell:~/repos/diffz]$ poop -d 10000 ./bench3-old-arena-fast ./bench3-new-arena-fast ./bench3-new-gpa-fast
Benchmark 1 (70 runs): ./bench3-old-arena-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           143ms ±  642us     142ms …  145ms          0 ( 0%)        0%
  peak_rss           5.90MB ± 6.38KB    5.90MB … 5.92MB          0 ( 0%)        0%
  cpu_cycles          313M  ±  848K      312M  …  315M           0 ( 0%)        0%
  instructions        997M  ± 2.00K      997M  …  997M           3 ( 4%)        0%
  cache_references   1.56M  ± 22.5K     1.46M  … 1.59M           2 ( 3%)        0%
  cache_misses       43.3K  ± 6.13K     32.2K  … 57.1K           0 ( 0%)        0%
  branch_misses      2.68K  ±  155      2.10K  … 2.97K           1 ( 1%)        0%
Benchmark 2 (70 runs): ./bench3-new-arena-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           143ms ±  653us     142ms …  145ms          2 ( 3%)          -  0.1% ±  0.2%
  peak_rss           5.90MB ± 7.43KB    5.90MB … 5.93MB          0 ( 0%)          +  0.0% ±  0.0%
  cpu_cycles          313M  ±  737K      312M  …  315M           2 ( 3%)          -  0.1% ±  0.1%
  instructions        997M  ± 2.96K      997M  …  997M           4 ( 6%)          +  0.0% ±  0.0%
  cache_references   1.50M  ± 12.6K     1.45M  … 1.53M           3 ( 4%)        ⚡-  4.0% ±  0.4%
  cache_misses       37.0K  ± 6.94K     20.6K  … 53.8K           0 ( 0%)        ⚡- 14.4% ±  5.0%
  branch_misses      2.17K  ±  152      1.83K  … 2.46K           0 ( 0%)        ⚡- 19.0% ±  1.9%
Benchmark 3 (71 runs): ./bench3-new-gpa-fast
  measurement          mean ± σ            min … max           outliers         delta
  wall_time           142ms ±  616us     140ms …  144ms          2 ( 3%)          -  1.0% ±  0.1%
  peak_rss           5.91MB ± 7.77KB    5.90MB … 5.93MB          0 ( 0%)          +  0.0% ±  0.0%
  cpu_cycles          303M  ± 1.03M      302M  …  308M           3 ( 4%)        ⚡-  3.2% ±  0.1%
  instructions        997M  ± 5.43K      997M  …  997M           6 ( 8%)          +  0.0% ±  0.0%
  cache_references   1.62M  ± 31.9K     1.54M  … 1.67M           0 ( 0%)        💩+  3.7% ±  0.6%
  cache_misses       49.0K  ± 22.9K     13.1K  … 81.8K           0 ( 0%)          + 13.3% ± 12.8%
  branch_misses      3.20K  ±  258      2.33K  … 3.71K           4 ( 6%)        💩+ 19.7% ±  2.6%

There isn't much of an impact on wall time, but there is a nice reduction in peak RSS in the Sema.zig vs. AstGen.zig run.
I assume that ZLS's diffs are most similar to the third benchmark, so it won't be noticeably affected by these changes.

@mnemnion
Contributor Author

mnemnion commented Jul 9, 2024

Alright, that's sorted.

Worth noting that this:

It looks like missing coverage is related to diffLineMode only, which isn't exercising the inside of the halfMatch code. Maybe worth circling back to.

is where the leaks were: in the code without coverage. I described it incorrectly, though; it was in the diffCompute path, which uses a half match.

Coverage now stands at 99.2%, and suggests that diffLineMode doesn't have comprehensive leak checks yet (because the errdefers in that section of the code are not triggered by the test). Total line coverage is essential for this kind of code.

@travisstaloch
Collaborator

I've just pulled the latest changes from this branch and am test driving. Didn't notice any issues since yesterday.

@mnemnion mnemnion changed the title Memory management (fixes #22) Memory management (fixes #22, closes #24) Jul 12, 2024
@mnemnion
Contributor Author

Added checkAllFailingAllocations test for diffLineMode which, yes, smoked out some bugs.

Running this test takes an extraordinarily long time, because the algorithm by its nature makes a great many tiny allocations with no obvious way to batch them, and the exhaustive test function is quadratic in the number of allocations.
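The quadratic blow-up is easy to see: inducing failure at each allocation index in turn means an operation that performs N allocations is re-run N times, for on the order of N*(N+1)/2 allocations overall. A quick back-of-the-envelope illustration:

```shell
# Total allocations performed by an exhaustive allocation-failure
# check, as a function of N allocations in the operation under test.
for n in 10 100 1000; do
  echo "$n allocations -> $(( n * (n + 1) / 2 )) total across all runs"
done
```

This is why truncating the test data and lowering the line-check threshold brings the test back down to a reasonable runtime.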

It triggers line mode through the "front door" so to speak, and can no doubt be replaced with something shorter which calls diffLineMode directly.

Remaining un-covered lines are little things like the Diff format string, and some errdefers which I don't think are triggerable. But I'd rather have a few red lines in kcov than be wrong about that.

This lets the test set it to a low value.  That in turn allows the full
allocation failure check to complete in a reasonable amount of time.
@mnemnion
Contributor Author

Added a field which configures the line-check threshold, using that and truncating the test data gets the allocation failure test down to something comparable to the other tests.

That's probably going to be my last tweak to this branch.

It is in the source text, and should be in diffz.
@Techatrix Techatrix merged commit 7cf1ba9 into ziglibs:main Aug 31, 2024
@mnemnion mnemnion deleted the memory-management branch September 3, 2024 18:00