Improve the performance of reverse dependencies using the default_versions #8737

Merged: 3 commits, May 29, 2024

Contributor

@eth3lbert eth3lbert commented May 27, 2024

This PR is the second in a sequence of multiple PRs that improve the performance of reverse dependencies using the default_versions table.


The following are the EXPLAIN ANALYZE outputs for the current query and the proposed query.
All queries were run locally on my M1 MBP with pg15 installed from nixpkgs.

Current query visualization

Execution Time: 403.490 ms

EXPLAIN ANALYZE output
Limit  (cost=98499.75..98624.71 rows=10 width=107) (actual time=400.359..402.972 rows=10 loops=1)
  Output: dependencies_1.id, dependencies_1.version_id, dependencies_1.crate_id, dependencies_1.req, dependencies_1.optional, dependencies_1.default_features, dependencies_1.features, dependencies_1.target, dependencies_1.kind, dependencies_1.explicit_name, crate_downloads.downloads, crates.name, (count(*) OVER (?))
  Buffers: shared hit=659135 read=14070, temp read=8064 written=8064
  ->  Nested Loop  (cost=98499.75..374746.10 rows=22107 width=107) (actual time=400.358..402.970 rows=10 loops=1)
        Output: dependencies_1.id, dependencies_1.version_id, dependencies_1.crate_id, dependencies_1.req, dependencies_1.optional, dependencies_1.default_features, dependencies_1.features, dependencies_1.target, dependencies_1.kind, dependencies_1.explicit_name, crate_downloads.downloads, crates.name, (count(*) OVER (?))
        Buffers: shared hit=659135 read=14070, temp read=8064 written=8064
        ->  WindowAgg  (cost=98487.41..101338.47 rows=22107 width=32) (actual time=400.283..402.831 rows=10 loops=1)
              Output: crate_downloads.downloads, crates.name, versions.id, count(*) OVER (?)
              Buffers: shared hit=659090 read=14070, temp read=8064 written=8064
              ->  Gather Merge  (cost=98487.41..101062.14 rows=22107 width=24) (actual time=391.734..399.266 rows=36956 loops=1)
                    Output: crate_downloads.downloads, crates.name, versions.id
                    Workers Planned: 2
                    Workers Launched: 2
                    Buffers: shared hit=659090 read=14070, temp read=8064 written=8064
                    ->  Sort  (cost=97487.39..97510.42 rows=9211 width=24) (actual time=389.082..389.462 rows=12319 loops=3)
                          Output: crate_downloads.downloads, crates.name, versions.id
                          Sort Key: crate_downloads.downloads DESC, crates.name
                          Sort Method: quicksort  Memory: 1229kB
                          Buffers: shared hit=659090 read=14070, temp read=8064 written=8064
                          Worker 0:  actual time=387.884..388.251 rows=12329 loops=1
                            Sort Method: quicksort  Memory: 1213kB
                            Buffers: shared hit=219722 read=4649, temp read=2662 written=2662
                          Worker 1:  actual time=387.820..388.203 rows=12058 loops=1
                            Sort Method: quicksort  Memory: 1195kB
                            Buffers: shared hit=219657 read=4578, temp read=2635 written=2635
                          ->  Parallel Hash Semi Join  (cost=80001.43..96880.88 rows=9211 width=24) (actual time=290.325..386.225 rows=12319 loops=3)
                                Output: crate_downloads.downloads, crates.name, versions.id
                                Hash Cond: (versions.id = dependencies.version_id)
                                Buffers: shared hit=659060 read=14070, temp read=8064 written=8064
                                Worker 0:  actual time=289.209..384.986 rows=12329 loops=1
                                  Buffers: shared hit=219707 read=4649, temp read=2662 written=2662
                                Worker 1:  actual time=289.766..384.974 rows=12058 loops=1
                                  Buffers: shared hit=219642 read=4578, temp read=2635 written=2635
                                ->  Hash Join  (cost=65226.81..81874.83 rows=18422 width=24) (actual time=265.901..346.918 rows=46847 loops=3)
                                      Output: versions.id, crates.name, crate_downloads.downloads
                                      Hash Cond: (crates.id = versions.crate_id)
                                      Buffers: shared hit=655576 read=14070, temp read=8064 written=8064
                                      Worker 0:  actual time=265.851..346.747 rows=46697 loops=1
                                        Buffers: shared hit=218528 read=4649, temp read=2662 written=2662
                                      Worker 1:  actual time=265.990..346.542 rows=45812 loops=1
                                        Buffers: shared hit=218522 read=4578, temp read=2635 written=2635
                                      ->  Parallel Seq Scan on public.crates  (cost=0.00..16235.31 rows=60931 width=16) (actual time=0.047..16.055 rows=48745 loops=3)
                                            Output: crates.id, crates.name, crates.updated_at, crates.created_at, crates.description, crates.homepage, crates.documentation, crates.readme, crates.textsearchable_index_col, crates.repository, crates.max_upload_size, crates.max_features
                                            Buffers: shared hit=1556 read=14070
                                            Worker 0:  actual time=0.060..16.424 rows=48605 loops=1
                                              Buffers: shared hit=521 read=4649
                                            Worker 1:  actual time=0.057..16.268 rows=47642 loops=1
                                              Buffers: shared hit=515 read=4578
                                      ->  Hash  (cost=64674.13..64674.13 rows=44214 width=20) (actual time=265.739..265.741 rows=140540 loops=3)
                                            Output: versions.id, versions.crate_id, crate_downloads.downloads, crate_downloads.crate_id
                                            Buckets: 262144 (originally 65536)  Batches: 2 (originally 1)  Memory Usage: 6657kB
                                            Buffers: shared hit=654020, temp written=1026
                                            Worker 0:  actual time=265.712..265.713 rows=140540 loops=1
                                              Buffers: shared hit=218007, temp written=342
                                            Worker 1:  actual time=265.789..265.791 rows=140540 loops=1
                                              Buffers: shared hit=218007, temp written=342
                                            ->  Hash Join  (cost=62036.92..64674.13 rows=44214 width=20) (actual time=207.699..243.558 rows=140540 loops=3)
                                                  Output: versions.id, versions.crate_id, crate_downloads.downloads, crate_downloads.crate_id
                                                  Inner Unique: true
                                                  Hash Cond: (crate_downloads.crate_id = versions.crate_id)
                                                  Buffers: shared hit=654020
                                                  Worker 0:  actual time=206.916..243.315 rows=140540 loops=1
                                                    Buffers: shared hit=218007
                                                  Worker 1:  actual time=208.420..243.624 rows=140540 loops=1
                                                    Buffers: shared hit=218007
                                                  ->  Seq Scan on public.crate_downloads  (cost=0.00..2253.34 rows=146234 width=12) (actual time=0.015..5.172 rows=146234 loops=3)
                                                        Output: crate_downloads.crate_id, crate_downloads.downloads
                                                        Buffers: shared hit=2373
                                                        Worker 0:  actual time=0.018..5.288 rows=146234 loops=1
                                                          Buffers: shared hit=791
                                                        Worker 1:  actual time=0.020..5.117 rows=146234 loops=1
                                                          Buffers: shared hit=791
                                                  ->  Hash  (cost=61484.24..61484.24 rows=44214 width=8) (actual time=207.624..207.625 rows=140540 loops=3)
                                                        Output: versions.id, versions.crate_id
                                                        Buckets: 262144 (originally 65536)  Batches: 1 (originally 1)  Memory Usage: 7538kB
                                                        Buffers: shared hit=651647
                                                        Worker 0:  actual time=206.828..206.829 rows=140540 loops=1
                                                          Buffers: shared hit=217216
                                                        Worker 1:  actual time=208.343..208.343 rows=140540 loops=1
                                                          Buffers: shared hit=217216
                                                        ->  Subquery Scan on versions  (cost=0.43..61484.24 rows=44214 width=8) (actual time=0.039..192.356 rows=140540 loops=3)
                                                              Output: versions.id, versions.crate_id
                                                              Buffers: shared hit=651647
                                                              Worker 0:  actual time=0.034..191.457 rows=140540 loops=1
                                                                Buffers: shared hit=217216
                                                              Worker 1:  actual time=0.034..192.646 rows=140540 loops=1
                                                                Buffers: shared hit=217216
                                                              ->  Unique  (cost=0.43..61042.10 rows=44214 width=41) (actual time=0.037..182.625 rows=140540 loops=3)
                                                                    Output: versions_1.crate_id, versions_1.semver_no_prerelease, versions_1.id
                                                                    Buffers: shared hit=651647
                                                                    Worker 0:  actual time=0.032..181.752 rows=140540 loops=1
                                                                      Buffers: shared hit=217216
                                                                    Worker 1:  actual time=0.032..182.902 rows=140540 loops=1
                                                                      Buffers: shared hit=217216
                                                                    ->  Index Only Scan using index_versions_crate_id_semver_no_prerelease_id on public.versions versions_1  (cost=0.43..58358.43 rows=1073467 width=41) (actual time=0.037..136.226 rows=1071196 loops=3)
                                                                          Output: versions_1.crate_id, versions_1.semver_no_prerelease, versions_1.id
                                                                          Heap Fetches: 0
                                                                          Buffers: shared hit=651647
                                                                          Worker 0:  actual time=0.032..135.364 rows=1071196 loops=1
                                                                            Buffers: shared hit=217216
                                                                          Worker 1:  actual time=0.032..136.484 rows=1071196 loops=1
                                                                            Buffers: shared hit=217216
                                ->  Parallel Hash  (cost=12538.15..12538.15 rows=178918 width=4) (actual time=24.128..24.128 rows=138355 loops=3)
                                      Output: dependencies.version_id
                                      Buckets: 524288  Batches: 1  Memory Usage: 20352kB
                                      Buffers: shared hit=3432
                                      Worker 0:  actual time=23.302..23.302 rows=134966 loops=1
                                        Buffers: shared hit=1153
                                      Worker 1:  actual time=23.693..23.693 rows=134523 loops=1
                                        Buffers: shared hit=1094
                                      ->  Parallel Index Only Scan using dependencies_crate_id_version_id_idx on public.dependencies  (cost=0.43..12538.15 rows=178918 width=4) (actual time=0.033..10.222 rows=138355 loops=3)
                                            Output: dependencies.version_id
                                            Index Cond: (dependencies.crate_id = 463)
                                            Heap Fetches: 0
                                            Buffers: shared hit=3432
                                            Worker 0:  actual time=0.037..9.730 rows=134966 loops=1
                                              Buffers: shared hit=1153
                                            Worker 1:  actual time=0.027..9.977 rows=134523 loops=1
                                              Buffers: shared hit=1094
        ->  Limit  (cost=12.33..12.34 rows=1 width=79) (actual time=0.013..0.013 rows=1 loops=10)
              Output: dependencies_1.id, dependencies_1.version_id, dependencies_1.crate_id, dependencies_1.req, dependencies_1.optional, dependencies_1.default_features, dependencies_1.features, dependencies_1.target, dependencies_1.kind, dependencies_1.explicit_name
              Buffers: shared hit=45
              ->  Sort  (cost=12.33..12.34 rows=2 width=79) (actual time=0.013..0.013 rows=1 loops=10)
                    Output: dependencies_1.id, dependencies_1.version_id, dependencies_1.crate_id, dependencies_1.req, dependencies_1.optional, dependencies_1.default_features, dependencies_1.features, dependencies_1.target, dependencies_1.kind, dependencies_1.explicit_name
                    Sort Key: dependencies_1.id
                    Sort Method: quicksort  Memory: 25kB
                    Buffers: shared hit=45
                    ->  Index Scan using index_dependencies_version_id on public.dependencies dependencies_1  (cost=0.43..12.32 rows=2 width=79) (actual time=0.006..0.007 rows=1 loops=10)
                          Output: dependencies_1.id, dependencies_1.version_id, dependencies_1.crate_id, dependencies_1.req, dependencies_1.optional, dependencies_1.default_features, dependencies_1.features, dependencies_1.target, dependencies_1.kind, dependencies_1.explicit_name
                          Index Cond: (dependencies_1.version_id = versions.id)
                          Filter: (dependencies_1.crate_id = 463)
                          Rows Removed by Filter: 10
                          Buffers: shared hit=42
Planning:
  Buffers: shared hit=520
Planning Time: 1.442 ms
Execution Time: 403.490 ms

Proposed query with default_versions visualization

Execution Time: 179.527 ms

EXPLAIN ANALYZE output
Limit  (cost=32197.22..70948.48 rows=10 width=107) (actual time=117.403..179.205 rows=10 loops=1)
  Output: dependencies.id, dependencies.version_id, dependencies.crate_id, dependencies.req, dependencies.optional, dependencies.default_features, dependencies.features, dependencies.target, dependencies.kind, dependencies.explicit_name, crate_downloads.downloads, crates.name, ($1)
  Buffers: shared hit=128721
  CTE filterd_versions
    ->  Merge Semi Join  (cost=1.95..28067.53 rows=138323 width=8) (actual time=0.248..94.289 rows=36956 loops=1)
          Output: default_versions.crate_id, default_versions.version_id
          Merge Cond: (default_versions.version_id = dependencies_1.version_id)
          Buffers: shared hit=119437
          ->  Merge Anti Join  (cost=0.91..8394.94 rows=138323 width=8) (actual time=0.039..45.616 rows=140540 loops=1)
                Output: default_versions.crate_id, default_versions.version_id
                Merge Cond: (default_versions.version_id = versions.id)
                Buffers: shared hit=116030
                ->  Index Scan using default_versions_version_id_uindex on public.default_versions  (cost=0.42..6195.52 rows=146234 width=8) (actual time=0.015..26.461 rows=146234 loops=1)
                      Output: default_versions.crate_id, default_versions.version_id
                      Buffers: shared hit=100490
                ->  Index Only Scan using versions_id_yanked_idx on public.versions  (cost=0.29..1601.23 rows=61396 width=4) (actual time=0.018..5.199 rows=63667 loops=1)
                      Output: versions.id
                      Heap Fetches: 0
                      Buffers: shared hit=15540
          ->  Index Only Scan using dependencies_crate_id_version_id_idx on public.dependencies dependencies_1  (cost=0.43..15043.00 rows=429404 width=4) (actual time=0.018..27.312 rows=415065 loops=1)
                Output: dependencies_1.crate_id, dependencies_1.version_id
                Index Cond: (dependencies_1.crate_id = 463)
                Heap Fetches: 0
                Buffers: shared hit=3407
  InitPlan 2 (returns $1)
    ->  Aggregate  (cost=3112.27..3112.28 rows=1 width=8) (actual time=2.413..2.413 rows=1 loops=1)
          Output: count(*)
          ->  CTE Scan on filterd_versions filterd_versions_1  (cost=0.00..2766.46 rows=138323 width=0) (actual time=0.000..1.249 rows=36956 loops=1)
                Output: filterd_versions_1.crate_id, filterd_versions_1.version_id
  ->  Nested Loop  (cost=1017.41..536020151.53 rows=138323 width=107) (actual time=117.403..179.200 rows=10 loops=1)
        Output: dependencies.id, dependencies.version_id, dependencies.crate_id, dependencies.req, dependencies.optional, dependencies.default_features, dependencies.features, dependencies.target, dependencies.kind, dependencies.explicit_name, crate_downloads.downloads, crates.name, $1
        Buffers: shared hit=128721
        ->  Nested Loop  (cost=1005.07..534310829.39 rows=138323 width=24) (actual time=114.939..176.662 rows=10 loops=1)
              Output: filterd_versions.version_id, crates.name, crate_downloads.downloads
              Join Filter: (crates.id = filterd_versions.crate_id)
              Rows Removed by Join Filter: 990452
              Buffers: shared hit=128679
              ->  Gather Merge  (cost=1005.07..78067.57 rows=146234 width=28) (actual time=4.227..5.712 rows=27 loops=1)
                    Output: crates.name, crates.id, crate_downloads.downloads, crate_downloads.crate_id
                    Workers Planned: 2
                    Workers Launched: 2
                    Buffers: shared hit=9242
                    ->  Incremental Sort  (cost=5.05..60188.51 rows=60931 width=28) (actual time=0.274..3.235 rows=745 loops=3)
                          Output: crates.name, crates.id, crate_downloads.downloads, crate_downloads.crate_id
                          Sort Key: crate_downloads.downloads DESC, crates.name
                          Presorted Key: crate_downloads.downloads
                          Full-sort Groups: 1  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
                          Buffers: shared hit=9242
                          Worker 0:  actual time=0.313..4.691 rows=1105 loops=1
                            Full-sort Groups: 35  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
                            Buffers: shared hit=4551
                          Worker 1:  actual time=0.312..4.801 rows=1104 loops=1
                            Full-sort Groups: 35  Sort Method: quicksort  Average Memory: 27kB  Peak Memory: 27kB
                            Buffers: shared hit=4552
                          ->  Nested Loop  (cost=0.84..57834.95 rows=60931 width=28) (actual time=0.042..3.057 rows=758 loops=3)
                                Output: crates.name, crates.id, crate_downloads.downloads, crate_downloads.crate_id
                                Inner Unique: true
                                Buffers: shared hit=9119
                                Worker 0:  actual time=0.050..4.455 rows=1121 loops=1
                                  Buffers: shared hit=4491
                                Worker 1:  actual time=0.055..4.534 rows=1121 loops=1
                                  Buffers: shared hit=4492
                                ->  Parallel Index Only Scan using crate_downloads_downloads_crate_id_index on public.crate_downloads  (cost=0.42..4240.90 rows=60931 width=12) (actual time=0.026..0.075 rows=758 loops=3)
                                      Output: crate_downloads.downloads, crate_downloads.crate_id
                                      Heap Fetches: 0
                                      Buffers: shared hit=17
                                      Worker 0:  actual time=0.033..0.101 rows=1121 loops=1
                                        Buffers: shared hit=6
                                      Worker 1:  actual time=0.034..0.112 rows=1121 loops=1
                                        Buffers: shared hit=7
                                ->  Index Scan using packages_pkey on public.crates  (cost=0.42..0.88 rows=1 width=16) (actual time=0.004..0.004 rows=1 loops=2275)
                                      Output: crates.id, crates.name, crates.updated_at, crates.created_at, crates.description, crates.homepage, crates.documentation, crates.readme, crates.textsearchable_index_col, crates.repository, crates.max_upload_size, crates.max_features
                                      Index Cond: (crates.id = crate_downloads.crate_id)
                                      Buffers: shared hit=9102
                                      Worker 0:  actual time=0.004..0.004 rows=1 loops=1121
                                        Buffers: shared hit=4485
                                      Worker 1:  actual time=0.004..0.004 rows=1 loops=1121
                                        Buffers: shared hit=4485
              ->  CTE Scan on filterd_versions  (cost=0.00..2766.46 rows=138323 width=8) (actual time=0.009..4.849 rows=36684 loops=27)
                    Output: filterd_versions.crate_id, filterd_versions.version_id
                    Buffers: shared hit=119437
        ->  Limit  (cost=12.33..12.34 rows=1 width=79) (actual time=0.011..0.011 rows=1 loops=10)
              Output: dependencies.id, dependencies.version_id, dependencies.crate_id, dependencies.req, dependencies.optional, dependencies.default_features, dependencies.features, dependencies.target, dependencies.kind, dependencies.explicit_name
              Buffers: shared hit=42
              ->  Sort  (cost=12.33..12.34 rows=2 width=79) (actual time=0.010..0.010 rows=1 loops=10)
                    Output: dependencies.id, dependencies.version_id, dependencies.crate_id, dependencies.req, dependencies.optional, dependencies.default_features, dependencies.features, dependencies.target, dependencies.kind, dependencies.explicit_name
                    Sort Key: dependencies.id
                    Sort Method: quicksort  Memory: 25kB
                    Buffers: shared hit=42
                    ->  Index Scan using index_dependencies_version_id on public.dependencies  (cost=0.43..12.32 rows=2 width=79) (actual time=0.006..0.007 rows=1 loops=10)
                          Output: dependencies.id, dependencies.version_id, dependencies.crate_id, dependencies.req, dependencies.optional, dependencies.default_features, dependencies.features, dependencies.target, dependencies.kind, dependencies.explicit_name
                          Index Cond: (dependencies.version_id = filterd_versions.version_id)
                          Filter: (dependencies.crate_id = 463)
                          Rows Removed by Filter: 10
                          Buffers: shared hit=42
Planning:
  Buffers: shared hit=581
Planning Time: 1.742 ms
Execution Time: 179.527 ms
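
To put the two plans side by side, here is a minimal sketch that derives the speedup and the drop in buffer accesses from the figures reported in the two EXPLAIN ANALYZE outputs. The numbers are copied from the top-level `Execution Time` and `Buffers: shared hit=… read=…` lines; the dictionary layout and helper names are illustrative assumptions. Each figure comes from a single local run, so treat the ratios as indicative only.

```python
# Figures copied from the two EXPLAIN ANALYZE outputs (single local runs).
current = {"execution_ms": 403.490, "shared_hit": 659135, "shared_read": 14070}
proposed = {"execution_ms": 179.527, "shared_hit": 128721, "shared_read": 0}


def speedup(before, after):
    """How many times faster `after` is compared to `before`."""
    return before / after


def reduction_pct(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


def shared_buffers(plan):
    """Total shared buffer accesses (cache hits plus reads)."""
    return plan["shared_hit"] + plan["shared_read"]


time_speedup = speedup(current["execution_ms"], proposed["execution_ms"])
time_reduction = reduction_pct(current["execution_ms"], proposed["execution_ms"])
buffer_reduction = reduction_pct(shared_buffers(current), shared_buffers(proposed))

print(f"execution time: {time_speedup:.2f}x faster ({time_reduction:.1f}% less)")
print(f"shared buffers: {shared_buffers(current)} -> {shared_buffers(proposed)} "
      f"({buffer_reduction:.1f}% fewer)")
```

On these numbers the proposed query is a bit over twice as fast and touches roughly a fifth of the shared buffers, and it no longer spills to temp files (the current plan reports `temp read=8064 written=8064`).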

The proposed query using the default_versions table not only improves performance but also makes the results more accurate.

Here's the output of comparing the results using `diff old.csv new.csv`:
739c739
< 5079275,655553,463,~1,f,t,{derive},,0,,1385137,elasticsearch,7.17.7-alpha.1,36956
---
> 5079053,655526,463,~1,f,t,{derive},,0,,1385137,elasticsearch,8.5.0-alpha.1,36956
6223c6223
< 10476260,1146790,463,^1.0,f,t,{derive},,0,,14674,holochain_secure_primitive,0.3.1-rc.0,36956
---
> 10294768,1131971,463,^1.0,f,t,{derive},,0,,14674,holochain_secure_primitive,0.4.0-dev.1,36956
6767c6767
< 10476514,1146809,463,^1,f,t,{},,0,,12779,kitsune_p2p_bootstrap_client,0.3.1-rc.0,36956
---
> 10453123,1144904,463,^1,f,t,{},,0,,12779,kitsune_p2p_bootstrap_client,0.4.0-dev.3,36956
6815c6815
< 10476726,1146821,463,^1.0,f,t,{derive},,0,,12608,holochain_state_types,0.3.1-rc.0,36956
---
> 10453328,1144912,463,^1.0,f,t,{derive},,0,,12608,holochain_state_types,0.4.0-dev.3,36956
7739c7739
< 10476885,1146833,463,^1.0,f,t,{derive},,0,,10266,hc_sleuth,0.2.1-rc.0,36956
---
> 10453363,1144915,463,^1.0,f,t,{derive},,0,,10266,hc_sleuth,0.4.0-dev.3,36956
8872c8872
< 10476751,1146824,463,^1.0,t,t,{derive},,0,,8304,aitia,0.2.1-rc.0,36956
---
> 10453335,1144913,463,^1.0,t,t,{derive},,0,,8304,aitia,0.3.0-dev.2,36956
9140c9140
< 7779161,911085,463,^1.0.159,f,t,{derive},,0,,7886,cacti_weaver_protos_rs,2.0.0-alpha.2,36956
---
> 6305093,773201,463,^1.0.159,f,t,{derive},,0,,7886,cacti_weaver_protos_rs,2.0.0-alpha-prerelease,36956
14017c14017
< 3019454,443183,463,^1.0.130,t,t,{},,0,,3946,ron-reboot,0.1.0-preview11,36956
---
> 3019316,443155,463,^1.0.130,t,t,{},,0,,3946,ron-reboot,0.1.0-preview9,36956
28749c28749
< 9007709,1016976,463,^1,f,t,{},,0,,846,gringron_util,5.2.0-beta.2,36956
---
> 9007177,1016920,463,^1,f,t,{},,0,,846,gringron_util,5.2.0-beta.3.1,36956
28954,28955c28954,28955
< 9007742,1016978,463,^1,f,t,{},,0,,829,gringron_core,5.2.0-beta.2,36956
< 9007724,1016977,463,^1,f,t,{},,0,,829,gringron_keychain,5.2.0-beta.2,36956
---
> 9007245,1016925,463,^1,f,t,{},,0,,829,gringron_core,5.2.0-beta.3.1,36956
> 9007198,1016922,463,^1,f,t,{},,0,,829,gringron_keychain,5.2.0-beta.3.1,36956
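
One pattern in the diff above can be checked mechanically: in every changed row, the version reported by the new query has higher SemVer precedence than the one reported by the old query. The sketch below only illustrates that observation. `semver_key` is a hypothetical helper (not crates.io code), it ignores build metadata, and it makes no claim about how the default_versions table is actually populated. The version pairs are copied from the second-to-last column of the diff rows.

```python
def semver_key(version):
    """Sort key following SemVer 2.0.0 precedence (build metadata not handled)."""
    core, _, pre = version.partition("-")
    major, minor, patch = (int(part) for part in core.split("."))
    if not pre:
        # A release outranks any pre-release of the same major.minor.patch.
        return (major, minor, patch, (1,))
    identifiers = tuple(
        # Numeric identifiers compare numerically and rank below alphanumeric ones.
        (0, int(ident), "") if ident.isdigit() else (1, 0, ident)
        for ident in pre.split(".")
    )
    return (major, minor, patch, (0, identifiers))


# (crate, version from old.csv, version from new.csv), copied from the diff.
changed_rows = [
    ("elasticsearch", "7.17.7-alpha.1", "8.5.0-alpha.1"),
    ("holochain_secure_primitive", "0.3.1-rc.0", "0.4.0-dev.1"),
    ("kitsune_p2p_bootstrap_client", "0.3.1-rc.0", "0.4.0-dev.3"),
    ("holochain_state_types", "0.3.1-rc.0", "0.4.0-dev.3"),
    ("hc_sleuth", "0.2.1-rc.0", "0.4.0-dev.3"),
    ("aitia", "0.2.1-rc.0", "0.3.0-dev.2"),
    ("cacti_weaver_protos_rs", "2.0.0-alpha.2", "2.0.0-alpha-prerelease"),
    ("ron-reboot", "0.1.0-preview11", "0.1.0-preview9"),
    ("gringron_util", "5.2.0-beta.2", "5.2.0-beta.3.1"),
    ("gringron_core", "5.2.0-beta.2", "5.2.0-beta.3.1"),
    ("gringron_keychain", "5.2.0-beta.2", "5.2.0-beta.3.1"),
]

new_is_higher = {
    crate: semver_key(new) > semver_key(old) for crate, old, new in changed_rows
}
for crate, higher in new_is_higher.items():
    print(f"{crate}: new version ranks higher = {higher}")
```

Note the ron-reboot row: `preview9` ranks above `preview11` because alphanumeric pre-release identifiers are compared as ASCII strings, not as numbers.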

We discussed and experimented with the query on Zulip.

codecov bot commented May 27, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.65%. Comparing base (c6f57da) to head (5531db9).

Current head 5531db9 differs from pull request most recent head c5754bd

Please upload reports for the commit c5754bd to get more accurate results.

Additional details and impacted files
@@           Coverage Diff           @@
##             main    #8737   +/-   ##
=======================================
  Coverage   88.65%   88.65%           
=======================================
  Files         276      276           
  Lines       27556    27557    +1     
=======================================
+ Hits        24431    24432    +1     
  Misses       3125     3125           


Member

@Turbo87 Turbo87 left a comment

for those that haven't followed the discussion on Zulip, could you include a summary of the performance numbers for these changes?

@Turbo87 Turbo87 added C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear A-backend ⚙️ labels May 28, 2024
@eth3lbert
Contributor Author

> for those that haven't followed the discussion on Zulip, could you include a summary of the performance numbers for these changes?

Sure! I'll update the description to include the summary.

@eth3lbert eth3lbert requested a review from Turbo87 May 28, 2024 09:36
@eth3lbert
Contributor Author

eth3lbert commented May 28, 2024

Including crate_id in the unique index default_versions_version_id_uindex can potentially further improve performance.
visualization
Execution Time: 168.507 ms

@Turbo87
Member

Turbo87 commented May 28, 2024

> Including crate_id in the unique index default_versions_version_id_uindex can potentially further improve performance.

I'm surprised by this. if we add the crate_id then we're essentially just copying the data that is in the table into an index, so what is the difference then between scanning over the index vs. scanning over the table itself? 🤔

@eth3lbert
Contributor Author

eth3lbert commented May 28, 2024

> I'm surprised by this. if we add the crate_id then we're essentially just copying the data that is in the table into an index, so what is the difference then between scanning over the index vs. scanning over the table itself? 🤔

As I understand it, an index scan needs an additional heap access compared to an index-only scan. Previously, this query used an index scan. A covering index can improve performance by enabling an index-only scan, which answers the query from the index alone, without accessing the underlying table data (the heap).

FYI: https://www.postgresql.org/docs/current/indexes-index-only-scans.html#INDEXES-INDEX-ONLY-SCANS
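
The idea can be tried out in a small, self-contained sketch. SQLite (through Python's `sqlite3` module) is used here only as a stand-in because it runs in-process; its planner and its "COVERING INDEX" wording are not PostgreSQL's, where the comparable plan node is an Index Only Scan. The two-column table, the index name `demo_idx`, and the helper `plan_with_index` are all hypothetical and do not reflect the real crates.io schema.

```python
import sqlite3

# The lookup needs both columns but filters on version_id only.
QUERY = "SELECT crate_id, version_id FROM default_versions WHERE version_id = ?"


def plan_with_index(index_columns):
    """Return SQLite's plan for QUERY when demo_idx covers `index_columns`."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE default_versions ("
        "crate_id INTEGER NOT NULL, version_id INTEGER NOT NULL)"
    )
    conn.execute(f"CREATE INDEX demo_idx ON default_versions ({index_columns})")
    rows = conn.execute("EXPLAIN QUERY PLAN " + QUERY, (1,)).fetchall()
    conn.close()
    # The human-readable plan text is the fourth column of each row.
    return " ".join(row[3] for row in rows)


# Index on version_id alone: crate_id must still be fetched from the table.
plain_plan = plan_with_index("version_id")
# Index that also carries crate_id: the index alone can answer the query.
covering_plan = plan_with_index("version_id, crate_id")

print(plain_plan)
print(covering_plan)
```

With only `version_id` indexed, the plan uses the index to find the row and then visits the table for `crate_id`; once the index also carries `crate_id`, SQLite reports a covering index and skips the table. In PostgreSQL, a unique index can carry such an extra column through the `INCLUDE` clause without changing which columns are enforced as unique.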

@Turbo87 Turbo87 force-pushed the improve-rev-dep branch 2 times, most recently from b025480 to 5531db9 Compare May 29, 2024 14:34
@Turbo87 Turbo87 force-pushed the improve-rev-dep branch from 5531db9 to c5754bd Compare May 29, 2024 14:46
@Turbo87
Member

Turbo87 commented May 29, 2024

I ran these queries on the production read-replica with these results:

nice work! 😍

@Turbo87 Turbo87 added C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works and removed C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear labels May 29, 2024
@Turbo87 Turbo87 enabled auto-merge May 29, 2024 14:49
@Turbo87 Turbo87 merged commit 316981c into rust-lang:main May 29, 2024
7 checks passed
@eth3lbert eth3lbert deleted the improve-rev-dep branch May 29, 2024 14:58
Labels
A-backend ⚙️ C-enhancement ✨ Category: Adding new behavior or a change to the way an existing feature works