feat: suggest close matches using Levenshtein distance [POC] #836

dougbacelar · 2020-11-21T08:53:01Z

What:
Inspired by #582

This provides a way to suggest close matches to users when the query cannot find any elements.

render(<div data-testid="cat" />)
screen.getByTestId('kat');

// output
`Unable to find an element by: [data-testid="kat"]. Did you mean one of the following?
cat`

This is a POC; I'm looking to gather feedback on whether it's worth pursuing.

Why:
This might make debugging easier, especially when a query contains a typo or when an element's name has changed slightly.

How:
query by attribute:

iterate through all elements and calculate the distance from each element's attribute value to the search string
keep only the matches closest to the search string; when several are tied at the same distance, keep all of them
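
As a rough sketch of the filtering step above (not the PR's actual code): `keepClosest` and `positionalDistance` are hypothetical names, and the distance function is passed in as a parameter so that any metric can be plugged in. The toy metric here only stands in for a real edit distance.

```javascript
// Hypothetical helper: keep every candidate that shares the smallest
// distance to the search string.
function keepClosest(candidates, search, distance) {
  let best = Infinity
  let closest = []
  for (const candidate of candidates) {
    const d = distance(candidate, search)
    if (d < best) {
      // Strictly closer: discard everything collected so far.
      best = d
      closest = [candidate]
    } else if (d === best) {
      // Tied with the current best: keep it as well.
      closest.push(candidate)
    }
  }
  return closest
}

// Toy metric for illustration only: differing positions plus length difference.
function positionalDistance(a, b) {
  const shared = Math.min(a.length, b.length)
  let diff = Math.abs(a.length - b.length)
  for (let i = 0; i < shared; i++) {
    if (a[i] !== b[i]) diff++
  }
  return diff
}

const result = keepClosest(['cat', 'bat', 'dog'], 'kat', positionalDistance)
console.log(result) // 'cat' and 'bat' are tied at distance 1; 'dog' is dropped
```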

calculate close matches:

initialise a dynamic programming table of size MxN where M = element text length and N = search string length
use the dp table above to calculate the Levenshtein distance between the element text and the search string
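
For reference, a minimal sketch of the textbook dynamic-programming Levenshtein distance. This is not the code from this PR (which was later swapped for the leven package); the function name is an assumption. Note that the usual formulation allocates (M+1) x (N+1) cells so that the empty prefixes get their own row and column.

```javascript
// Textbook Levenshtein distance; illustrative only.
function levenshtein(a, b) {
  const rows = a.length + 1
  const cols = b.length + 1
  // dp[i][j] = edits needed to turn the first i chars of a into the first j chars of b
  const dp = Array.from({length: rows}, () => new Array(cols).fill(0))
  // Turning a prefix into the empty string takes one deletion per character.
  for (let i = 0; i < rows; i++) dp[i][0] = i
  // Building a prefix from the empty string takes one insertion per character.
  for (let j = 0; j < cols; j++) dp[0][j] = j
  for (let i = 1; i < rows; i++) {
    for (let j = 1; j < cols; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1
      dp[i][j] = Math.min(
        dp[i - 1][j] + 1, // deletion
        dp[i][j - 1] + 1, // insertion
        dp[i - 1][j - 1] + cost, // substitution, or free when the characters match
      )
    }
  }
  return dp[a.length][b.length]
}

console.log(levenshtein('cat', 'kat')) // 1
```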

Note: this is implemented only for the byTestId query for now, behind a computeCloseMatches flag.

Checklist:

Documentation added to the docs site
Tests
Typescript definitions updated
Ready to be merged

src/queries/test-id.js

codesandbox-ci · 2020-11-21T08:54:10Z

This pull request is automatically built and testable in CodeSandbox.

Latest deployment of this branch, based on commit 8ee261f:

Sandbox	Source
react-testing-library-examples	Configuration

codecov · 2020-11-21T09:13:04Z

Codecov Report

Merging #836 (8ee261f) into master (c6e7a83) will not change coverage.
The diff coverage is 100.00%.

@@            Coverage Diff            @@
##            master      #836   +/-   ##
=========================================
  Coverage   100.00%   100.00%           
=========================================
  Files           26        27    +1     
  Lines          934       965   +31     
  Branches       286       298   +12     
=========================================
+ Hits           934       965   +31

Impacted Files	Coverage Δ
src/config.js	`100.00% <ø> (ø)`
src/close-matches.js	`100.00% <100.00%> (ø)`
src/queries/test-id.js	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

kentcdodds

Thanks for this! I'm thinking this would be pretty useful. Good progress so far.

src/close-matches.js

src/queries/test-id.js

kentcdodds · 2020-11-23T18:24:13Z

src/queries/test-id.js

+  const closeMatches =
+    !computeCloseMatches || typeof id !== 'string'
+      ? []
+      : getCloseMatchesByAttribute(getTestIdAttribute(), c, id, options)


I'm concerned about this increasing the performance issues we already have with find* queries which are expected to fail at least once. Any chance we could lazily calculate this value so it's only run when the error is actually displayed? I don't know whether this is possible.

But perhaps my concern is unwarranted? Maybe this is faster than I think?

dougbacelar · 2020-11-28

> I'm concerned about this increasing the performance issues we already have with find* queries which are expected to fail at least once.

Good point. I tested a few times locally, running getByTestId('search', {computeCloseMatches: true}) versus getByTestId('search', {computeCloseMatches: false}).

This is not a reliable benchmark, but with the flag set to false the call consistently finished in around 20ms, and with it set to true in around 35ms. The cost grows with the number of elements found, but not by much. It might become slightly quicker with the recommended lib.

> Any chance we could lazily calculate this value so it's only run when the error is actually displayed?

An alternative would be to throw functions from find* queries and then check whether typeof lastError === 'function' when waitFor times out.

That might mean decoupling find* and get* queries a bit, or perhaps making get* queries depend on find* queries instead of the other way around.

For example, a get* query could be a find* query with {timeout: 0, interval: Infinity} (I'm not sure that would work).
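
To make the "throw a function" idea concrete, here is a minimal sketch under simplifying assumptions: `retry` is a hypothetical, synchronous stand-in for a waitFor-style loop (the real waitFor is asynchronous and timer based), and `failingQuery` is a made-up query that always fails. The point is only that the expensive message is built once, after the final attempt.

```javascript
// Hypothetical, simplified retry loop; not the library's waitFor.
function retry(callback, attempts) {
  let lastError
  for (let i = 0; i < attempts; i++) {
    try {
      return callback()
    } catch (error) {
      // Remember the failure but do not inspect or format it yet.
      lastError = error
    }
  }
  // Only after the final failure do we pay for building the message.
  throw typeof lastError === 'function' ? lastError() : lastError
}

let messagesBuilt = 0

function failingQuery() {
  // Throw a function instead of an Error so the expensive work is deferred.
  throw () => {
    messagesBuilt++
    return new Error(
      'Unable to find an element by: [data-testid="kat"]. Did you mean one of the following?\ncat',
    )
  }
}

let caught
try {
  retry(failingQuery, 5)
} catch (error) {
  caught = error
}

console.log(messagesBuilt) // 1: the message was built once, not once per attempt
```

The trade-off mentioned above still applies: get* queries would either need their own eager path or would have to unwrap the thrown function themselves.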

Since that seems a bit involved, we could start with a computeCloseMatches config option that defaults to false and give that some testing?
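
A small sketch of how such an opt-in flag could be resolved, mirroring the conditional in the diff above. The `config` object and the `shouldComputeCloseMatches` helper are hypothetical names used for illustration; only the computeCloseMatches option name and the string check come from this PR.

```javascript
// Hypothetical module-level config; the option is off unless someone opts in.
const config = {computeCloseMatches: false}

function shouldComputeCloseMatches(id, options = {}) {
  // A per-call option, when given, overrides the global default.
  const {computeCloseMatches = config.computeCloseMatches} = options
  // Close matches are only computed for string matchers, not regexes or functions.
  return computeCloseMatches && typeof id === 'string'
}

console.log(shouldComputeCloseMatches('kat')) // false
console.log(shouldComputeCloseMatches('kat', {computeCloseMatches: true})) // true
```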

kentcdodds · 2020-11-29

I'm OK with giving it a trial run and seeing what real-world experience with it will be like, so defaulting to disabled makes sense to me. Let's try it out using leven.

dougbacelar · 2020-12-05

@kentcdodds do you think we should start with the testId query or add this feature to all 10 queries? I'm wondering whether I can break down the work somehow...

Regarding the performance concern: if we find that the impact is large, we could skip computing close matches on find* queries (similar to what was done for the role query in #590).

kentcdodds · 2020-12-05

I'm not sure. What does everyone else think?

MatanBobi · 2020-12-06

IMO, if the default is false, we can give it a try in more queries. That said, this seems to me like a configuration that shouldn't be set to true by default at all, especially since it has a performance impact. I don't see it being helpful in CI, for example, where it would only take more time, so developers who want it should have the option to opt in.
Putting aside what I wrote above, I really like this PR and do think it can have a valuable impact, so thanks for this :)

nickserv · 2020-12-22

Can you please try a benchmark again using the new leven implementation?

dougbacelar added 3 commits November 21, 2020 17:26

implement close-matches api

371923b

implement closest matches suggestions on testId query

5dac03a

remove empty lines

788dbc2

dougbacelar changed the title ~~feat: suggest query close matches using Levenshtein distance [POC]~~ feat: suggest close matches using Levenshtein distance [POC] Nov 21, 2020

dougbacelar added 2 commits November 21, 2020 18:10

add extra test

e69fd6b

simplify conditional

b12cb29

dougbacelar mentioned this pull request Nov 21, 2020

Leverage levenshtein distance to help users debug easily when there are close matches #582

Open

kentcdodds reviewed Nov 23, 2020

dougbacelar added 2 commits November 28, 2020 16:44

add computeCloseMatches config option

cd54847

remove custom implementation of levenshtein

8ee261f

dougbacelar closed this by deleting the head repository Apr 11, 2024

kentcdodds left a comment

kentcdodds Nov 23, 2020

dougbacelar Nov 28, 2020 (edited)

kentcdodds Nov 29, 2020

dougbacelar Dec 5, 2020

kentcdodds Dec 5, 2020

MatanBobi Dec 6, 2020

nickserv Dec 22, 2020
