Improve search functions #721

Merged: 8 commits, May 8, 2020

Kixiron (Member) commented Apr 17, 2020

Closes #489, #156 and #13

  • Use fuzzy searching, pattern matching, and description text searching to get better search matches
  • Fuzzy search gives better results and accurate rankings
  • Description searching lets you search for concepts as well as crates, so serialization will also pull up results for serde (with any better name matches ranked before it)
  • Also trimmed the query string, so leading/trailing whitespace is no longer a problem
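
To illustrate the ordering described in the list above, here is a minimal in-memory sketch in Rust. It is not the query from this PR: the `Crate` struct, the `search` function, and the `serialization-demo` crate are hypothetical, and fuzzy matching is left out so the example stays focused on trimming and on ranking name matches ahead of description matches.

```rust
/// Hypothetical in-memory stand-in for one row of crate metadata.
struct Crate {
    name: &'static str,
    description: &'static str,
}

/// Sample data for the example; `serialization-demo` is a made-up crate name.
fn sample_crates() -> Vec<Crate> {
    vec![
        Crate { name: "serde", description: "A generic serialization/deserialization framework" },
        Crate { name: "serialization-demo", description: "Made-up crate for this example" },
        Crate { name: "regex", description: "Regular expressions for Rust" },
    ]
}

/// Returns crate names matching `query`: name matches first, then
/// crates that only match through their description.
fn search(crates: &[Crate], query: &str) -> Vec<&'static str> {
    // Trim so leading/trailing whitespace does not affect matching.
    let query = query.trim();
    if query.is_empty() {
        return Vec::new();
    }

    let mut name_matches = Vec::new();
    let mut description_matches = Vec::new();
    for krate in crates {
        if krate.name.contains(query) {
            name_matches.push(krate.name);
        } else if krate.description.contains(query) {
            description_matches.push(krate.name);
        }
    }

    // Better (name) matches come before description-only matches.
    name_matches.extend(description_matches);
    name_matches
}

fn main() {
    let crates = sample_crates();
    // Prints ["serialization-demo", "serde"]
    println!("{:?}", search(&crates, "  serialization  "));
}
```

Here `serde` is found only through its description, so it lands after the crate whose name contains the query.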

jyn514 (Member) left a comment

Lots of comments! This already looks way better than the existing search though, thank you for tackling this :)

I would like to see a lot more tests added before this is merged. The search is now pretty complicated, and I want to make sure it behaves as expected. Some things to test:

  • exact matches should always appear first
  • crates that never had a successful build should not be shown
  • no more than LIMIT crates are returned
  • searching crate descriptions works
  • searching 'regex' shows results for regex-syntax
  • searching 'Regex' counts regex as an exact match
    • maybe we should just turn the query into all lowercase to start?
  • searching 'redex' shows results for regex (fuzzy searching)
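
Several of the checks listed above can be sketched against a small in-memory model. This is only an illustration under stated assumptions: `Crate`, `search`, `MAX_DISTANCE`, the `limit` parameter, and the `regex-broken` crate are hypothetical, Levenshtein distance is used as one possible fuzzy measure, and the real tests would run against the project's database-backed search instead. Description searching is not modeled here.

```rust
/// Hypothetical stand-in for a crate row; `built` means at least one successful build.
struct Crate {
    name: &'static str,
    built: bool,
}

/// Assumed cutoff for fuzzy matches (maximum edit distance).
const MAX_DISTANCE: usize = 2;

/// Sample data; `regex-broken` is a made-up crate that never built.
fn sample_crates() -> Vec<Crate> {
    vec![
        Crate { name: "regex-syntax", built: true },
        Crate { name: "regex", built: true },
        Crate { name: "regex-broken", built: false },
        Crate { name: "serde", built: true },
    ]
}

/// Dynamic-programming Levenshtein edit distance.
fn levenshtein(a: &str, b: &str) -> usize {
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.chars().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == *cb { 0 } else { 1 };
            cur.push((prev[j] + cost).min(prev[j + 1] + 1).min(cur[j] + 1));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Ranks exact matches first, then substring matches, then fuzzy matches.
fn search(crates: &[Crate], query: &str, limit: usize) -> Vec<&'static str> {
    // Normalize the query: trim whitespace and lowercase it.
    let query = query.trim().to_lowercase();
    if query.is_empty() {
        return Vec::new();
    }
    let mut scored: Vec<(usize, &'static str)> = crates
        .iter()
        .filter(|c| c.built) // hide crates without a successful build
        .filter_map(|c| {
            let name = c.name.to_lowercase();
            let score = if name == query {
                0
            } else if name.contains(query.as_str()) {
                1
            } else {
                let distance = levenshtein(&name, &query);
                if distance > MAX_DISTANCE {
                    return None;
                }
                1 + distance
            };
            Some((score, c.name))
        })
        .collect();
    scored.sort(); // by score, then by name for a stable order
    scored.into_iter().take(limit).map(|(_, name)| name).collect()
}

fn main() {
    let crates = sample_crates();
    let results = search(&crates, "regex", 10);
    assert_eq!(results.first(), Some(&"regex")); // exact match first
    assert!(results.contains(&"regex-syntax"));
    assert!(!results.contains(&"regex-broken")); // never built
    assert!(search(&crates, "regex", 1).len() <= 1); // limit respected
    assert_eq!(search(&crates, "Regex", 10).first(), Some(&"regex"));
    assert_eq!(search(&crates, "redex", 10).first(), Some(&"regex")); // fuzzy
    println!("all checks passed");
}
```

Lowercasing the query up front, as suggested in the sub-bullet, is what lets 'Regex' count as an exact match in this sketch.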

Kixiron (Member, Author) commented Apr 19, 2020

#708 brings up a good point: does the actual date a crate was published matter when we have semver?

jyn514 (Member) commented Apr 19, 2020

I don't know how to sort by semver in SQL. I think we should wait on that until later, since we're making a big change already.
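
For context on why semver ordering needs more than a plain string sort, here is a small Rust sketch. It does not answer the SQL question above. The `parse_version` and `sort_by_version` helpers are hypothetical and handle only plain `MAJOR.MINOR.PATCH` strings, ignoring the pre-release and build metadata that full semver precedence rules cover.

```rust
/// Parses a plain "MAJOR.MINOR.PATCH" string into a numeric tuple.
/// Returns None for anything else (including pre-release versions).
fn parse_version(version: &str) -> Option<(u64, u64, u64)> {
    let mut parts = version.trim().split('.');
    let major: u64 = parts.next()?.parse().ok()?;
    let minor: u64 = parts.next()?.parse().ok()?;
    let patch: u64 = parts.next()?.parse().ok()?;
    if parts.next().is_some() {
        return None; // more than three components
    }
    Some((major, minor, patch))
}

/// Sorts versions numerically; unparseable strings sort first (as None).
fn sort_by_version(versions: &mut [&str]) {
    versions.sort_by_key(|v| parse_version(v));
}

fn main() {
    let mut versions = vec!["0.9.0", "0.10.0", "0.2.11"];

    // A plain string sort compares character by character, so "0.10.0"
    // lands before "0.2.11" and "0.9.0".
    versions.sort();
    println!("{:?}", versions); // ["0.10.0", "0.2.11", "0.9.0"]

    // Comparing the parsed numeric components gives the expected order.
    sort_by_version(&mut versions);
    println!("{:?}", versions); // ["0.2.11", "0.9.0", "0.10.0"]
}
```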

jyn514 (Member) commented Apr 21, 2020

Looks like you need to remove the migration to get the tests to pass: https://github.com/rust-lang/docs.rs/pull/721/checks?check_run_id=606111571#step:14:80

jyn514 (Member) commented Apr 27, 2020

You need to rebase over the changes to CI. Additionally, you may need to write an extra (small) Dockerfile to add the fuzzy-search extension to Postgres; see https://stackoverflow.com/a/54630526/7669110 for an example of what that would look like.

Past that, the current blocker is the performance of the query. The old query took less than half a second, while this one takes almost 3 seconds. We've been working on this in Discord.

Kixiron (Member, Author) commented May 5, 2020

Got a much-improved query in. I had to remove description searching: with it the query took 700-800ms, while without it the query takes a consistent 300ms.

jyn514 merged commit 3125c8b into rust-lang:master May 8, 2020

jyn514 (Member) commented May 8, 2020

🎉 🎉 🎉

Kixiron deleted the microscope branch May 9, 2020 00:22
Successfully merging this pull request may close these issues.

docs.rs search is ordered randomly for non-exact matches
4 participants