-
Notifications
You must be signed in to change notification settings - Fork 462
implement PartialEq, Eq and Hash #364
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
You should solve this by putting your |
The caching mechanism of the CLI (ruff_cli::cache) relies on ruff::settings::Settings implementing the Hash trait. The ruff::settings::Settings struct previously couldn't automatically derive the Hash implementation via the #[derive(Hash)] macro attribute since some of its field types intentionally[1] don't implement Hash (namely regex::Regex, globset::GlobMatcher and globset::GlobSet). The code therefore previously implemented the Hash trait by hand for the whole struct. Implementing Hash by hand for structs that are subject to change is a bad idea since it's very easy to forget to update the Hash implementation when adding a new field to the struct. And the Hash implementation indeed was already incorrect by omitting several fields from the hash. This commit introduces wrapper types for Regex, GlobMatcher & GlobSet that implement Hash so that we can still add #[derive(Hash)] to the Settings struct, guaranteeing a correct hash implementation. This commit also changes all occurrences of FxHashSet and FxHashMap within Settings to BTreeSet and BTreeMap since the B-Tree-based collections implement Hash. FxHashSet and FxHashMap are type aliases from the rustc_hash crate for HashSet and HashMap from the standard library.[2] While BTreeMap::get is more costly than HashMap::get (O(log(n)) vs O(1))[3] none of these hash maps/sets are expected to contain so many entries that this would actually matter and directly using BTree(Set|Map) instead of yet another wrapper type spares us a bunch of .into() calls around our codebase. [1]: rust-lang/regex#364 (comment) [2]: https://docs.rs/rustc-hash/ [3]: https://doc.rust-lang.org/std/collections/#performance
The caching mechanism of the CLI (ruff_cli::cache) relies on ruff::settings::Settings implementing the Hash trait. The ruff::settings::Settings struct previously couldn't automatically derive the Hash implementation via the #[derive(Hash)] macro attribute since some of its field types intentionally[1][2] don't implement Hash (namely regex::Regex, globset::GlobMatcher and globset::GlobSet and HashMap and HashSet from the standard library). The code therefore previously implemented the Hash trait by hand for the whole struct. Implementing Hash by hand for structs that are subject to change is a bad idea since it's very easy to forget to update the Hash implementation when adding a new field to the struct. And the Hash implementation indeed was already incorrect by omitting several fields from the hash. This commit introduces wrapper types for Regex, GlobMatcher, GlobSet, HashSet & HashMap that implement Hash so that we can still add #[derive(Hash)] to the Settings struct, guaranteeing a correct hash implementation. [1]: rust-lang/regex#364 (comment) [2]: The standard library doesn't impl<T: Hash + Ord> Hash for HashSet<T> presumably since sorted() requires an allocation and Hash implementations are generally expected to work without allocations.
The caching mechanism of the CLI (ruff_cli::cache) relies on ruff::settings::Settings implementing the Hash trait. The ruff::settings::Settings struct previously couldn't automatically derive the Hash implementation via the #[derive(Hash)] macro attribute since some of its field types intentionally[1][2] don't implement Hash (namely regex::Regex, globset::GlobMatcher and globset::GlobSet and HashMap and HashSet from the standard library). The code therefore previously implemented the Hash trait by hand for the whole struct. Implementing Hash by hand for structs that are subject to change is a bad idea since it's very easy to forget to update the Hash implementation when adding a new field to the struct. And the Hash implementation indeed was already incorrect by omitting several fields from the hash. This commit introduces wrapper types for Regex, GlobMatcher, GlobSet, HashSet & HashMap that implement Hash so that we can still add #[derive(Hash)] to the Settings struct, guaranteeing a correct hash implementation. [1]: rust-lang/regex#364 (comment) [2]: The standard library doesn't impl<T: Hash + Ord> Hash for HashSet<T> presumably since sorted() requires an allocation and Hash implementations are generally expected to work without allocations.
The caching mechanism of the CLI (ruff_cli::cache) relies on ruff::settings::Settings implementing the Hash trait. The ruff::settings::Settings struct previously couldn't automatically derive the Hash implementation via the #[derive(Hash)] macro attribute since some of its field types intentionally[1][2] don't implement Hash (namely regex::Regex, globset::GlobMatcher and globset::GlobSet and HashMap and HashSet from the standard library). The code therefore previously implemented the Hash trait by hand for the whole struct. Implementing Hash by hand for structs that are subject to change is a bad idea since it's very easy to forget to update the Hash implementation when adding a new field to the struct. And the Hash implementation indeed was already incorrect by omitting several fields from the hash. This commit introduces wrapper types for Regex, GlobMatcher, GlobSet, HashSet & HashMap that implement Hash so that we can still add #[derive(Hash)] to the Settings struct, guaranteeing a correct hash implementation. [1]: rust-lang/regex#364 (comment) [2]: The standard library doesn't impl<T: Hash + Ord> Hash for HashSet<T> presumably since sorted() requires an allocation and Hash implementations are generally expected to work without allocations.
Information for those who got here. A naive comparison can be implemented this way: struct Wrapper {
regex: Regex,
}
impl PartialEq for Wrapper {
fn eq(&self, other: &Self) -> bool {
self.regex.as_str() == other.regex.as_str()
}
} |
Not to necro an old issue, but I wonder if not having these implementations is worth it. If I understand correctly, the benefit of not having these implementations is that if a user expects the (And it can't be implemented that way, because it's expensive to compute, and However, I'm not sure if that's what users actually expect. Personally, I would expect Even if it was fast, comparing the underlying of the regex behavior might also be undesirable, since could you not have two regexes which compare true, since they match the same set of strings, but which have large efficiency differences? I.e., one is much more efficient than the other, and terminates much faster. In such a case, I'm not sure you would want The downside of not having these implementations, or at least the downside that I've personally run into, is that if you include a I ran into this while implementing a config file that contains a regex:
I wanted to do this in a test: assert_eq! {
Config::load("path.yaml"),
Config { regex: Regex::new("hello").unwrap() },
} But can't do so because I can't derive Feel free to ignore this comment! Don't want to force you to relitigate an old issue, I just wanted to add my 2¢. |
Without these traits being implemented, it is difficult to store Regex structs, or types containing them, in collections such as HashSet and HashMap. It also prevents any type containing a Regex from automatically deriving these traits. Would it be okay to implement these traits for the Regex struct, perhaps just using the
as_str()
function and performing comparisons / hashes on the output of that?The text was updated successfully, but these errors were encountered: