Skip to content

implement PartialEq, Eq and Hash #364

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
ashfordneil opened this issue May 12, 2017 · 4 comments
Closed

implement PartialEq, Eq and Hash #364

ashfordneil opened this issue May 12, 2017 · 4 comments

Comments

@ashfordneil
Copy link

Without these traits being implemented, it is difficult to store Regex structs, or types containing them, in collections such as HashSet and HashMap. It also prevents any type containing a Regex from automatically deriving these traits. Would it be okay to implement these traits for the Regex struct, perhaps just using the as_str() function and performing comparisons / hashes on the output of that?

@ashfordneil ashfordneil changed the title support PartialEq, Eq and Hash implement PartialEq, Eq and Hash May 12, 2017
@BurntSushi
Copy link
Member

regex 0.1 had this impl, but I removed it. It's way too misleading because finite state machines actually have a concept of equality that is very different from "do these regex strings match."

You should solve this by putting your Regex in a newtype and deriving the traits you want.

@BurntSushi
Copy link
Member

See #178 and #313

not-my-profile added a commit to not-my-profile/ruff that referenced this issue Jan 15, 2023
The caching mechanism of the CLI (ruff_cli::cache) relies on
ruff::settings::Settings implementing the Hash trait.

The ruff::settings::Settings struct previously couldn't automatically
derive the Hash implementation via the #[derive(Hash)] macro attribute
since some of its field types intentionally[1] don't implement Hash
(namely regex::Regex, globset::GlobMatcher and globset::GlobSet).

The code therefore previously implemented the Hash trait by hand for the
whole struct. Implementing Hash by hand for structs that are subject to
change is a bad idea since it's very easy to forget to update the Hash
implementation when adding a new field to the struct. And the Hash
implementation indeed was already incorrect by omitting several fields
from the hash.

This commit introduces wrapper types for Regex, GlobMatcher & GlobSet
that implement Hash so that we can still add #[derive(Hash)] to the
Settings struct, guaranteeing a correct hash implementation.

This commit also changes all occurrences of FxHashSet and FxHashMap
within Settings to BTreeSet and BTreeMap since the B-Tree-based
collections implement Hash. FxHashSet and FxHashMap are type aliases
from the rustc_hash crate for HashSet and HashMap from the standard
library.[2] While BTreeMap::get is more costly than HashMap::get
(O(log(n)) vs O(1))[3] none of these hash maps/sets are expected to
contain so many entries that this would actually matter and directly
using BTree(Set|Map) instead of yet another wrapper type spares us a
bunch of .into() calls around our codebase.

[1]: rust-lang/regex#364 (comment)
[2]: https://docs.rs/rustc-hash/
[3]: https://doc.rust-lang.org/std/collections/#performance
not-my-profile added a commit to not-my-profile/ruff that referenced this issue Jan 16, 2023
The caching mechanism of the CLI (ruff_cli::cache) relies on
ruff::settings::Settings implementing the Hash trait.

The ruff::settings::Settings struct previously couldn't automatically
derive the Hash implementation via the #[derive(Hash)] macro attribute
since some of its field types intentionally[1][2] don't implement Hash
(namely regex::Regex, globset::GlobMatcher and globset::GlobSet and
HashMap and HashSet from the standard library).

The code therefore previously implemented the Hash trait by hand for the
whole struct. Implementing Hash by hand for structs that are subject to
change is a bad idea since it's very easy to forget to update the Hash
implementation when adding a new field to the struct. And the Hash
implementation indeed was already incorrect by omitting several fields
from the hash.

This commit introduces wrapper types for Regex, GlobMatcher, GlobSet,
HashSet & HashMap that implement Hash so that we can still add
#[derive(Hash)] to the Settings struct, guaranteeing a correct hash
implementation.

[1]: rust-lang/regex#364 (comment)
[2]: The standard library doesn't impl<T: Hash + Ord> Hash for HashSet<T>
     presumably since sorted() requires an allocation and Hash
     implementations are generally expected to work without allocations.
not-my-profile added a commit to not-my-profile/ruff that referenced this issue Jan 16, 2023
The caching mechanism of the CLI (ruff_cli::cache) relies on
ruff::settings::Settings implementing the Hash trait.

The ruff::settings::Settings struct previously couldn't automatically
derive the Hash implementation via the #[derive(Hash)] macro attribute
since some of its field types intentionally[1][2] don't implement Hash
(namely regex::Regex, globset::GlobMatcher and globset::GlobSet and
HashMap and HashSet from the standard library).

The code therefore previously implemented the Hash trait by hand for the
whole struct. Implementing Hash by hand for structs that are subject to
change is a bad idea since it's very easy to forget to update the Hash
implementation when adding a new field to the struct. And the Hash
implementation indeed was already incorrect by omitting several fields
from the hash.

This commit introduces wrapper types for Regex, GlobMatcher, GlobSet,
HashSet & HashMap that implement Hash so that we can still add
#[derive(Hash)] to the Settings struct, guaranteeing a correct hash
implementation.

[1]: rust-lang/regex#364 (comment)
[2]: The standard library doesn't impl<T: Hash + Ord> Hash for HashSet<T>
     presumably since sorted() requires an allocation and Hash
     implementations are generally expected to work without allocations.
charliermarsh pushed a commit to astral-sh/ruff that referenced this issue Jan 16, 2023
The caching mechanism of the CLI (ruff_cli::cache) relies on
ruff::settings::Settings implementing the Hash trait.

The ruff::settings::Settings struct previously couldn't automatically
derive the Hash implementation via the #[derive(Hash)] macro attribute
since some of its field types intentionally[1][2] don't implement Hash
(namely regex::Regex, globset::GlobMatcher and globset::GlobSet and
HashMap and HashSet from the standard library).

The code therefore previously implemented the Hash trait by hand for the
whole struct. Implementing Hash by hand for structs that are subject to
change is a bad idea since it's very easy to forget to update the Hash
implementation when adding a new field to the struct. And the Hash
implementation indeed was already incorrect by omitting several fields
from the hash.

This commit introduces wrapper types for Regex, GlobMatcher, GlobSet,
HashSet & HashMap that implement Hash so that we can still add
#[derive(Hash)] to the Settings struct, guaranteeing a correct hash
implementation.

[1]: rust-lang/regex#364 (comment)
[2]: The standard library doesn't impl<T: Hash + Ord> Hash for HashSet<T>
     presumably since sorted() requires an allocation and Hash
     implementations are generally expected to work without allocations.
@therustmonk
Copy link

Information for those who got here. A naive comparison can be implemented this way:

struct Wrapper {
    regex: Regex,
}

impl PartialEq for Wrapper {
    fn eq(&self, other: &Self) -> bool {
        self.regex.as_str() == other.regex.as_str()
    }
}

@casey
Copy link

casey commented Mar 2, 2025

Not to necro an old issue, but I wonder if not having these implementations is worth it.

If I understand correctly, the benefit of not having these implementations is that if a user expects the == for Regex to compare the underlying abstract state machines of the regex, then they will be surprised when it doesn't.

(And it can't be implemented that way, because it's expensive to compute, and == is generally expected to be fast.)

However, I'm not sure if that's what users actually expect. Personally, I would expect == to just do string comparison, and would be surprised if \d+ == \d\d* was true.

Even if it was fast, comparing the underlying of the regex behavior might also be undesirable, since could you not have two regexes which compare true, since they match the same set of strings, but which have large efficiency differences? I.e., one is much more efficient than the other, and terminates much faster. In such a case, I'm not sure you would want == to return true, and obscure the differences between them.

The downside of not having these implementations, or at least the downside that I've personally run into, is that if you include a Regex in another type, and then want to derive ParitalEq, Eq, or Hash, you can't, because Regex doesn't implement them.

I ran into this while implementing a config file that contains a regex:

#[derive(Deserialize)]
struct Config {
  some_other_value: u32,
  #[serde_as(as = "DisplayFromStr")]
  regex: Regex,
}

I wanted to do this in a test:

assert_eq! {
  Config::load("path.yaml"),
  Config { regex: Regex::new("hello").unwrap() },
}

But can't do so because I can't derive PartialEq for Config. Another case of this that I've run into is when deriving clap::Parser derive where an argument is a Regex, and I can't add tests which exercise how arguments are parsed using assert_eq!(…).

Feel free to ignore this comment! Don't want to force you to relitigate an old issue, I just wanted to add my 2¢.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants