Skip to content

Spec request. Hasher: is write_u32 eqivalent to 4 calls of write_u8? #42951

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
stepancheg opened this issue Jun 28, 2017 · 12 comments · Fixed by #83390
Closed

Spec request. Hasher: is write_u32 eqivalent to 4 calls of write_u8? #42951

stepancheg opened this issue Jun 28, 2017 · 12 comments · Fixed by #83390
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. I-needs-decision Issue: In need of a decision. P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@stepancheg
Copy link
Contributor

stepancheg commented Jun 28, 2017

Currently Hasher::write_u32 is implemented as self.write(&unsafe { mem::transmute::<_, [u8; 4]>(i) }). The question is: is it a requirement for all Hasher implementations?

In other words, is is true, that for each hasher implementation, call

hasher.write_32(0x01020304);

must be equivalent to

hasher.write_8(0x04);
hasher.write_8(0x03);
hasher.write_8(0x02);
hasher.write_8(0x01);

?

(assuming host is little-endian).

Why is it important.

Suppose, you want to implement very simple hasher with state updated like state = state * 31 + update.

If Hasher does not have requirement as above, hasher can be implemented by simply casting any operand to u64 and avoiding dealing with unaligned/smaller than u64 integers:

impl Hasher for MyHasher {
    fn write_u8(&mut self, v: u8) {
        self.write_u64(v as u64);
    }

    fn write_u64(&mut self, v: u64) {
        self.state = (self.state * 31).wrapping_add(v);
    }
}
@steveklabnik steveklabnik added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Jun 28, 2017
@nagisa
Copy link
Member

nagisa commented Jun 29, 2017

I’m fairly sure there was a discussion about it before and we decided against doing any changes to the libstd impl. Don’t exactly remember the reason, though.

@stepancheg
Copy link
Contributor Author

If any decision is made, I wish it would be stated in the documentation of Hasher trait.

@qnighy
Copy link
Contributor

qnighy commented Jun 30, 2017

StableHasher in the compiler employs a different implementation to achieve platform-independence.

@michaelwoerister
Copy link
Member

I don't think there's any such requirement. The FxHasher in the compiler does exactly what your example does.

@Mark-Simulacrum Mark-Simulacrum added C-question A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools C-enhancement Category: An issue proposing an enhancement or a PR with one. and removed C-question labels Jul 28, 2017
@steveklabnik steveklabnik added the P-medium Medium priority label Aug 30, 2017
@steveklabnik steveklabnik added the E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. label May 29, 2018
@alercah
Copy link
Contributor

alercah commented Jul 9, 2018

I think I would be opposed to this requirement; it imposes a performance penalty for systems of the "wrong" endianness, since they would need to correct their endianness. The only requirement on Hasher should be that the same sequence of write* calls produces the same result, I think.

@clarfonthey
Copy link
Contributor

My spec request: given two byte slices first and second, is hasher.write(first); hasher.write(second); equivalent to hasher.write(first + second) ?

This way, we could define all the non-slice methods as explicitly being transmutes, always.

@steveklabnik steveklabnik removed the A-docs Area: Documentation for any part of the project, including the compiler, standard library, and tools label Jan 10, 2019
@steveklabnik
Copy link
Member

removing T-doc until @rust-lang/libs lets us know what to document here.

@Amanieu
Copy link
Member

Amanieu commented Jan 17, 2019

My personal take on this is that we should only guarantee that Hasher produces the same value for the exact same sequence of write* calls. This even applies to plain write calls on byte slices: a Hasher is only guaranteed to produce the same result if the same sequence of write calls are used with the same slice lengths.

For an example of where this could make a difference, consider this code from Abseil which hashes short slices by reading 0-16 bytes and padding anything not included in the length with zeroes or duplicates of the existing bytes.

@Mark-Simulacrum Mark-Simulacrum added the I-needs-decision Issue: In need of a decision. label Dec 9, 2019
@Amanieu
Copy link
Member

Amanieu commented Feb 3, 2021

Based on the libs team decision in #80303 (comment), I'm going to close this issue. The answer to the question in the issue title is no: write_u32 is not required to be equivalent to 4 calls of write_u8.

@Amanieu Amanieu closed this as completed Feb 3, 2021
@alercah
Copy link
Contributor

alercah commented Feb 4, 2021

Was there any updates to the documentation of Hasher? I didn't see any in a quick scan of threads.

@clarfonthey
Copy link
Contributor

Yeah, it seems like the thread was closed because the original idea was decided to not be done, but I agree that it shouldn't be closed until documentation is added.

@clarfonthey
Copy link
Contributor

cc @Amanieu who closed this since it shouldn't be closed. Opened #83390 to add docs for this, but think this issue should be kept open until we've finally decided on how to document this.

clarfonthey added a commit to clarfonthey/rust that referenced this issue Apr 26, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Apr 26, 2021
Document Hasher spec decision from rust-lang#42951

Since that ticket was closed without the decision actually being documented.

Fixes rust-lang#42951.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one. E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. I-needs-decision Issue: In need of a decision. P-medium Medium priority T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants