Skip to content

Support SHA256_Transform_shani() with MSVC #15312

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 2 commits into from

Conversation

cmb69
Copy link
Member

@cmb69 cmb69 commented Aug 9, 2024

This requires #15292 to be fixed; I've used the same macro guards as proposed by #15301.

I've only did most minimal cleanup; given that the implementation apparently does not rely on HAVE_FUNC_ATTRIBUTE_TARGET, I've removed the forward declaration and changed the guard.

As is, this does not support compile time __SHA__ detection, because that macro is apparently never defined on Windows. We would need to work around that in confutils.js like with other non standard MSVC macros:

avail.Add("sse", "__SSE__");
avail.Add("sse2", "__SSE2__");
avail.Add("sse3", "__SSE3__");
avail.Add("ssse3", "__SSSE3__");
avail.Add("sse4.1", "__SSE4_1__");
avail.Add("sse4.2", "__SSE4_2__");

However, I wouldn't know where to insert __SHA__; likely this is "non-linear", and as such the whole --enable-native-intrinsics might need to be changed, or an extra configure option be introduced. Probably low priority, since official builds only use SSE(2) anyway.

Note also that the current implementation appears to be suboptimal since there is neither proper support for HAVE_FUNC_ATTRIBUTE_TARGET (which is not available with MSVC anyway), nor for ZEND_INTRIN_*_FUNC_PTR style which is likely supported everywhere.

@TimWolla might want to have a look at this, although for now it's just a draft.

@TimWolla
Copy link
Member

TimWolla commented Aug 9, 2024

I've only did most minimal cleanup; given that the implementation apparently does not rely on HAVE_FUNC_ATTRIBUTE_TARGET, I've removed the forward declaration and changed the guard.

As the CI failure explains: The declaration with the target attribute is necessary for the build to work / for making the intrinsics actually legal to use within the function. Passing the flags to the compiler directly is the "native support".

@cmb69
Copy link
Member Author

cmb69 commented Aug 9, 2024

As the CI failure explains: The declaration with the target attribute is necessary for the build to work / for making the intrinsics actually legal to use within the function. Passing the flags to the compiler directly is the "native support".

Indeed, but this only works if HAVE_FUNC_ATTRIBUTE_TARGET is defined. A full fledged implementation is more complex. #10436 may serve as reference; especially

  1. With runtime detection of AVX2 performed by the module init function. The detection is done by checking the output of CPUID and then a function pointer is set accordingly. In this case, all calls to the UTF-8 validation routine are indirect calls through that function pointer.

would make sense, since that would support environments where HAVE_FUNC_ATTRIBUTE_TARGET is not defined. I assume this is faster than the current way of checking CPU support for each function call, and a bit slower than the HAVE_FUNC_ATTRIBUTE_TARGET style, although perhaps not by much, in which case we could skip the HAVE_FUNC_ATTRIBUTE_TARGET style. Maybe someone knows about the performance difference to expect.

So maybe we should start getting rid of the PHP_HASH_INTRIN_SHA_* defines, and move that to zend_portability.

@TimWolla
Copy link
Member

TimWolla commented Aug 9, 2024

Indeed, but this only works if HAVE_FUNC_ATTRIBUTE_TARGET is defined
since that would support environments where HAVE_FUNC_ATTRIBUTE_TARGET is not defined.

I don't follow here. The attribute is necessary to make the intrinsics available even if -mssse3 -msha is not explicitly passed, because otherwise the build fails. If there is an environment where the intrinsics are always available, without this "safety guardrail", then the SHA-NI implementation can simply be compiled unconditionally.

The CPU check at runtime is orthogonal to that.

I assume this is faster than the current way of checking CPU support for each function call,

I expect the branch to be easily predicted by a CPU, given that is will either always be taken or never be taken. In any case, a single SHA-256 transform still is fairly expensive and I expect it will easily dwarf the type to check the CPU features, as indicated by the 5x improvement that is already achieved with the current very simple implementation.

@cmb69
Copy link
Member Author

cmb69 commented Aug 9, 2024

I don't follow here. The attribute is necessary to make the intrinsics available even if -mssse3 -msha is not explicitly passed, because otherwise the build fails. If there is an environment where the intrinsics are always available, without this "safety guardrail", then the SHA-NI implementation can simply be compiled unconditionally.

Ah, indeed, I still didn't really understand that, and mixed up __attribute__((target())) and __attribute__(ifunc())). So we don't use the latter (which is fine by me), but still need to support MSVC, what that second commit of mine does, but in a very ugly and hacky way. I'll see if that can be improved.

@TimWolla
Copy link
Member

TimWolla commented Aug 9, 2024

I'll see if that can be improved.

The preprocessor check in hash_sha_ni.c can probably be replaced by:

#if PHP_HASH_INTRIN_SHA_NATIVE || PHP_HASH_INTRIN_SHA_RESOLVER

Then in php_hash_sha.h you would define PHP_HASH_INTRIN_SHA_RESOLVER for Windows and everything probably should work.

@cmb69
Copy link
Member Author

cmb69 commented Aug 9, 2024

Thanks @TimWolla. Simplified and rebased (to incorporate #15301).

@andypost
Copy link
Contributor

it solved issue with build on Alpine, thank you!

TimWolla added a commit to TimWolla/php-src that referenced this pull request Aug 14, 2024
…on of SHA256_Transform_shani

This fixes the build for amd64 platforms that do not have
`HAVE_FUNC_ATTRIBUTE_TARGET`, specifically Alpine/Musl as of now.

Closes phpGH-15384.
Related to phpGH-15312.
TimWolla added a commit that referenced this pull request Aug 14, 2024
…on of SHA256_Transform_shani (#15404)

This fixes the build for amd64 platforms that do not have
`HAVE_FUNC_ATTRIBUTE_TARGET`, specifically Alpine/Musl as of now.

Closes GH-15384.
Related to GH-15312.
@cmb69
Copy link
Member Author

cmb69 commented Sep 3, 2024

Rebased, and removed the erroneous ZEND_INTRIN_SSSE3_RESOLVER as noted above.

@cmb69
Copy link
Member Author

cmb69 commented Sep 3, 2024

Generally, zend_portability defines 4 relevant macros for each intrinsic family:

  • ZEND_INTRIN_*_NATIVE (has compile time support)
  • ZEND_INTRIN_*_RESOLVER (supports dynamic resolve)
  • ZEND_INTRIN_*_FUNC_PROTO (uses function attribute to dispatch)
  • ZEND_INTRIN_*_FUNC_PTR (uses a function pointer to dispatch)

In this case we have only PHP_HASH_INTRIN_SHA_NATIVE and PHP_HASH_INTRIN_SHA_RESOLVER; dispatching via a function pointer would not be very useful here, so there's no need for PHP_HASH_INTRIN_SHA_FUNC_PTR, and PHP_HASH_INTRIN_SHA_FUNC_PROTO is implicit in this case.

@cmb69 cmb69 marked this pull request as ready for review September 3, 2024 20:27
@cmb69 cmb69 requested review from TimWolla and arnaud-lb September 3, 2024 20:28
Copy link
Member

@arnaud-lb arnaud-lb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried in a VM: it uses SHA256_Transform_shani() as expected when the CPU has SSSE3/SHA features (with runtime detection), and the default implementation otherwise.

I get that MSVC can use the sha intrinsics without __attribute__((target)).

Looks good to me!

Copy link
Member

@TimWolla TimWolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, except for the unresolved thread in hash_sha_ni.c.

@cmb69 cmb69 closed this in 1d36927 Sep 4, 2024
@cmb69 cmb69 deleted the cmb/shani-msvc branch September 4, 2024 16:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants