Improve optimizations for boolean values #15464
I am not qualified to evaluate this patch, but do you have any benchmarks that demonstrate an improvement?
See the test case given in #15203. (I only had the "Fixes" line in a commit message; I've added it to the PR description now.) Before:
After:
The relevant optimization here is turning the store loop into a memset. It isn't triggered for types whose bit width is not a multiple of 8.
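To illustrate the kind of loop meant here, the sketch below fills a slice of `bool` with a constant. The function name `fill_false` is hypothetical and is not the test case from #15203. Whether LLVM actually lowers such a loop to a `memset` depends on the compiler version and optimization level, and on each `bool` being stored as a whole byte in memory.

```rust
/// Hypothetical example: a plain store loop over a slice of `bool`.
/// With optimizations enabled, LLVM's loop idiom recognition can replace a
/// loop like this with a single `memset`, but only when each element is
/// stored as a whole byte (`i8`) rather than as a 1-bit `i1`.
pub fn fill_false(flags: &mut [bool]) {
    for f in flags.iter_mut() {
        // One byte-sized store per element.
        *f = false;
    }
}
```

In current Rust, `flags.fill(false)` expresses the same thing more directly; the explicit loop is kept here only because it mirrors the store loop under discussion.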
let lltemp = builder.alloca(val_ty(llforeign_arg), "");
builder.store(llforeign_arg, lltemp);
llforeign_arg = lltemp;
llforeign_arg = if ty::type_is_bool(rust_ty) {
Is there a particular reason this isn't using the store_ty abstraction?
I can't, because this function doesn't use the regular building blocks like Block, but uses a Builder directly.
LLVM doesn't really like types whose bit width isn't a multiple of 8, and it disables various optimizations if it encounters such types in loads and stores. On the other hand, booleans must be represented as i1 when used as SSA values. To get the best results, we must use i1 for SSA values and i8 when storing the value to memory. By putting range assertions on loads, we let LLVM eliminate the zero-extend and truncate operations this requires. Fixes rust-lang#15203
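The i1/i8 scheme can be modeled in plain Rust, as a sketch only: a `u8` stands in for the `i8` memory slot and a `bool` for the `i1` SSA value. The names `store_bool` and `load_bool` are hypothetical, not rustc functions, and the `debug_assert!` merely plays the role of the range assertion that tells LLVM a loaded byte is always 0 or 1.

```rust
/// Hypothetical model of "i1 in registers, i8 in memory".
/// Storing zero-extends the 1-bit value to a full byte.
pub fn store_bool(slot: &mut u8, value: bool) {
    // `bool as u8` yields exactly 0 or 1, like LLVM's `zext i1 to i8`.
    *slot = value as u8;
}

/// Loading truncates the byte back to a 1-bit value.
pub fn load_bool(byte: u8) -> bool {
    // Stand-in for the range assertion: a byte written by `store_bool`
    // is always in the range [0, 2).
    debug_assert!(byte < 2, "byte outside the range [0, 2)");
    // `trunc i8 to i1` keeps only the lowest bit.
    (byte & 1) == 1
}
```

The range matters because of the round trip: for a byte known to be 0 or 1, truncating and then zero-extending again yields the same byte, so a load followed by a store of the same value needs neither conversion. Without that guarantee (say the byte were 2), truncation would produce 0 and the round trip would change the stored value.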
LLVM doesn't handle i1 values in allocas/memory very well and skips a number of optimizations when it encounters them. So we have to do the same thing Clang does: use i1 for SSA values, but store i8 in memory. Fixes #15203.