Skip to content
This repository was archived by the owner on Jan 26, 2022. It is now read-only.

Again, reconsider unsigned types (TC39 request) #191

Open
littledan opened this issue Jun 1, 2015 · 39 comments
Open

Again, reconsider unsigned types (TC39 request) #191

littledan opened this issue Jun 1, 2015 · 39 comments

Comments

@littledan
Copy link
Member

It's a bit redundant with integer types, but TypedArrays have unsigned and signed variants, as well as Uint8Clamped, even though the same data can be represented with signed ints. At TC-39, a committee member asked why we don't have unsigned variants. While the current API design makes sense to me, we should also consider making the SIMD API symmetric with the existing language features. The cost is that there are a large number of functions that are basically duplicated (e.g., add), so I don't know the answer one way or the other, but this bug documents the question. In theory, with value types, it's not actually a big deal to make more types.

@ghost
Copy link

ghost commented Jun 2, 2015

Would separate signed and unsigned types imply the need for lots of conversions when you wanted to mix signed and unsigned operations?

It's not important right now because I don't think any (many?) operations are supported that behave differently on signed or unsigned data, but once operations like that appear it would be unpleasant having to qualify type conversions all over the place.

@littledan
Copy link
Member Author

I guess you could have bitcast operations, like the ones that exist right now between float and int types, right? I'm not sure when you'll be switching back and forth between signed and unsigned operations in a way that's annoying to use though--do you have a specific use case in mind?

@ghost
Copy link

ghost commented Jun 3, 2015

Well, contrast SSE's universal __m128 C data type with NEON's int8x16_t, uint8x16_t, ..., int32x4_t, uint32x4_t, etc., and the amount of vreinterpretq_xnn_ymm() casts that the latter demands. See examples from skia, webp, and vpx. Some uses of vreinterpret in those are data width aliasing, but I found them by searching for simple signed<->unsigned casts of the same width.

I'm not entirely sure how the implications fit into JavaScript, though. It's just my experience with C that makes me wary of starting something hard to finish. Conversions might not come up much at the moment because most of the operations give the same result if you use the wrong signedness, and 32-bit leaves a lot of room to not care. It's the narrower types and all the saturating/halving/widening/narrowing operations that go with those types where it gets hairy.

As simple example of the implications: should uint32x4.ShiftRightByScalar() be an implicit logical shift, and should sint32x4.ShiftRightLogicalByScalar() be eliminated?

Oh, sorry, there's no such method as ShiftRightByScalar()... so the question is simply wether or not uint32x4.ShiftRightArithmeticByScalar() and sint32x4.ShiftRightLogicalByScalar() make any sense.

@littledan
Copy link
Member Author

Well, here are some other cases where it comes up besides shift (maybe you do want to call out shift types with different names because they really are different; I wasn't thinking about shift before):

  • Construction and extractLane/replaceLane: This simple usage was enough to make TypedArrays decide to have multiple versions for signed/unsigned. With our current API, programmers have to convert negative numbers into positive ones if they want a scalar extraction of an unsigned vector, and this is the only thing that unsigned TypedArrays provide users. You could argue, though, that it was a mistake to do this for TypedArrays.
  • Comparison operations like greaterThan. The unsigned version can just be implemented in terms of the signed one by adding the right splatted constant first, and then doing the comparison, but the builtin version is signed.
  • Saturating add: The polyfill saturates at signed min/max. Some use cases might want an unsigned saturation.

Maybe SIMD programmers don't need as much babying as I'm imagining; I'm definitely not suggesting that we add functions for unsigned comparison, unsigned saturation, etc, but maybe if we had types for unsigned SIMD vectors, then it would be easier to see that an operation is signed-only because the function doesn't exist on the unsigned type.

@PeterJensen
Copy link
Collaborator

We had previously decided to introduce 'unsigned' int operations for when signedness matters (division, shift, and compares). This is how LLVM does it too. However, it's not a nice symmetrical solution like introducing an unsigned base type would be. The ugliness of having to do converts is a big con though. These are verbose and very detrimental to code readability. I think I prefer the 'special' unsigned operations over introducing a new base type, but can go either way.

@littledan
Copy link
Member Author

I don't see any signed comparison operators in the polyfill. Is the intention to add them still? (I guess division doesn't matter since there's no integer division operations right now.)

@sunfishcode
Copy link
Member

Yes, the intention is to add them, unless there's consensus to support unsigned types instead.

@ghost
Copy link

ghost commented Jun 9, 2015

How about getters named .u and .s to make the conversions less noisy?

@littledan
Copy link
Member Author

In the call today, the consensus was to not add unsigned types, but to add the full set of missing unsigned operations. This means:

  • Unsigned comparisons
  • Unsigned saturating add and subtract

I guess we don't need unsigned extractLane/replaceLane, do we?

@littledan littledan changed the title Consider SIMD types for unsigned integers Add missing unsigned operations to integer types Jun 10, 2015
@sunfishcode
Copy link
Member

Unsigned comparisons, saturating add and subtract, and extractLane are in a patch in #209.

@littledan
Copy link
Member Author

@sunfishcode 's patch has been merged, and the spec has been updated in v0.7 1ba0295 . I don't see any more unsigned operations that are missing.

@littledan littledan reopened this Jul 29, 2015
@littledan
Copy link
Member Author

Sorry to reopen this, but Waldemar Horwat of TC39 has expressed that he still wants unsigned types. In particular, even if all the arithmetic operations have unsigned modes, there is still the issue where .toString() always prints out the value as signed integers.

@littledan littledan changed the title Add missing unsigned operations to integer types Again, reconsider unsigned types (TC39 request) Jul 29, 2015
@littledan
Copy link
Member Author

We discussed this at length in the meeting. I'm tempted to say that we should make this change to unsigned integer types. There's no real performance downside either way, it matches TypedArray precedent, and it also follows the majority precedent (with xmmintr.h the big huge exception). I looked around at some other SIMD interfaces to see what the consensus is. For a unified type, I found a couple low-level interfaces:
xmmintr.h
LLVM IR

For separate signed/unsigned types, I found many high-level languages:
D
Swift
Factor
Rust
AltiVec C types
NEON C types
GCC platform-independent vector extensions
LLVM user-level SIMD interface

On the downside, programmers will sometimes have to use an explicit cast when, based on their xmmintr.h and assembly experience, it'd be annoying to have to use that cast. And the additional duplicated functions (e.g. add) will take some memory (maybe we'll have 30% more functions or so).

This is the only remaining big issue with TC39. The people who will be reviewing the spec in detail, whose signoff we would need, are concerned about this. Does anyone object to making this change?

@PeterJensen
Copy link
Collaborator

The amount of extra functions that will need to be added is substantial. On Int32x4 there's around 30 or so sign-agnostic operations (arithmetic, logical, shift, load, store, initializers, compares). Assuming we need to have copies of those for the unsigned types as well it will add roughly 3x30 functions. In addition we'll need type casting operations between all the 128-bit types. Now there's 4 128-bit types (Int32x4, Int16x8, Int8x16, Float32x4), resulting in 12 (4x4-4) cast operations. With the addition of 3 more 128-bit types (UInt32x4, UInt16x8, UInt8x16) we'll need 7x7-7 = 42 cast operations total, that's an increase of 40, so all together we'll be adding around ~130 functions to the API.

Daniel did an exact computation and it came out to 256 functions before and 390 after. A ~50% increase.

@ghost
Copy link

ghost commented Jul 30, 2015

I don't know Javascript at all, but could the functions not be relaxed in the set of types that they accept? Perhaps by implementing sign-agnostic operations under sign-agnostic types and using sign-agnostic-type-casting getters to duck-type the problem away?

@PeterJensen
Copy link
Collaborator

Also worth keeping in mind: When increasing the size of the API by ~50% we also need 50% more unit tests.

@PeterJensen
Copy link
Collaborator

@simhos01 Not 100% sure I understand your suggestion. Do you suggest introducing SIMD.XIntAxB name spaces to hang the sign-agnostic operations on. It would be OK to use both IntAxB and UintAxB operands on those operations.

@littledan
Copy link
Member Author

Sign-agnostic ops would be a little more difficult for a VM to implement efficiently since they are be polymorphic not only in input type but also in output type most of the time. They would probably have to dispatch to different code paths internally. And, as Peter points out, where do you put them? I don't think it's a very good idea.

In making this decision, I think the main thing is to compare the preponderance of other languages and TypedArrays (and maybe planning for future consistency with int64) as evidence against the 50% increased number of functions, which has implementation complexity and memory overhead burden.

@littledan
Copy link
Member Author

By the way, on another thread, Peter made the good point that TypedArrays are already not being taken for everything because of Uint8ClampedArray. But I think that was more of a one-off mistake and the rest of TypedArrays is decent.

@PeterJensen
Copy link
Collaborator

Thanks @littledan That comment should've been made here. My mistake.

@PeterJensen
Copy link
Collaborator

@simhos01 's comment above regarding massive use of vreinterpretq_xnn_ymm() in example code from skia, webp, and vpx where vreinterpret calls are used for type casts is pretty important. It provides good evidence that it's not unlikely that type conversions between signed and unsigned SIMD values happens in real code. To convert two signed operands to unsigned in order to do an unsigned compare would result in code that looks like this

SIMD.Uint32x4.lessThan(SIMD.Uint32x4.fromInt32x4(a), SIMD.Uint32x4.fromInt32x4(b))

instead of simply saying:

SIMD.Int32x4.unsignedLessThan(a, b);

The latter being much easier to read.

Having to use a lot of these conversion can quickly result in hard to read (and maintain) code.

@jfbastien
Copy link

@PeterJensen I'm not sure I understand the point about vreinterpretq: it seems like it's reinterpreting bits for vectors of the same bit-width, but different element width. e.g. in the webp example:

static WEBP_INLINE void Load4x16(const uint8_t* src, int stride,
                                 uint8x16_t* const p1, uint8x16_t* const p0,
                                 uint8x16_t* const q0, uint8x16_t* const q1) {
  const uint32x4_t zero = vdupq_n_u32(0);
  uint32x4x4_t in;
  INIT_VECTOR4(in, zero, zero, zero, zero);
  src -= 2;
  LOADQ_LANE_32b(in.val[0], 0);
  LOADQ_LANE_32b(in.val[1], 0);
  LOADQ_LANE_32b(in.val[2], 0);
  LOADQ_LANE_32b(in.val[3], 0);
  LOADQ_LANE_32b(in.val[0], 1);
  LOADQ_LANE_32b(in.val[1], 1);
  LOADQ_LANE_32b(in.val[2], 1);
  LOADQ_LANE_32b(in.val[3], 1);
  LOADQ_LANE_32b(in.val[0], 2);
  LOADQ_LANE_32b(in.val[1], 2);
  LOADQ_LANE_32b(in.val[2], 2);
  LOADQ_LANE_32b(in.val[3], 2);
  LOADQ_LANE_32b(in.val[0], 3);
  LOADQ_LANE_32b(in.val[1], 3);
  LOADQ_LANE_32b(in.val[2], 3);
  LOADQ_LANE_32b(in.val[3], 3);
  // Transpose four 4x4 parts:
  {
    const uint8x16x2_t row01 = vtrnq_u8(vreinterpretq_u8_u32(in.val[0]),
                                        vreinterpretq_u8_u32(in.val[1]));
    const uint8x16x2_t row23 = vtrnq_u8(vreinterpretq_u8_u32(in.val[2]),
                                        vreinterpretq_u8_u32(in.val[3]));
    const uint16x8x2_t row02 = vtrnq_u16(vreinterpretq_u16_u8(row01.val[0]),
                                         vreinterpretq_u16_u8(row23.val[0]));
    const uint16x8x2_t row13 = vtrnq_u16(vreinterpretq_u16_u8(row01.val[1]),
                                         vreinterpretq_u16_u8(row23.val[1]));
    *p1 = vreinterpretq_u8_u16(row02.val[0]);
    *p0 = vreinterpretq_u8_u16(row13.val[0]);
    *q0 = vreinterpretq_u8_u16(row02.val[1]);
    *q1 = vreinterpretq_u8_u16(row13.val[1]);
  }
}

This isn't about sign as much as it is about number of elements.

There are some _s8_u8 and _s16_u16 conversions, but the majority AFAICT are just changing the number of elements.

@PeterJensen
Copy link
Collaborator

@jfbastien Yes the conversions that change the number of elements don't count.

My point was that the sign conversions do appear in real code, as witnessed by the _s8_u8 and s16_u16 conversions, and that the ergonomics of the JS code one would have to write to do this is pretty ugly.

@littledan
Copy link
Member Author

@PeterJensen This example is a bit hard for me to understand. Do you have any other examples? Maybe some sample JavaScript code with and without the casts would make the point especially clear.

@ghost
Copy link

ghost commented Aug 3, 2015

The cast in d = SIMD.Int32x4.fromUint32x4(SIMD.Uint32x4.sub(a, b)) is inevitable in many realistic uses of unsigned types. Then you'll have to cast back to produce another unsigned result.

I think this might be an invisible problem in the desktop world because most 32-bit code copes by staying signed and assuming that overflow never happens rather than by being careful (promotion to 64-bit is generally preferred over being careful). 16-bit and 8-bit DSP code comes up against its range limits much more often but signedness clarity is necessary only for a few operations, while enforcing it in the type makes all operations pay the overhead.

@littledan
Copy link
Member Author

In both the signed and unsigned cases, overflow of sub is well-defined by two's complement arithmetic. sub would be is identical on signed and unsigned operations except for the names of the types taken as arguments and given as a return value.

@ghost
Copy link

ghost commented Aug 4, 2015

The problem isn't the data, but its type. (unsigned)x - (unsigned)y might have a negative result, implying that the result should be signed (though it could still be out of range), or it might have a positive result as large as UINT_MAX, meaning that the result needs to be unsigned. The programmer must know some additional constraints or they wouldn't be able to write useful code, but whichever the language chooses by default will be the wrong type in some cases, and then the result will need to be cast to the correct type if it's to be used in a comparison or a saturating operation.

Some might consider a cast necessary simply as a matter of being up-front about the proper interpretation of the data at that stage. If I subtract two unsigned values and store the result in an unsigned type, am I not implying that the result must be positive or zero? Now clarity (say what you mean, use a type appropriate to the meaning of the data) is coming into conflict with brevity (don't make things unnecessarily verbose with extra casts).

@littledan
Copy link
Member Author

OK, I see, you may want signed output from an unsigned operation like subtraction. And it's messy-looking to have an intermediate value (before casting it) which plainly shows the result of an overflow.

I think C scalars have the same problem. If you do (x + y) > z, then the signedness of x and y (and z) inform the signedness of the comparison. Although C has very terse syntax for doing the casts, this is still an issue.

So there are two ways to solve it: Make separate signed and unsigned operations, or expect programmers to do explicit (or C-like implicit) casts. I think most high-level languages go with the latter one, but if this comes up a lot, it could be an argument for the former. I don't know what SIMD code tends to look like. Does anyone have an example code snippet where this came up (in any language)?

@ghost
Copy link

ghost commented Aug 4, 2015

I think adding separate signed and unsigned subtract might be compounding the problem of exploding the number of operations required, which was, originally, an argument against separate types. Add will suffer the same fate (eg., adding a negative bias to an unsigned value), and others can too, but I don't want to fuel those flames. Would we really want to gain unsignedSubtract() as a consequence of changes which were expected to do away with unsignedLessThan()?

I just wanted to highlight that the implications of separating signed and unsigned types comes with a lot of expressiveness expectations which are actually hard to live up to. I think the troubles around C's unsigned promotions and comparisons also highlight what can happen if you promise too much in your type system.

Initially I thought I was ambivalent, but now I think it might genuinely be too hard to separate signed and unsigned types in a coherent and useful way. Separating them doesn't seem to add much benefit and it makes other cases misleading or unclear or ugly.

@littledan
Copy link
Member Author

I don't think promotions are on the table.

What do you mean by unsignedSubtract()? What is the "negative bias to an unsigned value" that you're talking about? I didn't know there were more function names that we would have to add to support split signed/unsigned types.

@littledan
Copy link
Member Author

@simhos01 I am new to this area, so a more concrete example would help me understand what you mean by the difficulty. I don't understand why you'd want to cast back and forth just for a subtraction when overflow is defined as 2's complement. I don't understand why it would be bad to depend on that overflow.

@littledan
Copy link
Member Author

@simhos01 Will you be on the call today? If you can make it, that would really help me to understand the issue.

@PeterJensen
Copy link
Collaborator

Here's an example code snippet from the NEON version of the vp8 decoder:

uint8x16_t q3u8, q6u8;
int8x16_t  q14s8;
q14s8 = vqsubq_s8(vreinterpretq_s8_u8(q3u8),
                  vreinterpretq_s8_u8(q6u8));

If new Unsigned integers were introduced that would translate to this SIMD.js code:

q3u8 = SIMD.Uint8x16(...);
q6u8 = SIMD.Uint8x16(...);
q14s8 = SIMD.Int8x16.sub(SIMD.Int8x16.fromUint8x16(q3u8), SIMD.Int8x16.fromUint8x16(q6u8));

Without new Unsigned integers this code would look like this:

q3u8  = SIMD.Int8x16(...);
q6u8  = SIMD.Int8x16(...);
q14s8 = SIMD.Int8x16.sub(q3u8, q6u8);

Shorter, more readable.

Yesterday, Moh pointed out that by having less JS code, download sizes and parsing time is also reduced. Probably not significantly, but still

@littledan
Copy link
Member Author

After renaming to a minimized name, a function call for a cast should be just 4-5 characters. Also, here, it seems like you could do the cast to signed after the operation rather than before, resulting in just one function call.

It seems like the VP8 decoder that you linked to got by even though it has types that are similar to the proposed signed/unsigned split. There are 31 casts in that file, and only 13 are between signed and unsigned integer types of the same width, so keeping the types unified wouldn't even fix most of the casts (whereas going like xmmintr.h would do it I think).

@PeterJensen
Copy link
Collaborator

AFAIK minimizers can't minimize global functions/properties. You can of course alias them to shorter names by-hand, but as I said, the size of the JS code is probably not significant. I'm mostly concerned about the ergonomics/readability aspect.

Yes, there's a lot of other casts in that particular file that aren't relevant for this discussion, I was simply pointing to another example, like you were requesting.

@littledan
Copy link
Member Author

We discussed this in the call today and decided to make the plunge to split-up signed and unsigned types. This will be a big change for implementations, but seems like it won't have runtime cost. The change will make this spec easier to be standardized, and hopefully help programmability as well.

I'll make the spec change. @sunfishcode offered to do the tests at some point. Could someone sign up to change the polyfill? It would be a shame if the tests weren't in working state with respect to the polyfill. With the current way the code is organized, extending to unsigned types would mean a lot of duplication, but it looks to me like a refactoring could cut down the size of the polyfill and tests significantly while remaining faithful.

@ghost
Copy link

ghost commented Aug 5, 2015

For the sake of brevity, then, could I suggest casts as getters?

q3u8 = SIMD.Uint8x16(...);
q6u8 = SIMD.Uint8x16(...);
q14s8 = SIMD.Int8x16.sub(q3u8.signed, q6u8.signed);

Could also be:

q14s8 = SIMD.Uint8x16.sub(q3u8, q6u8).signed;

but I think the latter may be a little misleading.

Including the redundant casts (Uint8x16.unsigned() and Int8x16.signed(), etc.) might also make it easier to be explicit about the interpretation at different points in the code, and it would allow the user to duck-type wrapper functions (for better or worse) if they needed to.

@littledan
Copy link
Member Author

A user can define this themselves in a library:

Object.defineProperty(SIMD.Uint32x4.prototype, "signed", { configurable: true, get() { return SIMD.Int32x4.fromUint32x4(this.valueOf()) } })

We could also add the exact equivalent of this to the spec. But, this will likely not be usable asm.js and requires unboxing to be efficient, whether it's in the spec or in user code. People who want things like add to be methods can do the same, and it has the same performance-related issues.

In the spirit of "the extensible web", let's leave pretty things like this that can be added on top for users to innovate on, and start by standardizing a minimal low-level base. I honestly think it would be great, though, if people started using and exchanging libraries like this, and VMs learned how to optimize for them.

@littledan
Copy link
Member Author

Spec v0.8 describes the new split types. If anyone is implementing the change and runs into issues, please speak up here! Tests and polyfill still need to be done.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants