Guide: complex data types #15422

Merged
merged 1 commit into from
Jul 9, 2014

steveklabnik
Member

I'm not happy about the hand-waving around `cmp`, but I'm not sure how to get around it.


As you can see, the type of a tuple looks just like the tuple, but with each
position having a type name rather than the value. Careful readers will also
note that tuples are hetergenius: we have an `int` and a `&str` in this tuple.
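
A minimal sketch of the point made in the excerpt above, written against current stable Rust (so `i32` stands in for the `int` of the 2014 text, which is an assumption of this example). The function name `make_pair` is hypothetical.

```rust
// The type annotation mirrors the shape of the value:
// one type name per position, and the positions may differ in type.
fn make_pair() -> (i32, &'static str) {
    let x: (i32, &str) = (1, "hello");
    x
}

fn main() {
    // Destructuring pulls each position out into its own binding.
    let (n, s) = make_pair();
    println!("{} {}", n, s);
}
```
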
Contributor

Spelling: hetergenius --> heterogeneous

@steveklabnik
Member Author

Thanks! Go figure, I made sure there were no test errors, yet I commit spelling errors :(


### Structs

A struct is another form of a 'record type,' just like a tuple. There's a
Contributor

The comma shouldn't be inside the quotes.

Member Author

Why not?

Contributor

The comma is part of the overall sentence, not the part in quotes.

Member Author

Yes, but comma inside the quotes is proper English.

Contributor

Oh, you're right. I'm sorry. :)

Contributor

Technically speaking, it’s a matter of style, and putting the comma outside the quotes is the prevalent style in British English.[1] However, Rust documentation seems to generally follow American English (where putting the comma inside the quotes is more common), so putting the comma inside the quotes would be more consistent.

(Sometimes the British style is used in America, particularly in linguistic and philosophical works, because it is clearer and less ambiguous.[2])

@steveklabnik
Member Author

I have fixed most of these issues. The 'newtype' thing remains. Do we have some consensus about what to do here?

@lilyball
Contributor

lilyball commented Jul 8, 2014

I think the word "newtype" is fine, it maps to a Haskell concept, but it should be explained as something other than a "synonym".
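
One way to illustrate why "synonym" is the wrong word, as a hedged sketch in current stable Rust: a `type` alias really is a synonym, while a one-field tuple struct is a separate type. The names `Miles`, `Kilometers`, `takes_f64` and `double_km` are hypothetical and exist only for this illustration.

```rust
// A type alias is a true synonym: `Miles` and `f64` are the same type.
type Miles = f64;

// A one-field tuple struct is a distinct type that merely wraps an `f64`.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Kilometers(f64);

fn takes_f64(x: f64) -> f64 {
    x * 2.0
}

fn double_km(k: Kilometers) -> Kilometers {
    // The wrapped value has to be taken out explicitly.
    let Kilometers(inner) = k;
    Kilometers(inner * 2.0)
}

fn main() {
    let m: Miles = 3.0;
    // Compiles: the alias is interchangeable with `f64`.
    println!("{}", takes_f64(m));

    let k = Kilometers(3.0);
    // `takes_f64(k)` would be rejected by the compiler:
    // `Kilometers` is not `f64`, even though it only wraps one.
    println!("{:?}", double_km(k));
}
```
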

@brson
Contributor

brson commented Jul 8, 2014

Agree 'newtype' or 'newtype struct' is cromulent here.

@huonw
Member

huonw commented Jul 8, 2014

> it maps to a Haskell concept

I somewhat disagree: the Haskell concept has the additional runtime representation changes over normal `data` declarations (a `newtype` is stored without an extra layer of boxing), and we have no such thing, i.e. `struct Foo(T)` and `struct Foo { x: T }` are equally unboxed but only the former is a "newtype".

@lilyball
Contributor

lilyball commented Jul 8, 2014

@huonw That's an implementation detail that AFAIK only matters to FFI, not really a semantic attribute of the construct. Rust also has the capability of unboxing newtypes during trans the same way it now unboxes nil-pointer-optimized types.

@huonw
Member

huonw commented Jul 8, 2014

That's not what I was talking about at all.

In Rust, `struct Foo(T)` and `struct Foo { x: T }` have the same representation, no additional pointers, no additional indirection. In Haskell, `newtype Foo = Foo T` has the same representation as `T`, but `data Foo = Foo T` is essentially `Box<T>`, i.e. an extra layer of pointers.
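
The Rust half of this claim can be checked roughly with `std::mem::size_of`. This is only a sketch: `TupleWrapper`, `RecordWrapper` and `sizes` are hypothetical names, and equal sizes are what current compilers produce for such wrappers rather than a property this example proves about layout in general.

```rust
use std::mem::size_of;

// Two ways to wrap the same payload; neither adds a pointer or a box.
#[allow(dead_code)]
struct TupleWrapper(u64);

#[allow(dead_code)]
struct RecordWrapper {
    x: u64,
}

// Returns the sizes of the bare payload and of both wrappers, in bytes.
fn sizes() -> (usize, usize, usize) {
    (
        size_of::<u64>(),
        size_of::<TupleWrapper>(),
        size_of::<RecordWrapper>(),
    )
}

fn main() {
    let (raw, tuple, record) = sizes();
    println!("u64: {}, tuple struct: {}, record struct: {}", raw, tuple, record);
}
```
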

@lilyball
Contributor

lilyball commented Jul 9, 2014

@huonw I don't see how the behavior of `data` is relevant to Rust. `newtype Foo = Foo T` is in contrast to just using `T` directly, as far as this discussion is concerned, not in contrast to the behavior of `data Foo = Foo T`.

@huonw
Member

huonw commented Jul 9, 2014

No, we're comparing the two ways to define wrapper structs. I.e. answering the question: if you need a wrapper type, how do you define it?

In Haskell, you have a choice between `data` and `newtype`; in Rust, you have a choice between `struct { }` and `struct ()`.

We currently only call the `()` version a "newtype" even though the `{}` version is essentially identical. In particular, the two are identical in terms of Haskell's difference between `data` and `newtype`; that is, we're co-opting a Haskell word while entirely ignoring the actual point of Haskell having two keywords. I would much prefer if we called both the `{}` and `()` versions "newtype" or, preferably, "wrappers" (and then, e.g., "tuple newtype" or "tuple wrappers" for the `()` version).

This over-emphasis/preferential treatment of one-element tuple structs results in beginners writing code like this (this is just an example of our misteachings, not picking on the code/author), where the person has unfortunately made life harder for themselves by using a tuple struct with all the ugly `let &Primes(ref primes) = self; primes.foo()` unwrapping, rather than just using a normal struct with a field access: `self.primes.foo()`.

(This last ugliness point could easily be addressed by allowing some form of field accesses on tuple structs, but we don't have that now, and the preferential treatment of () would still be wrong.)
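
The contrast described above can be sketched side by side. The types `PrimesTuple` and `PrimesRecord` and the method `count` are hypothetical, loosely modeled on the `Primes` snippet quoted in the comment; the destructuring pattern is the one quoted there and still compiles on current stable Rust.

```rust
// One-field tuple struct: the payload has no name.
struct PrimesTuple(Vec<u32>);

// One-field record struct: the payload is reachable by field name.
struct PrimesRecord {
    primes: Vec<u32>,
}

impl PrimesTuple {
    fn count(&self) -> usize {
        // Destructure through the reference to reach the wrapped vector.
        let &PrimesTuple(ref primes) = self;
        primes.len()
    }
}

impl PrimesRecord {
    fn count(&self) -> usize {
        // Plain field access, no pattern needed.
        self.primes.len()
    }
}

fn main() {
    let a = PrimesTuple(vec![2, 3, 5]);
    let b = PrimesRecord { primes: vec![2, 3, 5] };
    println!("{} {}", a.count(), b.count());
}
```

On current stable Rust the tuple struct's field can also be reached positionally as `self.0`, which is the kind of field access the comment notes was not available at the time.
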

@lilyball
Contributor

lilyball commented Jul 9, 2014

I think you're still focusing too much on the distinction between `data` and `newtype` in Haskell. `newtype` may be taught as an alternative to `data`, but that's only because the use-case of `newtype` is generally first introduced with `data` (that is, wrapping a single underlying type in a brand new type).

Strictly speaking, Haskell's `newtype` is equivalent in Rust to both `struct Foo(T)` and `struct Foo { x: T }`. And you're right, we only call the first one a "newtype". But we do that because we don't need a name for the second one, calling that one a "struct" works perfectly fine. Haskell doesn't really have an equivalent to Rust structs though, as its `data` is boxed, so being able to specify a single-valued record as a `newtype` is important.

I think it's perfectly reasonable to call `struct Foo { x: T }` a "newtype" if you want, it just generally isn't done because nobody questions a single-field struct. If you want to use the word "newtype" in connection with `struct Foo { x: T }`, I have no objection.

I do agree that our current tutorial does suggest that `struct Foo(T)` is the right tool to use when you want a newtype, and that `struct Foo { x: T }` might be a better choice in many situations. And the guide here should probably be updated to reflect this. It's worth pointing out that a tuple-struct with a single field is a "newtype", but that it's also equivalent to a record-struct with a single field, and that the latter should be preferred in most cases. Personally, I think tuple structs are useful when the clients of your API can be expected to create these values, but rarely (or never) destructure them.
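
A rough sketch of the "clients create, but rarely destructure" case from the last paragraph. `UserId` and `describe` are hypothetical names for an imagined library API; the point is only that callers write the short constructor form while the single destructuring lives inside the library.

```rust
// Library side: a tuple struct keeps construction terse for callers.
pub struct UserId(pub u32);

pub fn describe(id: UserId) -> String {
    // The only place the wrapper is taken apart.
    let UserId(raw) = id;
    format!("user #{}", raw)
}

// Client side: values are created inline and passed straight in.
fn main() {
    println!("{}", describe(UserId(7)));
}
```
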

@huonw
Member

huonw commented Jul 9, 2014

> `newtype` may be taught as an alternative to `data`, but that's only because the use-case of `newtype` is generally first introduced with `data` (that is, wrapping a single underlying type in a brand new type).

This doesn't really make sense... of course `newtype` is introduced with `data`, since `data` is the general data type defining word in Haskell, i.e. it does `struct` and `enum` all in one. It is not possible to define a wrapper type without using `data` or `newtype` (just like you need to use `struct` (or `enum`) in Rust), so introducing `newtype` is essentially always going to be "you can use `newtype` as a way to define wrapper types without the overhead of `data`", since that's exactly what it is. (You could introduce `newtype` as a completely separate "this is how you define wrapper types" construct, but that's strange: there are far more similarities between `data` and `newtype` than there are differences.)

In Rust, both `()` and `{}` are zero-overhead, so we don't need a special keyword for it, and I'm contending that our use of the special terminology should be generalised to cover the two equivalent definitions.

> Strictly speaking, Haskell's `newtype` is equivalent in Rust to both `struct Foo(T)` and `struct Foo { x: T }`. And you're right, we only call the first one a "newtype". But we do that because we don't need a name for the second one, calling that one a "struct" works perfectly fine. Haskell doesn't really have an equivalent to Rust structs though, as its `data` is boxed, so being able to specify a single-valued record as a `newtype` is important.

I don't understand the relevance of this. The first sentence is exactly what I have been saying, and then you're trying to justify why Haskell has `newtype` as a keyword? Neither of those is in question.

> I think it's perfectly reasonable to call `struct Foo { x: T }` a "newtype" if you want, it just generally isn't done because nobody questions a single-field struct

But "nobody" thinks of using a single-field struct, because we use special terminology for the single field tuple version, and ignore the normal version.

> Personally, I think tuple structs are useful when the clients of your API can be expected to create these values, but rarely (or never) destructure them.

I wasn't saying they are useless, just that we're emphasising them too much by using a fancy name from another language where it has important meaning that is irrelevant to Rust.

bors added a commit that referenced this pull request Jul 9, 2014
…=brson

I'm not happy about the hand-waving around `cmp`, but I'm not sure how to get around it.
@bors bors closed this Jul 9, 2014
@bors bors merged commit 11c64f1 into rust-lang:master Jul 9, 2014
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 21, 2023