Commit 0fe357f

Merge pull request #1022 from ehuss/ident-unicode
Expand on Unicode identifiers.
2 parents 5b46b59 + 91c95a1 commit 0fe357f

1 file changed: +46 -16 lines changed
src/identifiers.md

Lines changed: 46 additions & 16 deletions
@@ -2,8 +2,8 @@
 
 > **<sup>Lexer:<sup>**\
 > IDENTIFIER_OR_KEYWORD :\
-> &nbsp;&nbsp; &nbsp;&nbsp; XID_start XID_continue<sup>\*</sup>\
-> &nbsp;&nbsp; | `_` XID_continue<sup>+</sup>
+> &nbsp;&nbsp; &nbsp;&nbsp; XID_Start XID_Continue<sup>\*</sup>\
+> &nbsp;&nbsp; | `_` XID_Continue<sup>+</sup>
 >
 > RAW_IDENTIFIER : `r#` IDENTIFIER_OR_KEYWORD <sub>*Except `crate`, `self`, `super`, `Self`*</sub>
 >
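
Reviewer note: the `IDENTIFIER_OR_KEYWORD` rule in the hunk above can be sketched as a small checker. This is only an illustration, not how `rustc` lexes identifiers. The Rust standard library does not expose the `XID_Start` and `XID_Continue` properties, so the sketch assumes the ASCII subset only (ASCII letters for `XID_Start`; ASCII letters, digits, and the underscore for `XID_Continue`). The function names `is_ascii_identifier_or_keyword` and `is_ascii_xid_continue` are hypothetical.

```rust
/// Hypothetical helper: the ASCII subset of `XID_Continue`
/// (ASCII letters, ASCII digits, and the underscore).
fn is_ascii_xid_continue(c: char) -> bool {
    c.is_ascii_alphanumeric() || c == '_'
}

/// Hypothetical helper: checks the IDENTIFIER_OR_KEYWORD rule,
/// assuming only the ASCII subset of XID_Start and XID_Continue.
/// It does not distinguish keywords from other identifiers.
fn is_ascii_identifier_or_keyword(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        // First alternative: XID_Start XID_Continue*
        Some(c) if c.is_ascii_alphabetic() => chars.all(is_ascii_xid_continue),
        // Second alternative: `_` XID_Continue+
        // At least one more character is required, so `_` alone is rejected.
        Some('_') => {
            let rest = chars.as_str();
            !rest.is_empty() && rest.chars().all(is_ascii_xid_continue)
        }
        // Empty input, or a first character outside the assumed subset.
        _ => false,
    }
}

fn main() {
    for candidate in ["foo", "_identifier", "_", "__", "9lives"] {
        println!("{:?}: {}", candidate, is_ascii_identifier_or_keyword(candidate));
    }
}
```

Because of the ASCII-only assumption, this checker rejects names such as `Москва` that the full Unicode rule accepts.
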
@@ -12,29 +12,59 @@
 > IDENTIFIER :\
 > NON_KEYWORD_IDENTIFIER | RAW_IDENTIFIER
 
-An identifier is any nonempty Unicode string of the following form:
+<!-- When updating the version, update the UAX links, too. -->
+Identifiers follow the specification in [Unicode Standard Annex #31][UAX31] for Unicode version 13.0, with the additions described below. Some examples of identifiers:
 
-Either
+* `foo`
+* `_identifier`
+* `r#true`
+* `Москва`
+* `東京`
 
-* The first character has property [`XID_start`].
-* The remaining characters have property [`XID_continue`].
+The profile used from UAX #31 is:
 
-Or
+* Start := [`XID_Start`], plus the underscore character (U+005F)
+* Continue := [`XID_Continue`]
+* Medial := empty
 
-* The first character is `_`.
-* The identifier is more than one character. `_` alone is not an identifier.
-* The remaining characters have property [`XID_continue`].
+> **Note**: Identifiers starting with an underscore are typically used to indicate an identifier that is intentionally unused, and will silence the unused warning in `rustc`.
 
-> **Note**: [`XID_start`] and [`XID_continue`] as character properties cover the
-> character ranges used to form the more familiar C and Java language-family
-> identifiers.
+Identifiers may not be a [strict] or [reserved] keyword without the `r#` prefix described below in [raw identifiers](#raw-identifiers).
+
+Zero width non-joiner (ZWNJ U+200C) and zero width joiner (ZWJ U+200D) characters are not allowed in identifiers.
+
+Identifiers are restricted to the ASCII subset of [`XID_Start`] and [`XID_Continue`] in the following situations:
+
+* [`extern crate`] declarations
+* External crate names referenced in a [path]
+* [Module] names loaded from the filesystem without a [`path` attribute]
+* [`no_mangle`] attributed items
+* Item names in [external blocks]
+
+## Normalization
+
+Identifiers are normalized using Normalization Form C (NFC) as defined in [Unicode Standard Annex #15][UAX15]. Two identifiers are equal if their NFC forms are equal.
+
+[Procedural][proc-macro] and [declarative][mbe] macros receive normalized identifiers in their input.
+
+## Raw identifiers
 
 A raw identifier is like a normal identifier, but prefixed by `r#`. (Note that
 the `r#` prefix is not included as part of the actual identifier.)
 Unlike a normal identifier, a raw identifier may be any strict or reserved
 keyword except the ones listed above for `RAW_IDENTIFIER`.
 
-[strict]: keywords.md#strict-keywords
+[`extern crate`]: items/extern-crates.md
+[`no_mangle`]: abi.md#the-no_mangle-attribute
+[`path` attribute]: items/modules.md#the-path-attribute
+[`XID_Continue`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Continue%3A%5D&abb=on&g=&i=
+[`XID_Start`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Start%3A%5D&abb=on&g=&i=
+[external blocks]: items/external-blocks.md
+[mbe]: macros-by-example.md
+[module]: items/modules.md
+[path]: paths.md
+[proc-macro]: procedural-macros.md
 [reserved]: keywords.md#reserved-keywords
-[`XID_start`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Start%3A%5D&abb=on&g=&i=
-[`XID_continue`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Continue%3A%5D&abb=on&g=&i=
+[strict]: keywords.md#strict-keywords
+[UAX15]: https://www.unicode.org/reports/tr15/tr15-50.html
+[UAX31]: https://www.unicode.org/reports/tr31/tr31-33.html
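
Reviewer note: the identifier examples listed in the new text (`foo`, `_identifier`, `r#true`, `Москва`, `東京`) can be tried in a short program. This is a minimal sketch that assumes a compiler with stable non-ASCII identifier support. The function `identifier_examples`, the variable `count`, and all values are hypothetical and chosen only for illustration. The `allow` attribute is there because `Москва` starts with an uppercase letter, which the naming-style lint would otherwise flag; other lint warnings may still appear.

```rust
// `Москва` begins with an uppercase letter, so silence the naming-style lint.
#[allow(non_snake_case)]
fn identifier_examples() -> (i32, bool, &'static str, &'static str, i32) {
    // A plain ASCII identifier.
    let foo = 1;

    // A leading underscore marks the binding as intentionally unused.
    let _identifier = 2;

    // `true` is a keyword, so it needs the `r#` prefix to be used as a name.
    let r#true = true;

    // Non-ASCII identifiers taken from the examples in the diff.
    let Москва = "Moscow";
    let 東京 = "Tokyo";

    // The `r#` prefix is not part of the identifier itself:
    // `r#count` and `count` name the same binding.
    let r#count = 3;
    let next = count + 1;

    (foo, r#true, Москва, 東京, next)
}

fn main() {
    let (foo, flag, moscow, tokyo, next) = identifier_examples();
    println!("{} {} {} {} {}", foo, flag, moscow, tokyo, next);
}
```

Reading `count` after binding `r#count` shows the point made in the "Raw identifiers" section: the prefix only affects how the name is lexed.
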
