@@ -2,8 +2,8 @@
 
 > **<sup>Lexer:<sup>**\
 > IDENTIFIER_OR_KEYWORD :\
-> XID_start XID_continue<sup>\*</sup>\
-> | `_` XID_continue<sup>+</sup>
+> XID_Start XID_Continue<sup>\*</sup>\
+> | `_` XID_Continue<sup>+</sup>
 >
 > RAW_IDENTIFIER : `r#` IDENTIFIER_OR_KEYWORD <sub>*Except `crate`, `self`, `super`, `Self`*</sub>
 >
@@ -12,29 +12,59 @@
 > IDENTIFIER :\
 > NON_KEYWORD_IDENTIFIER | RAW_IDENTIFIER
 
-An identifier is any nonempty Unicode string of the following form:
+<!-- When updating the version, update the UAX links, too. -->
+Identifiers follow the specification in [Unicode Standard Annex #31][UAX31] for Unicode version 13.0, with the additions described below. Some examples of identifiers:
 
-Either
+* `foo`
+* `_identifier`
+* `r#true`
+* `Москва`
+* `東京`
 
-* The first character has property [`XID_start`].
-* The remaining characters have property [`XID_continue`].
+The profile used from UAX #31 is:
 
-Or
+* Start := [`XID_Start`], plus the underscore character (U+005F)
+* Continue := [`XID_Continue`]
+* Medial := empty
 
-* The first character is `_`.
-* The identifier is more than one character. `_` alone is not an identifier.
-* The remaining characters have property [`XID_continue`].
+> **Note**: Identifiers starting with an underscore are typically used to indicate an identifier that is intentionally unused, and will silence the unused warning in `rustc`.
 
-> **Note**: [`XID_start`] and [`XID_continue`] as character properties cover the
-> character ranges used to form the more familiar C and Java language-family
-> identifiers.
+Identifiers may not be a [strict] or [reserved] keyword without the `r#` prefix described below in [raw identifiers](#raw-identifiers).
+
+Zero width non-joiner (ZWNJ U+200C) and zero width joiner (ZWJ U+200D) characters are not allowed in identifiers.
+
+Identifiers are restricted to the ASCII subset of [`XID_Start`] and [`XID_Continue`] in the following situations:
+
+* [`extern crate`] declarations
+* External crate names referenced in a [path]
+* [Module] names loaded from the filesystem without a [`path` attribute]
+* [`no_mangle`] attributed items
+* Item names in [external blocks]
+
+## Normalization
+
+Identifiers are normalized using Normalization Form C (NFC) as defined in [Unicode Standard Annex #15][UAX15]. Two identifiers are equal if their NFC forms are equal.
+
+[Procedural][proc-macro] and [declarative][mbe] macros receive normalized identifiers in their input.
+
+## Raw identifiers
 
 A raw identifier is like a normal identifier, but prefixed by `r#`. (Note that
 the `r#` prefix is not included as part of the actual identifier.)
 Unlike a normal identifier, a raw identifier may be any strict or reserved
 keyword except the ones listed above for `RAW_IDENTIFIER`.
 
-[strict]: keywords.md#strict-keywords
+[`extern crate`]: items/extern-crates.md
+[`no_mangle`]: abi.md#the-no_mangle-attribute
+[`path` attribute]: items/modules.md#the-path-attribute
+[`XID_Continue`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Continue%3A%5D&abb=on&g=&i=
+[`XID_Start`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Start%3A%5D&abb=on&g=&i=
+[external blocks]: items/external-blocks.md
+[mbe]: macros-by-example.md
+[module]: items/modules.md
+[path]: paths.md
+[proc-macro]: procedural-macros.md
 [reserved]: keywords.md#reserved-keywords
-[`XID_start`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Start%3A%5D&abb=on&g=&i=
-[`XID_continue`]: http://unicode.org/cldr/utility/list-unicodeset.jsp?a=%5B%3AXID_Continue%3A%5D&abb=on&g=&i=
+[strict]: keywords.md#strict-keywords
+[UAX15]: https://www.unicode.org/reports/tr15/tr15-50.html
+[UAX31]: https://www.unicode.org/reports/tr31/tr31-33.html
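
As an illustration of the new text, the sketch below uses a few of the identifier forms listed in the change and approximates the ASCII-only restriction. The helper `is_ascii_identifier_or_keyword` and the function `r#match` are hypothetical names introduced here and are not part of `rustc` or the reference. The helper assumes that the ASCII subset of `XID_Start` is `A-Z` and `a-z`, and that the ASCII subset of `XID_Continue` additionally includes `0-9` and `_`. It does not reject keywords, since `IDENTIFIER_OR_KEYWORD` does not either. Non-ASCII identifiers such as `東京` need a compiler that implements this change.

```rust
/// Hypothetical helper: checks a string against the ASCII subset of the
/// IDENTIFIER_OR_KEYWORD lexer rule from the diff above.
fn is_ascii_identifier_or_keyword(s: &str) -> bool {
    let mut chars = s.chars();
    let first = match chars.next() {
        Some(c) => c,
        None => return false, // the empty string is never an identifier
    };
    // Assumed ASCII subset of XID_Continue: letters, digits, underscore.
    let is_continue = |c: char| c.is_ascii_alphanumeric() || c == '_';
    if first.is_ascii_alphabetic() {
        // XID_Start XID_Continue*
        chars.all(is_continue)
    } else if first == '_' {
        // `_` XID_Continue+ (a lone `_` is not an identifier)
        let rest = chars.as_str();
        !rest.is_empty() && rest.chars().all(is_continue)
    } else {
        false
    }
}

/// `match` is a strict keyword, so it needs the `r#` prefix to be used as a
/// function name.
fn r#match(x: u32) -> u32 {
    x + 1
}

fn main() {
    let foo = 1;
    let _identifier = 2; // leading underscore: no unused-variable warning
    let r#true = 3; // raw identifier spelled like a keyword
    let 東京 = 4; // non-ASCII identifier

    println!("{}", foo + r#true + 東京);
    println!("{}", r#match(1));
    println!("{}", is_ascii_identifier_or_keyword("_identifier"));
    println!("{}", is_ascii_identifier_or_keyword("東京"));
}
```

The check is deliberately limited to ASCII because the standard library does not expose the `XID_Start` and `XID_Continue` properties; a full check would need Unicode property tables from an external crate.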