Tokens are primitive productions in the grammar defined by regular (non-recursive) languages. "Simple" tokens are given in string table production form, and occur in the rest of the grammar as double-quoted strings. Other tokens have exact rules given.
A literal is an expression consisting of a single token, rather than a sequence of tokens, that immediately and directly denotes the value it evaluates to, rather than referring to it by name or some other evaluation rule. A literal is a form of constant expression, so is evaluated (primarily) at compile time.

| | Example | `#` sets | Characters | Escapes |
|---|---|---|---|---|
| Character | `'H'` | N/A | All Unicode | Quote & ASCII & Unicode |
| String | `"hello"` | N/A | All Unicode | Quote & ASCII & Unicode |
| Raw | `r#"hello"#` | 0... | All Unicode | N/A |
| Byte | `b'H'` | N/A | All ASCII | Quote & Byte |
| Byte string | `b"hello"` | N/A | All ASCII | Quote & Byte |
| Raw byte string | `br#"hello"#` | 0... | All ASCII | N/A |

ASCII escapes:

| | Name |
|---|---|
| `\x41` | 7-bit character code (exactly 2 digits, up to 0x7F) |
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\\` | Backslash |
| `\0` | Null |

Byte escapes:

| | Name |
|---|---|
| `\x7F` | 8-bit character code (exactly 2 digits) |
| `\n` | Newline |
| `\r` | Carriage return |
| `\t` | Tab |
| `\\` | Backslash |
| `\0` | Null |

Unicode escapes:

| | Name |
|---|---|
| `\u{7FFF}` | 24-bit Unicode character code (up to 6 digits) |

Quote escapes:

| | Name |
|---|---|
| `\'` | Single quote |
| `\"` | Double quote |


| Number literals\* | Example | Exponentiation | Suffixes |
|---|---|---|---|
| Decimal integer | `98_222` | N/A | Integer suffixes |
| Hex integer | `0xff` | N/A | Integer suffixes |
| Octal integer | `0o77` | N/A | Integer suffixes |
| Binary integer | `0b1111_0000` | N/A | Integer suffixes |
| Floating-point | `123.0E+77` | Optional | Floating-point suffixes |

\* All number literals allow `_` as a visual separator: `1_234.0E+18f64`

Suffixes:

| Integer | Floating-point |
|---|---|
| `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `isize`, `usize` | `f32`, `f64` |

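The number literal table can be checked with a short program. This is only a sketch: the function name `separated_literals` is invented for illustration, and the values are the examples from the table above.

```rust
/// Returns the example literals from the table, one per base.
/// (Hypothetical helper, not part of any Rust API.)
fn separated_literals() -> (u32, u32, u32, u32) {
    // Decimal, hex, octal, and binary forms; `_` is only a visual separator.
    (98_222, 0xff, 0o77, 0b1111_0000)
}

fn main() {
    let (dec, hex, oct, bin) = separated_literals();
    assert_eq!(dec, 98222); // the underscore does not change the value
    assert_eq!(hex, 255);
    assert_eq!(oct, 63);
    assert_eq!(bin, 240);

    // Separator, exponent, and suffix combined in one floating-point literal.
    let f = 1_234.0E+18f64;
    assert_eq!(f, 1234.0e18);
}
```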
Lexer

    CHAR_LITERAL :
       `'` ( ~[`'` `\` \n \r \t] | QUOTE_ESCAPE | ASCII_ESCAPE | UNICODE_ESCAPE ) `'`

    QUOTE_ESCAPE :
       `\'` | `\"`

    ASCII_ESCAPE :
          `\x` OCT_DIGIT HEX_DIGIT
       | `\n` | `\r` | `\t` | `\\` | `\0`

    UNICODE_ESCAPE :
       `\u{` ( HEX_DIGIT `_`* )1..6 `}`

A character literal is a single Unicode character enclosed within two U+0027 (single-quote) characters, with the exception of U+0027 itself, which must be escaped by a preceding U+005C character (`\`).
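A few character literals, including escaped forms, might look like the following. The helper name `char_examples` and the specific code points are chosen for illustration only.

```rust
/// Returns sample character literals (hypothetical helper for illustration).
fn char_examples() -> [char; 5] {
    [
        'H',         // plain character
        '\'',        // single quote must be escaped inside a char literal
        '\x41',      // ASCII escape: exactly two hex digits, up to 0x7F
        '\u{1F600}', // Unicode escape: up to six hex digits in braces
        '\n',        // newline escape
    ]
}

fn main() {
    let chars = char_examples();
    assert_eq!(chars[0], 'H');
    assert_eq!(chars[1] as u32, 0x27);
    assert_eq!(chars[2], 'A');
    assert_eq!(chars[3] as u32, 0x1F600);
    assert_eq!(chars[4] as u32, 0x0A);
    // A double quote may be written with or without the escape in a char literal.
    assert_eq!('"', '\"');
}
```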
Lexer

    STRING_LITERAL :
       `"` (
          ~[`"` `\` IsolatedCR]
          | QUOTE_ESCAPE
          | ASCII_ESCAPE
          | UNICODE_ESCAPE
          | STRING_CONTINUE
       )* `"`

    STRING_CONTINUE :
       `\` followed by \n

A string literal is a sequence of any Unicode characters enclosed within two U+0022 (double-quote) characters, with the exception of U+0022 itself, which must be escaped by a preceding U+005C character (`\`).
Line-break characters are allowed in string literals. Normally they represent themselves (i.e. no translation), but as a special exception, when an unescaped U+005C character (`\`) occurs immediately before the newline (U+000A), the U+005C character, the newline, and all whitespace at the beginning of the next line are ignored. Thus `a` and `b` are equal:

    let a = "foobar";
    let b = "foo\
             bar";

    assert_eq!(a, b);

Some additional escapes are available in either character or non-raw string literals. An escape starts with a U+005C (`\`) and continues with one of the following forms:
* A code point escape starts with U+0078 (`x`) and is followed by exactly two hex digits (with a value up to 0x7F, as noted in the ASCII escapes table). It denotes the Unicode code point equal to the provided hex value.
* A Unicode code point escape starts with U+0075 (`u`) and is followed by up to six hex digits surrounded by braces U+007B (`{`) and U+007D (`}`). It denotes the Unicode code point equal to the provided hex value.
* A whitespace escape is one of the characters U+006E (`n`), U+0072 (`r`), or U+0074 (`t`), denoting the Unicode values U+000A (LF), U+000D (CR) or U+0009 (HT) respectively.
* The null escape is the character U+0030 (`0`) and denotes the Unicode value U+0000 (NUL).
* The backslash escape is the character U+005C (`\`), which must be escaped in order to denote itself.

Lexer

    RAW_STRING_LITERAL :
       `r` RAW_STRING_CONTENT

    RAW_STRING_CONTENT :
          `"` ( ~ IsolatedCR )* (non-greedy) `"`
       | `#` RAW_STRING_CONTENT `#`

Raw string literals do not process any escapes. They start with the character U+0072 (`r`), followed by zero or more of the character U+0023 (`#`) and a U+0022 (double-quote) character. The raw string body can contain any sequence of Unicode characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (`#`) characters that preceded the opening U+0022 (double-quote) character.
All Unicode characters contained in the raw string body represent themselves. The characters U+0022 (double-quote) (except when followed by at least as many U+0023 (`#`) characters as were used to start the raw string literal) and U+005C (`\`) do not have any special meaning.
Examples for string literals:

    "foo"; r"foo";                     // foo
    "\"foo\""; r#""foo""#;             // "foo"

    "foo #\"# bar";
    r##"foo #"# bar"##;                // foo #"# bar

    "\x52"; "R"; r"R";                 // R
    "\\x52"; r"\x52";                  // \x52

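The equivalences in the examples above can be asserted directly. The following is a small sketch; the helper name `raw_vs_escaped` is hypothetical.

```rust
/// Compares the length of an escaped newline with its raw spelling.
/// (Hypothetical helper for illustration.)
fn raw_vs_escaped() -> (usize, usize) {
    // "\n" is one character (LF); r"\n" is a backslash followed by `n`.
    ("\n".len(), r"\n".len())
}

fn main() {
    assert_eq!(raw_vs_escaped(), (1, 2));

    // One `#` on each side lets the body contain bare double quotes.
    assert_eq!(r#"say "hi""#, "say \"hi\"");

    // With two `#`s, a quote followed by a single `#` does not end the literal.
    assert_eq!(r##"foo #"# bar"##, "foo #\"# bar");

    // Escapes are not processed in raw strings.
    assert_eq!(r"\x52", "\\x52");
}
```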
Lexer

    BYTE_LITERAL :
       `b'` ( ASCII_FOR_CHAR | BYTE_ESCAPE ) `'`

    ASCII_FOR_CHAR :
       any ASCII (i.e. 0x00 to 0x7F), except `'`, `\`, \n, \r or \t

    BYTE_ESCAPE :
          `\x` HEX_DIGIT HEX_DIGIT
       | `\n` | `\r` | `\t` | `\\` | `\0`

A byte literal is a single ASCII character (in the U+0000 to U+007F range) or a single escape preceded by the characters U+0062 (`b`) and U+0027 (single-quote), and followed by the character U+0027. If the character U+0027 is present within the literal, it must be escaped by a preceding U+005C (`\`) character. It is equivalent to a `u8` unsigned 8-bit integer number literal.
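As a brief illustration of the equivalence with `u8`, the sketch below collects a few byte literals and compares them with plain integers. The function name `byte_values` is an assumption made for this example.

```rust
/// Returns sample byte literals as `u8` values (hypothetical helper).
fn byte_values() -> [u8; 4] {
    [
        b'H',    // ASCII character
        b'\'',   // single quote must be escaped
        b'\xFF', // byte escapes cover the full 8-bit range
        b'\n',   // newline escape
    ]
}

fn main() {
    assert_eq!(byte_values(), [72, 39, 255, 10]);

    // A byte literal can be used wherever a `u8` is expected.
    let b: u8 = b'A';
    assert_eq!(b, 65u8);
}
```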
Lexer

    BYTE_STRING_LITERAL :
       `b"` ( ASCII_FOR_STRING | BYTE_ESCAPE | STRING_CONTINUE )* `"`

    ASCII_FOR_STRING :
       any ASCII (i.e. 0x00 to 0x7F), except `"`, `\` and IsolatedCR

A non-raw byte string literal is a sequence of ASCII characters and escapes, preceded by the characters U+0062 (`b`) and U+0022 (double-quote), and followed by the character U+0022. If the character U+0022 is present within the literal, it must be escaped by a preceding U+005C (`\`) character. Alternatively, a byte string literal can be a raw byte string literal, defined below. A byte string literal of length `n` is equivalent to a `&'static [u8; n]` borrowed fixed-sized array of unsigned 8-bit integers.
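The array type of a byte string literal can be made visible with an explicit return type. This is a minimal sketch; `hello_bytes` is a made-up name.

```rust
/// Returns a byte string literal with its exact type spelled out.
/// (Hypothetical helper for illustration.)
fn hello_bytes() -> &'static [u8; 5] {
    // The literal has length 5, so its type is `&'static [u8; 5]`.
    b"hello"
}

fn main() {
    let bytes = hello_bytes();
    assert_eq!(bytes.len(), 5);
    assert_eq!(bytes, &[104u8, 101, 108, 108, 111]);

    // Escapes are processed in non-raw byte strings.
    assert_eq!(b"\x52\n", &[0x52u8, 0x0Au8]);

    // The array reference coerces to a slice where one is expected.
    let slice: &[u8] = bytes;
    assert_eq!(slice[0], b'h');
}
```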
Some additional escapes are available in either byte or non-raw byte string literals. An escape starts with a U+005C (`\`) and continues with one of the following forms:
* A byte escape starts with U+0078 (`x`) and is followed by exactly two hex digits. It denotes the byte equal to the provided hex value.
* A whitespace escape is one of the characters U+006E (`n`), U+0072 (`r`), or U+0074 (`t`), denoting the byte values 0x0A (ASCII LF), 0x0D (ASCII CR) or 0x09 (ASCII HT) respectively.
* The null escape is the character U+0030 (`0`) and denotes the byte value 0x00 (ASCII NUL).
* The backslash escape is the character U+005C (`\`), which must be escaped in order to denote its ASCII encoding 0x5C.

Lexer

    RAW_BYTE_STRING_LITERAL :
       `br` RAW_BYTE_STRING_CONTENT

    RAW_BYTE_STRING_CONTENT :
          `"` ASCII* (non-greedy) `"`
       | `#` RAW_BYTE_STRING_CONTENT `#`

    ASCII :
       any ASCII (i.e. 0x00 to 0x7F)

Raw byte string literals do not process any escapes. They start with the character U+0062 (`b`), followed by U+0072 (`r`), followed by zero or more of the character U+0023 (`#`), and a U+0022 (double-quote) character. The raw string body can contain any sequence of ASCII characters and is terminated only by another U+0022 (double-quote) character, followed by the same number of U+0023 (`#`) characters that preceded the opening U+0022 (double-quote) character. A raw byte string literal cannot contain any non-ASCII byte.
All characters contained in the raw string body represent their ASCII encoding. The characters U+0022 (double-quote) (except when followed by at least as many U+0023 (`#`) characters as were used to start the raw string literal) and U+005C (`\`) do not have any special meaning.
Examples for byte string literals:

    b"foo"; br"foo";                     // foo
    b"\"foo\""; br#""foo""#;             // "foo"

    b"foo #\"# bar";
    br##"foo #"# bar"##;                 // foo #"# bar

    b"\x52"; b"R"; br"R";                // R
    b"\\x52"; br"\x52";                  // \x52

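The same comparisons can be written as assertions for byte strings. The sketch below assumes nothing beyond the standard library; `raw_byte_len` is a hypothetical helper name.

```rust
/// Compares the length of an escaped byte string with its raw spelling.
/// (Hypothetical helper for illustration.)
fn raw_byte_len() -> (usize, usize) {
    // b"\x52" is the single byte 0x52; br"\x52" is the four bytes `\`, `x`, `5`, `2`.
    (b"\x52".len(), br"\x52".len())
}

fn main() {
    assert_eq!(raw_byte_len(), (1, 4));

    // The raw and escaped spellings of the same bytes compare equal.
    assert_eq!(br"\x52", b"\\x52");
    assert_eq!(br#"a "quoted" b"#, b"a \"quoted\" b");
    assert_eq!(b"R", b"\x52");
}
```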
A number literal is either an integer literal or a floating-point literal. The grammar for recognizing the two kinds of literals is mixed.
Lexer

    INTEGER_LITERAL :
       ( DEC_LITERAL | BIN_LITERAL | OCT_LITERAL | HEX_LITERAL ) INTEGER_SUFFIX?

    DEC_LITERAL :
       DEC_DIGIT (DEC_DIGIT|`_`)*

    BIN_LITERAL :
       `0b` (BIN_DIGIT|`_`)* BIN_DIGIT (BIN_DIGIT|`_`)*

    OCT_LITERAL :
       `0o` (OCT_DIGIT|`_`)* OCT_DIGIT (OCT_DIGIT|`_`)*

    HEX_LITERAL :
       `0x` (HEX_DIGIT|`_`)* HEX_DIGIT (HEX_DIGIT|`_`)*

    BIN_DIGIT : [`0`-`1`]

    OCT_DIGIT : [`0`-`7`]

    DEC_DIGIT : [`0`-`9`]

    HEX_DIGIT : [`0`-`9` `a`-`f` `A`-`F`]

    INTEGER_SUFFIX :
          `u8` | `u16` | `u32` | `u64` | `usize`
       | `i8` | `i16` | `i32` | `i64` | `isize`

An integer literal has one of four forms:
* A decimal literal starts with a decimal digit and continues with any mixture of decimal digits and underscores.
* A hex literal starts with the character sequence U+0030 U+0078 (`0x`) and continues as any mixture (with at least one digit) of hex digits and underscores.
* An octal literal starts with the character sequence U+0030 U+006F (`0o`) and continues as any mixture (with at least one digit) of octal digits and underscores.
* A binary literal starts with the character sequence U+0030 U+0062 (`0b`) and continues as any mixture (with at least one digit) of binary digits and underscores.

Like any literal, an integer literal may be followed (immediately, without any spaces) by an integer suffix, which forcibly sets the type of the literal. The integer suffix must be the name of one of the integral types: `u8`, `i8`, `u16`, `i16`, `u32`, `i32`, `u64`, `i64`, `isize`, or `usize`.
The type of an unsuffixed integer literal is determined by type inference:
* If an integer type can be uniquely determined from the surrounding program context, the unsuffixed integer literal has that type.
* If the program context under-constrains the type, it defaults to the signed 32-bit integer `i32`.
* If the program context over-constrains the type, it is considered a static type error.

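One way to observe these inference rules is to look at the size of the resulting values. This is a rough sketch: the helper `inferred_sizes` is invented here, and measuring with `std::mem::size_of_val` is just one convenient way to show the chosen type.

```rust
use std::mem::size_of_val;

/// Returns the byte sizes of three integer bindings whose types are
/// determined in different ways (hypothetical helper for illustration).
fn inferred_sizes() -> (usize, usize, usize) {
    let defaulted = 123;      // under-constrained: defaults to i32 (4 bytes)
    let annotated: u64 = 123; // context determines u64 (8 bytes)
    let suffixed = 123u8;     // suffix forces u8 (1 byte)
    (
        size_of_val(&defaulted),
        size_of_val(&annotated),
        size_of_val(&suffixed),
    )
}

fn main() {
    assert_eq!(inferred_sizes(), (4, 8, 1));

    // Over-constrained: uncommenting the next line is a compile-time type error.
    // let bad: u8 = 1u16;
}
```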
Examples of integer literals of various forms:

    123;                               // type i32
    123i32;                            // type i32
    123u32;                            // type u32
    123_u32;                           // type u32
    let a: u64 = 123;                  // type u64

    0xff;                              // type i32
    0xff_u8;                           // type u8

    0o70;                              // type i32
    0o70_i16;                          // type i16

    0b1111_1111_1001_0000;             // type i32
    0b1111_1111_1001_0000i64;          // type i64
    0b________1;                       // type i32

    0usize;                            // type usize

Examples of invalid integer literals:

    // invalid suffixes

    0invalidSuffix;

    // uses numbers of the wrong base

    123AFB43;
    0b0102;
    0o0581;

    // integers too big for their type (they overflow)

    128_i8;
    256_u8;

    // bin, hex and octal literals must have at least one digit

    0b_;
    0b____;

Note that the Rust syntax considers `-1i8` as an application of the unary minus operator to an integer literal `1i8`, rather than a single integer literal.
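A visible consequence of this rule is that a method call on the literal binds before the minus sign is applied. The sketch below uses a made-up helper name, `minus_then_method`, to show the difference.

```rust
/// Shows that `-1i8.pow(2)` negates the result of `1i8.pow(2)`.
/// (Hypothetical helper for illustration.)
fn minus_then_method() -> (i8, i8) {
    // Parsed as -(1i8.pow(2)), because `1i8` is the literal and `-` is an operator.
    let unparenthesized = -1i8.pow(2);
    // Parentheses make the negated value the method receiver.
    let parenthesized = (-1i8).pow(2);
    (unparenthesized, parenthesized)
}

fn main() {
    assert_eq!(minus_then_method(), (-1, 1));
}
```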
Lexer

    FLOAT_LITERAL :
          DEC_LITERAL `.` (not immediately followed by `.`, `_` or an identifier)
       | DEC_LITERAL FLOAT_EXPONENT
       | DEC_LITERAL `.` DEC_LITERAL FLOAT_EXPONENT?
       | DEC_LITERAL (`.` DEC_LITERAL)? FLOAT_EXPONENT? FLOAT_SUFFIX

    FLOAT_EXPONENT :
       (`e`|`E`) (`+`|`-`)? (DEC_DIGIT|`_`)* DEC_DIGIT (DEC_DIGIT|`_`)*

    FLOAT_SUFFIX :
       `f32` | `f64`

A floating-point literal has one of two forms:
* A decimal literal followed by a period character U+002E (`.`). This is optionally followed by another decimal literal, with an optional exponent.
* A single decimal literal followed by an exponent.

Like integer literals, a floating-point literal may be followed by a suffix, so long as the pre-suffix part does not end with U+002E (`.`). The suffix forcibly sets the type of the literal. There are two valid floating-point suffixes, `f32` and `f64` (the 32-bit and 64-bit floating point types).
The type of an unsuffixed floating-point literal is determined by type inference:
* If a floating-point type can be uniquely determined from the surrounding program context, the unsuffixed floating-point literal has that type.
* If the program context under-constrains the type, it defaults to `f64`.
* If the program context over-constrains the type, it is considered a static type error.

Examples of floating-point literals of various forms:

    123.0f64;        // type f64
    0.1f64;          // type f64
    0.1f32;          // type f32
    12E+99_f64;      // type f64
    let x: f64 = 2.; // type f64

This last example is different because it is not possible to use the suffix syntax with a floating point literal ending in a period. `2.f64` would attempt to call a method named `f64` on `2`.
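The floating-point rules can be sketched the same way as the integer ones. The helper `float_sizes` is hypothetical, and `std::mem::size_of_val` is used only as a simple way to reveal the inferred type.

```rust
use std::mem::size_of_val;

/// Returns the byte sizes of three floating-point bindings
/// (hypothetical helper for illustration).
fn float_sizes() -> (usize, usize, usize) {
    let defaulted = 2.0;      // under-constrained: defaults to f64 (8 bytes)
    let annotated: f32 = 2.;  // trailing period is fine; context gives f32
    let suffixed = 0.1f32;    // suffix forces f32
    (
        size_of_val(&defaulted),
        size_of_val(&annotated),
        size_of_val(&suffixed),
    )
}

fn main() {
    assert_eq!(float_sizes(), (8, 4, 4));

    // An exponent may be followed directly by a suffix, with `_` as a separator.
    assert_eq!(12E+2_f64, 1200.0);

    // To suffix a value written with a trailing period, add a digit first.
    let x = 2.0f64;
    assert_eq!(x, 2.);
}
```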
The representation semantics of floating-point numbers are described in "Machine Types".
Lexer

    BOOLEAN_LITERAL :
       `true` | `false`

The two values of the boolean type are written `true` and `false`.
Symbols are a general class of printable tokens that play structural roles in a variety of grammar productions. They are a set of remaining miscellaneous printable tokens that do not otherwise appear as unary operators, binary operators, or keywords. They are catalogued in the Symbols section of the Grammar document.
© 2010 The Rust Project Developers
Licensed under the Apache License, Version 2.0 or the MIT license, at your option.
https://doc.rust-lang.org/reference/tokens.html