@@ -94,40 +94,96 @@ The descriptions of lexical analysis and syntax use a modified
`Backus–Naur form (BNF) <https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form>`_ grammar
notation. This uses the following style of definition:

- .. productionlist:: notation
-    name: `lc_letter` (`lc_letter` | "_")*
-    lc_letter: "a"..."z"
-
- The first line says that a ``name`` is an ``lc_letter`` followed by a sequence
- of zero or more ``lc_letter``\ s and underscores. An ``lc_letter`` in turn is
- any of the single characters ``'a'`` through ``'z'``. (This rule is actually
- adhered to for the names defined in lexical and grammar rules in this document.)
-
- Each rule begins with a name (which is the name defined by the rule) and
- ``::=``. A vertical bar (``|``) is used to separate alternatives; it is the
- least binding operator in this notation. A star (``*``) means zero or more
- repetitions of the preceding item; likewise, a plus (``+``) means one or more
- repetitions, and a phrase enclosed in square brackets (``[ ]``) means zero or
- one occurrences (in other words, the enclosed phrase is optional). The ``*``
- and ``+`` operators bind as tightly as possible; parentheses are used for
- grouping. Literal strings are enclosed in quotes. White space is only
- meaningful to separate tokens. Rules are normally contained on a single line;
- rules with many alternatives may be formatted alternatively with each line after
- the first beginning with a vertical bar.
-
- .. index:: lexical definitions, ASCII
-
- In lexical definitions (as the example above), two more conventions are used:
- Two literal characters separated by three dots mean a choice of any single
- character in the given (inclusive) range of ASCII characters. A phrase between
- angular brackets (``<...>``) gives an informal description of the symbol
- defined; e.g., this could be used to describe the notion of 'control character'
- if needed.
-
- Even though the notation used is almost the same, there is a big difference
- between the meaning of lexical and syntactic definitions: a lexical definition
- operates on the individual characters of the input source, while a syntax
- definition operates on the stream of tokens generated by the lexical analysis.
- All uses of BNF in the next chapter ("Lexical Analysis") are lexical
- definitions; uses in subsequent chapters are syntactic definitions.
-
+ .. grammar-snippet::
+    :group: notation
+
+    name: `letter` (`letter` | `digit` | "_")*
+    letter: "a"..."z" | "A"..."Z"
+    digit: "0"..."9"
+
+ In this example, the first line says that a ``name`` is a ``letter`` followed
+ by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
+ A ``letter`` in turn is any of the single characters ``'a'`` through
+ ``'z'`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
+ to ``9``.
+
+ Each rule begins with a name (which identifies the rule that's being defined)
+ followed by a colon, ``:``.
+ The definition to the right of the colon uses the following syntax elements:
+
+ * ``name``: A name refers to another rule.
+   Where possible, it is a link to the rule's definition.
+
+ * ``TOKEN``: An uppercase name refers to a :term:`token`.
+   For the purposes of grammar definitions, tokens are the same as rules.
+
+ * ``"text"``, ``'text'``: Text in single or double quotes must match literally
+   (without the quotes). The type of quote is chosen according to the meaning
+   of ``text``:
+
+   * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
+   * ``"case"``: A name in double quotes denotes a
+     :ref:`soft-keyword <soft-keywords>`.
+   * ``'@'``: A non-letter symbol in single quotes denotes an
+     :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
+     :ref:`operator <operators>`.
+
+ * ``"a"..."z"``: Two literal characters separated by three dots mean a choice
+   of any single character in the given (inclusive) range of ASCII characters.
+ * ``<...>``: A phrase between angular brackets gives an informal description
+   of the matched symbol (for example, ``<any ASCII character except "\">``),
+   or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
+ * ``e1 e2``: Items separated only by whitespace denote a sequence.
+   Here, ``e1`` must be followed by ``e2``.
+ * ``e1 | e2``: A vertical bar is used to separate alternatives.
+   It is the least tightly binding operator in this notation.
+ * ``e*``: A star means zero or more repetitions of the preceding item.
+ * ``e+``: Likewise, a plus means one or more repetitions.
+ * ``[e]``: A phrase enclosed in square brackets means zero or
+   one occurrences. In other words, the enclosed phrase is optional.
+ * ``e?``: A question mark has exactly the same meaning as square brackets:
+   the preceding item is optional.
+ * ``(e)``: Parentheses are used for grouping.
+
+ The unary operators (``*``, ``+``, ``?``) bind as tightly as possible.
+
+ White space is only meaningful to separate tokens.
+
+ Rules are normally contained on a single line, but rules that are too long
+ may be wrapped:
+
+ .. grammar-snippet::
+    :group: notation
+
+    literal: `stringliteral` | `bytesliteral`
+             | `integer` | `floatnumber` | `imagnumber`
+
+ Alternatively, rules may be formatted with the first line ending at the colon,
+ and each alternative beginning with a vertical bar on a new line.
+ For example:
+
+
+ .. grammar-snippet::
+    :group: notation-alt
+
+    literal:
+       | `stringliteral`
+       | `bytesliteral`
+       | `integer`
+       | `floatnumber`
+       | `imagnumber`
+
+ This does *not* mean that there is an empty first alternative.
+
+ .. index:: lexical definitions
+
+ .. note::
+
+    There is some difference between *lexical* and *syntactic* analysis:
+    the :term:`lexical analyzer` operates on the individual characters of the
+    input source, while the *parser* (syntactic analyzer) operates on the stream
+    of :term:`tokens <token>` generated by the lexical analysis.
+    However, in some cases the exact boundary between the two phases is a
+    CPython implementation detail.
+
+ This documentation uses the same BNF grammar for both.
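
Note on the added notation example: the ``name``, ``letter``, and ``digit`` rules can be read as a regular expression. The following is a minimal Python sketch of that reading, not part of the change itself. The names `NAME_RE` and `is_name` are hypothetical and introduced only for illustration, and the translation assumes the ranges cover ASCII characters only, as the notation text states.

```python
import re

# Assumed translation of the example rules (illustrative, not from the diff):
#   name:   letter (letter | digit | "_")*
#   letter: "a"..."z" | "A"..."Z"
#   digit:  "0"..."9"
NAME_RE = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")


def is_name(text: str) -> bool:
    """Return True if the whole string matches the example `name` rule."""
    return NAME_RE.fullmatch(text) is not None
```

Under this reading, a string such as `lc_letter` matches, while a string that starts with an underscore or a digit does not, because the rule requires a leading ``letter``.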