Commit 211f83d

gh-127833: Reword and expand the Notation section
Prepare the docs for using the notation used in the `python.gram` file. If we want to sync the two, the meta-syntax should be the same. Also, remove the distinction between lexical and syntactic rules. With f- and t-strings, the line between the two is blurry.
1 parent 4eacf38 commit 211f83d

Doc/reference/introduction.rst

Lines changed: 93 additions & 37 deletions
@@ -94,40 +94,96 @@ The descriptions of lexical analysis and syntax use a modified
 `Backus–Naur form (BNF) <https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form>`_ grammar
 notation. This uses the following style of definition:

-.. productionlist:: notation
-   name: `lc_letter` (`lc_letter` | "_")*
-   lc_letter: "a"..."z"
-
-The first line says that a ``name`` is an ``lc_letter`` followed by a sequence
-of zero or more ``lc_letter``\ s and underscores. An ``lc_letter`` in turn is
-any of the single characters ``'a'`` through ``'z'``. (This rule is actually
-adhered to for the names defined in lexical and grammar rules in this document.)
-
-Each rule begins with a name (which is the name defined by the rule) and
-``::=``. A vertical bar (``|``) is used to separate alternatives; it is the
-least binding operator in this notation. A star (``*``) means zero or more
-repetitions of the preceding item; likewise, a plus (``+``) means one or more
-repetitions, and a phrase enclosed in square brackets (``[ ]``) means zero or
-one occurrences (in other words, the enclosed phrase is optional). The ``*``
-and ``+`` operators bind as tightly as possible; parentheses are used for
-grouping. Literal strings are enclosed in quotes. White space is only
-meaningful to separate tokens. Rules are normally contained on a single line;
-rules with many alternatives may be formatted alternatively with each line after
-the first beginning with a vertical bar.
-
-.. index:: lexical definitions, ASCII
-
-In lexical definitions (as the example above), two more conventions are used:
-Two literal characters separated by three dots mean a choice of any single
-character in the given (inclusive) range of ASCII characters. A phrase between
-angular brackets (``<...>``) gives an informal description of the symbol
-defined; e.g., this could be used to describe the notion of 'control character'
-if needed.
-
-Even though the notation used is almost the same, there is a big difference
-between the meaning of lexical and syntactic definitions: a lexical definition
-operates on the individual characters of the input source, while a syntax
-definition operates on the stream of tokens generated by the lexical analysis.
-All uses of BNF in the next chapter ("Lexical Analysis") are lexical
-definitions; uses in subsequent chapters are syntactic definitions.
-
+.. grammar-snippet::
+   :group: notation
+
+   name: `letter` (`letter` | `digit` | "_")*
+   letter: "a"..."z" | "A"..."Z"
+   digit: "0"..."9"
+
+In this example, the first line says that a ``name`` is a ``letter`` followed
+by a sequence of zero or more ``letter``\ s, ``digit``\ s, and underscores.
+A ``letter`` in turn is any of the single characters ``'a'`` through
+``'z'`` and ``A`` through ``Z``; a ``digit`` is a single character from ``0``
+to ``9``.
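
As an illustrative aside (not part of the changed file), the three example rules above can be roughly approximated with a regular expression; real Python names allow far more characters, so this sketch only mirrors the rules as written:

   import re

   # Hypothetical regex rendering of the example rules only:
   #   name:   letter (letter | digit | "_")*
   #   letter: "a"..."z" | "A"..."Z"
   #   digit:  "0"..."9"
   NAME = re.compile(r"[a-zA-Z][a-zA-Z0-9_]*")

   assert NAME.fullmatch("spam_1")           # a letter, then letters/digits/underscores
   assert NAME.fullmatch("1spam") is None    # must start with a letter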
+
+Each rule begins with a name (which identifies the rule that's being defined)
+followed by a colon, ``:``.
+The definition to the right of the colon uses the following syntax elements:
+
+* ``name``: A name refers to another rule.
+  Where possible, it is a link to the rule's definition.
+
+* ``TOKEN``: An uppercase name refers to a :term:`token`.
+  For the purposes of grammar definitions, tokens are the same as rules.
+
+* ``"text"``, ``'text'``: Text in single or double quotes must match literally
+  (without the quotes). The type of quote is chosen according to the meaning
+  of ``text``:
+
+  * ``'if'``: A name in single quotes denotes a :ref:`keyword <keywords>`.
+  * ``"case"``: A name in double quotes denotes a
+    :ref:`soft-keyword <soft-keywords>`.
+  * ``'@'``: A non-letter symbol in single quotes denotes an
+    :py:data:`~token.OP` token, that is, a :ref:`delimiter <delimiters>` or
+    :ref:`operator <operators>`.
+
+* ``"a"..."z"``: Two literal characters separated by three dots mean a choice
+  of any single character in the given (inclusive) range of ASCII characters.
+* ``<...>``: A phrase between angular brackets gives an informal description
+  of the matched symbol (for example, ``<any ASCII character except "\">``),
+  or an abbreviation that is defined in nearby text (for example, ``<Lu>``).
+* ``e1 e2``: Items separated only by whitespace denote a sequence.
+  Here, ``e1`` must be followed by ``e2``.
+* ``e1 | e2``: A vertical bar is used to separate alternatives.
+  It is the least tightly binding operator in this notation.
+* ``e*``: A star means zero or more repetitions of the preceding item.
+* ``e+``: Likewise, a plus means one or more repetitions.
+* ``[e]``: A phrase enclosed in square brackets means zero or
+  one occurrences. In other words, the enclosed phrase is optional.
+* ``e?``: A question mark has exactly the same meaning as square brackets:
+  the preceding item is optional.
+* ``(e)``: Parentheses are used for grouping.
+
+The unary operators (``*``, ``+``, ``?``) bind as tightly as possible.
+
+White space is only meaningful to separate tokens.
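
To see how the binding rules play out (again an aside, not part of the changed file): in ``e1 | e2 e3*`` the star applies only to ``e3`` and the bar applies last, so the rule reads as ``e1 | (e2 (e3*))``. A rough regular-expression sketch of that grouping, with hypothetical placeholders ``a``, ``b``, and ``c``:

   import re

   # "e1 | e2 e3*" groups as e1 | (e2 (e3*)):
   # '*' binds tightest, sequencing next, '|' last.
   pattern = re.compile(r"a|b(?:c)*")        # e1 = "a", e2 = "b", e3 = "c"

   assert pattern.fullmatch("a")             # matches the first alternative
   assert pattern.fullmatch("bccc")          # "b" followed by any number of "c"
   assert pattern.fullmatch("ac") is None    # not read as (e1 | e2) e3*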
+
+Rules are normally contained on a single line, but rules that are too long
+may be wrapped:
+
+.. grammar-snippet::
+   :group: notation
+
+   literal: `stringliteral` | `bytesliteral`
+          | `integer` | `floatnumber` | `imagnumber`
+
+Alternatively, rules may be formatted with the first line ending at the colon,
+and each alternative beginning with a vertical bar on a new line.
+For example:
+
+
+.. grammar-snippet::
+   :group: notation-alt
+
+   literal:
+     | `stringliteral`
+     | `bytesliteral`
+     | `integer`
+     | `floatnumber`
+     | `imagnumber`
+
+This does *not* mean that there is an empty first alternative.
+
+.. index:: lexical definitions
+
+.. note::
+
+   There is some difference between *lexical* and *syntactic* analysis:
+   the :term:`lexical analyzer` operates on the individual characters of the
+   input source, while the *parser* (syntactic analyzer) operates on the stream
+   of :term:`tokens <token>` generated by the lexical analysis.
+   However, in some cases the exact boundary between the two phases is a
+   CPython implementation detail.
+
+   This documentation uses the same BNF grammar for both.
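
The distinction drawn in the note can be made concrete with the standard ``tokenize`` module, which exposes the token stream that the parser consumes; the snippet below is an illustration only, not part of the changed file:

   import io
   import tokenize

   # The lexical analyzer turns characters into tokens; the parser then
   # works on this stream of tokens rather than on raw characters.
   source = "name_1 = value + 2\n"
   for tok in tokenize.generate_tokens(io.StringIO(source).readline):
       print(tokenize.tok_name[tok.type], repr(tok.string))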
