
Commit c49d925

Remove productionlist hard-coding in writers (#13326)
The ``productionlist`` directive operates in a line-based context, creating an ``addnodes.productionlist`` container of ``addnodes.production`` nodes, with one per production in the directive. However, the full state of the abstract document tree is not included in the produced nodes, with each builder/translator implementing a different way of appending the fixed separator ``::=`` and justifying the displayed text.

This should not happen in the writer, and hard-coding such details hampers flexibility when documenting different abstract grammars. We move the specific form of the ``.. productionlist::`` directive to the logic in the directive body and have the writers apply minimal custom logic.

LaTeX changes written by Jean-François B.
1 parent bb68e72 commit c49d925
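
As a rough illustration of the "appending the fixed separator and justifying the displayed text" step that the message describes, the padding rule can be sketched with plain strings instead of docutils nodes. The helper names `separator` and `render` and the sample grammar are assumptions made for this sketch; only the `rjust` and padding arithmetic follows the `separator_node` method in this commit.

```python
def separator(name: str, max_len: int) -> str:
    """Return the text placed between a production name and its definition."""
    if name:
        # Right-justify ' ::= ' so that name + separator is always
        # max_len + 5 characters wide.
        return ' ::= '.rjust(max_len - len(name) + 5)
    # Continuation lines carry no name: pad across the name and separator.
    return ' ' * (max_len + 5)


def render(productions: list[tuple[str, str]]) -> str:
    """Render (name, definition) pairs as aligned plain text."""
    max_len = max(len(name) for name, _ in productions)
    return ''.join(
        f'{name}{separator(name, max_len)}{definition}\n'
        for name, definition in productions
    )


# Hypothetical sample grammar, used only to show the alignment.
rendered = render([
    ('sum', 'integer "+" integer'),
    ('', '| integer "-" integer'),
    ('integer', 'digit+'),
])
print(rendered, end='')
```

Because the padding is computed once, every definition starts in the same column, so a writer only has to emit the text in a monospace, whitespace-preserving context.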

10 files changed: +234 -167 lines changed


CHANGES.rst

Lines changed: 3 additions & 0 deletions
@@ -108,6 +108,9 @@ Features added
   Patch by Jakob Lykke Andersen and Adam Turner.
 * #11280: Add ability to skip a particular section using the ``no-search`` class.
   Patch by Will Lachance.
+* #13326: Remove hardcoding from handling :class:`~sphinx.addnodes.productionlist`
+  nodes in all writers, to improve flexibility.
+  Patch by Adam Turner.

 Bugs fixed
 ----------

doc/usage/restructuredtext/directives.rst

Lines changed: 43 additions & 41 deletions
@@ -1642,49 +1642,51 @@ Grammar production displays
 ---------------------------

 Special markup is available for displaying the productions of a formal grammar.
-The markup is simple and does not attempt to model all aspects of BNF (or any
-derived forms), but provides enough to allow context-free grammars to be
-displayed in a way that causes uses of a symbol to be rendered as hyperlinks to
-the definition of the symbol. There is this directive:
-
-.. rst:directive:: .. productionlist:: [productionGroup]
-
-   This directive is used to enclose a group of productions. Each production
-   is given on a single line and consists of a name, separated by a colon from
-   the following definition. If the definition spans multiple lines, each
-   continuation line must begin with a colon placed at the same column as in
-   the first line.
+The markup is simple and does not attempt to model all aspects of BNF_
+(or any derived forms), but provides enough to allow context-free grammars
+to be displayed in a way that causes uses of a symbol to be rendered
+as hyperlinks to the definition of the symbol.
+There is this directive:
+
+.. _BNF: https://en.wikipedia.org/wiki/Backus%E2%80%93Naur_form
+
+.. rst:directive:: .. productionlist:: [production_group]
+
+   This directive is used to enclose a group of productions.
+   Each production is given on a single line and consists of a name,
+   separated by a colon from the following definition.
+   If the definition spans multiple lines, each continuation line
+   must begin with a colon placed at the same column as in the first line.
    Blank lines are not allowed within ``productionlist`` directive arguments.

-   The definition can contain token names which are marked as interpreted text
-   (e.g., "``sum ::= `integer` "+" `integer```") -- this generates
-   cross-references to the productions of these tokens. Outside of the
-   production list, you can reference to token productions using
-   :rst:role:`token`.
-
-   The *productionGroup* argument to :rst:dir:`productionlist` serves to
-   distinguish different sets of production lists that belong to different
-   grammars. Multiple production lists with the same *productionGroup* thus
-   define rules in the same scope.
-
-   Inside of the production list, tokens implicitly refer to productions
-   from the current group. You can refer to the production of another
-   grammar by prefixing the token with its group name and a colon, e.g,
-   "``otherGroup:sum``". If the group of the token should not be shown in
-   the production, it can be prefixed by a tilde, e.g.,
-   "``~otherGroup:sum``". To refer to a production from an unnamed
-   grammar, the token should be prefixed by a colon, e.g., "``:sum``".
-
-   Outside of the production list,
-   if you have given a *productionGroup* argument you must prefix the
-   token name in the cross-reference with the group name and a colon,
-   e.g., "``myGroup:sum``" instead of just "``sum``".
-   If the group should not be shown in the title of the link either
-   an explicit title can be given (e.g., "``myTitle <myGroup:sum>``"),
-   or the target can be prefixed with a tilde (e.g., "``~myGroup:sum``").
-
-   Note that no further reStructuredText parsing is done in the production,
-   so that you don't have to escape ``*`` or ``|`` characters.
+   The optional *production_group* directive argument serves to distinguish
+   different sets of production lists that belong to different grammars.
+   Multiple production lists with the same *production_group*
+   thus define rules in the same scope.
+   This can also be used to split the description of a long or complex grammar
+   across multiple ``productionlist`` directives with the same *production_group*.
+
+   The definition can contain token names which are marked as interpreted text,
+   (e.g. "``sum ::= `integer` "+" `integer```"),
+   to generate cross-references to the productions of these tokens.
+   Such cross-references implicitly refer to productions from the current group.
+   To reference a production from another grammar, the token name
+   must be prefixed with the group name and a colon, e.g. "``other-group:sum``".
+   If the group of the token should not be shown in the production,
+   it can be prefixed by a tilde, e.g., "``~other-group:sum``".
+   To refer to a production from an unnamed grammar,
+   the token should be prefixed by a colon, e.g., "``:sum``".
+   No further reStructuredText parsing is done in the production,
+   so that special characters (``*``, ``|``, etc) do not need to be escaped.
+
+   Token productions can be cross-referenced outwith the production list
+   by using the :rst:role:`token` role.
+   If you have used a *production_group* argument,
+   the token name must be prefixed with the group name and a colon,
+   e.g., "``my_group:sum``" instead of just "``sum``".
+   Standard :ref:`cross-referencing modifiers <xref-modifiers>`
+   may be used with the ``:token:`` role,
+   such as custom link text and suppressing the group name with a tilde (``~``).

    The following is an example taken from the Python Reference Manual::
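
The prefix rules for token references documented in this hunk (bare name, explicit group, tilde, leading colon) can be summarised in a small sketch. `resolve_token` is a hypothetical helper written for illustration and is not part of the Sphinx API; it only encodes the four cases as the documentation text states them.

```python
def resolve_token(token: str, current_group: str = '') -> tuple[str, str]:
    """Return (displayed title, lookup target) for a token reference.

    Hypothetical helper following the documented rules; not Sphinx code.
    """
    if token.startswith('~'):
        # '~other-group:sum' links to 'other-group:sum' but displays 'sum'.
        target = token[1:]
        title = target.rpartition(':')[2]
    elif token.startswith(':'):
        # ':sum' refers to a production from an unnamed grammar.
        target = title = token[1:]
    elif ':' in token:
        # 'other-group:sum' names the group explicitly and displays it.
        target = title = token
    else:
        # A bare name refers to the current production group, if any.
        title = token
        target = f'{current_group}:{token}' if current_group else token
    return title, target


# Each case, resolved from inside a production list whose group is 'expr'.
for ref in ('sum', 'other-group:sum', '~other-group:sum', ':sum'):
    print(ref, '->', resolve_token(ref, 'expr'))
```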

sphinx/domains/std/__init__.py

Lines changed: 100 additions & 35 deletions
@@ -2,6 +2,7 @@

 from __future__ import annotations

+import operator
 import re
 from copy import copy
 from typing import TYPE_CHECKING, cast
@@ -22,7 +23,7 @@
 from sphinx.util.parsing import nested_parse_to_nodes

 if TYPE_CHECKING:
-    from collections.abc import Callable, Iterable, Iterator, Set
+    from collections.abc import Callable, Iterable, Iterator, MutableSequence, Set
     from typing import Any, ClassVar, Final

     from docutils.nodes import Element, Node, system_message
@@ -553,7 +554,7 @@ def run(self) -> list[Node]:
         return [*messages, node]


-def token_xrefs(text: str, production_group: str = '') -> list[Node]:
+def token_xrefs(text: str, production_group: str = '') -> Iterable[Node]:
     if len(production_group) != 0:
         production_group += ':'
     retnodes: list[Node] = []
@@ -596,43 +597,107 @@ class ProductionList(SphinxDirective):
     final_argument_whitespace = True
     option_spec: ClassVar[OptionSpec] = {}

+    # The backslash handling is from ObjectDescription.get_signatures
+    _nl_escape_re: Final = re.compile(r'\\\n')
+
+    # Get 'name' from triples of rawsource, name, definition (tokens)
+    _name_getter = operator.itemgetter(1)
+
     def run(self) -> list[Node]:
-        domain = self.env.domains.standard_domain
-        node: Element = addnodes.productionlist()
+        name_getter = self._name_getter
+        lines = self._nl_escape_re.sub('', self.arguments[0]).splitlines()
+
+        # Extract production_group argument.
+        # Must be before extracting production definition triples.
+        production_group = self.production_group(lines=lines, options=self.options)
+        production_lines = list(self.production_definitions(lines))
+        max_name_len = max(map(len, map(name_getter, production_lines)))
+        node_location = self.get_location()
+
+        productions = [
+            self.make_production(
+                rawsource=rule,
+                name=name,
+                tokens=tokens,
+                production_group=production_group,
+                max_len=max_name_len,
+                location=node_location,
+            )
+            for rule, name, tokens in production_lines
+        ]
+        node = addnodes.productionlist('', *productions)
         self.set_source_info(node)
-        # The backslash handling is from ObjectDescription.get_signatures
-        nl_escape_re = re.compile(r'\\\n')
-        lines = nl_escape_re.sub('', self.arguments[0]).split('\n')
-
-        production_group = ''
-        first_rule_seen = False
-        for rule in lines:
-            if not first_rule_seen and ':' not in rule:
-                production_group = rule.strip()
-                continue
-            first_rule_seen = True
-            try:
-                name, tokens = rule.split(':', 1)
-            except ValueError:
-                break
-            subnode = addnodes.production(rule)
-            name = name.strip()
-            subnode['tokenname'] = name
-            if subnode['tokenname']:
-                prefix = 'grammar-token-%s' % production_group
-                node_id = make_id(self.env, self.state.document, prefix, name)
-                subnode['ids'].append(node_id)
-                self.state.document.note_implicit_target(subnode, subnode)
-
-                if len(production_group) != 0:
-                    obj_name = f'{production_group}:{name}'
-                else:
-                    obj_name = name
-                domain.note_object('token', obj_name, node_id, location=node)
-            subnode.extend(token_xrefs(tokens, production_group=production_group))
-            node.append(subnode)
         return [node]

+    @staticmethod
+    def production_group(
+        *,
+        lines: MutableSequence[str],
+        options: dict[str, Any],  # NoQA: ARG004
+    ) -> str:
+        # get production_group
+        if not lines or ':' in lines[0]:
+            return ''
+        production_group = lines[0].strip()
+        lines[:] = lines[1:]
+        return production_group
+
+    @staticmethod
+    def production_definitions(
+        lines: Iterable[str], /
+    ) -> Iterator[tuple[str, str, str]]:
+        """Yield triples of rawsource, name, definition (tokens)."""
+        for line in lines:
+            if ':' not in line:
+                break
+            name, _, tokens = line.partition(':')
+            yield line, name.strip(), tokens.strip()
+
+    def make_production(
+        self,
+        *,
+        rawsource: str,
+        name: str,
+        tokens: str,
+        production_group: str,
+        max_len: int,
+        location: str,
+    ) -> addnodes.production:
+        production_node = addnodes.production(rawsource, tokenname=name)
+        if name:
+            production_node += self.make_name_target(
+                name=name, production_group=production_group, location=location
+            )
+        production_node.append(self.separator_node(name=name, max_len=max_len))
+        production_node += token_xrefs(text=tokens, production_group=production_group)
+        production_node.append(nodes.Text('\n'))
+        return production_node
+
+    def make_name_target(
+        self,
+        *,
+        name: str,
+        production_group: str,
+        location: str,
+    ) -> addnodes.literal_strong:
+        """Make a link target for the given production."""
+        name_node = addnodes.literal_strong(name, name)
+        prefix = f'grammar-token-{production_group}'
+        node_id = make_id(self.env, self.state.document, prefix, name)
+        name_node['ids'].append(node_id)
+        self.state.document.note_implicit_target(name_node, name_node)
+        obj_name = f'{production_group}:{name}' if production_group else name
+        std = self.env.domains.standard_domain
+        std.note_object('token', obj_name, node_id, location=location)
+        return name_node
+
+    @staticmethod
+    def separator_node(*, name: str, max_len: int) -> nodes.Text:
+        """Return separator between 'name' and 'tokens'."""
+        if name:
+            return nodes.Text(' ::= '.rjust(max_len - len(name) + 5))
+        return nodes.Text(' ' * (max_len + 5))
+

 class TokenXRefRole(XRefRole):
     def process_link(
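
The line-splitting logic of `production_group` and `production_definitions` in this file can be tried outside Sphinx with a minimal sketch. The standalone function names `split_group` and `definitions` and the sample directive argument are assumptions made for illustration; the bodies mirror the two static methods in the diff.

```python
import re

# Same pattern as in the diff: a backslash immediately before a newline.
NL_ESCAPE_RE = re.compile(r'\\\n')


def split_group(lines: list[str]) -> str:
    """Pop and return the group name if the first line has no colon."""
    if not lines or ':' in lines[0]:
        return ''
    group = lines[0].strip()
    lines[:] = lines[1:]  # mutate in place so the caller sees the change
    return group


def definitions(lines):
    """Yield (rawsource, name, definition) triples until a line lacks a colon."""
    for line in lines:
        if ':' not in line:
            break
        name, _, tokens = line.partition(':')
        yield line, name.strip(), tokens.strip()


# Hypothetical directive argument: a group name, a rule continued with a
# backslash, a continuation line, then a line without a colon.
argument = (
    'python-grammar\n'
    'sum: `integer` "+" \\\n'
    '`integer`\n'
    '   : | `integer`\n'
    'not a rule\n'
    'late: ignored'
)
lines = NL_ESCAPE_RE.sub('', argument).splitlines()
group = split_group(lines)
triples = list(definitions(lines))
print(group, triples)
```

Note the ordering constraint that the diff's comment points out: the group must be extracted first, because it removes the first line from `lines` before the definitions are read. Parsing also stops at the first line without a colon, so `late: ignored` is never yielded.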

sphinx/texinputs/sphinxlatexobjects.sty

Lines changed: 29 additions & 10 deletions
@@ -1,7 +1,7 @@
 %% MODULE RELEASE DATA AND OBJECT DESCRIPTIONS
 %
 % change this info string if making any custom modification
-\ProvidesPackage{sphinxlatexobjects}[2023/07/23 documentation environments]
+\ProvidesPackage{sphinxlatexobjects}[2025/02/11 documentation environments]

 % Provides support for this output mark-up from Sphinx latex writer:
 %
@@ -279,18 +279,37 @@
 \newcommand{\pysigstopmultiline}{\sphinxsigismultilinefalse\itemsep\sphinxsignaturesep}%

 % Production lists
+% This simply outputs the lines as is, in monospace font. Refers #13326.
+% (the left padding for multi-line alignment is from the nodes themselves,
+% and latex is configured below to obey such horizontal whitespace).
+%
+% - The legacy code used longtable and hardcoded the separator as ::=
+%   via dedicated macros defined by the environment itself.
+% - Here the separator is part of the node. Any extra LaTeX mark-up would
+%   have to originate from the writer itself to decorate it.
+% - The legacy code used strangely \parindent and \indent. Possibly
+%   (unchecked) due to an earlier tabular usage, but a longtable does not
+%   work in paragraph mode, so \parindent was without effect and
+%   \indent only caused some extra blank line above display.
+% - The table had some whitespace on its left, which we imitate here via
+%   \parindent usage (which works in our context...).
 %
 \newenvironment{productionlist}{%
-% \def\sphinxoptional##1{{\Large[}##1{\Large]}}
-  \def\production##1##2{\\\sphinxcode{\sphinxupquote{##1}}&::=&\sphinxcode{\sphinxupquote{##2}}}%
-  \def\productioncont##1{\\& &\sphinxcode{\sphinxupquote{##1}}}%
-  \parindent=2em
-  \indent
-  \setlength{\LTpre}{0pt}%
-  \setlength{\LTpost}{0pt}%
-  \begin{longtable}[l]{lcl}
+  \bigskip % imitate close enough legacy vertical whitespace, which was
+           % visibly excessive
+  \ttfamily % needed for space tokens to have same width as letters
+  \parindent1em % width of a "quad", font-dependent, usually circa width of 2
+                % letters
+  \obeylines % line in = line out
+  \parskip\z@skip % prevent the parskip vertical whitespace between lines,
+                  % which are technically to LaTeX now each its own paragraph
+  \@vobeyspaces % obey whitespace
+  % now a technicality to, only locally to this environment, prevent the
+  % suppression of indentation of first line, if it comes right after
+  % \section. Cf package indentfirst from which the code is borrowed.
+  \let\@afterindentfalse\@afterindenttrue\@afterindenttrue
 }{%
-  \end{longtable}
+  \par % does not hurt...
 }

 % Definition lists; requested by AMK for HOWTO documents. Probably useful

sphinx/writers/html5.py

Lines changed: 2 additions & 18 deletions
@@ -5,7 +5,7 @@
 import posixpath
 import re
 import urllib.parse
-from typing import TYPE_CHECKING, cast
+from typing import TYPE_CHECKING

 from docutils import nodes
 from docutils.writers.html5_polyglot import HTMLTranslator as BaseTranslator
@@ -17,8 +17,6 @@
 from sphinx.util.images import get_image_size

 if TYPE_CHECKING:
-    from collections.abc import Iterable
-
     from docutils.nodes import Element, Node, Text

     from sphinx.builders import Builder
@@ -695,23 +693,9 @@ def depart_literal(self, node: Element) -> None:

     def visit_productionlist(self, node: Element) -> None:
         self.body.append(self.starttag(node, 'pre'))
-        productionlist = cast('Iterable[addnodes.production]', node)
-        maxlen = max(len(production['tokenname']) for production in productionlist)
-        lastname = None
-        for production in productionlist:
-            if production['tokenname']:
-                lastname = production['tokenname'].ljust(maxlen)
-                self.body.append(self.starttag(production, 'strong', ''))
-                self.body.append(lastname + '</strong> ::= ')
-            elif lastname is not None:
-                self.body.append(' ' * (maxlen + 5))
-            production.walkabout(self)
-            self.body.append('\n')
-        self.body.append('</pre>\n')
-        raise nodes.SkipNode

     def depart_productionlist(self, node: Element) -> None:
-        pass
+        self.body.append('</pre>\n')

     def visit_production(self, node: Element) -> None:
         pass
