From 6a6e8d4d684a24c5eef8820d450dd081aa480e9d Mon Sep 17 00:00:00 2001
From: Serhiy Storchaka
Date: Thu, 10 Apr 2025 16:41:41 +0300
Subject: [PATCH] gh-106482: Clarify documentation of character set in RE (GH-106517)

(cherry picked from commit 1557da622c89985d14b781bef91e9aaa6e1f88c4)

Co-authored-by: Serhiy Storchaka
Co-authored-by: Martin Panter
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
---
 Doc/library/re.rst | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/Doc/library/re.rst b/Doc/library/re.rst
index 9db6f1da3be4db..a4ec538ee813a6 100644
--- a/Doc/library/re.rst
+++ b/Doc/library/re.rst
@@ -250,14 +250,23 @@ The special characters are:
      ``[a\-z]``) or if it's placed as the first or last character
      (e.g. ``[-a]`` or ``[a-]``), it will match a literal ``'-'``.
 
-   * Special characters lose their special meaning inside sets. For example,
+   * Special characters except backslash lose their special meaning inside sets.
+     For example,
      ``[(+*)]`` will match any of the literal characters ``'('``, ``'+'``,
      ``'*'``, or ``')'``.
 
    .. index:: single: \ (backslash); in regular expressions
 
-   * Character classes such as ``\w`` or ``\S`` (defined below) are also accepted
-     inside a set, although the characters they match depend on the flags_ used.
+   * Backslash either escapes characters which have special meaning in a set
+     such as ``'-'``, ``']'``, ``'^'`` and ``'\\'`` itself or signals
+     a special sequence which represents a single character such as
+     ``\xa0`` or ``\n`` or a character class such as ``\w`` or ``\S``
+     (defined below).
+     Note that ``\b`` represents a single "backspace" character,
+     not a word boundary as outside a set, and numeric escapes
+     such as ``\1`` are always octal escapes, not group references.
+     Special sequences which do not match a single character such as ``\A``
+     and ``\Z`` are not allowed.
 
    .. index:: single: ^ (caret); in regular expressions
 
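
Reviewer note: the behaviour described by the new wording can be checked with a short script. The following is a minimal sketch and not part of the patch. The helper name `compiles` and the sample strings are assumptions made for illustration. It uses only `re.compile`, `re.findall`, `re.fullmatch`, `re.search` and `re.error` from the standard library.

```python
import re


def compiles(pattern):
    """Hypothetical helper: report whether `pattern` is accepted by re.compile."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Special characters other than backslash are literal inside a set.
assert re.findall(r"[(+*)]", "(a+b)*c") == ["(", "+", ")", "*"]

# Backslash escapes the characters that are special in a set.
escaped = re.compile(r"[\-\]\^\\]")
assert escaped.findall(r"a-b]c^d\e") == ["-", "]", "^", "\\"]

# Single-character sequences and character classes can be mixed in a set.
assert re.findall(r"[\xa0\n\w]", "a\xa0\n!") == ["a", "\xa0", "\n"]

# Inside a set, \b is a backspace character, not a word boundary.
assert re.fullmatch(r"[\b]", "\x08") is not None
assert re.search(r"[\b]", "word boundary") is None

# Inside a set, \1 is an octal escape (U+0001), not a group reference.
assert re.fullmatch(r"[\1]", "\x01") is not None

# Sequences that do not match a single character are rejected in a set.
assert not compiles(r"[\A]")
assert not compiles(r"[\Z]")
```

Each assertion corresponds to one claim in the added paragraph, so a failure on some interpreter version would point to the sentence that needs another look.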