re matching looks broken for unicode #6860

Closed
@kevinjwalters

Description

CircuitPython version

Adafruit CircuitPython 7.3.3 on 2022-08-29; Adafruit MagTag with ESP32S2

Code/REPL

import re

text1 = "CircuitPython loves regular expressions"
text2 = "CircuitPython \u2764 regular expressions"

regex1 = re.compile("re")

for text in (text1, text2):
    match = regex1.search(text)
    print(text)
    print("Match:", text[match.start():match.end()])
    print()

Behavior

The values returned by start() and end() are wrong if the match is preceded by non-ASCII Unicode characters, probably because bytes are being counted instead of characters in variable-length UTF-8 text.
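To illustrate the suspected cause, here is a minimal sketch for desktop Python (not run on CircuitPython itself). It assumes, as the report guesses, that the offsets being returned are UTF-8 byte offsets rather than character offsets. Searching the encoded bytes gives byte offsets by construction, and the helper `byte_to_char_offset` is a hypothetical name introduced here to convert them back:

```python
import re

text = "CircuitPython \u2764 regular expressions"
data = text.encode("utf-8")  # U+2764 occupies 3 bytes in UTF-8

# Searching the bytes yields byte offsets, which mimics the suspected bug.
m = re.search(b"re", data)
byte_start, byte_end = m.start(), m.end()

def byte_to_char_offset(encoded, byte_offset):
    """Convert a UTF-8 byte offset into a character offset (hypothetical helper)."""
    return len(encoded[:byte_offset].decode("utf-8"))

char_start = byte_to_char_offset(data, byte_start)
char_end = byte_to_char_offset(data, byte_end)

# Slicing the str with byte offsets lands two characters too far to the right.
wrong = text[byte_start:byte_end]
right = text[char_start:char_end]

print("byte offsets:", byte_start, byte_end, "->", repr(wrong))
print("char offsets:", char_start, char_end, "->", repr(right))
```

The two extra bytes of the heart character shift the slice by two positions, which is consistent with "gu" appearing instead of "re". Until the offsets are fixed, a conversion like this could serve as a workaround, provided the byte-offset assumption holds.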

Additional information

Output from a MagTag:

Adafruit CircuitPython 7.3.3 on 2022-08-29; Adafruit MagTag with ESP32S2
>>>
soft reboot

Auto-reload is on. Simply save files over USB to run them or enter REPL to disable.
code.py output:
CircuitPython loves regular expressions
Match: re

CircuitPython ❤ regular expressions
Match: gu


Code done running.
