-
-
Notifications
You must be signed in to change notification settings - Fork 3k
[mypyc] Implement str.lower()
and str.upper()
primitive
#19375
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
[mypyc] Implement str.lower()
and str.upper()
primitive
#19375
Conversation
str.lower()
and str.upper()
primitive
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Left some comments -- the semantics are pretty tricky, and we need to be careful to catch all special cases. I'd suggest running a test (doesn't need to be included in this PR necessarily) comparing upper/lower of all length-1 strings with Python semantics.
#ifdef Py_UNICODE_TOLOWER | ||
return Py_UNICODE_TOLOWER(ch); | ||
#else | ||
// fallback: no-op for non-ASCII if macro is unavailable |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason to expect that Py_UNICODE_TOLOWER
is not available? We shouldn't break functionality if a dependency is missing -- it's better to fail compilation. It seems to me that the best option is to remove the #ifdef
and assume ``Py_UNICODE_TOLOWER` is defined.
if (ch < 128) { | ||
return ascii_upper_table[ch]; | ||
} | ||
#ifdef Py_UNICODE_TOUPPER |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to above.
assert "abc".lower() == "abc" | ||
assert "AbC123".lower() == "abc123" | ||
assert "áÉÍ".lower() == "áéí" | ||
assert "😴🚀".lower() == "😴🚀" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test special cases (verify that this agrees with normal Python semantics):
'SS'.lower() == 'ss'
'Σ'.lower()
'İ'.lower()
(changes length!)
assert "ABC".upper() == "ABC" | ||
assert "AbC123".upper() == "ABC123" | ||
assert "áéí".upper() == "ÁÉÍ" | ||
assert "😴🚀".upper() == "😴🚀" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also test special case (verify that this agrees with normal Python semantics):
'ß'.upper() == 'SS'
'ffi'.upper()
(length increases!)
Add primitive for
str.lower
andstr.upper
. Issue: mypyc/mypyc#1088