
[mypyc] Implement str.lower() and str.upper() primitive #19375



Draft: wants to merge 2 commits into master


Jahongir-Qurbonov (Contributor)

Add primitives for str.lower() and str.upper(). Issue: mypyc/mypyc#1088

Jahongir-Qurbonov changed the title from "Add str.lower() and str.upper() primitives" to "[mypyc] Implement str.lower() and str.upper() primitive" on Jul 4, 2025
sterliakov added the topic-mypyc (mypyc bugs) label on Jul 4, 2025
JukkaL (Collaborator) left a comment


Thanks for the PR! Left some comments; the semantics are pretty tricky, and we need to be careful to catch all special cases. I'd suggest running a test (it doesn't necessarily need to be included in this PR) that compares upper()/lower() of every length-1 string against normal Python semantics.
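A minimal sketch of such a check in Python. The names `find_mismatches` and `ascii_only_upper` are hypothetical; in a real mypyc run test the candidate would be a function compiled by mypyc (for example `def f(s: str) -> str: return s.upper()`), while here a deliberately naive ASCII-only stand-in is used so the harness has something to catch.

```python
import sys
from typing import Callable, List, Tuple

Mismatch = Tuple[int, str, str]  # (code point, expected, actual)


def find_mismatches(
    candidate: Callable[[str], str], reference: Callable[[str], str]
) -> List[Mismatch]:
    """Compare candidate(s) with reference(s) for every length-1 string s."""
    mismatches: List[Mismatch] = []
    for cp in range(sys.maxunicode + 1):
        s = chr(cp)  # includes lone surrogates; str.lower()/upper() accept them
        expected = reference(s)
        actual = candidate(s)
        if actual != expected:
            mismatches.append((cp, expected, actual))
    return mismatches


def ascii_only_upper(s: str) -> str:
    """Deliberately naive stand-in for a candidate: upper-cases ASCII only."""
    return "".join(chr(ord(c) - 32) if "a" <= c <= "z" else c for c in s)


if __name__ == "__main__":
    bad = find_mismatches(ascii_only_upper, str.upper)
    print(f"{len(bad)} mismatches, e.g. {bad[:3]}")
```

Recording the expected and actual strings (not just a boolean) makes length-changing cases such as U+00DF easy to spot in the report.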

#ifdef Py_UNICODE_TOLOWER
return Py_UNICODE_TOLOWER(ch);
#else
// fallback: no-op for non-ASCII if macro is unavailable
Collaborator

Is there a reason to expect that `Py_UNICODE_TOLOWER` is not available? We shouldn't break functionality if a dependency is missing; it's better to fail compilation. It seems to me that the best option is to remove the #ifdef and assume `Py_UNICODE_TOLOWER` is defined.
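To make the concern concrete, here is a small Python model of what the `#else` branch would do. `fallback_lower` is a hypothetical name that only mimics the comment "no-op for non-ASCII"; it is not code from the PR.

```python
def fallback_lower(s: str) -> str:
    """Hypothetical model of the '#else' branch: ASCII letters are lowered,
    every non-ASCII character is passed through unchanged."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


for text in ["AbC123", "áÉÍ"]:
    # The fallback agrees with str.lower() on ASCII but silently diverges otherwise.
    print(repr(text), fallback_lower(text) == text.lower())
# 'AbC123' True
# 'áÉÍ' False
```

With such a fallback, a build without the macro would still compile and then return wrong results for non-ASCII input, which is the failure mode a hard compile error avoids.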

if (ch < 128) {
return ascii_upper_table[ch];
}
#ifdef Py_UNICODE_TOUPPER
Collaborator

Similar to above: remove the #ifdef and assume `Py_UNICODE_TOUPPER` is defined.
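The quoted hunk takes a table-lookup fast path for `ch < 128` and only falls through to `Py_UNICODE_TOUPPER` for other code points. A small Python model of why that split is safe; `ASCII_UPPER_TABLE` and `upper_ascii_code_point` are hypothetical names standing in for the C table in the PR.

```python
# Hypothetical Python model of a 128-entry ASCII upper-case table, analogous
# to the ascii_upper_table lookup in the C hunk.
ASCII_UPPER_TABLE = [cp - 32 if ord("a") <= cp <= ord("z") else cp for cp in range(128)]


def upper_ascii_code_point(cp: int) -> int:
    """Fast path: only valid for ASCII code points."""
    if not 0 <= cp < 128:
        raise ValueError("not ASCII; use the full Unicode mapping instead")
    return ASCII_UPPER_TABLE[cp]


# For ASCII, the table agrees with str.upper() and never changes the length.
for cp in range(128):
    assert chr(upper_ascii_code_point(cp)) == chr(cp).upper()
```

The fast path works because ASCII case mappings are one-to-one and stay within ASCII; characters such as 'ß' need the full Unicode mapping.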

assert "abc".lower() == "abc"
assert "AbC123".lower() == "abc123"
assert "áÉÍ".lower() == "áéí"
assert "😴🚀".lower() == "😴🚀"
Collaborator

Also test special cases (verify that this agrees with normal Python semantics):

  • 'SS'.lower() == 'ss'
  • 'Σ'.lower()
  • 'İ'.lower() (changes length!)
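A sketch of these cases as assertions in the style of the existing run test; the expected values are what CPython's str.lower() returns. The word-final sigma case ("ΣΑΣ") is not in the list above but is the reason 'Σ' is tricky: CPython lowers a capital sigma to 'ς' or 'σ' depending on its neighbors. `per_char_lower` is a hypothetical helper showing that lowering each code point in isolation loses that context. Escapes are used so Greek and Latin look-alike letters cannot be confused.

```python
def per_char_lower(s: str) -> str:
    """Hypothetical helper: lower-case each code point in isolation."""
    return "".join(c.lower() for c in s)


assert "SS".lower() == "ss"
assert "\u03a3".lower() == "\u03c3"                      # lone 'Σ' -> 'σ'
sigma_word = "\u03a3\u0391\u03a3"                        # 'ΣΑΣ' (Greek capitals)
assert sigma_word.lower() == "\u03c3\u03b1\u03c2"        # final 'Σ' -> 'ς' (U+03C2)
assert per_char_lower(sigma_word) == "\u03c3\u03b1\u03c3"  # context lost: no 'ς'
assert "\u0130".lower() == "i\u0307"                     # 'İ' -> 'i' + COMBINING DOT ABOVE
assert len("\u0130".lower()) == 2                        # length changes from 1 to 2
```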

assert "ABC".upper() == "ABC"
assert "AbC123".upper() == "ABC123"
assert "áéí".upper() == "ÁÉÍ"
assert "😴🚀".upper() == "😴🚀"
Collaborator

Also test special cases (verify that these agree with normal Python semantics):

  • 'ß'.upper() == 'SS'
  • 'ﬃ'.upper() (U+FB03 LATIN SMALL LIGATURE FFI; length increases!)
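A sketch of these cases as assertions; the expected values are CPython's. Both results are longer than the input, so a primitive that maps each code point to exactly one code point cannot produce them, and the output buffer has to be allowed to grow. `per_char_upper` is a hypothetical helper, not part of the PR.

```python
def per_char_upper(s: str) -> str:
    """Hypothetical helper: upper-case each code point using its full mapping."""
    return "".join(c.upper() for c in s)


assert "\u00df".upper() == "SS"               # 'ß' -> two letters
assert "\ufb03".upper() == "FFI"              # U+FB03 ligature -> three letters
assert len("\ufb03".upper()) == 3
# In CPython, upper() does not look at neighboring characters (lower() does,
# for 'Σ'), so a per-code-point loop matches str.upper() as long as each step
# may emit several characters.
assert per_char_upper("stra\u00dfe") == "stra\u00dfe".upper() == "STRASSE"
```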

Jahongir-Qurbonov marked this pull request as draft on July 4, 2025, 18:08
Labels: topic-mypyc (mypyc bugs)
3 participants