Add support for PEP 822 dedented strings (d-strings) #896
Implements the `d` string prefix that automatically removes common indentation from triple-quoted strings at compile time. Supports all prefix combinations and orderings: d, dr/rd, db/bd, df/fd, dt/td, and three-prefix variants like dfr, rdb, etc. Closes #892 https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
Add tests for all orderings of d/r, d/b, d/b/r, d/f, d/f/r, d/t, and d/t/r prefix combinations. Add dt-string tests to specific.coco alongside existing t-string tests (requires py310+). https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
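The set of prefix spellings these tests need to cover can be enumerated mechanically. The following is a minimal sketch, not Coconut's `any_len_perm`: `prefix_orderings` and `D_PREFIXES` are hypothetical names introduced here, and uppercase spellings are deliberately left out.

```python
from itertools import permutations

def prefix_orderings(required, optional=()):
    """Return every ordering of the required letters, with each subset
    of the optional letters mixed in (hypothetical helper)."""
    results = set()
    # Walk through every subset of the optional letters via a bit mask.
    for mask in range(2 ** len(optional)):
        chosen = [c for i, c in enumerate(optional) if (mask >> i) & 1]
        for perm in permutations(list(required) + chosen):
            results.add("".join(perm))
    return sorted(results)

# One entry per d-string family, each with an optional raw "r".
D_PREFIXES = {
    "d": prefix_orderings("d", "r"),
    "db": prefix_orderings("db", "r"),
    "df": prefix_orderings("df", "r"),
    "dt": prefix_orderings("dt", "r"),
}
```

Each two-letter family yields eight spellings (two without `r`, six with it), which gives a quick count to check a test file against.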
coconut/compiler/grammar.py
Outdated
d_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d,)) + string_item)
db_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, bit_b)) + string_item)
df_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, format_f)) + string_item)
dt_string_tokens = combine(any_len_perm(raw_r, required=(dedent_d, template_t)) + string_item)
Call these _ref instead of _tokens.
coconut/compiler/grammar.py
Outdated
fixed_len_string_tokens = OneOrMore(nonbf_string) | OneOrMore(b_string | db_string)
f_string_atom = Forward()
f_string_atom_ref = ZeroOrMore(nonbf_string) + f_string + ZeroOrMore(nonb_string)
f_string_atom_ref = ZeroOrMore(nonbf_string) + (f_string | df_string | dt_string) + ZeroOrMore(nonb_string)
Since t_string isn't there, dt_string shouldn't be either.
coconut/compiler/compiler.py
Outdated
text, strchar = self.get_ref("str", string[1:-1])

# must be triple-quoted
if len(strchar) == 1:
Do len(strchar) != 3 to be more precise, here and in the other locations.
coconut/compiler/compiler.py
Outdated
return ("r" if raw else "") + self.wrap_str(text, strchar[0], multiline=True)

def db_string_handle(self, original, loc, tokens):
d_string_handle and db_string_handle are similar enough that they should just be one function.
coconut/compiler/compiler.py
Outdated
string = string[1:]
if string.startswith("r"):
    raw = True
    string = string[1:]
This doesn't look like it handles all orderings.
coconut/compiler/compiler.py
Outdated
def df_string_handle(self, original, loc, tokens):
    """Process d-string with f prefix."""
    return self._d_f_string_handle(original, loc, tokens, is_t=False)
Instead of wrapping these functions like this, just use one d_f_string_handle function that accepts an is_t parameter that is passed to cls.method.
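The shape being asked for can be sketched as follows. This is an illustration only: `SketchCompiler` is a hypothetical stand-in class, and `functools.partial` is used in place of Coconut's `cls.method`, whose exact signature is not shown in this thread.

```python
from functools import partial

class SketchCompiler:
    """Hypothetical stand-in for the compiler class; not Coconut's actual API."""

    def d_f_string_handle(self, original, loc, tokens, is_t=False):
        # One shared handler: is_t selects t-string output over f-string output.
        (string,) = tokens
        kind = "t" if is_t else "f"
        return kind + ":" + string

    def bind(self):
        # Stand-in for attaching parse actions: both rules reuse one method,
        # with the keyword argument fixed at registration time.
        return {
            "df_string": partial(self.d_f_string_handle, is_t=False),
            "dt_string": partial(self.d_f_string_handle, is_t=True),
        }
```

Fixing `is_t` at registration time removes the two one-line wrapper methods while keeping a single place where the f/t difference is decided.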
coconut/compiler/compiler.py
Outdated
elif indent.startswith(line_no_ph) and line_no_ph.strip() == "":
    result_lines.append("")
else:
    raise CoconutDeferredSyntaxError("inconsistent indentation in d-string", loc)
Find a way to use _d_string_dedent here to avoid repeating yourself.
- Rename d-string grammar rules from _tokens to _ref
- Remove dt_string from f_string_atom_ref (matching t_string behavior)
- Use len(strchar) != 3 instead of len(strchar) == 1 for precision
- Merge d_string_handle and db_string_handle into one function with is_b param
- Use _strip_raw_and_b helper to handle all prefix orderings robustly
- Use cls.method with is_t param for dt_string instead of wrapper functions
- Reuse _d_string_dedent with placeholder param to avoid code duplication in d_f_string_handle
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
"""Process PEP 822 d-strings (dedented strings), with optional b prefix."""
string, = tokens

raw, has_b, string = self._strip_raw_and_b(string)
If we can just use has_b, why do we need is_b, or for that matter to have a separate db_string grammar element at all? What do we do for normal b strings?
raise CoconutDeferredSyntaxError("d-string prefix requires triple-quoted string", loc)

# apply dedentation to the f-string parts using placeholder for expressions
placeholder = "\x00"
Are we sure null bytes can't appear in the string normally? At least we need to assert that they don't.
Also, if there's a placeholder we know can't appear in the string and that naturally doesn't count as whitespace, that would make the implementation easier because we wouldn't need the placeholder logic in _d_string_dedent.
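Both points can be illustrated with a small sketch: assert that the placeholder is absent from the literal pieces before joining them, and rely on the fact that a null character is not whitespace to Python's `str` methods. The names `PLACEHOLDER`, `join_with_placeholder`, and `split_on_placeholder` are hypothetical and do not come from the PR.

```python
PLACEHOLDER = "\x00"  # assumed never to occur in the literal text

def join_with_placeholder(literal_parts):
    """Join the literal pieces of an f-string, marking each expression slot."""
    for part in literal_parts:
        # Guard against the placeholder colliding with real string contents.
        assert PLACEHOLDER not in part, "placeholder found in string contents"
    return PLACEHOLDER.join(literal_parts)

def split_on_placeholder(text):
    """Recover the literal pieces after the joined text has been processed."""
    return text.split(PLACEHOLDER)

def is_blank(line):
    """A line counts as blank only if nothing but whitespace remains."""
    return line.strip() == ""
```

Because `"\x00".isspace()` is false, a line holding only indentation and an expression slot is not treated as blank by `is_blank`, so the dedent step would not need special placeholder handling under this assumption.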
coconut/compiler/compiler.py
Outdated
# blank lines are ignored except the last line (closing quotes line)
indent = None
for i, line in enumerate(lines):
    is_last = (i == len(lines) - 1)
Unnecessary parens (and below).
coconut/compiler/compiler.py
Outdated
"""Strip r and b prefixes from a string token, returning (raw, has_b, string)."""
raw = False
has_b = False
while string and string[0] in "rRbB":
Make this just while string: and then check for bB in an elif, and break in the else.
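A minimal sketch of that loop shape, written as a standalone function rather than the PR's `_strip_raw_and_b` method. The name `strip_string_prefix` is hypothetical, and the extra `d` branch is an assumption added here so that orderings such as `rdb` are consumed in one pass.

```python
def strip_string_prefix(token):
    """Strip leading prefix letters, returning (raw, has_b, rest)."""
    raw = False
    has_b = False
    while token:
        head = token[0]
        if head in "rR":
            raw = True
        elif head in "bB":
            has_b = True
        elif head in "dD":
            pass  # assumption: d carries no flag, every token here is a d-string
        else:
            break  # first non-prefix character, normally the opening quote
        token = token[1:]
    return raw, has_b, token
```

Since the loop ends at the first character that is not a prefix letter, the order in which the letters appear does not matter.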
- Remove is_b parameter; rely on has_b from _strip_raw_and_b since bit_b is not suppressed in grammar
- Use strwrapper as placeholder instead of null byte (can't appear in string contents since str_proc uses it as delimiter), with assertion
- Refactor _strip_raw_and_b to use while/elif/break pattern
- Remove unnecessary parentheses in prefix construction
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
strwrapper can appear in string contents; null bytes cannot appear in Python source code, making them a safe placeholder. Keep the assertion as a safety check. https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
- Remove parens around `i == len(lines) - 1` assignments
- Revert prefix construction back to inline ternary style
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
Resolves #892.
Summary
This PR implements support for PEP 822 dedented strings (d-strings) in Coconut, adding four new string prefix variants: `d`, `db`, `df`, and `dt`.
Key Changes
- Grammar updates (`grammar.py`):
  - `dedent_d` literal and four new string forward declarations: `d_string`, `db_string`, `df_string`, `dt_string`
  - `any_len_perm` to handle prefix combinations with optional `r` (raw)
- Compiler implementation (`compiler.py`):
  - `_d_string_dedent()`: static method implementing the PEP 822 dedentation logic
  - `d_string_handle()`: basic dedented strings with optional raw prefix
  - `db_string_handle()`: dedented byte strings with optional raw prefix
  - `df_string_handle()` / `dt_string_handle()`: dedented f-strings/t-strings
  - `_d_f_string_handle()`: shared logic for the f/t variants that handles dedentation with expression placeholders
  - handler registration in the `bind()` method
- Tests (`primary_2.coco`): cases for the `dr`, `db`, and `df` prefix variants
Implementation Details
The dedentation algorithm treats f-string expressions as non-whitespace placeholders during indentation calculation, ensuring expressions don't affect indent detection. All d-string variants require triple-quoted strings and must have content starting with a newline after the opening quotes, as per PEP 822.
https://claude.ai/code/session_01Lu9bsuRrPN8R1yVJt5T7oj
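The dedentation rules described under Implementation Details can be sketched in plain Python. This is a hedged illustration, not the PR's `_d_string_dedent`: the function name `dedent_d_string` is hypothetical, it raises a plain `SyntaxError` instead of `CoconutDeferredSyntaxError`, and it assumes the indent is the longest common whitespace prefix of the non-blank lines plus the closing-quotes line.

```python
import os

def dedent_d_string(content):
    """Dedent the text found between the triple quotes of a d-string (sketch)."""
    if not content.startswith("\n"):
        raise SyntaxError("d-string content must start with a newline")
    lines = content[1:].split("\n")

    # Gather the leading whitespace of each line that counts toward the indent:
    # every non-blank line, plus the last line (the closing-quotes line).
    indents = []
    for i, line in enumerate(lines):
        is_last = i == len(lines) - 1
        if line.strip(" \t") or is_last:
            indents.append(line[:len(line) - len(line.lstrip(" \t"))])
    # Assumption: the indent is the longest common whitespace prefix.
    indent = os.path.commonprefix(indents)

    result = []
    for line in lines:
        if line.startswith(indent):
            result.append(line[len(indent):])
        elif indent.startswith(line) and not line.strip(" \t"):
            # A whitespace-only line shorter than the indent becomes empty.
            result.append("")
        else:
            raise SyntaxError("inconsistent indentation in d-string")
    return "\n".join(result)
```

A line that contains only indentation and a non-whitespace placeholder (such as a null character standing in for an f-string expression) is counted as non-blank here, which matches the placeholder approach described above.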