Description
Hi! I need help with serving JSON that contains UTF-8 strings.
As far as I understand, the JSON response is currently serialized by crow::json::dump_internal:
Line 1736 in f96189f: inline void dump_internal(const wvalue& v, std::string& out) const
which in turn calls crow::json::escape for strings, via dump_string:
Line 1809 in f96189f: case type::String: dump_string(v.s, out); break;
Line 1729 in f96189f: inline void dump_string(const std::string& str, std::string& out) const
Line 41 in f96189f: inline void escape(const std::string& str, std::string& ret)
Commit cdd6139 removed 0 <= c && and changed char to unsigned char, so the logic stays the same: escape invisible chars 0 < ch < 32.
Commit df41cbe by @lcsdavid changes unsigned char to auto (e.g. char).
I'm not sure exactly what problem that was supposed to solve, but now it is not only invisible chars that are escaped (0 < char < 0x20, see https://www.asciitable.com/): UTF-8 sequences may be escaped too (if auto -> char -> signed char holds on the target architecture).
Every byte of a multibyte UTF-8 sequence has its most significant bit set (i.e. the byte is >= 0x80), so when char is signed each such byte is negative under two's complement (almost any architecture), and ch < 0x20 is now true for every byte of any non-ASCII character.
https://en.wikipedia.org/wiki/UTF-8#Encoding
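To make the sign issue described above concrete, here is a small sketch that counts how many bytes of a string a `c < 0x20` check would flag, once with a signed and once with an unsigned byte type. The two helper functions are hypothetical names made up for this illustration; they are not part of Crow. The casts are explicit so the result does not depend on whether plain `char` is signed on the build platform.

```cpp
#include <string>

// Hypothetical helper (not Crow code): counts bytes that `c < 0x20`
// flags when each byte is viewed as a signed char.
inline int count_flagged_signed(const std::string& s) {
    int n = 0;
    for (char raw : s) {
        signed char c = static_cast<signed char>(raw);
        if (c < 0x20) ++n;  // bytes >= 0x80 are negative here, so they match
    }
    return n;
}

// Hypothetical helper (not Crow code): same check with an unsigned view,
// so only real control characters (0x00..0x1F) match.
inline int count_flagged_unsigned(const std::string& s) {
    int n = 0;
    for (char raw : s) {
        unsigned char c = static_cast<unsigned char>(raw);
        if (c < 0x20) ++n;
    }
    return n;
}
```

For the input "\xC3\xA9\n" (the two UTF-8 bytes of "é" followed by a newline), the signed variant flags all three bytes on a two's complement platform, while the unsigned variant flags only the newline.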
The original project's solution was to store UTF-8 sequences in std::string:
ipkn/crow#189
But with the commit mentioned above, that solution no longer works.
The middleware just adds UTF-8 headers to text, and I'm not sure the middleware is the right place to undo the escaping described above.
#202
I have almost no grasp of the codebase, but it seems to me that it would be nice either to have a customization point for defining the escape function, or to introduce a new JSON value type raw_string that would not be escaped later. Maybe default
I could make a PR, given some high-level guidance about what would be acceptable in this situation.
For now, I have just reverted to unsigned char and that's totally fine for me. Could someone kindly explain why that's wrong? Elaborating on how it should be done would be even better. Thanks!
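For reference, this is roughly the behaviour the revert to unsigned char gives. It is a minimal sketch only: `escape_utf8_safe` is a hypothetical name, and the set of escapes handled here is an assumption for illustration, not a copy of crow::json::escape. The key point is that the loop variable is unsigned, so bytes >= 0x80 are never below 0x20 and pass through untouched.

```cpp
#include <string>

// Hypothetical stand-in for a JSON string escaper; not Crow's actual code.
inline void escape_utf8_safe(const std::string& str, std::string& ret) {
    static const char hex[] = "0123456789abcdef";
    ret.reserve(ret.size() + str.size());
    for (unsigned char c : str) {  // unsigned view of each byte
        switch (c) {
            case '"':  ret += "\\\""; break;
            case '\\': ret += "\\\\"; break;
            case '\n': ret += "\\n";  break;
            case '\b': ret += "\\b";  break;
            case '\f': ret += "\\f";  break;
            case '\r': ret += "\\r";  break;
            case '\t': ret += "\\t";  break;
            default:
                if (c < 0x20) {
                    // Remaining control characters become \u00XX.
                    ret += "\\u00";
                    ret += hex[c >> 4];
                    ret += hex[c & 0x0f];
                } else {
                    // Printable ASCII and all UTF-8 bytes are copied as-is.
                    ret += static_cast<char>(c);
                }
        }
    }
}
```

With this version, a string holding the UTF-8 bytes of "é" is emitted unchanged, while a newline or a 0x01 byte is still escaped.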