| 1 | +- Start Date: 2014-11-27 |
| 2 | +- RFC PR: |
| 3 | +- Rust Issue: |
| 4 | + |
| 5 | +# Summary |
| 6 | + |
| 7 | +Move the `std::ascii::Ascii` type and related traits to a new Cargo package on crates.io, |
| 8 | +and instead expose its functionality for `u8`, `[u8]`, `char`, and `str` types. |
| 9 | + |
| 10 | +# Motivation |
| 11 | + |
| 12 | +The `std::ascii::Ascii` type is a `u8` wrapper that enforces |
| 13 | +(unless `unsafe` code is used) |
| 14 | +that the value is in the ASCII range, |
| 15 | +similar to `char` with `u32` in the range of Unicode scalar values, |
| 16 | +and `String` with `Vec<u8>` containing well-formed UTF-8 data. |
| 17 | +`[Ascii]` and `Vec<Ascii>` are naturally strings of text entirely in the ASCII range. |
| 18 | + |
| 19 | +Using the type system like this to enforce data invariants is interesting, |
| 20 | +but in practice `Ascii` is not that useful. |
| 21 | +Data (such as from the network) is rarely guaranteed to be ASCII only, |
| 22 | +nor is it desirable to remove or replace non-ASCII bytes, |
| 23 | +even if ASCII-range-only operations are used. |
| 24 | +(For example, *ASCII case-insensitive matching* is common in HTML and CSS.) |
| 25 | + |
| 26 | +Every use of the `Ascii` type in the Rust distribution |
| 27 | +exists only to call the `to_lowercase` or `to_uppercase` method |
| 28 | +and then immediately convert back to `u8` or `char`. |
| 29 | + |
| 30 | +# Detailed design |
| 31 | + |
| 32 | +The `Ascii` type |
| 33 | +as well as the `AsciiCast`, `OwnedAsciiCast`, `AsciiStr`, and `IntoBytes` traits |
| 34 | +should be copied into a new `ascii` Cargo package on crates.io. |
| 35 | +The `std::ascii` copy should be deprecated and removed at some point before Rust 1.0. |
| 36 | + |
| 37 | +Currently, the `AsciiExt` trait is: |
| 38 | + |
| 39 | +```rust |
| 40 | +pub trait AsciiExt<T> { |
| 41 | + fn to_ascii_upper(&self) -> T; |
| 42 | + fn to_ascii_lower(&self) -> T; |
| 43 | + fn eq_ignore_ascii_case(&self, other: &Self) -> bool; |
| 44 | +} |
| 45 | + |
| 46 | +impl AsciiExt<String> for str { ... } |
| 47 | +impl AsciiExt<Vec<u8>> for [u8] { ... } |
| 48 | +``` |
| 49 | + |
| 50 | +It should gain new methods for the functionality that is being removed with `Ascii`, |
| 51 | +be implemented for `u8` and `char`, |
| 52 | +and (if this is stable enough yet) use an associated type instead of the `T` parameter: |
| 53 | + |
| 54 | +```rust |
| 55 | +pub trait AsciiExt { |
| 56 | + type Owned = Self; |
| 57 | + fn to_ascii_upper(&self) -> Self::Owned; |
| 58 | + fn to_ascii_lower(&self) -> Self::Owned; |
| 59 | + fn eq_ignore_ascii_case(&self, other: &Self) -> bool; |
| 60 | + fn is_ascii(&self) -> bool; |
| 61 | + |
| 62 | + // Maybe? See unresolved questions |
| 63 | + fn is_ascii_lowercase(&self) -> bool; |
| 64 | + fn is_ascii_uppercase(&self) -> bool; |
| 65 | + ... |
| 66 | +} |
| 67 | + |
| 68 | +impl AsciiExt for str { type Owned = String; ... } |
| 69 | +impl AsciiExt for [u8] { type Owned = Vec<u8>; ... } |
| 70 | +impl AsciiExt for char { ... } |
| 71 | +impl AsciiExt for u8 { ... } |
| 72 | +``` |
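
To make the proposed shape concrete, here is a minimal sketch of such a trait for `u8` and `str`. It is an illustration under assumptions, not the eventual `std` implementation: the names `AsciiExtSketch` and `map_ascii` are hypothetical, the associated type has no default, and the `char` and `[u8]` impls are omitted for brevity.

```rust
/// Sketch of the proposed trait. `AsciiExtSketch` is a hypothetical name,
/// chosen so it cannot be confused with anything in `std`.
pub trait AsciiExtSketch {
    type Owned;
    fn to_ascii_upper(&self) -> Self::Owned;
    fn to_ascii_lower(&self) -> Self::Owned;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
    fn is_ascii(&self) -> bool;
}

impl AsciiExtSketch for u8 {
    type Owned = u8;
    fn to_ascii_upper(&self) -> u8 {
        // Only bytes in a..=z change; everything else is returned as-is.
        match *self { b'a'..=b'z' => *self - 32, b => b }
    }
    fn to_ascii_lower(&self) -> u8 {
        match *self { b'A'..=b'Z' => *self + 32, b => b }
    }
    fn eq_ignore_ascii_case(&self, other: &u8) -> bool {
        <u8 as AsciiExtSketch>::to_ascii_lower(self)
            == <u8 as AsciiExtSketch>::to_ascii_lower(other)
    }
    fn is_ascii(&self) -> bool { *self < 0x80 }
}

/// Hypothetical helper: apply a byte mapping to ASCII chars only,
/// leaving non-ASCII chars untouched.
fn map_ascii<F: Fn(u8) -> u8>(s: &str, f: F) -> String {
    s.chars()
        .map(|c| if (c as u32) < 0x80 { f(c as u8) as char } else { c })
        .collect()
}

impl AsciiExtSketch for str {
    type Owned = String;
    fn to_ascii_upper(&self) -> String {
        map_ascii(self, |b| <u8 as AsciiExtSketch>::to_ascii_upper(&b))
    }
    fn to_ascii_lower(&self) -> String {
        map_ascii(self, |b| <u8 as AsciiExtSketch>::to_ascii_lower(&b))
    }
    fn eq_ignore_ascii_case(&self, other: &str) -> bool {
        // Byte-wise comparison; non-ASCII bytes must match exactly.
        self.len() == other.len()
            && self.bytes().zip(other.bytes())
                .all(|(a, b)| <u8 as AsciiExtSketch>::eq_ignore_ascii_case(&a, &b))
    }
    fn is_ascii(&self) -> bool { self.bytes().all(|b| b < 0x80) }
}
```

Note that non-ASCII data passes through unchanged rather than being rejected, which matches the motivation above: ASCII-range-only operations on data that is not guaranteed to be ASCII.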
| 73 | + |
| 74 | +The `OwnedAsciiExt` trait should stay as it is: |
| 75 | + |
| 76 | +```rust |
| 77 | +pub trait OwnedAsciiExt { |
| 78 | + fn into_ascii_upper(self) -> Self; |
| 79 | + fn into_ascii_lower(self) -> Self; |
| 80 | +} |
| 81 | + |
| 82 | +impl OwnedAsciiExt for String { ... } |
| 83 | +impl OwnedAsciiExt for Vec<u8> { ... } |
| 84 | +``` |
| 85 | + |
| 86 | +The `std::ascii::escape_default` function has little to do with ASCII. |
| 87 | +I *think* it’s relevant to `b'x'` and `b"foo"` byte literals, |
| 88 | +which have types `u8` and `&'static [u8]`. |
| 89 | +I suggest moving it into `std::u8`. |
| 90 | + |
| 91 | + |
| 92 | +I (@SimonSapin) can help with the implementation work. |
| 93 | + |
| 94 | + |
| 95 | +# Drawbacks |
| 96 | + |
| 97 | +Code that uses `Ascii` for more than e.g. `to_lowercase` |
| 98 | +would need to depend on a Cargo package to get it. |
| 99 | +This is strictly more work than having it in `std`, |
| 100 | +but should still be easy. |
| 101 | + |
| 102 | +# Alternatives |
| 103 | + |
| 104 | +* The `Ascii` type could stay in `std::ascii`. |
| 105 | +* Some variations per *Unresolved questions* below. |
| 106 | + |
| 107 | +# Unresolved questions |
| 108 | + |
| 109 | +* What to do with `std::ascii::escape_default`? |
| 110 | +* Rename the `AsciiExt` and `OwnedAsciiExt` traits? |
| 111 | +* Should they be in the prelude? The `Ascii` type and the related traits currently are. |
| 112 | +* Are associated types stable enough yet? |
| 113 | + If not, `AsciiExt` should temporarily keep its type parameter. |
| 114 | +* Which of the `Ascii::is_*` methods should `AsciiExt` include? Those included should have `ascii` added to their names. |
| 115 | + * *Maybe* `is_lowercase`, `is_uppercase`, `is_alphabetic`, or `is_alphanumeric` could be useful, |
| 116 | + but I’d be fine with dropping them and reconsidering if someone asks for them. |
| 117 | + The same result can be achieved |
| 118 | + with `.is_ascii() &&` and the corresponding `UnicodeChar` method, |
| 119 | + which in most cases has an ASCII fast path. |
| 120 | + And in some cases it’s an easy range check like `'a' <= c && c <= 'z'`. |
| 121 | + * `is_digit` and `is_hex` are identical to `Char::is_digit(10)` and `Char::is_digit(16)`. |
| 122 | + * `is_blank`, `is_control`, `is_graph`, `is_print`, and `is_punctuation` are never used |
| 123 | + in the Rust distribution or Servo. |
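
As a rough illustration of the `.is_ascii() &&` point above, the following sketch compares the two suggested ways of testing for an ASCII lowercase letter. It assumes the inherent `char` methods of present-day Rust (`is_ascii`, `is_lowercase`, `is_digit`) rather than the `UnicodeChar` trait named in the text, and the function names are hypothetical.

```rust
/// ASCII check combined with the corresponding Unicode predicate.
fn is_ascii_lowercase_via_unicode(c: char) -> bool {
    c.is_ascii() && c.is_lowercase()
}

/// The "easy range check" variant.
fn is_ascii_lowercase_via_range(c: char) -> bool {
    'a' <= c && c <= 'z'
}

/// `is_digit` and `is_hex` expressed through `char::is_digit` with a radix.
fn is_ascii_hex(c: char) -> bool {
    c.is_digit(16)
}
```

Both lowercase checks reject a letter such as `'é'`, which is lowercase in Unicode but lies outside the ASCII range.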