Commit 3146008

RFC: std::ascii reform
1 parent 63535c1 commit 3146008

1 file changed

+123
-0
lines changed

text/0000-std-ascii-reform.md

Lines changed: 123 additions & 0 deletions
@@ -0,0 +1,123 @@
- Start Date: 2014-11-27
- RFC PR:
- Rust Issue:

# Summary

Move the `std::ascii::Ascii` type and related traits to a new Cargo package on crates.io,
and instead expose its functionality for `u8`, `[u8]`, `char`, and `str` types.

# Motivation

The `std::ascii::Ascii` type is a `u8` wrapper that enforces
(unless `unsafe` code is used)
that the value is in the ASCII range,
similar to `char` with `u32` in the range of Unicode scalar values,
and `String` with `Vec<u8>` containing well-formed UTF-8 data.
`[Ascii]` and `Vec<Ascii>` are naturally strings of text entirely in the ASCII range.

Using the type system like this to enforce data invariants is interesting,
but in practice `Ascii` is not that useful.
Data (such as from the network) is rarely guaranteed to be ASCII only,
nor is it desirable to remove or replace non-ASCII bytes,
even when only ASCII-range operations are needed.
(For example, *ASCII case-insensitive matching* is common in HTML and CSS.)

Every use of the `Ascii` type in the Rust distribution
exists only to call the `to_lowercase` or `to_uppercase` method
and then immediately convert back to `u8` or `char`.

# Detailed design

The `Ascii` type
as well as the `AsciiCast`, `OwnedAsciiCast`, `AsciiStr`, and `IntoBytes` traits
should be copied into a new `ascii` Cargo package on crates.io.
The `std::ascii` copy should be deprecated and removed at some point before Rust 1.0.

Currently, the `AsciiExt` trait is:

```rust
pub trait AsciiExt<T> {
    fn to_ascii_upper(&self) -> T;
    fn to_ascii_lower(&self) -> T;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
}

impl AsciiExt<String> for str { ... }
impl AsciiExt<Vec<u8>> for [u8] { ... }
```

It should gain new methods for the functionality that is being removed with `Ascii`,
be implemented for `u8` and `char`,
and (if this is stable enough yet) use an associated type instead of the `T` parameter:

```rust
pub trait AsciiExt {
    type Owned = Self;
    fn to_ascii_upper(&self) -> Self::Owned;
    fn to_ascii_lower(&self) -> Self::Owned;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
    fn is_ascii(&self) -> bool;

    // Maybe? See unresolved questions
    fn is_ascii_lowercase(&self) -> bool;
    fn is_ascii_uppercase(&self) -> bool;
    ...
}

impl AsciiExt for str { type Owned = String; ... }
impl AsciiExt for [u8] { type Owned = Vec<u8>; ... }
impl AsciiExt for char { ... }
impl AsciiExt for u8 { ... }
```
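To make the associated-type design concrete, here is a minimal sketch that should compile on current stable Rust. It is an illustration under assumptions, not the proposed implementation: the trait name `AsciiExtSketch` is hypothetical (chosen so it does not shadow anything in `std`), the method bodies are just one possible way to write them, and the `type Owned = Self` default is left out on the assumption that associated-type defaults are not available on stable.

```rust
/// Hypothetical stand-in for the proposed `AsciiExt`; not the real `std` API.
pub trait AsciiExtSketch {
    /// Owned counterpart of `Self`, e.g. `String` for `str`.
    type Owned;
    fn to_ascii_upper(&self) -> Self::Owned;
    fn to_ascii_lower(&self) -> Self::Owned;
    fn eq_ignore_ascii_case(&self, other: &Self) -> bool;
    fn is_ascii(&self) -> bool;
}

impl AsciiExtSketch for u8 {
    type Owned = u8;
    fn to_ascii_upper(&self) -> u8 {
        // `a`..`z` and `A`..`Z` differ by exactly 32 (0x20) in ASCII.
        if (b'a'..=b'z').contains(self) { *self - 32 } else { *self }
    }
    fn to_ascii_lower(&self) -> u8 {
        if (b'A'..=b'Z').contains(self) { *self + 32 } else { *self }
    }
    fn eq_ignore_ascii_case(&self, other: &u8) -> bool {
        self.to_ascii_lower() == other.to_ascii_lower()
    }
    fn is_ascii(&self) -> bool {
        *self < 0x80
    }
}

impl AsciiExtSketch for char {
    type Owned = char;
    fn to_ascii_upper(&self) -> char {
        // Only ASCII scalars fit in a `u8`; everything else is returned unchanged.
        if (*self as u32) < 0x80 { (*self as u8).to_ascii_upper() as char } else { *self }
    }
    fn to_ascii_lower(&self) -> char {
        if (*self as u32) < 0x80 { (*self as u8).to_ascii_lower() as char } else { *self }
    }
    fn eq_ignore_ascii_case(&self, other: &char) -> bool {
        self.to_ascii_lower() == other.to_ascii_lower()
    }
    fn is_ascii(&self) -> bool {
        (*self as u32) < 0x80
    }
}

impl AsciiExtSketch for [u8] {
    type Owned = Vec<u8>;
    fn to_ascii_upper(&self) -> Vec<u8> {
        self.iter().map(|b| b.to_ascii_upper()).collect()
    }
    fn to_ascii_lower(&self) -> Vec<u8> {
        self.iter().map(|b| b.to_ascii_lower()).collect()
    }
    fn eq_ignore_ascii_case(&self, other: &[u8]) -> bool {
        self.len() == other.len()
            && self.iter().zip(other).all(|(a, b)| a.to_ascii_lower() == b.to_ascii_lower())
    }
    fn is_ascii(&self) -> bool {
        self.iter().all(|b| *b < 0x80)
    }
}

impl AsciiExtSketch for str {
    type Owned = String;
    fn to_ascii_upper(&self) -> String {
        // Only bytes below 0x80 are changed, so the result is still valid UTF-8.
        String::from_utf8(self.as_bytes().to_ascii_upper()).expect("still valid UTF-8")
    }
    fn to_ascii_lower(&self) -> String {
        String::from_utf8(self.as_bytes().to_ascii_lower()).expect("still valid UTF-8")
    }
    fn eq_ignore_ascii_case(&self, other: &str) -> bool {
        AsciiExtSketch::eq_ignore_ascii_case(self.as_bytes(), other.as_bytes())
    }
    fn is_ascii(&self) -> bool {
        AsciiExtSketch::is_ascii(self.as_bytes())
    }
}
```

With this sketch, non-ASCII data passes through untouched, which is the behaviour the Motivation section asks for: `"héllo".to_ascii_upper()` yields `"HéLLO"`, and `AsciiExtSketch::eq_ignore_ascii_case("Content-Type", "content-type")` is `true`. The fully qualified call form is used for `eq_ignore_ascii_case` and `is_ascii` here only because present-day `std` already has inherent methods with those names.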

The `OwnedAsciiExt` trait should stay as it is:

```rust
pub trait OwnedAsciiExt {
    fn into_ascii_upper(self) -> Self;
    fn into_ascii_lower(self) -> Self;
}

impl OwnedAsciiExt for String { ... }
impl OwnedAsciiExt for Vec<u8> { ... }
```
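The point of the by-value `into_*` methods is that an owned buffer can be converted in place rather than copied. A rough sketch of that idea for current stable Rust follows; the trait name `OwnedAsciiExtSketch` is hypothetical, and the bodies lean on the present-day `make_ascii_uppercase` and `make_ascii_lowercase` methods of `[u8]` and `str`, which are an assumption of this example and not part of the RFC.

```rust
/// Hypothetical stand-in for `OwnedAsciiExt`; not the real `std` API.
pub trait OwnedAsciiExtSketch {
    fn into_ascii_upper(self) -> Self;
    fn into_ascii_lower(self) -> Self;
}

impl OwnedAsciiExtSketch for Vec<u8> {
    fn into_ascii_upper(mut self) -> Vec<u8> {
        // Mutate the existing buffer instead of allocating a new one.
        self.make_ascii_uppercase();
        self
    }
    fn into_ascii_lower(mut self) -> Vec<u8> {
        self.make_ascii_lowercase();
        self
    }
}

impl OwnedAsciiExtSketch for String {
    fn into_ascii_upper(mut self) -> String {
        // ASCII case mapping never touches bytes >= 0x80, so UTF-8 stays valid.
        self.make_ascii_uppercase();
        self
    }
    fn into_ascii_lower(mut self) -> String {
        self.make_ascii_lowercase();
        self
    }
}
```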

The `std::ascii::escape_default` function has little to do with ASCII.
I *think* it’s relevant to `b'x'` and `b"foo"` byte literals,
which have types `u8` and `&'static [u8]`.
I suggest moving it into `std::u8`.
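For context on the byte-literal connection, the following sketch shows what the function does in present-day Rust, where it is still found at `std::ascii::escape_default` and returns an iterator over the escaped bytes. The `escape_bytes` helper is hypothetical and exists only for this example.

```rust
use std::ascii::escape_default;

/// Escape every byte of `bytes` and join the results into one `String`.
/// (`escape_bytes` is a hypothetical helper, not a `std` function.)
fn escape_bytes(bytes: &[u8]) -> String {
    bytes
        .iter()
        .flat_map(|&b| escape_default(b)) // each byte expands to 1..=4 ASCII bytes
        .map(|b| b as char)
        .collect()
}
```

The output uses the same escapes that are accepted inside `b'x'` and `b"foo"` literals: tab, newline, quotes, and backslash get a backslash escape, and bytes outside the printable ASCII range become `\xNN`.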

I (@SimonSapin) can help with the implementation work.

# Drawbacks

Code that uses `Ascii` for more than, e.g., `to_lowercase`
would need to install a Cargo package to get it.
This is strictly more work than having it in `std`,
but should still be easy.

# Alternatives

* The `Ascii` type could stay in `std::ascii`
* Some variations per *Unresolved questions* below.

# Unresolved questions

* What to do with `std::ascii::escape_default`?
* Rename the `AsciiExt` and `OwnedAsciiExt` traits?
* Should they be in the prelude? The `Ascii` type and the related traits currently are.
* Are associated types stable enough yet?
  If not, `AsciiExt` should temporarily keep its type parameter.
* Which of all the `Ascii::is_*` methods should `AsciiExt` include? Those included should have `ascii` added in their name.
  * *Maybe* `is_lowercase`, `is_uppercase`, `is_alphabetic`, or `is_alphanumeric` could be useful,
    but I’d be fine with dropping them and reconsidering if someone asks for them.
    The same result can be achieved
    with `.is_ascii() &&` and the corresponding `UnicodeChar` method,
    which in most cases has an ASCII fast path.
    And in some cases it’s an easy range check like `'a' <= c && c <= 'z'`.
  * `is_digit` and `is_hex` are identical to `Char::is_digit(10)` and `Char::is_digit(16)`.
  * `is_blank`, `is_control`, `is_graph`, `is_print`, and `is_punctuation` are never used
    in the Rust distribution or Servo.
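
The equivalences claimed in the last question can be spelled out as follows. This is a sketch against present-day Rust, where the `UnicodeChar` and `Char` methods mentioned above are assumed to correspond to the inherent `char::is_lowercase` and `char::is_digit`; the helper function names are hypothetical.

```rust
/// `.is_ascii() &&` combined with the Unicode-aware check.
fn is_ascii_lower_via_unicode(c: char) -> bool {
    c.is_ascii() && c.is_lowercase()
}

/// The plain range check suggested as an alternative.
fn is_ascii_lower_via_range(c: char) -> bool {
    'a' <= c && c <= 'z'
}

/// Stand-in for `Ascii::is_hex`, expressed through `is_digit(16)`.
fn is_ascii_hex(c: char) -> bool {
    c.is_digit(16)
}

/// Stand-in for `Ascii::is_digit`, expressed through `is_digit(10)`.
fn is_ascii_decimal(c: char) -> bool {
    c.is_digit(10)
}
```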
