Description
Co-authored with @jswrenn.
Overview
Add a `TryFromBytes` trait, which supports byte-to-type conversions for non-`FromBytes` types by performing runtime validation. Add a custom derive which generates this validation code automatically.
Many thanks to @kupiakos and @djkoloski for providing invaluable feedback and input on this design.
Progress
- Add `TryFromBytes` trait definition
- Implement `TryFromBytes` for existing `FromBytes` types
- Add `try_from_ref` method; impl for `bool`
- Implement derive for structs
- Implement for slices
- Implement for arrays
- Allow deriving on `repr(packed)` structs
- Allow deriving on unions
- Allow deriving on field-less enums with primitive reprs (`u8`, `i16`, etc)
- Allow deriving on field-less enums with `repr(C)` by treating the discriminant type as `[u8; size_of::<Self>()]`
- `Ptr` type should reason about `UnsafeCell` overlap #873
- Implement `TryFromBytes` for `fn()` and `extern "C" fn()` types
- Implement `TryFromBytes` for `UnsafeCell<T>`
- Make `TryFromBytes` a super-trait of `FromZeros`
- Remove `#[doc(hidden)]` from all items which are intended to be public
- Add to `TryFromBytes` docs to explain that you can't always round trip `T -> [u8] -> T` (notably for pointer types), which could be confusing given that, for `TryFromBytes`, the failure would show up at runtime
- Rename methods consistent with Revising the (`Try`)`FromBytes` Conversion Methods in 0.8 #1095
- `TryFromBytes` doc comment currently incorrectly says: "zerocopy does not permit implementing `TryFromBytes` for any union type"
- Consider this comment
- Non-breaking/blocking
  - Allow deriving on data-full enums
- Non-breaking/non-blocking
  - Add `try_from_mut` and `try_read_from` methods
  - Implement for unsized `UnsafeCell`
    - Consider that we may not need to require `T: Sized` (described in #251) if we use the design in #905
    - Implement TryFromBytes for unsized UnsafeCell #1619
  - Remove `Self: NoCell` bound from `try_read_from`
  - Support deriving on unions without `Immutable` bound
  - `is_bit_valid` should promise not to mutate its argument's referent
  - Support custom validators for `TryFromBytes` #1330
Motivation
Many use cases involve types whose layout is well-defined, but which cannot implement `FromBytes` because there exist bit patterns which are invalid (either they are unsound in terms of language semantics or they are unsafe in the sense of violating a library invariant).
Consider, for example, parsing an RPC message format. It would be desirable for performance reasons to be able to read a message into local memory, validate its structure, and if validation succeeds, treat that memory as containing a parsed message rather than needing to copy the message in order to transform it into a native Rust representation.
Here's a simple, hypothetical example of an RPC to request log messages from a process:
```rust
/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[repr(C)]
struct RequestLogsArgs {
    max_logs: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

/// Level of log messages requested from `RequestLogs`.
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}
```
None of these types can be `FromBytes`. For `LogLevel`, only the `u8` values 0 through 4 correspond to enum variants, and constructing a `LogLevel` from any other `u8` would be unsound. For `LogTime`, any sequence of the appropriate number of bytes would constitute a valid instance of `LogTime` from Rust's perspective - it would not cause unsoundness - but some such sequences would violate the invariant that the `nsecs` field is in the range [0, 10^9).
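To make the two kinds of check concrete, here is a minimal hand-written sketch in safe Rust. The helper names `log_level_from_u8` and `nsecs_is_valid` are hypothetical and are not part of zerocopy; they only illustrate the difference between a soundness check (for `LogLevel`) and a library-invariant check (for `LogTime`).

```rust
/// Level of log messages, mirroring the `LogLevel` enum above.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
#[repr(u8)]
pub enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

/// Hypothetical helper: soundness check for `LogLevel`.
/// Only the values 0 through 4 name a variant; everything else is rejected.
pub fn log_level_from_u8(byte: u8) -> Option<LogLevel> {
    match byte {
        0 => Some(LogLevel::Trace),
        1 => Some(LogLevel::Debug),
        2 => Some(LogLevel::Info),
        3 => Some(LogLevel::Warn),
        4 => Some(LogLevel::Error),
        _ => None,
    }
}

/// Hypothetical helper: library-invariant check for `LogTime::nsecs`.
/// Every `u32` is a sound value, but only [0, 10^9) upholds the invariant.
pub fn nsecs_is_valid(nsecs: u32) -> bool {
    nsecs < 1_000_000_000
}
```

Writing checks like these by hand for every type is the work that the proposed derive is meant to take over.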
While these types can't be `FromBytes`, we'd still like to be able to conditionally reinterpret a sequence of bytes as a `RequestLogsArgs` - it's just that we need to perform runtime validation first. Ideally, we'd be able to write code like:
```rust
/// The arguments to the `RequestLogs` RPC (auto-generated by the RPC compiler).
#[derive(TryFromBytes)]
#[repr(C)]
struct RequestLogsArgs {
    max_logs: u64,
    since: LogTime,
    level: LogLevel,
}

/// Log time, measured as time on the process's monotonic clock.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "is_valid")]
#[repr(C)]
struct LogTime {
    secs: u64,
    // Invariant: In the range [0, 10^9)
    nsecs: u32,
}

impl LogTime {
    fn is_valid(&self) -> bool {
        self.nsecs < 1_000_000_000
    }
}

/// Level of log messages requested from `RequestLogs`.
#[derive(TryFromBytes)]
#[repr(u8)]
enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}
```
The `TryFromBytes` trait - the subject of this design - provides the ability to fallibly convert a byte sequence to a type, performing validation at runtime. At a minimum, the validation code simply ensures soundness - for example, in the case of `LogLevel`, validating that byte values are in the range [0, 4]. The custom derive also supports user-defined validation like the `LogTime::is_valid` method (note the `validator` annotation on `LogTime`), which can be used to enforce safety invariants that go above and beyond soundness.

Given these derives of `TryFromBytes`, an implementation of this RPC could be as simple as:
```rust
fn serve_request_logs_rpc<F: FnMut(&RequestLogsArgs)>(server: &mut RpcServer, mut f: F) -> Result<()> {
    loop {
        // The buffer must be mutable so that the server can write the request into it.
        let mut bytes = [0u8; mem::size_of::<RequestLogsArgs>()];
        server.read_request(&mut bytes[..])?;
        let args = RequestLogsArgs::try_from_ref(&bytes[..]).ok_or(ParseError)?;
        f(args);
    }
}
```
The design proposed in this issue implements this API.
Design
TODO
This design builds on the following features:
- Support `KnownLayout` trait and custom DSTs #29
- Support field projection in any `#[repr(transparent)]` wrapper type #196
```rust
/// A value which might or might not constitute a valid instance of `T`.
// Builds on the custom MaybeUninit type described in #29
pub struct MaybeValid<T: AsMaybeUninit + ?Sized>(MaybeUninit<T>);

// Allows us to use the `project!` macro for field projection (proposed in #196)
unsafe impl<T, F> Projectable<F, MaybeValid<F>> for MaybeValid<T> {
    type Inner = T;
}

impl<T> MaybeValid<T> {
    /// Converts this `MaybeValid<T>` to a `T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid(self) -> T { ... }

    /// Converts this `&MaybeValid<T>` to a `&T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub const unsafe fn assume_valid_ref(&self) -> &T { ... }

    /// Converts this `&mut MaybeValid<T>` to a `&mut T`.
    ///
    /// # Safety
    ///
    /// `self` must contain a valid `T`.
    pub unsafe fn assume_valid_mut(&mut self) -> &mut T { ... }
}

/// # Safety
///
/// `is_bit_valid` must return `true` only if its argument contains a valid
/// `Self`. An incorrect implementation can cause undefined behavior.
pub unsafe trait TryFromBytes {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool;

    fn try_from_ref(bytes: &[u8]) -> Option<&Self> {
        let maybe_valid = Ref::<_, MaybeValid<Self>>::new(bytes)?.into_ref();

        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `Self`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_ref`.
            Some(unsafe { maybe_valid.assume_valid_ref() })
        } else {
            None
        }
    }

    fn try_from_mut(bytes: &mut [u8]) -> Option<&mut Self>
    where
        Self: AsBytes + Sized,
    {
        let maybe_valid = Ref::<_, MaybeValid<Self>>::new(bytes)?.into_mut();

        if Self::is_bit_valid(maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `Self`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid_mut`.
            Some(unsafe { maybe_valid.assume_valid_mut() })
        } else {
            None
        }
    }

    fn try_read_from(bytes: &[u8]) -> Option<Self>
    where
        Self: Sized
    {
        let maybe_valid = <MaybeValid<Self> as FromBytes>::read_from(bytes)?;

        if Self::is_bit_valid(&maybe_valid) {
            // SAFETY: `is_bit_valid` promises that it only returns true if
            // its argument contains a valid `Self`. This is exactly the safety
            // precondition of `MaybeValid::assume_valid`.
            Some(unsafe { maybe_valid.assume_valid() })
        } else {
            None
        }
    }
}
```
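As a self-contained illustration of the validate-then-reinterpret flow in `try_from_ref`, the following sketch replaces `MaybeValid<Self>` with a plain `&[u8]` and restricts itself to sized types. The trait name `TryFromBytesSketch` is hypothetical, and this simplification is an assumption made for the example, not the proposed API.

```rust
use std::mem::{align_of, size_of};

/// Simplified, hypothetical stand-in for the proposed trait.
///
/// # Safety
///
/// `is_bit_valid` must return `true` only if `bytes` holds a valid `Self`,
/// and `Self` must not contain interior mutability.
pub unsafe trait TryFromBytesSketch: Sized {
    /// Callers guarantee that `bytes.len() == size_of::<Self>()`.
    fn is_bit_valid(bytes: &[u8]) -> bool;

    fn try_from_ref(bytes: &[u8]) -> Option<&Self> {
        // Reject slices with the wrong length or alignment before validating.
        if bytes.len() != size_of::<Self>()
            || (bytes.as_ptr() as usize) % align_of::<Self>() != 0
        {
            return None;
        }

        if Self::is_bit_valid(bytes) {
            // SAFETY: The slice has the size and alignment of `Self`, and
            // `is_bit_valid` promises that its bytes form a valid `Self`.
            // The returned reference borrows from `bytes`.
            Some(unsafe { &*(bytes.as_ptr() as *const Self) })
        } else {
            None
        }
    }
}

// SAFETY: The only valid bit patterns for `bool` are 0x00 and 0x01.
unsafe impl TryFromBytesSketch for bool {
    fn is_bit_valid(bytes: &[u8]) -> bool {
        bytes[0] <= 1
    }
}
```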
Here's an example usage:
```rust
/// A type without any safety invariants.
#[derive(TryFromBytes)]
#[repr(C)]
struct MySimpleType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MySimpleType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        TryFromBytes::is_bit_valid(b)
    }
}

/// A type with invariants encoded using `validate`.
#[derive(TryFromBytes)]
#[TryFromBytes(validator = "validate")]
#[repr(C)]
struct MyComplexType {
    b: bool,
}

// Code emitted by `derive(TryFromBytes)`
unsafe impl TryFromBytes for MyComplexType {
    fn is_bit_valid(bytes: &MaybeValid<Self>) -> bool {
        // `project!` is described in #196
        let b: &MaybeValid<bool> = project!(&bytes.b);
        if !TryFromBytes::is_bit_valid(b) { return false; }

        // If there's no interior mutability, then we know this is sound because of preceding
        // validation. TODO: What to do about interior mutability?
        let slf: &MyComplexType = ...;
        MyComplexType::validate(slf)
    }
}

impl MyComplexType {
    fn validate(slf: &MyComplexType) -> bool { ... }
}
```
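The two-step shape of the derived code for `MyComplexType` (field-level bit validity first, then the custom validator) can be sketched in safe Rust for the by-value case, using the `LogTime` type from the motivation. The method name `try_read_from_sketch` is hypothetical, and decoding fields by copying bytes is an assumption of this sketch rather than what the derive would emit.

```rust
/// Log time with a library-level invariant on `nsecs`.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
#[repr(C)]
pub struct LogTime {
    pub secs: u64,
    // Invariant: In the range [0, 10^9)
    pub nsecs: u32,
}

impl LogTime {
    /// The user-supplied validator.
    fn is_valid(&self) -> bool {
        self.nsecs < 1_000_000_000
    }

    /// Hypothetical by-value conversion: decode each field, then validate.
    pub fn try_read_from_sketch(bytes: &[u8]) -> Option<LogTime> {
        if bytes.len() != std::mem::size_of::<LogTime>() {
            return None;
        }

        // Step 1: per-field bit validity. Every bit pattern is a valid `u64`
        // or `u32`, so decoding cannot fail. With `repr(C)`, `secs` sits at
        // offset 0 and `nsecs` at offset 8; trailing padding is ignored.
        let mut secs = [0u8; 8];
        secs.copy_from_slice(&bytes[0..8]);
        let mut nsecs = [0u8; 4];
        nsecs.copy_from_slice(&bytes[8..12]);
        let candidate = LogTime {
            secs: u64::from_ne_bytes(secs),
            nsecs: u32::from_ne_bytes(nsecs),
        };

        // Step 2: the custom validator enforces the library invariant.
        if candidate.is_valid() {
            Some(candidate)
        } else {
            None
        }
    }
}
```

The validator only ever runs on a value whose fields have already passed bit validity, which is why the derived code above checks `b` before calling `validate`.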
Unions
See #696 for a discussion of how to support unions in `TryFromBytes`.
Relationship with other traits
There are obvious relationships between `TryFromBytes` and the existing `FromZeroes` and `FromBytes` traits:
- If a type is `FromZeroes`, then it should probably be `TryFromBytes` (at a minimum, we must know something about the type's layout and bit validity to determine that it is genuinely `FromZeroes`)
  - This implies that we should change `FromZeroes` to be `FromZeroes: TryFromBytes`
- If a type is `FromBytes`, then it is trivially `TryFromBytes` (where `is_bit_valid` unconditionally returns `true`)
  - This implies that we should provide a blanket impl `impl<T: FromBytes> TryFromBytes for T`

Unfortunately, neither of these is possible today.
FromZeroes: TryFromBytes
The reason this bound doesn't work has to do with unsized types. As described in the previous section, working with unsized types is difficult. Luckily for `FromZeroes`, it doesn't have to do anything with the types it's implemented for - it's just a marker trait. It can happily represent a claim about the bit validity of a type even if that type isn't constructible in practice (over time, `FromZeroes` will become more useful as more unsized types become constructible). By contrast, `TryFromBytes` is only useful if we can emit validation code (namely, `is_bit_valid`). For that reason, we require that `TryFromBytes: AsMaybeUninit` since that bound is required in order to support the `MaybeValid` type required by `is_bit_valid`.
This means that we have two options if we want `FromZeroes: TryFromBytes`:
- We can keep `TryFromBytes: AsMaybeUninit`. As a result, some types which are `FromZeroes` today can no longer be `FromZeroes`, and some blanket impls of `FromZeroes` would require more complex bounds (e.g., today we write `impl<T: FromZeroes> FromZeroes for Wrapping<T>`; under this system, we'd need to write `impl<T: FromZeroes> FromZeroes for Wrapping<T> where <T as AsMaybeUninit>::MaybeUninit: Sized`, or alternatively we'd need to write one impl for `T` and a different one for `[T]`).
- We could move the `AsMaybeUninit` bound out of the definition of `TryFromBytes` and into `is_bit_valid` (and callers). As a result, we can keep existing impls of `FromZeroes`, but now `T: TryFromBytes` is essentially useless - to do anything useful, you need to specify `T: TryFromBytes + AsMaybeUninit`.
Neither option seems preferable to just omitting `FromZeroes: TryFromBytes`. Callers who require both can simply write `T: FromZeroes + TryFromBytes`.

(Note that the same points apply if we consider `FromBytes: TryFromBytes`)
impl<T: FromBytes> TryFromBytes for T
This conflicts with other blanket impls which we need for completeness:
```rust
impl<T: TryFromBytes> TryFromBytes for [T]
impl<const N: usize, T: TryFromBytes> TryFromBytes for [T; N]
```
As a result, we have to leave `TryFromBytes` and `FromBytes` as orthogonal. We may want to make it so that `derive(FromBytes)` automatically emits an impl of `TryFromBytes`, although in the general case that may require custom DST support.
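One way to picture the "derive emits both impls" idea is with a declarative macro that stands in for `derive(FromBytes)`. Everything in this sketch is hypothetical: the traits `FromBytesSketch` and `TryFromBytesSketch` are simplified local stand-ins that validate a plain `&[u8]`, and the macro only approximates what a real derive might do.

```rust
/// Simplified, hypothetical stand-ins for the two traits.
pub unsafe trait FromBytesSketch {}

pub unsafe trait TryFromBytesSketch {
    fn is_bit_valid(bytes: &[u8]) -> bool;
}

/// Plays the role of `derive(FromBytes)`: emits both impls per type, so no
/// blanket impl over `T: FromBytesSketch` is needed.
macro_rules! impl_from_bytes_sketch {
    ($($ty:ty),*) => {
        $(
            // SAFETY: Every bit pattern is valid for these integer types.
            unsafe impl FromBytesSketch for $ty {}
            // SAFETY: As above, so validation can unconditionally succeed.
            unsafe impl TryFromBytesSketch for $ty {
                fn is_bit_valid(_bytes: &[u8]) -> bool {
                    true
                }
            }
        )*
    };
}

impl_from_bytes_sketch!(u8, u16, u32, u64);

// SAFETY: The only valid bit patterns for `bool` are 0x00 and 0x01.
unsafe impl TryFromBytesSketch for bool {
    fn is_bit_valid(bytes: &[u8]) -> bool {
        bytes.len() == 1 && bytes[0] <= 1
    }
}

// With no blanket impl in the way, the container impl can be written.
// SAFETY: A slice is valid if each element-sized chunk is a valid `T`.
unsafe impl<T: TryFromBytesSketch> TryFromBytesSketch for [T] {
    fn is_bit_valid(bytes: &[u8]) -> bool {
        let size = std::mem::size_of::<T>();
        size == 0
            || bytes
                .chunks(size)
                .all(|chunk| chunk.len() == size && T::is_bit_valid(chunk))
    }
}
```

The cost of this approach is that the relationship between the two traits lives in generated code rather than in the type system, so generic code still has to name both bounds.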
Open questions
- Is there any way to recover the blanket impl of `TryFromBytes` for `T: FromBytes`? Unlike `FromZeroes: TryFromBytes`, where you may need to perform runtime validation, if you know that `T: FromBytes`, then in principle you know that `is_bit_valid` can unconditionally return `true` without inspecting its argument, and so in principle it shouldn't matter whether you can construct a `MaybeValid<Self>`. Is there some way that we could allow `FromBytes` types to specify `<Self as AsMaybeUninit>::MaybeUninit = ()` or similar in order to bypass the "only sized types or slices can implement `AsMaybeUninit`" problem?
  - One approach is to wait until a `KnownLayout` trait lands. There's a good chance that, under that design, we'd end up with `FromZeroes: KnownLayout`. If `KnownLayout: AsMaybeUninit` (or just absorbs the current definition of `AsMaybeUninit` into itself), it'd solve this problem since all zerocopy traits would imply support for `MaybeValid`.
- In the first version of this feature, could we relax the `Self: Sized` bounds on `try_from_ref` and `try_from_mut` (without needing full custom-DST support)?
- Should `derive(FromBytes)` emit an impl of `TryFromBytes`? What about custom DSTs?
- What should the behavior for unions be? Should it validate that at least one variant is valid, or that all variants are valid? (This hinges somewhat on the outcome of rust-lang/unsafe-code-guidelines#438.)
- What bounds should we place on `T` when implementing `TryFromBytes` for `Unalign<T>` (#320)?
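The union question above can be made concrete with a small sketch. It assumes a hypothetical one-byte union with a `bool` field and a `LogLevel`-like field (valid bytes 0 through 4), and models each candidate policy as a predicate on the byte. The function names are illustrative only, and the sketch takes no position on which policy Rust's rules will ultimately require.

```rust
/// Bit validity of the hypothetical `bool` field.
fn valid_as_bool(byte: u8) -> bool {
    byte <= 1
}

/// Bit validity of the hypothetical `LogLevel`-like field (variants 0..=4).
fn valid_as_log_level(byte: u8) -> bool {
    byte <= 4
}

/// Policy A: accept the byte if at least one field would be valid.
pub fn is_bit_valid_any(byte: u8) -> bool {
    valid_as_bool(byte) || valid_as_log_level(byte)
}

/// Policy B: accept the byte only if every field would be valid.
pub fn is_bit_valid_all(byte: u8) -> bool {
    valid_as_bool(byte) && valid_as_log_level(byte)
}
```

The byte `3` is where the policies diverge: it is a valid level but not a valid `bool`, so policy A accepts it and policy B rejects it.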
Future directions
- In this design, we ban interior mutability entirely. For references, this is unavoidable - e.g., if we were to allow types containing `UnsafeCell` in `try_from_ref`, then the user could obtain an `&UnsafeCell` and a `&[u8]` view of the same memory, which is unsound (it's unsound to even exist under Stacked Borrows, and unsound to expose to safe code in all cases). For values (i.e., `try_read_from`), we'd like to be able to support this - as long as we have some way of performing validation, it should be fine to return an `UnsafeCell` by value even if its bytes were copied from a `&[u8]`. Actually supporting this in practice is complicated for a number of reasons, but perhaps a future extension could support it. Reasons it's complicated:
  - `is_bit_valid` operates on a `NonNull<Self>`, so interior mutability isn't inherently a problem. However, it needs to be able to call a user's custom validator, which instead operates on a `&Self`, which is a problem.
  - Even if we could solve the previous problem somehow, we'd need to have `is_bit_valid` require that its argument not be either experiencing interior mutation or, under Stacked Borrows, contain any `UnsafeCell`s at all. When the `NonNull<Self>` is synthesized from a `&[u8]`, this isn't a problem, but if in the future we want to support type-to-type conditional transmutation, it might be a problem. If, in the future, merely containing `UnsafeCell`s is fine, then we could potentially design a wrapper type which "disables" interior mutation and supports field projection. This might allow us to solve this problem.
- Extend TryFromBytes to support validation context #590
Prior art
The bytemuck crate defines a `CheckedBitPattern` trait which serves a similar role to the proposed `TryFromBytes`.

Unlike `TryFromBytes`, `CheckedBitPattern` introduces a separate associated `Bits` type which is a type with the same layout as `Self` except that all bit patterns are valid. This serves the same role as `MaybeValid<Self>` in our design. One advantage for the `Bits` type is that it may be more ergonomic to write validation code for it, which is important for manual implementations of `CheckedBitPattern`. However, our design expects that manual implementations of `TryFromBytes` will be very rare. Since `CheckedBitPattern`'s derive doesn't support custom validation, any type with safety invariants would need a manual implementation. By contrast, the `TryFromBytes` derive's support for a custom validation function means that, from a completeness standpoint, it should never be necessary to implement `TryFromBytes` manually. The only case in which a manual implementation might be warranted would be for performance reasons.
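To illustrate the ergonomic point about an associated `Bits` type, here is a sketch that uses a locally defined trait, `CheckedBitPatternSketch`. It only approximates the shape described above and is not bytemuck's actual definition (which is an unsafe trait with stricter bounds); consult bytemuck's documentation for the real API.

```rust
/// Hypothetical local trait approximating the `Bits`-based design.
pub trait CheckedBitPatternSketch: Copy {
    /// Same layout as `Self`, but every bit pattern is valid.
    type Bits: Copy;

    /// Returns `true` if `bits` is a valid bit pattern for `Self`.
    fn is_valid_bit_pattern(bits: &Self::Bits) -> bool;
}

#[derive(Debug, PartialEq, Eq, Clone, Copy)]
#[repr(u8)]
pub enum LogLevel {
    Trace,
    Debug,
    Info,
    Warn,
    Error,
}

impl CheckedBitPatternSketch for LogLevel {
    // A `repr(u8)` enum has the layout of `u8`, for which all bits are valid.
    type Bits = u8;

    fn is_valid_bit_pattern(bits: &u8) -> bool {
        // Validation is written against an ordinary `u8` value.
        *bits <= LogLevel::Error as u8
    }
}
```

Because the validator receives an ordinary `u8`, no projection machinery is needed to inspect it, which is the ergonomic advantage mentioned above.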